Re: [hibernate-dev] Re: [Hibernate-JIRA] Commented: (HSEARCH-115) Add a default value for indexing null value

2008-04-23 Thread Hardy Ferentschik
On Tue, 22 Apr 2008 18:47:14 +0200, Sanne Grinovero  
[EMAIL PROTECTED] wrote:



Yes right but I don't see how this is worse than searching for
 foo:bar OR foo:NULL-KEYWORD, just some less ambiguity.
 If you just want to search for null fields, fooIsNull:true


It is not much worse, but it is less intuitive. One has to know that when
@IndexNullMarker is added, another field, fooIsNull, is added as well. On the
other hand, it would actually work without ambiguities. I guess one would have
to RTFM.



Interesting I wasn't expecting the index to grow as I remove a Field
 and replace it with another; I've made a test for this: on 10,000,000
 docs having 50% a random text value (chosen from 800 constants to
 limit total string tokens) and 50% nulls the index
 size grows by 3.5% compared to no null values (same docs and 800  
consts).

 I wasn't expecting any growth above some bytes, anyway I think 3.5% is
 quite good.
If you add just one of the fields (foo or fooIsNull) at a time, we are fine.
It could be more of an issue if we always add a fooIsNull:false for
consistency, as you mentioned as well.



Nevertheless, I think your idea is still better than the straightforward
approach. It just comes with more complexity usage-wise.


--Hardy
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] Re: [Hibernate-JIRA] Commented: (HSEARCH-115) Add a default value for indexing null value

2008-04-23 Thread Hardy Ferentschik

Hi again :)

One more thing comes to my mind:

On Tue, 22 Apr 2008 18:47:14 +0200, Sanne Grinovero  
[EMAIL PROTECTED] wrote:



  The Field and StringBridge API would remain as-is;


I am not so sure about that. Looking at the DocumentBuilder code, it would
make sense to let the FieldBridge handle the null marker. DocumentBuilder just
iterates over the members of the entity and, for each member annotated with
@Field, calls the responsible FieldBridge. It also passes along additional
annotation values like the boost. It would make sense to handle
@IndexNullMarker the same way, right?

--Hardy


Re: [hibernate-dev] Re: [Hibernate-JIRA] Commented: (HSEARCH-115) Add a default value for indexing null value

2008-04-23 Thread Sanne Grinovero
Hi,
hoping the rest of list users won't hate me:

IMHO I think it would be preferable to handle this in the DocumentBuilder,
so that people won't have to repeat this complexity in custom
FieldBridge implementations,
and you don't have to update all built-in FieldBridges.
We could make a wrapping FieldBridge that adds this functionality, but
the special null marker Field is actually a constant and could be
added to the metaproperties at DocumentBuilder initialization; also
when building the Document you could test first for null
and avoid passing all other options to the FieldBridge, just adding
the constant Field to the document and skipping all further processing
for the current property.

We could skip this same processing also for properties which don't use
the new feature, effectively speeding it up a bit, so we don't feel
guilty for complicating the DocumentBuilder but are actually doing
some optimization and code cleanup.
Don't know if that is a dangerous optimization for backwards
compatibility... I expect no FieldBridge to generate Fields on a null
value but someone could rely on it..?
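
A minimal sketch of the null-first handling described above, assuming a marker field named after the property plus an "IsNull" suffix. Every type here (Bridge, PropertyMeta, the Map used as a document) is a hypothetical stand-in so the example compiles on its own; none of it is the actual DocumentBuilder or FieldBridge code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins: these are NOT the Hibernate Search or Lucene types.
class NullFirstBuilderSketch {

    /** Simplified bridge: turns a non-null value into index fields. */
    interface Bridge {
        void set(String name, Object value, Map<String, String> document);
    }

    /** Per-property metadata, computed once when the builder is initialized. */
    static final class PropertyMeta {
        final String name;
        final Bridge bridge;
        final String nullMarkerField; // null when the feature is not enabled

        PropertyMeta(String name, Bridge bridge, boolean indexNullMarker) {
            this.name = name;
            this.bridge = bridge;
            // Assumed naming convention: property name + "IsNull".
            this.nullMarkerField = indexNullMarker ? name + "IsNull" : null;
        }
    }

    static Map<String, String> buildDocument(List<PropertyMeta> metas,
                                             Map<String, Object> entity) {
        Map<String, String> document = new LinkedHashMap<>();
        for (PropertyMeta meta : metas) {
            Object value = entity.get(meta.name);
            if (value == null) {
                // Test for null first: add the constant marker (if enabled)
                // and skip all further processing for this property.
                if (meta.nullMarkerField != null) {
                    document.put(meta.nullMarkerField, "true");
                }
                continue; // the bridge is never called with a null value
            }
            meta.bridge.set(meta.name, value, document);
        }
        return document;
    }

    public static void main(String[] args) {
        Bridge toStringBridge = (name, value, doc) -> doc.put(name, value.toString());
        List<PropertyMeta> metas = new ArrayList<>();
        metas.add(new PropertyMeta("value", toStringBridge, false));
        metas.add(new PropertyMeta("lang", toStringBridge, true));

        Map<String, Object> entity = new LinkedHashMap<>();
        entity.put("value", "cat");
        entity.put("lang", null);

        System.out.println(buildDocument(metas, entity));
    }
}
```

Note that in this sketch custom bridges never see a null, which is exactly the backwards-compatibility question raised above: a bridge that currently emits a Field for null values would stop being called.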

Also I'm not sure about adding @IndexNullMarker or adding an option to @Field,
what do others think about this?

Emmanuel said:
If we go that path, we should add a NullQuery class that can be
combined with other *Query from Lucene and hide the complexity.

This looks brilliant, but how should we instantiate a new NullQuery?
If an @IndexNullMarker option could override the field name used, we would
need a factory class with a reference to the DocumentBuilder, or some
other way to discover the special field name.
So the simplest way is to not offer an option to choose the
keyword field name and to derive it only from the property name. Would it
be acceptable that the user can't override the field name?
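
To illustrate why a fixed naming rule removes the need for a factory with access to the DocumentBuilder, here is a rough sketch of a helper that derives the marker field purely from the property name. The class name, the "IsNull" suffix and the methods are assumptions made for the example, not an existing or agreed API, and the result is just a query string in Lucene query-parser syntax (terms are not escaped here).

```java
// Hypothetical helper: neither this class nor the "IsNull" suffix is an existing API.
final class NullQuerySketch {

    private static final String SUFFIX = "IsNull";

    private NullQuerySketch() {
    }

    /** The marker field name is derived only from the property name. */
    static String markerField(String propertyName) {
        return propertyName + SUFFIX;
    }

    /** Query string matching documents whose property was null at indexing time. */
    static String isNull(String propertyName) {
        return markerField(propertyName) + ":true";
    }

    /** Query string matching either the given term or a null value. */
    static String termOrNull(String propertyName, String term) {
        return "(" + propertyName + ":" + term + " OR " + isNull(propertyName) + ")";
    }

    public static void main(String[] args) {
        System.out.println(isNull("foo"));
        System.out.println(termOrNull("foo", "bar"));
    }
}
```

Because no per-entity configuration is consulted, the helper can be static; an overridable field name would force it to look the name up somewhere.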

regards,
Sanne



Re: [hibernate-dev] Re: [Hibernate-JIRA] Commented: (HSEARCH-115) Add a default value for indexing null value

2008-04-22 Thread Hardy Ferentschik
On Tue, 22 Apr 2008 03:29:18 +0200, Sanne Grinovero  
[EMAIL PROTECTED] wrote:



A new proposal:
I got inspired by the 3VL considerations described in Emmanuel's
link to wikipedia, and think backwards compatibility is nice:
add a @IndexNullMarker on the property, this will add an additional
Field to the index for null values:


Hmm, interesting idea. It addresses one of the biggest concerns I have
with this null marker thing, namely ambiguities. But what would querying look
like in this case? Wouldn't it become harder? Whenever you want to use this
feature you would have to combine two fields, foo and fooIsNull, within a
boolean query to get the expected result. Something like this:
foo:bar OR fooIsNull:true.
Of course it would also mean that the index size grows, since we are adding
more fields. And the bigger the index, ...
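
As a rough, in-memory illustration of the two-field query discussed here (no Lucene classes involved; documents are plain maps and the field names foo and fooIsNull are taken from the thread), the following sketch indexes either foo or fooIsNull per document and evaluates the equivalent of foo:bar OR fooIsNull:true.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration only; a Map stands in for an indexed document.
class TwoFieldQuerySketch {

    /** Index one property: either foo=<value> or fooIsNull=true, never both. */
    static Map<String, String> index(String id, String foo) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", id);
        if (foo == null) {
            doc.put("fooIsNull", "true");
        } else {
            doc.put("foo", foo);
        }
        return doc;
    }

    /** Equivalent of the boolean query: foo:<term> OR fooIsNull:true */
    static List<String> searchTermOrNull(List<Map<String, String>> docs, String term) {
        List<String> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            boolean termMatches = term.equals(doc.get("foo"));
            boolean isNull = "true".equals(doc.get("fooIsNull"));
            if (termMatches || isNull) {
                hits.add(doc.get("id"));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(index("1", "bar"));
        docs.add(index("2", null));
        docs.add(index("3", "baz"));
        docs.add(index("4", "true")); // a literal value never collides with the marker

        System.out.println(searchTermOrNull(docs, "bar")); // prints [1, 2]
    }
}
```

Document 4 shows the ambiguity point: since the marker lives in its own field, no real value of foo can be mistaken for null, at the cost of the two-clause query.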


The Field and StringBridge API would remain as-is;
If you prefer not to add an additional annotation, @IndexNullMarker could be
dropped if you think adding this field is acceptable for all fields.
I think it should stay an optional and explicit feature. Adding one
additional field for each indexed property does not seem justified,
especially since we agree that the best solution would be to re-think your
design and come up with a proper non-null default. So by offering this
feature we might end up encouraging people to stick with their less optimal
design ;-)

Cheers,
Hardy





[hibernate-dev] Re: [Hibernate-JIRA] Commented: (HSEARCH-115) Add a default value for indexing null value

2008-04-21 Thread Emmanuel Bernard

Hey
The more I think about the feature, the less I like it.

Here is what I have written in Hibernate Search in Action

Hibernate Search, by default, does not store null attributes into the  
index. Lucene does not have the notion of null fields, the field is  
simply not there. Hibernate Search could offer the ability (and most  
likely will in the future) to use a special string as a null marker to  
still be able to search by null.
But before you jump at the Hibernate Search team's throat, you need to
understand why they have not offered this feature so far. Null is not
a value per se. Null means that the data is not known (or does not
make sense). Therefore, searching by null as if it was a value is
somewhat odd. The authors are well aware that this is a raging debate,
especially amongst the relational model experts (see
http://en.wikipedia.org/wiki/Null_%28SQL%29).
Whenever you feel the need for searching by null, you should ask
yourself if storing a special marker value in the database would make
more sense. If you store a special marker value in the database, a lot
of the null inconsistencies vanish. It also has the side effect of
being queryable in Lucene and Hibernate Search.


So before we jump on the boat for this feature, I would like to know  
if people think it's still a good idea to offer this feature.


To answer your questions, the reason why I do not pass @Field but the
raw set of data is that @Field.index is translated into its
Lucene representation: some work is done.
Most people will write StringBridge implementations anyway, where the
null handling will be taken care of transparently (by
String2FieldBridgeAdaptor).


I think I like 1 or 3. Note that get should be changed as well. Three
is interesting indeed; rename it IndexingStrategy.


On  Apr 21, 2008, at 10:07, Hardy Ferentschik wrote:

Hi Emmanuel,

what's your take on this? Just adding another String parameter will
work, but are we not getting too many parameters into the method?
Wouldn't it be nicer to pass the actual @Field annotation? I think
this might also make things clearer for the implementor of the
interface.


I am also trying here to get a little into your head to understand  
your ideas behind the code design - hope

you don't mind ;-)

--Hardy



--- Forwarded message ---
From: Hardy Ferentschik (JIRA) [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [Hibernate-JIRA] Commented: (HSEARCH-115) Add a default value for indexing null value
Date: Mon, 21 Apr 2008 14:04:33 +0200


   [ http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_30032 ]


Hardy Ferentschik commented on HSEARCH-115:
---

Ok, here are a few suggestions:

1. This is the simplest way. Basically just add a new property named  
'indexNullAs' to @Field and @ClassBridge. Accordingly extend the  
FieldBridge interface to set(String name, Object value, Document  
document, Field.Store store, Field.Index index, Field.TermVector  
termVector, Float boost, String indexNullAs).


2. Alternatively one could change the FieldBridge API to actually
pass in the Field annotation itself: set(String name, Object value,
Document document, Field fieldAnnotation, Float boost). This would
reduce the number of parameters and might actually be more
transparent for users implementing custom bridges. Unfortunately,
one would have to introduce a ClassBridge interface as well in this
case. I am not sure whether it is a good design choice to pass
annotation instances around.


3. We could also change the API into something like this:
set(String name, Object value, Document document, IndexProperties
props), where IndexProperties is just a wrapper class for
Field.Store, Field.Index, ... The drawback is that this just
increases the number of classes.


Any comments?
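
For comparison, suggestion 3 could look roughly like the sketch below. It is only an illustration under stated assumptions: Store, Index and the Map-based document are stand-ins that mimic Lucene's Field.Store, Field.Index and Document so the example compiles alone, the term vector option is left out for brevity, and the indexNullAs member is the property name proposed in suggestion 1.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in types only; this is not the real FieldBridge interface.
class IndexPropertiesSketch {

    enum Store { YES, NO }
    enum Index { TOKENIZED, UN_TOKENIZED, NO }

    /** Immutable wrapper for the per-field options (suggestion 3). */
    static final class IndexProperties {
        final Store store;
        final Index index;
        final Float boost;
        final String indexNullAs; // null means "do not index null values"

        IndexProperties(Store store, Index index, Float boost, String indexNullAs) {
            this.store = store;
            this.index = index;
            this.boost = boost;
            this.indexNullAs = indexNullAs;
        }
    }

    /** A new option becomes a new member of IndexProperties, not a new parameter. */
    interface FieldBridge {
        void set(String name, Object value, Map<String, String> document, IndexProperties props);
    }

    /** Example bridge that falls back to the configured null marker. */
    static final class ToStringBridge implements FieldBridge {
        @Override
        public void set(String name, Object value, Map<String, String> document,
                        IndexProperties props) {
            String text = (value != null) ? value.toString() : props.indexNullAs;
            if (text != null) {
                document.put(name, text);
            }
        }
    }

    public static void main(String[] args) {
        IndexProperties props =
                new IndexProperties(Store.NO, Index.UN_TOKENIZED, null, "null");
        Map<String, String> document = new HashMap<>();
        new ToStringBridge().set("lang", null, document, props);
        System.out.println(document.get("lang"));
    }
}
```

The trade-off matches the one named in the list: one more class, but the set signature stays stable when further options are added.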


Add a default value for indexing null value
---

   Key: HSEARCH-115
   URL: http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-115
   Project: Hibernate Search
Issue Type: Improvement
Components: mapping
  Reporter: Julien Brulin
  Assignee: Hardy Ferentschik
   Fix For: 3.1.0


Hi,
Null elements are not indexed by Lucene, so it's not easy to use a
nullable property in a Lucene query.
I have a TagTranslation entity in my model with a nullable property
language. In this case null is used as the default language for tag
translation.

Each translation may have many variations like synonyms.
Because I can't specify a default value for null in the
@Field annotation, like this: @Field(index=Index.UN_TOKENIZED,
store=Store.NO, default='null'), I can't search for a cat tag with a
default translation like this: +value:cat* +lang:null
pre/code
@Entity()
@Table(name="indexing_tag_trans")