Thanks!

 

Uwe

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: [email protected]

 

From: Peng Cheng [mailto:[email protected]] 
Sent: Wednesday, January 22, 2014 5:23 PM
To: [email protected]
Subject: Re: (Lucene-core) Is Query's rewrite method mandated to preserver 
original Query's hashcode?

 

opened as https://issues.apache.org/jira/browse/LUCENE-5409

 

On Tue, Jan 14, 2014 at 5:42 PM, Uwe Schindler <[email protected]> wrote:

Yes, open an issue!

 

Uwe

 

 

From: Peng Cheng [mailto:[email protected]] 
Sent: Tuesday, January 14, 2014 10:41 PM
To: [email protected]
Subject: Re: (Lucene-core) Is Query's rewrite method mandated to preserver 
original Query's hashcode?

 

Do you suggest that I open a JIRA ticket for this? I think it's a bug, 
considering common interface conventions (rewrite should not be exposed to the 
end user), the documentation, and runtime efficiency (as you said, rewrite is 
slow).

 

On Tue, Jan 14, 2014 at 4:38 AM, Peng Cheng <[email protected]> wrote:

I see. Perhaps the best solution is to put the un-rewritten block join queries 
into joinQueryID? The result would be the same. Right now the code behaves very 
strangely if rewrite is not called beforehand: it returns either empty groups 
or correct results, seemingly at random.

 

It's a great pleasure to read your reply; I never expected someone to respond 
that fast.

 

Yours Peng

 

 

On Tue, Jan 14, 2014 at 2:33 AM, Uwe Schindler <[email protected]> wrote:

Hi Peng,

 

rewrite() returns a different query that will definitely not preserve the 
hashCode() or be equals() to the original one or any other rewritten one. The 
reason for this is: A rewritten query is a new query that contains information 
about the index it will be executed on (e.g., it references terms from that 
index), so it *cannot* be equal to the original one. If it cannot be equal, 
the hashCode should also be different. If you execute the query at a later 
stage, you have to rewrite the original query again, because the index may have 
changed. And take care: this rewrite may produce a completely different query 
(with a new hashCode again) if the index changed in the meantime.
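
A minimal sketch of this point, using toy classes rather than Lucene's real 
API. ToyQuery, ToyPrefixQuery and ToyTermsQuery are hypothetical names, and the 
"index" is modeled as a plain list of terms. The rewritten query carries the 
terms found in the index, so it is not equal to the original, and it changes 
when the index changes:

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical stand-in for a query type; NOT Lucene's Query class.
interface ToyQuery {
    // The "index" is assumed to be just a list of terms.
    ToyQuery rewrite(List<String> indexTerms);
}

// Un-rewritten query: knows only the prefix, nothing about any index.
final class ToyPrefixQuery implements ToyQuery {
    final String prefix;

    ToyPrefixQuery(String prefix) { this.prefix = prefix; }

    @Override
    public ToyQuery rewrite(List<String> indexTerms) {
        // Expand the prefix into the concrete terms present in this index.
        List<String> matches = indexTerms.stream()
                .filter(t -> t.startsWith(prefix))
                .sorted()
                .collect(Collectors.toList());
        return new ToyTermsQuery(matches);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ToyPrefixQuery
                && ((ToyPrefixQuery) o).prefix.equals(prefix);
    }

    @Override
    public int hashCode() { return Objects.hash("prefix", prefix); }
}

// Rewritten query: references terms taken from one particular index state.
final class ToyTermsQuery implements ToyQuery {
    final List<String> terms;

    ToyTermsQuery(List<String> terms) { this.terms = terms; }

    @Override
    public ToyQuery rewrite(List<String> indexTerms) { return this; }

    @Override
    public boolean equals(Object o) {
        return o instanceof ToyTermsQuery
                && ((ToyTermsQuery) o).terms.equals(terms);
    }

    @Override
    public int hashCode() { return Objects.hash("terms", terms); }
}
```

Rewriting the same ToyPrefixQuery against two different term lists gives two 
ToyTermsQuery objects that are equal neither to each other nor to the original.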

 

There is a workaround (to me it looks like the code is merely missing 
documentation): you can manually rewrite the query with Searcher#rewrite(query) 
before invoking getTopGroups(). Why is a hotfix needed?

 

Also, rewriting the query on every call of getTopGroups is a major overhead 
(most queries' rewrites are very expensive and take as long as the execution of 
the query, e.g. MultiTermQueries), so it should be done only once, not on every 
call. Maybe that’s the reason why it was left out, but it was not documented.

 

Uwe

 

 

From: Peng Cheng [mailto:[email protected]] 
Sent: Tuesday, January 14, 2014 3:59 AM
To: [email protected]; [email protected]


Subject: (Lucene-core) Is Query's rewrite method mandated to preserver original 
Query's hashcode?

 

Hi developers,

 

I've recently found a few bugs in advanced features of Lucene-core 4.6 (which 
is perfectly normal, as those features are less likely to be used and tested). 
The most serious one has rendered my ToParentBlockJoinCollector close to 
useless:

 

In the scorer generation stage, the ToParentBlockJoinCollector automatically 
rewrites all the associated ToParentBlockJoinQuery instances (and their 
subqueries) and saves them into its in-memory lookup table, namely joinQueryID 
(see the enroll() method for details). Unfortunately, in the getTopGroups 
method, the ToParentBlockJoinQuery passed as a parameter is not rewritten (at 
least users are not expected to do so). When that query is looked up in the 
existing table (considering the impact of rewrite() on hashCode()), the lookup 
(namely _slot) always fails, and the method ends up returning a topGroup 
collection consisting of only empty groups (their hitCounts are guaranteed to 
be zero).
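
The lookup miss described above can be modeled with a plain HashMap. This is 
only a sketch under assumptions: SketchQuery, SketchCollector and slotFor are 
hypothetical names, and enroll() and joinQueryID merely mirror the names 
mentioned in this message, not the actual Lucene source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical query whose rewrite() returns a new object that is not
// equal to the original (assumption: models the behavior described above).
final class SketchQuery {
    final String text;
    final boolean rewritten;

    SketchQuery(String text, boolean rewritten) {
        this.text = text;
        this.rewritten = rewritten;
    }

    SketchQuery rewrite() {
        return rewritten ? this : new SketchQuery(text, true);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SketchQuery)) return false;
        SketchQuery other = (SketchQuery) o;
        return text.equals(other.text) && rewritten == other.rewritten;
    }

    @Override
    public int hashCode() { return Objects.hash(text, rewritten); }
}

// Hypothetical collector: stores slots keyed by the REWRITTEN query.
final class SketchCollector {
    private final Map<SketchQuery, Integer> joinQueryID = new HashMap<>();

    // Enrolls the rewritten form of the query, as the scorer stage would.
    void enroll(SketchQuery query) {
        SketchQuery key = query.rewrite();
        if (!joinQueryID.containsKey(key)) {
            joinQueryID.put(key, joinQueryID.size());
        }
    }

    // Returns the slot for the given query, or null if it was never enrolled
    // under this exact key.
    Integer slotFor(SketchQuery query) {
        return joinQueryID.get(query);
    }
}
```

In this model, slotFor(original) returns null after enroll(original), while 
slotFor(original.rewrite()) finds the slot, which corresponds to rewriting the 
query before the lookup.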

 

I'm not positive about whether rewrite() should preserve Query's hashCode, as 
I've found many counterexamples already. If it need not, then this problem 
can be solved by rewriting the original BlockJoinQuery before invoking the 
getTopGroups method. Nevertheless, users are not expected to do so; therefore I 
would suggest submitting a hotfix that adds the described rewrite step.

 

If rewrite() must preserve the hashCode, then this is a problem in the various 
rewrite() implementations and the fix will be much harder.

 

This bug has caused widespread panic in my company and I would like to see it 
fixed ASAP. Please give me some suggestions so I know which hotfix I should be 
working on.

 

All the best,

 

Yours Peng

 

 

 
