I see. Perhaps the best solution is to put the un-rewritten ToParentBlockJoinQuery
objects into joinQueryID instead? The result would be the same. Right now the
code behaves very strangely if rewrite() is not called beforehand: it returns
either empty groups or correct results, seemingly at random.
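The proposal above can be sketched with a hypothetical toy model. This is Java 16+ and is neither Lucene's actual classes nor a real patch. JoinQ, KeyByOriginalCollector and the back-reference field `original` are invented for illustration. The assumption is that a rewritten query remembers the query it came from, so the collector can key joinQueryID by that original and callers never have to rewrite.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyByOriginalDemo {
    public static void main(String[] args) {
        List<String> index = List.of("lucene", "lucid", "solr");
        KeyByOriginalCollector collector = new KeyByOriginalCollector();
        collector.enroll(JoinQ.of("luc").rewrite(index)); // scoring stage

        System.out.println(collector.slotFor(JoinQ.of("luc")));                // 0
        System.out.println(collector.slotFor(JoinQ.of("luc").rewrite(index))); // 0
    }
}

// Hypothetical toy query (NOT Lucene's API). `expanded` is null until the
// query is rewritten; `original` points back to the un-rewritten query.
record JoinQ(String childPrefix, List<String> expanded, JoinQ original) {
    static JoinQ of(String childPrefix) {
        return new JoinQ(childPrefix, null, null);
    }

    // Expands the prefix against the index, producing a new, unequal query.
    JoinQ rewrite(List<String> index) {
        if (expanded != null) {
            return this; // already rewritten
        }
        List<String> terms = index.stream().filter(t -> t.startsWith(childPrefix)).sorted().toList();
        return new JoinQ(childPrefix, terms, this);
    }

    // The user-facing query this one stands for.
    JoinQ unrewritten() {
        return expanded == null ? this : original;
    }
}

// Hypothetical collector that keys its slot table by the un-rewritten query.
final class KeyByOriginalCollector {
    private final Map<JoinQ, Integer> joinQueryID = new HashMap<>();

    // Called with the rewritten query, but stores the original as the key.
    void enroll(JoinQ rewritten) {
        joinQueryID.putIfAbsent(rewritten.unrewritten(), joinQueryID.size());
    }

    // Succeeds for the original query as well as for a rewritten one.
    Integer slotFor(JoinQ query) {
        return joinQueryID.get(query.unrewritten());
    }
}
```

In this model the lookup never depends on a second rewrite, so a later index change cannot make it miss.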

It's a great pleasure to read your reply; I never expected anyone to respond
that fast.

Yours Peng



On Tue, Jan 14, 2014 at 2:33 AM, Uwe Schindler <[email protected]> wrote:

> Hi Peng,
>
>
>
> rewrite() returns a different query that will definitely not preserve the
> hashCode() or be equals() to the original one or any other rewritten one.
> The reason is that a rewritten query is a new query that contains
> information about the index it will be executed on (e.g., it references
> terms from that index), so it **cannot** be equal to the original one, and
> its hashCode will generally differ as well. If you execute the query at a
> later stage, you have to rewrite the original query again, because the
> index may have changed. And take care: this rewrite may produce a
> completely different query (with yet another hashCode) if the index
> changed in the meantime.
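To make this concrete, here is a minimal Java 16+ sketch. ToyQuery, ToyPrefixQuery and ToyTermsQuery are hypothetical stand-ins, not Lucene classes, and the "index" is modeled as a plain list of terms. The sketch assumes only the contract described above: rewrite() returns a new query built from the terms of one particular index.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RewriteEqualityDemo {
    public static void main(String[] args) {
        ToyQuery original = new ToyPrefixQuery("luc");
        ToyQuery before = original.rewrite(List.of("lucene", "lucid", "solr"));
        ToyQuery after = original.rewrite(List.of("lucene", "lucid", "lucky", "solr"));

        System.out.println(original.equals(before)); // false: rewritten query is a new query
        System.out.println(before.equals(after));    // false: the index changed in between
    }
}

// Hypothetical stand-in for a query (NOT the real Lucene API).
interface ToyQuery {
    ToyQuery rewrite(List<String> indexTerms);
}

// Like a multi-term query: matches every term that starts with `prefix`.
record ToyPrefixQuery(String prefix) implements ToyQuery {
    @Override
    public ToyQuery rewrite(List<String> indexTerms) {
        Set<String> matches = new TreeSet<>();
        for (String term : indexTerms) {
            if (term.startsWith(prefix)) {
                matches.add(term);
            }
        }
        // The result references concrete terms from this particular index.
        return new ToyTermsQuery(matches);
    }
}

// The rewritten, primitive form: an explicit set of terms.
record ToyTermsQuery(Set<String> terms) implements ToyQuery {
    @Override
    public ToyQuery rewrite(List<String> indexTerms) {
        return this; // nothing left to expand
    }
}
```

Rewriting the same original against the same term list twice yields equal results. That is why rewriting once against the same index can still match a table keyed by rewritten queries.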
>
>
>
> There is a workaround (to me it looks like the code is just missing
> documentation): you can manually rewrite the query using
> Searcher#rewrite(query) before invoking getTopGroups(). Why is a hotfix
> needed?
>
>
>
> Also, rewriting the query on every call of getTopGroups() is a major
> overhead (for most queries, e.g. MultiTermQueries, rewriting is very
> expensive and can take as long as executing the query), so it should be
> done only once, not on every call. Maybe that's why it was left out, but
> it was not documented.
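One way to "rewrite once, not on every call" is to memoize the rewrite per index state. This approach is an assumption and was not proposed in the thread. RewriteCache is a hypothetical helper, and the `long indexVersion` is a placeholder for whatever identifies the index snapshot the query is rewritten against.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

class RewriteCacheDemo {
    public static void main(String[] args) {
        RewriteCache<String> cache = new RewriteCache<>((q, version) -> q + "@v" + version);
        cache.rewritten("luc*", 1L); // runs the rewrite
        cache.rewritten("luc*", 1L); // served from the cache
        cache.rewritten("luc*", 2L); // index changed, so the rewrite runs again
        System.out.println(cache.rewriteCount()); // 2
    }
}

// Hypothetical helper (not part of Lucene): runs an expensive rewrite at most
// once per (query, index version) pair and reuses the result afterwards.
final class RewriteCache<Q> {
    // Cache key: the original query plus the index state it was rewritten against.
    private record Key<T>(T query, long indexVersion) {}

    private final Map<Key<Q>, Q> cache = new HashMap<>();
    private final BiFunction<Q, Long, Q> rewriter;
    private int rewriteCount = 0; // how often the expensive rewrite really ran

    RewriteCache(BiFunction<Q, Long, Q> rewriter) {
        this.rewriter = rewriter;
    }

    Q rewritten(Q query, long indexVersion) {
        return cache.computeIfAbsent(new Key<>(query, indexVersion), key -> {
            rewriteCount++;
            return rewriter.apply(key.query(), key.indexVersion());
        });
    }

    int rewriteCount() {
        return rewriteCount;
    }
}
```

Entries for old index versions are never evicted here. A real cache would drop them once that index state is no longer searched.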
>
>
>
> Uwe
>
>
>
> -----
>
> Uwe Schindler
>
> H.-H.-Meier-Allee 63, D-28213 Bremen
>
> http://www.thetaphi.de
>
> eMail: [email protected]
>
>
>
> *From:* Peng Cheng [mailto:[email protected]]
> *Sent:* Tuesday, January 14, 2014 3:59 AM
> *To:* [email protected]; [email protected]
>
> *Subject:* (Lucene-core) Is Query's rewrite method mandated to preserve the
> original Query's hashCode?
>
>
>
> Hi developers,
>
>
>
> I've recently found a few bugs in advanced features of Lucene-core 4.6
> (which is perfectly normal, as those features are less likely to be used
> and tested); the most serious one has rendered my ToParentBlockJoinCollector
> close to useless:
>
>
>
> In the scorer generation stage, ToParentBlockJoinCollector automatically
> rewrites all the associated ToParentBlockJoinQuery instances (and their
> subqueries) and saves them in its in-memory lookup table, joinQueryID (see
> the enroll() method for details). Unfortunately, the ToParentBlockJoinQuery
> passed to the getTopGroups method is not rewritten (and users are not
> expected to rewrite it themselves). Given the effect of rewrite() on
> hashCode(), looking up this un-rewritten query in the old lookup table
> always fails (no _slot is found), and the result is a topGroups collection
> consisting only of empty groups (their hitCounts are always zero).
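The lookup failure can be reproduced in miniature. The Java 16+ sketch below is a hypothetical model, not Lucene's code. BlockJoinQ, Group and ToyBlockJoinCollector are invented names, and joinQueryID is modeled as a plain HashMap keyed by the rewritten query. The sketch shows both the silent miss and the workaround of rewriting against the same index before the lookup.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SlotLookupDemo {
    public static void main(String[] args) {
        List<String> index = List.of("lucene", "lucid", "solr");
        BlockJoinQ query = BlockJoinQ.of("luc");
        ToyBlockJoinCollector collector = new ToyBlockJoinCollector();
        collector.collect(query, index, 5);

        System.out.println(collector.topGroup(query).hitCount());                // 0
        System.out.println(collector.topGroup(query.rewrite(index)).hitCount()); // 5
    }
}

// Hypothetical toy query (NOT Lucene's API); `expanded` is null until rewritten.
record BlockJoinQ(String childPrefix, List<String> expanded) {
    static BlockJoinQ of(String childPrefix) {
        return new BlockJoinQ(childPrefix, null);
    }

    // Expands the child prefix against the index, producing a new, unequal query.
    BlockJoinQ rewrite(List<String> index) {
        if (expanded != null) {
            return this; // already rewritten
        }
        return new BlockJoinQ(childPrefix,
                index.stream().filter(t -> t.startsWith(childPrefix)).sorted().toList());
    }
}

record Group(int hitCount) {}

final class ToyBlockJoinCollector {
    private final Map<BlockJoinQ, Integer> joinQueryID = new HashMap<>();
    private final Map<Integer, Integer> hitsPerSlot = new HashMap<>();

    // Scoring stage: the query is rewritten first, and the REWRITTEN query is the key.
    void collect(BlockJoinQ query, List<String> index, int hits) {
        BlockJoinQ key = query.rewrite(index);
        joinQueryID.putIfAbsent(key, joinQueryID.size());
        hitsPerSlot.merge(joinQueryID.get(key), hits, Integer::sum);
    }

    // Later lookup: uses the query exactly as the caller passes it.
    Group topGroup(BlockJoinQ query) {
        Integer slot = joinQueryID.get(query);
        // A miss silently yields an empty group instead of an error.
        return new Group(slot == null ? 0 : hitsPerSlot.get(slot));
    }
}
```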
>
>
>
> I'm not sure whether rewrite() should preserve the Query's hashCode, as
> I've already found many counterexamples. If it need not, then this problem
> can be solved by rewriting the original BlockJoinQuery before invoking the
> getTopGroups method. However, users are not expected to do so, therefore I
> would suggest submitting a hotfix that adds the described rewrite step.
>
>
>
> If rewrite() must preserve the hashCode, then this is a problem with the
> various rewrite() implementations, and the fix will be much harder.
>
>
>
> This bug has caused widespread panic in my company, and I would like to see
> it fixed ASAP. Please give me some suggestions so I know which hotfix I
> should be working on.
>
>
>
> All the best,
>
>
>
> Yours Peng
>
