Hi developers,

I've recently found a few bugs in advanced features of Lucene-core 4.6
(which is perfectly normal, as those features are less likely to be used and
tested). The most serious one has rendered my ToParentBlockJoinCollector
close to useless:

In the scorer generation stage, the ToParentBlockJoinCollector
automatically rewrites all the associated ToParentBlockJoinQuery instances
(and their subqueries) and saves them in its in-memory lookup table,
joinQueryID (see the enroll() method for details). Unfortunately, the
ToParentBlockJoinQuery passed to the getTopGroups method is not rewritten
(at least, users are not expected to rewrite it themselves). Because
rewrite() can change hashCode(), looking up this unrewritten query in the
table built from rewritten queries always misses (the result, _slot, is
never found), and getTopGroups ends up returning a topGroup collection
consisting only of empty groups (their hitCounts are guaranteed to be
zero).

I'm not sure whether rewrite() is supposed to preserve a Query's hashCode(),
as I've already found many counterexamples. If it is not, then this
problem can be solved by rewriting the original BlockJoinQuery before
invoking the getTopGroups method. However, users are not expected to do so,
so I would suggest submitting a hotfix that adds the described rewrite
step.
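
To illustrate the failure mode, here is a minimal, self-contained sketch.
It does not use Lucene at all: FakeQuery and FakeCollector are hypothetical
stand-ins made up for illustration, and the sketch simply assumes that
rewriting changes hashCode()/equals(). Only the names joinQueryID and
enroll() are borrowed from the real collector.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Stand-in classes only: none of this is Lucene code.
final class RewriteLookupDemo {

    /** Hypothetical query whose hashCode()/equals() change after rewrite(). */
    static final class FakeQuery {
        final String text;
        final boolean rewritten;

        FakeQuery(String text, boolean rewritten) {
            this.text = text;
            this.rewritten = rewritten;
        }

        /** Returns a rewritten copy; an already rewritten query returns itself. */
        FakeQuery rewrite() {
            return rewritten ? this : new FakeQuery(text, true);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FakeQuery)) return false;
            FakeQuery other = (FakeQuery) o;
            return rewritten == other.rewritten && text.equals(other.text);
        }

        @Override
        public int hashCode() {
            return Objects.hash(text, rewritten);
        }
    }

    /** Hypothetical collector that stores only rewritten queries as keys. */
    static final class FakeCollector {
        private final Map<FakeQuery, Integer> joinQueryID = new HashMap<>();

        /** Registers the rewritten form of the query under the next free slot. */
        void enroll(FakeQuery original) {
            joinQueryID.put(original.rewrite(), joinQueryID.size());
        }

        /** Returns the slot for the query, or null when no equal key exists. */
        Integer slotFor(FakeQuery query) {
            return joinQueryID.get(query);
        }
    }

    public static void main(String[] args) {
        FakeQuery original = new FakeQuery("child:foo*", false);
        FakeCollector collector = new FakeCollector();
        collector.enroll(original);

        // Lookup with the unrewritten query misses the table.
        System.out.println("original query:  " + collector.slotFor(original));
        // Rewriting first yields a key equal to the enrolled one.
        System.out.println("rewritten query: " + collector.slotFor(original.rewrite()));
    }
}
```

In this sketch the first lookup returns null and the second returns slot 0,
which is the difference the proposed rewrite step is meant to make.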

If rewrite() must preserve the hashCode(), then the problem lies in the
various rewrite() implementations, and the fix would be much harder.

This bug has caused widespread panic in my company and I would like to see
it fixed ASAP. Please give me some suggestions so I know which hotfix I
should be working on.

All the best,

Yours Peng
