opened as https://issues.apache.org/jira/browse/LUCENE-5409


On Tue, Jan 14, 2014 at 5:42 PM, Uwe Schindler <[email protected]> wrote:

> Yes, open an issue!
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
> *From:* Peng Cheng [mailto:[email protected]]
> *Sent:* Tuesday, January 14, 2014 10:41 PM
> *To:* [email protected]
> *Subject:* Re: (Lucene-core) Is Query's rewrite method mandated to
> preserve original Query's hashcode?
>
> Do you suggest I open a JIRA ticket about it? I think it's a bug,
> considering the common interface standard (rewrite should not be exposed
> to the end user), the documentation, and running efficiency (as you said,
> rewrite is slow).
>
> On Tue, Jan 14, 2014 at 4:38 AM, Peng Cheng <[email protected]> wrote:
>
> I see. Perhaps the best solution is to put the un-rewritten
> blockJoinQueries into the joinQueryID lookup table? The result would be
> the same. Right now the code behaves very strangely if rewrite is not
> called beforehand: it gives empty groups or correct results at random.
>
> It's a great pleasure to read your reply; I never expected anyone to
> respond that fast.
>
> Yours Peng
>
> On Tue, Jan 14, 2014 at 2:33 AM, Uwe Schindler <[email protected]> wrote:
>
> Hi Peng,
>
> rewrite() returns a different query that will definitely not preserve the
> hashCode() or be equals() to the original one or any other rewritten one.
> The reason is that a rewritten query is a new query containing
> information about the index it will be executed on (e.g., it references
> terms from that index), so it **cannot** be equal to the original one.
> Since it cannot be equal, its hashCode should also be different. If you
> execute the query at a later stage, you have to rewrite the original query
> again, because the index may have changed. And take care: this rewrite may
> produce a completely different query (with a new hashCode again) if the
> index changed in the meantime.
>
> There is a workaround (to me it looks like the code is just missing
> documentation): you can manually rewrite the query using
> Searcher#rewrite(query) before invoking getTopGroups(). Why is a hotfix
> needed?
>
> Also, rewriting the query on every call of getTopGroups() is a major
> overhead (most queries' rewrites are very expensive and take as long as
> executing the query, e.g. MultiTermQueries), so it should be done only
> once, not on every call. Maybe that's why it was left out, but it was not
> documented.
>
> Uwe
>
> *From:* Peng Cheng [mailto:[email protected]]
> *Sent:* Tuesday, January 14, 2014 3:59 AM
> *To:* [email protected]; [email protected]
> *Subject:* (Lucene-core) Is Query's rewrite method mandated to preserve
> original Query's hashcode?
>
> Hi developers,
>
> I've recently found a few bugs in advanced features of Lucene-core 4.6
> (which is perfectly normal, as those features are less likely to be used
> and tested). The most serious one has rendered my
> ToParentBlockJoinCollector close to useless:
>
> In the scorer generation stage, the ToParentBlockJoinCollector
> automatically rewrites all the associated ToParentBlockJoinQuery objects
> (and their subqueries) and saves them into its in-memory lookup table,
> joinQueryID (see the enroll() method for details). Unfortunately, in the
> getTopGroups method, the new ToParentBlockJoinQuery parameter is not
> rewritten (at least, users are not expected to do so). Because rewrite()
> can change hashCode(), looking up the new, un-rewritten query in the old
> lookup table always fails (namely, _slot is never found), and the call
> ends up with a TopGroups result consisting only of empty groups (their
> hitCounts are guaranteed to be zero).
>
> I'm not positive about whether rewrite() should preserve Query's
> hashcode, as I've already found many counterexamples. If it need not,
> then this problem can be solved by rewriting the original BlockJoinQuery
> before invoking the getTopGroups method. Nevertheless, users are not
> expected to do so, so I would suggest submitting a hotfix that adds the
> described rewrite step.
>
> If rewrite() must preserve the hashcode, then this is a problem of the
> various rewrite() implementations, and the fix should be much harder.
>
> This bug has caused widespread panic in my company and I would like to
> see it fixed ASAP. Please give me some suggestions so I know which hotfix
> I should be working on.
>
> All the best,
>
> Yours Peng
>
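
Uwe's point that a rewritten query is bound to one index state can be illustrated with a small, self-contained toy model. This is a sketch under stated assumptions, not Lucene code: `ToyPrefixQuery`, `ToyTermSetQuery`, and `rewrite(Set<String>)` are hypothetical stand-ins for a multi-term query that expands into the terms it matches. The rewritten query is not equal to the original, and rewriting again after the index gains a matching term yields yet another query. Since a HashMap lookup needs equals() to hold, a non-equal key misses even if two hash codes happened to collide.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model (hypothetical classes, NOT Lucene's API): a prefix-style query
// that rewrites into the concrete terms it matches in a given index state.
public class RewriteIdentityDemo {

    interface ToyQuery {}

    // What the user builds: it depends only on the prefix, not on any index.
    record ToyPrefixQuery(String prefix) implements ToyQuery {
        // Expand the prefix against the terms currently present in the "index".
        ToyTermSetQuery rewrite(Set<String> indexTerms) {
            Set<String> matches = new TreeSet<>();
            for (String term : indexTerms) {
                if (term.startsWith(prefix)) {
                    matches.add(term);
                }
            }
            return new ToyTermSetQuery(matches);
        }
    }

    // The rewritten form: it embeds terms taken from one particular index state.
    record ToyTermSetQuery(Set<String> terms) implements ToyQuery {}

    public static void main(String[] args) {
        ToyPrefixQuery original = new ToyPrefixQuery("luc");
        ToyQuery before = original.rewrite(Set.of("lucene", "lucky", "solr"));
        ToyQuery after = original.rewrite(Set.of("lucene", "lucky", "lucid", "solr"));

        System.out.println(original.equals(before)); // false: different type and content
        System.out.println(before.equals(after));    // false: the index changed in between
        System.out.println(before);                  // ToyTermSetQuery[terms=[lucene, lucky]]
    }
}
```

A real Lucene rewrite works against an IndexReader rather than a set of strings; the toy keeps only the property Uwe describes, namely that the rewritten object carries index-specific content.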

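The lookup failure Peng describes, and Uwe's rewrite-once workaround, can be reproduced in a toy model. Everything here is hypothetical (`ToyJoinQuery`, `ToyCollector`, `search`, `topGroupsHitCount`, `rewritesFor`) and only mimics the behavior reported in the thread: the collector stores the query it scored in rewritten form, so a lookup with the caller's un-rewritten query finds no slot and reports zero hits. Rewriting once up front, as with Searcher#rewrite(query), fixes the lookup; rewriting on every lookup (the proposed hotfix) also fixes it, but pays for one rewrite per call, which is the overhead Uwe objects to.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model with hypothetical names; it mimics the reported behavior and is
// NOT Lucene's ToParentBlockJoinCollector.
public class RewriteLookupDemo {

    // Counts every rewrite so the cost of each strategy can be compared.
    static int rewriteCalls = 0;

    // Structural equality: an un-rewritten query never equals a rewritten one.
    record ToyJoinQuery(String child, boolean rewritten) {
        ToyJoinQuery rewrite() {
            rewriteCalls++; // stand-in for an expensive rewrite
            return new ToyJoinQuery(child, true);
        }
    }

    static class ToyCollector {
        private final Map<ToyJoinQuery, Integer> hitsByQuery = new HashMap<>();

        // The searcher rewrites before scoring, so the table is keyed by the rewritten form.
        void search(ToyJoinQuery query, int hits) {
            hitsByQuery.merge(query.rewrite(), hits, Integer::sum);
        }

        // No rewrite here: a missing key is reported as an empty group (0 hits).
        int topGroupsHitCount(ToyJoinQuery query) {
            return hitsByQuery.getOrDefault(query, 0);
        }
    }

    // Pages through results with either strategy and returns the total number of rewrites.
    static int rewritesFor(boolean rewriteOnce, int pages) {
        rewriteCalls = 0;
        ToyJoinQuery original = new ToyJoinQuery("child", false);
        ToyCollector collector = new ToyCollector();
        collector.search(original, 3);

        ToyJoinQuery rewritten = rewriteOnce ? original.rewrite() : null;
        for (int page = 0; page < pages; page++) {
            // Reuse the single rewritten query, or rewrite on every call
            // (equivalent in cost to a lookup that rewrites internally).
            ToyJoinQuery key = rewriteOnce ? rewritten : original.rewrite();
            if (collector.topGroupsHitCount(key) != 3) {
                throw new IllegalStateException("lookup missed");
            }
        }
        return rewriteCalls;
    }

    public static void main(String[] args) {
        ToyJoinQuery original = new ToyJoinQuery("child", false);
        ToyCollector collector = new ToyCollector();
        collector.search(original, 3);
        System.out.println(collector.topGroupsHitCount(original));           // 0: the reported bug
        System.out.println(collector.topGroupsHitCount(original.rewrite())); // 3

        System.out.println(rewritesFor(false, 10)); // 11: the search plus one per page
        System.out.println(rewritesFor(true, 10));  // 2: the search plus one up-front rewrite
    }
}
```

With ten result pages, rewriting on every call costs 11 rewrites against 2 for rewrite-once. In the toy each rewrite is free, but per Uwe, real rewrites (e.g. for MultiTermQueries) can take as long as executing the query.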