opened as https://issues.apache.org/jira/browse/LUCENE-5409
On Tue, Jan 14, 2014 at 5:42 PM, Uwe Schindler <[email protected]> wrote:
> Yes, open an issue!
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
> *From:* Peng Cheng [mailto:[email protected]]
> *Sent:* Tuesday, January 14, 2014 10:41 PM
> *To:* [email protected]
> *Subject:* Re: (Lucene-core) Is Query's rewrite method mandated to preserver original Query's hashcode?
>
> Do you suggest that I open a JIRA ticket about it? I think it's a bug, considering common interface standards (rewrite should not be exposed to the end user), the documentation, and running efficiency (as you said, rewrite is slow).
>
> On Tue, Jan 14, 2014 at 4:38 AM, Peng Cheng <[email protected]> wrote:
>
> I see. Perhaps the best solution is to put the un-rewritten BlockJoinQueries into joinQueryID? The result will be the same. Right now the code behaves very strangely if rewrite is not called beforehand: it returns either empty groups or correct results, at random.
>
> It's a great pleasure to read your reply; I never expected someone to respond that fast.
>
> Yours Peng
>
> On Tue, Jan 14, 2014 at 2:33 AM, Uwe Schindler <[email protected]> wrote:
>
> Hi Peng,
>
> rewrite() returns a different query that will definitely not preserve the hashCode() or be equals() to the original one or any other rewritten one. The reason for this is: a rewritten query is a new query that contains information about the index it will be executed on (e.g., it references terms from that index), so it **cannot** be equal to the original one. If it cannot be equal, the hashCode should be different as well. If you execute the query at a later stage, you have to rewrite the original query again, because the index may have changed. And take care: this rewrite may produce a completely different query (with a new hashCode again) if the index changed in the meantime.
>
> As there is a workaround (to me it looks like the code is just missing documentation), you can manually rewrite the query before invoking getTopGroups() using Searcher#rewrite(query). Why is a hotfix needed?
>
> Also, rewriting the query on every call of getTopGroups is a major overhead (most queries’ rewrites are very expensive and take as long as the execution of the query, e.g. MultiTermQueries), so it should only be done once, not on every call. Maybe that’s the reason why it was left out, but it was not documented.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
> *From:* Peng Cheng [mailto:[email protected]]
> *Sent:* Tuesday, January 14, 2014 3:59 AM
> *To:* [email protected]; [email protected]
> *Subject:* (Lucene-core) Is Query's rewrite method mandated to preserver original Query's hashcode?
>
> Hi developers,
>
> I've recently found a few bugs in advanced features of Lucene-core 4.6 (which is perfectly normal, as those features are less likely to be used and tested). The most serious one has rendered my ToParentBlockJoinCollector close to useless:
>
> In the scorer generation stage, the ToParentBlockJoinCollector automatically rewrites all the associated ToParentBlockJoinQuery instances (and their subqueries) and saves them into its in-memory lookup table, namely joinQueryID (see the enroll() method for details). Unfortunately, in the getTopGroups method, the new ToParentBlockJoinQuery parameter is not rewritten (at least users are not expected to do so). When the new one is searched for in the old lookup table (considering the impact of rewrite() on hashCode()), the lookup of the result (namely _slot) always fails, and the call eventually ends up with a topGroup collection consisting of only empty groups (their hitCounts are guaranteed to be zero).
>
> I'm not positive about whether rewrite() should preserve the Query's hashcode, as I've found many counterexamples already. If it need not, then this problem can be solved by rewriting the original BlockJoinQuery before invoking the getTopGroups method. Nevertheless, users are not expected to do so; therefore I would suggest submitting a hotfix that adds the described rewrite step.
>
> If rewrite() must preserve the hashcode, then this is a problem in the various rewrite() implementations, and the fix would be much harder.
>
> This bug has caused widespread panic in my company and I would like to see it fixed ASAP. Please give me some suggestions so I know which hotfix I should be working on.
>
> All the best,
>
> Yours Peng
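The lookup failure discussed in this thread can be illustrated with a small, self-contained sketch. It deliberately does not use Lucene: ToyQuery, ToyCollector, slotFor and the indexVersion field are hypothetical stand-ins invented for illustration, and only the names joinQueryID, enroll and rewrite are borrowed from the messages. The assumption modelled here is the one stated in the thread: a rewritten query is not equal to the original, so a table keyed by rewritten queries cannot be probed with an un-rewritten one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class RewriteLookupDemo {
    public static void main(String[] args) {
        ToyQuery original = new ToyQuery("parent:book", -1L);
        ToyCollector collector = new ToyCollector();
        collector.enroll(original, 7L);

        // Probing with the un-rewritten query misses the table.
        System.out.println(collector.slotFor(original));             // prints "null"
        // Probing with a query rewritten against the same index state hits it.
        System.out.println(collector.slotFor(original.rewrite(7L))); // prints "0"
    }
}

// Hypothetical stand-in for a query; NOT Lucene's Query class.
final class ToyQuery {
    final String text;
    final long indexVersion; // -1 means "not rewritten yet" (an assumption of this toy model)

    ToyQuery(String text, long indexVersion) {
        this.text = text;
        this.indexVersion = indexVersion;
    }

    // Rewriting binds the query to one index state, producing a new, unequal object.
    ToyQuery rewrite(long currentIndexVersion) {
        return new ToyQuery(text, currentIndexVersion);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ToyQuery)) return false;
        ToyQuery other = (ToyQuery) o;
        return text.equals(other.text) && indexVersion == other.indexVersion;
    }

    @Override
    public int hashCode() {
        return Objects.hash(text, indexVersion);
    }
}

// Hypothetical stand-in for the collector's lookup table described in the thread.
final class ToyCollector {
    private final Map<ToyQuery, Integer> joinQueryID = new HashMap<>();

    // Stores the REWRITTEN query as the key, as the original report describes.
    void enroll(ToyQuery original, long indexVersion) {
        joinQueryID.put(original.rewrite(indexVersion), joinQueryID.size());
    }

    // Returns the slot for the given key, or null when the key is unknown.
    Integer slotFor(ToyQuery query) {
        return joinQueryID.get(query);
    }
}
```

In this toy model, the workaround suggested in the thread corresponds to calling rewrite once and reusing the result for every lookup, rather than rewriting on each call.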
