[ 
https://issues.apache.org/jira/browse/LUCENE-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879447#comment-13879447
 ] 

Peng Cheng commented on LUCENE-5409:
------------------------------------

I was trying to write a test case but ran into the following problem:

The bug is only triggered if a ToParentBlockJoinQuery is rewritten into 
another query with a different hashCode (Uwe said this is a common 
situation). However, the simple queries I have experimented with all keep 
the same hashCode after rewriting. The query that failed in my project was 
a very long CustomScoreQuery (used for feature engineering in text analysis), 
which I would rather not put into a unit test. Can you show me an example 
of a compound query that does not preserve its hashCode across rewrite()? 
I can finish the rest of the work.

> ToParentBlockJoinCollector.getTopGroups returns empty Groups
> ------------------------------------------------------------
>
>                 Key: LUCENE-5409
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5409
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 4.6
>         Environment: Ubuntu 12.04
>            Reporter: Peng Cheng
>            Assignee: Michael McCandless
>            Priority: Critical
>             Fix For: 4.7
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> A bug causes the getTopGroups method of ToParentBlockJoinCollector to 
> return unstable results.
> In the scorer generation stage, the ToParentBlockJoinCollector 
> automatically rewrites all associated ToParentBlockJoinQuery instances 
> (and their subqueries) and saves them in its in-memory lookup table, 
> joinQueryID (see the enroll() method for details). Unfortunately, the 
> ToParentBlockJoinQuery passed to getTopGroups is not rewritten (at least 
> users are not expected to do so). When this unrewritten query is looked 
> up in the table of rewritten queries, the lookup will usually fail 
> because rewrite() can change hashCode(), and the result is a topGroups 
> collection consisting only of empty groups (their hit counts are 
> guaranteed to be zero).
> An easy fix would be to rewrite the original ToParentBlockJoinQuery before 
> invoking getTopGroups. However, this adds unnecessary computational cost. 
> A better but slightly more complex solution would be to save the 
> unrewritten queries in the lookup table.
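
The lookup failure described above can be modelled without Lucene. The sketch below is only an illustration under stated assumptions: FakeQuery is a hypothetical stand-in for a Lucene Query whose rewrite() returns a non-equal query, and the local joinQueryID maps mimic the collector's table in name only. It is not the actual collector code.

```java
import java.util.HashMap;
import java.util.Map;

class JoinLookupSketch {

    /** Hypothetical stand-in for a Lucene Query; equality and hash depend on the text. */
    static final class FakeQuery {
        final String text;

        FakeQuery(String text) {
            this.text = text;
        }

        /** Models a rewrite() that yields a different, non-equal query. */
        FakeQuery rewrite() {
            return new FakeQuery("rewritten(" + text + ")");
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof FakeQuery && ((FakeQuery) o).text.equals(text);
        }

        @Override
        public int hashCode() {
            return text.hashCode();
        }
    }

    /** Reported behaviour: table keyed by the rewritten query, probed with the original. */
    static Integer keyedByRewritten(FakeQuery userQuery) {
        Map<FakeQuery, Integer> joinQueryID = new HashMap<>();
        joinQueryID.put(userQuery.rewrite(), 0);
        return joinQueryID.get(userQuery); // no equal key is present
    }

    /** "Easy fix": rewrite the user's query again before probing the table. */
    static Integer rewriteBeforeLookup(FakeQuery userQuery) {
        Map<FakeQuery, Integer> joinQueryID = new HashMap<>();
        joinQueryID.put(userQuery.rewrite(), 0);
        return joinQueryID.get(userQuery.rewrite());
    }

    /** Alternative fix: key the table by the unrewritten query. */
    static Integer keyedByOriginal(FakeQuery userQuery) {
        Map<FakeQuery, Integer> joinQueryID = new HashMap<>();
        joinQueryID.put(userQuery, 0);
        return joinQueryID.get(userQuery);
    }

    public static void main(String[] args) {
        FakeQuery q = new FakeQuery("parent-join");
        System.out.println("keyed by rewritten:    " + keyedByRewritten(q));
        System.out.println("rewrite before lookup: " + rewriteBeforeLookup(q));
        System.out.println("keyed by original:     " + keyedByOriginal(q));
    }
}
```

In this model only the first variant misses, which corresponds to the empty groups in the report. Both fixes find the slot, and keying by the original query avoids the second rewrite.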



