[
https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100098#comment-13100098
]
Doron Cohen commented on LUCENE-3412:
-------------------------------------
Thanks, Michael, for verifying this. I'll go ahead and commit.
> SloppyPhraseScorer returns non-deterministic results for queries with many
> repeats
> ----------------------------------------------------------------------------------
>
> Key: LUCENE-3412
> URL: https://issues.apache.org/jira/browse/LUCENE-3412
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/search
> Affects Versions: 3.1, 3.2, 3.3, 4.0
> Reporter: Michael Ryan
> Assignee: Doron Cohen
> Attachments: LUCENE-3412.patch, LUCENE-3412.patch
>
>
> Proximity queries with many repeats (four or more, based on my testing)
> return non-deterministic results. I run the same query multiple times with
> the same data set and get different results.
> So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0
> trunk.
> Steps to reproduce (using the Solr example):
> 1) In solrconfig.xml, set queryResultCache size to 0.
> 2) Add some documents with text "dog dog dog" and "dog dog dog dog".
> http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
> 3) Do a "dog dog dog dog"~1 query.
> http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
> 4) Repeat step 3 many times.
> Expected results: Only the document with id 2 should be returned.
> Actual results: The document with id 2 is always returned. The document with
> id 1 is sometimes returned as well.
> Other proximity values, such as "dog dog dog dog"~5 and "dog dog dog
> dog"~100, show the same behavior.
> So far I've traced it down to the "repeats" array in
> SloppyPhraseScorer.initPhrasePositions() - depending on the order of the
> elements in this array, the document may or may not match. I think the
> HashSet may be to blame, but I'm not sure - that at least seems to be where
> the non-determinism is coming from.
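
The suspicion about the HashSet can be illustrated outside Lucene. The following is a minimal sketch, not the SloppyPhraseScorer code: the class names RepeatOrderDemo and Pos and the method iterationOrder are hypothetical stand-ins. It assumes the set elements do not override hashCode(), in which case a HashSet iterates in an order derived from identity hash codes, which can differ from run to run, while a LinkedHashSet always iterates in insertion order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RepeatOrderDemo {
    /** Hypothetical stand-in for a per-term position holder; not Lucene's class. */
    static final class Pos {
        final int offset;

        Pos(int offset) {
            this.offset = offset;
        }
        // No equals()/hashCode() override, so hash-based sets use the identity hash.
    }

    /** Adds n holders to the given set and returns their offsets in iteration order. */
    static List<Integer> iterationOrder(Set<Pos> set, int n) {
        for (int i = 0; i < n; i++) {
            set.add(new Pos(i));
        }
        List<Integer> offsets = new ArrayList<>();
        for (Pos p : set) {
            offsets.add(p.offset);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // The HashSet line may print a different order on different JVM runs.
        System.out.println("HashSet:       " + iterationOrder(new HashSet<>(), 4));
        // The LinkedHashSet line always prints the offsets in insertion order.
        System.out.println("LinkedHashSet: " + iterationOrder(new LinkedHashSet<>(), 4));
    }
}
```

Running main several times in fresh JVMs may show the HashSet order changing between runs. Any logic whose outcome depends on the order of an array built from such a set would then behave as described in the report. Whether this is the actual cause in initPhrasePositions() is the reporter's hypothesis, not something this sketch confirms.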