[ https://issues.apache.org/jira/browse/LUCENE-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190420#comment-17190420 ]

Brian Coverstone commented on LUCENE-9418:
------------------------------------------

I believe this may still be an issue in 8.6.0: the last slot of a page of results
often contains the wrong record.

I found a workaround: always request one more hit than needed.

Here is some pseudocode to demonstrate:
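{quote}
import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser;
import org.apache.lucene.search.FieldComparator;
import org.apache.lucene.search.FieldComparatorSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopFieldCollector;

// analyzer is assumed to be created elsewhere
ComplexPhraseQueryParser cpqp = new ComplexPhraseQueryParser("somefield", analyzer);
Query query = cpqp.parse("somevalue");

int pageSize = 10;
int pageNum = 1;
int requestedRecords = pageSize * pageNum + 1; // +1 workaround
int startOffset = (pageNum - 1) * pageSize;    // index of the first hit on this page

// Case-insensitive sort; StringValComparatorIgnoreCase is assumed to be a
// custom comparator defined in the application, not a Lucene class
FieldComparatorSource fsc = new FieldComparatorSource() {
    @Override
    public FieldComparator<String> newComparator(String fieldname, int numhits, int sortPos, boolean reversed) {
        return new StringValComparatorIgnoreCase(numhits, fieldname);
    }
};

Sort sort = new Sort(new SortField("firstname", fsc, false));
IndexSearcher searcher = new IndexSearcher(reader); // reader: an open IndexReader
TopFieldCollector tfcollector = TopFieldCollector.create(sort, requestedRecords + 1, Integer.MAX_VALUE);
searcher.search(query, tfcollector);
// Keep only the requested page: hits [startOffset, startOffset + pageSize)
ScoreDoc[] hits = tfcollector.topDocs(startOffset, pageSize).scoreDocs;
{quote}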
At this point, "hits" is correct. However, if I remove the "+1" from
requestedRecords above, the last item in "hits" is often incorrect.
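As a reference point for what the last slot should contain, here is a plain-Java simulation (no Lucene involved) of the same paging arithmetic: keep the top pageSize * pageNum hits, then slice from (pageNum - 1) * pageSize. It assumes the custom comparator orders first names case-insensitively and that ties fall back to doc ID order. The Hit record and page() method are hypothetical names for illustration (Java 16+ for the record).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PagingSketch {
    // Hypothetical stand-in for a matching document: doc ID plus the sort field.
    record Hit(int docId, String firstname) {}

    // Simulates the expected result: sort all hits case-insensitively by first
    // name (ties broken by doc ID), keep the top numHits, then slice out
    // [start, start + pageSize) the way topDocs(start, howMany) does.
    static List<Hit> page(List<Hit> allHits, int pageNum, int pageSize) {
        int numHits = pageSize * pageNum;       // enough hits to cover pages 1..pageNum
        int start = (pageNum - 1) * pageSize;   // first hit on the requested page
        List<Hit> sorted = new ArrayList<>(allHits);
        sorted.sort(Comparator.comparing(Hit::firstname, String.CASE_INSENSITIVE_ORDER)
                .thenComparingInt(Hit::docId));
        List<Hit> top = sorted.subList(0, Math.min(numHits, sorted.size()));
        if (start >= top.size()) {
            return List.of();                   // page lies beyond the last hit
        }
        return new ArrayList<>(top.subList(start, Math.min(start + pageSize, top.size())));
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit(0, "carol"), new Hit(1, "Alice"), new Hit(2, "bob"),
                new Hit(3, "Dave"), new Hit(4, "alice"));
        // prints [Hit[docId=2, firstname=bob], Hit[docId=0, firstname=carol]]
        System.out.println(page(hits, 2, 2));
    }
}
```

Comparing a page from this simulation with the collector's output for the same data shows directly whether the last slot is wrong.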

 

> Ordered intervals can give inaccurate hits on interleaved terms
> ---------------------------------------------------------------
>
>                 Key: LUCENE-9418
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9418
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>             Fix For: 8.6
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Given the text 'A B A C', an ordered interval over 'A B C' will return the 
> inaccurate interval [2, 3], due to the way minimization is handled after 
> matches are found.
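For the example in the description, the only in-order window is A at 0, B at 1, C at 3, so the expected interval is [0, 3]; [2, 3] does not contain a B at all. The brute-force sketch below (plain Java, not Lucene's Intervals.ordered implementation; the class and method names are hypothetical) enumerates ordered intervals by matching each following term at its earliest position, then minimizes by discarding any interval that contains another.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderedIntervalSketch {
    // Returns every minimal interval [start, end] (inclusive token positions)
    // in which `terms` occur in order. Brute force, for illustration only.
    static List<int[]> orderedIntervals(String[] tokens, String[] terms) {
        List<int[]> candidates = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            if (!tokens[start].equals(terms[0])) continue;
            // Match each following term at its earliest position after the previous one.
            int pos = start;
            boolean complete = true;
            for (int t = 1; t < terms.length && complete; t++) {
                int next = -1;
                for (int p = pos + 1; p < tokens.length; p++) {
                    if (tokens[p].equals(terms[t])) { next = p; break; }
                }
                if (next < 0) complete = false; else pos = next;
            }
            if (complete) candidates.add(new int[] {start, pos});
        }
        // Minimization: drop any candidate that contains another candidate.
        List<int[]> minimal = new ArrayList<>();
        for (int[] a : candidates) {
            boolean containsOther = false;
            for (int[] b : candidates) {
                if (b != a && b[0] >= a[0] && b[1] <= a[1]) { containsOther = true; break; }
            }
            if (!containsOther) minimal.add(a);
        }
        return minimal;
    }

    public static void main(String[] args) {
        String[] tokens = "A B A C".split(" ");
        for (int[] iv : orderedIntervals(tokens, new String[] {"A", "B", "C"})) {
            System.out.println("[" + iv[0] + ", " + iv[1] + "]"); // prints [0, 3]
        }
    }
}
```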



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
