[
https://issues.apache.org/jira/browse/LUCENE-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258054#comment-15258054
]
Adrien Grand commented on LUCENE-7255:
--------------------------------------
I prefer the API as it is today, since it is more explicit about how it
works. I suspect there is some confusion about how searchAfter works in
the non-sorted case: even though the last competitive document is provided, the
query needs to visit _all_ matches. The collector will just ignore documents
that compare better than {{after}} since they were already returned on a
previous page. Compared to regular pagination, this is better since we can use
a priority queue of size {{size}} rather than {{from+size}}, but in both cases,
all matching documents are collected.
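
To illustrate the point above, here is a minimal, Lucene-free sketch of what a searchAfter-style collector does on an unsorted index. It is a simplified model, not the actual TopFieldCollector code: the class name, the {{visited}} counter and the use of plain ints as sort values are assumptions made for the example, and ties (which Lucene breaks on doc ID) are ignored by assuming distinct values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class SearchAfterSketch {
    /** Number of documents visited by the most recent call (illustration only). */
    static int visited;

    /** Returns the next `size` values strictly greater than `after`, ascending. */
    static List<Integer> searchAfter(int[] values, Integer after, int size) {
        visited = 0;
        // Max-heap bounded to `size` entries: the head is the worst hit currently kept.
        PriorityQueue<Integer> queue = new PriorityQueue<>(Collections.reverseOrder());
        for (int value : values) {
            visited++; // every match is visited, whatever `after` is
            if (after != null && value <= after) {
                continue; // already returned on a previous page
            }
            queue.offer(value);
            if (queue.size() > size) {
                queue.poll(); // evict the least competitive hit
            }
        }
        List<Integer> page = new ArrayList<>(queue);
        Collections.sort(page);
        return page;
    }
}
```

Whatever page is requested, {{visited}} ends up equal to the number of matches; only the queue stays bounded by {{size}} instead of growing to {{from+size}}.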
We could make pagination work better in the case of sorted segments by tracking
the last competitive document per segment rather than at the index level. This
way, on each sorted segment, we could directly jump to the next competitive
document, so the collector would actually only collect {{numWanted}} documents
rather than {{numToSkip+numWanted}}. This would require a custom collector,
however.
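
A rough sketch of the per-segment idea, again without any Lucene classes. Each segment is modelled as an int array already sorted by the index sort, and {{cursors}} is a hypothetical per-segment bookmark standing in for "the last competitive document per segment". None of these names exist in Lucene; a real implementation would be a custom collector working on leaf readers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class PerSegmentPagingSketch {
    /** Number of documents collected by the most recent call (illustration only). */
    static int collected;

    /**
     * Returns the next page of up to numWanted values in ascending order.
     * cursors[i] is the position of the next unreturned document in segment i
     * and is advanced in place.
     */
    static List<Integer> nextPage(int[][] segments, int[] cursors, int numWanted) {
        collected = 0;
        // Heap entries are {value, segment, position}, ordered by those three keys.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> {
            if (a[0] != b[0]) return Integer.compare(a[0], b[0]);
            if (a[1] != b[1]) return Integer.compare(a[1], b[1]);
            return Integer.compare(a[2], b[2]);
        });
        for (int seg = 0; seg < segments.length; seg++) {
            // Jump straight to the cursor, then stop after numWanted documents.
            int end = Math.min(segments[seg].length, cursors[seg] + numWanted);
            for (int pos = cursors[seg]; pos < end; pos++) {
                heap.offer(new int[] {segments[seg][pos], seg, pos});
                collected++;
            }
        }
        List<Integer> page = new ArrayList<>();
        while (page.size() < numWanted && !heap.isEmpty()) {
            int[] hit = heap.poll();
            page.add(hit[0]);
            cursors[hit[1]] = hit[2] + 1; // remember where this segment resumes
        }
        return page;
    }
}
```

In this model each segment contributes at most {{numWanted}} documents per page, regardless of how many pages were already returned.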
> Paging with SortingMergePolicy and EarlyTerminatingSortingCollector
> -------------------------------------------------------------------
>
> Key: LUCENE-7255
> URL: https://issues.apache.org/jira/browse/LUCENE-7255
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 5.3, 5.4, 5.5, 6.0
> Reporter: Andrés de la Peña
> Labels: EarlyTerminatingSortingCollector, pagination, paging,
> searchafter, sortingmergepolicy
>
> {{EarlyTerminatingSortingCollector}} does not seem to work when used with a
> {{TopDocsCollector}} searching for documents after a certain {{FieldDoc}}.
> That is, it can't be used for paging. The following code reproduces
> the problem:
> {code}
> // Sort to be used both with merge policy and queries
> Sort sort = new Sort(new SortedNumericSortField(FIELD_NAME, SortField.Type.INT));
> // Create directory
> RAMDirectory directory = new RAMDirectory();
> // Setup merge policy
> TieredMergePolicy tieredMergePolicy = new TieredMergePolicy();
> SortingMergePolicy sortingMergePolicy = new SortingMergePolicy(tieredMergePolicy, sort);
> // Setup index writer
> IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new SimpleAnalyzer());
> indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
> indexWriterConfig.setMergePolicy(sortingMergePolicy);
> IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
> // Index values
> for (int i = 1; i <= 1000; i++) {
>     Document document = new Document();
>     document.add(new NumericDocValuesField(FIELD_NAME, i));
>     indexWriter.addDocument(document);
> }
> // Force index merge to ensure early termination
> indexWriter.forceMerge(1, true);
> indexWriter.commit();
> // Create index searcher
> IndexReader reader = DirectoryReader.open(directory);
> IndexSearcher searcher = new IndexSearcher(reader);
> // Paginated read
> int pageSize = 10;
> FieldDoc pageStart = null;
> while (true) {
>     logger.info("Collecting page starting at: {}", pageStart);
>     Query query = new MatchAllDocsQuery();
>     TopDocsCollector tfc = TopFieldCollector.create(sort, pageSize, pageStart, true, false, false);
>     EarlyTerminatingSortingCollector collector =
>         new EarlyTerminatingSortingCollector(tfc, sort, pageSize, sort);
>     searcher.search(query, collector);
>     ScoreDoc[] scoreDocs = tfc.topDocs().scoreDocs;
>     for (ScoreDoc scoreDoc : scoreDocs) {
>         pageStart = (FieldDoc) scoreDoc;
>         logger.info("FOUND {}", scoreDoc);
>     }
>     logger.info("Terminated early: {}", collector.terminatedEarly());
>     if (scoreDocs.length < pageSize) break;
> }
> // Close
> reader.close();
> indexWriter.close();
> directory.close();
> {code}
> The query for the second page doesn't return any results. However, it returns
> the expected results if we don't wrap the {{TopFieldCollector}} with the
> {{EarlyTerminatingSortingCollector}}.