[jira] [Commented] (LUCENE-7255) Paging with SortingMergePolicy and EarlyTerminatingSortingCollector

Adrien Grand (JIRA) Tue, 26 Apr 2016 06:12:34 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258054#comment-15258054
 ]


Adrien Grand commented on LUCENE-7255:
--------------------------------------

I like the API better as it is today since it is more explicit about how it is 
working. I suspect there might be some confusion about how searchAfter works in 
the non-sorted case: even though the last competitive document is provided, the 
query needs to visit _all_ matches. The collector will just ignore documents 
that compare better than {{after}} since they were already returned on a 
previous page. Compared to regular pagination, this is better since we can use 
a priority queue of size {{size}} rather than {{from+size}}, but in both cases, 
all matching documents are collected.
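
The difference in queue size can be sketched without Lucene. What follows is a minimal, hypothetical simulation and not Lucene's implementation: {{SearchAfterSketch}}, {{page}} and {{visited}} are made-up names, matches are plain distinct integers sorted ascending, and ties (which Lucene breaks by doc ID) are ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class SearchAfterSketch {
    /** Number of matches visited by the most recent call (illustration only). */
    static int visited;

    /**
     * Returns the next `size` smallest values strictly greater than `after`.
     * Pass after == null for the first page. Assumes distinct values.
     */
    static List<Integer> page(int[] matches, Integer after, int size) {
        visited = 0;
        // Max-heap bounded to `size` entries: its head is the worst hit kept so far.
        PriorityQueue<Integer> queue = new PriorityQueue<>(Collections.reverseOrder());
        for (int value : matches) {
            visited++; // every match is visited, whatever `after` is
            if (after != null && value <= after) {
                continue; // already returned on a previous page, so ignore it
            }
            if (queue.size() < size) {
                queue.add(value);
            } else if (value < queue.peek()) {
                queue.poll(); // evict the current worst hit
                queue.add(value);
            }
        }
        List<Integer> hits = new ArrayList<>(queue);
        Collections.sort(hits);
        return hits;
    }
}
```

Under these assumptions, every call walks the whole {{matches}} array, yet the queue never holds more than {{size}} entries, whichever page is requested.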

We could make pagination work better in the case of sorted segments by tracking 
the last competitive document per segment rather than at the index level. This 
way, on each sorted segment, we could directly jump to the next competitive 
document, so the collector would actually only collect {{numWanted}} documents 
rather than {{numToSkip+numWanted}}. This would require a custom collector, 
however.
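
The per-segment idea can be illustrated with a small simulation. This is only a sketch under stated assumptions, not an existing Lucene collector: {{PerSegmentPagingSketch}}, {{nextPage}} and {{cursors}} are hypothetical names, each segment is modelled as an int array already sorted ascending, and a cursor stands in for "the last competitive document" of that segment.

```java
import java.util.ArrayList;
import java.util.List;

class PerSegmentPagingSketch {
    /**
     * segments: one ascending-sorted array per segment.
     * cursors:  per segment, the index of the next competitive document;
     *           updated in place so the following call resumes from there.
     */
    static List<Integer> nextPage(int[][] segments, int[] cursors, int numWanted) {
        // Candidates are {value, segment} pairs. Each segment contributes at most
        // numWanted documents, starting directly at its cursor (no skipping pass).
        List<int[]> candidates = new ArrayList<>();
        for (int s = 0; s < segments.length; s++) {
            int end = Math.min(segments[s].length, cursors[s] + numWanted);
            for (int i = cursors[s]; i < end; i++) {
                candidates.add(new int[] {segments[s][i], s});
            }
        }
        // Merge across segments: order by value, then by segment index on ties.
        candidates.sort((a, b) -> a[0] != b[0]
                ? Integer.compare(a[0], b[0])
                : Integer.compare(a[1], b[1]));
        List<Integer> page = new ArrayList<>();
        int limit = Math.min(numWanted, candidates.size());
        for (int[] candidate : candidates.subList(0, limit)) {
            page.add(candidate[0]);
            cursors[candidate[1]]++; // advance that segment's resume point
        }
        return page;
    }
}
```

In this sketch the work per page is bounded by {{numWanted}} documents per segment regardless of how many pages were already returned, at the cost of the caller having to carry one cursor per segment between requests.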

> Paging with SortingMergePolicy and EarlyTerminatingSortingCollector
> -------------------------------------------------------------------
>
>                 Key: LUCENE-7255
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7255
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 5.3, 5.4, 5.5, 6.0
>            Reporter: Andrés de la Peña
>              Labels: EarlyTerminatingSortingCollector, pagination, paging, 
> searchafter, sortingmergepolicy
>
> {{EarlyTerminatingSortingCollector}} does not seem to work when used with a 
> {{TopDocsCollector}} searching for documents after a certain {{FieldDoc}}. 
> That is, it can't be used for paging. The following code reproduces the 
> problem:
> {code}
> // Sort to be used both with merge policy and queries
> Sort sort = new Sort(
>     new SortedNumericSortField(FIELD_NAME, SortField.Type.INT));
> // Create directory
> RAMDirectory directory = new RAMDirectory();
> // Setup merge policy
> TieredMergePolicy tieredMergePolicy = new TieredMergePolicy();
> SortingMergePolicy sortingMergePolicy =
>     new SortingMergePolicy(tieredMergePolicy, sort);
> // Setup index writer
> IndexWriterConfig indexWriterConfig =
>     new IndexWriterConfig(new SimpleAnalyzer());
> indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
> indexWriterConfig.setMergePolicy(sortingMergePolicy);
> IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
> // Index values
> for (int i = 1; i <= 1000; i++) {
>     Document document = new Document();
>     document.add(new NumericDocValuesField(FIELD_NAME, i));
>     indexWriter.addDocument(document);
> }
> // Force index merge to ensure early termination
> indexWriter.forceMerge(1, true);
> indexWriter.commit();
> // Create index searcher
> IndexReader reader = DirectoryReader.open(directory);
> IndexSearcher searcher = new IndexSearcher(reader);
> // Paginated read
> int pageSize = 10;
> FieldDoc pageStart = null;
> while (true) {
>     logger.info("Collecting page starting at: {}", pageStart);
>     Query query = new MatchAllDocsQuery();
>     TopDocsCollector tfc =
>         TopFieldCollector.create(sort, pageSize, pageStart, true, false, false);
>     EarlyTerminatingSortingCollector collector =
>         new EarlyTerminatingSortingCollector(tfc, sort, pageSize, sort);
>     searcher.search(query, collector);
>     ScoreDoc[] scoreDocs = tfc.topDocs().scoreDocs;
>     for (ScoreDoc scoreDoc : scoreDocs) {
>         pageStart = (FieldDoc) scoreDoc;
>         logger.info("FOUND {}", scoreDoc);
>     }
>     logger.info("Terminated early: {}", collector.terminatedEarly());
>     if (scoreDocs.length < pageSize) break;
> }
> // Close
> reader.close();
> indexWriter.close();
> directory.close();
> {code}
> The query for the second page doesn't return any results. However, it gets 
> the expected results if we don't wrap the {{TopFieldCollector}} with the 
> {{EarlyTerminatingSortingCollector}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
