[ https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618393#comment-13618393 ]

Shai Erera commented on LUCENE-4858:
------------------------------------

bq. Maybe we could even go further and add an identifier of the Sorter which 
has been used to sort the segment

+1. This makes sense. We need to be as robust as possible: if a user makes a
mistake, it's best if we keep him from tripping himself up. The identifier
needs to be unique, i.e. not just the sorter class but also, e.g. for
NumericDV, the field. Perhaps Sorter should have a sortKey? Then we record
Sorter.class_Sorter.sortKey?
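
A minimal sketch of the idea, in plain Java so it stands alone: sortKey() is
the hypothetical method proposed above, segment diagnostics are modelled as a
Map<String, String>, and the "sorter" key and the class_sortKey format are
assumptions for illustration, not existing Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Lucene's Sorter; sortKey() is the proposed addition.
abstract class Sorter {
  /** Distinguishes sorters of the same class, e.g. by the field they sort on. */
  abstract String sortKey();

  /** Identifier to record per segment: the sorter class plus its sort key. */
  final String id() {
    return getClass().getName() + "_" + sortKey();
  }
}

// Hypothetical sorter over a NumericDocValues field.
final class NumericDocValuesSorter extends Sorter {
  private final String field;

  NumericDocValuesSorter(String field) {
    this.field = field;
  }

  @Override
  String sortKey() {
    return field;
  }
}

final class SortedSegments {
  // Assumed diagnostics key under which the sorter identifier is stored.
  static final String SORTER_KEY = "sorter";

  /** Records which sorter produced a sorted segment (e.g. when the merge writes it). */
  static void recordSorter(Map<String, String> diagnostics, Sorter sorter) {
    diagnostics.put(SORTER_KEY, sorter.id());
  }

  /** Early termination is only safe if the segment was sorted by an equivalent sorter. */
  static boolean isSortedBy(Map<String, String> diagnostics, Sorter sorter) {
    return sorter.id().equals(diagnostics.get(SORTER_KEY));
  }

  public static void main(String[] args) {
    Map<String, String> diagnostics = new HashMap<>();
    recordSorter(diagnostics, new NumericDocValuesSorter("price"));
    System.out.println(isSortedBy(diagnostics, new NumericDocValuesSorter("price"))); // true
    System.out.println(isSortedBy(diagnostics, new NumericDocValuesSorter("date")));  // false
  }
}
```

Including the field in the identifier is what stops a segment sorted by one
NumericDV field from being treated as sorted by another.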

I agree that addIndexes should use MergePolicy. Unlike the Directory version,
which shallow-copies the segments, including whatever Diagnostics information
they contain, the IR version uses SegmentMerger but bypasses MP. So e.g. if the
app uses TieredMP, limiting the merged segment size to 10 GB, you can
addIndexes a 20-segment index totalling 100 GB and end up with a single 100 GB
segment. That's ... unexpected.
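
To make the numbers concrete, a sketch of what honoring the cap would mean,
assuming the 20 segments are 5 GB each. It uses simple in-order greedy packing
as a stand-in; it is not how TieredMP actually selects merges.

```java
import java.util.ArrayList;
import java.util.List;

final class SizeCappedGrouping {
  /**
   * Packs segments, in order, into groups whose total size stays within
   * maxMergedBytes; each group would become one merged segment. A segment
   * that is already larger than the cap ends up alone in its own group.
   */
  static List<List<Long>> group(List<Long> segmentSizes, long maxMergedBytes) {
    List<List<Long>> groups = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentBytes = 0;
    for (long size : segmentSizes) {
      // Close the current group if adding this segment would exceed the cap.
      if (!current.isEmpty() && currentBytes + size > maxMergedBytes) {
        groups.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(size);
      currentBytes += size;
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    long gb = 1L << 30;
    List<Long> sizes = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
      sizes.add(5 * gb); // 20 segments x 5 GB = 100 GB
    }
    System.out.println(group(sizes, 10 * gb).size()); // 10
  }
}
```

With 20 segments of 5 GB and a 10 GB cap this yields ten merged segments of at
most 10 GB each, rather than one 100 GB segment.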

So I think we need something on MP, maybe findMergesForAddIndexes, which would
make it easier to control how these indexes are added. If that's the
direction, perhaps we should do it in a separate issue, since it's unrelated to
sorting?
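
A rough sketch of the shape such a hook could take. findMergesForAddIndexes,
AddIndexesMergePolicy and AddIndexesDriver are hypothetical names; segments
are represented only by their names, and mergeGroup stands in for running
SegmentMerger over one group. The singleMerge() default mimics what the IR
version of addIndexes does today.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical MergePolicy hook deciding how incoming segments are merged. */
interface AddIndexesMergePolicy {
  /** Each returned group of segment names becomes one new segment. */
  List<List<String>> findMergesForAddIndexes(List<String> incomingSegments);

  /** Mimics the IR version of addIndexes today: all incoming segments become one. */
  static AddIndexesMergePolicy singleMerge() {
    return incoming -> {
      List<List<String>> merges = new ArrayList<>();
      if (!incoming.isEmpty()) {
        merges.add(new ArrayList<>(incoming));
      }
      return merges;
    };
  }
}

final class AddIndexesDriver {
  /**
   * Sketch of an addIndexes that consults the policy instead of bypassing it.
   * mergeGroup stands in for running SegmentMerger over one group and returns
   * the name of the resulting segment.
   */
  static List<String> addIndexes(List<String> incoming, AddIndexesMergePolicy policy,
      Function<List<String>, String> mergeGroup) {
    List<String> newSegments = new ArrayList<>();
    for (List<String> group : policy.findMergesForAddIndexes(incoming)) {
      newSegments.add(mergeGroup.apply(group));
    }
    return newSegments;
  }

  public static void main(String[] args) {
    List<String> incoming = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
      incoming.add("_" + i);
    }
    // Current behaviour: a single merged segment, regardless of size.
    System.out.println(addIndexes(incoming, AddIndexesMergePolicy.singleMerge(),
        group -> "merged(" + group.size() + ")"));
  }
}
```

A size-aware policy (for example one honoring TieredMP's max merged segment
size) would plug in the same way, without addIndexes itself knowing about
sizes.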

And while diagnostics allow us to record that a segment is sorted and by which
sorter, we're still limited to SegmentReader. In practice this may not be a
real limitation, but I feel that if AtomicReader exposed metadata(), like
commitData() on the composite reader, it would give us more freedom. This
collector doesn't need to be limited to SegmentReader ... but I guess it's OK
for now. Besides, I know others don't like the idea of having metadata() on
AR.
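
To illustrate the kind of freedom I mean, a sketch assuming a hypothetical
metadata() method on the atomic reader. LeafWithMetadata, SegmentLeaf and
WrappingLeaf are simplified stand-ins, not Lucene classes, and the "sorter"
key and its value are made up. The point is that a wrapping reader can forward
the information, whereas a check that requires a SegmentReader can't see
through the wrapper.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for AtomicReader with the proposed (hypothetical) metadata(). */
interface LeafWithMetadata {
  /** Per-leaf key/value metadata, analogous to commitData() on the composite reader. */
  Map<String, String> metadata();
}

/** A segment-backed leaf whose metadata is its segment diagnostics. */
final class SegmentLeaf implements LeafWithMetadata {
  private final Map<String, String> diagnostics;

  SegmentLeaf(Map<String, String> diagnostics) {
    this.diagnostics = Collections.unmodifiableMap(new HashMap<>(diagnostics));
  }

  @Override
  public Map<String, String> metadata() {
    return diagnostics;
  }
}

/** A wrapping leaf (e.g. a filtered view) that forwards its delegate's metadata. */
final class WrappingLeaf implements LeafWithMetadata {
  private final LeafWithMetadata delegate;

  WrappingLeaf(LeafWithMetadata delegate) {
    this.delegate = delegate;
  }

  @Override
  public Map<String, String> metadata() {
    return delegate.metadata();
  }
}

final class EarlyTerminationCheck {
  /** True if the leaf's metadata says it was sorted by the expected sorter. */
  static boolean canTerminateEarly(LeafWithMetadata leaf, String sorterKey, String expectedSorterId) {
    return expectedSorterId.equals(leaf.metadata().get(sorterKey));
  }

  public static void main(String[] args) {
    Map<String, String> diagnostics = new HashMap<>();
    diagnostics.put("sorter", "NumericDocValuesSorter_price");
    LeafWithMetadata wrapped = new WrappingLeaf(new SegmentLeaf(diagnostics));
    // The check works through the wrapper; no SegmentReader cast is needed.
    System.out.println(canTerminateEarly(wrapped, "sorter", "NumericDocValuesSorter_price"));
  }
}
```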
                
> Early termination with SortingMergePolicy
> -----------------------------------------
>
>                 Key: LUCENE-4858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4858
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.3
>
>         Attachments: LUCENE-4858.patch, LUCENE-4858.patch
>
>
> Spin-off of LUCENE-4752, see 
> https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565
>  and 
> https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282
> When an index is sorted per-segment, queries that sort according to the index 
> sort order could be early terminated.
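
For reference, a minimal model of the early termination described above, with
no Lucene classes: segments are arrays of already-sorted long sort keys, and a
LongPredicate plays the role of the query. Once a segment has produced numHits
matches, its remaining documents sort no earlier than those hits and cannot
enter the top numHits, so collection of that segment stops. This illustrates
the technique only; it is not the collector in the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.LongPredicate;

final class EarlyTerminationSketch {
  /**
   * Returns the numHits smallest matching sort keys across all segments.
   * Each segment's keys must already be sorted ascending, i.e. the segment is
   * sorted by the same key the query sorts on.
   */
  static List<Long> topN(List<long[]> sortedSegments, LongPredicate matches, int numHits) {
    if (numHits <= 0) {
      return new ArrayList<>();
    }
    // Max-heap of the best numHits keys seen so far; peek() is the worst of them.
    PriorityQueue<Long> best = new PriorityQueue<>((a, b) -> Long.compare(b, a));
    for (long[] segment : sortedSegments) {
      int collectedInSegment = 0;
      for (long key : segment) {
        if (!matches.test(key)) {
          continue;
        }
        if (best.size() < numHits) {
          best.add(key);
        } else if (key < best.peek()) {
          best.poll();
          best.add(key);
        }
        // Later matches in this segment sort no earlier than the numHits already
        // seen, so none of them can enter the top numHits: terminate this segment.
        if (++collectedInSegment == numHits) {
          break;
        }
      }
    }
    List<Long> result = new ArrayList<>(best);
    result.sort(null); // ascending natural order
    return result;
  }

  public static void main(String[] args) {
    List<long[]> segments = new ArrayList<>();
    segments.add(new long[] {1, 4, 7, 10});
    segments.add(new long[] {2, 3, 8, 9});
    segments.add(new long[] {5, 6, 11});
    System.out.println(topN(segments, key -> key % 2 == 1, 3)); // [1, 3, 5]
  }
}
```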

