[ https://issues.apache.org/jira/browse/LUCENE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592289#comment-13592289 ]
Shai Erera commented on LUCENE-4752:
------------------------------------

How can you early terminate a query for a single segment? Say that you have 3 segments, each sorted individually, and your query asks for the top-10 by some criterion. The top-10 may come from the 3 segments as follows: seg1=4, seg2=4, seg3=2. But you don't know that until you've processed all 3 segments, right?

While you could decide on a per-segment basis to 'terminate', there's no mechanism today to tell IndexSearcher "I'm done w/ that segment, move on". Today, if you want to early terminate, you need to throw an exception from the Collector and catch it outside, in your application code?

To early terminate efficiently, you must have the segments in a consistent order, e.g. S1 > S2 > S3. Then, after you've processed enough elements from S1, you can early terminate the entire query because you're guaranteed that successive documents will be "smaller". Unless you add an "if (done) return" to your Collector.collect() and treat it as a no-op, or write your own IndexSearcher logic ... then per-segment early termination is doable.

As for the approach you describe, I think that instead of stuffing into IWC what seems like a random setting (pick-segments-for-sorting), we should have something more generic, like an AtomicReaderFactory, which IW would use instead of always loading SegmentReader. That would let you load your custom AtomicReader? Or perhaps this can be a SortingCodec? Also, a custom SegmentMerger to implement the zig-zag merge would help too.

> Merge segments to sort them
> ---------------------------
>
> Key: LUCENE-4752
> URL: https://issues.apache.org/jira/browse/LUCENE-4752
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/index
> Reporter: David Smiley
> Assignee: Adrien Grand
>
> It would be awesome if Lucene could write the documents out in a segment
> based on a configurable order. This of course applies to merging segments
> too.
> The benefit is increased locality on disk of documents that are likely to
> be accessed together. This often applies to documents near each other in
> time, but also spatially.
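
The difference between per-segment and whole-query early termination discussed in the comment can be illustrated with a small stand-alone sketch. It does not use any Lucene API: segments are modelled as plain int arrays sorted in descending order, and the class and method names (EarlyTerminationSketch, topNPerSegment, topNGlobalOrder) are hypothetical, chosen only for this illustration. The first method assumes only that each segment is sorted individually, so it can skip the tail of a segment but must still open every segment. The second assumes the stronger S1 > S2 > S3 ordering, so it can stop the whole query after the first n documents.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class EarlyTerminationSketch {

    /** The top values found (descending) and how many documents were visited. */
    public static final class Result {
        public final List<Integer> top;
        public final int visited;

        Result(List<Integer> top, int visited) {
            this.top = top;
            this.visited = visited;
        }
    }

    /**
     * Assumption: each segment is sorted in descending order individually,
     * but segments are not ordered relative to each other.
     * Every segment must be opened, yet the tail of a segment can be skipped
     * once its next value cannot beat the current n-th best.
     */
    public static Result topNPerSegment(int[][] segments, int n) {
        if (n <= 0) {
            return new Result(new ArrayList<>(), 0);
        }
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap of the current top n
        int visited = 0;
        for (int[] segment : segments) {
            for (int value : segment) {
                visited++;
                if (heap.size() < n) {
                    heap.add(value);
                } else if (value > heap.peek()) {
                    heap.poll();
                    heap.add(value);
                } else {
                    break; // "done with this segment, move on"
                }
            }
        }
        List<Integer> top = new ArrayList<>(heap);
        top.sort(Collections.reverseOrder());
        return new Result(top, visited);
    }

    /**
     * Assumption: segments are in a consistent order (S1 > S2 > S3), i.e. every
     * value in a segment is at least as large as every value in later segments.
     * The whole query can then stop after the first n documents.
     */
    public static Result topNGlobalOrder(int[][] segments, int n) {
        List<Integer> top = new ArrayList<>();
        int visited = 0;
        for (int[] segment : segments) {
            for (int value : segment) {
                if (top.size() >= n) {
                    return new Result(top, visited); // terminate the entire query
                }
                visited++;
                top.add(value);
            }
        }
        return new Result(top, visited);
    }
}
```

With three individually sorted segments, topNPerSegment still has to look at the head of every segment, which is the point made above about seg1=4, seg2=4, seg3=2. topNGlobalOrder never touches later segments once it has n hits, but it is only correct under the stronger ordering assumption.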