[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659436#action_12659436 ]
Michael McCandless commented on LUCENE-1483:
--------------------------------------------

{quote}
> Only other option I see off hand is a comparator that can do both, but not as
> clean and probably adds a check in tightly looped code.
{quote}

Right, I wanted to avoid the inner-loop check by swapping out the comparator in between segments. Though, modern CPUs are quite good when an if-statement consistently goes one way, so a single comparator that does internal switching might perform fine. Still, if we fix the API to return a new comparator, we can then allow both options. I think in some cases we'd even fall back to VAL comparison.

{quote}
> Is largest to smallest best though?
{quote}

Good question; it's not obvious. We should try both, and perhaps allow for the collector to optionally specify the order.

My thinking was that the first large segment using ORD is "free" (because ORD is only costly on switching segments). If there are many hits, the queue has likely done most of the work it'll do (i.e., the majority of the total number of insertions will have been done), unless the search is "degenerate". Perhaps the second segment, if large, warrants ORD, but then sometime soonish you'd switch to ORDDEM or VAL. The "long tail" of tiny segments would then normally be zipped through with hardly any insertions, so a higher insertion cost (with zero segment transition cost) is OK.

But you're right: if we do the tiny segments first, then the queue would be small, so the transition cost is lower.

We should make it simple to override a method to implement your own "search plan", and then provide a default heuristic that decides when to switch comparators. Probably that default heuristic should be based on how often compare was actually invoked for the segment. E.g., if the String sort is secondary to a numeric sort, then even if there are many hits, if the numeric sort mostly wins (doesn't have many compare(...) == 0's) then the String sort should probably immediately switch to VAL after the first segment.

> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, sortBench.py, sortCollate.py
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing
> for individual segment reloading on reopen.
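
As a rough illustration of the default heuristic suggested in the comment (decide whether to switch comparators based on how often compare was actually invoked for the segment just searched), here is a minimal sketch. Everything in it is hypothetical: ComparatorSwitchPlan, Mode, nextMode and the minCompareRate threshold are invented for illustration and are not part of Lucene or of any patch on this issue. Only ORD and VAL are modeled; ORDDEM is left out to keep the sketch small.

```java
/**
 * Hypothetical "search plan" sketch; none of these names are Lucene API.
 * It picks the comparator mode for the next segment from statistics
 * gathered while searching the previous one.
 */
class ComparatorSwitchPlan {

    /** Comparator strategies named in the discussion (ORDDEM omitted). */
    enum Mode { ORD, VAL }

    /** Assumed tuning knob: minimum compare calls per hit that justifies ORD. */
    private final double minCompareRate;

    ComparatorSwitchPlan(double minCompareRate) {
        this.minCompareRate = minCompareRate;
    }

    /**
     * @param current       mode used for the segment that just finished
     * @param hitsCollected number of hits collected in that segment
     * @param compareCalls  how many times this comparator was actually invoked
     *                      (low when an earlier sort field usually decides)
     * @return the mode to use for the next segment
     */
    Mode nextMode(Mode current, long hitsCollected, long compareCalls) {
        // In this sketch the switch is one-way: once on VAL, stay on VAL.
        if (current == Mode.VAL) {
            return Mode.VAL;
        }
        // No hits means no evidence; keep whatever we were using.
        if (hitsCollected == 0) {
            return current;
        }
        double compareRate = (double) compareCalls / hitsCollected;
        // A rarely invoked comparator (e.g. a secondary String sort behind a
        // numeric sort that mostly wins) does not repay ORD's per-segment
        // transition cost, so fall back to VAL.
        return compareRate >= minCompareRate ? Mode.ORD : Mode.VAL;
    }
}
```

With a threshold of 0.1, a segment with 1000 hits and 500 compare calls would stay on ORD, while one with 1000 hits and only 10 compare calls would switch to VAL for the remaining segments. A real plan would presumably also weigh segment size and queue fill, as discussed above.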