[jira] [Commented] (LUCENE-7482) Faster sorted index search for reverse order search

AMIRAULT Martin (JIRA) Thu, 06 Oct 2016 20:47:30 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-7482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15554014#comment-15554014
 ]


AMIRAULT Martin commented on LUCENE-7482:
-----------------------------------------

A more correct implementation: keep the last 'numDocsToCollect' docIds 
collected by each LeafCollector and flush them to the delegate LeafCollectors 
once the search is finished. I will experiment a bit more (with a query that 
matches less than 100% of the documents this time!) and post a new proposal.
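
To make the idea concrete, here is a minimal sketch of the buffering part only, 
in plain Java with no Lucene dependency. The class name LastDocIdsBuffer and 
its methods are hypothetical and not part of any patch or of Lucene; the 
IntConsumer stands in for the delegate LeafCollector's collect(int).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/** Hypothetical helper: remembers only the last 'capacity' docIds it was given. */
final class LastDocIdsBuffer {
    private final int[] ring; // circular storage for the most recent docIds
    private int next = 0;     // slot the next docId will be written to
    private int size = 0;     // number of valid entries, at most ring.length

    LastDocIdsBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        ring = new int[capacity];
    }

    /** Records a matching docId, overwriting the oldest one once the buffer is full. */
    void collect(int doc) {
        ring[next] = doc;
        next = (next + 1) % ring.length;
        if (size < ring.length) {
            size++;
        }
    }

    /** Replays the retained docIds to the delegate in the order they were collected. */
    void flush(IntConsumer delegate) {
        int start = (next - size + ring.length) % ring.length;
        for (int i = 0; i < size; i++) {
            delegate.accept(ring[(start + i) % ring.length]);
        }
        size = 0;
        next = 0;
    }

    public static void main(String[] args) {
        LastDocIdsBuffer buffer = new LastDocIdsBuffer(3);
        // Only some docIds match the query, so no fixed threshold would work here.
        for (int doc : new int[] {2, 5, 7, 11, 13}) {
            buffer.collect(doc);
        }
        List<Integer> flushed = new ArrayList<>();
        buffer.flush(flushed::add);
        System.out.println(flushed); // [7, 11, 13]
    }
}
```

Unlike a fixed 'collectFrom' threshold, this does not depend on how many 
documents match. Wiring it into a real LeafCollector is left out here; in 
particular, I assume anything the delegate needs at collect time (scores, for 
example) would also have to be buffered, since the scorer will have moved on 
by the time of the flush.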

> Faster sorted index search for reverse order search
> ---------------------------------------------------
>
>                 Key: LUCENE-7482
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7482
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: AMIRAULT Martin
>            Priority: Minor
>
> We are currently using Lucene at my company for our main product.
> Our search functionality is quite basic, and the results are always sorted 
> by a predefined field. The user can only choose the sort order 
> (Asc/Desc).
> I am currently investigating the index sort feature together with 
> EarlyTerminatingSortingCollector. 
> It is a shame that searching a sorted index in reverse order has no 
> optimization, and I was wondering whether it would be possible to make it 
> faster by creating a special "ReverseSortingCollector" for this purpose.
> I am aware that a posting list is designed to always be iterated in the same 
> order, so this is not about early-terminating the search but about 
> filtering out unneeded documents more efficiently.
> If a segment is sorted in the reverse of the requested order, we can easily 
> work out the docId from which documents should be collected.
> Here is a quick code sample:
> {code:title=ReverseSortingCollector.java|borderStyle=solid}
> public class ReverseSortingCollector extends FilterCollector {
>   /** Sort used to sort the search results */
>   protected final Sort sort;
>   /** Number of documents to collect in each segment */
>   protected final int numDocsToCollect;
>
> [...]
>   @Override
>   public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
>     LeafReader reader = context.reader();
>     Sort segmentSort = reader.getIndexSort();
>     if (isReverseOrder(sort, segmentSort)) {
>       // The segment is sorted in the reverse of the search sort, so we can
>       // easily work out the docId from which we should start collecting.
>       long collectFrom = reader.numDocs() - numDocsToCollect;
>
>       return new FilterLeafCollector(in.getLeafCollector(context)) {
>         @Override
>         public void collect(int doc) throws IOException {
>           if (doc >= collectFrom) { // only delegate docs at or after collectFrom
>             super.collect(doc);
>           }
>         }
>       };
>     } else {
>       return in.getLeafCollector(context);
>     }
>   }
> }
> {code}
> This is especially efficient when used along with TopFieldCollector, as many 
> docValues lookups are avoided. 
> In my experiment it reduced search time by 90%.
> However, I am not sure it is correct, as my knowledge of Lucene is still 
> quite limited.
> In particular, is it correct to assume that LeafReader docIds always span 
> 0 to LeafReader.numDocs()?
> Note: this does not support paging. It could eventually be implemented by 
> providing a way to look up the docId to start from based on the last document 
> collected (e.g. for LongPoint, querying the docId closest to the previously 
> returned value...)
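
On the docId range question in the description: as far as I understand, docIds 
in a LeafReader run from 0 to maxDoc() - 1, while numDocs() excludes deleted 
documents, so the two only agree for a segment without deletions. A small 
sketch of the threshold arithmetic in plain Java (no Lucene dependency; the 
class and method names are hypothetical, and the values in main are made up 
for illustration):

```java
/** Hypothetical helper illustrating the 'collectFrom' arithmetic. */
final class CollectFromThreshold {

    /** First docId to collect when only the last numDocsToCollect positions are wanted. */
    static int collectFrom(int maxDoc, int numDocsToCollect) {
        // Clamp at 0 so small segments do not produce a negative threshold.
        return Math.max(0, maxDoc - numDocsToCollect);
    }

    public static void main(String[] args) {
        int maxDoc = 10;                      // docIds 0..9 exist in the segment
        int numDeleted = 3;                   // assumed deletions, for illustration
        int numDocs = maxDoc - numDeleted;    // live documents only

        // Threshold based on the real docId range.
        System.out.println(collectFrom(maxDoc, 2));
        // Threshold based on numDocs() starts earlier than intended.
        System.out.println(collectFrom(numDocs, 2));
    }
}
```

Even with maxDoc(), a positional threshold is only safe when every document 
in the segment is live and matches the query, which is why buffering the last 
collected docIds looks like the more robust direction.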



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
