AMIRAULT Martin created LUCENE-7482:
---------------------------------------

             Summary: Faster sorted index search for reverse order search
                 Key: LUCENE-7482
                 URL: https://issues.apache.org/jira/browse/LUCENE-7482
             Project: Lucene - Core
          Issue Type: New Feature
            Reporter: AMIRAULT Martin
            Priority: Minor


We currently use Lucene at my company for our main product.
Our search functionality is quite basic, and the results are always sorted 
on a predefined field. The user can only choose the sort order 
(Asc/Desc).

I am currently investigating the index sort feature together with 
EarlyTerminationSortingCollector. 
It is a shame that searching a sorted index in reverse order has no 
optimization, and I was wondering whether it could be made faster by 
creating a special "ReverseSortingCollector" for this purpose.

I am aware that postings lists are designed to always be iterated in the same 
order, so this is not about terminating the search early but about 
filtering out unneeded documents more efficiently.

If a segment is sorted in the reverse of the search order, we can easily work 
out the docId from which documents should be collected.
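
As a worked example of that arithmetic, here is a minimal, Lucene-free sketch. 
The class and method names are hypothetical, and it assumes the segment's 
docIds are dense (no deletions) and that every document matches the query. 
The clamp at 0 is an addition for segments smaller than the requested count.

```java
/** Hypothetical helper, not part of Lucene. */
final class ReverseSortThreshold {

    /**
     * First docId worth collecting in a segment sorted in the reverse of the search order.
     *
     * @param segmentDocCount  number of docIds in the segment (assumed dense: 0..segmentDocCount-1)
     * @param numDocsToCollect number of hits wanted from this segment
     */
    static int collectFrom(int segmentDocCount, int numDocsToCollect) {
        // Clamp at 0 so a segment smaller than the requested count collects everything.
        return Math.max(0, segmentDocCount - numDocsToCollect);
    }

    public static void main(String[] args) {
        System.out.println(collectFrom(1000, 10)); // prints 990
        System.out.println(collectFrom(5, 10));    // prints 0
    }
}
```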

Here is a quick code sample:

{quote}
public class ReverseSortingCollector extends FilterCollector {

  /** Sort used to sort the search results */
  protected final Sort sort;
  /** Number of documents to collect in each segment */
  protected final int numDocsToCollect;

[...]

  @Override
  public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
    LeafReader reader = context.reader();
    Sort segmentSort = reader.getIndexSort();
    if (isReverseOrder(sort, segmentSort)) {
      // The segment is sorted in the reverse order of the search sort,
      // so we can easily work out the docId from which we should collect.
      final long collectFrom = reader.numDocs() - numDocsToCollect;

      return new FilterLeafCollector(in.getLeafCollector(context)) {
        @Override
        public void collect(int doc) throws IOException {
          if (doc >= collectFrom) { // only delegate docs at or after collectFrom
            super.collect(doc);
          }
        }
      };
    } else {
      return in.getLeafCollector(context);
    }
  }

}
{quote}

This is especially efficient when used with TopFieldCollector, as many 
doc-value lookups no longer take place. 
In my experiment it reduced search time by 90%.
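
To illustrate where the saving would come from, here is a small Lucene-free 
simulation. All names are hypothetical, and it assumes every document in the 
segment matches the query. It only counts how many docIds reach the wrapped 
collector; it does not measure search time.

```java
import java.util.ArrayList;
import java.util.List;

/** Lucene-free simulation; all names here are hypothetical. */
final class ReverseCollectSimulation {

    /** Returns the docIds that would be delegated when only docId >= collectFrom passes. */
    static List<Integer> delegated(int segmentDocCount, int collectFrom) {
        List<Integer> passed = new ArrayList<>();
        // Matching documents are always visited in increasing docId order.
        for (int doc = 0; doc < segmentDocCount; doc++) {
            if (doc >= collectFrom) {
                passed.add(doc); // only these would trigger a doc-value lookup downstream
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        int segmentDocCount = 1000;
        int numDocsToCollect = 10;
        List<Integer> unfiltered = delegated(segmentDocCount, 0);
        List<Integer> filtered = delegated(segmentDocCount, segmentDocCount - numDocsToCollect);
        System.out.println(unfiltered.size() + " delegated without the filter");
        System.out.println(filtered.size() + " delegated with the filter");
    }
}
```

If the query matches only part of the segment, the tail may hold fewer 
matches than requested, so this counts as a best case.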

However, I am not sure whether it is correct, as my knowledge of Lucene is 
still quite limited.
In particular, is it correct to assume that LeafReader docIds always span 
0 -> LeafReader.numDocs()?
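
For context on that question: per the LeafReader javadocs, docIds in a leaf 
run from 0 to maxDoc() - 1, while numDocs() counts only live (non-deleted) 
documents, so the two differ once a segment has deletions. The sketch below 
is one possible way to compute the threshold in that case, by walking back 
from the end of the segment and counting live documents. It is Lucene-free: 
a java.util.BitSet stands in for the live-docs bits, and the class and method 
names are hypothetical.

```java
import java.util.BitSet;

/** Lucene-free model of a segment with deletions; names are hypothetical. */
final class TailThreshold {

    /**
     * Smallest docId such that the range [docId, maxDoc) holds numDocsToCollect
     * live documents, or 0 if the segment has fewer live documents than that.
     *
     * @param liveDocs set bit = document is not deleted
     * @param maxDoc   one greater than the largest docId in the segment
     */
    static int collectFrom(BitSet liveDocs, int maxDoc, int numDocsToCollect) {
        int doc = maxDoc;
        int remaining = numDocsToCollect;
        // Walk backwards from the last docId, counting only live documents.
        while (doc > 0 && remaining > 0) {
            doc--;
            if (liveDocs.get(doc)) {
                remaining--;
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        BitSet live = new BitSet(10);
        live.set(0, 10);                               // docIds 0..9, none deleted
        System.out.println(collectFrom(live, 10, 3));  // prints 7
        live.clear(9);                                 // delete the last document
        System.out.println(collectFrom(live, 10, 3));  // prints 6
    }
}
```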


Note: this does not support paging. Paging could eventually be implemented by 
providing a way to look up the docId to start from, based on the last document 
collected (e.g. for LongPoint, querying the docId closest to the previously 
returned value...)
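
A rough sketch of that paging idea, under stated assumptions: the segment's 
sort values are modelled as an in-memory long[] in ascending docId order 
(standing in for whatever index lookup would be used in practice), values are 
unique, and the search order is descending. The class and method names are 
hypothetical.

```java
/** Hypothetical paging helper; not part of Lucene. */
final class ReversePaging {

    /** First index whose value is >= target, or sortedValues.length if there is none. */
    static int lowerBound(long[] sortedValues, long target) {
        int lo = 0;
        int hi = sortedValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedValues[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /**
     * DocId window [from, to) holding the next page after lastValue, where
     * lastValue is the smallest value already returned to the user.
     */
    static int[] nextPageWindow(long[] sortedValues, long lastValue, int pageSize) {
        // Everything at or after this docId was already returned (assumes unique values).
        int to = lowerBound(sortedValues, lastValue);
        int from = Math.max(0, to - pageSize);
        return new int[] {from, to};
    }

    public static void main(String[] args) {
        long[] values = {10, 20, 30, 40, 50, 60};
        // Page 1 (size 2) returned 60 and 50, so the last value seen is 50.
        int[] window = nextPageWindow(values, 50L, 2);
        System.out.println(window[0] + ".." + window[1]); // prints 2..4
    }
}
```

The collector would then delegate only docIds inside that window instead of 
using a single lower threshold.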





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

