[ https://issues.apache.org/jira/browse/LUCENE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Russell updated LUCENE-5637:
----------------------------------

    Attachment: Lucene-5637.patch

Patch for Solr 4.8, with one unit test failure I haven't figured out yet.

> Scaling scale function
> ----------------------
>
>                 Key: LUCENE-5637
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5637
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Chris Russell
>            Priority: Minor
>              Labels: patch, performance
>             Fix For: 4.8
>
>         Attachments: Lucene-5637.patch
>
>
> The existing scale() function examines the scores of all documents in the 
> index in order to calculate its scale constant.  This does not perform well 
> in Solr on very large indexes or with costly scoring mechanisms such as geo 
> distance.
> I have developed a patch that allows the scale function to only score 
> documents that match the given filters, thus improving performance of the 
> scale function.  
> For test queries involving two scale operations where one was scaling the 
> result of keyword scoring and the other was scaling the result of geo 
> distance scoring on an index with ~2 million documents, query time was 
> improved from ~400 ms with vanilla scale to ~190 ms with new scale.  A 
> similar query using no scaling ran in ~90 ms.  (Each enhanced scale function 
> added to the query appeared to add about 50 ms of processing.)
> e.g. scaled query - q = scale(keywords, 0, 90) and scale(geo, 0, 10)
> e.g. unscaled query - q = keywords and geo
> In both cases fq includes keywords and geo.
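
A minimal sketch of the idea described above, not the code from the patch: the scale range is computed only from documents accepted by the filter, so the (possibly costly) scoring function is never called for the rest of the index. The class name FilteredScale, the use of java.util.BitSet for the filter, and the IntToDoubleFunction scorer are assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch: min-max scaling restricted to filtered documents. */
final class FilteredScale {
    /**
     * Scores only documents whose bit is set in the filter and maps those
     * raw scores linearly onto [min, max]. Other documents are left at 0.
     */
    static double[] scale(IntToDoubleFunction scorer, BitSet filter, int maxDoc,
                          double min, double max) {
        double[] raw = new double[maxDoc];
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        // Pass 1: score matching documents once and track the raw score range.
        for (int doc = filter.nextSetBit(0); doc >= 0 && doc < maxDoc;
             doc = filter.nextSetBit(doc + 1)) {
            raw[doc] = scorer.applyAsDouble(doc);
            lo = Math.min(lo, raw[doc]);
            hi = Math.max(hi, raw[doc]);
        }
        double[] scaled = new double[maxDoc];
        if (lo > hi) {
            return scaled; // the filter matched no documents
        }
        // Avoid dividing by zero when every matching document has the same score.
        double factor = (hi == lo) ? 0.0 : (max - min) / (hi - lo);
        // Pass 2: rescale the stored raw scores of the matching documents.
        for (int doc = filter.nextSetBit(0); doc >= 0 && doc < maxDoc;
             doc = filter.nextSetBit(doc + 1)) {
            scaled[doc] = (raw[doc] - lo) * factor + min;
        }
        return scaled;
    }
}
```

With this shape the scoring cost grows with the number of filter matches rather than with the size of the index, which is the effect the timings above describe.
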
> In order to accomplish this goal I had to introduce several changes:
> 1) In the IndexSearcher.search method, where scorers are created and then used 
> to score on a per-AtomicReaderContext basis, I had to make it so that all 
> scorers would be created before any scoring was done.  This was so that the 
> scale function would have an opportunity to observe the entire index before 
> being asked to score something.
> 2) Introduced a new property to the Bits interface that indicates whether or 
> not the bits provide constant-time access.  Item 3) explains why.
> 3) FilterSet used to return null when asked for its bits because it had no 
> bits, only an iterator.  This was an issue when trying to make it so 
> that scale would only score documents matching the filter.  Thus a new bits 
> implementation was added (LazyIteratorBackedBits) that could expose an 
> iterator as a Bits implementation.  It advances the iterator on-demand when 
> asked about a document and uses an OpenBitSet to keep track of what it has 
> advanced beyond.  Thus once the iterator is exhausted it provides 
> constant-time answers like any other Bits.
> 4) Introduced a function on the ValueSource interface to allow a Bits to be 
> passed in for filtering purposes.
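
The behaviour described in 2) and 3) could look roughly like the sketch below. This is not the patch's LazyIteratorBackedBits: it uses java.util.BitSet in place of OpenBitSet and a plain ascending int iterator in place of a Lucene doc id iterator, and the method name isConstantTime() is a hypothetical stand-in for the new Bits property, whose actual name is not given in this issue.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

/** Hypothetical sketch: exposes an ascending doc id iterator through a bits-style API. */
final class LazyIteratorBackedBits {
    private final PrimitiveIterator.OfInt docs; // must yield doc ids in ascending order
    private final BitSet seen = new BitSet();   // matches the iterator has produced so far
    private final int length;
    private int position = -1;                  // last doc id taken from the iterator

    LazyIteratorBackedBits(PrimitiveIterator.OfInt ascendingDocs, int length) {
        this.docs = ascendingDocs;
        this.length = length;
    }

    /** Returns whether the document matches, advancing the iterator only as far as needed. */
    boolean get(int index) {
        while (position < index && docs.hasNext()) {
            position = docs.nextInt();
            seen.set(position);
        }
        // Every match at or below index has been recorded by now.
        return seen.get(index);
    }

    /** True once the iterator is exhausted, so get() no longer has to advance anything. */
    boolean isConstantTime() {
        return !docs.hasNext();
    }

    int length() {
        return length;
    }
}
```

A caller such as the scale function can check isConstantTime() to decide whether random access is cheap yet or whether each lookup may still have to walk the iterator.
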
> This was originally developed against Solr 4.2 but I have ported it to Solr 
> 4.8.  There is one failing unit test related to code that has been added in 
> the interim, AnalyzingInfixSuggesterTest.testRandomNRT.  I have not been able 
> to figure out why this test fails.  All other tests pass.
> In relation to implementation detail 1) above, the introduction of 
> LeafCollectors in trunk has caused somewhat of an issue.  It seems to no 
> longer be possible to create multiple scorers without immediately scoring on 
> that LeafCollector.  This may be related to the encapsulation of the 
> Collector.setNextReader() method which was very useful for this purpose.
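
The ordering constraint from 1) can be illustrated with a simplified sketch that does not use Lucene's classes. Here each segment is just an array of raw scores, and the names TwoPhaseScaleSearch, Range and SegmentScorer are hypothetical. Every scorer registers its raw scores with a shared range when it is created, so scoring is only correct if all scorers exist before the first document is scored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: create all per-segment scorers first, then score. */
final class TwoPhaseScaleSearch {
    /** Raw score range accumulated across all segments. */
    static final class Range {
        float min = Float.POSITIVE_INFINITY;
        float max = Float.NEGATIVE_INFINITY;

        void observe(float value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
    }

    /** Per-segment scorer that reports its raw scores to the shared range on creation. */
    static final class SegmentScorer {
        private final float[] raw;
        private final Range range;

        SegmentScorer(float[] raw, Range range) {
            this.raw = raw;
            this.range = range;
            for (float value : raw) {
                range.observe(value);
            }
        }

        /** Scales the raw score into [0, 1] using the range seen so far. */
        float score(int doc) {
            float span = range.max - range.min;
            return span == 0f ? 0f : (raw[doc] - range.min) / span;
        }
    }

    static List<float[]> search(List<float[]> segments) {
        Range range = new Range();
        // Phase 1: create every scorer so the range covers the whole index.
        List<SegmentScorer> scorers = new ArrayList<>();
        for (float[] segment : segments) {
            scorers.add(new SegmentScorer(segment, range));
        }
        // Phase 2: score segment by segment against the now complete range.
        List<float[]> results = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) {
            float[] scored = new float[segments.get(i).length];
            for (int doc = 0; doc < scored.length; doc++) {
                scored[doc] = scorers.get(i).score(doc);
            }
            results.add(scored);
        }
        return results;
    }
}
```

If scorer creation and scoring were interleaved per segment instead, documents in the first segment would be scaled against a range that has not yet seen the later segments.
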



--
This message was sent by Atlassian JIRA
(v6.2#6252)

