[
https://issues.apache.org/jira/browse/LUCENE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Russell updated LUCENE-5637:
----------------------------------
Attachment: Lucene-5637.patch
Patch for Solr 4.8, with one unit test failure I haven't figured out yet.
> Scaling scale function
> ----------------------
>
> Key: LUCENE-5637
> URL: https://issues.apache.org/jira/browse/LUCENE-5637
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Chris Russell
> Priority: Minor
> Labels: patch, performance
> Fix For: 4.8
>
> Attachments: Lucene-5637.patch
>
>
> The existing scale() function examines the scores of all documents in the
> index in order to calculate its scale constant. This does not perform well
> in Solr on very large indexes or with costly scoring mechanisms such as geo
> distance.
> I have developed a patch that allows the scale function to only score
> documents that match the given filters, thus improving performance of the
> scale function.
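
A minimal, self-contained sketch of the idea described above, not code from the patch: the min and max that drive the scaling are computed only over documents accepted by a filter. FilteredScaleSketch and its scale method are hypothetical names, and java.util.BitSet stands in for Lucene's Bits so the example runs without Lucene on the classpath.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical stand-in for a scale() that only looks at filtered documents. */
final class FilteredScaleSketch {

    /**
     * Linearly maps the raw scores of accepted documents onto
     * [targetMin, targetMax]. Documents outside the filter are never
     * inspected and are reported as NaN.
     */
    static double[] scale(double[] rawScores, BitSet accepted,
                          double targetMin, double targetMax) {
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;

        // Pass 1: observe min and max over matching documents only.
        for (int doc = accepted.nextSetBit(0);
                doc >= 0 && doc < rawScores.length;
                doc = accepted.nextSetBit(doc + 1)) {
            lo = Math.min(lo, rawScores[doc]);
            hi = Math.max(hi, rawScores[doc]);
        }

        double[] scaled = new double[rawScores.length];
        Arrays.fill(scaled, Double.NaN);
        double range = hi - lo;

        // Pass 2: scale matching documents using the observed range.
        for (int doc = accepted.nextSetBit(0);
                doc >= 0 && doc < rawScores.length;
                doc = accepted.nextSetBit(doc + 1)) {
            scaled[doc] = (range == 0)
                    ? targetMin
                    : targetMin + (rawScores[doc] - lo) * (targetMax - targetMin) / range;
        }
        return scaled;
    }
}
```

With raw scores {5, 100, 10, 20} and a filter accepting documents 0, 2 and 3, the outlier at document 1 is never scored, so it cannot stretch the range used for the others.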
> For test queries involving two scale operations where one was scaling the
> result of keyword scoring and the other was scaling the result of geo
> distance scoring on an index with ~2 million documents, query time was
> improved from ~400 ms with vanilla scale to ~190 ms with new scale. A
> similar query using no scaling ran in ~90 ms. (Each enhanced scale function
> added to the query appeared to add about 50 ms of processing.)
> e.g. scaled query - q = scale(keywords, 0, 90) and scale(geo, 0, 10)
> e.g. unscaled query - q = keywords and geo
> In both cases fq includes keywords and geo.
> In order to accomplish this goal I had to introduce a couple of changes:
> 1) In the IndexSearcher.search method, where scorers are created and then used
> to score on a per-AtomicReaderContext basis, I had to make it so that all
> scorers would be created before any scoring was done. This was so that the
> scale function would have an opportunity to observe the entire index before
> being asked to score something.
> 2) Introduced a new property to the Bits interface that indicates whether or
> not the bits provide constant-time access. Why? Read on.
> 3) FilterSet used to return null when asked for its bits because it did not
> have any; it had only an iterator. This was an issue when trying to make
> scale score only documents matching the filter. Thus a new Bits
> implementation was added (LazyIteratorBackedBits) that exposes an
> iterator as a Bits implementation. It advances the iterator on demand when
> asked about a document and uses an OpenBitSet to keep track of what it has
> advanced beyond. Thus, once the iterator is exhausted, it provides
> constant-time answers like any other Bits.
> 4) Introduced a function on the ValueSource interface to allow a Bits to be
> passed in for filtering purposes.
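
To illustrate items 2) and 3), here is a rough sketch of how an iterator-backed Bits could behave. It is an assumption-laden illustration, not the class from the patch: LazyIteratorBackedBitsSketch and isConstantTime() are hypothetical names, java.util.BitSet replaces OpenBitSet, and a PrimitiveIterator.OfInt over ascending doc IDs replaces Lucene's doc ID iterator.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

/** Hypothetical sketch: exposes an ascending doc ID iterator through a get(int) lookup. */
final class LazyIteratorBackedBitsSketch {
    private final PrimitiveIterator.OfInt docIds; // must yield doc IDs in ascending order
    private final BitSet seen = new BitSet();     // doc IDs the iterator has already produced
    private final int length;
    private int position = -1;                    // highest doc ID consumed so far

    LazyIteratorBackedBitsSketch(PrimitiveIterator.OfInt ascendingDocIds, int length) {
        this.docIds = ascendingDocIds;
        this.length = length;
    }

    /** Advances the iterator only as far as needed to answer for this index. */
    boolean get(int index) {
        while (position < index && docIds.hasNext()) {
            position = docIds.nextInt();
            seen.set(position);
        }
        return seen.get(index);
    }

    /** Hypothetical property: true once every lookup is a plain bit set read. */
    boolean isConstantTime() {
        return !docIds.hasNext();
    }

    int length() {
        return length;
    }
}
```

Lookups at or below the current position never touch the iterator, so out-of-order queries remain correct; only a lookup past the current position pays the cost of advancing.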
> This was originally developed against Solr 4.2 but I have ported it to Solr
> 4.8. There is one failing unit test related to code that has been added in
> the interim, AnalyzingInfixSuggesterTest.testRandomNRT. I have not been able
> to figure out why this test fails. All other tests pass.
> In relation to implementation detail 1) above, the introduction of
> LeafCollectors in trunk has caused somewhat of an issue. It seems to no
> longer be possible to create multiple scorers without immediately scoring on
> that LeafCollector. This may be related to the encapsulation of the
> Collector.setNextReader() method, which was very useful for this purpose.
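
For readers unfamiliar with why implementation detail 1) matters, the following toy model shows the "create every scorer first, then score" ordering. It does not use Lucene's IndexSearcher, Scorer, or LeafCollector APIs; TwoPhaseScaleSketch, ScaleContext, and SegmentScorer are hypothetical names, and each segment is modeled as a plain array of raw scores.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of creating all per-segment scorers before scoring any document. */
final class TwoPhaseScaleSketch {

    /** Shared state that every per-segment scorer feeds when it is created. */
    static final class ScaleContext {
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;

        void observe(double value) {
            lo = Math.min(lo, value);
            hi = Math.max(hi, value);
        }
    }

    static final class SegmentScorer {
        private final double[] raw;
        private final ScaleContext context;

        SegmentScorer(double[] raw, ScaleContext context) {
            this.raw = raw;
            this.context = context;
            for (double value : raw) {
                context.observe(value); // creation is when this segment is observed
            }
        }

        double score(int doc, double targetMin, double targetMax) {
            double range = context.hi - context.lo;
            return (range == 0)
                    ? targetMin
                    : targetMin + (raw[doc] - context.lo) * (targetMax - targetMin) / range;
        }
    }

    static List<double[]> search(List<double[]> segments, double targetMin, double targetMax) {
        ScaleContext context = new ScaleContext();

        // Phase 1: create every scorer so the context has seen all segments.
        List<SegmentScorer> scorers = new ArrayList<>();
        for (double[] segment : segments) {
            scorers.add(new SegmentScorer(segment, context));
        }

        // Phase 2: only now score, using the index-wide min and max.
        List<double[]> results = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) {
            double[] scaled = new double[segments.get(i).length];
            for (int doc = 0; doc < scaled.length; doc++) {
                scaled[doc] = scorers.get(i).score(doc, targetMin, targetMax);
            }
            results.add(scaled);
        }
        return results;
    }
}
```

If creation and scoring were interleaved per segment, the first segment in this model would be scaled against its own min and max instead of the index-wide range, which is the situation the change to the search loop was meant to avoid.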