[ https://issues.apache.org/jira/browse/SOLR-11769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366589#comment-16366589 ]
David Smiley commented on SOLR-11769:
-------------------------------------
Yonik: That's a good point -- "createDocSet" ought to return a result that is
safe to modify. However, note that you made createDocSet return
DocSetUtil.getDocSet( answer, searcher ) at the very end, which is expressly
documented as returning a result that is not safe to modify, since it can set
LiveDocs and hand back that shared instance. Maybe the indirect callers here,
SolrIndexSearcher.getDocSet(Query query) and getDocSet(Query, DocSet), ought to
be the ones calling this on the results of getDocSetNC, because these methods
are cache-aware.
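To make the contract concrete, here is a minimal, hypothetical sketch that uses only java.util.BitSet as a stand-in for DocSet (SharedDocSetSketch and its methods are made up for illustration and are not Solr's API). createDocSet always returns a fresh set, while the cache-aware getDocSet may deduplicate a match-all result to one shared instance, roughly as described for DocSetUtil.getDocSet, so only the former is safe to modify.

```java
import java.util.BitSet;

/**
 * Hypothetical, stdlib-only model (not Solr's API): java.util.BitSet stands in
 * for DocSet, and every document is assumed live, so numDocs == maxDoc.
 */
public class SharedDocSetSketch {
    private final int numDocs;
    // Shared "live docs" instance, set lazily; callers must never mutate it.
    private BitSet liveDocs;

    SharedDocSetSketch(int numDocs) {
        this.numDocs = numDocs;
    }

    /** Contract suggested for createDocSet: always a fresh set, safe to modify. */
    BitSet createDocSet(int[] matches) {
        BitSet bits = new BitSet(numDocs);
        for (int doc : matches) {
            bits.set(doc);
        }
        return bits;
    }

    /**
     * Cache-aware layer: a set that matches every document is deduplicated to
     * the shared liveDocs instance (modeled on the DocSetUtil.getDocSet
     * behavior described above), so the result must be treated as read-only.
     */
    BitSet getDocSet(int[] matches) {
        BitSet fresh = createDocSet(matches);
        if (fresh.cardinality() == numDocs) {
            if (liveDocs == null) {
                liveDocs = fresh; // first match-all result becomes the shared copy
            }
            return liveDocs;
        }
        return fresh;
    }
}
```

Under this split, a caller that wants to mutate what getDocSet returns would clone it first; keeping the deduplication in the cache-aware layer is what lets createDocSet keep its safe-to-modify promise.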
Separately, I've been poking around this related code and noticed this
DocSetProducer thing. I'm skeptical we should have it. Firstly, it's a Solr
thing, but most Queries are implemented at the Lucene level. Secondly, I think
we can achieve the same by grabbing the Weight, then the Scorer, then its
DocIdSetIterator, and passing that to BitSetIterator.getFixedBitSetOrNull(iter).
That should work with some Lucene and Solr queries,
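A rough, stdlib-only model of the Weight -> Scorer -> DocIdSetIterator -> BitSetIterator.getFixedBitSetOrNull(iter) idea follows. In Lucene the Scorer is obtained from the Weight per segment, and getFixedBitSetOrNull returns the backing FixedBitSet only when the iterator is a BitSetIterator over one (otherwise null), which is presumably why this would help only for some queries. The types below (IteratorToBitSetSketch, DocIterator, BitSetDocIterator) are hypothetical stand-ins, not Lucene classes, and the collect-by-iterating fallback on null is an assumption about what a caller would do.

```java
import java.util.BitSet;

/**
 * Hypothetical stdlib-only model of the Weight -> Scorer -> DocIdSetIterator ->
 * BitSetIterator.getFixedBitSetOrNull idea. None of these types are Lucene's.
 */
public class IteratorToBitSetSketch {

    /** Stand-in for a DocIdSetIterator: yields doc ids in increasing order. */
    interface DocIterator {
        int NO_MORE_DOCS = Integer.MAX_VALUE;

        int nextDoc();
    }

    /** Stand-in for a bitset-backed iterator such as Lucene's BitSetIterator. */
    static final class BitSetDocIterator implements DocIterator {
        final BitSet bits;
        private int doc = -1;

        BitSetDocIterator(BitSet bits) {
            this.bits = bits;
        }

        @Override
        public int nextDoc() {
            if (doc == NO_MORE_DOCS) {
                return doc; // already exhausted; avoid overflowing doc + 1
            }
            int next = bits.nextSetBit(doc + 1);
            doc = (next == -1) ? NO_MORE_DOCS : next;
            return doc;
        }
    }

    /** Analogue of getFixedBitSetOrNull: expose the backing bits if there are any. */
    static BitSet getBitSetOrNull(DocIterator it) {
        return (it instanceof BitSetDocIterator) ? ((BitSetDocIterator) it).bits : null;
    }

    /** Reuse the backing bits when possible; otherwise collect doc ids one by one. */
    static BitSet toBitSet(DocIterator it) {
        BitSet shared = getBitSetOrNull(it);
        if (shared != null) {
            return shared; // no per-document work, but the result is shared: do not mutate
        }
        BitSet collected = new BitSet();
        for (int d = it.nextDoc(); d != DocIterator.NO_MORE_DOCS; d = it.nextDoc()) {
            collected.set(d);
        }
        return collected;
    }
}
```

Note that the fast path hands back a shared bitset, which runs into the same safe-to-modify question raised about createDocSet.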
> Sorting performance degrades when useFilterForSortedQuery is enabled and
> there is no filter query specified
> -----------------------------------------------------------------------------------------------------------
>
> Key: SOLR-11769
> URL: https://issues.apache.org/jira/browse/SOLR-11769
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: search
> Affects Versions: 4.10.4
> Environment: OS: macOS Sierra (version 10.12.4)
> Memory: 16GB
> CPU: 2.9 GHz Intel Core i7
> Java Version: 1.8
> Reporter: Betim Deva
> Assignee: David Smiley
> Priority: Major
> Labels: performance
> Attachments: SOLR-11769_Optimize_MatchAllDocsQuery_more.patch
>
>
> The performance of sorting degrades significantly when
> {{useFilterForSortedQuery}} is enabled and no filter query is specified.
> *Steps to Reproduce:*
> 1. Set {{useFilterForSortedQuery=true}} in {{solrconfig.xml}}
> 2. Run a sorted query that matches and returns a single document
> - Example {{/select?q=foo:123&sort=bar+desc}}
> On a large index (> 10 million documents), this yields a slow response
> (a few hundred milliseconds on average), even when the result set
> consists of a single document.
> *Observation 1:*
> - Disabling {{useFilterForSortedQuery}} improves the performance to < 1ms
> *Observation 2:*
> - Removing the {{sort}} improves the performance to < 1ms
> *Observation 3:*
> - Keeping the {{sort}}, and adding any filter query (such as {{fq=\*:\*}})
> improves the performance to < 1 ms.
> Profiling
> [SolrIndexSearcher.java|https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java;h=9ee5199bdf7511c70f2cc616c123292c97d36b5b;hb=HEAD#l1400]
> showed that the bottleneck is the line
> {{DocSet bigFilt = getDocSet(cmd.getFilterList());}}
> when {{cmd.getFilterList()}} is passed in as {{null}}. This makes the
> {{getDocSet()}} function collect document ids every single time it is called,
> without any caching.
> {code:java}
> if (useFilterCache) {
>   // now actually use the filter cache.
>   // for large filters that match few documents, this may be
>   // slower than simply re-executing the query.
>   if (out.docSet == null) {
>     out.docSet = getDocSet(cmd.getQuery(), cmd.getFilter());
>     DocSet bigFilt = getDocSet(cmd.getFilterList());
>     if (bigFilt != null) out.docSet = out.docSet.intersection(bigFilt);
>   }
>   // todo: there could be a sortDocSet that could take a list of
>   // the filters instead of anding them first...
>   // perhaps there should be a multi-docset-iterator
>   sortDocSet(qr, cmd);
> }
> {code}
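The hot path above can be modeled as follows, assuming (hypothetically) that a null or empty filter list should simply yield a null filter set, so that the existing {{if (bigFilt != null)}} check skips the intersection instead of a match-all set being materialized on every request. This is a stdlib-only sketch with BitSet standing in for DocSet and precomputed BitSets standing in for filter Queries; NullFilterListSketch and its methods are made up, and this is not the patch attached to this issue.

```java
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical stdlib-only model of the quoted hot path: BitSet stands in for
 * DocSet, and each filter is a precomputed BitSet rather than a Query.
 */
public class NullFilterListSketch {

    /** Combine filters, or return null when there is nothing to filter by. */
    static BitSet filterSetOrNull(List<BitSet> filters) {
        if (filters == null || filters.isEmpty()) {
            // Assumed guard: "no filters" means "no restriction", so skip
            // materializing a match-all set (O(maxDoc) work on every request).
            return null;
        }
        BitSet combined = (BitSet) filters.get(0).clone(); // never mutate inputs
        for (int i = 1; i < filters.size(); i++) {
            combined.and(filters.get(i));
        }
        return combined;
    }

    /** Mirrors: if (bigFilt != null) out.docSet = out.docSet.intersection(bigFilt); */
    static BitSet applyFilters(BitSet docSet, BitSet bigFilt) {
        if (bigFilt == null) {
            return docSet; // nothing to intersect with
        }
        BitSet result = (BitSet) docSet.clone();
        result.and(bigFilt);
        return result;
    }
}
```

With this shape, a request with no fq pays nothing for filtering, which matches the reported observation that the slowdown disappears when the filter step is taken off the hot path.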
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)