[ https://issues.apache.org/jira/browse/SOLR-11769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366589#comment-16366589 ]

David Smiley commented on SOLR-11769:
-------------------------------------


Yonik: That's a good point -- "createDocSet" ought to have a result that is 
safe to modify.  However, note that you made createDocSet return 
DocSetUtil.getDocSet( answer, searcher ) at the very end, which is expressly 
documented as not returning a result that is safe to modify, since it can set 
LiveDocs.  Maybe the indirect callers here, SolrIndexSearcher.getDocSet(Query 
query) and getDocSet(Query, DocSet), should be the ones calling it on the 
results of getDocSetNC, because these methods are cache-aware.
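To make the mutability concern concrete, here is a minimal sketch that uses java.util.BitSet as a stand-in for Solr's DocSet. The class SharedDocSetDemo, the methods getDocSetShared and createDocSet, and the LIVE_DOCS field are hypothetical; they only roughly model the behaviour discussed above, where a result covering every live document may be swapped for a shared, cached set that callers must not modify, whereas a "safe to modify" method hands back a private copy.

```java
import java.util.BitSet;

public class SharedDocSetDemo {
    // Hypothetical stand-in for the searcher's cached LiveDocs: every live doc is set.
    // This instance is shared, so callers must treat it as read-only.
    static final int MAX_DOC = 8;
    static final BitSet LIVE_DOCS = new BitSet(MAX_DOC);
    static {
        LIVE_DOCS.set(0, MAX_DOC);
    }

    // Hypothetical model of a cache-aware helper: if the answer matches every live doc,
    // return the shared LiveDocs instance instead. The result is NOT safe to modify.
    static BitSet getDocSetShared(BitSet answer) {
        return answer.cardinality() == LIVE_DOCS.cardinality() ? LIVE_DOCS : answer;
    }

    // Hypothetical model of a createDocSet whose result IS safe to modify: a private copy.
    static BitSet createDocSet(BitSet answer) {
        return (BitSet) answer.clone();
    }

    public static void main(String[] args) {
        BitSet all = new BitSet(MAX_DOC);
        all.set(0, MAX_DOC);

        BitSet shared = getDocSetShared(all);
        System.out.println("shared is LIVE_DOCS: " + (shared == LIVE_DOCS)); // true

        BitSet mine = createDocSet(all);
        mine.clear(3); // only the private copy changes
        System.out.println("LIVE_DOCS still full: " + (LIVE_DOCS.cardinality() == MAX_DOC)); // true
    }
}
```

Mutating `shared` instead of `mine` would silently corrupt the cached set for every later caller, which is why the cache-aware entry points are the natural place to decide whether sharing is acceptable.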

Separately, I've been poking around this related code and noticed this 
DocSetProducer thing.  I'm skeptical we should have it.  Firstly, it's a Solr 
thing, but most Queries are implemented at the Lucene level.  Secondly, I think 
we can achieve the same by grabbing the Weight, then the Scorer, and then its 
DocIdSetIterator, and passing that to BitSetIterator.getFixedBitSetOrNull(iter).  
That should work with some Lucene and Solr queries.
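The following is a rough, Lucene-free model of the alternative described above. Real code would obtain the iterator through Lucene's Weight and Scorer and call BitSetIterator.getFixedBitSetOrNull(iter), whose surrounding signatures vary by Lucene version; here the interface DocIdIter, the classes BitSetBackedIter and ArrayIter, and the helpers bitSetOrNull and toBitSet are hypothetical stand-ins built on java.util.BitSet. The point is only the shape of the idea: if the iterator is already backed by a bit set, reuse it; otherwise fall back to iterating and collecting doc ids.

```java
import java.util.BitSet;

public class DocIdIterDemo {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Hypothetical minimal analogue of Lucene's DocIdSetIterator.
    interface DocIdIter {
        int nextDoc(); // next matching doc id in ascending order, or NO_MORE_DOCS
    }

    // Hypothetical analogue of BitSetIterator: an iterator that wraps a bit set.
    static final class BitSetBackedIter implements DocIdIter {
        final BitSet bits;
        private int current = -1;

        BitSetBackedIter(BitSet bits) { this.bits = bits; }

        @Override
        public int nextDoc() {
            if (current == NO_MORE_DOCS) return NO_MORE_DOCS; // already exhausted
            int next = bits.nextSetBit(current + 1);
            current = (next < 0) ? NO_MORE_DOCS : next;
            return current;
        }
    }

    // Hypothetical iterator with no backing bit set (e.g. a computed query).
    static final class ArrayIter implements DocIdIter {
        private final int[] docs; // assumed sorted ascending
        private int idx = 0;

        ArrayIter(int... docs) { this.docs = docs; }

        @Override
        public int nextDoc() {
            return idx < docs.length ? docs[idx++] : NO_MORE_DOCS;
        }
    }

    // Mirrors the idea behind getFixedBitSetOrNull: the backing set if any, else null.
    static BitSet bitSetOrNull(DocIdIter it) {
        return (it instanceof BitSetBackedIter) ? ((BitSetBackedIter) it).bits : null;
    }

    // Reuse the backing bit set when available; otherwise collect doc ids one by one.
    static BitSet toBitSet(DocIdIter it, int maxDoc) {
        BitSet existing = bitSetOrNull(it);
        if (existing != null) {
            return existing; // no per-document iteration needed (but it is shared)
        }
        BitSet collected = new BitSet(maxDoc);
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            collected.set(doc);
        }
        return collected;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(10);
        bits.set(2);
        bits.set(5);
        System.out.println(toBitSet(new BitSetBackedIter(bits), 10) == bits); // true: reused
        System.out.println(toBitSet(new ArrayIter(2, 5), 10));                 // {2, 5}: collected
    }
}
```

Reusing the backing set avoids a pass over the documents, but the caller then holds a shared instance, so the same safe-to-modify question from the first paragraph applies.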

> Sorting performance degrades when useFilterForSortedQuery is enabled and 
> there is no filter query specified
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11769
>                 URL: https://issues.apache.org/jira/browse/SOLR-11769
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public) 
>          Components: search
>    Affects Versions: 4.10.4
>         Environment: OS: macOS Sierra (version 10.12.4)
> Memory: 16GB
> CPU: 2.9 GHz Intel Core i7
> Java Version: 1.8
>            Reporter: Betim Deva
>            Assignee: David Smiley
>            Priority: Major
>              Labels: performance
>         Attachments: SOLR-11769_Optimize_MatchAllDocsQuery_more.patch
>
>
> The performance of sorting degrades significantly when 
> {{useFilterForSortedQuery}} is enabled and no filter query is specified.
> *Steps to Reproduce:*
> 1. Set {{useFilterForSortedQuery=true}} in {{solrconfig.xml}}.
> 2. Run a query that matches and returns a single document, and add sorting.
> - Example: {{/select?q=foo:123&sort=bar+desc}}
> With a large index (> 10 million documents), this yields a slow response 
> (a few hundred milliseconds on average), even when the result set 
> consists of a single document.
> *Observation 1:*
> - Disabling {{useFilterForSortedQuery}} improves the performance to < 1 ms.
> *Observation 2:*
> - Removing the {{sort}} improves the performance to < 1 ms.
> *Observation 3:*
> - Keeping the {{sort}} and adding any filter query (such as {{fq=\*:\*}}) 
> improves the performance to < 1 ms.
> Profiling 
> [SolrIndexSearcher.java|https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java;h=9ee5199bdf7511c70f2cc616c123292c97d36b5b;hb=HEAD#l1400]
> showed that the bottleneck is 
> {{DocSet bigFilt = getDocSet(cmd.getFilterList());}} 
> when {{cmd.getFilterList()}} is {{null}}. In that case {{getDocSet()}} 
> collects document ids every single time it is called, without any caching.
> {code:java}
> if (useFilterCache) {
>   // now actually use the filter cache.
>   // for large filters that match few documents, this may be
>   // slower than simply re-executing the query.
>   if (out.docSet == null) {
>     out.docSet = getDocSet(cmd.getQuery(), cmd.getFilter());
>     DocSet bigFilt = getDocSet(cmd.getFilterList());
>     if (bigFilt != null) out.docSet = out.docSet.intersection(bigFilt);
>   }
>   // todo: there could be a sortDocSet that could take a list of
>   // the filters instead of anding them first...
>   // perhaps there should be a multi-docset-iterator
>   sortDocSet(qr, cmd);
> }
> {code}
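One way to avoid the repeated work reported above is to treat a null or empty filter list as "no extra filter" and skip both computing {{bigFilt}} and the intersection. The sketch below models that idea with java.util.BitSet; the class FilterListDemo and the methods combinedFilter and matchingDocs are hypothetical, do not reflect Solr's actual API, and the attached patch may address the problem differently.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class FilterListDemo {
    // Hypothetical helper: intersect all filters, or return null when there are none,
    // so the caller can skip the intersection instead of materialising every doc id.
    static BitSet combinedFilter(List<BitSet> filters) {
        if (filters == null || filters.isEmpty()) {
            return null; // "no filter": nothing to compute
        }
        BitSet result = (BitSet) filters.get(0).clone(); // copy so the inputs stay untouched
        for (int i = 1; i < filters.size(); i++) {
            result.and(filters.get(i));
        }
        return result;
    }

    // Hypothetical model of the quoted block: the main query's doc set,
    // narrowed by the filters only when there are any.
    static BitSet matchingDocs(BitSet queryDocs, List<BitSet> filters) {
        BitSet bigFilt = combinedFilter(filters);
        if (bigFilt == null) {
            return queryDocs; // no filter list: no extra work at all
        }
        BitSet docSet = (BitSet) queryDocs.clone();
        docSet.and(bigFilt);
        return docSet;
    }

    public static void main(String[] args) {
        BitSet query = new BitSet();
        query.set(7); // a single matching document, as in the report
        System.out.println(matchingDocs(query, null)); // {7}

        BitSet fq = new BitSet();
        fq.set(0, 10);
        List<BitSet> filters = new ArrayList<>();
        filters.add(fq);
        System.out.println(matchingDocs(query, filters)); // {7}
    }
}
```

With no filter list, the cost of {{matchingDocs}} no longer depends on index size, which is the property the slow path in the report lacks.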



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
