[ https://issues.apache.org/jira/browse/SOLR-11769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298292#comment-16298292 ]
Yonik Seeley commented on SOLR-11769:
-------------------------------------

We just need to be careful about returning cached sets where we did not before (and check that we never modify sets returned, as well as document that the returned set should not be modified).

For DocSetUtil.createDocSet specifically, it feels like potentially returning liveDocs should either be moved to a higher level caching function, or we should rename createDocSet since it can now sometimes just return an existing shared set.

> Sorting performance degrades when useFilterForSortedQuery is enabled and there is no filter query specified
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11769
>                 URL: https://issues.apache.org/jira/browse/SOLR-11769
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: search
>    Affects Versions: 4.10.4
>         Environment: OS: macOS Sierra (version 10.12.4)
>                      Memory: 16GB
>                      CPU: 2.9 GHz Intel Core i7
>                      Java Version: 1.8
>            Reporter: Betim Deva
>            Assignee: David Smiley
>              Labels: performance
>         Attachments: SOLR-11769_Optimize_MatchAllDocsQuery_more.patch
>
>
> Sorting performance degrades significantly when {{useFilterForSortedQuery}} is enabled and no filter query is specified.
> *Steps to Reproduce:*
> 1. Set {{useFilterForSortedQuery=true}} in {{solrconfig.xml}}
> 2. Run a query that matches and returns a single document, and add a sort
> - Example {{/select?q=foo:123&sort=bar+desc}}
> With a large index (> 10 million documents), this yields a slow response (a few hundred milliseconds on average) even when the result set consists of a single document.
> *Observation 1:*
> - Disabling {{useFilterForSortedQuery}} brings the response time down to < 1 ms.
> *Observation 2:*
> - Removing the {{sort}} brings the response time down to < 1 ms.
> *Observation 3:*
> - Keeping the {{sort}} and adding any filter query (such as {{fq=\*:\*}}) brings the response time down to < 1 ms.
> After profiling [SolrIndexSearcher.java|https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java;h=9ee5199bdf7511c70f2cc616c123292c97d36b5b;hb=HEAD#l1400], I found that the bottleneck is at
> {{DocSet bigFilt = getDocSet(cmd.getFilterList());}}
> when {{cmd.getFilterList()}} is passed in as {{null}}. This makes the {{getDocSet()}} function collect document ids every single time it is called, without any caching.
> {code:java}
>       if (useFilterCache) {
>         // now actually use the filter cache.
>         // for large filters that match few documents, this may be
>         // slower than simply re-executing the query.
>         if (out.docSet == null) {
>           out.docSet = getDocSet(cmd.getQuery(), cmd.getFilter());
>           DocSet bigFilt = getDocSet(cmd.getFilterList());
>           if (bigFilt != null) out.docSet = out.docSet.intersection(bigFilt);
>         }
>         // todo: there could be a sortDocSet that could take a list of
>         // the filters instead of anding them first...
>         // perhaps there should be a multi-docset-iterator
>         sortDocSet(qr, cmd);
>       }
> {code}
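
To make the concern about shared cached sets concrete, here is a minimal, self-contained Java sketch. It is not Solr's code: `SketchSearcher`, `getLiveDocs`, and `liveDocsBuilds` are hypothetical names, `java.util.BitSet` stands in for Solr's `DocSet`, and returning the live-docs set for a null or empty filter list is an assumption made for illustration. It shows the expensive scan running once, the same shared instance being handed out afterwards, and the filtered path working on a private copy so the shared set is never modified.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical stand-in for a searcher; this is NOT Solr's SolrIndexSearcher. */
final class SketchSearcher {
    private final int maxDoc;
    private final BitSet deleted;     // documents marked as deleted
    private BitSet cachedLiveDocs;    // built lazily, then shared and never mutated
    int liveDocsBuilds = 0;           // counts how often the expensive scan ran

    SketchSearcher(int maxDoc, BitSet deleted) {
        this.maxDoc = maxDoc;
        this.deleted = (BitSet) deleted.clone(); // defensive copy of the input
    }

    /** Returns the shared live-docs set. Callers must NOT modify the result. */
    synchronized BitSet getLiveDocs() {
        if (cachedLiveDocs == null) {
            BitSet live = new BitSet(maxDoc);
            live.set(0, maxDoc);      // start with every document id
            live.andNot(deleted);     // drop the deleted ones
            cachedLiveDocs = live;
            liveDocsBuilds++;
        }
        return cachedLiveDocs;
    }

    /**
     * Intersects the given filters. With no filters (assumption for this
     * sketch), the shared live-docs set is returned instead of being
     * collected again, so the result may be a shared instance.
     */
    BitSet getDocSet(List<BitSet> filters) {
        if (filters == null || filters.isEmpty()) {
            return getLiveDocs();     // shared: do not modify
        }
        BitSet result = (BitSet) getLiveDocs().clone(); // private copy, safe to mutate
        for (BitSet filter : filters) {
            result.and(filter);
        }
        return result;
    }
}
```

A caller that needs to change the result would copy it first, for example `BitSet mine = (BitSet) searcher.getDocSet(null).clone();`. Because the method can hand back an existing shared set, a name that suggests a fresh set is always created would be misleading here, which is the renaming point made in the comment.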