[jira] [Comment Edited] (SOLR-6810) Faster searching limited but high rows across many shards all with many hits

Per Steffensen (JIRA) Wed, 03 Dec 2014 07:49:35 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232772#comment-14232772
 ]


Per Steffensen edited comment on SOLR-6810 at 12/3/14 3:48 PM:
---------------------------------------------------------------

We have solved the problem (reducing response-time by a factor of 60 on our 
particular system/data/distribution) the following way

Introduced the concept of "distributed query algorithm" (DQA) controlled by 
request parameter {{dqa}}. Naming the existing (default) distributed query 
algorithm {{find-id-relevance_fetch-by-ids}} (short-alias {{firfbi}}) and 
introducing a new alternative distributed query algorithm called 
{{find-relevance_find-ids-limited-rows_fetch-by-ids}} (short-alias 
{{frfilrfbi}}) 
* {{find-id-relevance_fetch-by-ids}} does as always - see JavaDoc of 
{{ShardParams.FIND_ID_RELEVANCE_FETCH_BY_IDS}}
* {{find-relevance_find-ids-limited-rows_fetch-by-ids}} does it in a different 
way  - see JavaDoc of 
{{ShardParams.FIND_RELEVANCE_FIND_IDS_LIMITED_ROWS_FETCH_BY_IDS}}

I believe “distributed query algorithm” is a counterpart to Elasticsearch's 
“search type”, but with much better names that say something about what they 
actually control :-)

Both DQAs support the {{distrib.singlePass}} flag. I have *renamed* it to 
{{dqa.forceSkipGetIds}} because it is only {{find-id-relevance_fetch-by-ids}} 
that becomes single-pass (going from 2 to 1 pass) with this flag. 
{{find-relevance_find-ids-limited-rows_fetch-by-ids}} goes from 3 to 2 passes. 
{{dqa.forceSkipGetIds=true}} is default for 
{{find-relevance_find-ids-limited-rows_fetch-by-ids}}. There is really no need 
to ever run with {{dqa.forceSkipGetIds=false}} for this DQA, but it is 
supported.
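
To make the relation between the {{dqa}} parameter, the short aliases and the 
pass counts concrete, here is a minimal sketch. It is not taken from the patch: 
the enum {{DqaSketch}}, its fields and {{fromParam}} are hypothetical names, 
and the default of {{dqa.forceSkipGetIds=false}} for 
{{find-id-relevance_fetch-by-ids}} is an assumption (only the default for the 
new DQA is stated above).

```java
/** Hypothetical model of the two DQAs; not the patch's ShardParams.DQA. */
enum DqaSketch {
    // id, short alias, passes when get-ids is not skipped, assumed default of dqa.forceSkipGetIds
    FIND_ID_RELEVANCE_FETCH_BY_IDS(
        "find-id-relevance_fetch-by-ids", "firfbi", 2, false),
    FIND_RELEVANCE_FIND_IDS_LIMITED_ROWS_FETCH_BY_IDS(
        "find-relevance_find-ids-limited-rows_fetch-by-ids", "frfilrfbi", 3, true);

    final String id;
    final String alias;
    final int fullPasses;
    final boolean forceSkipGetIdsDefault;

    DqaSketch(String id, String alias, int fullPasses, boolean forceSkipGetIdsDefault) {
        this.id = id;
        this.alias = alias;
        this.fullPasses = fullPasses;
        this.forceSkipGetIdsDefault = forceSkipGetIdsDefault;
    }

    /** Skipping the get-ids step saves exactly one pass (2 to 1, or 3 to 2). */
    int passes(boolean forceSkipGetIds) {
        return forceSkipGetIds ? fullPasses - 1 : fullPasses;
    }

    /** Resolves the value of the dqa request parameter; a missing value means the existing algorithm. */
    static DqaSketch fromParam(String value) {
        if (value == null || value.isEmpty()) {
            return FIND_ID_RELEVANCE_FETCH_BY_IDS;
        }
        for (DqaSketch dqa : values()) {
            if (dqa.id.equals(value) || dqa.alias.equals(value)) {
                return dqa;
            }
        }
        throw new IllegalArgumentException("Unknown dqa: " + value);
    }
}

class DqaSketchDemo {
    public static void main(String[] args) {
        for (String param : new String[] {"firfbi", "frfilrfbi"}) {
            DqaSketch dqa = DqaSketch.fromParam(param);
            System.out.println(dqa.id + ": " + dqa.passes(dqa.forceSkipGetIdsDefault)
                + " pass(es) with its default flag value");
        }
    }
}
```

A request would then select the new algorithm with something like 
{{q=something&rows=1000&dqa=frfilrfbi}}.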

Attaching patch corresponding to our solution - going into production as we 
speak to reduce our response-times by a factor of 60. You do not necessarily 
need to just adopt it. But let's at least consider it a starting-point for a 
discussion. Details about the patch:
* {{ShardParams.DQA}}: Enum of the DQAs, including different helper methods 
that IMHO belong here
* {{QueryComponent}}/{{ResponseBuilder}}: Changed to implement both DQAs now
* {{SolrIndexSearcher.doc}}: Does not go to store, if only asking for score. 
This is important for the optimization 
* {{TestIndexSearcher}}: Added a test to test this particular new aspect of 
{{SolrIndexSearcher}}
* {{TestDistributedQueryAlgorithm}}: A new test-class dedicated to tests of 
DQAs. The {{testDocReads}} test really shows exactly what this new DQA does for 
you. The test asserts that you only go to store X times across the cluster and 
not (up to) #shards * X times (X = rows in outer query)
* {{LeafReaderTestWrappers}}: Test-wrappers for {{LeafReader}} s. Can help 
collect information about how {{LeafReader}} s are used in different 
test-scenarios. Used by {{TestIndexSearcher}}. Can be extended with other kinds 
of wrappers that collect different kinds of information.
* {{SolrIndexSearcherTestWrapper}} and {{SolrCoreTestWrapper}}: Generic classes 
that can help wrap all {{LeafReader}} s under a {{SolrIndexSearcher}} or a 
{{SolrCore}} respectively. Used by {{TestDistributedQueryAlgorithm}}
* {{DistributedQueryComponentOptimizationTest}}: Updated with new tests around 
DQAs, and made more systematic in the way the tests are performed. I do not 
want to add hundreds of nearly identical code lines
* {{ShardRoutingTest}}: Same comments as for 
{{DistributedQueryComponentOptimizationTest}} above
* {{SolrTestCaseJ4}}: Randomly selects a DQA for each individual query fired 
while running the test-suite - when you do not specify which DQA you want 
explicitly in the request. Includes helper-methods for fixing the DQA for tests 
that focus on DQA testing
* Fix for SOLR-6812 is included in the patch because it is needed to keep the 
test-suite green. But it should probably be committed as part of SOLR-6812, and 
left out of this SOLR-6810. The new DQA 
({{find-relevance_find-ids-limited-rows_fetch-by-ids}}) has 
{{dqa.forceSkipGetIds}} (old {{distrib.singlePass}}) set to true by default. 
And since we run tests randomly selecting the DQA for every query, we are also 
indirectly randomizing {{dqa.forceSkipGetIds}}. Therefore the test-suite will 
likely fail if skip-get-ids does not work for all kinds of requests. This is 
actually also a good way to have {{dqa.forceSkipGetIds}} (old 
{{distrib.singlePass}}) tested, so that we will not have a partially-working 
feature (as before SOLR-6795/SOLR-6796/SOLR-6812/SOLR-6813). The tests added to 
{{DistributedQueryComponentOptimizationTest}} in SOLR-6795 and SOLR-6796 have 
been removed again, because the problems (along with any other problems with 
{{dqa.forceSkipGetIds}}) will now (potentially) be revealed anyway because of 
indirect randomized testing of {{dqa.forceSkipGetIds}}
* I do not have a solution to SOLR-6813, so I am temporarily making sure that 
it will not make the test-suite fail, by forcing the particular query in 
{{DistributedExpandComponentTest}} to use {{find-id-relevance_fetch-by-ids}} 
(making it use {{dqa.forceSkipGetIds=false}}) - the lines 
{{switchToOriginalDQADefaultProvider()}} and 
{{switchToTestDQADefaultProvider()}}. Those lines should be removed when 
SOLR-6813 has been resolved. It will also work with 
{{find-relevance_find-ids-limited-rows_fetch-by-ids}} and 
{{dqa.forceSkipGetIds=false}}, so it is not 
{{find-relevance_find-ids-limited-rows_fetch-by-ids}} that does not work. It is 
{{dqa.forceSkipGetIds=true}} that does not work.
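
To illustrate the store-read counts that {{testDocReads}} is described as 
asserting, here is a toy simulation. It is only a sketch of my reading of the 
two algorithm names, not code from the patch: {{StoreReadSimulation}} and its 
methods are hypothetical, shards are modelled as plain lists of scores, and one 
"store read" is counted per document whose stored fields would be accessed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Toy model counting stored-field reads per algorithm; an illustration only. */
class StoreReadSimulation {

    /** Builds one descending-sorted list of hit scores per shard. */
    static List<List<Double>> randomShards(int shards, int hitsPerShard, long seed) {
        Random rnd = new Random(seed);
        List<List<Double>> result = new ArrayList<>();
        for (int s = 0; s < shards; s++) {
            List<Double> scores = new ArrayList<>();
            for (int h = 0; h < hitsPerShard; h++) {
                scores.add(rnd.nextDouble());
            }
            scores.sort(Collections.reverseOrder());
            result.add(scores);
        }
        return result;
    }

    /** Existing algorithm without skip-get-ids: ids for every shard's top rows, then the global top rows. */
    static long storeReadsFindIdRelevance(List<List<Double>> shards, int rows) {
        long reads = 0;
        long totalHits = 0;
        for (List<Double> shard : shards) {
            reads += Math.min(rows, shard.size()); // pass 1: id read from store per candidate
            totalHits += shard.size();
        }
        reads += Math.min(rows, totalHits);        // pass 2: fetch-by-ids for the global top
        return reads;
    }

    /** New algorithm with skip-get-ids: scores first, then each shard fetches only its share. */
    static long storeReadsFindRelevanceLimitedRows(List<List<Double>> shards, int rows) {
        // Pass 1: shards return scores only, so no store access is counted here.
        List<double[]> candidates = new ArrayList<>(); // {score, shard index}
        for (int s = 0; s < shards.size(); s++) {
            List<Double> shard = shards.get(s);
            for (double score : shard.subList(0, Math.min(rows, shard.size()))) {
                candidates.add(new double[] {score, s});
            }
        }
        candidates.sort((a, b) -> Double.compare(b[0], a[0]));

        // The coordinator works out how many of the global top rows each shard holds.
        int[] limitedRows = new int[shards.size()];
        for (double[] candidate : candidates.subList(0, Math.min(rows, candidates.size()))) {
            limitedRows[(int) candidate[1]]++;
        }

        // Pass 2: each shard reads from store only for its limited number of rows.
        long reads = 0;
        for (int shardRows : limitedRows) {
            reads += shardRows;
        }
        return reads;
    }

    public static void main(String[] args) {
        List<List<Double>> shards = randomShards(50, 500, 42L);
        int rows = 100;
        System.out.println("find-id-relevance_fetch-by-ids: "
            + storeReadsFindIdRelevance(shards, rows) + " store reads");
        System.out.println("find-relevance_find-ids-limited-rows_fetch-by-ids: "
            + storeReadsFindRelevanceLimitedRows(shards, rows) + " store reads");
    }
}
```

In this model the existing algorithm's reads grow with #shards * rows, while 
the new one stays at rows regardless of the number of shards.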


was (Author: steff1193):
We have solved the problem (reducing response-time by a factor of 60 on our 
particular system/data/distribution) the following way

Introduced the concept of "distributed query algorithm" (DQA) controlled by 
request parameter {{dqa}}. Naming the existing (default) distributed query 
algorithm {{find-id-relevance_fetch-by-ids}} (short-alias {{firfbi}}) and 
introducing a new alternative distributed query algorithm called 
{{find-relevance_find-ids-limited-rows_fetch-by-ids}} (short-alias 
{{frfilrfbi}}) 
* {{find-id-relevance_fetch-by-ids}} does as always - see JavaDoc of 
{{ShardParams.FIND_ID_RELEVANCE_FETCH_BY_IDS}}
* {{find-relevance_find-ids-limited-rows_fetch-by-ids does}} it in a different 
way  - see JavaDoc of 
{{ShardParams.FIND_RELEVANCE_FIND_IDS_LIMITED_ROWS_FETCH_BY_IDS}}

Believe “distributed query algorithm” is a pendant to elasticsearch's “search 
type”, but just with much better naming that say something about what it is 
actually controlling :-)

Both DQAs support the {{disturb.singlePass}} flag. I have *renamed* it to 
{{dqa.forceSkipGetIds}} because it is only {{find-id-relevance_fetch-by-ids}} 
that becomes single-pass (going from 2 to 1 pass) with this flag. 
{{find-relevance_find-ids-limited-rows_fetch-by-ids}} goes from 3 to 2 passes. 
{{dqa.forceSkipGetIds=true}} is default for 
{{find-relevance_find-ids-limited-rows_fetch-by-ids}}. There are really no need 
to ever run with {{dqa.forceSkipGetIds=false}} for this DQA, but it is 
supported.

Attaching patch corresponding to our solution - going into production as we 
speak to reduce our response-times by a factor of 60. You do not necessarily 
need to just adopt it. But lets at least consider it a starting-point for a 
discussion. Details about the patch
* {{ShardParams.DQA}}: Enum of the DQA’s, including different helper methods 
that IMHO belongs here
* {{QueryComponent}}/{{ResponseBuilder}}: Changed to implement both DQA’s now
* {{SolrIndexSearcher.doc}}: Does not go to store, if only asking for score. 
This is important for the optimization 
* {{TestIndexSearcher}}: Added a test to test this particular new aspect of 
{{SolrIndexSearcher}}
* {{TestDistributedQueryAlgorithm}}: A new test-class dedicated tests of DQA’s. 
{{testDocReads}}-test really shows exactly what this new DQA does for you. Test 
asserts that you only go to store X times across the cluster and not (up to) 
#shards * X times (X = rows in outer query)
* {{LeafReaderTestWrappers}}: Test-wrappers for {{LeafReader}} s. Can help 
collecting information about how {{LeafReader}} s are used in different 
test-scenarios. Used by {{TestIndexSearcher}}. Can be extended with other kinds 
of wrappers that collect different kinds of information.
* {{SolrIndexSearcherTestWrapper}} and {{SolrCoreTestWrapper}}. Generic classes 
that can help wrapping all {{LeafReader}} s under a {{SolrIndexSearcher}} or a 
{{SolrCore}} respectively. Used by {{TestDistributedQueryAlgorithm}}
* {{DistributedQueryComponentOptimizationTest}}: Updated with new tests around 
DQA’s. And made more systematic in the way the tests are performed. Do not want 
to add hundreds of almost similar code-lines
* {{ShardRoutingTest}}: Same comments as for 
{{DistributedQueryComponentOptimizationTest}} above
* {{SolrTestCaseJ4}}: Randomly selecting a DQA for each individual query fired 
running the test-suite - when you do not specify which DQA you want explicitly 
in the request. With helper-methods for fixing the DQA for tests that focus on 
DQA testing
* Fix for SOLR-6812 is included in the patch because it is need to keep the 
test-suite green. But should probably be committed as part of SOLR-6812, and 
left out of this SOLR-6810. New DQA 
({{find-relevance_find-ids-limited-rows_fetch-by-ids}}) has 
{{dqa.forceSkipGetIds}} (old {{disturb.singlePass}}) set to true by default. 
And since we run tests randomly selecting the DQA for every query, we are also 
indirectly randoming {{dqa.forceSkipGetIds}}. Therefore the test-suite will 
likely fail if skip-get-ids does not work for all kinds of requests. This is 
actually also a good way to have {{dqa.forceSkipGetIds}} (old 
{{distrib.singlePass}}) tested, so that we will not have a partially-working 
feature (as before SOLR-6795/SOLR-6796/SOLR-6812/SOLR-6813). The tests added to 
{{DistributedQueryComponentOptimizationTest}} in SOLR-6795 and SOLR-6796 have 
been removed again, because the problems (along with any other problems with 
{{dqa.forceSkipGetIds}}) will now (potentially) be revealed anyway because of 
indirect randomized testing of {{dqa.forceSkipGetIds}}
* I do not have a solution to SOLR-6813, so temporarily making sure that it 
will not make the test-suite fail, by forcing the particular query in 
{{DistributedExpandComponentTest}} to use {{find-id-relevance_fetch-by-ids}} 
(making it use {{dqa.forceSkipGetIds=true}}) - the lines 
{{switchToOriginalDQADefaultProvider()}} and 
{{switchToTestDQADefaultProvider()}}. Those lines should be removed when 
SOLR-6813 has been resolved. It will also work with 
{{find-relevance_find-ids-limited-rows_fetch-by-ids}} and 
{{dqa.forceSkipGetIds=false}}, so it is not 
{{find-relevance_find-ids-limited-rows_fetch-by-ids}} that does not work. It is 
{{dqa.forceSkipGetIds=false}} that does not work.

> Faster searching limited but high rows across many shards all with many hits
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-6810
>                 URL: https://issues.apache.org/jira/browse/SOLR-6810
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Per Steffensen
>              Labels: distributed_search, performance
>         Attachments: branch_5x_rev1642874.patch
>
>
> Searching "limited but high rows across many shards all with many hits" is 
> slow
> E.g.
> * Query from outside client: q=something&rows=1000
> * Resulting in sub-requests to each shard something a-la this
> ** 1) q=something&rows=1000&fl=id,score
> ** 2) Request the full documents with ids in the global-top-1000 found among 
> the top-1000 from each shard
> What does the subject mean
> * "limited but high rows" means 1000 in the example above
> * "many shards" means 200-1000 in our case
> * "all with many hits" means that each of the shards has a significant 
> number of hits on the query
> The problem grows on all three factors above
> Doing such a query on our system takes between 5 minutes and 1 hour - 
> depending on a lot of things. It ought to be much faster, so let's make it so.
> Profiling shows that the problem is that it takes lots of time to access the 
> store to get ids for (up to) 1000 docs (value of rows parameter) per shard. 
> With 1000 shards, that is up to 1 million ids that have to be fetched. There 
> is really no good reason to ever read information from store for more than 
> the overall top-1000 documents that have to be returned to the client.
> For further detail see mail-thread "Slow searching limited but high rows 
> across many shards all with high hits" started 13/11-2014 on 
> dev@lucene.apache.org



