[
https://issues.apache.org/jira/browse/SOLR-15352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18029096#comment-18029096
]
David Smiley commented on SOLR-15352:
-------------------------------------
I suspect collection1 and collection2 each have only a single shard, so
querying an alias over the two turns a single-shard search into a
multi-shard search, which is a fundamentally different algorithm with multiple
phases of interaction with each shard. Solr calls this distributed search
internally. Furthermore, your collections are so small that the unavoidable
overhead of distributed search will be more pronounced here. Still,
that's quite a performance difference, and I'm not sure this hypothesis can
explain it completely. Maybe.
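To make the "multiple phases" point concrete, here is a toy Python model of a two-phase scatter-gather search. It is only a sketch under stated assumptions, not Solr's code: the shard names, documents, scores, and the function {{two_phase_search}} are all hypothetical, and real distributed search does more work than this (for example when faceting or highlighting is involved).

```python
import heapq

# Hypothetical in-memory "shards": doc id -> (score, stored fields).
SHARDS = {
    "collection1_shard1": {"a": (0.9, {"title": "A"}), "b": (0.4, {"title": "B"})},
    "collection2_shard1": {"c": (0.7, {"title": "C"}), "d": (0.2, {"title": "D"})},
}


def two_phase_search(shards, rows):
    """Return (results, request_count) for a simplified two-phase search."""
    requests = 0

    # Phase 1: every shard returns only ids and scores for its local top `rows`.
    candidates = []
    for name, docs in shards.items():
        requests += 1
        local_top = heapq.nlargest(rows, docs.items(), key=lambda kv: kv[1][0])
        candidates.extend((score, doc_id, name) for doc_id, (score, _fields) in local_top)

    # Merge: the coordinator picks the global top `rows` across all shards.
    winners = heapq.nlargest(rows, candidates)

    # Phase 2: fetch stored fields, but only from shards that own a winner.
    ids_by_shard = {}
    for _score, doc_id, name in winners:
        ids_by_shard.setdefault(name, []).append(doc_id)
    fields = {}
    for name, ids in ids_by_shard.items():
        requests += 1
        for doc_id in ids:
            fields[doc_id] = shards[name][doc_id][1]

    results = [(doc_id, score, fields[doc_id]) for score, doc_id, _name in winners]
    return results, requests


if __name__ == "__main__":
    results, requests = two_phase_search(SHARDS, rows=2)
    print(results, requests)
```

In this model a query over two shards costs up to four shard requests plus a merge step, whereas a query against one single-shard collection needs no fan-out at all. That fixed per-query cost is what dominates when the collections are tiny.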
I suggest adding {{distrib.singlePass=true}} as documented
[here|https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-distributed-requests.html#distrib-singlepass-parameter]
and seeing if this improves your results. With only 2 shards and no extras
like highlighting, I think it would be faster.
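For the benchmark, the parameter just needs to be appended to the request URL. A minimal Python sketch of building such a URL follows; the helper name {{build_select_url}}, the alias name {{myalias}}, and the {{*:*}} query are placeholders for illustration, not values taken from the report.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection_or_alias, query, single_pass=True):
    """Build a /select URL, optionally adding distrib.singlePass=true."""
    params = {"q": query}
    if single_pass:
        # Ask Solr to fetch the requested fields in the first distributed phase.
        params["distrib.singlePass"] = "true"
    return f"{base_url}/{collection_or_alias}/select?{urlencode(params)}"


if __name__ == "__main__":
    # "myalias" is a hypothetical alias over collection1 and collection2.
    print(build_select_url("http://localhost:8983/solr", "myalias", "*:*"))
```

Running the same JMeter plan against the URL with and without the parameter should show whether the second round trip accounts for a meaningful part of the drop.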
> Querying multiple collection performance issue
> ----------------------------------------------
>
> Key: SOLR-15352
> URL: https://issues.apache.org/jira/browse/SOLR-15352
> Project: Solr
> Issue Type: Bug
> Components: search
> Affects Versions: 8.8.1
> Environment: SolrCloud setup: 3 ZooKeeper servers and 2 SolrCloud
> nodes.
> Each Solr node is hosted on an AWS m5.xlarge EC2 instance with 8G RAM dedicated to the Solr JVM
> heap.
> In this environment each collection has one shard and 2 replicas.
> For benchmarking I used JMeter, with thread group=50 and loop
> count=500.
>
> Reporter: Mazen Raafat
> Priority: Critical
> Attachments: querying alias points to collection 1 and collection
> 2.png, querying collection 1.png, querying collection 2.png, thread group.png
>
>
> Performance degrades when querying multiple collections, either through an alias that
> points to multiple collections or by calling the search handler directly with
> the collection query param, as follows:
> {{http://localhost:8983/solr/collection1/select?collection=collection1,collection2,collection3}}
>
> In the first test I queried a collection with about 40k docs; the
> throughput was ~3k req/sec.
> In the second test I queried another collection with about 4k docs; the
> throughput was ~3.5k req/sec.
> In the third test I queried an alias that points to both collections and
> voilà! The throughput dropped to ~200 req/sec!
> I also tried skipping the alias and using
> solrurl/solr/collection1/select?collections=collection2 and got the same
> result.
>
> notes:
> # both collections have the same schema
> # query and filter query are the same in all tests