[jira] [Commented] (SOLR-15138) PerReplicaStates does not scale to large collections as well as state.json

Mike Drob (Jira) Mon, 15 Feb 2021 15:06:06 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284973#comment-17284973
 ]


Mike Drob commented on SOLR-15138:
----------------------------------

I've done some performance testing on master. Here are comparisons on the same 
4-node cluster after a sufficient warm-up period.

 

Small collections - 10 clients creating 10 collections in parallel, each 
collection 2x2
|| ||Default||PRS||
|Median|4806ms|4779ms|
|95%|6057ms|5434ms|

Large collections - 1 client creating 10x10 sized collections

 
|| ||Default||PRS||
|Median|8972ms|13956ms|
|95%|10294ms|17393ms|
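
For anyone who wants to reproduce measurements of this shape, here is a minimal sketch of a timing harness. It is not the harness used for the numbers above. It assumes each figure is a per-request latency for a Collections API CREATE call, that `perReplicaState=true` is the CREATE parameter that turns PRS on, and that the 95% row is a nearest-rank percentile. The base URL, the `bench_` collection names, and all function names are hypothetical.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen


def create_url(base, name, shards, replicas, prs):
    """Build a Collections API CREATE URL. `base` is a hypothetical Solr address."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": shards,
        "replicationFactor": replicas,
        "perReplicaState": str(prs).lower(),  # assumed switch for PRS
    }
    return f"{base}/admin/collections?{urlencode(params)}"


def http_create(base, shards, replicas, prs):
    """Return a callable that creates one collection over HTTP (not run here)."""
    return lambda name: urlopen(create_url(base, name, shards, replicas, prs)).read()


def timed_ms(fn):
    """Wall-clock duration of fn() in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


def run_clients(create, clients, per_client):
    """Run `clients` parallel workers, each creating `per_client` collections serially."""
    def client(cid):
        return [
            timed_ms(lambda n=f"bench_{cid}_{i}": create(n))
            for i in range(per_client)
        ]

    with ThreadPoolExecutor(max_workers=clients) as pool:
        batches = list(pool.map(client, range(clients)))
    return [ms for batch in batches for ms in batch]


def summarize(latencies_ms):
    """Return (median, 95th percentile) using the nearest-rank method."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return statistics.median(ordered), ordered[rank - 1]
```

Under those assumptions, the small-collection case would be `summarize(run_clients(http_create(base, 2, 2, prs), 10, 10))` and the large-collection case `summarize(run_clients(http_create(base, 10, 10, prs), 1, n))` for some number of collections `n`, run once with `prs=False` and once with `prs=True`.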

 

I didn't have a chance to take heap measurements or anything much more detailed 
here, but my observations are that PRS is _slightly_ better for lots of small 
collections (about 10% faster on the tails) and a fair bit worse for large 
collections overall, maybe 50-70% slower.
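
The percentages quoted here can be checked directly against the two 4-node tables. A short sketch of that arithmetic follows. The function name and dictionary labels are made up for illustration, and the inputs are the table values copied as-is.

```python
def pct_slower(default_ms, prs_ms):
    """Percent by which PRS is slower than Default; negative means PRS is faster."""
    return (prs_ms - default_ms) / default_ms * 100.0


# (Default, PRS) timings in milliseconds from the 4-node tables
results = {
    "small median": (4806, 4779),
    "small 95%": (6057, 5434),
    "large median": (8972, 13956),
    "large 95%": (10294, 17393),
}

for label, (default_ms, prs_ms) in results.items():
    print(f"{label}: {pct_slower(default_ms, prs_ms):+.1f}%")
```

The small-collection tail comes out a little over 10% lower with PRS, and the large-collection rows come out roughly 56% and 69% higher, which is consistent with the "about 10%" and "50-70%" estimates.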

 

I then sized my cluster up to 8 nodes. The small collection test performed 
almost identically (not sure if this is a good sign or not). The large 
collection test gave me these numbers:
|| ||Default||PRS||
|Median|8882ms|9016ms|
|95%|8903ms|9246ms|

The slowest requests have a lot less impact on the larger cluster, but PRS 
still does worse than the consolidated state.json implementation, by roughly 
1.5% at the median and 4% at the 95th percentile.

 

 

Note that this was with the Distributed Overseer change in the code, but I did 
not have that enabled for this cluster.

> PerReplicaStates does not scale to large collections as well as state.json
> --------------------------------------------------------------------------
>
>                 Key: SOLR-15138
>                 URL: https://issues.apache.org/jira/browse/SOLR-15138
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 8.8
>            Reporter: Mike Drob
>            Assignee: Noble Paul
>            Priority: Major
>             Fix For: 8.9
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> I was testing PRS collection creation with larger collections today 
> (previously I had tested with many small collections) and it seemed to be 
> having trouble keeping up.
>  
> I was running a 4 node instance, each JVM with 4G Heap in k8s, and a single 
> zookeeper.
>  
> With this cluster configuration, I am able to create several (at least 10) 
> collections with 11 shards and 11 replicas using the "old way" of keeping 
> state. These collections are created serially, waiting for all replicas to be 
> active before proceeding.
> However, when attempting to do the same with PRS, the creation stalls on 
> collection 2 or 3, with several replicas stuck in a "down" state. Further, 
> when attempting to delete these collections using the regular API it 
> sometimes takes several attempts after getting stuck a few times as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
