[
https://issues.apache.org/jira/browse/SOLR-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106729#comment-16106729
]
Noble Paul commented on SOLR-10265:
-----------------------------------
bq.Would a watcher have to be set for every shard of every collection hosted on
a particular Solr if we split it up more granularly?
yes. I don't think it is a huge problem
Let's do a back-of-the-envelope calculation
let's split our 100B doc index to 1000 shards (100m docs/shard) and 5 replicas
. that is just 5000 cores. This leads to having 5000 watchers if no node has
multiple replicas of same shard. IIRC zookeeper really has no issues with 5000
watchers
> Overseer can become the bottleneck in very large clusters
> ---------------------------------------------------------
>
> Key: SOLR-10265
> URL: https://issues.apache.org/jira/browse/SOLR-10265
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Varun Thacker
>
> Let's say we have a large cluster. Some numbers:
> - To ingest the data at the volume we want to I need roughly a 600 shard
> collection.
> - Index into the collection for 1 hour and then create a new collection
> - For a 30 days retention window with these numbers we would end up wth
> ~400k cores in the cluster
> - Just a rolling restart of this cluster can take hours because the overseer
> queue gets backed up. If a few nodes looses connectivity to ZooKeeper then
> also we can end up with lots of messages in the Overseer queue
> With some tests here are the two high level problems we have identified:
> 1> How fast can the overseer process operations:
> The rate at which the overseer processes events is too slow at this scale.
> I ran {{OverseerTest#testPerformance}} which creates 10 collections ( 1 shard
> 1 replica ) and generates 20k state change events. The test took 119 seconds
> to run on my machine which means ~170 events a second. Let's say a server can
> process 5x of my machine so 1k events a second.
> Total events generated by a 400k replica cluster = 400k * 4 ( state changes
> till replica become active ) = 1.6M / 1k events a second will be 1600 minutes.
> Second observation was that the rate at which the overseer can process events
> slows down when the number of items in the queue gets larger
> I ran the same {{OverseerTest#testPerformance}} but changed the number of
> events generated to 2000 instead. The test took only 5 seconds to run. So it
> was a lot faster than the test run which generated 20k events
> 2> State changes overwhelming ZK:
> For every state change Solr is writing out a big state.json to zookeeper.
> This can lead to the zookeeper transaction logs going out of control even
> with auto purging etc set .
> I haven't debugged why the transaction logs ran into terabytes without taking
> into snapshots but this was my assumption based on the other problems we
> observed
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]