[
https://issues.apache.org/jira/browse/SOLR-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803670#comment-13803670
]
Noble Paul commented on SOLR-5381:
----------------------------------
bq. We also are not constantly working with large files - in a steady state we
don't pull or push large files at all to ZK - it's only on a cluster state change
In a big enough cluster you can expect a state change event every few seconds,
so it is not ideal to update the state on every node each time.
bq. If each collection had its own clusterstate.json, maybe migrating a
collection to another cluster would be easier,
Yes, it is low-hanging fruit, and probably easier to implement than separating
out shards.
bq. but it was very expensive it turned out, because of having to do so many
calls to load the state
That would be the wrong approach. A node does not need to be aware of every
other node in the cluster. It only needs to be aware of the shards it is a
member of, and it does not have to load the state of other shards. The only
time a node needs to know the state of another shard is when it has to forward
a request. That information can be looked up on demand and cached. Each request
would state that it is for a given collection/shard/range. If that assumption
is wrong, the receiving node would throw an appropriate exception, and the
sender can then invalidate its cache and refresh the state.
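The lookup-and-invalidate flow described above could look roughly like the
following. This is only a minimal sketch, not Solr code: RoutingCache,
WrongShardError, forward, and the lookup/send callbacks are hypothetical names
made up for illustration, and the range part of collection/shard/range is left
out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

ShardKey = Tuple[str, str]  # (collection, shard); range omitted in this sketch


class WrongShardError(Exception):
    """Hypothetical error raised by a node that does not host the named shard."""


@dataclass
class RoutingCache:
    # lookup is an assumed callback that reads shard state on demand,
    # for example from a per-shard node in ZK.
    lookup: Callable[[str, str], str]
    _cache: Dict[ShardKey, str] = field(default_factory=dict)

    def node_for(self, collection: str, shard: str) -> str:
        key = (collection, shard)
        if key not in self._cache:
            # Cache miss: load the state of this one shard only.
            self._cache[key] = self.lookup(collection, shard)
        return self._cache[key]

    def invalidate(self, collection: str, shard: str) -> None:
        self._cache.pop((collection, shard), None)


def forward(cache, send, collection, shard, request):
    """Send to the cached node; on a wrong-node error, refresh once and retry."""
    node = cache.node_for(collection, shard)
    try:
        return send(node, collection, shard, request)
    except WrongShardError:
        # Our assumption about the shard's location was stale.
        cache.invalidate(collection, shard)
        node = cache.node_for(collection, shard)
        return send(node, collection, shard, request)
```

In this sketch the sender pays for a state lookup only on the first request to
a shard and again after a wrong-node error. It retries once, so a second
failure is propagated to the caller.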
As I see it, a SolrCloud cluster is a cluster of shards, and a shard is the
logical unit. Nobody should need to watch other shards in real time. In a very
large cluster, requests would rarely span shards, because the data would be
partitioned in such a way that queries/updates are contained within a single
shard.
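To illustrate the difference in notification fan-out, here is a small
in-memory sketch. It does not talk to ZK. WatchRegistry and the per-shard path
/collections/<collection>/shards/<shard>/state are assumptions made up for this
example, not an agreed layout.

```python
from collections import defaultdict


class WatchRegistry:
    """In-memory stand-in for ZK watches: path -> set of watching nodes."""

    def __init__(self):
        self._watchers = defaultdict(set)

    def watch(self, path, node):
        self._watchers[path].add(node)

    def notified_on_change(self, path):
        # Nodes that would be notified if the data at this path changed.
        return set(self._watchers.get(path, set()))


def single_file_layout(membership):
    """Every node watches the one shared clusterstate.json."""
    reg = WatchRegistry()
    for nodes in membership.values():
        for node in nodes:
            reg.watch("/clusterstate.json", node)
    return reg


def per_shard_layout(membership):
    """Each node watches only the state of the shards it is a member of."""
    reg = WatchRegistry()
    for (collection, shard), nodes in membership.items():
        for node in nodes:
            # Hypothetical path, for illustration only.
            reg.watch(f"/collections/{collection}/shards/{shard}/state", node)
    return reg
```

With membership given as a dict from (collection, shard) to a list of node
names, a change under the single-file layout reaches every node, while a
change to one shard under the per-shard layout reaches only the members of
that shard.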
> Split Clusterstate and scale
> -----------------------------
>
> Key: SOLR-5381
> URL: https://issues.apache.org/jira/browse/SOLR-5381
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Reporter: Noble Paul
> Assignee: Noble Paul
> Original Estimate: 2,016h
> Remaining Estimate: 2,016h
>
> clusterstate.json is a single point of contention for all components in
> SolrCloud. It would be hard to scale SolrCloud beyond a few thousand nodes
> because there are too many updates and too many nodes need to be notified of
> the changes. As the number of nodes goes up, the size of clusterstate.json
> keeps growing, and it will soon exceed the limit imposed by ZK.
> The first step is to store the shard information in separate ZK nodes, so
> that each Solr node can listen only to the shard node it belongs to. We may
> also need to split each collection into its own node, with clusterstate.json
> holding just the names of the collections.
> This is an umbrella issue.