[ 
https://issues.apache.org/jira/browse/SOLR-12993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784877#comment-16784877
 ] 

Gus Heck commented on SOLR-12993:
---------------------------------

In general I wonder about our tendency to store configuration as blobs of JSON 
in Zookeeper which is inherently tree based itself (and more flexible, and more 
powerful than JSON). We have lots of code that is packing and unpacking these 
JSON files and then effectively accessing a path within the JSON... whereas we 
could just access a path in Zookeeper directly and not bother with any JSON 
parsing/encoding. And if we wanted to watch something within the structure it 
wouldn't have to be conflated with other code that wanted to watch something 
else...
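
To make the comparison concrete, here is a minimal Python sketch of the two access styles. It is only an illustration: the "znodes" are a plain in-memory dict rather than a real ZooKeeper client, the helper name flatten_to_paths is made up for this example, and the one-znode-per-leaf layout is an assumption, not an existing Solr layout.

```python
import json


def flatten_to_paths(node, prefix=""):
    """Flatten nested dicts into {path: leaf_value}.

    Each entry stands in for one hypothetical znode holding a single value.
    """
    paths = {}
    for key, value in node.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            paths.update(flatten_to_paths(value, path))
        else:
            paths[path] = value
    return paths


# Trimmed-down state in the shape of the state.json quoted below.
blob = json.dumps({
    "gettingstarted": {
        "shards": {
            "shard1": {
                "range": "80000000-ffffffff",
                "replicas": {
                    "core_node3": {"state": "active", "leader": "true"},
                    "core_node5": {"state": "active"},
                },
            }
        }
    }
})

# Current style: fetch the whole blob, parse it, then walk a path inside it.
state = json.loads(blob)
via_json = state["gettingstarted"]["shards"]["shard1"]["replicas"]["core_node3"]["state"]

# Alternative: every leaf is its own node, addressed directly by its path.
znodes = flatten_to_paths(state)
via_path = znodes["/gettingstarted/shards/shard1/replicas/core_node3/state"]

print(via_json, via_path)
```

With a layout like this, a watcher interested in one replica's state would only need to watch that one path instead of the whole blob.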

> Split the state.json into 2. a small frequently modified data + a large 
> unmodified data
> ---------------------------------------------------------------------------------------
>
>                 Key: SOLR-12993
>                 URL: https://issues.apache.org/jira/browse/SOLR-12993
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Noble Paul
>            Priority: Major
>
> This is just a proposal to minimize the ZK load and improve scalability of 
> very large clusters.
> Every time a small state change occurs for a collection/replica, the following 
> file needs to be updated once and then read n times (where n = number of 
> replicas for this collection). The proposal is to split the main file into 2.
> {code}
> {"gettingstarted":{
>     "pullReplicas":"0",
>     "replicationFactor":"2",
>     "router":{"name":"compositeId"},
>     "maxShardsPerNode":"-1",
>     "autoAddReplicas":"false",
>     "nrtReplicas":"2",
>     "tlogReplicas":"0",
>     "shards":{
>       "shard1":{
>         "range":"80000000-ffffffff",
>       
>         "replicas":{
>           "core_node3":{
>             "core":"gettingstarted_shard1_replica_n1",
>             "base_url":"http://10.0.0.80:8983/solr",
>             "node_name":"10.0.0.80:8983_solr",
>             "state":"active",
>             "type":"NRT",
>             "force_set_state":"false",
>             "leader":"true"},
>           "core_node5":{
>             "core":"gettingstarted_shard1_replica_n2",
>             "base_url":"http://10.0.0.80:7574/solr",
>             "node_name":"10.0.0.80:7574_solr",
>          
>             "type":"NRT",
>             "force_set_state":"false"}}},
>       "shard2":{
>         "range":"0-7fffffff",
>         "state":"active",
>         "replicas":{
>           "core_node7":{
>             "core":"gettingstarted_shard2_replica_n4",
>             "base_url":"http://10.0.0.80:7574/solr",
>             "node_name":"10.0.0.80:7574_solr",
>            
>             "type":"NRT",
>             "force_set_state":"false"},
>           "core_node8":{
>             "core":"gettingstarted_shard2_replica_n6",
>             "base_url":"http://10.0.0.80:8983/solr",
>             "node_name":"10.0.0.80:8983_solr",
>          
>             "type":"NRT",
>             "force_set_state":"false",
>             "leader":"true"}}}}}}
> {code}
> another file {{status.json}} which is frequently updated and small.
> {code}
> {
>     "shard1": {
>       "state": "ACTIVE",
>       "core_node3": {"state": "active", "leader" : true},
>       "core_node5": {"state": "active"}
>     },
>     "shard2": {
>       "state": "active",
>       "core_node7": {"state": "active"},
>       "core_node8": {"state": "active", "leader" : true}}
>   }
> {code}
> Here the size of the file is roughly one tenth of the other file. This leads 
> to a dramatic reduction in the amount of data written/read to/from ZK.
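
The split described in the quoted proposal can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the issue: the function name split_state and the VOLATILE_KEYS set are hypothetical, and treating "state" and "leader" as the only frequently changing fields is inferred from the two example files above.

```python
import json

# Assumption: these are the only fields that change frequently.
VOLATILE_KEYS = {"state", "leader"}


def split_state(collection):
    """Split one collection's state into (structure, status).

    structure keeps the rarely modified fields; status keeps only the
    volatile fields, keyed by shard and then by replica name.
    """
    structure = json.loads(json.dumps(collection))  # deep copy, input untouched
    status = {}
    for shard_name, shard in structure["shards"].items():
        shard_status = {}
        if "state" in shard:
            shard_status["state"] = shard.pop("state")
        for replica_name, replica in shard["replicas"].items():
            # list(replica) snapshots the keys so popping is safe.
            shard_status[replica_name] = {
                k: replica.pop(k) for k in list(replica) if k in VOLATILE_KEYS
            }
        status[shard_name] = shard_status
    return structure, status


# Sample input in the shape of the state.json quoted above (one shard only).
collection = {
    "replicationFactor": "2",
    "shards": {
        "shard1": {
            "range": "80000000-ffffffff",
            "state": "active",
            "replicas": {
                "core_node3": {
                    "core": "gettingstarted_shard1_replica_n1",
                    "base_url": "http://10.0.0.80:8983/solr",
                    "node_name": "10.0.0.80:8983_solr",
                    "state": "active",
                    "type": "NRT",
                    "leader": "true",
                },
                "core_node5": {
                    "core": "gettingstarted_shard1_replica_n2",
                    "base_url": "http://10.0.0.80:7574/solr",
                    "node_name": "10.0.0.80:7574_solr",
                    "state": "active",
                    "type": "NRT",
                },
            },
        }
    },
}

structure, status = split_state(collection)
# Compare the serialized sizes of the two parts.
print(len(json.dumps(structure)), len(json.dumps(status)))
```

Only the status part would need to be rewritten and re-read on a replica state change; the actual size ratio depends on how many static fields each replica carries.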



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

