[ 
https://issues.apache.org/jira/browse/SOLR-16472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patson Luk updated SOLR-16472:
------------------------------
    Description: 
h2. Description

The existing `CreateCollectionCmd` updates 
[`state.json` once per new 
replica|https://github.com/apache/solr/blob/c80cf5b03ac1b1307df65ebe37f6ffdb26611013/solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java#L334].
 This can create many unnecessary writes, because only the final state.json 
containing all the replicas matters.

Note that this is also a problem for non-PRS collections, though probably a 
less severe one. Non-PRS creation performs a very similar operation per replica 
by sending the update to the Overseer (OS) queue with 
`ccc.offerStateUpdate(Utils.toJSON(props));`, but the Overseer uses the 
`ClusterStateUpdater`, which uses `ZkStateWriter#enqueueUpdate` and likely 
batches state.json updates (writing only the latest entry every 2 seconds by 
default). For PRS, we update directly, so each call translates to an actual 
call to ZK.
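
To illustrate why batching softens the problem for non-PRS collections, here is a minimal sketch of a coalescing writer. It is not Solr's `ZkStateWriter`; the class `CoalescingWriterSketch` and its methods are hypothetical, and the periodic timer is replaced by an explicit `flush()` call so the example stays deterministic.

```java
/** Hypothetical coalescing writer for illustration only; not Solr's ZkStateWriter. */
final class CoalescingWriterSketch {
    private String pending;      // latest enqueued state, or null if nothing is pending
    private String lastWritten;  // payload of the most recent simulated write
    private int writes = 0;      // number of simulated ZK writes

    /** Remember only the newest state; earlier pending states are overwritten. */
    void enqueue(String state) {
        pending = state;
    }

    /** Stand-in for the periodic flush: performs at most one write per call. */
    void flush() {
        if (pending == null) {
            return;
        }
        lastWritten = pending;
        pending = null;
        writes++;
    }

    int writes() {
        return writes;
    }

    String lastWritten() {
        return lastWritten;
    }

    public static void main(String[] args) {
        CoalescingWriterSketch writer = new CoalescingWriterSketch();
        // Five per-replica updates arrive between two flushes.
        for (int i = 1; i <= 5; i++) {
            writer.enqueue("replicas=" + i);
        }
        writer.flush();
        System.out.println(writer.writes() + " write(s), last=" + writer.lastWritten());
    }
}
```

In this sketch, five enqueued updates collapse into a single write of the latest state, whereas a direct-update path would issue one write per call.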
h2. Proposal

We will move the `ZkClient#setData` statement out of the replica loop and call 
it only once, outside the loop.
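
The change can be sketched as follows. This is a simplified model under stated assumptions, not the actual `CreateCollectionCmd` code: `CountingZkClient` is a hypothetical stand-in that counts writes instead of talking to ZooKeeper, the path is illustrative, and the state is modelled as a plain list of replica names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a ZK client; counts writes instead of contacting ZooKeeper. */
final class CountingZkClient {
    private int writes = 0;
    private String data = "";

    void setData(String path, String newData) {
        writes++;
        data = newData;
    }

    int writes() {
        return writes;
    }

    String data() {
        return data;
    }
}

final class StateJsonWriteSketch {
    // Illustrative path only.
    private static final String PATH = "/collections/demo/state.json";

    /** Behaviour described in the issue: one write per replica added. */
    static void writePerReplica(CountingZkClient zk, List<String> replicas) {
        List<String> state = new ArrayList<>();
        for (String replica : replicas) {
            state.add(replica);
            zk.setData(PATH, state.toString()); // write inside the loop
        }
    }

    /** Proposed behaviour: build the full state first, then write once. */
    static void writeOnce(CountingZkClient zk, List<String> replicas) {
        List<String> state = new ArrayList<>();
        for (String replica : replicas) {
            state.add(replica);
        }
        zk.setData(PATH, state.toString()); // single write outside the loop
    }

    public static void main(String[] args) {
        List<String> replicas = List.of("core_node1", "core_node2", "core_node3");

        CountingZkClient before = new CountingZkClient();
        writePerReplica(before, replicas);

        CountingZkClient after = new CountingZkClient();
        writeOnce(after, replicas);

        System.out.println("per-replica writes: " + before.writes());
        System.out.println("single-write writes: " + after.writes());
        System.out.println("same final state: " + before.data().equals(after.data()));
    }
}
```

Both variants end with the same final payload; only the number of writes differs, growing with the replica count in the first variant and staying at one in the second.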

 

> Remove redundant state.json update For PRS collection creation
> --------------------------------------------------------------
>
>                 Key: SOLR-16472
>                 URL: https://issues.apache.org/jira/browse/SOLR-16472
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 9.0
>            Reporter: Patson Luk
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
