[
https://issues.apache.org/jira/browse/SOLR-16472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patson Luk updated SOLR-16472:
------------------------------
Description:
h2. Description
The existing `CreateCollectionCmd` updates
[`state.json` once per new
replica|https://github.com/apache/solr/blob/c80cf5b03ac1b1307df65ebe37f6ffdb26611013/solr/core/src/java/org/apache/solr/cloud/api/collections/CreateCollectionCmd.java#L334].
This can cause many unnecessary writes, since only the final
state.json containing all the replicas matters.
Note that non-PRS collections have the same issue, but it is probably less
severe there. The non-PRS path performs a very similar operation per replica by
sending the update to the Overseer (OS) queue via
`ccc.offerStateUpdate(Utils.toJSON(props));`, but the Overseer then uses the
`ClusterStateUpdater`, which calls `ZkStateWriter#enqueueUpdate` and likely
batches state.json updates (writing only the latest entry every 2 seconds by
default). For PRS, we update directly, so each call translates to an
actual call to ZK.
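
To illustrate why batching softens the problem on the non-PRS path, here is a minimal sketch of a coalescing writer. It is not Solr code: `FakeStore`, `BatchingWriter` and `flush` are hypothetical names invented for this example, and the time-based trigger (every 2 seconds by default, per the description) is replaced by an explicit `flush()` call to keep the sketch deterministic.

```java
import java.util.ArrayList;
import java.util.List;

public class CoalescingWriterSketch {
    /** Hypothetical stand-in for ZooKeeper: records every payload actually written. */
    public static final class FakeStore {
        public final List<String> writes = new ArrayList<>();

        void setData(String payload) {
            writes.add(payload);
        }
    }

    /** Hypothetical batching writer: keeps only the latest pending state until flushed. */
    public static final class BatchingWriter {
        private final FakeStore store;
        private String pending; // latest enqueued state, or null if nothing is pending

        BatchingWriter(FakeStore store) {
            this.store = store;
        }

        /** Overwrites any earlier pending state; nothing is written yet. */
        void enqueueUpdate(String state) {
            pending = state;
        }

        /** Writes the latest pending state, if any. Stands in for the periodic flush. */
        void flush() {
            if (pending != null) {
                store.setData(pending);
                pending = null;
            }
        }
    }

    /** Enqueues one update per replica, flushes once, and returns the store for inspection. */
    public static FakeStore createWithBatching(int replicaCount) {
        FakeStore store = new FakeStore();
        BatchingWriter writer = new BatchingWriter(store);
        for (int i = 1; i <= replicaCount; i++) {
            writer.enqueueUpdate("replicas=" + i);
        }
        writer.flush();
        return store;
    }

    public static void main(String[] args) {
        FakeStore store = createWithBatching(6);
        System.out.println("writes issued: " + store.writes.size());
        System.out.println("last payload: " + store.writes.get(store.writes.size() - 1));
    }
}
```

In this sketch, six enqueued updates collapse into a single write carrying the last state, which is the behavior the description attributes to the batched path.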
h2. Proposal
We will move the `ZkClient#setData` statement out of the replica loop and
call it only once, after the loop completes.
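
The proposed change can be sketched as follows. This is an illustration under stated assumptions, not the actual `CreateCollectionCmd` code: `CountingZk` is a hypothetical stand-in for the ZK client, the path `/collections/demo/state.json` and the comma-joined payload are made up for the example, and the real state is JSON rather than a joined string.

```java
import java.util.ArrayList;
import java.util.List;

public class HoistedStateWrite {
    /** Hypothetical stand-in for a ZK client; counts writes and keeps the last payload. */
    public static final class CountingZk {
        public int setDataCalls = 0;
        public String data = "";

        void setData(String path, String payload) {
            setDataCalls++;
            data = payload;
        }
    }

    // Illustrative path only.
    private static final String STATE_PATH = "/collections/demo/state.json";

    /** Current behavior as described: one write per replica added. */
    public static CountingZk writePerReplica(List<String> replicas) {
        CountingZk zk = new CountingZk();
        List<String> state = new ArrayList<>();
        for (String replica : replicas) {
            state.add(replica);
            zk.setData(STATE_PATH, String.join(",", state)); // write inside the loop
        }
        return zk;
    }

    /** Proposed behavior: accumulate in memory, then write once after the loop. */
    public static CountingZk writeOnce(List<String> replicas) {
        CountingZk zk = new CountingZk();
        List<String> state = new ArrayList<>();
        for (String replica : replicas) {
            state.add(replica);
        }
        zk.setData(STATE_PATH, String.join(",", state)); // single write outside the loop
        return zk;
    }

    public static void main(String[] args) {
        List<String> replicas = List.of("r1", "r2", "r3");
        CountingZk before = writePerReplica(replicas);
        CountingZk after = writeOnce(replicas);
        System.out.println("per-replica writes: " + before.setDataCalls);
        System.out.println("hoisted writes: " + after.setDataCalls);
        System.out.println("same final state: " + before.data.equals(after.data));
    }
}
```

Both variants end with the same final payload; only the number of writes differs, which is the saving the proposal is after.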
> Remove redundant state.json update For PRS collection creation
> --------------------------------------------------------------
>
> Key: SOLR-16472
> URL: https://issues.apache.org/jira/browse/SOLR-16472
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: 9.0
> Reporter: Patson Luk
> Priority: Minor