[ https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253820#comment-17253820 ]

Noble Paul commented on SOLR-15052:
-----------------------------------

{quote}Then the {{R5}} update is also going to read the directory listing and 
execute.
{quote}
{{R5}} would have gotten a callback and would have updated the per-replica 
states anyway. So all we are doing is an extra {{stat}} read, which is 
extremely cheap.
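For illustration, a minimal sketch (not the PR's code; the helper and the 
cached-version bookkeeping are hypothetical) of why that check is a single 
cheap round trip with the plain ZooKeeper Java client:
{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Hypothetical helper: a bare stat read on state.json. Comparing the
// children version (cversion) against a cached value tells us whether the
// per-replica child znodes changed, without fetching any data or children.
class StatCheck {
  static boolean childrenChanged(ZooKeeper zk, String stateJson, int cachedCversion)
      throws KeeperException, InterruptedException {
    Stat stat = zk.exists(stateJson, false); // no watch, one round trip
    return stat == null || stat.getCversion() != cachedCversion;
  }
}
{code}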
{quote}With 500 children znodes, getChildren took on my laptop about 10-15ms 
while getData on a single file with equivalent amount of text took longer at 
~20ms. This came as a surprise to me.
{quote}
Reads are not such a big deal, and even writes are not. But CAS writes are a 
big deal: we would like to minimize contention while doing them.
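To make the contention concrete, here is a rough sketch (not Solr code) of a 
CAS write against one shared znode with the ZooKeeper Java client; every 
loser of the race must re-read and retry, so retries multiply as writers are 
added:
{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Hypothetical sketch: all writers compete on a single znode's version.
class CasWrite {
  static void casUpdate(ZooKeeper zk, String path, byte[] newData)
      throws KeeperException, InterruptedException {
    while (true) {
      Stat stat = new Stat();
      zk.getData(path, false, stat);   // read current data and version
      // (real code would recompute newData from the bytes just read)
      try {
        zk.setData(path, newData, stat.getVersion()); // succeeds only if
        return;                                       // no one wrote since
      } catch (KeeperException.BadVersionException e) {
        // a concurrent writer won; loop, re-read, and try again
      }
    }
  }
}
{code}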
{quote}The multi operation (delete znode, create znode) took about 40ms while 
the CAS of the text file was faster at 30ms,
{quote}
CAS in itself is not slow. But as the number of parallel writes grows, 
performance degrades dramatically: with thousands of replicas trying to 
update using CAS, it is going to be unacceptably slow. Whereas the {{multi}} 
approach on individual nodes will perform the same whether we have 2 replicas 
or 20,000.
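A rough sketch of that {{multi}} approach (the helper and paths are 
hypothetical): each replica deletes and creates only its own child znodes in 
one atomic operation, so writers never contend on a shared version:
{code:java}
import java.util.Arrays;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical sketch: atomically swap this replica's own state znode.
class MultiStateUpdate {
  static void updateReplicaState(ZooKeeper zk, String stateJson,
                                 String oldChild, String newChild)
      throws KeeperException, InterruptedException {
    zk.multi(Arrays.asList(
        Op.delete(stateJson + "/" + oldChild, -1),     // -1 = any version
        Op.create(stateJson + "/" + newChild, new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)));
  }
}
{code}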
{quote}The implementation in the PR could easily avoid systematically 
re-reading the znode children list by attempting the multi operation on the 
cached PerReplicaStates of the DocCollection
{quote}
It already uses the cached data. Yes, it does an extra version check, but 
that's cheap.

> Reducing overseer bottlenecks using per-replica states
> ------------------------------------------------------
>
>                 Key: SOLR-15052
>                 URL: https://issues.apache.org/jira/browse/SOLR-15052
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Ishan Chattopadhyaya
>            Priority: Major
>         Attachments: per-replica-states-gcp.pdf
>
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951, that is, to reduce overseer 
> bottlenecks by keeping replica state updates from going through the 
> overseer to state.json. However, the approach taken here is different from 
> SOLR-13951, and hence this work supersedes it.
> The design proposed is here: 
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly,
> # Every replica's state will be in a separate znode nested under 
> state.json. Its name encodes the replica name, state, and leadership 
> status.
> # An additional children watcher is set on state.json to pick up state 
> changes.
> # Upon a state change, a ZK multi-op deletes the previous znode and adds a 
> new znode with the new state (see the sketch below).
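> As a rough illustration of steps 1-3 (the "name:state:leader" child-name 
> encoding below is made up for the example, not the PR's actual encoding):
> {code:java}
> import java.util.List;
> import org.apache.zookeeper.WatchedEvent;
> import org.apache.zookeeper.ZooKeeper;
> 
> // Hypothetical sketch: a children watcher on state.json; each child
> // znode's name encodes the replica's name, state, and leadership.
> class PerReplicaWatch {
>   static void readStates(ZooKeeper zk, String stateJson) throws Exception {
>     List<String> children = zk.getChildren(stateJson,
>         (WatchedEvent e) -> { /* re-read the listing on any change */ });
>     for (String child : children) {
>       String[] parts = child.split(":"); // e.g. "core_node3:active:true"
>       String replica = parts[0], state = parts[1];
>       boolean isLeader = parts.length > 2 && Boolean.parseBoolean(parts[2]);
>     }
>   }
> }
> {code}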
> Differences between this and SOLR-13951:
> # In SOLR-13951, we planned to leverage shard terms for per-shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive 
> (we needed a shard state provider abstraction and had to introduce it 
> everywhere in the codebase).
> # This approach is a drastically simpler change and design.
> Credit for this design and the PR is due to [~noble.paul]. 
> [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this 
> effort. The reference branch takes a conceptually similar (but not 
> identical) approach.
> I shall attach a PR and performance benchmarks shortly.


