[GitHub] [druid] capistrant commented on pull request #11135: Create dynamic config that can limit number of non-primary replicants loaded per coordination cycle

GitBox Fri, 05 Aug 2022 13:45:33 -0700


capistrant commented on PR #11135:
URL: https://github.com/apache/druid/pull/11135#issuecomment-1206851194


   > @capistrant , I was taking a look at the `maxNonPrimaryReplicantsToLoad` config but I couldn't really distinguish it from `replicationThrottleLimit`.
   > 
   > I see that you have made a similar observation here:
   > 
   > > I folded this new configuration and feature into ReplicationThrottler. That is essentially what it is doing, just in a new way compared to the current ReplicationThrottler functionality.
   > 
   > Could you please help me understand the difference between the two? In which case would we want to tune this config rather than tuning the `replicationThrottleLimit` itself?
   
   My observation was that `maxNonPrimaryReplicantsToLoad` is a new way of throttling replication, not that it does the same thing as `replicationThrottleLimit`.
   
   `replicationThrottleLimit` is a limit on the number of replica loads in progress at any one time during RunRules. We track the in-progress loads in a list, and items are removed from that list when a `LoadQueuePeon` issues a callback on completion of the load.
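To make the in-flight semantics concrete, here is a minimal Java sketch under stated assumptions. `InFlightReplicaThrottler`, `canCreateReplica`, `registerLoad`, and `onLoadComplete` are hypothetical names chosen for illustration, not Druid's actual `ReplicationThrottler` API; `onLoadComplete` merely stands in for the `LoadQueuePeon` completion callback described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an in-flight replica throttle in the spirit of
// replicationThrottleLimit. Class and method names are illustrative only;
// they are not Druid's actual ReplicationThrottler API.
class InFlightReplicaThrottler {
    private final int replicationThrottleLimit;
    // Segment IDs whose replica loads were issued but have not completed yet.
    private final List<String> inFlight = new ArrayList<>();

    InFlightReplicaThrottler(int replicationThrottleLimit) {
        this.replicationThrottleLimit = replicationThrottleLimit;
    }

    // A new replica load may start only while fewer than the limit are in flight.
    boolean canCreateReplica() {
        return inFlight.size() < replicationThrottleLimit;
    }

    // Record a replica load that has just been issued.
    void registerLoad(String segmentId) {
        inFlight.add(segmentId);
    }

    // Stand-in for the completion callback a LoadQueuePeon would issue;
    // removing the entry frees a slot for another replica load.
    void onLoadComplete(String segmentId) {
        inFlight.remove(segmentId);
    }

    int inFlightCount() {
        return inFlight.size();
    }
}

class InFlightThrottleDemo {
    public static void main(String[] args) {
        InFlightReplicaThrottler throttler = new InFlightReplicaThrottler(2);
        throttler.registerLoad("segment-1");
        throttler.registerLoad("segment-2");
        System.out.println(throttler.canCreateReplica()); // false: 2 loads in flight
        throttler.onLoadComplete("segment-1");
        System.out.println(throttler.canCreateReplica()); // true: a slot was freed
    }
}
```

In this sketch, capacity is handed back every time a load completes, so the limit bounds concurrency, not the total number of replicas loaded during one RunRules pass.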
   
   `maxNonPrimaryReplicantsToLoad` is a hard limit on the total number of replica loads issued during a single RunRules execution. Once it is hit, no more non-primary replicas are created for the rest of that RunRules execution, even if earlier loads have already completed.
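For contrast, a per-run hard cap can be sketched as a counter that only a new RunRules execution resets. Again, `PerRunReplicaCap`, `resetForNewRun`, and `tryCreateReplica` are hypothetical names for illustration, not the code in this PR.

```java
// Hypothetical sketch of a per-run hard cap in the spirit of
// maxNonPrimaryReplicantsToLoad. Names are illustrative, not Druid's API.
class PerRunReplicaCap {
    private final int maxNonPrimaryReplicantsToLoad;
    // Replica loads issued so far in the current RunRules execution.
    private int loadedThisRun = 0;

    PerRunReplicaCap(int maxNonPrimaryReplicantsToLoad) {
        this.maxNonPrimaryReplicantsToLoad = maxNonPrimaryReplicantsToLoad;
    }

    // Called once at the start of each RunRules execution.
    void resetForNewRun() {
        loadedThisRun = 0;
    }

    // Counts the load and returns true if the cap has not been reached.
    // Load completions never give budget back; only a new run does.
    boolean tryCreateReplica() {
        if (loadedThisRun >= maxNonPrimaryReplicantsToLoad) {
            return false;
        }
        loadedThisRun++;
        return true;
    }

    int loadedCount() {
        return loadedThisRun;
    }
}

class PerRunCapDemo {
    public static void main(String[] args) {
        PerRunReplicaCap cap = new PerRunReplicaCap(3);
        cap.resetForNewRun();
        int created = 0;
        for (int segment = 0; segment < 10; segment++) { // 10 under-replicated segments
            if (cap.tryCreateReplica()) {
                created++;
            }
        }
        System.out.println(created); // 3: the remaining 7 wait for a later run
    }
}
```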
   
   You'd want to tune `maxNonPrimaryReplicantsToLoad` if you want to put an upper bound on the non-primary replica loading work the coordinator does per execution of RunRules. We use it at my org because we want the coordinator to avoid "putting its head in the sand", loading replicas for an undesirable amount of time instead of finishing its duties and refreshing its metadata. An example of an "undesirable amount of work" is when a Historical drops out of the cluster momentarily while the Coordinator is refreshing its `SegmentReplicantLookup`. The Coordinator suddenly thinks X segments are under-replicated. But if the Historical is coming back online (say, after a restart to deploy new configs), we don't want the Coordinator to spin loading those X segments when it could just finish its duties and notice that the segments are no longer under-replicated.
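The Historical-restart scenario can be illustrated with a toy two-pass simulation. It assumes the in-flight limit is tuned high enough never to bind (as described below for our clusters), and the segment counts are made-up numbers; `HistoricalRestartSimulation`, `runRules`, and `totalLoads` are hypothetical names, not Druid code.

```java
// Hypothetical simulation of the Historical-restart scenario. Each runRules
// call models one RunRules pass that sees `underReplicated` segments and
// issues replica loads until the per-run cap is hit. The in-flight limit is
// assumed never to bind, and all numbers are illustrative.
class HistoricalRestartSimulation {
    // Returns the number of replica loads issued in one pass.
    static int runRules(int underReplicated, int maxNonPrimaryReplicantsToLoad) {
        int issued = 0;
        for (int segment = 0; segment < underReplicated; segment++) {
            if (issued >= maxNonPrimaryReplicantsToLoad) {
                break; // cap hit: stop creating replicas and let the coordinator finish its duties
            }
            issued++;
        }
        return issued;
    }

    // Total loads over two passes: the first while the Historical is missing
    // (x segments look under-replicated), the second after it has come back
    // (nothing is under-replicated any more).
    static int totalLoads(int x, int cap) {
        int firstPass = runRules(x, cap);
        int secondPass = runRules(0, cap);
        return firstPass + secondPass;
    }

    public static void main(String[] args) {
        int x = 10_000;
        System.out.println(totalLoads(x, Integer.MAX_VALUE)); // 10000: every spurious replica is loaded
        System.out.println(totalLoads(x, 500));               // 500: the cap bounds the wasted work
    }
}
```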
   
   I'm not aware of the reasons for using `replicationThrottleLimit`. It didn't meet my org's needs for throttling replication, which is why I introduced the new config. I guess it is a way to avoid flooding the cluster with replica loads? My clusters have actually tuned that value up to avoid hitting its low default. We don't care about the number of in-flight loads; we just care about limiting the total number of replica loads per RunRules execution.
   
   Let me know if that clarification is still not making sense.

