capistrant opened a new pull request, #12504: URL: https://github.com/apache/druid/pull/12504
### Description

#### Added a new coordinator dynamic configuration item `maxSegmentsToLoad`

> This is the maximum number of segments - both primary and non-primary replicants - that can be loaded per coordination run. The default is equivalent to there being no limit. This differs from `maxNonPrimaryReplicantsToLoad` because primary replicants also count toward the limit. An operator may want to use this configuration to prevent the coordinator from spinning and re-loading many segments that are actually already loaded, but appeared to be unavailable because a temporary network issue (or a similar event) caused some number of Historical servers' segments to go missing.

#### Added logic in RunRules to short-circuit once the number of segments loaded reaches the value of the new dynamic config

We track the aggregate number of segments loaded during the execution of RunRules in a global statistic. `LoadRule.java` updates this stat as it loads segments. `RunRules.java` checks the value of this stat before matching each segment against the load rules, making sure we have not yet loaded the maximum number of segments. Once the limit is reached, RunRules does not address any more segments and the coordinator moves on to the next duty.
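For illustration, here is a minimal sketch of the short-circuit pattern described above. The names (`SegmentLoadBudget`, `recordLoad`, `limitReached`) are hypothetical stand-ins, not the actual code in `LoadRule.java` / `RunRules.java`; the sketch only shows the shape of a shared counter that the load path increments and the rule runner checks before handling the next segment.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: SegmentLoadBudget and its methods are hypothetical stand-ins,
// not the actual classes touched by this PR.
public class SegmentLoadBudget
{
  // Aggregate count of segments (primary and non-primary replicants) queued for load this run.
  private final AtomicLong segmentsLoaded = new AtomicLong(0);
  private final long maxSegmentsToLoad;

  public SegmentLoadBudget(long maxSegmentsToLoad)
  {
    this.maxSegmentsToLoad = maxSegmentsToLoad;
  }

  // Called by the load path (LoadRule in the real code) each time a segment load is queued.
  public void recordLoad()
  {
    segmentsLoaded.incrementAndGet();
  }

  // Checked by the rule runner (RunRules in the real code) before matching the next segment.
  public boolean limitReached()
  {
    return segmentsLoaded.get() >= maxSegmentsToLoad;
  }

  // Simplified driver loop mirroring the short-circuit described above.
  public void runRules(List<String> usedSegments)
  {
    for (String segment : usedSegments) {
      if (limitReached()) {
        // Stop addressing further segments; the coordinator moves on to its next duty.
        break;
      }
      // In the real code, a matching LoadRule decides whether this segment needs
      // loading and records each replicant it queues.
      recordLoad();
    }
  }
}
```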
#### Why?

The coordinator refreshes its map of the segments loaded in the cluster at the start of the list of duties it is about to execute. It does not dynamically respond to segments being announced while the duties are in-flight. This means that if, for some reason, 10% of Druid segments were unavailable at the start of coordination, but became available while the coordinator was executing run rules, the coordinator would still proceed with loading all of the segments that were unavailable or under-replicated. This can lead to lots of wasted time and work! If those 10% of unavailable segments take 20 minutes to load, that is 20 minutes the coordinator could have spent doing other work, such as loading genuinely unavailable segments that were newly ingested.

An example of when something like this could happen is a network issue that temporarily forced a number of Historical nodes offline; they re-connected shortly thereafter, but too late to prevent the coordinator from concluding that all of their data was gone. Another example is a negative event where multiple Historical servers had to be restarted, such as OOM or GC issues caused by an unusual workload. If those Historicals restart after the coordinator has decided they are no longer serving their segments, we end up trying to load all of those segments even though the servers announce them again after starting up.

At the end of the day, this is a niche configuration that operators may want access to in some circumstances. I come from a background of operating a large multi-tenant cluster and wrote this patch out of necessity due to an existing issue we are having. Our cluster has faced some instability recently because unexpected workloads cause Historicals to wedge in a suboptimal state due to GC. While we work on a solution to the underlying problem, this is a mitigating measure for the occasions when we have to restart multiple Historical servers at once to get out of the bad state. I configure the coordinator to load only as many segments as we typically see during peak ingest times.
That way, we are operating as normal at all times, and in the case of an unexpected issue, the coordinator will not load thousands of segments that come back online after Historical restarts.

#### How is this different from `maxNonPrimaryReplicantsToLoad`?

That dynamic configuration was also introduced by me, again in reaction to an experience managing a large cluster. When we took servers out for maintenance, we did not want the coordinator to block while replicating all of the segments needed to get back to full replication. Instead, we wanted the coordinator to eat away at finite-sized chunks of replicas, so that other duties, such as loading primary replicas, were not blocked from running. That configuration still serves that purpose when it is set lower than this new configuration. For instance, I may want to limit my coordinator to loading at most 10000 segments per RunRules cycle while also limiting non-primary replicas to 4000 (see the sketch after the checklist below).

<hr>

##### Key changed/added classes in this PR
 * `RunRules`
 * `LoadRule`
 * web-console dynamic config forms

<hr>

This PR has:
- [ ] been self-reviewed.
- [ ] added documentation for new or modified features or behaviors.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] been tested in a test Druid cluster.
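As a rough illustration of how the two limits could interact, using the example numbers above, here is a hypothetical sketch. `ReplicationLimits`, `tryLoadPrimary`, and `tryLoadNonPrimary` are invented names for this example and do not correspond to actual Druid classes; the point is only that non-primary replicas are bounded by whichever budget runs out first, while primary replicas are bounded only by the overall limit.

```java
// Illustrative only: ReplicationLimits and its methods are invented for this example.
public class ReplicationLimits
{
  private final long maxSegmentsToLoad;             // e.g. 10000 per RunRules cycle
  private final long maxNonPrimaryReplicantsToLoad; // e.g. 4000 per RunRules cycle

  private long totalLoaded = 0;
  private long nonPrimaryLoaded = 0;

  public ReplicationLimits(long maxSegmentsToLoad, long maxNonPrimaryReplicantsToLoad)
  {
    this.maxSegmentsToLoad = maxSegmentsToLoad;
    this.maxNonPrimaryReplicantsToLoad = maxNonPrimaryReplicantsToLoad;
  }

  // A primary replicant counts only against the overall segment budget.
  public boolean tryLoadPrimary()
  {
    if (totalLoaded >= maxSegmentsToLoad) {
      return false;
    }
    totalLoaded++;
    return true;
  }

  // A non-primary replicant counts against both budgets, so whichever limit
  // is reached first stops further replica loads.
  public boolean tryLoadNonPrimary()
  {
    if (totalLoaded >= maxSegmentsToLoad || nonPrimaryLoaded >= maxNonPrimaryReplicantsToLoad) {
      return false;
    }
    totalLoaded++;
    nonPrimaryLoaded++;
    return true;
  }
}
```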
