Hi Stan,

I was about to start a vote on this one, but I think I have one more idea
regarding your last point about the total cap.
What if we said that (leader|follower).replication.throttled.rate is the
overall limit we allow for replication (so the total cap), and
(leader|follower).reassignment.throttled.rate must have a value lower
than that? By default it'd be -1, which would mean that
replication.throttled.rate is applied (so the backward compatible
behavior). Setting this value would put a dedicated limit on
reassignment traffic, and the remaining replication traffic would get
replication.throttled.rate - reassignment.throttled.rate. If
replication.throttled.rate is not specified but
reassignment.throttled.rate is, then reassignment is bounded and other
replication traffic isn't. Finally, replication.throttled.replicas
would be applied to reassignment too if specified, so the reassignment
can't "escape" the boundaries given by the replication throttle.
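To make the precedence rules above concrete, here is a small sketch
(a hypothetical helper, not part of the KIP; the function name and the
use of -1 for "not set" are my own illustration):

```python
# Hypothetical sketch of the proposed precedence rules; not broker code.
# Rates are bytes/sec; -1 means the config is not set, None means unlimited.
def effective_rates(replication_rate, reassignment_rate):
    """Return (reassignment_limit, other_replication_limit)."""
    if reassignment_rate == -1:
        # Backward compatible default: reassignment falls under the
        # replication throttle.
        limit = replication_rate if replication_rate != -1 else None
        return limit, limit
    if replication_rate == -1:
        # Only reassignment is bounded; other replication traffic isn't.
        return reassignment_rate, None
    # reassignment.throttled.rate must stay below the total cap; the
    # remainder is left for other replication traffic.
    assert reassignment_rate < replication_rate
    return reassignment_rate, replication_rate - reassignment_rate
```

So with replication.throttled.rate=10 and reassignment.throttled.rate=4,
reassignment would be limited to 4 and other replication traffic would
get the remaining 6.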
I think this is a fair solution to the total cap problem, and it would
be aligned with the current config.
What do you think?

Viktor

On Mon, Nov 4, 2019 at 3:55 PM Viktor Somogyi-Vass <viktorsomo...@gmail.com>
wrote:

> Exactly. I also can't envision scenarios where we would like to throttle
> the reassignment traffic to only a subset of the reassigning replicas.
>
> The other day I was wondering whether with specialized quotas we could
> solve incremental partition reassignment too. Basically the controller
> would throttle most of the partitions to 0 and let only some of them
> reassign, but I discarded the idea because it is more intuitive (and more
> traceable) to actually break up a big reassignment into smaller steps.
> Perhaps there is a need for throttling the reassigning replicas
> differently depending on the produce rate of those partitions; however, I
> was planning with incremental partition reassignment in mind, so perhaps
> it'd be best if the controller could decide how many partitions fit into
> the given bandwidth and we'd just expose simple configs.
>
> If we always take the lowest value, this means that the reassignment
> throttle must always be equal to or lower than the replication throttle.
> Doesn't that mean that the reassigning partitions may never catch up? I
> guess not, since we expect to always be moving less than the total number
> of partitions at one time.
> I have mixed feelings about this - I like the flexibility of being able to
> configure whatever value we please, yet I struggle to come up with a
> scenario where we would want a higher reassignment throttle than
> replication. Perhaps your suggestion is better.
>
> Yes, it could mean that. However, my concern with preferring reassignment
> quotas is that it could cause the "bootstrapping broker problem": the sum
> of the follower reassignment and replication quotas would eat away the
> bandwidth from the leaders. In this case I think a reassignment that you
> can't finish is a better problem to have than leaders unable to answer
> fetch requests fast enough. The reassignment problem can be mitigated by
> carefully increasing the replication and reassignment quotas for the
> given partitions. I'll set up a test environment for this though and get
> back if something doesn't add up.
>
> This begs another question - since we're separating the replication
> throttle from the reassignment throttle, the maximum traffic a broker may
> replicate now becomes `replication.throttled.rate` +
> `reassignment.throttled.rate`.
> Seems like we would benefit from having a total cap to ensure users don't
> shoot themselves in the foot.
>
> We could have a new config that denotes the total possible throttle rate
> which we then divide between reassignment and replication. But that
> assumes that we would set the replication.throttled.rate much lower than
> what the broker could handle.
>
> Perhaps the best approach would be to denote how much the broker can
> handle (total.replication.throttle.rate) and then allow only up to N% of
> that to go towards reassignments (reassignment.throttled.rate) in a
> best-effort way (preferring replication traffic). That sounds tricky to
> implement, though. Interested to hear what others think.
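If it helps, the N%-of-total idea could be sketched like this (the
function and parameter names are hypothetical, purely to illustrate the
best-effort preference for replication traffic):

```python
# Hypothetical sketch of the best-effort split: reassignment may use at
# most N% of total.replication.throttle.rate, and only the bandwidth that
# replication traffic isn't currently using.
def reassignment_budget(total_cap, replication_in_use, n_percent):
    ceiling = total_cap * n_percent / 100.0              # hard cap for reassignment
    leftover = max(0.0, total_cap - replication_in_use)  # replication goes first
    return min(ceiling, leftover)
```

With a total cap of 100 and N=20, reassignment would get at most 20, and
less than that whenever replication traffic leaves under 20 to spare.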
>
> Good catch. I'm also leaning towards having simpler configs and improving
> the broker/controller code to make more intelligent decisions. I also
> agree with having a total.replication.throttle.rate, but I think we
> should stay with the byte-based notation as that is more conventional in
> the quota world and easier to handle. That way you can say that your
> total replication quota is 10, your leader and follower replication
> quotas are 3 each and the reassignment ones are 2 each, and then you've
> maxed out your limit. We can print warnings/errors if the individual
> values don't add up to the max.
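To illustrate the accounting with the numbers above (purely a sketch;
the config names mirror the KIP discussion but the validation logic is
my own made-up illustration):

```python
# Hypothetical check that the per-type throttle rates fit under a
# total.replication.throttle.rate cap; numbers match the example above.
def check_total_cap(total, quotas):
    used = sum(quotas.values())
    if used > total:
        print(f"WARN: quotas sum to {used}, exceeding the total cap {total}")
    return used <= total

quotas = {
    "leader.replication.throttled.rate": 3,
    "follower.replication.throttled.rate": 3,
    "leader.reassignment.throttled.rate": 2,
    "follower.reassignment.throttled.rate": 2,
}
check_total_cap(10, quotas)  # 3 + 3 + 2 + 2 == 10: the limit is maxed out
```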
>
> Viktor
>
> On Mon, Nov 4, 2019 at 12:27 PM Stanislav Kozlovski <
> stanis...@confluent.io> wrote:
>
>> Hi Viktor,
>>
>> > As for the first question, I think there is no need for *.throttled.replicas in
>> case of reassignment because the LeaderAndIsrRequest exactly specifies the
>> replicas needed to be throttled.
>>
>> Exactly. I also can't envision scenarios where we would like to throttle
>> the reassignment traffic to only a subset of the reassigning replicas.
>>
>> > For instance, on a bootstrapping server where all replicas are
>> throttled, there are reassigning replicas and the reassignment throttle
>> is set higher, I think we should still apply the replication throttle to
>> ensure the broker won't have problems. What do you think?
>>
>> If we always take the lowest value, this means that the reassignment
>> throttle must always be equal to or lower than the replication throttle.
>> Doesn't that mean that the reassigning partitions may never catch up? I
>> guess not, since we expect to always be moving less than the total number
>> of partitions at one time.
>> I have mixed feelings about this - I like the flexibility of being able to
>> configure whatever value we please, yet I struggle to come up with a
>> scenario where we would want a higher reassignment throttle than
>> replication. Perhaps your suggestion is better.
>>
>> This begs another question - since we're separating the replication
>> throttle from the reassignment throttle, the maximum traffic a broker
>> may replicate now becomes `replication.throttled.rate` +
>> `reassignment.throttled.rate`.
>> Seems like we would benefit from having a total cap to ensure users
>> don't shoot themselves in the foot.
>>
>> We could have a new config that denotes the total possible throttle rate
>> which we then divide between reassignment and replication. But that
>> assumes that we would set the replication.throttled.rate much lower than
>> what the broker could handle.
>>
>> Perhaps the best approach would be to denote how much the broker can
>> handle (total.replication.throttle.rate) and then allow only up to N% of
>> that to go towards reassignments (reassignment.throttled.rate) in a
>> best-effort way (preferring replication traffic). That sounds tricky to
>> implement, though. Interested to hear what others think.
>>
>> Best,
>> Stanislav
>>
>>
>> On Mon, Nov 4, 2019 at 11:08 AM Viktor Somogyi-Vass <
>> viktorsomo...@gmail.com>
>> wrote:
>>
>> > Hey Stan,
>> >
>> > > We will introduce two new configs in order to eventually replace
>> > *.replication.throttled.rate.
>> > Just to clarify, you mean to replace said config in the context of
>> > reassignment throttling, right? We are not planning to remove that
>> config
>> >
>> > Yes, I don't want to remove that config either. Removed that sentence.
>> >
>> > And also to clarify, *.throttled.replicas will not apply to the new
>> > *reassignment* configs, correct? We will throttle all reassigning
>> replicas.
>> > (I am +1 on this, I believe it is easier to reason about. We could
>> always
>> > add a new config later)
>> >
>> > Are you asking whether there is a need for a
>> > leader.reassignment.throttled.replicas and
>> > follower.reassignment.throttled.replicas config, or are you interested
>> > in the interaction between the old and the new configs?
>> > As for the first question, I think there is no need for *.throttled.replicas in
>> > case of reassignment because the LeaderAndIsrRequest exactly specifies
>> the
>> > replicas needed to be throttled.
>> > As for the second, see below.
>> >
>> > I have one comment about backwards-compatibility - should we ensure that
>> > the old `*.replication.throttled.rate` and `*.throttled.replicas` still
>> > apply to reassigning traffic if set? We could have the new config take
>> > precedence, but still preserve backwards compatibility.
>> >
>> > Sure, we should apply replication throttling to reassignment too if set.
>> > But instead of the new config taking precedence, I'd apply whichever
>> > has the lower value.
>> > For instance, on a bootstrapping server where all replicas are
>> > throttled, there are reassigning replicas and the reassignment
>> > throttle is set higher, I think we should still apply the replication
>> > throttle to ensure the broker won't have problems. What do you think?
>> >
>> > Thanks,
>> > Viktor
>> >
>> >
>> > On Fri, Nov 1, 2019 at 9:57 AM Stanislav Kozlovski <
>> stanis...@confluent.io
>> > >
>> > wrote:
>> >
>> > > Hey Viktor. Thanks for the KIP!
>> > >
>> > > > We will introduce two new configs in order to eventually replace
>> > > *.replication.throttled.rate.
>> > > Just to clarify, you mean to replace said config in the context of
>> > > reassignment throttling, right? We are not planning to remove that
>> config
>> > >
>> > > And also to clarify, *.throttled.replicas will not apply to the new
>> > > *reassignment* configs, correct? We will throttle all reassigning
>> > replicas.
>> > > (I am +1 on this, I believe it is easier to reason about. We could
>> always
>> > > add a new config later)
>> > >
>> > > I have one comment about backwards-compatibility - should we ensure
>> that
>> > > the old `*.replication.throttled.rate` and `*.throttled.replicas`
>> still
>> > > apply to reassigning traffic if set? We could have the new config take
>> > > precedence, but still preserve backwards compatibility.
>> > >
>> > > Thanks,
>> > > Stanislav
>> > >
>> > > On Thu, Oct 24, 2019 at 1:38 PM Viktor Somogyi-Vass <
>> > > viktorsomo...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi People,
>> > > >
>> > > > I've created a KIP to improve replication quotas by handling
>> > > > reassignment-related throttling as a separate case with its own
>> > > > configurable limits, and to change the kafka-reassign-partitions
>> > > > tool to use these new configs going forward.
>> > > > Please have a look, I'd be happy to receive any feedback and answer
>> > > > all your questions.
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-542%3A+Partition+Reassignment+Throttling
>> > > >
>> > > > Thanks,
>> > > > Viktor
>> > > >
>> > >
>> > >
>> > > --
>> > > Best,
>> > > Stanislav
>> > >
>> >
>>
>>
>> --
>> Best,
>> Stanislav
>>
>
