My concern is that we're very focused on reassignment, whereas I think users
enable throttling to avoid overwhelming brokers (typically disk and/or
network bandwidth) with replica catch-up traffic. The current approach
achieves that by not throttling ISR replication.

The downside is that when a broker falls out of the ISR, it may suddenly
get throttled and never catch up. However, if the throttle can cause this
kind of issue, then it's broken for replicas being reassigned too, so one
could say that it's a configuration error.
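
To make the "never catch up" case concrete, here is a rough back-of-the-envelope
sketch (not Kafka code). It assumes a single throttle rate that applies to all
catch-up traffic into the broker and a steady produce rate on the affected
partitions; the function name and parameters are made up for illustration.

```python
import math


def catch_up_seconds(lag_bytes, throttle_bytes_per_sec, produce_bytes_per_sec):
    """Estimate how long a throttled follower needs to close its lag.

    Hypothetical model: the follower fetches at most the throttle rate while
    the leader keeps appending at the produce rate, so the lag only shrinks
    by the difference between the two.
    """
    if lag_bytes <= 0:
        return 0.0
    net_rate = throttle_bytes_per_sec - produce_bytes_per_sec
    if net_rate <= 0:
        # Throttle at or below the produce rate: the lag never shrinks.
        return math.inf
    return lag_bytes / net_rate


if __name__ == "__main__":
    # 500 GB behind, 50 MB/s throttle, 30 MB/s of incoming produce traffic.
    print(catch_up_seconds(500e9, 50e6, 30e6))
    # Same lag, but the throttle is below the produce rate.
    print(catch_up_seconds(500e9, 20e6, 30e6))
```

Under these assumptions the first case needs about 25,000 seconds (roughly 7
hours), while the second never completes. The same arithmetic applies to a
replica being reassigned, which is why I'd call it a configuration problem.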

Do we have specific scenarios that would be solved by the proposed change?

Ismael

On Fri, Dec 6, 2019 at 2:26 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com>
wrote:

> Thanks for the question. I think it depends on how the user will try to fix
> it.
> - If they just replace the disk then I think it shouldn't count as a
> reassignment and should be allocated under the normal replication quotas.
> In this case there is no reassignment going on as far as I can tell: the
> broker stops serving replicas from that dir/disk and notifies the
> controller, which changes the leadership. When the disk is fixed, the broker
> will be restarted to pick up the changes and it starts the replication from
> the current leader.
> - If the user reassigns the partitions to other brokers then it will fall
> under the reassignment traffic.
> Also, if the user moves a partition to a different disk, it would also count
> as normal replication, as it is technically not a reassignment but an
> alter-replica-dir event. But it's still done with the reassignment tool, so
> I'd keep the current functionality of the
> --replica-alter-log-dirs-throttle.
> Is this aligned with your thinking?
>
> Viktor
>
> On Wed, Dec 4, 2019 at 2:47 PM Ismael Juma <isma...@gmail.com> wrote:
>
> > Thanks Viktor. How do we intend to handle the case where a broker loses its
> > disk and has to catch up from the beginning?
> >
> > Ismael
> >
> > On Wed, Dec 4, 2019, 4:31 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com>
> > wrote:
> >
> > > Thanks for the notice Ismael, KAFKA-4313 fixed this issue indeed. I've
> > > updated the KIP.
> > >
> > > Viktor
> > >
> > > On Tue, Dec 3, 2019 at 3:28 PM Ismael Juma <ism...@juma.me.uk> wrote:
> > >
> > > > Hi Viktor,
> > > >
> > > > The KIP states:
> > > >
> > > > "KIP-73
> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas>
> > > > added quotas for replication but it doesn't separate normal replication
> > > > traffic from reassignment. So a user is able to specify the partition and
> > > > the throttle rate but it will be applied to both ISR and non-ISR traffic"
> > > >
> > > > This is not true: ISR traffic is not throttled.
> > > >
> > > > Ismael
> > > >
> > > > On Thu, Oct 24, 2019 at 5:38 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi People,
> > > > >
> > > > > I've created a KIP to improve replication quotas by handling
> > > > > reassignment-related throttling as a separate case with its own
> > > > > configurable limits, and to change the kafka-reassign-partitions tool to
> > > > > use these new configs going forward.
> > > > > Please have a look, I'd be happy to receive any feedback and answer
> > > > > all your questions.
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-542%3A+Partition+Reassignment+Throttling
> > > > >
> > > > > Thanks,
> > > > > Viktor
> > > > >
> > > >
> > >
> >
>
