Thanks for the feedback Boyang.

If anyone else has feedback regarding this KIP, I would really appreciate
hearing it!

Thanks,
Sanjana

On Tue, Mar 17, 2020 at 11:38 PM Boyang Chen <bche...@outlook.com> wrote:

> Sounds great!
>
> ________________________________
> From: Sanjana Kaundinya <skaundi...@gmail.com>
> Sent: Tuesday, March 17, 2020 5:54:35 PM
> To: dev@kafka.apache.org <dev@kafka.apache.org>
> Subject: Re: [DISCUSS] KIP-580: Exponential Backoff for Kafka Clients
>
> Thanks for the explanation Boyang. One of the most common problems that we
> have in Kafka is with respect to metadata fetches. For example, if there is
> a broker failure, all clients start to fetch metadata at the same time and
> it often takes a while for the metadata to converge. In a high-load
> cluster, the sheer volume of metadata can also make convergence slower.
>
> For this case, exponential backoff helps as it reduces the retry rate and
> spaces out how often clients will retry, thereby bringing down the time for
> convergence. One addition that Jason mentioned would be valuable here is to
> “jitter” the backoff, as KIP-144 did for the exponential reconnect backoff.
> This would keep clients from being synchronized on when they retry, thereby
> reducing the number of requests that hit the broker at the same time.
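>
> As a rough sketch of how the jittered exponential backoff described above
> might be computed (the class, method, and parameter names below are
> illustrative, not the KIP’s actual API):
>
>     import java.util.concurrent.ThreadLocalRandom;
>
>     class JitteredBackoff {
>         // Backoff for the nth consecutive failure: min(initialMs * 2^failures, maxMs),
>         // scaled by a random factor in [1 - jitter, 1 + jitter), e.g. jitter = 0.2,
>         // so that clients that failed at the same time do not retry in lockstep.
>         static long backoffMs(int failures, long initialMs, long maxMs, double jitter) {
>             long exp = (long) Math.min(initialMs * Math.pow(2, failures), maxMs);
>             double factor = ThreadLocalRandom.current().nextDouble(1 - jitter, 1 + jitter);
>             return (long) (exp * factor);
>         }
>     }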
>
> I’ll add this example to the KIP and flesh out more of the details so it’s
> clearer.
>
> On Mar 17, 2020, 1:24 PM -0700, Boyang Chen <reluctanthero...@gmail.com>,
> wrote:
> > Thanks for the reply Sanjana. I would like to rephrase my questions 2 and
> > 3, as my previous response was a bit unactionable.
> >
> > My specific point is that exponential backoff is not a silver bullet, and
> > we should consider using it to solve known problems instead of making
> > holistic changes to all clients in the Kafka ecosystem. I do like the
> > exponential backoff idea and believe it would be of great value, but maybe
> > we should focus on the existing modules that are suffering from static
> > retry, and only change those in this first KIP. If, in the future, users of
> > other components believe they are also suffering, we could file minor
> > follow-up KIPs to change their behavior as well.
> >
> > Boyang
> >
> > On Sun, Mar 15, 2020 at 12:07 AM Sanjana Kaundinya <skaundi...@gmail.com>
> > wrote:
> >
> > > Thanks for the feedback, Boyang. I will revise the KIP with the
> > > mathematical relations as per your suggestion. To address your feedback:
> > >
> > > 1. Currently, with the default of 100 ms per retry backoff, in 1 second
> > > we would have 10 retries. With an exponential backoff, we would have a
> > > total of 4 retries in 1 second. Thus we have less than half the number of
> > > retries in the same timeframe and can lessen broker pressure. This
> > > calculation is done as follows (using the formula laid out in the KIP):
> > >
> > > Try 1 at time 0 ms, failures = 0, next retry in 100 ms (default retry ms
> > > is initially 100 ms)
> > > Try 2 at time 100 ms, failures = 1, next retry in 200 ms
> > > Try 3 at time 300 ms, failures = 2, next retry in 400 ms
> > > Try 4 at time 700 ms, failures = 3, next retry in 800 ms
> > > Try 5 at time 1500 ms, failures = 4, next retry in 1000 ms (default max
> > > retry ms is 1000 ms)
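> > >
> > > As a quick way to check the arithmetic, here is a small sketch (not the
> > > actual client code) that reproduces the schedule above, assuming the
> > > backoff is min(100 ms * 2^failures, 1000 ms):
> > >
> > >     public class BackoffSchedule {
> > >         public static void main(String[] args) {
> > >             long retryMs = 100, maxRetryMs = 1000, elapsedMs = 0;
> > >             for (int failures = 0; failures < 5; failures++) {
> > >                 // Exponential backoff capped at the max retry ms.
> > >                 long backoff = Math.min(retryMs * (1L << failures), maxRetryMs);
> > >                 System.out.printf("Try %d at time %d ms, failures = %d, next retry in %d ms%n",
> > >                         failures + 1, elapsedMs, failures, backoff);
> > >                 elapsedMs += backoff;
> > >             }
> > >         }
> > >     }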
> > >
> > > For 2 and 3, could you elaborate more on what you mean with respect to
> > > client timeouts? I’m not very familiar with the Streams framework, so I
> > > would love to get more insight into how that currently works with respect
> > > to producer transactions, so I can appropriately update the KIP to address
> > > these scenarios.
> > > On Mar 13, 2020, 7:15 PM -0700, Boyang Chen <reluctanthero...@gmail.com>,
> > > wrote:
> > > > Thanks for the KIP Sanjana. I think the motivation is good, but it
> > > > lacks quantitative analysis. For instance:
> > > >
> > > > 1. How many retries are we saving by applying exponential retry vs
> > > > static retry? There should be some mathematical relation between the
> > > > static retry ms, the initial exponential retry ms, and the max
> > > > exponential retry ms in a given time interval.
> > > > 2. How does this affect client timeouts? With exponential retry, the
> > > > client will be more likely to time out at a parent-level caller; for
> > > > instance, Streams attempts to retry initializing producer transactions
> > > > within a given 5-minute interval. With exponential retry this mechanism
> > > > could time out more frequently, which we should be careful about.
> > > > 3. With regard to #2, we should have a more detailed checklist of all
> > > > the existing static retry scenarios, and adjust the initial exponential
> > > > retry ms to make sure we won’t easily time out at a higher level due to
> > > > too few attempts.
> > > >
> > > > Boyang
> > > >
> > > > On Fri, Mar 13, 2020 at 4:38 PM Sanjana Kaundinya <skaundi...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Everyone,
> > > > >
> > > > > I’ve written a KIP about introducing exponential backoff for Kafka
> > > > > clients. Would appreciate any feedback on this.
> > > > >
> > > > >
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-580%3A+Exponential+Backoff+for+Kafka+Clients
> > > > >
> > > > > Thanks,
> > > > > Sanjana
> > > > >
> > >
>
