Re: [VOTE] KIP-219 - Improve Quota Communication

Jonghyun Lee Wed, 02 May 2018 12:08:27 -0700

Hi Magnus and Jun, do you have any feedback on this?

Since two of the original voters (Becket and Rajini) showed a preference
for bumping up all request versions, I'll wait till tomorrow and start
implementing it unless someone expresses different opinions.


As for the implementation, the current plan is to add a method to
AbstractResponse which tells whether or not client should throttle and have
an implementation in each response type based on its version.

Thanks,
Jon


On Mon, Apr 30, 2018 at 9:15 AM, Jonghyun Lee <[email protected]> wrote:

> Hi Rajini,
>
> Thanks for letting me know the SASL use case. I think this is a similar
> point that Becket raised (creating dependency between different requests)
> and it makes sense to me.
>
> Will wait for some more feedback and finalize.
>
> Thanks,
> Jon
>
>
>
> On Mon, Apr 30, 2018 at 3:46 AM, Rajini Sivaram <[email protected]>
> wrote:
>
>> Hi Jon,
>>
>> We currently send two ApiVersions requests for SASL, the first with
>> version
>> 0 to get the request versions for SASL and another after authentication.
>> You are currently using the second and basing all throttling decisions on
>> that. This may get in the way if we wanted to change this code in the
>> future (e.g. throttle SASL requests or combine the two ApiVersions
>> requests
>> into one). So if it is easy enough to check request versions against an
>> immutable map rather than track broker ApiVersions in a mutable set,
>> perhaps that would be better?
>>
>> Hi Magnus, It will be good to know what works better for non-Java clients.
>>
>>
>> On Fri, Apr 27, 2018 at 8:34 PM, Jonghyun Lee <[email protected]>
>> wrote:
>>
>> > Thanks, Becket.
>> >
>> > Assuming that requiring new client implementations to issue
>> > ApiVersionsRequest before issuing any other requests is reasonable as
>> you
>> > mentioned, I am wondering if keeping track of brokers that support
>> KIP-219
>> > is actually simpler than keeping a map from each request type to its min
>> > version number supporting KIP-219 and checking the version of every
>> single
>> > request to see whether or not to throttle from the client side. To me,
>> it
>> > looks like a broker property that can be cached, and checking every
>> single
>> > request seems excessive.
>> >
>> > Thanks,
>> > Jon
>> >
>> >
>> >
>> >
>> > On Fri, Apr 27, 2018 at 11:36 AM, Becket Qin <[email protected]>
>> wrote:
>> >
>> > > The reason we wanted to bump up all the request versions was let the
>> > > clients know whether the broker has already throttled the request or
>> not.
>> > > This avoids double throttling in both brokers and clients, if the
>> clients
>> > > implementation supports KIP-219.
>> > >
>> > > The new change uses only ApiVersionRequest, instead of all the
>> requests,
>> > to
>> > > indicate whether the brokers have applied throttling or not. The
>> > difference
>> > > is that all the clients must issue ApiVersionRequest first before
>> issuing
>> > > any other requests. This has no impact on the existing clients. But
>> for
>> > > clients implementation that wants to support KIP-219, they need to
>> follow
>> > > this practice, which seems to be reasonable.
>> > > Initially I thought the change is fine. However, thinking about this
>> > again,
>> > > that means the clients implementation needs to remember the
>> ApiVersions
>> > of
>> > > each broker, which may have some complexity. Also this seems
>> introduces
>> > the
>> > > behavior dependency between different requests, which seems
>> unnecessary.
>> > >
>> > > Due to the above reasons, I think it might be better to bump up all
>> the
>> > > request versions.
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > >
>> > > On Wed, Apr 25, 2018 at 3:19 PM, Jonghyun Lee <[email protected]>
>> > wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > Per Ismael's suggestion, I'd like to get comments from the original
>> > > voters
>> > > > for KIP-219 (Becket, Jun, Rajini) and others about the new interface
>> > > change
>> > > > proposed in the discussion thread (
>> > > > https://markmail.org/search/?q=kafka+KIP-219#query:kafka%
>> > > > 20KIP-219+page:1+mid:5rwju2gwpicojr3f+state:results).
>> > > > Would you let me know?
>> > > >
>> > > > Thanks,
>> > > > Jon
>> > > >
>> > > >
>> > > > On Wed, Apr 25, 2018 at 2:43 PM, Dong Lin <[email protected]>
>> wrote:
>> > > >
>> > > > > +Jon
>> > > > >
>> > > > > On Mon, 22 Jan 2018 at 10:38 AM Becket Qin <[email protected]>
>> > > wrote:
>> > > > >
>> > > > >> Thanks for the discussion and voting.
>> > > > >>
>> > > > >> KIP-219 has passed with +3 binding votes (Becket, Jun, Rajini).
>> > > > >>
>> > > > >> On Thu, Jan 18, 2018 at 1:32 AM, Rajini Sivaram <
>> > > > [email protected]>
>> > > > >> wrote:
>> > > > >>
>> > > > >> > Hi Becket,
>> > > > >> >
>> > > > >> > Thanks for the update. Yes, it does address my concern.
>> > > > >> >
>> > > > >> > +1 (binding)
>> > > > >> >
>> > > > >> > Regards,
>> > > > >> >
>> > > > >> > Rajini
>> > > > >> >
>> > > > >> > On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin <
>> [email protected]
>> > >
>> > > > >> wrote:
>> > > > >> >
>> > > > >> > > Actually returning an empty fetch request may still be
>> useful to
>> > > > >> reduce
>> > > > >> > the
>> > > > >> > > throttle time due to request quota violation because the
>> > > > FetchResponse
>> > > > >> > send
>> > > > >> > > time will be less. I just updated the KIP.
>> > > > >> > >
>> > > > >> > > Rajini, does that address your concern?
>> > > > >> > >
>> > > > >> > > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin <
>> > [email protected]
>> > > >
>> > > > >> > wrote:
>> > > > >> > >
>> > > > >> > > > Thanks for the reply, Jun.
>> > > > >> > > >
>> > > > >> > > > Currently the byte rate quota does not apply to
>> > > HeartbeatRequest,
>> > > > >> > > > JoinGroupRequest/SyncGroupRequest. So the only case those
>> > > > requests
>> > > > >> are
>> > > > >> > > > throttled is because the request quota is violated. In that
>> > > case,
>> > > > >> the
>> > > > >> > > > throttle time does not really matter whether we return a
>> full
>> > > > >> > > FetchResponse
>> > > > >> > > > or an empty one. Would it be more consistent if we throttle
>> > > based
>> > > > on
>> > > > >> > the
>> > > > >> > > > actual throttle time / channel mute time?
>> > > > >> > > >
>> > > > >> > > > Thanks,
>> > > > >> > > >
>> > > > >> > > > Jiangjie (Becket) Qin
>> > > > >> > > >
>> > > > >> > > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <[email protected]
>> >
>> > > > wrote:
>> > > > >> > > >
>> > > > >> > > >> Hi, Jiangjie,
>> > > > >> > > >>
>> > > > >> > > >> You are right that the heartbeat is done in a channel
>> > different
>> > > > >> from
>> > > > >> > the
>> > > > >> > > >> fetch request. I think it's still useful to return an
>> empty
>> > > fetch
>> > > > >> > > response
>> > > > >> > > >> when the quota is violated. This way, the throttle time
>> for
>> > the
>> > > > >> > > heartbeat
>> > > > >> > > >> request won't be large. I agree that we can just mute the
>> > > channel
>> > > > >> for
>> > > > >> > > the
>> > > > >> > > >> fetch request for the throttle time computed based on a
>> full
>> > > > fetch
>> > > > >> > > >> response. This probably also partially addresses Rajini's
>> #1
>> > > > >> concern.
>> > > > >> > > >>
>> > > > >> > > >> Thanks,
>> > > > >> > > >>
>> > > > >> > > >> Jun
>> > > > >> > > >>
>> > > > >> > > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <
>> > > > [email protected]>
>> > > > >> > > wrote:
>> > > > >> > > >>
>> > > > >> > > >> > Hi Rajini,
>> > > > >> > > >> >
>> > > > >> > > >> > Thanks for the comments. Pleas see the reply inline.
>> > > > >> > > >> >
>> > > > >> > > >> > Hi Jun,
>> > > > >> > > >> >
>> > > > >> > > >> > Thinking about the consumer rebalance case a bit more,
>> I am
>> > > not
>> > > > >> sure
>> > > > >> > > if
>> > > > >> > > >> we
>> > > > >> > > >> > need to worry about the delayed rebalance due to quota
>> > > > violation
>> > > > >> or
>> > > > >> > > not.
>> > > > >> > > >> > The rebalance actually uses a separate channel to
>> > > coordinator.
>> > > > >> > > Therefore
>> > > > >> > > >> > unless the request quota is hit, the rebalance won't be
>> > > > >> throttled.
>> > > > >> > > Even
>> > > > >> > > >> if
>> > > > >> > > >> > request quota is hit, it seems unlikely to be delayed
>> long.
>> > > So
>> > > > it
>> > > > >> > > looks
>> > > > >> > > >> > that we don't need to unmute the channel earlier than
>> > needed.
>> > > > >> Does
>> > > > >> > > this
>> > > > >> > > >> > address your concern?
>> > > > >> > > >> >
>> > > > >> > > >> > Thanks,
>> > > > >> > > >> >
>> > > > >> > > >> > Jiangjie (Becket) Qin
>> > > > >> > > >> >
>> > > > >> > > >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
>> > > > >> > > >> [email protected]>
>> > > > >> > > >> > wrote:
>> > > > >> > > >> >
>> > > > >> > > >> > > Hi Becket,
>> > > > >> > > >> > >
>> > > > >> > > >> > > A few questions:
>> > > > >> > > >> > >
>> > > > >> > > >> > > 1. KIP says: *Although older client implementations
>> > (prior
>> > > to
>> > > > >> > > >> knowledge
>> > > > >> > > >> > of
>> > > > >> > > >> > > this KIP) will immediately send the next request after
>> > the
>> > > > >> broker
>> > > > >> > > >> > responds
>> > > > >> > > >> > > without paying attention to the throttle time field,
>> the
>> > > > >> broker is
>> > > > >> > > >> > > protected by virtue of muting the channel for time X.
>> > i.e.,
>> > > > the
>> > > > >> > next
>> > > > >> > > >> > > request will not be processed until the channel is
>> > > unmuted. *
>> > > > >> > > >> > > For fetch requests, the response is sent immediately
>> and
>> > > the
>> > > > >> mute
>> > > > >> > > >> time on
>> > > > >> > > >> > > the broker based on empty fetch response will often be
>> > zero
>> > > > >> > (unlike
>> > > > >> > > >> the
>> > > > >> > > >> > > throttle time returned to the client based on
>> non-empty
>> > > > >> response).
>> > > > >> > > >> Won't
>> > > > >> > > >> > > that lead to a tight loop of fetch requests from old
>> > > clients
>> > > > >> > > >> > (particularly
>> > > > >> > > >> > > expensive with SSL)? Wouldn't it be better to retain
>> > > current
>> > > > >> > > behaviour
>> > > > >> > > >> > for
>> > > > >> > > >> > > old clients? Also, if we change the behaviour for old
>> > > > clients,
>> > > > >> > > client
>> > > > >> > > >> > > metrics that track throttle time will report a lot of
>> > > > throttle
>> > > > >> > > >> unrelated
>> > > > >> > > >> > to
>> > > > >> > > >> > >  actual throttle time.
>> > > > >> > > >> > >
>> > > > >> > > >> > For consumers, if quota is violated, the throttle time
>> on
>> > the
>> > > > >> broker
>> > > > >> > > >> will
>> > > > >> > > >> > not be 0. It is just that the throttle time will not be
>> > > > >> increasing
>> > > > >> > > >> because
>> > > > >> > > >> > the consumer will return an empty response in this
>> case. So
>> > > > there
>> > > > >> > > should
>> > > > >> > > >> > not be a tight loop.
>> > > > >> > > >> >
>> > > > >> > > >> >
>> > > > >> > > >> > > 2. KIP says: *The usual idle timeout i.e.,
>> > > > >> > connections.max.idle.ms
>> > > > >> > > >> > > <http://connections.max.idle.ms> will still be
>> honored
>> > > > during
>> > > > >> the
>> > > > >> > > >> > throttle
>> > > > >> > > >> > > time X. This makes sure that the brokers will detect
>> > client
>> > > > >> > > connection
>> > > > >> > > >> > > closure in a bounded time.*
>> > > > >> > > >> > > Wouldn't it be better to bound maximum throttle time
>> to
>> > > > >> > > >> > > *connections.max.idle.ms
>> > > > >> > > >> > > <http://connections.max.idle.ms>*? If we mute for a
>> time
>> > > > >> greater
>> > > > >> > > than
>> > > > >> > > >> > > *connections.max.idle.ms
>> > > > >> > > >> > > <http://connections.max.idle.ms>* and then close a
>> > client
>> > > > >> > > connection
>> > > > >> > > >> > > simply
>> > > > >> > > >> > > because we had muted it on the broker for a longer
>> > throttle
>> > > > >> time,
>> > > > >> > we
>> > > > >> > > >> > force
>> > > > >> > > >> > > a reconnection and read the next request from the new
>> > > > >> connection
>> > > > >> > > >> straight
>> > > > >> > > >> > > away. This feels the same as a bound on the quota of *
>> > > > >> > > >> > > connections.max.idle.ms
>> > > > >> > > >> > > <http://connections.max.idle.ms>*, but with added
>> load
>> > on
>> > > > the
>> > > > >> > > broker
>> > > > >> > > >> for
>> > > > >> > > >> > > authenticating another connection (expensive with
>> SSL).
>> > > > >> > > >> > >
>> > > > >> > > >> > I think we need to think about the consumer prior to and
>> > > after
>> > > > >> this
>> > > > >> > > KIP
>> > > > >> > > >> > separately.
>> > > > >> > > >> >
>> > > > >> > > >> > For consumer prior to this KIP, even if we unmute the
>> > channel
>> > > > >> after
>> > > > >> > > >> > connection.max.idle.ms, it is likely that the consumers
>> > have
>> > > > >> > already
>> > > > >> > > >> > reached request.timeout.ms before that and has
>> reconnected
>> > > to
>> > > > >> the
>> > > > >> > > >> broker.
>> > > > >> > > >> > So there is no real difference whether we close the
>> > throttled
>> > > > >> > channel
>> > > > >> > > or
>> > > > >> > > >> > not.
>> > > > >> > > >> >
>> > > > >> > > >> > For consumers after the KIP, because they will honor the
>> > > > throttle
>> > > > >> > > time,
>> > > > >> > > >> > they will back off until throttle time is reached. If
>> that
>> > > > >> throttle
>> > > > >> > > >> time is
>> > > > >> > > >> > longer than connection.max.idle.ms, it seems not a big
>> > > > overhead
>> > > > >> > > because
>> > > > >> > > >> > there will only be one connection re-establishment in
>> > quite a
>> > > > few
>> > > > >> > > >> minutes.
>> > > > >> > > >> > Compared with such overhead, it seems more important to
>> > honor
>> > > > the
>> > > > >> > > quota
>> > > > >> > > >> so
>> > > > >> > > >> > the broker can survive.
>> > > > >> > > >> >
>> > > > >> > > >> >
>> > > > >> > > >> > > 3. Are we changing the behaviour of network bandwidth
>> > quota
>> > > > for
>> > > > >> > > >> > > Produce/Fetch and retaining current behaviour for
>> request
>> > > > >> quotas?
>> > > > >> > > >> > >
>> > > > >> > > >> > This is going to be applied to all the quota. Including
>> > > request
>> > > > >> > > quotas.
>> > > > >> > > >> >
>> > > > >> > > >> >
>> > > > >> > > >> > >
>> > > > >> > > >> > > Thanks,
>> > > > >> > > >> > >
>> > > > >> > > >> > > Rajini
>> > > > >> > > >> > >
>> > > > >> > > >> > >
>> > > > >> > > >> > >
>> > > > >> > > >> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <
>> > > [email protected]>
>> > > > >> > wrote:
>> > > > >> > > >> > >
>> > > > >> > > >> > > > Hi, Jiangjie,
>> > > > >> > > >> > > >
>> > > > >> > > >> > > > Thanks for the updated KIP. +1
>> > > > >> > > >> > > >
>> > > > >> > > >> > > > Jun
>> > > > >> > > >> > > >
>> > > > >> > > >> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <
>> > > > >> > [email protected]>
>> > > > >> > > >> > wrote:
>> > > > >> > > >> > > >
>> > > > >> > > >> > > > > Thanks for the comments, Jun.
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > > > 1. Good point.
>> > > > >> > > >> > > > > 2. Also makes sense. Usually the
>> > > connection.max.idle.ms
>> > > > is
>> > > > >> > high
>> > > > >> > > >> > enough
>> > > > >> > > >> > > > so
>> > > > >> > > >> > > > > the throttling is impacted.
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > > > I have updated the KIP to reflect the changes.
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > > > Thanks,
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > > > Jiangjie (Becket) Qin
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <
>> > > > [email protected]>
>> > > > >> > > wrote:
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > > > > Hi, Jiangjie,
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > > > Sorry for the late response. The proposal sounds
>> > good
>> > > > >> > > overall. A
>> > > > >> > > >> > > couple
>> > > > >> > > >> > > > > of
>> > > > >> > > >> > > > > > minor comments below.
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > > > 1. For throttling a fetch request, we could
>> > > potentially
>> > > > >> just
>> > > > >> > > >> send
>> > > > >> > > >> > an
>> > > > >> > > >> > > > > empty
>> > > > >> > > >> > > > > > response. We can return a throttle time
>> calculated
>> > > > from a
>> > > > >> > full
>> > > > >> > > >> > > > response,
>> > > > >> > > >> > > > > > but only mute the channel on the server based
>> on a
>> > > > >> throttle
>> > > > >> > > time
>> > > > >> > > >> > > > > calculated
>> > > > >> > > >> > > > > > based on the empty response. This has the
>> benefit
>> > > that
>> > > > >> the
>> > > > >> > > >> server
>> > > > >> > > >> > > will
>> > > > >> > > >> > > > > mute
>> > > > >> > > >> > > > > > the channel much shorter, which will prevent the
>> > > > consumer
>> > > > >> > from
>> > > > >> > > >> > > > > rebalancing
>> > > > >> > > >> > > > > > when throttled.
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > > > 2. The wiki says "connections.max.idle.ms
>> should
>> > be
>> > > > >> ignored
>> > > > >> > > >> during
>> > > > >> > > >> > > the
>> > > > >> > > >> > > > > > throttle time X." This has the potential issue
>> > that a
>> > > > >> server
>> > > > >> > > may
>> > > > >> > > >> > not
>> > > > >> > > >> > > > > detect
>> > > > >> > > >> > > > > > that a client connection is already gone until
>> > after
>> > > an
>> > > > >> > > >> arbitrary
>> > > > >> > > >> > > > amount
>> > > > >> > > >> > > > > of
>> > > > >> > > >> > > > > > time. Perhaps we could still just close a
>> > connection
>> > > if
>> > > > >> the
>> > > > >> > > >> server
>> > > > >> > > >> > > has
>> > > > >> > > >> > > > > > muted it for longer than
>> connections.max.idle.ms.
>> > > This
>> > > > >> will
>> > > > >> > > at
>> > > > >> > > >> > least
>> > > > >> > > >> > > > > bound
>> > > > >> > > >> > > > > > the time for a server to detect closed client
>> > > > >> connections.
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > > > Thanks,
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > > > Jun
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <
>> > > > >> > > >> [email protected]>
>> > > > >> > > >> > > > > wrote:
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > > > > Hi,
>> > > > >> > > >> > > > > > >
>> > > > >> > > >> > > > > > > We would like to start the voting thread for
>> > > KIP-219.
>> > > > >> The
>> > > > >> > > KIP
>> > > > >> > > >> > > > proposes
>> > > > >> > > >> > > > > to
>> > > > >> > > >> > > > > > > improve the quota communication between the
>> > brokers
>> > > > and
>> > > > >> > > >> clients,
>> > > > >> > > >> > > > > > especially
>> > > > >> > > >> > > > > > > for cases of long throttling time.
>> > > > >> > > >> > > > > > >
>> > > > >> > > >> > > > > > > The KIP wiki is following:
>> > > > >> > > >> > > > > > > https://cwiki.apache.org/
>> > > > confluence/display/KAFKA/KIP-
>> > > > >> > > >> > > > > > 219+-+Improve+quota+
>> > > > >> > > >> > > > > > > communication
>> > > > >> > > >> > > > > > >
>> > > > >> > > >> > > > > > > The discussion thread is here:
>> > > > >> > > >> > > > > > > http://markmail.org/search/?q=
>> > > > >> kafka+KIP-219#query:kafka%
>> > > > >> > > >> > > > > > > 20KIP-219+page:1+mid:
>> > > ooxabguy7nz7l7zy+state:results
>> > > > >> > > >> > > > > > >
>> > > > >> > > >> > > > > > > Thanks,
>> > > > >> > > >> > > > > > >
>> > > > >> > > >> > > > > > > Jiangjie (Becket) Qin
>> > > > >> > > >> > > > > > >
>> > > > >> > > >> > > > > >
>> > > > >> > > >> > > > >
>> > > > >> > > >> > > >
>> > > > >> > > >> > >
>> > > > >> > > >> >
>> > > > >> > > >>
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Reply via email to