Re: [VOTE] KIP-219 - Improve Quota Communication

Jonghyun Lee Fri, 27 Apr 2018 12:35:08 -0700

Thanks, Becket.

Assuming that requiring new client implementations to issue
ApiVersionsRequest before issuing any other requests is reasonable as you
mentioned, I am wondering if keeping track of brokers that support KIP-219
is actually simpler than keeping a map from each request type to its min
version number supporting KIP-219 and checking the version of every single
request to see whether or not to throttle from the client side. To me, it
looks like a broker property that can be cached, and checking every single
request seems excessive.


Thanks,
Jon




On Fri, Apr 27, 2018 at 11:36 AM, Becket Qin <becket....@gmail.com> wrote:

> The reason we wanted to bump up all the request versions was let the
> clients know whether the broker has already throttled the request or not.
> This avoids double throttling in both brokers and clients, if the clients
> implementation supports KIP-219.
>
> The new change uses only ApiVersionRequest, instead of all the requests, to
> indicate whether the brokers have applied throttling or not. The difference
> is that all the clients must issue ApiVersionRequest first before issuing
> any other requests. This has no impact on the existing clients. But for
> clients implementation that wants to support KIP-219, they need to follow
> this practice, which seems to be reasonable.
> Initially I thought the change is fine. However, thinking about this again,
> that means the clients implementation needs to remember the ApiVersions of
> each broker, which may have some complexity. Also this seems introduces the
> behavior dependency between different requests, which seems unnecessary.
>
> Due to the above reasons, I think it might be better to bump up all the
> request versions.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Wed, Apr 25, 2018 at 3:19 PM, Jonghyun Lee <jonghy...@gmail.com> wrote:
>
> > Hello,
> >
> > Per Ismael's suggestion, I'd like to get comments from the original
> voters
> > for KIP-219 (Becket, Jun, Rajini) and others about the new interface
> change
> > proposed in the discussion thread (
> > https://markmail.org/search/?q=kafka+KIP-219#query:kafka%
> > 20KIP-219+page:1+mid:5rwju2gwpicojr3f+state:results).
> > Would you let me know?
> >
> > Thanks,
> > Jon
> >
> >
> > On Wed, Apr 25, 2018 at 2:43 PM, Dong Lin <lindon...@gmail.com> wrote:
> >
> > > +Jon
> > >
> > > On Mon, 22 Jan 2018 at 10:38 AM Becket Qin <becket....@gmail.com>
> wrote:
> > >
> > >> Thanks for the discussion and voting.
> > >>
> > >> KIP-219 has passed with +3 binding votes (Becket, Jun, Rajini).
> > >>
> > >> On Thu, Jan 18, 2018 at 1:32 AM, Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Becket,
> > >> >
> > >> > Thanks for the update. Yes, it does address my concern.
> > >> >
> > >> > +1 (binding)
> > >> >
> > >> > Regards,
> > >> >
> > >> > Rajini
> > >> >
> > >> > On Wed, Jan 17, 2018 at 5:24 PM, Becket Qin <becket....@gmail.com>
> > >> wrote:
> > >> >
> > >> > > Actually returning an empty fetch request may still be useful to
> > >> reduce
> > >> > the
> > >> > > throttle time due to request quota violation because the
> > FetchResponse
> > >> > send
> > >> > > time will be less. I just updated the KIP.
> > >> > >
> > >> > > Rajini, does that address your concern?
> > >> > >
> > >> > > On Tue, Jan 16, 2018 at 7:01 PM, Becket Qin <becket....@gmail.com
> >
> > >> > wrote:
> > >> > >
> > >> > > > Thanks for the reply, Jun.
> > >> > > >
> > >> > > > Currently the byte rate quota does not apply to
> HeartbeatRequest,
> > >> > > > JoinGroupRequest/SyncGroupRequest. So the only case those
> > requests
> > >> are
> > >> > > > throttled is because the request quota is violated. In that
> case,
> > >> the
> > >> > > > throttle time does not really matter whether we return a full
> > >> > > FetchResponse
> > >> > > > or an empty one. Would it be more consistent if we throttle
> based
> > on
> > >> > the
> > >> > > > actual throttle time / channel mute time?
> > >> > > >
> > >> > > > Thanks,
> > >> > > >
> > >> > > > Jiangjie (Becket) Qin
> > >> > > >
> > >> > > > On Tue, Jan 16, 2018 at 3:45 PM, Jun Rao <j...@confluent.io>
> > wrote:
> > >> > > >
> > >> > > >> Hi, Jiangjie,
> > >> > > >>
> > >> > > >> You are right that the heartbeat is done in a channel different
> > >> from
> > >> > the
> > >> > > >> fetch request. I think it's still useful to return an empty
> fetch
> > >> > > response
> > >> > > >> when the quota is violated. This way, the throttle time for the
> > >> > > heartbeat
> > >> > > >> request won't be large. I agree that we can just mute the
> channel
> > >> for
> > >> > > the
> > >> > > >> fetch request for the throttle time computed based on a full
> > fetch
> > >> > > >> response. This probably also partially addresses Rajini's #1
> > >> concern.
> > >> > > >>
> > >> > > >> Thanks,
> > >> > > >>
> > >> > > >> Jun
> > >> > > >>
> > >> > > >> On Mon, Jan 15, 2018 at 9:27 PM, Becket Qin <
> > becket....@gmail.com>
> > >> > > wrote:
> > >> > > >>
> > >> > > >> > Hi Rajini,
> > >> > > >> >
> > >> > > >> > Thanks for the comments. Pleas see the reply inline.
> > >> > > >> >
> > >> > > >> > Hi Jun,
> > >> > > >> >
> > >> > > >> > Thinking about the consumer rebalance case a bit more, I am
> not
> > >> sure
> > >> > > if
> > >> > > >> we
> > >> > > >> > need to worry about the delayed rebalance due to quota
> > violation
> > >> or
> > >> > > not.
> > >> > > >> > The rebalance actually uses a separate channel to
> coordinator.
> > >> > > Therefore
> > >> > > >> > unless the request quota is hit, the rebalance won't be
> > >> throttled.
> > >> > > Even
> > >> > > >> if
> > >> > > >> > request quota is hit, it seems unlikely to be delayed long.
> So
> > it
> > >> > > looks
> > >> > > >> > that we don't need to unmute the channel earlier than needed.
> > >> Does
> > >> > > this
> > >> > > >> > address your concern?
> > >> > > >> >
> > >> > > >> > Thanks,
> > >> > > >> >
> > >> > > >> > Jiangjie (Becket) Qin
> > >> > > >> >
> > >> > > >> > On Mon, Jan 15, 2018 at 4:22 AM, Rajini Sivaram <
> > >> > > >> rajinisiva...@gmail.com>
> > >> > > >> > wrote:
> > >> > > >> >
> > >> > > >> > > Hi Becket,
> > >> > > >> > >
> > >> > > >> > > A few questions:
> > >> > > >> > >
> > >> > > >> > > 1. KIP says: *Although older client implementations (prior
> to
> > >> > > >> knowledge
> > >> > > >> > of
> > >> > > >> > > this KIP) will immediately send the next request after the
> > >> broker
> > >> > > >> > responds
> > >> > > >> > > without paying attention to the throttle time field, the
> > >> broker is
> > >> > > >> > > protected by virtue of muting the channel for time X. i.e.,
> > the
> > >> > next
> > >> > > >> > > request will not be processed until the channel is
> unmuted. *
> > >> > > >> > > For fetch requests, the response is sent immediately and
> the
> > >> mute
> > >> > > >> time on
> > >> > > >> > > the broker based on empty fetch response will often be zero
> > >> > (unlike
> > >> > > >> the
> > >> > > >> > > throttle time returned to the client based on non-empty
> > >> response).
> > >> > > >> Won't
> > >> > > >> > > that lead to a tight loop of fetch requests from old
> clients
> > >> > > >> > (particularly
> > >> > > >> > > expensive with SSL)? Wouldn't it be better to retain
> current
> > >> > > behaviour
> > >> > > >> > for
> > >> > > >> > > old clients? Also, if we change the behaviour for old
> > clients,
> > >> > > client
> > >> > > >> > > metrics that track throttle time will report a lot of
> > throttle
> > >> > > >> unrelated
> > >> > > >> > to
> > >> > > >> > >  actual throttle time.
> > >> > > >> > >
> > >> > > >> > For consumers, if quota is violated, the throttle time on the
> > >> broker
> > >> > > >> will
> > >> > > >> > not be 0. It is just that the throttle time will not be
> > >> increasing
> > >> > > >> because
> > >> > > >> > the consumer will return an empty response in this case. So
> > there
> > >> > > should
> > >> > > >> > not be a tight loop.
> > >> > > >> >
> > >> > > >> >
> > >> > > >> > > 2. KIP says: *The usual idle timeout i.e.,
> > >> > connections.max.idle.ms
> > >> > > >> > > <http://connections.max.idle.ms> will still be honored
> > during
> > >> the
> > >> > > >> > throttle
> > >> > > >> > > time X. This makes sure that the brokers will detect client
> > >> > > connection
> > >> > > >> > > closure in a bounded time.*
> > >> > > >> > > Wouldn't it be better to bound maximum throttle time to
> > >> > > >> > > *connections.max.idle.ms
> > >> > > >> > > <http://connections.max.idle.ms>*? If we mute for a time
> > >> greater
> > >> > > than
> > >> > > >> > > *connections.max.idle.ms
> > >> > > >> > > <http://connections.max.idle.ms>* and then close a client
> > >> > > connection
> > >> > > >> > > simply
> > >> > > >> > > because we had muted it on the broker for a longer throttle
> > >> time,
> > >> > we
> > >> > > >> > force
> > >> > > >> > > a reconnection and read the next request from the new
> > >> connection
> > >> > > >> straight
> > >> > > >> > > away. This feels the same as a bound on the quota of *
> > >> > > >> > > connections.max.idle.ms
> > >> > > >> > > <http://connections.max.idle.ms>*, but with added load on
> > the
> > >> > > broker
> > >> > > >> for
> > >> > > >> > > authenticating another connection (expensive with SSL).
> > >> > > >> > >
> > >> > > >> > I think we need to think about the consumer prior to and
> after
> > >> this
> > >> > > KIP
> > >> > > >> > separately.
> > >> > > >> >
> > >> > > >> > For consumer prior to this KIP, even if we unmute the channel
> > >> after
> > >> > > >> > connection.max.idle.ms, it is likely that the consumers have
> > >> > already
> > >> > > >> > reached request.timeout.ms before that and has reconnected
> to
> > >> the
> > >> > > >> broker.
> > >> > > >> > So there is no real difference whether we close the throttled
> > >> > channel
> > >> > > or
> > >> > > >> > not.
> > >> > > >> >
> > >> > > >> > For consumers after the KIP, because they will honor the
> > throttle
> > >> > > time,
> > >> > > >> > they will back off until throttle time is reached. If that
> > >> throttle
> > >> > > >> time is
> > >> > > >> > longer than connection.max.idle.ms, it seems not a big
> > overhead
> > >> > > because
> > >> > > >> > there will only be one connection re-establishment in quite a
> > few
> > >> > > >> minutes.
> > >> > > >> > Compared with such overhead, it seems more important to honor
> > the
> > >> > > quota
> > >> > > >> so
> > >> > > >> > the broker can survive.
> > >> > > >> >
> > >> > > >> >
> > >> > > >> > > 3. Are we changing the behaviour of network bandwidth quota
> > for
> > >> > > >> > > Produce/Fetch and retaining current behaviour for request
> > >> quotas?
> > >> > > >> > >
> > >> > > >> > This is going to be applied to all the quota. Including
> request
> > >> > > quotas.
> > >> > > >> >
> > >> > > >> >
> > >> > > >> > >
> > >> > > >> > > Thanks,
> > >> > > >> > >
> > >> > > >> > > Rajini
> > >> > > >> > >
> > >> > > >> > >
> > >> > > >> > >
> > >> > > >> > > On Wed, Jan 10, 2018 at 10:29 PM, Jun Rao <
> j...@confluent.io>
> > >> > wrote:
> > >> > > >> > >
> > >> > > >> > > > Hi, Jiangjie,
> > >> > > >> > > >
> > >> > > >> > > > Thanks for the updated KIP. +1
> > >> > > >> > > >
> > >> > > >> > > > Jun
> > >> > > >> > > >
> > >> > > >> > > > On Mon, Jan 8, 2018 at 7:45 PM, Becket Qin <
> > >> > becket....@gmail.com>
> > >> > > >> > wrote:
> > >> > > >> > > >
> > >> > > >> > > > > Thanks for the comments, Jun.
> > >> > > >> > > > >
> > >> > > >> > > > > 1. Good point.
> > >> > > >> > > > > 2. Also makes sense. Usually the
> connection.max.idle.ms
> > is
> > >> > high
> > >> > > >> > enough
> > >> > > >> > > > so
> > >> > > >> > > > > the throttling is impacted.
> > >> > > >> > > > >
> > >> > > >> > > > > I have updated the KIP to reflect the changes.
> > >> > > >> > > > >
> > >> > > >> > > > > Thanks,
> > >> > > >> > > > >
> > >> > > >> > > > > Jiangjie (Becket) Qin
> > >> > > >> > > > >
> > >> > > >> > > > >
> > >> > > >> > > > > On Mon, Jan 8, 2018 at 6:30 PM, Jun Rao <
> > j...@confluent.io>
> > >> > > wrote:
> > >> > > >> > > > >
> > >> > > >> > > > > > Hi, Jiangjie,
> > >> > > >> > > > > >
> > >> > > >> > > > > > Sorry for the late response. The proposal sounds good
> > >> > > overall. A
> > >> > > >> > > couple
> > >> > > >> > > > > of
> > >> > > >> > > > > > minor comments below.
> > >> > > >> > > > > >
> > >> > > >> > > > > > 1. For throttling a fetch request, we could
> potentially
> > >> just
> > >> > > >> send
> > >> > > >> > an
> > >> > > >> > > > > empty
> > >> > > >> > > > > > response. We can return a throttle time calculated
> > from a
> > >> > full
> > >> > > >> > > > response,
> > >> > > >> > > > > > but only mute the channel on the server based on a
> > >> throttle
> > >> > > time
> > >> > > >> > > > > calculated
> > >> > > >> > > > > > based on the empty response. This has the benefit
> that
> > >> the
> > >> > > >> server
> > >> > > >> > > will
> > >> > > >> > > > > mute
> > >> > > >> > > > > > the channel much shorter, which will prevent the
> > consumer
> > >> > from
> > >> > > >> > > > > rebalancing
> > >> > > >> > > > > > when throttled.
> > >> > > >> > > > > >
> > >> > > >> > > > > > 2. The wiki says "connections.max.idle.ms should be
> > >> ignored
> > >> > > >> during
> > >> > > >> > > the
> > >> > > >> > > > > > throttle time X." This has the potential issue that a
> > >> server
> > >> > > may
> > >> > > >> > not
> > >> > > >> > > > > detect
> > >> > > >> > > > > > that a client connection is already gone until after
> an
> > >> > > >> arbitrary
> > >> > > >> > > > amount
> > >> > > >> > > > > of
> > >> > > >> > > > > > time. Perhaps we could still just close a connection
> if
> > >> the
> > >> > > >> server
> > >> > > >> > > has
> > >> > > >> > > > > > muted it for longer than connections.max.idle.ms.
> This
> > >> will
> > >> > > at
> > >> > > >> > least
> > >> > > >> > > > > bound
> > >> > > >> > > > > > the time for a server to detect closed client
> > >> connections.
> > >> > > >> > > > > >
> > >> > > >> > > > > > Thanks,
> > >> > > >> > > > > >
> > >> > > >> > > > > > Jun
> > >> > > >> > > > > >
> > >> > > >> > > > > >
> > >> > > >> > > > > > On Mon, Nov 20, 2017 at 5:30 PM, Becket Qin <
> > >> > > >> becket....@gmail.com>
> > >> > > >> > > > > wrote:
> > >> > > >> > > > > >
> > >> > > >> > > > > > > Hi,
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > We would like to start the voting thread for
> KIP-219.
> > >> The
> > >> > > KIP
> > >> > > >> > > > proposes
> > >> > > >> > > > > to
> > >> > > >> > > > > > > improve the quota communication between the brokers
> > and
> > >> > > >> clients,
> > >> > > >> > > > > > especially
> > >> > > >> > > > > > > for cases of long throttling time.
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > The KIP wiki is following:
> > >> > > >> > > > > > > https://cwiki.apache.org/
> > confluence/display/KAFKA/KIP-
> > >> > > >> > > > > > 219+-+Improve+quota+
> > >> > > >> > > > > > > communication
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > The discussion thread is here:
> > >> > > >> > > > > > > http://markmail.org/search/?q=
> > >> kafka+KIP-219#query:kafka%
> > >> > > >> > > > > > > 20KIP-219+page:1+mid:
> ooxabguy7nz7l7zy+state:results
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > Thanks,
> > >> > > >> > > > > > >
> > >> > > >> > > > > > > Jiangjie (Becket) Qin
> > >> > > >> > > > > > >
> > >> > > >> > > > > >
> > >> > > >> > > > >
> > >> > > >> > > >
> > >> > > >> > >
> > >> > > >> >
> > >> > > >>
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>

Re: [VOTE] KIP-219 - Improve Quota Communication

Reply via email to