In the AdminClient, we allow setting per-call timeouts.  The global
timeout is just a default.  It seems like that is really what we should
do in the producer and consumer as well, rather than having a lot of
special cases for timeouts in connecting vs. other call states.  Then
join requests could have a 5 minute timeout, but other requests could
have a shorter one.  Thoughts?
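
A minimal Python sketch of the per-call idea, purely illustrative and not
Kafka's actual Java code. The names Call, effective_timeout_ms and
has_timed_out, and both timeout values, are assumptions made up for this
example: each call may carry its own timeout, and the client-wide setting
is used only when none is given.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration only.
DEFAULT_REQUEST_TIMEOUT_MS = 30_000
JOIN_GROUP_TIMEOUT_MS = 5 * 60 * 1000


@dataclass
class Call:
    """A pending request with an optional per-call timeout override."""
    name: str
    timeout_ms: Optional[int] = None  # None means "use the client default"


def effective_timeout_ms(call: Call,
                         default_ms: int = DEFAULT_REQUEST_TIMEOUT_MS) -> int:
    """Return the per-call timeout if set, otherwise the client-wide default."""
    return default_ms if call.timeout_ms is None else call.timeout_ms


def has_timed_out(call: Call, sent_at_ms: int, now_ms: int,
                  default_ms: int = DEFAULT_REQUEST_TIMEOUT_MS) -> bool:
    """True once the call has been outstanding for at least its timeout."""
    return now_ms - sent_at_ms >= effective_timeout_ms(call, default_ms)
```

With this shape, Call("JoinGroup", JOIN_GROUP_TIMEOUT_MS) keeps its long
timeout while Call("Metadata") falls back to the shorter default, so no
connection-state special cases are needed.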

Cheers,
Colin
 
On Tue, May 23, 2017, at 04:27, Rajini Sivaram wrote:
> Guozhang,
> 
> At the moment we don't have a connect timeout, and the behaviour suggested
> in the KIP is useful to address this.
> 
> We do however have a request.timeout.ms. This is the amount of time it
> would take to detect a crashed broker if the broker crashed after a
> connection was established. Unfortunately in the consumer, this was
> increased to more than 5 minutes since JoinRequest can take up to
> max.poll.interval.ms, which has a default of 5 minutes. Since the
> whole point of this timeout is to detect a crashed broker, 5 minutes is
> too large.
> 
> My suggestion was to use request.timeout.ms to also detect connection
> timeouts to a crashed broker - implement the behavior suggested in the
> KIP without adding a new config parameter. As Ismael has said, this will
> require fixing request.timeout.ms in the consumer.
> 
> 
> On Mon, May 22, 2017 at 1:23 PM, Simon Souter <sim...@cakesolutions.net>
> wrote:
> 
> > The following tickets are probably relevant to this KIP:
> >
> > https://issues.apache.org/jira/browse/KAFKA-3457
> > https://issues.apache.org/jira/browse/KAFKA-1894
> > https://issues.apache.org/jira/browse/KAFKA-3834
> >
> > On 22 May 2017 at 16:30, Rajini Sivaram <rajinisiva...@gmail.com> wrote:
> >
> > > Ismael,
> > >
> > > > Yes, agree. My concern was that a connection can be shut down uncleanly
> > > > at any time. If a client is in the middle of a request, then it times
> > > > out after min(request.timeout.ms, tcp-timeout). If we add another config
> > > > option connect.timeout.ms, then we will sometimes wait for
> > > > min(connect.timeout.ms, tcp-timeout) and sometimes for
> > > > min(request.timeout.ms, tcp-timeout), depending on connection state.
> > > > One config option feels neater to me.
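
To illustrate the comparison above, here is a small Python sketch of how
long a client would wait before noticing a dead peer under each option. It
is only a model: ConnectionState and detection_timeout_ms are hypothetical
names, and tcp_timeout_ms stands in for whatever OS-level TCP timeout
applies.

```python
from enum import Enum
from typing import Optional


class ConnectionState(Enum):
    CONNECTING = "connecting"
    CONNECTED = "connected"


def detection_timeout_ms(state: ConnectionState,
                         request_timeout_ms: int,
                         tcp_timeout_ms: int,
                         connect_timeout_ms: Optional[int] = None) -> int:
    """Time until an unclean shutdown is noticed in the given state.

    With connect_timeout_ms=None (the single-config option), both states
    share min(request.timeout.ms, tcp-timeout). With a separate connect
    timeout, the answer depends on the connection state.
    """
    if state is ConnectionState.CONNECTING and connect_timeout_ms is not None:
        return min(connect_timeout_ms, tcp_timeout_ms)
    return min(request_timeout_ms, tcp_timeout_ms)
```

With one config the result is the same in both states; with two it differs
between CONNECTING and CONNECTED, which is the asymmetry described above.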
> > >
> > > On Mon, May 22, 2017 at 11:21 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> > >
> > > > Rajini,
> > > >
> > > > > For this to have the desired effect, we'd probably need to lower the
> > > > > default request.timeout.ms for the consumer and fix the underlying
> > > > > reason why it is a little over 5 minutes at the moment.
> > > >
> > > > Ismael
> > > >
> > > > > On Mon, May 22, 2017 at 4:15 PM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > > > Sorry, what I meant was: Can you reuse the existing configuration
> > > > > > option request.timeout.ms, instead of adding a new config, and add
> > > > > > the behaviour that you have proposed in the KIP for the connection
> > > > > > phase using this timeout? I think the timeout for connection is
> > > > > > useful. I am not sure we need another configuration option to
> > > > > > implement it.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Rajini
> > > > >
> > > > >
> > > > > On Mon, May 22, 2017 at 11:06 AM, 东方甲乙 <254479...@qq.com> wrote:
> > > > >
> > > > > > Hi Rajini.
> > > > > >
> > > > > > > When a Kafka node's machine is shut down or its network is down,
> > > > > > > the connecting phase cannot use request.timeout.ms, because the
> > > > > > > client hasn't sent a request yet. And since NIO gets no response,
> > > > > > > the selector will not close the connection, so the client will not
> > > > > > > choose another good node to get the metadata from.
> > > > > >
> > > > > >
> > > > > > Thanks
> > > > > > David
> > > > > >
> > > > > > ------------------ 原始邮件 ------------------
> > > > > > *发件人:* "Rajini Sivaram" <rajinisiva...@gmail.com>;
> > > > > > *发送时间:* 2017年5月22日(星期一) 20:17
> > > > > > *收件人:* "dev" <dev@kafka.apache.org>;
> > > > > > *主题:* Re: [DISCUSS] KIP-148: Add a connect timeout for client
> > > > > >
> > > > > >
> > > > > > Hi David,
> > > > > >
> > > > > > > Is there a reason why you wouldn't want to use request.timeout.ms
> > > > > > > as the timeout parameter for connections? Then you would use the
> > > > > > > same timeout for connected and connecting phases when shutdown is
> > > > > > > unclean. You could still use the timeout to ensure that the next
> > > > > > > metadata request is sent to another node.
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Rajini
> > > > > >
> > > > > > On Sun, May 21, 2017 at 9:51 AM, 东方甲乙 <254479...@qq.com> wrote:
> > > > > >
> > > > > > > Hi Guozhang,
> > > > > > >
> > > > > > >
> > > > > > > > Thanks for the clarification. On point 2, I think the key thing
> > > > > > > > is not whether users can control the maximum time to wait inside
> > > > > > > > the code, but whether the network client can become aware that
> > > > > > > > the connection attempt cannot complete and try a good node
> > > > > > > > instead. In the producer's sender, even though selector.poll can
> > > > > > > > time out, the next iteration still does not close the previous
> > > > > > > > connection attempt and try another good node.
> > > > > > >
> > > > > > >
> > > > > > > > In our test env, QA shut down one of the leader nodes. The
> > > > > > > > producer's send request times out, the client closes that node's
> > > > > > > > connection, and then requests the metadata. But sometimes the
> > > > > > > > node chosen for the metadata request is also the shut-down node.
> > > > > > > > When connecting to the shut-down node to get the metadata, the
> > > > > > > > client is in the connecting phase: the network client marks the
> > > > > > > > node's state as CONNECTING, but because the node is shut down,
> > > > > > > > the socket cannot tell that the connection attempt is broken.
> > > > > > > > Although selector.poll has a timeout parameter, it will not
> > > > > > > > close the connection, so the next time through
> > > > > > > > "networkclient.maybeUpdate" the isAnyNodeConnecting check is
> > > > > > > > true and the client will not connect to any good node to get the
> > > > > > > > metadata. It takes several minutes for the client to notice that
> > > > > > > > the connection attempt has timed out and try another node.
> > > > > > >
> > > > > > >
> > > > > > > > So I want to add a connect.timeout parameter so that the
> > > > > > > > selector can detect that a connection attempt has timed out and
> > > > > > > > close the connection. It seems that the timeout value currently
> > > > > > > > passed to `selector.poll()` cannot do this.
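
A rough Python sketch of the behaviour being proposed here, under stated
assumptions: ConnectTracker and pick_metadata_node are hypothetical names,
not Kafka classes, and the permanent "unreachable" set is a simplification
(a real client would more likely retry a node after some backoff). The
point is only that an attempt stuck in the connecting phase is expired
after a connect timeout, so it no longer blocks choosing another node.

```python
from typing import Dict, Iterable, List, Optional, Set


class ConnectTracker:
    """Tracks in-progress connection attempts and expires the stale ones."""

    def __init__(self, connect_timeout_ms: int) -> None:
        self.connect_timeout_ms = connect_timeout_ms
        self._connecting_since: Dict[str, int] = {}  # node id -> start time

    def start_connecting(self, node: str, now_ms: int) -> None:
        self._connecting_since[node] = now_ms

    def connected(self, node: str) -> None:
        self._connecting_since.pop(node, None)

    def is_any_node_connecting(self) -> bool:
        return bool(self._connecting_since)

    def expire(self, now_ms: int) -> List[str]:
        """Drop and return nodes whose attempt has exceeded the timeout."""
        expired = [node for node, started in self._connecting_since.items()
                   if now_ms - started >= self.connect_timeout_ms]
        for node in expired:
            del self._connecting_since[node]
        return expired


def pick_metadata_node(nodes: Iterable[str], tracker: ConnectTracker,
                       unreachable: Set[str], now_ms: int) -> Optional[str]:
    """Choose a node for the next metadata request, or None to keep waiting."""
    unreachable.update(tracker.expire(now_ms))
    if tracker.is_any_node_connecting():
        return None  # an attempt is still within its connect timeout
    for node in nodes:
        if node not in unreachable:
            tracker.start_connecting(node, now_ms)
            return node
    return None
```

Without the expire step, a node stuck in CONNECTING keeps
is_any_node_connecting() true and no other node is ever tried, which
matches the several-minute stall described above.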
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > David
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ------------------ 原始邮件 ------------------
> > > > > > > 发件人: "Guozhang Wang";<wangg...@gmail.com>;
> > > > > > > 发送时间: 2017年5月16日(星期二) 凌晨1:51
> > > > > > > 收件人: "dev@kafka.apache.org"<dev@kafka.apache.org>;
> > > > > > >
> > > > > > > 主题: Re: [DISCUSS] KIP-148: Add a connect timeout for client
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Hi David,
> > > > > > >
> > > > > > > > I may have been a bit confused before, just clarifying a few
> > > > > > > > things:
> > > > > > > >
> > > > > > > > 1. As you mentioned, a client will always try to first establish
> > > > > > > > the connection with a broker node before it tries to send any
> > > > > > > > request to it. And after the connection is established, it will
> > > > > > > > either continuously send many requests (e.g. produce) or just a
> > > > > > > > single request (e.g. metadata) to the broker, so these two
> > > > > > > > phases are indeed different.
> > > > > > > >
> > > > > > > > 2. In the connected phase, connections.max.idle.ms is used to
> > > > > > > > auto-disconnect the socket if no requests have been sent /
> > > > > > > > received during that period of time; in the connecting phase, we
> > > > > > > > always try to create the socket via "socketChannel.connect" in a
> > > > > > > > non-blocking call, and then check if the connection has been
> > > > > > > > established, but all the callers of this function (in either
> > > > > > > > producer or consumer) have a timeout parameter as in
> > > > > > > > `selector.poll()`, and the timeout parameter is set either by
> > > > > > > > calculations based on metadata.expiration.time and backoff for
> > > > > > > > producer#sender, or by directly passed values from
> > > > > > > > consumer#poll(timeout), so although there is no config directly
> > > > > > > > controlling that, users can still control the maximum time to
> > > > > > > > wait inside the code.
> > > > > > >
> > > > > > > > I originally thought your scenario was more about the connected
> > > > > > > > phase, but now I feel you are talking about the connecting
> > > > > > > > phase. For that case, I still feel the timeout value currently
> > > > > > > > passed in `selector.poll()`, which is controllable from user
> > > > > > > > code, should be sufficient?
> > > > > > >
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sun, May 14, 2017 at 2:37 AM, 东方甲乙 <254479...@qq.com> wrote:
> > > > > > >
> > > > > > > > Hi Guozhang,
> > > > > > > >
> > > > > > > >
> > > > > > > > > Sorry for the delay, and thanks for the question. They seem
> > > > > > > > > like two different parameters to me:
> > > > > > > > > connect.timeout.ms: only applies to the connecting phase; once
> > > > > > > > > connected, this parameter is not used.
> > > > > > > > > connections.max.idle.ms: currently does not apply in the
> > > > > > > > > connecting phase (a connection is added to the expiry manager
> > > > > > > > > only when select returns readyKeys > 0); after connecting, it
> > > > > > > > > checks whether the connection is still alive within some time.
> > > > > > > >
> > > > > > > >
> > > > > > > > > Even if we change connections.max.idle.ms to also cover the
> > > > > > > > > connecting phase, we cannot set this parameter to a small
> > > > > > > > > value, such as 5 seconds. Because the client may be busy
> > > > > > > > > sending messages to other nodes, an idle connection would be
> > > > > > > > > disconnected after 5 seconds, which is why the default value
> > > > > > > > > of connections.max.idle.ms is set to a larger time. We should
> > > > > > > > > have two parameters to control the connecting phase behavior
> > > > > > > > > and the connected phase behavior, don't you think?
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > >
> > > > > > > > David
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > ------------------ 原始邮件 ------------------
> > > > > > > > 发件人: "Guozhang Wang";<wangg...@gmail.com>;
> > > > > > > > 发送时间: 2017年5月6日(星期六) 上午7:52
> > > > > > > > 收件人: "dev@kafka.apache.org"<dev@kafka.apache.org>;
> > > > > > > >
> > > > > > > > 主题: Re: [DISCUSS] KIP-148: Add a connect timeout for client
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Hello David,
> > > > > > > >
> > > > > > > > > Thanks for the KIP. For the described issue, I'm wondering if
> > > > > > > > > it can be resolved by tuning the CONNECTIONS_MAX_IDLE_MS_CONFIG
> > > > > > > > > (connections.max.idle.ms) on the client side? Default is 9
> > > > > > > > > minutes.
> > > > > > > >
> > > > > > > >
> > > > > > > > Guozhang
> > > > > > > >
> > > > > > > > On Tue, May 2, 2017 at 8:22 AM, 东方甲乙 <254479...@qq.com> wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > > Currently in our test environment, we found that after one
> > > > > > > > > > of the broker nodes crashes (reboot or OS crash), the client
> > > > > > > > > > may still be trying to connect to the crashed node to send a
> > > > > > > > > > metadata request or other request, and it needs several
> > > > > > > > > > minutes to become aware that the connection has timed out
> > > > > > > > > > and then try another node to send the request. As a result,
> > > > > > > > > > the client may still not be aware of the metadata change
> > > > > > > > > > after several minutes.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > So I want to add a connect timeout on the client, please
> > > > > > > > > > take a look at:
> > > > > > > > > >
> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-148%3A+Add+a+connect+timeout+for+client
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > >
> > > > > > > > > David
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > -- Guozhang
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Simon Souter
> >
