Cheng,

Thanks for the KIP and the detailed proposal section. LGTM!

On Thu, May 7, 2020 at 3:38 PM Cheng Tan <c...@confluent.io> wrote:
>
> I think more about the potential wider use cases, I modified the proposal to 
> target all the connection. Thanks.
>
> - Best, - Cheng Tan
>
> > On May 7, 2020, at 1:41 AM, Cheng Tan <c...@confluent.io> wrote:
> >
> > Hi Colin,
> >
> > Sorry for the confusion. I’m proposing to implement timeout in the 
> > NetworkClient.leastLoadedNode() when iterating all the cached node. The 
> > alternative I can think is to implement the timeout in NetworkClient.poll()
> >
> > I’d prefer to implement in the leastLoadedNode(). Here’re the reasons:
> > Usually when clients send a request, they will asking the network client to 
> > send the request to a specific node. In this case, the 
> > connection.setup.timeout won’t matter too much because the client doesn’t 
> > want to try other nodes for that specific request. The request level 
> > timeout would be enough. The metadata fetcher fetches the nodes status 
> > periodically so the clients can reassign the request to another node after 
> > timeout.
> > Consumer, producer, and AdminClient are all using leastLoadedNode() for 
> > metadata fetch, where the connection setup timeout can play an important 
> > role. Unlike other requests can refer to the metadata for node condition, 
> > the metadata requests can only blindly choose a node for retry in the worst 
> > scenario. We want to make sure the client can get the metadata smoothly and 
> > as soon as possible. As a result, we need this connection.setup.timeout.
> > Implementing the timeout in poll() or anywhere else might need an extra 
> > iteration of all nodes, which might downgrade the network client 
> > performance.
> > I also updated the KIP content and KIP status. Please let me know if the 
> > above ideas make sense. Thanks.
> >
> > Best, - Cheng Tan
> >
> >
> >
> >> On May 4, 2020, at 5:26 PM, Colin McCabe <cmcc...@apache.org 
> >> <mailto:cmcc...@apache.org>> wrote:
> >>
> >> Hi Cheng,
> >>
> >> On the KIP page, it lists this KIP as "draft."  It seems like "under 
> >> discussion" is appropriate here, right?
> >>
> >>> Currently, the initial socket connection timeout is depending on Linux 
> >>> kernel setting
> >>> tcp_syn_retries. The timeout value is 2 ^ (tcp_sync_retries + 1) - 1 
> >>> seconds. For the
> >>> reasons below, we want to control the client-side socket timeout directly 
> >>> using
> >>> configuration files
> >>
> >> Linux is just one example of an OS that Kafka could run on, right?  You 
> >> could also be running on MacOS, for example.
> >>
> >>> I'm proposing to do a lazy socket connection time out. That is, we only 
> >>> check if
> >>> we need to timeout a socket when we consider the corresponding node as a
> >>> candidate in the node provider.
> >>
> >> The NodeProvider is an AdminClient abstraction, right?  Why wouldn't we 
> >> implement a connection setup timeout for all clients, not just AdminClient?
> >>
> >> best,
> >> Colin
> >>
> >> On Mon, May 4, 2020, at 13:18, Colin McCabe wrote:
> >>> Hmm.  A big part of the reason behind the KIP is that the default
> >>> connection timeout behavior of the OS doesn't work for Kafka, right?
> >>> For example, on Linux, if we wait 127 seconds for a connection attempt
> >>> to time out, we won't get a chance to make another attempt in most
> >>> cases.  So I think it makes sense to set a shorter default.
> >>>
> >>> best,
> >>> Colin
> >>>
> >>>
> >>> On Mon, May 4, 2020, at 09:44, Jose Garcia Sancio wrote:
> >>>> Thanks for the KIP Cheng,
> >>>>
> >>>>> The default value will be 10 seconds.
> >>>>
> >>>> I think we should make the default the current behavior. Meaning the
> >>>> default should leverage the default connect timeout from the operating
> >>>> system.
> >>>>
> >>>>> Proposed Changes
> >>>>
> >>>> I don't fully understand this section. It seems like it is mainly
> >>>> focused on the problem with the current implementation. Can you
> >>>> explain how the proposed changes solve the problem?
> >>>>
> >>>> Thanks.
> >>>>
> >>>>
> >>>> --
> >>>> -Jose
> >>>>
> >>>
> >
>


-- 
-Jose

Reply via email to