Cheng, Thanks for the KIP and the detailed proposal section. LGTM!
On Thu, May 7, 2020 at 3:38 PM Cheng Tan <c...@confluent.io> wrote: > > I think more about the potential wider use cases, I modified the proposal to > target all the connection. Thanks. > > - Best, - Cheng Tan > > > On May 7, 2020, at 1:41 AM, Cheng Tan <c...@confluent.io> wrote: > > > > Hi Colin, > > > > Sorry for the confusion. I’m proposing to implement timeout in the > > NetworkClient.leastLoadedNode() when iterating all the cached node. The > > alternative I can think is to implement the timeout in NetworkClient.poll() > > > > I’d prefer to implement in the leastLoadedNode(). Here’re the reasons: > > Usually when clients send a request, they will asking the network client to > > send the request to a specific node. In this case, the > > connection.setup.timeout won’t matter too much because the client doesn’t > > want to try other nodes for that specific request. The request level > > timeout would be enough. The metadata fetcher fetches the nodes status > > periodically so the clients can reassign the request to another node after > > timeout. > > Consumer, producer, and AdminClient are all using leastLoadedNode() for > > metadata fetch, where the connection setup timeout can play an important > > role. Unlike other requests can refer to the metadata for node condition, > > the metadata requests can only blindly choose a node for retry in the worst > > scenario. We want to make sure the client can get the metadata smoothly and > > as soon as possible. As a result, we need this connection.setup.timeout. > > Implementing the timeout in poll() or anywhere else might need an extra > > iteration of all nodes, which might downgrade the network client > > performance. > > I also updated the KIP content and KIP status. Please let me know if the > > above ideas make sense. Thanks. > > > > Best, - Cheng Tan > > > > > > > >> On May 4, 2020, at 5:26 PM, Colin McCabe <cmcc...@apache.org > >> <mailto:cmcc...@apache.org>> wrote: > >> > >> Hi Cheng, > >> > >> On the KIP page, it lists this KIP as "draft." It seems like "under > >> discussion" is appropriate here, right? > >> > >>> Currently, the initial socket connection timeout is depending on Linux > >>> kernel setting > >>> tcp_syn_retries. The timeout value is 2 ^ (tcp_sync_retries + 1) - 1 > >>> seconds. For the > >>> reasons below, we want to control the client-side socket timeout directly > >>> using > >>> configuration files > >> > >> Linux is just one example of an OS that Kafka could run on, right? You > >> could also be running on MacOS, for example. > >> > >>> I'm proposing to do a lazy socket connection time out. That is, we only > >>> check if > >>> we need to timeout a socket when we consider the corresponding node as a > >>> candidate in the node provider. > >> > >> The NodeProvider is an AdminClient abstraction, right? Why wouldn't we > >> implement a connection setup timeout for all clients, not just AdminClient? > >> > >> best, > >> Colin > >> > >> On Mon, May 4, 2020, at 13:18, Colin McCabe wrote: > >>> Hmm. A big part of the reason behind the KIP is that the default > >>> connection timeout behavior of the OS doesn't work for Kafka, right? > >>> For example, on Linux, if we wait 127 seconds for a connection attempt > >>> to time out, we won't get a chance to make another attempt in most > >>> cases. So I think it makes sense to set a shorter default. > >>> > >>> best, > >>> Colin > >>> > >>> > >>> On Mon, May 4, 2020, at 09:44, Jose Garcia Sancio wrote: > >>>> Thanks for the KIP Cheng, > >>>> > >>>>> The default value will be 10 seconds. > >>>> > >>>> I think we should make the default the current behavior. Meaning the > >>>> default should leverage the default connect timeout from the operating > >>>> system. > >>>> > >>>>> Proposed Changes > >>>> > >>>> I don't fully understand this section. It seems like it is mainly > >>>> focused on the problem with the current implementation. Can you > >>>> explain how the proposed changes solve the problem? > >>>> > >>>> Thanks. > >>>> > >>>> > >>>> -- > >>>> -Jose > >>>> > >>> > > > -- -Jose