Dear Rajini,

Thanks for all the feedback. It was very helpful for my brainstorming. I’ve incorporated our discussion into the KIP and started a voting thread.

Best,
- Cheng Tan

> On May 15, 2020, at 2:13 PM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:
>
> Hi Cheng,
>
> I am fine with the rest of the KIP apart from the 10s default. If no one
> else has any concerns about this new default, let's go with it. Please go
> ahead and start the vote.
>
> Regards,
>
> Rajini
>
>
> On Fri, May 15, 2020 at 8:21 PM Cheng Tan <c...@confluent.io> wrote:
>
>> Dear Rajini,
>>
>> Thanks for the reply.
>>
>>> We have a lot of these and I want to
>>> understand the benefits of the proposed timeout in this case alone. We
>>> currently have a request timeout of 30s. Would you consider adding a 10s
>>> connection timeout?
>>
>> A shorter timeout (10s) at the transport level will help clients
>> detect dead nodes faster. “request.timeout.ms” is too general and applies
>> to all requests, whose complexity at the application level varies. It’s
>> risky to set “request.timeout.ms” to a lower value to detect dead
>> nodes more quickly, because the application layer is involved.
>>
>> After “socket.connection.setup.timeout.ms” hits, NetworkClient will fail
>> the request in exactly the same way as it handles “request.timeout.ms”.
>> That is to say, the response will be constructed upon a RetriableException.
>> Producer, Consumer, and KafkaAdminClient can then perform their retry logic
>> as they do when a request timeout happens.
>>
>>> We have KIP-612 that is proposing to throttle connection set up on the one
>>> hand and this KIP that is dramatically reducing default connection timeout
>>> on the other. Not sure if that is a good idea.
>>
>> The default of the broker connection creation rate limit is Int.MaxValue.
>> The KIP also proposes per-IP throttle configuration. Thus, I don’t expect
>> the combination of the broker connection throttle and a shorter client
>> transport timeout to have a side effect.
>>
>> Do the reasons above make sense to you?
>>
>> Best,
>> - Cheng
>>
>>
>>> On May 15, 2020, at 4:49 AM, Rajini Sivaram <rajinisiva...@gmail.com> wrote:
>>>
>>> Hi Cheng,
>>>
>>> Let me rephrase my question. Let's say we didn't have the case of
>>> leastLoadedNode. We are only talking about connections to a specific node
>>> (i.e. leader or controller). We have a lot of these and I want to
>>> understand the benefits of the proposed timeout in this case alone. We
>>> currently have a request timeout of 30s. Would you consider adding a 10s
>>> connection timeout? And if you did, what would you expect the 10s timeout
>>> to do?
>>>
>>> a) We could fail a request if connection didn't complete within 10s. If we
>>> always expect connections to succeed within 10s, this would be considered
>>> reasonable behaviour. But this would be changing the current default, which
>>> allows you up to 30 seconds to connect and process a request.
>>> b) We retry the connection. What would be the point? We were waiting in a
>>> queue for connecting, but we decide to stop and join the back of the queue.
>>>
>>> We have KIP-612 that is proposing to throttle connection set up on the one
>>> hand and this KIP that is dramatically reducing default connection timeout
>>> on the other. Not sure if that is a good idea.
>>>
>>>
>>> On Fri, May 15, 2020 at 1:26 AM Cheng Tan <c...@confluent.io> wrote:
>>
>>