Hi Matthias, I am +1 (non-binding) on the KIP.
Just one final remark: Wouldn't it be better to specify task.timeout.ms to -1 if no retry should be done? IMO it would make the config more intuitive because 0 would not have two possible meanings (i.e. try once and never try) anymore. Best, Bruno On Sat, May 16, 2020 at 7:51 PM Guozhang Wang <wangg...@gmail.com> wrote: > > Hello Matthias, > > Thanks for the updated KIP, overall I'm +1 on this proposal. Some minor > comments (I know gmail mixed that again for me so I'm leaving it as a combo > for both DISCUSS and VOTE :) > > 1) There are some inconsistent statements in the proposal regarding what to > deprecated: at the beginning it says "We propose to deprecate the retries > configuration parameter for the producer and admin client" but later in > compatibility we say "Producer and admin client behavior does not change; > both still respect retries config." My understanding is that we will only > deprecate the StreamsConfig#retries while not touch on > ProducerConfig/AdminClientConfig#retries, AND we will always override the > embedded producer / admin retries config to infinity so that we never rely > on those configs but always bounded by the timeout config. Is that > right, if yes could you clarify in the doc? > > 2) We should also document the related behavior change in PartitionAssignor > that uses AdminClient. More specifically, the admin client's retries config > is piggy-backed inside InternalTopicManager as an outer-loop retry logic in > addition to AdminClient's own inner retry loop. There are some existing > issues like KAFKA-9999 / 10006 that Sophie and Boyang has been working on. > I exchanged some ideas with them, and generally we should consider if the > outer-loop of InternalTopicManager should just be removed and if we got > TimeoutException we should just trigger another rebalance etc. > > BTW as I mentioned in the previous statement, today throwing an exception > that kills one thread but not the whole instance is still an issue for > monitoring purposes, but I suppose this is not going to be in this KIP but > addressed by another KIP, right? > > > Guozhang > > > On Fri, May 15, 2020 at 1:14 PM Boyang Chen <reluctanthero...@gmail.com> > wrote: > > > Good, good. > > > > Read through the discussions, the KIP looks good to me, +1 (non-binding) > > > > On Fri, May 15, 2020 at 11:51 AM Sophie Blee-Goldman <sop...@confluent.io> > > wrote: > > > > > Called out! > > > > > > Seems like gmail struggles with [...] prefixed subjects. You'd think they > > > would adapt > > > all their practices to conform to the Apache Kafka mailing list > > standards, > > > but no! > > > > > > +1 (non-binding) by the way > > > > > > On Fri, May 15, 2020 at 11:46 AM John Roesler <vvcep...@apache.org> > > wrote: > > > > > > > Hi Boyang, > > > > > > > > It is a separate thread, and you have just revealed yourself as a gmail > > > > user ;) > > > > > > > > (Gmail sometimes conflates vote and discuss threads for no apparent > > > reason > > > > ) > > > > > > > > -John > > > > > > > > On Fri, May 15, 2020, at 13:39, Boyang Chen wrote: > > > > > Hey Matthias, should this be on a separate VOTE thread? > > > > > > > > > > On Fri, May 15, 2020 at 11:38 AM John Roesler <vvcep...@apache.org> > > > > wrote: > > > > > > > > > > > Thanks, Matthias! > > > > > > > > > > > > I’m +1 (binding) > > > > > > > > > > > > -John > > > > > > > > > > > > On Fri, May 15, 2020, at 11:55, Matthias J. Sax wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I would like to start the vote on KIP-572: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-572%3A+Improve+timeouts+and+retries+in+Kafka+Streams > > > > > > > > > > > > > > > > > > > > > -Matthias > > > > > > > > > > > > > > > > > > > > > Attachments: > > > > > > > * signature.asc > > > > > > > > > > > > > > > > > > > > > > > -- > -- Guozhang