Hello Boyang,

Thanks for the KIP. I have some comments below:
1. "Once transactions are complete, the call will return." This seems different from the existing behavior, in which we would return a retriable CONCURRENT_TRANSACTIONS error and let the client retry. Is this intentional?

2. "an overload to onPartitionsAssigned in the consumer's rebalance listener interface": as part of KIP-341 we've already added this information to the onAssignment callback. Would that be sufficient? More generally, which information has to be passed around in the rebalance callback, and which can be passed around in the PartitionAssignor callback instead? In Streams, for example, both callbacks are used, but most of the critical information is passed via onAssignment.

3. "We propose to use a separate record type in order to store the group assignment.": hmm, I thought that with the third FindCoordinator type, the same broker that acts as the consumer coordinator would always be selected as the txn coordinator, in which case it can already access its locally cached metadata / offset topic to get this information. We just need to think about how to make these two modules exchange information directly without messing up the code hierarchy.

4. The "CONSUMER_GROUP_AWARE_TRANSACTION" config: it seems the goal of this config is just to prevent an old-versioned broker from being unable to recognize a newer-versioned client. I think we can avoid this config, though. For example, we could use the embedded AdminClient to send an ApiVersions request upon starting up, and based on the returned value decide whether to go down the old code path or adopt the new behavior. Admittedly, asking a random broker about ApiVersions does not tell us the whole cluster's versions, but what we can do is to first 1) find the coordinator (and if the random broker does not even recognize the new discovery type, fall back to the old path directly), and then 2) ask the discovered coordinator about its supported ApiVersions.

5.
This is a meta question: have you considered how this can be applied to Kafka Connect as well? For example, for source connectors the assignment is not by "partitions" but by some other sort of "resources" based on the source systems. How would KIP-447 affect Kafka connectors that have implemented EOS as well?

Guozhang

On Sat, Jun 22, 2019 at 8:40 PM Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Boyang,
>
> Thanks for the KIP. It's good that we listed a number of rejected
> alternatives. It would be helpful to have an explanation of why they were
> rejected.
>
> Ismael
>
> On Sat, Jun 22, 2019 at 8:31 PM Boyang Chen <bche...@outlook.com> wrote:
>
> > Hey all,
> >
> > I would like to start a discussion for KIP-447:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-447%3A+Producer+scalability+for+exactly+once+semantics
> >
> > this is a work originated by Jason Gustafson and we would like to proceed
> > into discussion stage.
> >
> > Let me know your thoughts, thanks!
> >
> > Boyang

-- 
-- Guozhang
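P.S. To make comment 4 above concrete, here is a rough sketch of the two-step ApiVersions-based fallback I have in mind. This is purely illustrative: the class, method names, and the version constant are hypothetical, not actual Kafka client APIs; the real check would go through the AdminClient's ApiVersions response for the FindCoordinator RPC.

```java
import java.util.Optional;

// Hypothetical sketch of the ApiVersions-based fallback from comment 4.
// None of these names exist in the Kafka codebase; they only illustrate
// the two-step decision: check a bootstrap broker first, then the
// discovered coordinator itself.
public class EosPathSelector {

    // Assumed minimum FindCoordinator version that supports the new
    // transaction-coordinator discovery type (illustrative constant).
    static final short NEW_FIND_COORDINATOR_MIN_VERSION = 3;

    /**
     * Decide whether the producer can use the new consumer-group-aware
     * transactional path.
     *
     * @param bootstrapMaxVersion    max FindCoordinator version reported by
     *                               the random bootstrap broker we first ask
     * @param coordinatorMaxVersion  max FindCoordinator version reported by
     *                               the discovered coordinator, if we got
     *                               far enough to ask it
     */
    static boolean useNewPath(short bootstrapMaxVersion,
                              Optional<Short> coordinatorMaxVersion) {
        // Step 1: if the random broker does not even recognize the new
        // discovery type, fall back to the old path directly.
        if (bootstrapMaxVersion < NEW_FIND_COORDINATOR_MIN_VERSION)
            return false;
        // Step 2: the discovered coordinator itself must also support it.
        return coordinatorMaxVersion
                .map(v -> v >= NEW_FIND_COORDINATOR_MIN_VERSION)
                .orElse(false);
    }

    public static void main(String[] args) {
        System.out.println(useNewPath((short) 2, Optional.of((short) 3))); // false
        System.out.println(useNewPath((short) 3, Optional.of((short) 3))); // true
        System.out.println(useNewPath((short) 3, Optional.empty()));       // false
    }
}
```

The point being that both checks are cheap, happen once at startup, and would let us drop the static config entirely.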