hachikuji commented on PR #12392: URL: https://github.com/apache/kafka/pull/12392#issuecomment-1295396711
Thanks for all the discussion here and sorry for the late arrival. I have seen this issue in practice as well, often in the context of hanging transactions. The late-arriving `Produce` request is not expected by the transaction coordinator. Unless the producer lingers around to continue writing to the transaction, the partition leader considers the transaction hanging. It's also fair to point out that this can violate the transaction's atomicity.

I think the basic idea in the patch here is to bump the epoch when we abort a transaction in order to fence off writes that are still in flight. Do I have that right?

This is in the spirit of an idea that's been on my mind for a while. The only difference is that I was considering a server-side implementation. The basic thought is to have the coordinator bump the epoch after _every_ `EndTxn` request. We would let the bumped epoch be returned in the response.

    EndTxnResponse => ThrottleTimeMs ErrorCode ProducerId ProducerEpoch

The tuple of `(producerId, epoch)` effectively becomes a unique transaction ID. This would also simplify some of the sequence bookkeeping that we've had so much trouble with on the client. Each transaction would begin with sequence=0 on every partition and the client could "forget" about the inflight requests. Some of the logic we have struggled to get right is how to continue the sequence chain.

There is still a hole, however, which I think @jolshan was describing above. We cannot assume clients will always add partitions correctly to the transaction before beginning to write to the partition. We need a server-side validation. Otherwise, hanging transactions will always be possible. We have seen this so many times by now.

My suggestion here is to let us get a KIP out in the next couple of weeks with a good server-side solution. We may still need a client-side approach for compatibility with older brokers though, so maybe we can leave the PR open.
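To make the epoch-bump idea concrete, here is a minimal Java sketch of the fencing behaviour. It is only an illustration under stated assumptions, not Kafka code: the names `EpochFencingSketch`, `Coordinator`, `PartitionLeader`, `ProducerIdAndEpoch`, `endTxn`, `onEndTxnMarker` and `tryAppend` are all hypothetical, and the sketch assumes the partition leader learns the bumped epoch when the end-of-transaction marker is written, which the comment above does not specify.

```java
import java.util.HashMap;
import java.util.Map;

final class EpochFencingSketch {

    /** Hypothetical stand-in for the (producerId, epoch) pair returned in the response. */
    static final class ProducerIdAndEpoch {
        final long producerId;
        final short epoch;

        ProducerIdAndEpoch(long producerId, short epoch) {
            this.producerId = producerId;
            this.epoch = epoch;
        }
    }

    /** Toy coordinator that bumps the epoch after every EndTxn (commit or abort). */
    static final class Coordinator {
        private final long producerId;
        private short epoch = 0;

        Coordinator(long producerId) {
            this.producerId = producerId;
        }

        ProducerIdAndEpoch current() {
            return new ProducerIdAndEpoch(producerId, epoch);
        }

        ProducerIdAndEpoch endTxn() {
            epoch++; // epoch overflow is ignored in this sketch
            return current();
        }
    }

    /** Toy partition leader tracking the latest epoch and next sequence per producer. */
    static final class PartitionLeader {
        private final Map<Long, Short> latestEpoch = new HashMap<>();
        private final Map<Long, Integer> nextSequence = new HashMap<>();

        /** Assumption: the marker carries the bumped epoch to the leader. */
        void onEndTxnMarker(ProducerIdAndEpoch bumped) {
            latestEpoch.put(bumped.producerId, bumped.epoch);
            nextSequence.put(bumped.producerId, 0);
        }

        /** Returns true if the write is accepted, false if it is fenced or out of sequence. */
        boolean tryAppend(ProducerIdAndEpoch id, int sequence) {
            short known = latestEpoch.getOrDefault(id.producerId, (short) -1);
            if (id.epoch < known) {
                return false; // late write from an already-ended transaction
            }
            if (id.epoch > known) {
                // First write seen for a new epoch: the sequence chain restarts at 0.
                latestEpoch.put(id.producerId, id.epoch);
                nextSequence.put(id.producerId, 0);
            }
            int expected = nextSequence.getOrDefault(id.producerId, 0);
            if (sequence != expected) {
                return false;
            }
            nextSequence.put(id.producerId, expected + 1);
            return true;
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator(42L);
        PartitionLeader leader = new PartitionLeader();

        ProducerIdAndEpoch txn1 = coordinator.current();
        System.out.println("txn1 seq=0 accepted: " + leader.tryAppend(txn1, 0));

        // The transaction ends; the coordinator bumps the epoch and the marker reaches the leader.
        ProducerIdAndEpoch txn2 = coordinator.endTxn();
        leader.onEndTxnMarker(txn2);

        // A late-arriving write from the old transaction is rejected by its stale epoch.
        System.out.println("late txn1 seq=1 accepted: " + leader.tryAppend(txn1, 1));

        // The new transaction starts again from sequence 0.
        System.out.println("txn2 seq=0 accepted: " + leader.tryAppend(txn2, 0));
    }
}
```

In this toy model the stale write is rejected purely by comparing epochs, so the client does not need to carry any sequence state across transactions. It does not cover the remaining hole of writes to partitions that were never added to the transaction; that still needs the server-side validation discussed above.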
