Justine Olshan created KAFKA-15984:
--------------------------------------
Summary: Client disconnections can cause hanging transactions on
__consumer_offsets
Key: KAFKA-15984
URL: https://issues.apache.org/jira/browse/KAFKA-15984
Project: Kafka
Issue Type: Task
Reporter: Justine Olshan
When investigating frequent hanging transactions on __consumer_offsets
partitions, we realized that many of them were caused by the same offset being
committed more than once, with one of the duplicates logged with
`"isDisconnectedClient":true`.
TxnOffsetCommits do not have sequence numbers and thus are not protected
against duplicates in the same way idempotent produce requests are. Thus, when
a client is disconnected (and flushes its requests), we may see the duplicate
get appended to the log.
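
To make the difference concrete, here is a minimal sketch of a toy partition
log. It is not Kafka's code: the class `PartitionLogSketch` and all of its
methods are hypothetical, and it assumes the late duplicate arrives after the
transaction's end marker has already been written.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a partition log (hypothetical, not a Kafka class). */
final class PartitionLogSketch {
    private final List<String> entries = new ArrayList<>();
    // Last sequence number appended per producer id (idempotent produce only).
    private final Map<Long, Integer> lastSequence = new HashMap<>();
    // Producer ids that have appended transactional data with no end marker yet.
    private final Set<Long> openTransactions = new HashSet<>();

    /** Idempotent produce: a replayed sequence number is dropped. */
    boolean appendProduce(long producerId, int sequence, String value) {
        Integer last = lastSequence.get(producerId);
        if (last != null && sequence <= last) {
            return false; // duplicate detected, nothing is appended
        }
        lastSequence.put(producerId, sequence);
        entries.add("data:" + value);
        openTransactions.add(producerId);
        return true;
    }

    /** Offset commit: no sequence number, so every arrival is appended. */
    boolean appendTxnOffsetCommit(long producerId, long offset) {
        entries.add("offset:" + offset);
        openTransactions.add(producerId);
        return true;
    }

    /** Commit or abort marker: closes the producer's open transaction. */
    void appendEndMarker(long producerId) {
        entries.add("marker");
        openTransactions.remove(producerId);
    }

    boolean hasOpenTransaction(long producerId) {
        return openTransactions.contains(producerId);
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        long pid = 1L;
        log.appendProduce(pid, 0, "a");
        System.out.println("produce retry appended: " + log.appendProduce(pid, 0, "a"));
        log.appendTxnOffsetCommit(pid, 42L);
        log.appendEndMarker(pid);
        log.appendTxnOffsetCommit(pid, 42L); // late duplicate from the disconnected client
        System.out.println("open transaction left behind: " + log.hasOpenTransaction(pid));
    }
}
```

In this model the replayed produce request is rejected by its sequence number,
while the replayed offset commit is appended and leaves an open transaction
that no marker will close.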
KIP-890 part 1 should protect against this as the duplicate will not succeed
verification. KIP-890 part 2 strengthens this further, as duplicates (from
previous transactions) cannot be added to new transactions if the partition
is re-added, since the epoch will be bumped.
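
The difference between the two parts can be sketched as follows. This is an
illustrative model only: `TxnVerificationSketch`, its flag `bumpEpochOnEnd`,
and its methods are hypothetical names, and the check shown (epoch matches and
the partition is in the ongoing transaction) is an assumed simplification of
the verification step, not the actual broker logic.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of transaction verification (hypothetical, not a Kafka class). */
final class TxnVerificationSketch {
    // false models part 1 only; true models part 2 (epoch bump per transaction).
    private final boolean bumpEpochOnEnd;
    private short epoch = 0;
    private final Set<String> partitionsInTxn = new HashSet<>();

    TxnVerificationSketch(boolean bumpEpochOnEnd) {
        this.bumpEpochOnEnd = bumpEpochOnEnd;
    }

    short currentEpoch() {
        return epoch;
    }

    /** The producer adds a partition to its ongoing transaction. */
    void addPartition(String partition) {
        partitionsInTxn.add(partition);
    }

    /** Commit or abort: the transaction no longer contains any partition. */
    void endTransaction() {
        partitionsInTxn.clear();
        if (bumpEpochOnEnd) {
            epoch++;
        }
    }

    /** A write is allowed only for the current epoch and an added partition. */
    boolean verify(short requestEpoch, String partition) {
        return requestEpoch == epoch && partitionsInTxn.contains(partition);
    }

    public static void main(String[] args) {
        String p = "__consumer_offsets-3";
        for (boolean bump : new boolean[] {false, true}) {
            TxnVerificationSketch txn = new TxnVerificationSketch(bump);
            txn.addPartition(p);
            short staleEpoch = txn.currentEpoch();
            txn.endTransaction();
            boolean afterEnd = txn.verify(staleEpoch, p);
            txn.addPartition(p); // a new transaction re-adds the partition
            boolean afterReAdd = txn.verify(staleEpoch, p);
            System.out.println("bump=" + bump + " afterEnd=" + afterEnd
                    + " afterReAdd=" + afterReAdd);
        }
    }
}
```

Without the epoch bump, the stale duplicate fails verification right after the
transaction ends but passes again once the partition is re-added. With the
bump, it fails in both cases.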
Another possible solution is to do duplicate checking on the group coordinator
side when the request comes in. This solution could be used instead of KIP-890
part 1 to prevent hanging transactions, but given that part 1 has only one open
PR remaining, we may not need to do this. However, it can also prevent
duplicates from being added to a new transaction, something only part 2 will
protect against.
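
One possible shape for such a check is sketched below. Everything in it is an
assumption: the ticket does not say how a duplicate would be identified, so
the class `OffsetCommitDedupSketch`, its `accept` method, and the choice of
key (group, producer id, producer epoch, partition, offset) are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy coordinator-side duplicate check (hypothetical, not a Kafka class). */
final class OffsetCommitDedupSketch {
    // Keys of transactional offset commits already accepted.
    private final Set<String> seen = new HashSet<>();

    /** Returns true for a first-time commit and false for an exact repeat. */
    boolean accept(String groupId, long producerId, short producerEpoch,
                   String topicPartition, long offset) {
        // Assumed identity of a commit; a real design may need a different key.
        String key = groupId + "|" + producerId + "|" + producerEpoch
                + "|" + topicPartition + "|" + offset;
        return seen.add(key);
    }

    public static void main(String[] args) {
        OffsetCommitDedupSketch coordinator = new OffsetCommitDedupSketch();
        short epoch = 0;
        System.out.println("first commit accepted: "
                + coordinator.accept("group-a", 1L, epoch, "topic-0", 42L));
        System.out.println("duplicate accepted: "
                + coordinator.accept("group-a", 1L, epoch, "topic-0", 42L));
    }
}
```

Because the set outlives any single transaction, a repeat is rejected even
when it arrives after a new transaction has started. A key this simple would
also reject a legitimate re-commit of the same offset, for example after an
abort, so a real check would need more context than this sketch tracks.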