[
https://issues.apache.org/jira/browse/KAFKA-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566808#comment-17566808
]
Daniel Urban commented on KAFKA-14053:
--------------------------------------
I understand that increasing the epoch on the client side probably violates
the protocol contract.
I refactored my change so that client-side timeouts (both delivery and request
timeouts) become fatal errors in transactional producers, resulting in a last,
best-effort epoch bump.
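
To illustrate why an epoch bump fences a batch that is still in flight, here is
a minimal, self-contained sketch. It is a toy model under stated assumptions,
not Kafka's broker or client code: PartitionLeader, replay, and the log entry
strings are hypothetical names invented for this example, and it assumes the
abort marker reaches the leader carrying the bumped epoch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of epoch fencing on a partition leader (hypothetical, not Kafka code). */
public class EpochFencingSketch {

    /** Hypothetical stand-in for the per-producer state a partition leader keeps. */
    static final class PartitionLeader {
        private int currentEpoch;                       // highest producer epoch seen so far
        private final List<String> log = new ArrayList<>();

        PartitionLeader(int initialEpoch) {
            this.currentEpoch = initialEpoch;
        }

        /** Writes an abort marker unless its epoch is stale. */
        boolean writeAbortMarker(int epoch) {
            if (epoch < currentEpoch) {
                return false;
            }
            currentEpoch = epoch;
            log.add("ABORT(epoch=" + epoch + ")");
            return true;
        }

        /** Appends a data batch unless its epoch is stale, i.e. the producer is fenced. */
        boolean appendBatch(int epoch, String payload) {
            if (epoch < currentEpoch) {
                return false;                           // fenced: batch is rejected
            }
            currentEpoch = epoch;
            log.add("DATA(epoch=" + epoch + " payload=" + payload + ")");
            return true;
        }

        List<String> log() {
            return log;
        }
    }

    /**
     * Replays the race: a batch sent with epoch 0 timed out on the client but is
     * still in flight, and the abort marker reaches the leader first.
     */
    static List<String> replay(boolean bumpEpoch) {
        PartitionLeader leader = new PartitionLeader(0);
        int markerEpoch = bumpEpoch ? 1 : 0;            // assumption: marker carries the bumped epoch
        leader.writeAbortMarker(markerEpoch);
        leader.appendBatch(0, "late-batch");            // the delayed batch finally arrives
        return leader.log();
    }

    public static void main(String[] args) {
        // Without a bump, the late batch lands after the abort marker.
        System.out.println(replay(false)); // [ABORT(epoch=0), DATA(epoch=0 payload=late-batch)]
        // With a bump, the late batch is rejected as stale.
        System.out.println(replay(true));  // [ABORT(epoch=1)]
    }
}
```

In the first run the late batch sits after the abort marker, which is the
situation that can block the LSO. In the second run the stale epoch causes the
batch to be rejected, so nothing is appended after the marker.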
> Transactional producer should bump the epoch when a batch encounters delivery
> timeout
> -------------------------------------------------------------------------------------
>
> Key: KAFKA-14053
> URL: https://issues.apache.org/jira/browse/KAFKA-14053
> Project: Kafka
> Issue Type: Bug
> Reporter: Daniel Urban
> Assignee: Daniel Urban
> Priority: Major
>
> When a batch fails due to a delivery timeout, it is possible that the batch is
> still in flight. Due to underlying infra issues, it is possible that an
> EndTxnRequest and a WriteTxnMarkerRequest are processed before the in-flight
> batch is processed on the leader. This can cause transactional batches to be
> appended to the log after the corresponding abort marker.
> This can cause the LSO to be blocked indefinitely in the partition, or can even
> violate processing guarantees, as the out-of-order batch can become part of
> the next transaction.
> Because of this, the producer should skip aborting the partition, and bump
> the epoch to fence the in-flight requests.
>
> More detail can be found here:
> [https://lists.apache.org/thread/8d2oblsjtdv7740glc37v79f0r7p99dp]
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)