[ https://issues.apache.org/jira/browse/KAFKA-17754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18009390#comment-18009390 ]
Calvin Liu commented on KAFKA-17754:
------------------------------------

Hi [~aphyr] I tried some tests but was not able to reproduce the issue. I think the problem is that I can't find a way to randomly delay the requests. I am using the following command for testing on Docker:
{code:java}
lein run test --nodes-file /root/nodes --username root -s --no-ww-deps --txn --concurrency 3n --nemesis pause,kill,partition --intra-txn-delay 50 --db-targets one,majority --partition-targets one,majority,majorities-ring --nemesis-interval 10 --abort-p 0.15 --crash-clients --time-limit 1200 --test-count 10 --db kafka_kraft_3_8
{code}
Any suggestions?

> A delayed EndTxn message can cause aborted read, lost writes, atomicity violation
> ----------------------------------------------------------------------------------
>
> Key: KAFKA-17754
> URL: https://issues.apache.org/jira/browse/KAFKA-17754
> Project: Kafka
> Issue Type: Bug
> Components: clients, producer, protocol
> Affects Versions: 3.8.0
> Reporter: Kyle Kingsbury
> Assignee: Justine Olshan
> Priority: Major
> Labels: atomic, transaction
> Attachments: lamport-1.png
>
> In short: I believe that both an internal retry mechanism and guidance from the example code and client API docs are independently capable of causing committed transactions to actually abort, aborted transactions to actually commit, and transactions to be split into multiple parts with different fates. A delayed EndTxn message could arrive seconds later and decide the fate of an unrelated transaction.
>
> Consider the attached Lamport diagram, reconstructed from node logs and packet captures in a recent Jepsen test. In it, a single process, using a single producer and consumer, executes a series of transactions which all commit or abort cleanly. Process 76 selected the unique transactional ID `jt1234` on initialization.
>
> From packet captures and debug logs, we see `jt1234` used producer ID `233`, submitted all four operations, then sent an EndTxn message with `committed = false`, which denotes a transaction abort. However, fifteen separate calls to `poll` observed this transaction's write of `424` to key `5`---an obvious case of aborted read (G1a). Even stranger, *no* poller observed the other writes from this transaction: key `17` apparently never received values `926` or `927`. Why?
>
> From the packet capture and logs: process 76 began a transaction which sent `1018` to key `15`. It sent an `EndTxn` message to commit that transaction to node `n3`. However, it did not receive a prompt response. The client then quietly sent a *second* commit message to `n4`, which returned successfully; the test harness's call to `commitTransaction` completed successfully. The process then performed and intentionally aborted a second transaction; this completed OK. So far, so good.
>
> Then process 76 began our problematic transaction. It sent `424` to key `5`, and added new partitions to the transaction. Just after accepting record `424`, node `n3` received the delayed commit message from two transactions previously. It committed the current transaction, effectively chopping it in half. The first half (record `424`) was committed and visible to pollers. The second half, sending `926` and `927` to key `17`, implicitly began a second transaction, which was aborted by the client.
>
> This suggests a fundamental problem in the Kafka transaction protocol.
> The protocol is intentionally designed to allow clients to submit requests over multiple TCP connections and to distribute them across multiple nodes. There is no sequence number to order requests from the same client. There is no concept of a transaction number. When a server receives a commit (or abort) message, it has no way to know what transaction the client intended to commit. It simply commits or aborts whatever transaction happens to be in progress.
>
> This means transactions which appeared to commit could actually abort, and vice versa: we observed both aborted reads and lost writes. It also means transactions could get chopped into smaller pieces: one could lose some, but not all, of a transaction's effects.
>
> What does it take to get this behavior? First, an `EndTxn` message must be delayed---for instance due to network latency, packet loss, a slow computer, garbage collection, etc. Second, while that `EndTxn` arrow is hovering in the air, the client needs to move on to perform a second transaction using the same producer ID and epoch. There are several ways this could happen.
>
> First, users could explicitly retry committing or aborting a transaction. The docs say they can, and the client won't stop them.
>
> Second, the official Kafka Java client docs repeatedly instruct users to call `abortTransaction` if an error occurs during `commitTransaction`. The provided example code leads directly to this behavior: if `commitTransaction` times out, it calls `abortTransaction`, and voilà: the client can move on to later operations. The only exceptions in the docs are `ProducerFencedException`, `OutOfOrderSequenceException`, and `AuthorizationException`, none of which apply here.
>
> I've tried to avoid this problem by ensuring that transactions either commit once, or abort once, never both. Sadly, this doesn't work. Indeed, process 76 in this test run *never* tried to abort a transaction after calling commit---and even though it only called `commitTransaction` once, it sent *two* commit messages to two different nodes. I suspect this is because the Java client treats timeouts as retriable ([https://github.com/apache/kafka/blob/8125c3da5bb6ebb35a0cb3494624d33fad4e3187/clients/src/main/java/org/apache/kafka/common/errors/TimeoutException.java#L22]), and the transaction manager appears to perform its own internal retries.
>
> I'm not entirely sure how to fix this--the protocol feels like it's missing critical ordering information, and you can't rely on e.g. TCP ordering, because it's multi-stream. One option might be to force the producer to acquire a new epoch if it ever encounters an indefinite result from an EndTxn message---then the producer fencing mechanism would prevent any delayed EndTxn messages from being processed, right?
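For reference, the client-side shape of the workload the report describes is roughly the following. This is a minimal sketch, not code from the test harness; the bootstrap address, topic name, and serializers are assumptions, while the transactional ID, keys, and values are taken from the report's narrative.
{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TxnWorkloadSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "n1:9092");   // assumption
        // One unique transactional ID per process, as in the report (`jt1234`).
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "jt1234");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Acquires a producer ID and epoch for the transactional ID.
            producer.initTransactions();

            // The problematic transaction from the narrative: one write to key 5,
            // two writes to key 17, then a single deliberate abort.
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("txn-topic", "5", "424"));   // topic name is an assumption
            producer.send(new ProducerRecord<>("txn-topic", "17", "926"));
            producer.send(new ProducerRecord<>("txn-topic", "17", "927"));
            producer.abortTransaction();   // the client asks for an abort exactly once
        }
    }
}
{code}
Even with this single call to `abortTransaction`, the report shows that a delayed `EndTxn` from an earlier transaction can commit the write of `424` and leave `926`/`927` in a separate, aborted transaction.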
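The documented error handling the report refers to looks roughly like this. It is a paraphrase of the pattern in the `KafkaProducer` javadoc example, not a verbatim copy; the method and variable names are assumptions.
{code:java}
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;

public class DocumentedCommitPattern {
    static void sendInTransaction(Producer<String, String> producer,
                                  ProducerRecord<String, String> record) {
        producer.beginTransaction();
        producer.send(record);
        try {
            producer.commitTransaction();
        } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
            // The docs treat these as fatal: close the producer.
            producer.close();
        } catch (KafkaException e) {
            // Everything else, including TimeoutException (a subclass of
            // KafkaException), lands in the catch-all: abort and carry on.
            // If the timed-out EndTxn is still in flight, or is retried
            // internally, it can later be applied to a different transaction.
            producer.abortTransaction();
        }
    }
}
{code}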
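Finally, a minimal sketch of the mitigation suggested in the closing paragraph, assuming the simplest way to force a new epoch from the Java client is to discard the producer and call `initTransactions()` on a fresh instance with the same `transactional.id`. The helper names and the zero close timeout are assumptions; this is a sketch of the idea, not a vetted fix.
{code:java}
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.common.errors.TimeoutException;

public class EndTxnEpochBumpSketch {

    // Hypothetical helper: same transactional.id, fresh producer instance.
    static Producer<String, String> freshProducer(Properties props) {
        Producer<String, String> p = new KafkaProducer<>(props);
        // InitProducerId bumps the epoch, so brokers fence an EndTxn sent
        // under the previous epoch instead of applying it to a later transaction.
        p.initTransactions();
        return p;
    }

    static Producer<String, String> commitOrReplace(Producer<String, String> producer,
                                                    Properties props) {
        try {
            producer.commitTransaction();
            return producer;   // definite result: keep using this producer
        } catch (TimeoutException indefinite) {
            // Indefinite result: the commit may or may not have happened, and the
            // client may still retry the EndTxn internally. Instead of calling
            // abortTransaction(), discard this producer and start over under a
            // new epoch.
            producer.close(Duration.ZERO);
            return freshProducer(props);
        }
    }
}
{code}
Whether fencing alone is enough to keep a delayed `EndTxn` from being applied is exactly the question the report ends on.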