[https://issues.apache.org/jira/browse/KAFKA-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Calvin Liu updated KAFKA-17877:
-------------------------------
Description:
{code:java}
java.lang.IllegalStateException: WriteTxnMarkerResponse for lkc-devcv9jg9n_transaction-bench-transaction-id-72UwIuNVQkOxl4y_OEBAlA does not contain expected error map for producer id 8308
{code}
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerRequestCompletionHandler.scala#L100]
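The exception comes from the transaction coordinator checking each WriteTxnMarkers response against the producer IDs it sent markers for. A minimal sketch of that kind of check, assuming a simplified model: the class {{MarkerResponseCheck}}, the method {{checkResponse}} and the string-typed error maps are hypothetical and are not Kafka's actual types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the coordinator-side completeness check.
// Names and types are illustrative only, not Kafka's real classes.
public class MarkerResponseCheck {

    // Throws if the response lacks an error map for any producer id we sent a marker for.
    public static void checkResponse(String transactionalId,
                                     List<Long> expectedProducerIds,
                                     Map<Long, Map<String, String>> errorsByProducerId) {
        for (long producerId : expectedProducerIds) {
            Map<String, String> errors = errorsByProducerId.get(producerId);
            if (errors == null) {
                throw new IllegalStateException("WriteTxnMarkerResponse for " + transactionalId
                        + " does not contain expected error map for producer id " + producerId);
            }
        }
    }

    public static void main(String[] args) {
        // The response only covers producer 0, although markers were sent for 0 and 1.
        Map<Long, Map<String, String>> response = new HashMap<>();
        response.put(0L, Map.of("__consumer_offsets-0", "NOT_LEADER_OR_FOLLOWER"));
        try {
            checkResponse("txn-1", List.of(0L, 1L), response);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```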
------
This is a bug on the data-partition side: the partition leader may send the WriteTxnMarkers response early, before the results for all producer IDs have been included in it.
Consider the following case:
# We have 2 markers to append: one for producer-0 and one for producer-1.
# While processing producer-0, the broker appends its marker to {{__consumer_offsets}}.
# The {{__consumer_offsets}} append finishes very quickly because the group coordinator is no longer the leader, so the coordinator directly returns NOT_LEADER_OR_FOLLOWER. Its callback calls {{maybeComplete()}} for the first time, and because there is only one partition to append, it proceeds to call {{maybeSendResponseCallback()}}, which decrements {{numAppends}}.
# The broker then calls the replica manager append with nothing to append. Its callback calls {{maybeComplete()}} a second time, and this call also decrements {{numAppends}}.

Because there are only 2 markers, {{numAppends}} starts at 2. After step 4 it has already reached 0, so the request completes before producer-1 is even processed, and producer-1 is missing from the WriteTxnMarkers response.
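The sequence above can be reproduced with a small single-threaded model. This is a sketch under simplifying assumptions: each marker targets exactly one {{__consumer_offsets}} partition, all callbacks run synchronously, and the class {{WriteTxnMarkersRace}}, its {{simulate}} method and the {{guardOncePerMarker}} flag are hypothetical. It is not Kafka's actual request-handling code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, single-threaded model of the double maybeComplete() call.
public class WriteTxnMarkersRace {

    // Returns the producer ids contained in the first response that gets "sent".
    public static List<Long> simulate(List<Long> producerIds, boolean guardOncePerMarker) {
        AtomicInteger numAppends = new AtomicInteger(producerIds.size());
        Map<Long, String> errorsByProducerId = new ConcurrentHashMap<>();
        List<List<Long>> sentResponses = new ArrayList<>();

        for (long producerId : producerIds) {
            int expectedPartitions = 1; // assumption: one __consumer_offsets partition per marker
            Map<String, String> currentErrors = new ConcurrentHashMap<>();
            AtomicBoolean completed = new AtomicBoolean(false);

            // Records this producer's errors, decrements numAppends and
            // "sends" the response once the counter reaches zero.
            Runnable maybeSendResponseCallback = () -> {
                errorsByProducerId.put(producerId, currentErrors.toString());
                if (numAppends.decrementAndGet() == 0) {
                    sentResponses.add(new ArrayList<>(errorsByProducerId.keySet()));
                }
            };

            // Proceeds once every partition of this marker has a result.
            // Without the guard, nothing stops a second call from passing the check again.
            Runnable maybeComplete = () -> {
                if (currentErrors.size() == expectedPartitions
                        && (!guardOncePerMarker || completed.compareAndSet(false, true))) {
                    maybeSendResponseCallback.run();
                }
            };

            // Step 3: the group coordinator is no longer the leader and fails fast.
            currentErrors.put("__consumer_offsets-0", "NOT_LEADER_OR_FOLLOWER");
            maybeComplete.run();

            // Step 4: the replica manager append has nothing to append,
            // and its callback calls maybeComplete() again.
            maybeComplete.run();
        }
        return sentResponses.isEmpty() ? List.<Long>of() : sentResponses.get(0);
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + simulate(List.of(0L, 1L), false));
        System.out.println("guarded:   " + simulate(List.of(0L, 1L), true));
    }
}
```

Without the guard, the only response that gets sent contains producer 0 alone; producer-1's later completions just drive {{numAppends}} below zero. The guarded variant lets each marker decrement {{numAppends}} at most once. It is only one possible way to make completion idempotent and is not necessarily how the actual fix works.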
----
As a result, the transaction coordinator does not update the transaction state correctly, even though the markers may already have been written to the data partitions. This affects clients: the client believes the transaction has completed, but any request it sends for a new transaction with the same transactional ID fails with CONCURRENT_TRANSACTIONS.
Note: this can only happen when the KIP-848 group coordinator is enabled.
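The client-visible symptom can be illustrated with a toy model of the coordinator state. A minimal sketch, assuming the transaction stays in a prepare-commit state because the marker completion is never processed: {{StuckTransactionSketch}}, its enums and its methods are hypothetical simplifications, not the coordinator's real API. In this model the client's commit already succeeds at {{endTxn}}, yet requests for the next transaction are rejected until {{onMarkersComplete}} runs.

```java
// Hypothetical toy model of why a stuck prepare state surfaces as CONCURRENT_TRANSACTIONS.
public class StuckTransactionSketch {
    public enum TxnState { ONGOING, PREPARE_COMMIT, COMPLETE_COMMIT }
    public enum Errors { NONE, CONCURRENT_TRANSACTIONS }

    public static final class TxnMetadata {
        TxnState state = TxnState.ONGOING;
    }

    // EndTxn: record PREPARE_COMMIT and answer the client; markers are written afterwards.
    public static Errors endTxn(TxnMetadata txn) {
        txn.state = TxnState.PREPARE_COMMIT;
        return Errors.NONE;
    }

    // Called only after every marker response has been handled successfully.
    public static void onMarkersComplete(TxnMetadata txn) {
        txn.state = TxnState.COMPLETE_COMMIT;
    }

    // A request for a new transaction with the same transactional id.
    public static Errors addPartitions(TxnMetadata txn) {
        if (txn.state == TxnState.PREPARE_COMMIT) {
            return Errors.CONCURRENT_TRANSACTIONS;
        }
        txn.state = TxnState.ONGOING;
        return Errors.NONE;
    }

    public static void main(String[] args) {
        TxnMetadata txn = new TxnMetadata();
        System.out.println("EndTxn: " + endTxn(txn));
        // The marker response was rejected, so onMarkersComplete() is never called here.
        System.out.println("next txn: " + addPartitions(txn));
    }
}
```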
> IllegalStateException: missing producer id from the WriteTxnMarkersResponse
> ---------------------------------------------------------------------------
>
> Key: KAFKA-17877
> URL: https://issues.apache.org/jira/browse/KAFKA-17877
> Project: Kafka
> Issue Type: Bug
> Reporter: Calvin Liu
> Assignee: Calvin Liu
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)