[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045162#comment-16045162
 ] 

Apurva Mehta commented on KAFKA-5415:
-------------------------------------

The last successful metadata update was the following. The update timestamp was 
1496957141444.

{noformat}
[2017-06-08 21:25:41,449] DEBUG TransactionalId my-first-transactional-id 
complete transition from Ongoing to TxnTransitMetadata(producerId=2000, 
producerEpoch=0, txnTimeoutMs=60000, txnState=Ongoing, 
topicPartitions=Set(output-topic-2, __consumer_offsets-47, output-topic-0, 
output-topic-1), txnStartTimestamp=1496957141430, 
txnLastUpdateTimestamp=1496957141444) 
(kafka.coordinator.transaction.TransactionMetadata)
{noformat}

Then the system clock rolled back by roughly 160 milliseconds (the log 
timestamp drops from 21:25:41,449 to 21:25:41,285), and the 'prepare 
transition' to 'PrepareCommit' had this transition metadata:

{noformat}
[2017-06-08 21:25:41,285] DEBUG TransactionalId my-first-transactional-id 
prepare transition from Ongoing to TxnTransitMetadata(producerId=2000, 
producerEpoch=0, txnTimeoutMs=60000, txnState=PrepareCommit, 
topicPartitions=Set(output-topic-2, __consumer_offsets-47, output-topic-0, 
output-topic-1), txnStartTimestamp=1496957141430, 
txnLastUpdateTimestamp=1496957141285) 
(kafka.coordinator.transaction.TransactionMetadata)
{noformat}

So when it came time to complete the transition, the timestamp check would fail 
because the new update timestamp (1496957141285) was older than the previous 
one (1496957141444). We would throw an IllegalStateException, which would be 
caught and swallowed in the delayed fetch operation, hence leaving the 
transaction hanging with a pendingState of PrepareCommit.
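
To make the failure mode concrete, here is a minimal sketch in Python. It is 
not the Kafka coordinator code: the class TxnMetadata, its methods, and the 
helper run_callback are hypothetical names invented for illustration, and 
RuntimeError stands in for the IllegalStateException. Only the two timestamps 
are taken from the logs above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TxnMetadata:
    """Toy stand-in for per-transaction metadata (hypothetical, not Kafka's class)."""
    state: str
    last_update_ms: int
    pending_state: Optional[str] = None
    pending_update_ms: Optional[int] = None

    def prepare_transition(self, new_state: str, now_ms: int) -> None:
        # Record the intended state along with the wall-clock time of the request.
        self.pending_state = new_state
        self.pending_update_ms = now_ms

    def complete_transition(self) -> None:
        if self.pending_state is None or self.pending_update_ms is None:
            raise RuntimeError("no pending transition to complete")
        # Sanity check modeled on the one described above:
        # the update timestamp must never move backwards.
        if self.pending_update_ms < self.last_update_ms:
            raise RuntimeError(
                f"update timestamp went backwards: "
                f"{self.pending_update_ms} < {self.last_update_ms}"
            )
        self.state = self.pending_state
        self.last_update_ms = self.pending_update_ms
        self.pending_state = None
        self.pending_update_ms = None


def run_callback(meta: TxnMetadata) -> bool:
    """Mimics a caller that swallows the exception instead of surfacing it."""
    try:
        meta.complete_transition()
        return True
    except RuntimeError:
        return False  # error is dropped; pending_state is never cleared


# Last successful update, then a 'prepare' stamped by a clock that rolled back.
meta = TxnMetadata(state="Ongoing", last_update_ms=1496957141444)
meta.prepare_transition("PrepareCommit", now_ms=1496957141285)
completed = run_callback(meta)
rollback_ms = meta.last_update_ms - meta.pending_update_ms

print(completed, meta.state, meta.pending_state, rollback_ms)
```

In this model the transition never completes, the state stays Ongoing, and 
pending_state stays PrepareCommit, which matches the hang described above. A 
check like this is only safe when the clock feeding it is monotonic; a wall 
clock can step backwards.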



> TransactionCoordinator doesn't complete transition to PrepareCommit state
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-5415
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5415
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>            Assignee: Apurva Mehta
>            Priority: Blocker
>              Labels: exactly-once
>             Fix For: 0.11.0.0
>
>         Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
