Apurva Mehta created KAFKA-5482:
-----------------------------------

             Summary: A CONCURRENT_TRANSACTIONS error for the first 
AddPartitionsToTxn request slows down transactions significantly
                 Key: KAFKA-5482
                 URL: https://issues.apache.org/jira/browse/KAFKA-5482
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 0.11.0.0
            Reporter: Apurva Mehta
            Assignee: Apurva Mehta
             Fix For: 0.11.0.1


Here is the issue.

# When we commit a transaction, the producer sends an `EndTxn` request to 
the coordinator. The coordinator writes the `PrepareCommit` message to the 
transaction log and then returns the response to the client. It writes the 
transaction markers and the final `CompleteCommit` message asynchronously. 
# In the meantime, if the client starts another transaction, it will send an 
`AddPartitions` request on the next `Sender.run` loop. If the markers haven't 
been written yet, then the coordinator will return a retriable 
`CONCURRENT_TRANSACTIONS` error to the client.
# The producer currently sleeps for `retryBackoffMs` before retrying the 
request. The default is 100ms, so the producer sleeps for 100ms before 
sending the `AddPartitions` request again. This puts a 100ms floor on the 
latency of back-to-back transactions.
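
The effect of the three steps above can be illustrated with a toy model. This 
is only a sketch, not the producer's actual code: the class 
`TxnBackoffModel`, the method `addPartitionsDelayMs`, and the 5ms marker 
write time are hypothetical, and network round trips are ignored. It assumes 
the first `AddPartitions` request goes out immediately after the `EndTxn` 
response and that every `CONCURRENT_TRANSACTIONS` error costs one full 
backoff.

```java
/** Toy model of the retry loop described above; not the actual producer code. */
final class TxnBackoffModel {

    /**
     * Returns the time, in ms after the EndTxn response, at which an
     * AddPartitions attempt first succeeds.
     *
     * @param markerWriteMs  hypothetical time the coordinator needs to finish
     *                       writing the transaction markers
     * @param retryBackoffMs how long the producer sleeps after each
     *                       CONCURRENT_TRANSACTIONS error
     */
    static long addPartitionsDelayMs(long markerWriteMs, long retryBackoffMs) {
        if (retryBackoffMs <= 0) {
            throw new IllegalArgumentException("retryBackoffMs must be positive");
        }
        long now = 0; // first attempt is sent right after the EndTxn response
        while (now < markerWriteMs) {
            // Markers are not written yet: the coordinator answers with
            // CONCURRENT_TRANSACTIONS and the producer backs off before retrying.
            now += retryBackoffMs;
        }
        return now;
    }

    public static void main(String[] args) {
        // Markers take 5ms, default backoff of 100ms.
        System.out.println(addPartitionsDelayMs(5, 100)); // prints 100
        // Same marker time with a smaller, hypothetical backoff of 20ms.
        System.out.println(addPartitionsDelayMs(5, 20));  // prints 20
    }
}
```

In this model the second transaction is delayed by a whole backoff interval 
even though the markers were written after 5ms, which is why a shorter 
backoff helps but does not remove the floor.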

This has been worked around in https://issues.apache.org/jira/browse/KAFKA-5477 
by reducing the retry backoff for the first `AddPartitions` request. But we 
need a stronger solution, such as having the commit block until the 
transaction is complete, or delaying the `AddPartitions` request until batches 
are actually ready to be sent as part of the transaction.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
