[ https://issues.apache.org/jira/browse/KAFKA-13683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503372#comment-17503372 ]
Michael Hornung commented on KAFKA-13683:
-----------------------------------------

In the meanwhile we got this advice from Confluent Support:

{quote}Hello Michael,

My name is Nicolas, I am one of Eliot's colleagues. He brought this ticket to my attention because it looks like you may be using a transactional producer when you don't really need it.

We are still reviewing what we can improve on this cluster to limit potential timeout occurrences, because this should not happen with the 60000 ms timeout you have configured.

Is the shared code snippet the actual code that is going into production? Code review is outside our usual scope, but I am concerned that you are using transactions as a way to "transactionally" send data to Kafka, as is usually the case with a classic database.

If I understand correctly, your code is starting an AkkaHttpRestServer and, on each received POST request, creating a new KafkaProducer and doing the full transaction sequence to send a single record.

Transactions, when used only with a producer (as in "not tied to a consumer"), are beneficial when you are writing multiple records to Kafka on multiple partition leaders (see https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/, section *Transactions: Atomic writes across multiple partitions*).

I believe that what you are looking for in your code is the guarantee that the HTTP message was successfully delivered and replicated in the Kafka cluster, so that you can synchronously answer the HTTP request. For this to work you only need to configure your producer with exactly-once delivery semantics, for which you do not need the transaction overhead: "just" set enable.idempotence=true and acks=all. You will have the Replication Factor=3 and min.insync.replicas=2 guarantee on Confluent Cloud.

You may also consider using a long-lived KafkaProducer that is reused in your HTTPServer, so that you do not have to pay the producer initialization time on each HTTP request.

Let me know what you think. We are working on the cluster to see if we can identify potential improvements, but considering this only happened once for your application, it may have just been a temporary spike.

Have a good day,
Nicolas{quote}

We are implementing that solution proposal at the moment.
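Roughly what we are implementing, as a minimal sketch (the object name, the sendSync helper, and the bootstrap-servers placeholder are illustrative only; the real code is in AkkaHttpRestServer.scala):

{code:scala}
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord, RecordMetadata}
import org.apache.kafka.common.serialization.StringSerializer

// A single long-lived, idempotent (non-transactional) producer
// shared by all HTTP request handlers.
object SharedProducer {
  private val props = new Properties()
  props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "<bootstrap.servers>") // placeholder
  props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true") // exactly-once delivery per partition, no transactions
  props.put(ProducerConfig.ACKS_CONFIG, "all")                // wait for the in-sync replicas (min.insync.replicas=2)
  props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
  props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)

  // Created once at server startup and reused for every POST request,
  // so the producer initialization cost is not paid per request.
  val producer = new KafkaProducer[String, String](props)

  // Blocks until the broker has acknowledged and replicated the record,
  // so the HTTP handler can answer the request synchronously.
  def sendSync(topic: String, key: String, value: String): RecordMetadata =
    producer.send(new ProducerRecord[String, String](topic, key, value)).get()
}
{code}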
> Transactional Producer - Transaction with key xyz went wrong with exception: Timeout expired after 60000milliseconds while awaiting InitProducerId
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13683
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13683
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 2.6.0, 2.7.0, 3.0.0
>            Reporter: Michael Hornung
>            Priority: Critical
>              Labels: new-txn-protocol-should-fix
>         Attachments: AkkaHttpRestServer.scala, image-2022-02-24-09-12-04-804.png, image-2022-02-24-09-13-01-383.png, timeoutException.png
>
>
> We have an urgent issue with our customer using a Kafka transactional producer with a Kafka cluster of 3 or more nodes.
> Our customer is using Confluent Cloud on Azure.
> We see this exception regularly: "Transaction with key XYZ went wrong with exception: Timeout expired after 60000milliseconds while awaiting InitProducerId" (see attachment).
> We assume that the cause is a node which is down while the producer still sends messages to the "down node".
> We are using Kafka Streams 3.0.
> *We expect that if a node is down, the Kafka producer is intelligent enough to not send messages to this node any more.*
> *What's the solution to this issue? Is there any config we have to set?*
> *This request is urgent because our customer will soon have production issues.*
> *Additional information*
> * send record --> see attachment "AkkaHttpRestServer.scala" – line 100
> * producer config --> see attachment "AkkaHttpRestServer.scala" – line 126


--
This message was sent by Atlassian Jira
(v8.20.1#820001)