[ https://issues.apache.org/jira/browse/KAFKA-13683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503372#comment-17503372 ]
Michael Hornung commented on KAFKA-13683:
-----------------------------------------

In the meanwhile we got this advice from Confluent Support:

{quote}Hello Michael,

My name is Nicolas, I am one of Eliot's colleagues. He brought this ticket to my attention because it looks like you may be using a transactional producer when you don't really need it.

We are still reviewing what we can improve on this cluster to limit potential timeout occurrences, because this should not happen with the 60000 ms timeout you have configured.

Is the shared code snippet the actual code that is going into production? Code review is outside our usual scope, but I am concerned that you are using transactions as a way to "transactionally" send data to Kafka, as is usually the case with a classic database.

If I understand correctly, your code is starting an AkkaHttpRestServer and, on each received POST request, creating a new KafkaProducer and doing the full transaction sequence to send a single record.

Transactions, when used only with a producer (as in "not tied to a consumer"), are beneficial when you are writing multiple records to Kafka on multiple partition leaders (see https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/, section *Transactions: Atomic writes across multiple partitions*).

I believe that what you are looking for in your code is the guarantee that the HTTP message was successfully delivered and replicated in the Kafka cluster, so that you can synchronously answer the HTTP request. For this to work you only need to configure your producer with exactly-once delivery semantics, for which you do not need the transaction overhead: "just" set enable.idempotence=true and acks=all. You will have the Replication Factor=3 and min.insync.replicas=2 guarantee on Confluent Cloud.

You may also consider using a long-lived KafkaProducer that is reused in your HTTPServer, so that you do not have to pay the producer initialization time on each HTTP request.

Let me know what you think. We are working on the cluster to see if we can identify potential improvements, but considering this only happened once for your application, it may have just been a temporary spike.

Have a good day,
Nicolas{quote}

We are implementing that solution proposal at the moment.
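Roughly what we are implementing, as a minimal sketch (the object name, the sendSync helper, and the bootstrap-servers placeholder are illustrative only; the real code is in AkkaHttpRestServer.scala):

{code:scala}
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord, RecordMetadata}
import org.apache.kafka.common.serialization.StringSerializer

// A single long-lived, idempotent (non-transactional) producer
// shared by all HTTP request handlers.
object SharedProducer {
  private val props = new Properties()
  props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "<bootstrap.servers>") // placeholder
  props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true") // exactly-once delivery per partition, no transactions
  props.put(ProducerConfig.ACKS_CONFIG, "all")                // wait for the in-sync replicas (min.insync.replicas=2)
  props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
  props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)

  // Created once at server startup and reused for every POST request,
  // so the producer initialization cost is not paid per request.
  val producer = new KafkaProducer[String, String](props)

  // Blocks until the broker has acknowledged and replicated the record,
  // so the HTTP handler can answer the request synchronously.
  def sendSync(topic: String, key: String, value: String): RecordMetadata =
    producer.send(new ProducerRecord[String, String](topic, key, value)).get()
}
{code}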
> Transactional Producer - Transaction with key xyz went wrong with exception: Timeout expired after 60000milliseconds while awaiting InitProducerId
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13683
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13683
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 2.6.0, 2.7.0, 3.0.0
>            Reporter: Michael Hornung
>            Priority: Critical
>              Labels: new-txn-protocol-should-fix
>         Attachments: AkkaHttpRestServer.scala, image-2022-02-24-09-12-04-804.png, image-2022-02-24-09-13-01-383.png, timeoutException.png
>
>
> We have an urgent issue with our customer using a Kafka transactional producer with a Kafka cluster of 3 or more nodes.
> Our customer is using Confluent Cloud on Azure.
> We see this exception regularly: "Transaction with key XYZ went wrong with exception: Timeout expired after 60000milliseconds while awaiting InitProducerId" (see attachment).
> We assume that the cause is a node which is down while the producer still sends messages to the "down node".
> We are using Kafka Streams 3.0.
> *We expect that if a node is down, the Kafka producer is intelligent enough to not send messages to this node any more.*
> *What's the solution to this issue? Is there any config we have to set?*
> *This request is urgent because our customer will soon have production issues.*
> *Additional information*
> * send record --> see attachment "AkkaHttpRestServer.scala" – line 100
> * producer config --> see attachment "AkkaHttpRestServer.scala" – line 126


--
This message was sent by Atlassian Jira
(v8.20.1#820001)