[ 
https://issues.apache.org/jira/browse/KAFKA-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17602131#comment-17602131
 ] 

Guozhang Wang commented on KAFKA-10635:
---------------------------------------

Hello [~nicktelford], thanks for the updates!

I've summarized the EOS-related exceptions and their handling logic in KIP-691 
(the section starting with "As of 08/16/2022"). As you can see, 
OutOfOrderSequenceException is an abortable exception, not a fatal one. Kafka 
Streams, which relies on the Kafka Producer, handles this exception as a 
TaskMigratedException. The latter is an internal exception that is handled 
without failing the Kafka Streams app, i.e. the second stack trace should not 
be killing the app.

The first stack trace, though, in version 2.5, would be killing the app. The 
difference between the two is that the first stack trace was thrown when 
committing a streams task, while the second was thrown when the task was being 
processed normally and was trying to send a record.

I checked the source code and confirmed that this issue has been resolved in 
trunk, i.e. we now always throw it as TaskMigratedException and handle it 
internally rather than failing the app. So I'd suggest upgrading your 
application beyond 2.5.
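
As a rough illustration of the behavior described above, here is a minimal, 
self-contained sketch. It is not the Kafka Streams implementation: every class, 
enum, and method name in it (EosErrorHandlingSketch, AbortableProducerError, 
TaskMigratedSignal, Phase, checkProducerError, runOnce) is a hypothetical 
stand-in. It only models the idea that an abortable producer error is turned 
into the same internal signal whether it surfaces while sending a record or 
while committing, and that the signal is handled without killing the thread.

```java
// Minimal, self-contained sketch. Every type below is a hypothetical stand-in;
// none of this is the actual Kafka Streams or Kafka Producer code.
class EosErrorHandlingSketch {

    // Stand-in for an abortable producer error such as OutOfOrderSequenceException.
    static class AbortableProducerError extends RuntimeException {
        AbortableProducerError(String message) { super(message); }
    }

    // Stand-in for the internal signal that plays the role of TaskMigratedException.
    static class TaskMigratedSignal extends RuntimeException {
        TaskMigratedSignal(Throwable cause) { super("task migrated", cause); }
    }

    // The two places where a producer error can surface, per the comment above.
    enum Phase { SEND, COMMIT }

    // Translate a producer error identically in both phases (the phase argument
    // is deliberately not consulted): abortable errors become the internal
    // signal, anything else propagates unchanged and is treated as fatal.
    static void checkProducerError(Phase phase, RuntimeException producerError) {
        if (producerError == null) {
            return;
        }
        if (producerError instanceof AbortableProducerError) {
            throw new TaskMigratedSignal(producerError);
        }
        throw producerError;
    }

    // One iteration of a simplified thread loop: the internal signal is caught
    // and mapped to a recovery action instead of terminating the thread.
    static String runOnce(Phase phase, RuntimeException producerError) {
        try {
            checkProducerError(phase, producerError);
            return "processed";
        } catch (TaskMigratedSignal signal) {
            return "recover";
        }
    }

    public static void main(String[] args) {
        RuntimeException outOfOrder =
            new AbortableProducerError("out of order sequence number");
        for (Phase phase : Phase.values()) {
            System.out.println(phase + " -> " + runOnce(phase, outOfOrder));
        }
    }
}
```

In this sketch both SEND and COMMIT map the abortable error to "recover", while 
any other runtime exception escapes runOnce and would stop the caller, which 
mirrors the abortable versus fatal distinction only in spirit.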



> Streams application fails with OutOfOrderSequenceException after rolling 
> restarts of brokers
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-10635
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10635
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, producer 
>    Affects Versions: 2.5.1
>            Reporter: Peeraya Maetasatidsuk
>            Priority: Blocker
>
> We are upgrading our brokers to version 2.5.1 (from 2.3.1) by performing a 
> rolling restart of the brokers after installing the new version. After the 
> restarts we notice that one of our streams apps (client version 2.4.1) fails 
> with OutOfOrderSequenceException:
>  
> {code:java}
> ERROR [2020-10-13 22:52:21,400] [com.aaa.bbb.ExceptionHandler] Unexpected error. Record: a_record, destination topic: topic-name-Aggregation-repartition
> org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
> ERROR [2020-10-13 22:52:21,413] [org.apache.kafka.streams.processor.internals.AssignedTasks] stream-thread [topic-name-StreamThread-1] Failed to commit stream task 1_39 due to the following error:
> org.apache.kafka.streams.errors.StreamsException: task [1_39] Abort sending since an error caught with a previous record (timestamp 1602654659000) to topic topic-name-Aggregation-repartition due to org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
>         at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:144)
>         at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52)
>         at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:204)
>         at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1348)
>         at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230)
>         at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196)
>         at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:730)
>         at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:716)
>         at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:674)
>         at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:596)
>         at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
>         at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:798)
>         at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
>         at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:569)
>         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:561)
>         at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:335)
>         at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.kafka.common.errors.OutOfOrderSequenceException: The broker received an out of order sequence number.
> {code}
> We see a corresponding error on the broker side:
> {code:java}
> [2020-10-13 22:52:21,398] ERROR [ReplicaManager broker=137636348] Error processing append operation on partition topic-name-Aggregation-repartition-52 (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producerId 2819098 at offset 1156041 in partition topic-name-Aggregation-repartition-52: 29 (incoming seq. number), -1 (current end sequence number)
> {code}
> We are able to reproduce this many times and it happens regardless of whether 
> the broker shutdown (at restart) is clean or unclean. However, when we roll 
> back the broker version from 2.5.1 to 2.3.1 and perform similar rolling 
> restarts, we don't see this error on the streams application at all. This is 
> blocking us from upgrading our broker version. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
