Hi,

In one of our clusters, some of our clients occasionally see this exception:
java.lang.IllegalStateException: Correlation id for response (4564) does not match request (4562)
    at org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:486)
    at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:381)
    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
    at java.lang.Thread.run(Unknown Source)

We've also seen it from the consumer's poll() and commit() calls.

Usually the response's correlation id is off by just 1 or 2 (like
above), but we've also seen it off by a few hundred:
java.lang.IllegalStateException: Correlation id for response (742) does not match request (174)
    at org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:486)
    at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:381)
    at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:426)
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1059)
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1027)

When this happens, all subsequent responses are also shifted:
java.lang.IllegalStateException: Correlation id for response (743) does not match request (742)
java.lang.IllegalStateException: Correlation id for response (744) does not match request (743)
java.lang.IllegalStateException: Correlation id for response (745) does not match request (744)
java.lang.IllegalStateException: Correlation id for response (746) does not match request (745)
 ...
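
To quantify the shift from log lines like the ones above, a small helper
along the following lines could be used. It is only a sketch: the class
CorrelationShift and its method shiftOf are hypothetical names, not part
of the Kafka client, and the regular expression assumes the exact message
format shown in these traces.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log-analysis helper; not part of the Kafka client library.
final class CorrelationShift {
    // Assumes the message format seen in the traces above. \s+ tolerates
    // the line wrap that mail clients insert before "does not match".
    private static final Pattern MISMATCH = Pattern.compile(
        "Correlation id for response \\((\\d+)\\)\\s+does not match request \\((\\d+)\\)");

    /**
     * Returns (response id - request id) for a correlation mismatch message,
     * or an empty OptionalInt if the text is not such a message.
     */
    static OptionalInt shiftOf(String message) {
        if (message == null) {
            return OptionalInt.empty();
        }
        Matcher m = MISMATCH.matcher(message);
        if (!m.find()) {
            return OptionalInt.empty();
        }
        int response = Integer.parseInt(m.group(1));
        int request = Integer.parseInt(m.group(2));
        return OptionalInt.of(response - request);
    }
}
```

Applied to the messages quoted in this mail, shiftOf gives 2 for the
first trace, 568 for the second one, and 1 for each of the subsequent
shifted responses.
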
It's easy to discard and recreate the consumer instance to recover;
however, we can't do that with the producer, as the exception is thrown
in the producer's internal Sender thread.
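
For reference, the consumer-side recovery we mean looks roughly like the
sketch below. It is deliberately generic and does not use the Kafka API
directly: RecreatingClient is a hypothetical wrapper, the factory is
assumed to build a fully configured client, and matching on the message
prefix is an assumption based on the exception text above.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical wrapper (not a Kafka class): discards and recreates a client
// when a correlation id mismatch surfaces on the calling thread.
final class RecreatingClient<C extends AutoCloseable> {
    private final Supplier<C> factory; // assumed to return a ready-to-use client
    private C client;
    private int recreations;

    RecreatingClient(Supplier<C> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    /** Runs the action; on a correlation mismatch, replaces the client. */
    void run(Consumer<C> action) {
        try {
            action.accept(client);
        } catch (IllegalStateException e) {
            String msg = e.getMessage();
            // Assumption: the mismatch is identified by its message prefix.
            if (msg == null || !msg.startsWith("Correlation id for response")) {
                throw e; // unrelated failure, leave it to the caller
            }
            try {
                client.close();
            } catch (Exception ignored) {
                // The old client is being discarded anyway.
            }
            client = factory.get();
            recreations++;
        }
    }

    int recreations() {
        return recreations;
    }
}
```

With a consumer, the factory would construct a new KafkaConsumer and
restore its subscription, and the action would be the poll() or
commitSync() call. The failed action is not retried here. This only works
because the exception reaches our own thread, which is not the case for
the producer.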

Our cluster and our clients are running Kafka 0.10.0.1.
Under which circumstances would such an error happen?
Even with logging set to TRACE, we can't spot anything suspicious
shortly before the issue. Is there any data we should try to capture
when this happens?

Thanks!
