[
https://issues.apache.org/jira/browse/KAFKA-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219731#comment-15219731
]
Ismael Juma commented on KAFKA-3488:
------------------------------------
Good catch [~rsivaram]. Thoughts [~hachikuji]?
> commitAsync() fails if metadata update creates new SASL/SSL connection
> ----------------------------------------------------------------------
>
> Key: KAFKA-3488
> URL: https://issues.apache.org/jira/browse/KAFKA-3488
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 0.9.0.1
> Reporter: Rajini Sivaram
> Assignee: Rajini Sivaram
>
> Sasl/SslConsumerTest.testSimpleConsumption() fails intermittently with a
> failure in {{commitAsync()}}. The exception stack trace shows:
> {quote}
> kafka.api.SaslPlaintextConsumerTest.testSimpleConsumption FAILED
> java.lang.AssertionError: expected:<1> but was:<0>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at org.junit.Assert.assertEquals(Assert.java:631)
> at
> kafka.api.BaseConsumerTest.awaitCommitCallback(BaseConsumerTest.scala:340)
> at
> kafka.api.BaseConsumerTest.testSimpleConsumption(BaseConsumerTest.scala:85)
> {quote}
> I have recreated this with some additional trace. The tests run with a very
> small metadata expiry interval, triggering metadata updates quite often. If a
> metadata request immediately following a {{commitAsync()}} call creates a new
> SSL/SASL connection, {{ConsumerNetworkClient.poll}} returns to process the
> connection handshake packets. Since {{ConsumerNetworkClient.poll}} discards
> all unsent packets before returning from poll, this can result in the failure
> of the commit - the callback is invoked with {{SendFailedException}}.
> I understand that {{ConsumerNetworkClient.poll()}} discards unsent packets
> rather than buffer them to keep the code simple. And perhaps it is ok to fail
> {{commitAsync}} occasionally since the callback does indicate that the caller
> should retry. But it feels like an unnecessary limitation that requires error
> handling in client applications when there are no real failures and makes it
> much harder to test reliably. As special handling to fix issues like
> KAFKA-3412, KAFKA-2672 adds more complexity to the code anyway, and because
> it is much harder to debug failures that affect only SSL/SASL, it may be
> worth considering improving this behaviour.
> I will see if I can submit a PR for the specific issue I was seeing with the
> impact of handshakes on {{commitAsync()}}, but I will be interested in views
> on improving the logic in {{ConsumerNetworkClient}}.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)