[ 
https://issues.apache.org/jira/browse/KAFKA-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-3488:
--------------------------------
    Fix Version/s: 0.10.0.0

> commitAsync() fails if metadata update creates new SASL/SSL connection
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-3488
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3488
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.9.0.1
>            Reporter: Rajini Sivaram
>            Assignee: Rajini Sivaram
>             Fix For: 0.10.1.0, 0.10.0.0
>
>
> Sasl/SslConsumerTest.testSimpleConsumption() fails intermittently with a 
> failure in {{commitAsync()}}. The exception stack trace shows:
> {quote}
> kafka.api.SaslPlaintextConsumerTest.testSimpleConsumption FAILED
> java.lang.AssertionError: expected:<1> but was:<0>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:834)
>       at org.junit.Assert.assertEquals(Assert.java:645)
>       at org.junit.Assert.assertEquals(Assert.java:631)
>       at 
> kafka.api.BaseConsumerTest.awaitCommitCallback(BaseConsumerTest.scala:340)
>       at 
> kafka.api.BaseConsumerTest.testSimpleConsumption(BaseConsumerTest.scala:85)
> {quote}
> I have recreated this with some additional trace. The tests run with a very 
> small metadata expiry interval, triggering metadata updates quite often. If a 
> metadata request immediately following a {{commitAsync()}} call creates a new 
> SSL/SASL connection, {{ConsumerNetworkClient.poll}} returns to process the 
> connection handshake packets. Since {{ConsumerNetworkClient.poll}} discards 
> all unsent packets before returning from poll, this can result in the failure 
> of the commit - the callback is invoked with {{SendFailedException}}.
> I understand that {{ConsumerNetworkClient.poll()}} discards unsent packets 
> rather than buffer them to keep the code simple. And perhaps it is ok to fail 
> {{commitAsync}} occasionally since the callback does indicate that the caller 
> should retry. But it feels like an unnecessary limitation that requires error 
> handling in client applications when there are no real failures and makes it 
> much harder to test reliably. As special handling to fix issues like 
> KAFKA-3412, KAFKA-2672 adds more complexity to the code anyway, and because 
> it is much harder to debug failures that affect only SSL/SASL, it may be 
> worth considering improving this behaviour.
> I will see if I can submit a PR for the specific issue I was seeing with the 
> impact of handshakes on {{commitAsync()}}, but I will be interested in views 
> on improving the logic in {{ConsumerNetworkClient}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to