[ 
https://issues.apache.org/jira/browse/KAFKA-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229269#comment-15229269
 ] 

Jason Gustafson commented on KAFKA-3513:
----------------------------------------

[~ewencp] I took a look at this, but had no luck reproducing it. As you 
observed, the consumers join the group successfully, apparently well before 
the expected timeout. That suggests there may have been a delay before one of 
the rebalance events was propagated to the test driver. Unfortunately, we don't 
currently have enough information to confirm whether that happened. So maybe we 
can make a few improvements to make future debugging easier:

1. Persist the output from stdout, which contains the verifiable consumer 
events (this is safer for the verifiable consumer since we only log the start 
and end offsets of message batches).
2. Add a timestamp to the verifiable consumer events.
3. Add a log message in the service when the event is received.

That should give us enough information to see whether and where delays are 
occurring.
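
As a rough illustration of items 2 and 3, here is a minimal Python sketch. It 
is not the actual verifiable consumer or kafkatest service code: the helper 
names (make_event, handle_event_line), the "timestamp" field, and the logger 
name are all assumptions made up for this example. It stamps each event when it 
is emitted and logs the observed delay when the driver receives it.

```python
import json
import logging
import time

# Hypothetical logger name; the real service logs through its own logger.
log = logging.getLogger("verifiable_consumer_service")


def make_event(name, **fields):
    """Build one consumer event as a JSON line, stamped with the emit time (ms)."""
    event = {"name": name, "timestamp": int(time.time() * 1000)}
    event.update(fields)
    return json.dumps(event)


def handle_event_line(line, now_ms=None):
    """Parse one event line on the driver side and log how long it took to arrive."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    event = json.loads(line)
    delay_ms = now_ms - event["timestamp"]
    log.info("Received event %s after %d ms", event["name"], delay_ms)
    return event, delay_ms
```

If the consumer and the test driver run on different hosts, clock skew makes 
the computed delay approximate, but a delay on the order of the test timeout 
should still stand out.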

> Transient failure of OffsetValidationTest
> -----------------------------------------
>
>                 Key: KAFKA-3513
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3513
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, system tests
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Jason Gustafson
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-04-05--001.1459840046--apache--trunk--31e263e/report.html
> The version of the test that fails in this case is:
> Module: kafkatest.tests.client.consumer_test
> Class:  OffsetValidationTest
> Method: test_broker_failure
> Arguments:
> {
>   "clean_shutdown": true,
>   "enable_autocommit": false
> }
> and others passed. It's unclear if the parameters actually have any impact on 
> the failure.
> I did some initial triage and it looks like the test code isn't seeing all 
> the group members join the group (receive partition assignments), but it 
> appears from the logs that they all did. This could indicate a simple timing 
> issue, but I haven't been able to verify that yet.



