[ https://issues.apache.org/jira/browse/KAFKA-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126753#comment-16126753 ]

Dong Lin commented on KAFKA-5663:
---------------------------------

[~cmccabe] I just looked into the issue. It seems that the error is not related to the log directory failure handling. The test failed at line 119 of log_dir_failure.py, which is before the test tries to make the log directory unavailable. The test failed because the consumer failed to start. According to ConsoleConsumer-0-139748592136016/worker7/console_consumer.log, the consumer failed to start due to "kafka.common.InvalidConfigException: Wrong value earliest of auto.offset.reset in ConsumerConfig; Valid values are smallest and largest".

I looked into the Python code and the log to understand why "auto.offset.reset" is configured to be "earliest". However, the code suggests that this should not happen. This error should consistently cause the test to fail. I tried to verify this, but https://jenkins.confluent.io/job/system-test-kafka-branch-builder is not working. I tried to test this locally, but for some reason Vagrant fails to work... I will try again tomorrow.

Can you tell me how to find out the git hash in the log you provided? Also, does this test fail consistently on your side?

Thanks,


> LogDirFailureTest system test fails
> -----------------------------------
>
>                 Key: KAFKA-5663
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5663
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>            Assignee: Dong Lin
>             Fix For: 1.0.0
>
>
> The recently added JBOD system test failed last night.
> {noformat}
> Producer failed to produce messages for 20s.
> Traceback (most recent call last):
>   File "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 123, in run
>     data = self.run_test()
>   File "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py", line 176, in run_test
>     return self.test_context.function(self.test)
>   File "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py", line 321, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/core/log_dir_failure_test.py", line 166, in test_replication_with_disk_failure
>     self.start_producer_and_consumer()
>   File "/home/jenkins/workspace/system-test-kafka-trunk/kafka/tests/kafkatest/tests/produce_consume_validate.py", line 75, in start_producer_and_consumer
>     self.producer_start_timeout_sec)
>   File "/home/jenkins/workspace/system-test-kafka-trunk/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py", line 36, in wait_until
>     raise TimeoutError(err_msg)
> TimeoutError: Producer failed to produce messages for 20s.
> {noformat}
> Complete logs here:
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-07-26--001.1501074756--apache--trunk--91c207c/LogDirFailureTest/test_replication_with_disk_failure/bounce_broker=False.security_protocol=PLAINTEXT.broker_type=follower/48.tgz

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)