Grant Henke created KAFKA-4157:
----------------------------------

             Summary: Transient system test failure in replica_verification_test.test_replica_lags
                 Key: KAFKA-4157
                 URL: https://issues.apache.org/jira/browse/KAFKA-4157
             Project: Kafka
          Issue Type: Bug
          Components: system tests
    Affects Versions: 0.10.0.0
            Reporter: Grant Henke
            Assignee: Grant Henke

The replica_verification_test.test_replica_lags test runs a background thread via replica_verification_tool that populates a dict with the max lag for each "topic,partition" key. Because the dict is populated on a separate thread, the test can call replica_verification_tool.get_lag_for_partition before the key has been added. When that race is lost, the lookup raises a KeyError like the one below:

{noformat}
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ducktape/tests/runner.py", line 106, in run_all_tests
    data = self.run_single_test()
  File "/usr/lib/python2.7/site-packages/ducktape/tests/runner.py", line 162, in run_single_test
    return self.current_test_context.function(self.current_test)
  File "/root/kafka/tests/kafkatest/tests/tools/replica_verification_test.py", line 82, in test_replica_lags
    err_msg="Timed out waiting to reach zero replica lags.")
  File "/usr/lib/python2.7/site-packages/ducktape/utils/util.py", line 31, in wait_until
    if condition():
  File "/root/kafka/tests/kafkatest/tests/tools/replica_verification_test.py", line 81, in <lambda>
    wait_until(lambda: self.replica_verifier.get_lag_for_partition(TOPIC, 0) == 0, timeout_sec=10,
  File "/root/kafka/tests/kafkatest/services/replica_verification_tool.py", line 66, in get_lag_for_partition
    lag = self.partition_lag[topic_partition]
KeyError: 'topic-replica-verification,0'
{noformat}

Instead of raising an error, get_lag_for_partition should return None when the key is not found.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
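
The fix proposed in the description could look roughly like the sketch below. It is not the actual patch: ReplicaLagTracker and record_lag are hypothetical stand-ins for the real service code, and the lock is an assumption added for illustration. Only partition_lag, get_lag_for_partition, and the "topic,partition" key format come from the report.

```python
import threading


class ReplicaLagTracker(object):
    """Hypothetical stand-in for the lag bookkeeping in replica_verification_tool."""

    def __init__(self):
        # Maps "topic,partition" -> most recently recorded lag.
        self.partition_lag = {}
        # Assumption: a lock guards the dict; the real service may not use one.
        self._lock = threading.Lock()

    def record_lag(self, topic, partition, lag):
        # Hypothetical helper: called from the background thread.
        key = "%s,%d" % (topic, partition)
        with self._lock:
            self.partition_lag[key] = lag

    def get_lag_for_partition(self, topic, partition):
        # dict.get returns None for a missing key instead of raising KeyError,
        # so a caller that polls before the first update just sees "no data yet".
        key = "%s,%d" % (topic, partition)
        with self._lock:
            return self.partition_lag.get(key)
```

With this shape, a polling condition such as `get_lag_for_partition(TOPIC, 0) == 0` evaluates to False while the key is still missing, so wait_until keeps retrying until the timeout rather than failing on the first call.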