[ https://issues.apache.org/jira/browse/KAFKA-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905983#comment-15905983 ]
Apurva Mehta commented on KAFKA-4574:
-------------------------------------
Here is everything for partition [test_topic,2] from the state-change logs:
{noformat}
amehta-macbook-pro:KafkaService-0-140193561885648 apurva$ for i in `find . -name state-change.log`; do grep -Hni "test_topic,2" $i; done
./worker2/debug/state-change.log:81:[2017-03-09 05:20:37,788] WARN Broker 2
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 2
for partition [test_topic,2] since its associated leader epoch 1 is not higher
than the current leader epoch 1 (state.change.logger)
./worker2/debug/state-change.log:135:[2017-03-09 05:20:53,677] WARN Broker 2
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 4
for partition [test_topic,2] since its associated leader epoch 3 is not higher
than the current leader epoch 3 (state.change.logger)
./worker2/debug/state-change.log:206:[2017-03-09 05:21:05,555] WARN Broker 2
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 5
for partition [test_topic,2] since its associated leader epoch 5 is not higher
than the current leader epoch 5 (state.change.logger)
./worker2/debug/state-change.log:927:[2017-03-09 05:21:20,303] WARN Broker 2
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 7
for partition [test_topic,2] since its associated leader epoch 7 is not higher
than the current leader epoch 7 (state.change.logger)
./worker2/info/state-change.log:81:[2017-03-09 05:20:37,788] WARN Broker 2
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 2
for partition [test_topic,2] since its associated leader epoch 1 is not higher
than the current leader epoch 1 (state.change.logger)
./worker2/info/state-change.log:135:[2017-03-09 05:20:53,677] WARN Broker 2
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 4
for partition [test_topic,2] since its associated leader epoch 3 is not higher
than the current leader epoch 3 (state.change.logger)
./worker2/info/state-change.log:206:[2017-03-09 05:21:05,555] WARN Broker 2
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 5
for partition [test_topic,2] since its associated leader epoch 5 is not higher
than the current leader epoch 5 (state.change.logger)
./worker2/info/state-change.log:927:[2017-03-09 05:21:20,303] WARN Broker 2
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 7
for partition [test_topic,2] since its associated leader epoch 7 is not higher
than the current leader epoch 7 (state.change.logger)
./worker6/debug/state-change.log:72:[2017-03-09 05:20:37,759] WARN Broker 3
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 2
for partition [test_topic,2] since its associated leader epoch 1 is not higher
than the current leader epoch 1 (state.change.logger)
./worker6/debug/state-change.log:152:[2017-03-09 05:20:46,152] WARN Broker 3
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 3
for partition [test_topic,2] since its associated leader epoch 2 is not higher
than the current leader epoch 2 (state.change.logger)
./worker6/debug/state-change.log:404:[2017-03-09 05:20:51,253] ERROR Controller
3 epoch 3 encountered error while electing leader for partition [test_topic,2]
due to: Preferred replica 2 for partition [test_topic,2] is either not alive or
not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":3,"isr":[1]}]. (state.change.logger)
./worker6/debug/state-change.log:405:[2017-03-09 05:20:51,253] ERROR Controller
3 epoch 3 initiated state change for partition [test_topic,2] from
OnlinePartition to OnlinePartition failed (state.change.logger)
./worker6/debug/state-change.log:406:kafka.common.StateChangeFailedException:
encountered error while electing leader for partition [test_topic,2] due to:
Preferred replica 2 for partition [test_topic,2] is either not alive or not in
the isr. Current leader and ISR: [{"leader":1,"leader_epoch":3,"isr":[1]}].
./worker6/debug/state-change.log:439:Caused by:
kafka.common.StateChangeFailedException: Preferred replica 2 for partition
[test_topic,2] is either not alive or not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":3,"isr":[1]}]
./worker6/debug/state-change.log:925:[2017-03-09 05:21:05,541] WARN Broker 3
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 5
for partition [test_topic,2] since its associated leader epoch 5 is not higher
than the current leader epoch 5 (state.change.logger)
./worker6/debug/state-change.log:1005:[2017-03-09 05:21:13,306] WARN Broker 3
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 6
for partition [test_topic,2] since its associated leader epoch 6 is not higher
than the current leader epoch 6 (state.change.logger)
./worker6/debug/state-change.log:1257:[2017-03-09 05:21:18,342] ERROR
Controller 3 epoch 6 encountered error while electing leader for partition
[test_topic,2] due to: Preferred replica 2 for partition [test_topic,2] is
either not alive or not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":7,"isr":[1]}]. (state.change.logger)
./worker6/debug/state-change.log:1258:[2017-03-09 05:21:18,342] ERROR
Controller 3 epoch 6 initiated state change for partition [test_topic,2] from
OnlinePartition to OnlinePartition failed (state.change.logger)
./worker6/debug/state-change.log:1259:kafka.common.StateChangeFailedException:
encountered error while electing leader for partition [test_topic,2] due to:
Preferred replica 2 for partition [test_topic,2] is either not alive or not in
the isr. Current leader and ISR: [{"leader":1,"leader_epoch":7,"isr":[1]}].
./worker6/debug/state-change.log:1292:Caused by:
kafka.common.StateChangeFailedException: Preferred replica 2 for partition
[test_topic,2] is either not alive or not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":7,"isr":[1]}]
./worker6/info/state-change.log:72:[2017-03-09 05:20:37,759] WARN Broker 3
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 2
for partition [test_topic,2] since its associated leader epoch 1 is not higher
than the current leader epoch 1 (state.change.logger)
./worker6/info/state-change.log:152:[2017-03-09 05:20:46,152] WARN Broker 3
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 3
for partition [test_topic,2] since its associated leader epoch 2 is not higher
than the current leader epoch 2 (state.change.logger)
./worker6/info/state-change.log:404:[2017-03-09 05:20:51,253] ERROR Controller
3 epoch 3 encountered error while electing leader for partition [test_topic,2]
due to: Preferred replica 2 for partition [test_topic,2] is either not alive or
not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":3,"isr":[1]}]. (state.change.logger)
./worker6/info/state-change.log:405:[2017-03-09 05:20:51,253] ERROR Controller
3 epoch 3 initiated state change for partition [test_topic,2] from
OnlinePartition to OnlinePartition failed (state.change.logger)
./worker6/info/state-change.log:406:kafka.common.StateChangeFailedException:
encountered error while electing leader for partition [test_topic,2] due to:
Preferred replica 2 for partition [test_topic,2] is either not alive or not in
the isr. Current leader and ISR: [{"leader":1,"leader_epoch":3,"isr":[1]}].
./worker6/info/state-change.log:439:Caused by:
kafka.common.StateChangeFailedException: Preferred replica 2 for partition
[test_topic,2] is either not alive or not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":3,"isr":[1]}]
./worker6/info/state-change.log:925:[2017-03-09 05:21:05,541] WARN Broker 3
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 5
for partition [test_topic,2] since its associated leader epoch 5 is not higher
than the current leader epoch 5 (state.change.logger)
./worker6/info/state-change.log:1005:[2017-03-09 05:21:13,306] WARN Broker 3
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 6
for partition [test_topic,2] since its associated leader epoch 6 is not higher
than the current leader epoch 6 (state.change.logger)
./worker6/info/state-change.log:1257:[2017-03-09 05:21:18,342] ERROR Controller
3 epoch 6 encountered error while electing leader for partition [test_topic,2]
due to: Preferred replica 2 for partition [test_topic,2] is either not alive or
not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":7,"isr":[1]}]. (state.change.logger)
./worker6/info/state-change.log:1258:[2017-03-09 05:21:18,342] ERROR Controller
3 epoch 6 initiated state change for partition [test_topic,2] from
OnlinePartition to OnlinePartition failed (state.change.logger)
./worker6/info/state-change.log:1259:kafka.common.StateChangeFailedException:
encountered error while electing leader for partition [test_topic,2] due to:
Preferred replica 2 for partition [test_topic,2] is either not alive or not in
the isr. Current leader and ISR: [{"leader":1,"leader_epoch":7,"isr":[1]}].
./worker6/info/state-change.log:1292:Caused by:
kafka.common.StateChangeFailedException: Preferred replica 2 for partition
[test_topic,2] is either not alive or not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":7,"isr":[1]}]
./worker8/debug/state-change.log:63:[2017-03-09 05:20:46,125] WARN Broker 1
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 3
for partition [test_topic,2] since its associated leader epoch 2 is not higher
than the current leader epoch 2 (state.change.logger)
./worker8/debug/state-change.log:117:[2017-03-09 05:20:53,689] WARN Broker 1
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 4
for partition [test_topic,2] since its associated leader epoch 3 is not higher
than the current leader epoch 3 (state.change.logger)
./worker8/debug/state-change.log:197:[2017-03-09 05:21:13,291] WARN Broker 1
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 6
for partition [test_topic,2] since its associated leader epoch 6 is not higher
than the current leader epoch 6 (state.change.logger)
./worker8/debug/state-change.log:255:[2017-03-09 05:21:20,314] WARN Broker 1
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 7
for partition [test_topic,2] since its associated leader epoch 7 is not higher
than the current leader epoch 7 (state.change.logger)
./worker8/info/state-change.log:63:[2017-03-09 05:20:46,125] WARN Broker 1
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 3
for partition [test_topic,2] since its associated leader epoch 2 is not higher
than the current leader epoch 2 (state.change.logger)
./worker8/info/state-change.log:117:[2017-03-09 05:20:53,689] WARN Broker 1
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 4
for partition [test_topic,2] since its associated leader epoch 3 is not higher
than the current leader epoch 3 (state.change.logger)
./worker8/info/state-change.log:197:[2017-03-09 05:21:13,291] WARN Broker 1
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 6
for partition [test_topic,2] since its associated leader epoch 6 is not higher
than the current leader epoch 6 (state.change.logger)
./worker8/info/state-change.log:255:[2017-03-09 05:21:20,314] WARN Broker 1
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 7
for partition [test_topic,2] since its associated leader epoch 7 is not higher
than the current leader epoch 7 (state.change.logger)
{noformat}
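For reference, the two checks behind these messages are straightforward: a broker only applies a LeaderAndIsr request whose leader epoch is strictly higher than the one it already has, and a preferred-replica election only succeeds when the preferred replica (the first replica in the assignment) is both alive and in the ISR. A minimal Python sketch of both rules (illustrative only; the names are made up and this is not the actual broker/controller code, which is Scala):
{noformat}
def should_apply_leader_and_isr(request_leader_epoch, current_leader_epoch):
    """Brokers fence stale LeaderAndIsr requests by leader epoch: a
    request is applied only if its epoch is strictly higher than the one
    the broker already knows, which is why a re-sent request with an
    equal epoch produces the "not higher than" WARN and is ignored."""
    return request_leader_epoch > current_leader_epoch


def elect_preferred_leader(assigned_replicas, live_brokers, isr):
    """Preferred-replica election picks the first assigned replica, but
    only if it is alive and in the ISR; otherwise it fails with the
    "either not alive or not in the isr" error seen above."""
    preferred = assigned_replicas[0]
    if preferred not in live_brokers or preferred not in isr:
        raise RuntimeError(
            "Preferred replica %d is either not alive or not in the isr"
            % preferred)
    return preferred
{noformat}
In both failed elections above, the preferred replica for [test_topic,2] is broker 2, but the ISR has shrunk to [1], so the controller cannot move leadership back to broker 2.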
> Transient failure in ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade
> with security_protocol = SASL_PLAINTEXT, SSL
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-4574
> URL: https://issues.apache.org/jira/browse/KAFKA-4574
> Project: Kafka
> Issue Type: Test
> Components: system tests
> Reporter: Shikhar Bhushan
> Assignee: Apurva Mehta
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-29--001.1483003056--apache--trunk--dc55025/report.html
> {{ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade}} failed with these
> {{security_protocol}} parameters
> {noformat}
> ====================================================================================================
> test_id:
> kafkatest.tests.core.zookeeper_security_upgrade_test.ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade.security_protocol=SASL_PLAINTEXT
> status: FAIL
> run time: 3 minutes 44.094 seconds
> 1 acked message did not make it to the Consumer. They are: [5076]. We
> validated that the first 1 of these missing messages correctly made it into
> Kafka's data files. This suggests they were lost on their way to the consumer.
> Traceback (most recent call last):
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
> line 123, in run
> data = self.run_test()
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
> line 176, in run_test
> return self.test_context.function(self.test)
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
> line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/zookeeper_security_upgrade_test.py",
> line 117, in test_zk_security_upgrade
> self.run_produce_consume_validate(self.run_zk_migration)
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 101, in run_produce_consume_validate
> self.validate()
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 163, in validate
> assert success, msg
> AssertionError: 1 acked message did not make it to the Consumer. They are:
> [5076]. We validated that the first 1 of these missing messages correctly
> made it into Kafka's data files. This suggests they were lost on their way to
> the consumer.
> {noformat}
> {noformat}
> ====================================================================================================
> test_id:
> kafkatest.tests.core.zookeeper_security_upgrade_test.ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade.security_protocol=SSL
> status: FAIL
> run time: 3 minutes 50.578 seconds
> 1 acked message did not make it to the Consumer. They are: [3559]. We
> validated that the first 1 of these missing messages correctly made it into
> Kafka's data files. This suggests they were lost on their way to the consumer.
> Traceback (most recent call last):
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
> line 123, in run
> data = self.run_test()
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
> line 176, in run_test
> return self.test_context.function(self.test)
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
> line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/zookeeper_security_upgrade_test.py",
> line 117, in test_zk_security_upgrade
> self.run_produce_consume_validate(self.run_zk_migration)
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 101, in run_produce_consume_validate
> self.validate()
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 163, in validate
> assert success, msg
> AssertionError: 1 acked message did not make it to the Consumer. They are:
> [3559]. We validated that the first 1 of these missing messages correctly
> made it into Kafka's data files. This suggests they were lost on their way to
> the consumer.
> {noformat}
> Previously: KAFKA-3985
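For context, the AssertionError in the quoted tracebacks comes from the validate step in produce_consume_validate.py, which diffs the producer-acked values against what the consumer actually received. A rough Python sketch of that kind of check (illustrative names, not the actual kafkatest helpers):
{noformat}
def validate(acked, consumed):
    """Sketch of the acked-vs-consumed check behind the quoted
    AssertionError: every value the producer got an ack for must show up
    at the consumer, or the test fails and reports the missing values
    (here, a single message such as [5076])."""
    missing = sorted(set(acked) - set(consumed))
    success = not missing
    msg = ("%d acked message(s) did not make it to the Consumer. "
           "They are: %s." % (len(missing), missing))
    assert success, msg
{noformat}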