[ https://issues.apache.org/jira/browse/KAFKA-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905983#comment-15905983 ]
Apurva Mehta commented on KAFKA-4574:
-------------------------------------
Here is everything for partition [test_topic,2] from the state-change logs:
{noformat}
amehta-macbook-pro:KafkaService-0-140193561885648 apurva$ for i in `find . -name state-change.log`; do grep -Hni "test_topic,2" $i; done
./worker2/debug/state-change.log:81:[2017-03-09 05:20:37,788] WARN Broker 2
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 2
for partition [test_topic,2] since its associated leader epoch 1 is not higher
than the current leader epoch 1 (state.change.logger)
./worker2/debug/state-change.log:135:[2017-03-09 05:20:53,677] WARN Broker 2
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 4
for partition [test_topic,2] since its associated leader epoch 3 is not higher
than the current leader epoch 3 (state.change.logger)
./worker2/debug/state-change.log:206:[2017-03-09 05:21:05,555] WARN Broker 2
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 5
for partition [test_topic,2] since its associated leader epoch 5 is not higher
than the current leader epoch 5 (state.change.logger)
./worker2/debug/state-change.log:927:[2017-03-09 05:21:20,303] WARN Broker 2
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 7
for partition [test_topic,2] since its associated leader epoch 7 is not higher
than the current leader epoch 7 (state.change.logger)
./worker2/info/state-change.log:81:[2017-03-09 05:20:37,788] WARN Broker 2
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 2
for partition [test_topic,2] since its associated leader epoch 1 is not higher
than the current leader epoch 1 (state.change.logger)
./worker2/info/state-change.log:135:[2017-03-09 05:20:53,677] WARN Broker 2
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 4
for partition [test_topic,2] since its associated leader epoch 3 is not higher
than the current leader epoch 3 (state.change.logger)
./worker2/info/state-change.log:206:[2017-03-09 05:21:05,555] WARN Broker 2
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 5
for partition [test_topic,2] since its associated leader epoch 5 is not higher
than the current leader epoch 5 (state.change.logger)
./worker2/info/state-change.log:927:[2017-03-09 05:21:20,303] WARN Broker 2
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 7
for partition [test_topic,2] since its associated leader epoch 7 is not higher
than the current leader epoch 7 (state.change.logger)
./worker6/debug/state-change.log:72:[2017-03-09 05:20:37,759] WARN Broker 3
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 2
for partition [test_topic,2] since its associated leader epoch 1 is not higher
than the current leader epoch 1 (state.change.logger)
./worker6/debug/state-change.log:152:[2017-03-09 05:20:46,152] WARN Broker 3
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 3
for partition [test_topic,2] since its associated leader epoch 2 is not higher
than the current leader epoch 2 (state.change.logger)
./worker6/debug/state-change.log:404:[2017-03-09 05:20:51,253] ERROR Controller
3 epoch 3 encountered error while electing leader for partition [test_topic,2]
due to: Preferred replica 2 for partition [test_topic,2] is either not alive or
not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":3,"isr":[1]}]. (state.change.logger)
./worker6/debug/state-change.log:405:[2017-03-09 05:20:51,253] ERROR Controller
3 epoch 3 initiated state change for partition [test_topic,2] from
OnlinePartition to OnlinePartition failed (state.change.logger)
./worker6/debug/state-change.log:406:kafka.common.StateChangeFailedException:
encountered error while electing leader for partition [test_topic,2] due to:
Preferred replica 2 for partition [test_topic,2] is either not alive or not in
the isr. Current leader and ISR: [{"leader":1,"leader_epoch":3,"isr":[1]}].
./worker6/debug/state-change.log:439:Caused by:
kafka.common.StateChangeFailedException: Preferred replica 2 for partition
[test_topic,2] is either not alive or not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":3,"isr":[1]}]
./worker6/debug/state-change.log:925:[2017-03-09 05:21:05,541] WARN Broker 3
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 5
for partition [test_topic,2] since its associated leader epoch 5 is not higher
than the current leader epoch 5 (state.change.logger)
./worker6/debug/state-change.log:1005:[2017-03-09 05:21:13,306] WARN Broker 3
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 6
for partition [test_topic,2] since its associated leader epoch 6 is not higher
than the current leader epoch 6 (state.change.logger)
./worker6/debug/state-change.log:1257:[2017-03-09 05:21:18,342] ERROR
Controller 3 epoch 6 encountered error while electing leader for partition
[test_topic,2] due to: Preferred replica 2 for partition [test_topic,2] is
either not alive or not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":7,"isr":[1]}]. (state.change.logger)
./worker6/debug/state-change.log:1258:[2017-03-09 05:21:18,342] ERROR
Controller 3 epoch 6 initiated state change for partition [test_topic,2] from
OnlinePartition to OnlinePartition failed (state.change.logger)
./worker6/debug/state-change.log:1259:kafka.common.StateChangeFailedException:
encountered error while electing leader for partition [test_topic,2] due to:
Preferred replica 2 for partition [test_topic,2] is either not alive or not in
the isr. Current leader and ISR: [{"leader":1,"leader_epoch":7,"isr":[1]}].
./worker6/debug/state-change.log:1292:Caused by:
kafka.common.StateChangeFailedException: Preferred replica 2 for partition
[test_topic,2] is either not alive or not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":7,"isr":[1]}]
./worker6/info/state-change.log:72:[2017-03-09 05:20:37,759] WARN Broker 3
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 2
for partition [test_topic,2] since its associated leader epoch 1 is not higher
than the current leader epoch 1 (state.change.logger)
./worker6/info/state-change.log:152:[2017-03-09 05:20:46,152] WARN Broker 3
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 3
for partition [test_topic,2] since its associated leader epoch 2 is not higher
than the current leader epoch 2 (state.change.logger)
./worker6/info/state-change.log:404:[2017-03-09 05:20:51,253] ERROR Controller
3 epoch 3 encountered error while electing leader for partition [test_topic,2]
due to: Preferred replica 2 for partition [test_topic,2] is either not alive or
not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":3,"isr":[1]}]. (state.change.logger)
./worker6/info/state-change.log:405:[2017-03-09 05:20:51,253] ERROR Controller
3 epoch 3 initiated state change for partition [test_topic,2] from
OnlinePartition to OnlinePartition failed (state.change.logger)
./worker6/info/state-change.log:406:kafka.common.StateChangeFailedException:
encountered error while electing leader for partition [test_topic,2] due to:
Preferred replica 2 for partition [test_topic,2] is either not alive or not in
the isr. Current leader and ISR: [{"leader":1,"leader_epoch":3,"isr":[1]}].
./worker6/info/state-change.log:439:Caused by:
kafka.common.StateChangeFailedException: Preferred replica 2 for partition
[test_topic,2] is either not alive or not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":3,"isr":[1]}]
./worker6/info/state-change.log:925:[2017-03-09 05:21:05,541] WARN Broker 3
ignoring LeaderAndIsr request from controller 2 with correlation id 1 epoch 5
for partition [test_topic,2] since its associated leader epoch 5 is not higher
than the current leader epoch 5 (state.change.logger)
./worker6/info/state-change.log:1005:[2017-03-09 05:21:13,306] WARN Broker 3
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 6
for partition [test_topic,2] since its associated leader epoch 6 is not higher
than the current leader epoch 6 (state.change.logger)
./worker6/info/state-change.log:1257:[2017-03-09 05:21:18,342] ERROR Controller
3 epoch 6 encountered error while electing leader for partition [test_topic,2]
due to: Preferred replica 2 for partition [test_topic,2] is either not alive or
not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":7,"isr":[1]}]. (state.change.logger)
./worker6/info/state-change.log:1258:[2017-03-09 05:21:18,342] ERROR Controller
3 epoch 6 initiated state change for partition [test_topic,2] from
OnlinePartition to OnlinePartition failed (state.change.logger)
./worker6/info/state-change.log:1259:kafka.common.StateChangeFailedException:
encountered error while electing leader for partition [test_topic,2] due to:
Preferred replica 2 for partition [test_topic,2] is either not alive or not in
the isr. Current leader and ISR: [{"leader":1,"leader_epoch":7,"isr":[1]}].
./worker6/info/state-change.log:1292:Caused by:
kafka.common.StateChangeFailedException: Preferred replica 2 for partition
[test_topic,2] is either not alive or not in the isr. Current leader and ISR:
[{"leader":1,"leader_epoch":7,"isr":[1]}]
./worker8/debug/state-change.log:63:[2017-03-09 05:20:46,125] WARN Broker 1
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 3
for partition [test_topic,2] since its associated leader epoch 2 is not higher
than the current leader epoch 2 (state.change.logger)
./worker8/debug/state-change.log:117:[2017-03-09 05:20:53,689] WARN Broker 1
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 4
for partition [test_topic,2] since its associated leader epoch 3 is not higher
than the current leader epoch 3 (state.change.logger)
./worker8/debug/state-change.log:197:[2017-03-09 05:21:13,291] WARN Broker 1
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 6
for partition [test_topic,2] since its associated leader epoch 6 is not higher
than the current leader epoch 6 (state.change.logger)
./worker8/debug/state-change.log:255:[2017-03-09 05:21:20,314] WARN Broker 1
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 7
for partition [test_topic,2] since its associated leader epoch 7 is not higher
than the current leader epoch 7 (state.change.logger)
./worker8/info/state-change.log:63:[2017-03-09 05:20:46,125] WARN Broker 1
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 3
for partition [test_topic,2] since its associated leader epoch 2 is not higher
than the current leader epoch 2 (state.change.logger)
./worker8/info/state-change.log:117:[2017-03-09 05:20:53,689] WARN Broker 1
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 4
for partition [test_topic,2] since its associated leader epoch 3 is not higher
than the current leader epoch 3 (state.change.logger)
./worker8/info/state-change.log:197:[2017-03-09 05:21:13,291] WARN Broker 1
ignoring LeaderAndIsr request from controller 3 with correlation id 1 epoch 6
for partition [test_topic,2] since its associated leader epoch 6 is not higher
than the current leader epoch 6 (state.change.logger)
./worker8/info/state-change.log:255:[2017-03-09 05:21:20,314] WARN Broker 1
ignoring LeaderAndIsr request from controller 1 with correlation id 1 epoch 7
for partition [test_topic,2] since its associated leader epoch 7 is not higher
than the current leader epoch 7 (state.change.logger)
{noformat}
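For reference, the two checks behind these messages are straightforward: a broker only applies a LeaderAndIsr request whose leader epoch is strictly higher than the one it already has, and a preferred-replica election only succeeds when the preferred replica (the first replica in the assignment) is both alive and in the ISR. A minimal Python sketch of both rules (illustrative only; the names are made up and this is not the actual broker/controller code, which is Scala):
{noformat}
def should_apply_leader_and_isr(request_leader_epoch, current_leader_epoch):
    """Brokers fence stale LeaderAndIsr requests by leader epoch: a
    request is applied only if its epoch is strictly higher than the one
    the broker already knows, which is why a re-sent request with an
    equal epoch produces the "not higher than" WARN and is ignored."""
    return request_leader_epoch > current_leader_epoch


def elect_preferred_leader(assigned_replicas, live_brokers, isr):
    """Preferred-replica election picks the first assigned replica, but
    only if it is alive and in the ISR; otherwise it fails with the
    "either not alive or not in the isr" error seen above."""
    preferred = assigned_replicas[0]
    if preferred not in live_brokers or preferred not in isr:
        raise RuntimeError(
            "Preferred replica %d is either not alive or not in the isr"
            % preferred)
    return preferred
{noformat}
In both failed elections above, the preferred replica for [test_topic,2] is broker 2, but the ISR has shrunk to [1], so the controller cannot move leadership back to broker 2.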
> Transient failure in ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade
> with security_protocol = SASL_PLAINTEXT, SSL
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-4574
> URL: https://issues.apache.org/jira/browse/KAFKA-4574
> Project: Kafka
> Issue Type: Test
> Components: system tests
> Reporter: Shikhar Bhushan
> Assignee: Apurva Mehta
>
> http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2016-12-29--001.1483003056--apache--trunk--dc55025/report.html
> {{ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade}} failed with these
> {{security_protocol}} parameters
> {noformat}
> ====================================================================================================
> test_id:
> kafkatest.tests.core.zookeeper_security_upgrade_test.ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade.security_protocol=SASL_PLAINTEXT
> status: FAIL
> run time: 3 minutes 44.094 seconds
> 1 acked message did not make it to the Consumer. They are: [5076]. We
> validated that the first 1 of these missing messages correctly made it into
> Kafka's data files. This suggests they were lost on their way to the consumer.
> Traceback (most recent call last):
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
> line 123, in run
> data = self.run_test()
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
> line 176, in run_test
> return self.test_context.function(self.test)
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
> line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/zookeeper_security_upgrade_test.py",
> line 117, in test_zk_security_upgrade
> self.run_produce_consume_validate(self.run_zk_migration)
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 101, in run_produce_consume_validate
> self.validate()
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 163, in validate
> assert success, msg
> AssertionError: 1 acked message did not make it to the Consumer. They are:
> [5076]. We validated that the first 1 of these missing messages correctly
> made it into Kafka's data files. This suggests they were lost on their way to
> the consumer.
> {noformat}
> {noformat}
> ====================================================================================================
> test_id:
> kafkatest.tests.core.zookeeper_security_upgrade_test.ZooKeeperSecurityUpgradeTest.test_zk_security_upgrade.security_protocol=SSL
> status: FAIL
> run time: 3 minutes 50.578 seconds
> 1 acked message did not make it to the Consumer. They are: [3559]. We
> validated that the first 1 of these missing messages correctly made it into
> Kafka's data files. This suggests they were lost on their way to the consumer.
> Traceback (most recent call last):
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
> line 123, in run
> data = self.run_test()
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
> line 176, in run_test
> return self.test_context.function(self.test)
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
> line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/zookeeper_security_upgrade_test.py",
> line 117, in test_zk_security_upgrade
> self.run_produce_consume_validate(self.run_zk_migration)
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 101, in run_produce_consume_validate
> self.validate()
> File
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
> line 163, in validate
> assert success, msg
> AssertionError: 1 acked message did not make it to the Consumer. They are:
> [3559]. We validated that the first 1 of these missing messages correctly
> made it into Kafka's data files. This suggests they were lost on their way to
> the consumer.
> {noformat}
> Previously: KAFKA-3985
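For context, the AssertionError in the quoted tracebacks comes from the validate step in produce_consume_validate.py, which diffs the producer-acked values against what the consumer actually received. A rough Python sketch of that kind of check (illustrative names, not the actual kafkatest helpers):
{noformat}
def validate(acked, consumed):
    """Sketch of the acked-vs-consumed check behind the quoted
    AssertionError: every value the producer got an ack for must show up
    at the consumer, or the test fails and reports the missing values
    (here, a single message such as [5076])."""
    missing = sorted(set(acked) - set(consumed))
    success = not missing
    msg = ("%d acked message(s) did not make it to the Consumer. "
           "They are: %s." % (len(missing), missing))
    assert success, msg
{noformat}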