[ 
https://issues.apache.org/jira/browse/KAFKA-10041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116761#comment-17116761
 ] 

Ismael Juma edited comment on KAFKA-10041 at 5/26/20, 1:54 PM:
---------------------------------------------------------------

The upgrade notes for Kafka mention this issue and the workaround:
{quote}ZooKeeper has been upgraded to 3.5.7, and a ZooKeeper upgrade from 3.4.X 
to 3.5.7 can fail if there are no snapshot files in the 3.4 data directory. 
This usually happens in test upgrades where ZooKeeper 3.5.7 is trying to load 
an existing 3.4 data dir in which no snapshot file has been created. For more 
details about the issue please refer to ZOOKEEPER-3056. A fix is given in 
ZOOKEEPER-3056, which is to set snapshot.trust.empty=true config in 
zookeeper.properties before the upgrade.
{quote}
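
For anyone who wants to apply the workaround from a script rather than by hand, here is a minimal sketch. The function name ensure_trust_empty_snapshot and the file path passed to it are assumptions for illustration, not part of Kafka or ZooKeeper; only the snapshot.trust.empty=true setting comes from the upgrade notes. It handles plain key=value lines only, not every form a Java properties file allows.

```python
from pathlib import Path

KEY = "snapshot.trust.empty"


def ensure_trust_empty_snapshot(properties_path):
    """Make sure snapshot.trust.empty=true is set in a properties file.

    Hypothetical helper. Returns True if the file was changed,
    False if the setting was already present with the value "true".
    """
    path = Path(properties_path)
    lines = path.read_text().splitlines() if path.exists() else []
    updated = []
    found = False
    changed = False
    for line in lines:
        stripped = line.strip()
        # Ignore blank lines and comments; only "key=value" lines are parsed.
        if stripped and not stripped.startswith(("#", "!")) and "=" in stripped:
            key, _, value = stripped.partition("=")
            if key.strip() == KEY:
                found = True
                if value.strip() != "true":
                    line = f"{KEY}=true"
                    changed = True
        updated.append(line)
    if not found:
        # Setting is absent, so append it at the end of the file.
        updated.append(f"{KEY}=true")
        changed = True
    if changed:
        path.write_text("\n".join(updated) + "\n")
    return changed
```

It would be called before starting the upgraded node, for example ensure_trust_empty_snapshot("config/zookeeper.properties"), assuming that is where your zookeeper.properties lives.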
 



> Kafka upgrade from 1.1 to 2.4/2.5/trunk fails due to a failure in 
> ZooKeeper
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-10041
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10041
>             Project: Kafka
>          Issue Type: Bug
>          Components: zkclient
>    Affects Versions: 2.4.0, 2.5.0, 2.6.0
>            Reporter: Zhuqi Jin
>            Priority: Major
>
> When we tested upgrading Kafka from 1.1 to 2.4/2.5, the upgraded node failed 
> to start due to a known ZooKeeper issue, ZOOKEEPER-3056.
> The error message is shown below:
>  
> {code:java}
> [2020-05-24 23:45:17,638] ERROR Unexpected exception, exiting abnormally (org.apache.zookeeper.server.ZooKeeperServerMain)
> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>  at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:240)
>  at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
>  at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
>  at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
>  at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
>  at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
>  at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
>  at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
>  at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
>  at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
>  at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> {code}
>  
> {code:java}
> [2020-05-24 23:45:25,142] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
> kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
>  at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:259)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
>  at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:255)
>  at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:113)
>  at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1858)
>  at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:375)
>  at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:399)
>  at kafka.server.KafkaServer.startup(KafkaServer.scala:207)
>  at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
>  at kafka.Kafka$.main(Kafka.scala:84)
>  at kafka.Kafka.main(Kafka.scala)
> {code}
> It can be reproduced through the following steps:
> 1. Start a single-node Kafka 1.1 cluster.
> 2. Create a topic and use kafka-producer-perf-test.sh to produce several 
> messages.
> {code:java}
> bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
>   --replication-factor 1 --partitions 1 --topic test
> bin/kafka-producer-perf-test.sh --topic test --num-records 500 \
>   --record-size 300 --throughput 100 \
>   --producer-props bootstrap.servers=localhost:9092
> {code}
> 3. Upgrade the node to 2.4/2.5 with the same configuration. The upgraded 
> node fails to start because of the ZooKeeper failure.
> Kafka 1.1 uses dependant-libs-2.11.12/zookeeper-3.4.10.jar, while Kafka 
> 2.4/2.5/trunk (5302efb2d1b7a69bcd3173a13b2d08a2666979ed) use 
> zookeeper-3.5.8.jar.
> The bug is fixed in zookeeper-3.6.0. Should we upgrade the dependency of 
> Kafka 2.4/2.5/trunk to zookeeper-3.6.0.jar?
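
The "No snapshot found, but there are log entries" error in the report corresponds to a data directory that holds transaction logs but no snapshot files. A rough pre-upgrade check could look like the sketch below. The function name has_logs_but_no_snapshot is hypothetical, and the sketch assumes transaction logs sit in the same version-2 directory as the snapshots (that is, dataLogDir is not set to a separate location).

```python
from pathlib import Path


def has_logs_but_no_snapshot(data_dir):
    """Return True if data_dir/version-2 has log.* files but no snapshot.* files.

    Hypothetical helper; assumes logs and snapshots share one directory.
    """
    version_dir = Path(data_dir) / "version-2"
    if not version_dir.is_dir():
        # Nothing on disk yet, so there is nothing for ZooKeeper to reject.
        return False
    names = [p.name for p in version_dir.iterdir() if p.is_file()]
    has_log = any(n.startswith("log.") for n in names)
    has_snapshot = any(n.startswith("snapshot.") for n in names)
    return has_log and not has_snapshot
```

If has_logs_but_no_snapshot(your_data_dir) returns True for the 3.4 data directory, the snapshot.trust.empty=true workaround from the upgrade notes is likely needed before starting the upgraded node.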



