[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975101#comment-16975101
 ] 

jfc commented on ZOOKEEPER-2553:
--------------------------------

We have the same problem with ZooKeeper 3.4.13-12. Is there a fix?

ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
2019-10-25 09:53:07,396 [myid:] - INFO  [main:QuorumPeerConfig@136] - Reading configuration from: /conf/zoo.cfg
2019-10-25 09:53:07,429 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: zookeeper03 to address: zookeeper03/10.0.0.141
2019-10-25 09:53:07,430 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: zookeeper02 to address: zookeeper02/10.0.0.153
2019-10-25 09:53:07,431 [myid:] - INFO  [main:QuorumPeer$QuorumServer@184] - Resolved hostname: 0.0.0.0 to address: /0.0.0.0
2019-10-25 09:53:07,432 [myid:] - INFO  [main:QuorumPeerConfig@398] - Defaulting to majority quorums
2019-10-25 09:53:07,437 [myid:1] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2019-10-25 09:53:07,437 [myid:1] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2019-10-25 09:53:07,437 [myid:1] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2019-10-25 09:53:07,460 [myid:1] - INFO  [main:QuorumPeerMain@130] - Starting quorum peer
2019-10-25 09:53:07,482 [myid:1] - INFO  [main:ServerCnxnFactory@117] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2019-10-25 09:53:07,498 [myid:1] - INFO  [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2019-10-25 09:53:07,515 [myid:1] - INFO  [main:QuorumPeer@1158] - tickTime set to 2000
2019-10-25 09:53:07,515 [myid:1] - INFO  [main:QuorumPeer@1204] - initLimit set to 5
2019-10-25 09:53:07,515 [myid:1] - INFO  [main:QuorumPeer@1178] - minSessionTimeout set to -1
2019-10-25 09:53:07,516 [myid:1] - INFO  [main:QuorumPeer@1189] - maxSessionTimeout set to -1
2019-10-25 09:53:07,527 [myid:1] - INFO  [main:QuorumPeer@1467] - QuorumPeer communication is not secured!
2019-10-25 09:53:07,528 [myid:1] - INFO  [main:QuorumPeer@1496] - quorum.cnxn.threads.size set to 20
2019-10-25 09:53:07,530 [myid:1] - INFO  [main:FileSnap@86] - Reading snapshot /data/version-2/snapshot.100000000
2019-10-25 09:53:07,657 [myid:1] - ERROR [main:Util@214] - Last transaction was partial.
2019-10-25 09:53:07,692 [myid:1] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
java.io.IOException: Unreasonable length = 1357383642
        at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:127)
        at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:92)
        at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:208)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:629)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:219)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:176)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:217)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:645)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
2019-10-25 09:53:07,698 [myid:1] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
        at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
Caused by: java.io.IOException: Unreasonable length = 1357383642
        at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:127)
        at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:92)
        at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:208)
        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:629)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:219)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:176)
        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:217)
        at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:645)
        ... 4 more
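
For context, the stack trace shows the exception coming from a length check (BinaryInputArchive.checkLength) while a buffer is read from the transaction log. The sketch below illustrates how four arbitrary bytes, read as a big-endian integer where a record length is expected, produce values like the ones in this issue. It is only an illustration: the class name, the MAX_BUFFER limit of 1 MB, and the isReasonable helper are assumptions, not ZooKeeper's code, and the byte values are simply derived from the reported lengths.

```java
import java.nio.ByteBuffer;

public class LengthCheckSketch {
    // Assumed upper bound for one record; illustrative, not taken from ZooKeeper.
    static final int MAX_BUFFER = 1024 * 1024;

    // Interpret four bytes as a signed big-endian int, as DataInput.readInt() does.
    static int decodeLength(byte[] fourBytes) {
        return ByteBuffer.wrap(fourBytes).getInt();
    }

    // A length is accepted only if it is non-negative and within the assumed bound.
    static boolean isReasonable(int len) {
        return len >= 0 && len <= MAX_BUFFER;
    }

    public static void main(String[] args) {
        // Bytes corresponding to the length reported in this comment.
        byte[] fromComment = {0x50, (byte) 0xE8, 0x07, (byte) 0xDA};
        // Bytes corresponding to the length reported in the original issue.
        byte[] fromReport = {(byte) 0xEE, 0x5F, (byte) 0xE8, 0x51};

        for (byte[] raw : new byte[][] {fromComment, fromReport}) {
            int len = decodeLength(raw);
            System.out.println("length = " + len + ", reasonable = " + isReasonable(len));
        }
    }
}
```

Both values fail the check, which is consistent with the log containing something other than a valid record length at that position.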

> ZooKeeper cluster unavailable due to corrupted log file during power failures 
> -- java.io.IOException: Unreasonable length
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2553
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2553
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.8
>         Environment: Normal ZooKeeper cluster with 3 nodes running Linux
>            Reporter: Ramnatthan Alagappan
>            Priority: Major
>
> I am running a three node ZooKeeper cluster. 
> When a new log file is created by ZooKeeper, I see the following sequence of 
> system calls:
> 1. creat(new_log)
> 2. write(new_log, count=16) // This is the log header, I believe.
> 3. truncate(new_log, from 16 bytes to 16 KBytes) // I have configured the log 
> size to be 16K. 
> When the above sequence of operations completes, it is reasonable to expect 
> the newly created log file to contain the header(16 bytes) and then filled 
> with zeros till the end of the log.
> But if a crash (for example, a power failure) occurs while the truncate system 
> call is in progress, the log may contain garbage data when the system 
> restarts. If the crash occurs just after the truncate system call completes, 
> there is no problem. Basically, either the truncate needs to be persisted 
> atomically for ZooKeeper to recover from crashes correctly or, more 
> realistically, the recovery code needs to handle garbage in a newly created 
> log. 
> As mentioned, if a crash occurs during the truncate system call, then 
> ZooKeeper will fail to start with the following exception. Here is the stack 
> trace:
> java.io.IOException: Unreasonable length = -295704495
>         at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:127)
>         at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:92)
>         at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:652)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:552)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:527)
>         at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:354)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132)
>         at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [myid:1] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
> java.lang.RuntimeException: Unable to run quorum server
>         at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:558)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:500)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
>         at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> Caused by: java.io.IOException: Unreasonable length = -295704495
>         at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:127)
>         at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:92)
>         at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:233)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:652)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:552)
>         at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:527)
>         at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:354)
>         at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132)
>         at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:510)
>         ... 4 more
> It is also possible for two nodes of a three-node ZooKeeper cluster to reach 
> the same state. In that case, both will fail to start up, so the cluster 
> loses its majority and becomes unavailable. 
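
The creation sequence and the recovery idea described in the report can be sketched in Java as follows. This is a rough illustration under stated assumptions, not ZooKeeper's implementation or an actual fix: the class and method names, the all-zero placeholder header, the simple length-prefixed record layout, and the MAX_RECORD limit are all hypothetical. The reader stops at the first zero or unreasonable length instead of throwing, treating everything after it as unwritten or garbage.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class PreallocatedLogSketch {
    static final int HEADER_SIZE = 16;          // header size from the report
    static final int PREALLOC_SIZE = 16 * 1024; // log size configured in the report
    static final int MAX_RECORD = 1024 * 1024;  // assumed per-record limit

    // Steps 1-3 from the report: create the file, write the header, extend it.
    static void createLog(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.write(new byte[HEADER_SIZE]); // placeholder header bytes
            raf.setLength(PREALLOC_SIZE);     // extend the file to its full size
            raf.getFD().sync();               // ask the OS to flush to disk
        }
    }

    // Read length-prefixed records, stopping at padding or at an implausible length.
    static List<byte[]> readRecords(File f) throws IOException {
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            in.readFully(new byte[HEADER_SIZE]); // skip the header
            while (true) {
                int len;
                try {
                    len = in.readInt();
                } catch (EOFException e) {
                    break; // clean end of file
                }
                if (len == 0) {
                    break; // zero padding: nothing more was written
                }
                if (len < 0 || len > MAX_RECORD) {
                    break; // implausible length: treat the rest as garbage
                }
                byte[] buf = new byte[len];
                try {
                    in.readFully(buf);
                } catch (EOFException e) {
                    break; // record cut short: ignore the partial tail
                }
                records.add(buf);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        File log = File.createTempFile("txnlog", ".bin");
        createLog(log);
        System.out.println("records read: " + readRecords(log).size());
        log.delete();
    }
}
```

Stopping at the first implausible length is only safe if garbage can appear solely after the last acknowledged record. A real recovery path would likely need a per-record checksum or similar evidence to tell a torn tail apart from corruption in the middle of the log.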



--
This message was sent by Atlassian Jira
(v8.3.4#803005)