[ https://issues.apache.org/jira/browse/ZOOKEEPER-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891281#comment-13891281 ]
Patrick Hunt commented on ZOOKEEPER-1573:
-----------------------------------------

This patch looks good to me. +1, I'll wait to commit until I hear from [~fpj].

FWIW, I tested this manually with a failing repo I had available. Without this patch the server failed to start; with this patch applied the server started successfully and the data looked correct on the server thereafter. (Tested with the 3.4 branch.)

> Unable to load database due to missing parent node
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-1573
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1573
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3, 3.5.0
>            Reporter: Thawan Kooburat
>            Assignee: Vinay
>            Priority: Critical
>             Fix For: 3.4.6, 3.5.0
>
>         Attachments: ZOOKEEPER-1573-3.4.patch, ZOOKEEPER-1573.patch, ZOOKEEPER-1573.patch, ZOOKEEPER-1573.patch, ZOOKEEPER-1573.patch
>
>
> While replaying the txnlog on the data tree, the server has code to detect a
> missing parent node. This code block was last modified as part of
> ZOOKEEPER-1333. In our production, we found a case where this check returns a
> false positive.
> The sequence of txns is as follows:
> zxid 1: create /prefix/a
> zxid 2: create /prefix/a/b
> zxid 3: delete /prefix/a/b
> zxid 4: delete /prefix/a
> The server starts capturing a snapshot at zxid 1. However, by the time it has
> traversed the data tree down to /prefix, txn 4 has already been applied and
> /prefix has no children.
> When the server restores from the snapshot, it processes the txnlog starting
> from zxid 2. This txn generates a missing parent error and the server refuses
> to start up.
> The same check allowed me to discover the bug in ZOOKEEPER-1551, but I don't
> know if we have any option besides removing this check to solve this issue.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
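
For readers following the txn sequence in the issue description, here is a minimal sketch of the race in Python. It is a toy model and not ZooKeeper code: the data tree is just a set of paths, and every name in it (apply_txn, fuzzy_snapshot, restore, MissingParentError, the strict flag) is hypothetical. The non-strict mode shows one possible way to tolerate the stale create during replay; it is an assumption for illustration and is not taken from the attached patches.

```python
class MissingParentError(Exception):
    """Raised when a create is replayed under a parent that does not exist."""


# The four transactions from the issue description.
TXNS = [
    (1, "create", "/prefix/a"),
    (2, "create", "/prefix/a/b"),
    (3, "delete", "/prefix/a/b"),
    (4, "delete", "/prefix/a"),
]


def parent_of(path):
    """Return the parent path, e.g. '/prefix/a' -> '/prefix', '/prefix' -> '/'."""
    head = path.rsplit("/", 1)[0]
    return head or "/"


def apply_txn(tree, op, path, strict=True):
    """Apply one txn to a toy tree (a set of paths). Returns True if applied."""
    if op == "create":
        if parent_of(path) not in tree:
            if strict:
                raise MissingParentError(path)
            return False  # assumed lenient behaviour: skip the stale create
        tree.add(path)
        return True
    if op == "delete":
        if path not in tree:
            return False  # deleting an absent node is a no-op in this model
        tree.discard(path)
        return True
    raise ValueError("unknown op: " + op)


def fuzzy_snapshot():
    """Snapshot is labelled zxid 1 but serialized after txns 2..4 were applied."""
    live = {"/", "/prefix"}
    apply_txn(live, "create", "/prefix/a")  # zxid 1
    snapshot_zxid = 1                       # snapshot starts here
    for _zxid, op, path in TXNS[1:]:        # txns 2..4 land before the
        apply_txn(live, op, path)           # traversal reaches /prefix
    return set(live), snapshot_zxid


def restore(snapshot, snapshot_zxid, strict):
    """Load the snapshot, then replay every txn newer than its zxid."""
    tree = set(snapshot)
    for zxid, op, path in TXNS:
        if zxid > snapshot_zxid:
            apply_txn(tree, op, path, strict=strict)
    return tree
```

In this model, restore(..., strict=True) fails on zxid 2 because /prefix/a is absent from the snapshot, which mirrors the reported startup failure. With strict=False the replay skips txns 2 through 4 and ends with the same tree the live server had after zxid 4.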