I've found a super-easy solution to the problem. We don't lose data, just need to fix the ACL reference, because it gets a new ID in the cache.
Txn log files are continuous and ZK goes back to the previous one to process creating the parent znode, but eventually skips it because it's already in the tree. I added details to the Jira ticket, please take a look. https://issues.apache.org/jira/browse/ZOOKEEPER-4846?focusedCommentId=17924725&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17924725 Andor On Thu, 2025-02-06 at 14:17 -0600, Andor Molnar wrote: > Maybe we could fix the gap somehow...? > Start new txn log strictly on snap zxid + 1. > > In which case we'll have createTxn on parent with Acl information in > txn log with which we can fix the missing Acl reference. > > I need to take a closer look. > > Andor > > > > > > On Thu, 2025-02-06 at 14:13 -0600, Andor Molnar wrote: > > There's a txn gap between snapshot and the next txn log: > > > > -rw-r--r-- 1 andor andor 67108880 Feb 4 13:01 log.674c2 > > -rw-r--r-- 1 andor andor 2630669 Feb 1 11:11 snapshot.674c0 > > > > Txn zxid 674c1 (create of the parent znode) has never been > > persisted, > > so the ACL information is completely lost. The only thing we can do > > is > > to restore consistency by fixing the ACL ref to something which is > > present in the cache. > > > > > The only other solution I can think of is to persist a full > > > representation of the ACLs per every node. > > > > I agree, but it requires DB format change and it's not backward > > compatible. > > > > Andor > > > > > > > > On Thu, 2025-02-06 at 13:36 -0600, Andor Molnar wrote: > > > That's true, but we'll lose that ACL either way, right? > > > > > > I mean before that recent patch mentioned by Damien, the > > > situation > > > was > > > the same: we emitted a warning message about a znode which had a > > > non- > > > existing ACL reference and went on. > > > > > > "Ignoring acl XYZ as it does not exist in the cache" at INFO > > > level. > > > > > > We're talking about an ACL which never gets persisted and now > > > with > > > the > > > patch, ZK is not willing to startup in the situation anymore. > > > > > > Andor > > > > > > > > > > > > > > > On Thu, 2025-02-06 at 11:03 -0800, Chris Nauroth wrote: > > > > In the scenario described in the pull request, it sounds like > > > > we > > > > ultimately > > > > are not serializing the ACL for inode /foo. IIUC, this proposal > > > > would > > > > result in setting /foo to world readable, not the original > > > > intended > > > > ACL > > > > that was dropped from serialization. It would fix the failure, > > > > but > > > > it > > > > could > > > > open a security risk. (Let me know if I misunderstood.) > > > > > > > > The only other solution I can think of is to persist a full > > > > representation > > > > of the ACLs per every node. The ReferenceCountedACLCache would > > > > not > > > > be > > > > serialized, and instead be reconstructed from the deserialized > > > > nodes, > > > > becoming purely a runtime memory optimization. The HDFS > > > > implementation for > > > > ACLs is similar to this. Of course, this would be a much bigger > > > > change. > > > > > > > > Chris Nauroth > > > > > > > > > > > > On Wed, Feb 5, 2025 at 4:03 PM Andor Molnar <an...@apache.org> > > > > wrote: > > > > > > > > > I've a crazy idea for this which is super quick: > > > > > > > > > > Here we add usage of ACL to the cache: > > > > > > > > > > https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java#L1358 > > > > > > > > > > What if we do this...? > > > > > > > > > > In the cache, when we realize that ACL is missing, return > > > > > false: > > > > > > > > > > https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/ReferenceCountedACLCache.java#L175 > > > > > > > > > > In DataTree we'll modify the Znode ACL reference to "-1" > > > > > which > > > > > is > > > > > "world readable", essentially removing the ACL from the znode > > > > > and > > > > > continue: > > > > > > > > > > synchronized (node) { > > > > > if (!aclCache.addUsage(node.acl)) { > > > > > // Fix missing ACL > > > > > node.acl = OPEN_UNSAFE_ACL_ID; > > > > > LOG.warn("Missing ACL has been removed from znode, > > > > > proceeding."); > > > > > } > > > > > } > > > > > > > > > > Txn's processing will be fine, next snapshot will be "fixed". > > > > > > > > > > Andor > > > > > > > > > > > > > > > > > > > > On Wed, 2025-02-05 at 15:45 -0600, Andor Molnar wrote: > > > > > > Hi ZK folks, > > > > > > > > > > > > Let me draw your attention to this ticket. We've seen this > > > > > > happening > > > > > > in > > > > > > production and I would like to work on a fix. > > > > > > > > > > > > Damien already created a draft PR here: > > > > > > https://github.com/apache/zookeeper/pull/2183 > > > > > > > > > > > > Let's take a closer look and work on a strategic solution. > > > > > > > > > > > > Thanks, > > > > > > Andor > > > > > > > > > > > > > > > > > > > > > > > > > > > >