[
https://issues.apache.org/jira/browse/ZOOKEEPER-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802980#comment-16802980
]
Norbert Kalmar commented on ZOOKEEPER-3333:
-------------------------------------------
Thanks for your input Brian!
I would say then ZK should log the warning, do a snapshot and start a new txn
log. And make sure we do not lose any txns along the way.
> Detect if txnlogs and / or snapshots is deleted under a running ZK instance
> ---------------------------------------------------------------------------
>
> Key: ZOOKEEPER-3333
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3333
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.5.5, 3.4.14
> Reporter: Norbert Kalmar
> Priority: Major
>
> ZK does not notice if txnlogs are deleted from it's dataDir, and it will just
> keep running, writing txns in the buffer. Than, when ZK is restarted, it will
> lose all data.
> To reproduce:
> I run a 3 node ZK ensemble, and deleted dataDir for just one instance, than
> wrote some data. It turns out, it will not write the transaction to disk. ZK
> stores everything in memory, until it “feels like” it’s time to persist it on
> disk. So it doesn’t even notice the file is deleted, and when it tried to
> flush, I imagine it just fails and keeps it in the buffer.
> So anyway, I restarted the instance, it got the snapshot + latest txn logs
> from the other nodes, as expected it would. It also wrote them in dataDir, so
> now every node had the dataDir.
> So deleting from one node is fine (again, as expected, they will sync after a
> restart).
> Then, I deleted all 3 nodes dataDir under running instances. Until restart,
> it worked fine (of course I was getting my buffer full, I did not test until
> the point it got overflowed).
> But after restart, I got a fresh new ZK with all my znodes gone.
> For starter, I think ZK should detect if the file it is appending is removed.
> What should ZK do? At least give a warning log message. The question should
> it try to create a new file? Or try to get it from other nodes? Or just fail
> instantly? Restart itself, see if it can sync?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)