[
https://issues.apache.org/jira/browse/ZOOKEEPER-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ed Rowe updated ZOOKEEPER-2434:
-------------------------------
Description:
# If FileTxnLog.truncate() is unable to delete a log file, it calls LOG.warn()
but otherwise does nothing. I think this should be a fatal error not a logged
warning. Otherwise those log files are going be be read later when the DB is
reloaded and data that should have been removed will still be present.
# Learner.syncWithLeader() expects ZKDatabase.truncateLog() to return false on
failure, and if this occurs it calls System.exit(). However, this will never
happen because ZKDatabase.truncateLog() never returns false - instead an
exception is thrown on failure. ZKDatabase.truncateLog() calls
FileTxnSnapLog.truncateLog() which calls FileTxnLog.truncate(), each of which
is documented to return false on failure but none of which ever does in
practice. TruncateTest.testTruncationNullLog() clearly expects an exception on
error in ZKDatabase.truncateLog() so there are conflicting expectations in the
codebase. It appears that if Learner.syncWithLeader() encounters an exception,
System.exit() will _not_ be called and instead we land in the main run loop
where we'll start the whole thing again. So there are two things to deal with:
a) whether we want to do system.exit or go back to the main run loop if
truncation fails, and b) sort out the return false vs. throw exception
discrepancy and make it consistent (and change the docs as needed).
I'm happy to propose a patch, but I'd need people with more experience in the
codebase to weigh in on the questions above.
was:
# If FileTxnLog.truncate() is unable to delete a log file, it calls LOG.warn()
but otherwise ignores does nothing. I think this should be a fatal error not a
logged warning. Otherwise those log files are going be be read later when the
DB is reloaded and data that should have been removed will still be present.
# Learner.syncWithLeader() expects ZKDatabase.truncateLog() to return false on
failure, and if this occurs it calls System.exit(). However, this will never
happen because ZKDatabase.truncateLog() never returns false - instead an
exception is thrown on failure. ZKDatabase.truncateLog() calls
FileTxnSnapLog.truncateLog() which calls FileTxnLog.truncate(), each of which
is documented to return false on failure but none of which ever does in
practice. TruncateTest.testTruncationNullLog() clearly expects an exception on
error in ZKDatabase.truncateLog() so there are conflicting expectations in the
codebase. It appears that if Learner.syncWithLeader() encounters an exception,
System.exit() will _not_ be called and instead we land in the main run loop
where we'll start the whole thing again. So there are two things to deal with:
a) whether we want to do system.exit or go back to the main run loop if
truncation fails, and b) sort out the return false vs. throw exception
discrepancy and make it consistent (and change the docs as needed).
I'm happy to propose a patch, but I'd need people with more experience in the
codebase to weigh in on the questions above.
> Error handling issues in truncation code
> ----------------------------------------
>
> Key: ZOOKEEPER-2434
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2434
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Reporter: Ed Rowe
> Priority: Minor
>
> # If FileTxnLog.truncate() is unable to delete a log file, it calls
> LOG.warn() but otherwise does nothing. I think this should be a fatal error
> not a logged warning. Otherwise those log files are going be be read later
> when the DB is reloaded and data that should have been removed will still be
> present.
> # Learner.syncWithLeader() expects ZKDatabase.truncateLog() to return false
> on failure, and if this occurs it calls System.exit(). However, this will
> never happen because ZKDatabase.truncateLog() never returns false - instead
> an exception is thrown on failure. ZKDatabase.truncateLog() calls
> FileTxnSnapLog.truncateLog() which calls FileTxnLog.truncate(), each of which
> is documented to return false on failure but none of which ever does in
> practice. TruncateTest.testTruncationNullLog() clearly expects an exception
> on error in ZKDatabase.truncateLog() so there are conflicting expectations in
> the codebase. It appears that if Learner.syncWithLeader() encounters an
> exception, System.exit() will _not_ be called and instead we land in the main
> run loop where we'll start the whole thing again. So there are two things to
> deal with: a) whether we want to do system.exit or go back to the main run
> loop if truncation fails, and b) sort out the return false vs. throw
> exception discrepancy and make it consistent (and change the docs as needed).
>
> I'm happy to propose a patch, but I'd need people with more experience in the
> codebase to weigh in on the questions above.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)