Ed Rowe created ZOOKEEPER-2434:
----------------------------------

             Summary: Error handling issues in truncation code
                 Key: ZOOKEEPER-2434
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2434
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
            Reporter: Ed Rowe
            Priority: Minor


# If FileTxnLog.truncate() is unable to delete a log file, it calls LOG.warn() 
but otherwise ignores does nothing. I think this should be a fatal error not a 
logged warning. Otherwise those log files are going be be read later when the 
DB is reloaded and data that should have been removed will still be present. 
   
# Learner.syncWithLeader() expects ZKDatabase.truncateLog() to return false on 
failure, and if this occurs it calls System.exit(). However, this will never 
happen because ZKDatabase.truncateLog() never returns false - instead an 
exception is thrown on failure. ZKDatabase.truncateLog() calls 
FileTxnSnapLog.truncateLog() which calls FileTxnLog.truncate(), each of which 
is documented to return false on failure but none of which ever does in 
practice. TruncateTest.testTruncationNullLog() clearly expects an exception on 
error in ZKDatabase.truncateLog() so there are conflicting expectations in the 
codebase. It appears that if Learner.syncWithLeader() encounters an exception, 
System.exit() will _not_ be called and instead we land in the main run loop 
where we'll start the whole thing again. So there are two things to deal with: 
a) whether we want to do system.exit or go back to the main run loop if 
truncation fails, and b) sort out the return false vs. throw exception 
discrepancy and make it consistent (and change the docs as needed).
   
I'm happy to propose a patch, but I'd need people with more experience in the 
codebase to weigh in on the questions above.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to