[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ed Rowe updated ZOOKEEPER-2434:
-------------------------------
    Description: 
# If FileTxnLog.truncate() is unable to delete a log file, it calls LOG.warn() 
but otherwise does nothing. I think this should be a fatal error not a logged 
warning. Otherwise those log files are going be be read later when the DB is 
reloaded and data that should have been removed will still be present. 
# Learner.syncWithLeader() expects ZKDatabase.truncateLog() to return false on 
failure, and if this occurs it calls System.exit(). However, this will never 
happen because ZKDatabase.truncateLog() never returns false - instead an 
exception is thrown on failure. ZKDatabase.truncateLog() calls 
FileTxnSnapLog.truncateLog() which calls FileTxnLog.truncate(), each of which 
is documented to return false on failure but none of which ever does in 
practice. TruncateTest.testTruncationNullLog() clearly expects an exception on 
error in ZKDatabase.truncateLog() so there are conflicting expectations in the 
codebase. It appears that if Learner.syncWithLeader() encounters an exception, 
System.exit() will _not_ be called and instead we land in the main run loop 
where we'll start the whole thing again. So there are two things to deal with: 
a) whether we want to do system.exit or go back to the main run loop if 
truncation fails, and b) sort out the return false vs. throw exception 
discrepancy and make it consistent (and change the docs as needed).
   
I'm happy to propose a patch, but I'd need people with more experience in the 
codebase to weigh in on the questions above.


  was:
# If FileTxnLog.truncate() is unable to delete a log file, it calls LOG.warn() 
but otherwise ignores does nothing. I think this should be a fatal error not a 
logged warning. Otherwise those log files are going be be read later when the 
DB is reloaded and data that should have been removed will still be present. 
# Learner.syncWithLeader() expects ZKDatabase.truncateLog() to return false on 
failure, and if this occurs it calls System.exit(). However, this will never 
happen because ZKDatabase.truncateLog() never returns false - instead an 
exception is thrown on failure. ZKDatabase.truncateLog() calls 
FileTxnSnapLog.truncateLog() which calls FileTxnLog.truncate(), each of which 
is documented to return false on failure but none of which ever does in 
practice. TruncateTest.testTruncationNullLog() clearly expects an exception on 
error in ZKDatabase.truncateLog() so there are conflicting expectations in the 
codebase. It appears that if Learner.syncWithLeader() encounters an exception, 
System.exit() will _not_ be called and instead we land in the main run loop 
where we'll start the whole thing again. So there are two things to deal with: 
a) whether we want to do system.exit or go back to the main run loop if 
truncation fails, and b) sort out the return false vs. throw exception 
discrepancy and make it consistent (and change the docs as needed).
   
I'm happy to propose a patch, but I'd need people with more experience in the 
codebase to weigh in on the questions above.



> Error handling issues in truncation code
> ----------------------------------------
>
>                 Key: ZOOKEEPER-2434
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2434
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>            Reporter: Ed Rowe
>            Priority: Minor
>
> # If FileTxnLog.truncate() is unable to delete a log file, it calls 
> LOG.warn() but otherwise does nothing. I think this should be a fatal error 
> not a logged warning. Otherwise those log files are going be be read later 
> when the DB is reloaded and data that should have been removed will still be 
> present. 
> # Learner.syncWithLeader() expects ZKDatabase.truncateLog() to return false 
> on failure, and if this occurs it calls System.exit(). However, this will 
> never happen because ZKDatabase.truncateLog() never returns false - instead 
> an exception is thrown on failure. ZKDatabase.truncateLog() calls 
> FileTxnSnapLog.truncateLog() which calls FileTxnLog.truncate(), each of which 
> is documented to return false on failure but none of which ever does in 
> practice. TruncateTest.testTruncationNullLog() clearly expects an exception 
> on error in ZKDatabase.truncateLog() so there are conflicting expectations in 
> the codebase. It appears that if Learner.syncWithLeader() encounters an 
> exception, System.exit() will _not_ be called and instead we land in the main 
> run loop where we'll start the whole thing again. So there are two things to 
> deal with: a) whether we want to do system.exit or go back to the main run 
> loop if truncation fails, and b) sort out the return false vs. throw 
> exception discrepancy and make it consistent (and change the docs as needed).
>    
> I'm happy to propose a patch, but I'd need people with more experience in the 
> codebase to weigh in on the questions above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to