[ 
https://issues.apache.org/jira/browse/HBASE-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881937#action_12881937
 ] 

Nicolas Spiegelberg commented on HBASE-2437:
--------------------------------------------

Hey, late peer review, I know.  I had a couple things I saw during the peer 
review that I would like to get resolution on...

1. FSUtils.recoverFileLease() - does the InterruptedException handler need to 
set Thread.currentThread().interrupt()?  What about users actually trying to 
kill the master?
2. parseHLog() - the comment says that len==0 can still happen with append 
support? At least for 0.20, that's only if the file wasn't closed.  However, we 
just did that immediately before in recoverFileLease. don't mind the code, but 
comment should change or be clarified
3. writeEditsBatchToRegions() - same thing with InterruptedException.  maybe 
want to have something higher up the call stack notice isInterrupted() and 
display a Log.info() message.  At the very least, we definitely don't want to 
delete the log directory in splitLog() if the user interrupts these threads and 
skipErrors==true
4. archiveLogs() - by default, we are archiving successfully split logs ala 
'processedLogs'?  I'm not sure we want to do that by default.  I think people 
are mainly interested in problematic logs that couldn't survive the transition. 
 Having this as an optional toggle is okay, but a naive user wouldn't know he 
has these trash items.

> Refactor HLog splitLog
> ----------------------
>
>                 Key: HBASE-2437
>                 URL: https://issues.apache.org/jira/browse/HBASE-2437
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Assignee: Cosmin Lehene
>             Fix For: 0.21.0
>
>         Attachments: 2437-v2.txt, 2437-v3.txt, 2437-v4.patch, 2437.txt, 
> HBASE-2437-v5.patch, 
> HBASE-2437_for_HBase-0.21_with_unit_tests_for_HDFS-0.21.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> the HLog.splitLog got really long and complex and hard to verify for 
> correctness. 
> I started to refactor it and also ported changes from hbase-2337 that deals 
> with premature deletion of log files in case of errors. Further improvements 
> will be possible, however the scope of this issue is to clean the code and 
> make it behave correctly (i.e. not lose any edits)  
> Added a suite of unit tests that might be ported to 0.20 as well.
> Added a setting (hbase.skip.errors - feel free to suggest a better name) 
> that, when set to false will make the process less tolerant to failures or 
> corrupted files:  in case a log file is corrupted or an error stops the 
> process from consistently splitting the log, will abort the entire operation 
> to avoid losing any edits. When hbase.skip.errors is on any corrupted files 
> will be partially parsed and then moved to the corrupted logs archive (see 
> hbase-2337). 
> Like hbase-2337 the splitLog method will first split all the logs and then 
> proceed to archive them. If any splitted log file (oldlogfile.log) that is 
> the result of an earlier splitLog attempt is found in the region directory, 
> it will be deleted - this is safe since we won't move the original log files 
> until the splitLog process completes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to