[jira] Commented: (HBASE-2437) Refactor HLog splitLog

HBase Review Board (JIRA) Tue, 25 May 2010 14:42:01 -0700

    [ https://issues.apache.org/jira/browse/HBASE-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871376#action_12871376 ]

HBase Review Board commented on HBASE-2437:
-------------------------------------------

Message from: "Cosmin Lehene"

Let's have a separate issue for that

done - for both reader and writer

done - both reader and writer impl

The original version of this function is what prompted me to start refactoring in
the first place. I'll add a description, but if it's still confusing it might need
more work.

I know, and I tried to come up with a better name when I created it. Can you
suggest something better? I can't think of a name that is both short and
descriptive enough.

perhaps hbase.regionserver.hlog.splitlog.batch.size?

done

done

fixed

done

Fixed

done

done

sync() used to call syncFs(). It looks like HBASE-2544 changed things a bit,
but it still does more than just add the SequenceFile sync marker.

I added this after I saw inconsistent results when running splitLog on
bigger hlogs. Try copying a log from the cluster to your local machine and running
splitLog from the command line a few times without flushing after each append. I
used to get inconsistent results between runs, and calling sync fixed it.
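A minimal sketch of the append-then-sync pattern described above, assuming a
hypothetical EditWriter interface (not the real HLog writer API) and an in-memory
RecordingWriter that only records the calls it receives. Each edit is routed to
its region's writer and synced immediately after the append.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal writer interface; not the real HLog writer API.
interface EditWriter {
  void append(String edit);
  void sync();
}

// In-memory writer that only records the calls it receives.
class RecordingWriter implements EditWriter {
  final List<String> ops = new ArrayList<>();
  @Override public void append(String edit) { ops.add("append:" + edit); }
  @Override public void sync() { ops.add("sync"); }
}

class SplitWithSync {
  // Each entry is {regionName, edit}. Edits go to their region's writer,
  // and every append is followed immediately by a sync().
  static Map<String, RecordingWriter> split(List<String[]> entries) {
    Map<String, RecordingWriter> writers = new LinkedHashMap<>();
    for (String[] entry : entries) {
      RecordingWriter w = writers.computeIfAbsent(entry[0], region -> new RecordingWriter());
      w.append(entry[1]);
      w.sync(); // flush right away so no appended edit is left buffered
    }
    return writers;
  }
}
```

Syncing after every append trades throughput for safety; batching syncs would be
the obvious alternative if it turns out to be too slow on large hlogs.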

There's also this TODO in TestHLogSplit: "//TODO: test the split of a large (lots of regions > 500 file).
In my tests it seems without hflush"

We could do some testing to figure out why log entries would be lost when
running locally.

What would be a better way to flush the writer?

done

fixed

I don't know what it's supposed to mean either :)

fixed

I'd like to be able to investigate the trailing garbage. I don't think this
should ever happen (do you see any scenarios where it could?). If it did, we might
lose data. We used to fix NameNode edits files for the fsimage by adding a missing
byte to a corrupted entry.

I'd like to reflect more on this, maybe see other opinions. 

I'd rather have these differences dealt with at the lowest level (the writers) and
abstracted away than spread across the code.
What do you think?
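A sketch of what keeping these differences at the writer level could look like,
assuming the differences are in how each HDFS version flushes (syncFs() versus
hflush()). LogWriter, SyncFsLogWriter, HflushLogWriter and LogWriters are
hypothetical names; the implementations only record which call they would make.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: callers see one interface; each implementation hides how it
// makes data durable on a particular HDFS version.
interface LogWriter {
  void append(String edit);
  void sync();
}

// Stand-in for a writer on an HDFS version that exposes syncFs().
class SyncFsLogWriter implements LogWriter {
  final List<String> calls = new ArrayList<>();
  @Override public void append(String edit) { calls.add("append"); }
  @Override public void sync() { calls.add("syncFs"); }
}

// Stand-in for a writer on an HDFS version that exposes hflush().
class HflushLogWriter implements LogWriter {
  final List<String> calls = new ArrayList<>();
  @Override public void append(String edit) { calls.add("append"); }
  @Override public void sync() { calls.add("hflush"); }
}

class LogWriters {
  // Hypothetical factory: the version check lives here, not in splitLog.
  static LogWriter create(boolean hasHflush) {
    return hasHflush ? new HflushLogWriter() : new SyncFsLogWriter();
  }

  // Caller code is identical regardless of the implementation chosen.
  static void writeAll(LogWriter w, List<String> edits) {
    for (String e : edits) {
      w.append(e);
      w.sync();
    }
  }
}
```

The point of the design is that splitLog only ever calls sync(); which
filesystem call that maps to is decided once, when the writer is created.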

done
I'll need help adding Guava as a Maven dependency.

createNewSplitter is fine.
It's a Callable; I switched to submit() so we can check the result in case one
of the writers failed (see the comment below).
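A minimal sketch of the submit-and-check approach with java.util.concurrent:
each writer task is submitted as a Callable<Void>, and Future.get() surfaces any
exception the task threw as an ExecutionException. SubmitAndCheck and runAll are
hypothetical names, and the pool size of 2 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SubmitAndCheck {
  // Runs every task and returns true only if none of them threw.
  static boolean runAll(List<Callable<Void>> tasks) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<Future<Void>> futures = new ArrayList<>();
      for (Callable<Void> task : tasks) {
        futures.add(pool.submit(task)); // submit() keeps a handle on the result
      }
      boolean allOk = true;
      for (Future<Void> f : futures) {
        try {
          f.get(); // blocks until this task finishes
        } catch (ExecutionException e) {
          allOk = false; // the task threw; e.getCause() holds the original exception
        }
      }
      return allOk;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      pool.shutdown();
    }
  }
}
```

Results are checked in submission order, so a failure in a late task is only
noticed after the earlier tasks have finished.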

There are more aspects here:
I think the reported size will be > 0 after recovery, even if the file has no
records. I was asking whether we should add logic to check if it's the last log.
An EOF in a file with a non-zero length and a non-zero number of records means the
file is corrupted.
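One possible reading of the proposed check, as a sketch: the last log may end
mid-record, a recovered file can report a non-zero length with no records, and
otherwise an EOF in a non-empty file that already yielded records is treated as
corruption. EofPolicy and isCorrupted are hypothetical names, and the rule is the
proposal under discussion, not settled behaviour.

```java
// Hypothetical helper encoding one reading of the proposal above.
class EofPolicy {
  /**
   * Decides whether an EOF hit while reading a log should count as corruption.
   *
   * @param length    file length reported after recovery
   * @param records   number of records read successfully before the EOF
   * @param isLastLog true for the log that was still open when the server died
   */
  static boolean isCorrupted(long length, long records, boolean isLastLog) {
    if (isLastLog) {
      return false; // assumption: the last log may legitimately end mid-record
    }
    if (records == 0) {
      return false; // length can be > 0 after recovery even with no records
    }
    return length > 0; // EOF in a non-empty file that had records
  }
}
```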

See the comment above.

What's the other JIRA? See my comments above.

My previous comment got lost somehow.
Todd suggested submitting a Callable<Void> to the executor.
I wonder if we could use getCompletedTaskCount(). The documentation says it's an
approximation; however, it's only approximate while tasks are running and seems to
be exact after shutdown finishes (I looked at the source as well).
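A sketch of the getCompletedTaskCount() idea with a plain ThreadPoolExecutor:
shut down, wait for termination, then read the count. The Javadoc only promises
an approximation; treating it as exact after termination relies on the reading
of the source described above, not on a documented guarantee. CompletedCount and
runAndCount are hypothetical names.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CompletedCount {
  // Runs n trivial tasks and reads the completed-task count after termination.
  static long runAndCount(int n) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    for (int i = 0; i < n; i++) {
      pool.execute(() -> { /* stand-in for one writer task */ });
    }
    pool.shutdown(); // accept no new tasks; queued ones still run
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        return -1; // not finished in time; the count could still change
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return -1;
    }
    // Approximate while tasks run; read here, after termination.
    return pool.getCompletedTaskCount();
  }
}
```

Note that a task that throws is still counted as completed, so the count alone
cannot tell whether one of the writers failed.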

Another option would be ExecutorCompletionService which seems to be suited for 
this kind of job.
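A sketch of the ExecutorCompletionService option: results are taken in
completion order, so a failed writer is noticed as soon as it finishes rather
than when its turn comes in submission order. CompletionCheck and countFailures
are hypothetical names; a real splitter might abort on the first failure instead
of counting.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompletionCheck {
  // Counts failed tasks, consuming results in the order the tasks finish.
  static int countFailures(List<Callable<Void>> tasks) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    CompletionService<Void> done = new ExecutorCompletionService<>(pool);
    for (Callable<Void> task : tasks) {
      done.submit(task);
    }
    int failures = 0;
    try {
      for (int i = 0; i < tasks.size(); i++) {
        try {
          done.take().get(); // take() blocks until the next task completes
        } catch (ExecutionException e) {
          failures++; // a real splitter could abort the whole split here
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      pool.shutdown();
    }
    return failures;
  }
}
```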

- Cosmin

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/74/#review40
-----------------------------------------------------------

On 2010-05-21 12:11:59, stack wrote:

> Refactor HLog splitLog
> ----------------------
>
>                 Key: HBASE-2437
>                 URL: https://issues.apache.org/jira/browse/HBASE-2437
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Assignee: Cosmin Lehene
>             Fix For: 0.21.0
>
>         Attachments: 2437-v2.txt, 2437-v3.txt, 2437-v4.patch, 2437.txt, 
> HBASE-2437_for_HBase-0.21_with_unit_tests_for_HDFS-0.21.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> HLog.splitLog got really long and complex, and hard to verify for
> correctness.
> I started to refactor it and also ported changes from hbase-2337 that deal
> with premature deletion of log files in case of errors. Further improvements
> will be possible; however, the scope of this issue is to clean up the code and
> make it behave correctly (i.e. not lose any edits).
> Added a suite of unit tests that might be ported to 0.20 as well.
> Added a setting (hbase.skip.errors; feel free to suggest a better name)
> that, when set to false, makes the process less tolerant of failures or
> corrupted files: if a log file is corrupted or an error prevents the
> process from consistently splitting the log, the entire operation is aborted
> to avoid losing any edits. When hbase.skip.errors is on, any corrupted files
> will be partially parsed and then moved to the corrupted logs archive (see
> hbase-2337).
> Like hbase-2337, the splitLog method will first split all the logs and then
> proceed to archive them. If any split log file (oldlogfile.log) left over from
> an earlier splitLog attempt is found in the region directory, it will be
> deleted; this is safe since we won't move the original log files until the
> splitLog process completes.
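A minimal in-memory model of the two-phase process described above, under
stated assumptions: file names stand in for HDFS paths, a set of names marks
which logs are corrupted, and skipErrors models hbase.skip.errors.
TwoPhaseSplitSketch and its members are hypothetical and do not mirror the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the two-phase splitLog; no real HBase classes.
class TwoPhaseSplitSketch {
  static final String STALE_SPLIT_OUTPUT = "oldlogfile.log";

  static final class Outcome {
    final List<String> archived = new ArrayList<>();  // would go to the old-logs archive
    final List<String> corrupted = new ArrayList<>(); // would go to the corrupt-logs archive
  }

  // logs: logs to split; corrupt: logs that cannot be fully parsed;
  // regionFiles: contents of the region directory (mutated);
  // skipErrors: models hbase.skip.errors.
  static Outcome splitLog(List<String> logs, Set<String> corrupt,
                          Set<String> regionFiles, boolean skipErrors) {
    // Output left by an earlier attempt is safe to delete: the original
    // logs are only moved once a split has fully completed.
    regionFiles.remove(STALE_SPLIT_OUTPUT);

    // Phase 1: split every log without moving any original.
    for (String log : logs) {
      if (corrupt.contains(log) && !skipErrors) {
        // Abort everything; originals are untouched, so no edits are lost.
        throw new IllegalStateException("Corrupted log, aborting split: " + log);
      }
      // (readable edits from `log` would be written to per-region output here;
      //  a corrupted log would only be partially parsed)
    }

    // Phase 2: only now archive the originals.
    Outcome out = new Outcome();
    for (String log : logs) {
      (corrupt.contains(log) ? out.corrupted : out.archived).add(log);
    }
    return out;
  }
}
```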

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
