[ 
https://issues.apache.org/jira/browse/HBASE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780981#action_12780981
 ] 

Lars George commented on HBASE-1994:
------------------------------------

I am really torn looking at HBASE-1364. It seems a lot of planning has already 
gone into making splits work better. Is this issue simply a preliminary fix to 
avoid the empty log files and to stop reusing the same file names? If so, it 
is fine. But I would like to know what the overall plan is.

Before reading HBASE-1364 I had a similar idea for how to improve on my 
suggestion above. I thought we could do this:

- Master sees abandoned log and sets marker in ZK for regions contained in log
- RegionServer, on start-up or region open, checks the ZK marker and reads the 
original file, extracting the HLogKeys it needs and storing them in a local 
file in the region. (This is the distributed log split: not all RSs split here, 
only those that need to read the logs locally anyway.) It could also put the 
data into a MemStore at the same time.
- The RS applies the log and flushes, then sets a semaphore next to the 
original log to signal that it has completed.
- If a local copy was made as described above, the RS can delete it now.
- RS clears the marker in ZK for its region(s).
- Once the master sees that all RSs have read and processed the log, it can 
delete the original log.
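
To make the ordering of these steps concrete, here is a minimal in-memory 
sketch in Python. It is not HBase or ZooKeeper code: the Coordinator class, 
the region_open function, and all field names are hypothetical stand-ins for 
the ZK markers, the abandoned log, and a region's MemStore, and flushing is 
only simulated by appending to a list.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Hypothetical stand-in for ZK markers plus the abandoned log file."""
    log: list                                   # (region, edit) pairs in the abandoned log
    markers: set = field(default_factory=set)   # regions that still need replay
    done: set = field(default_factory=set)      # "completed" flags next to the log
    marked: bool = False
    log_deleted: bool = False

    def master_mark(self):
        # The master sees the abandoned log and sets one marker per region in it.
        self.markers = {region for region, _ in self.log}
        self.marked = True

    def master_cleanup(self):
        # The master deletes the original log only once every marker is cleared.
        if self.marked and not self.markers and not self.log_deleted:
            self.log = []
            self.log_deleted = True
        return self.log_deleted


def region_open(coord, region, memstore):
    """Region server side: replay only the edits that belong to `region`."""
    if region not in coord.markers:
        return 0                                # no marker, nothing to replay
    # Read the original log and keep only this region's entries (local copy).
    local_copy = [edit for r, edit in coord.log if r == region]
    # Apply the edits (the flush is simulated) and record completion.
    memstore.extend(local_copy)
    coord.done.add(region)
    applied = len(local_copy)
    # The local copy is no longer needed once the edits are applied.
    local_copy.clear()
    # Clear the marker so the master can eventually delete the log.
    coord.markers.discard(region)
    return applied


if __name__ == "__main__":
    coord = Coordinator(log=[("r1", "put a"), ("r2", "put b"), ("r1", "put c")])
    coord.master_mark()
    store_r1, store_r2 = [], []
    region_open(coord, "r1", store_r1)
    print(coord.master_cleanup())   # r2 has not replayed yet
    region_open(coord, "r2", store_r2)
    print(coord.master_cleanup())
```

The point of the sketch is the invariant: the original log stays in place 
until every region that appears in it has cleared its marker, so a region 
server that dies mid-replay leaves its marker set and the log intact.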

I am probably missing some details, since I have only read the HLog and related 
code and have not debugged it to check that I got all the steps right. Let me 
know what you think and what you want me to do here.

> Master will lose hlog entries while splitting if region has empty 
> oldlogfile.log
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-1994
>                 URL: https://issues.apache.org/jira/browse/HBASE-1994
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Cosmin Lehene
>            Priority: Blocker
>             Fix For: 0.21.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I don't know yet how an empty oldlogfile.log can exist, however it happened.
> Master will fail to put the splits in the region oldlogfile.log if an empty 
> oldlogfile.log already exists there.
> This is the master log after I artificially reproduced it by placing an empty 
> oldlogfile.log in /hbase/.META./1028785192/oldlogfile.log and then killing the 
> regionserver that was holding the .META. table:
> 2009-11-19 09:08:36,012 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Splitting 1 hlog(s) in hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773
> 2009-11-19 09:08:36,012 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Splitting hlog 1 of 1: 
> hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128, 
> length=0
> 2009-11-19 09:08:36,019 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Adding queue for .META.,,1
> 2009-11-19 09:08:36,037 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Pushed=795 entries from 
> hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773/hlog.dat.1258637493128
> 2009-11-19 09:08:36,038 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Thread got 795 to process
> 2009-11-19 09:08:36,043 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Old hlog file hdfs://b0:9000/hbase/.META./1028785192/oldlogfile.log already 
> exists. Copying existing file to new file
> 2009-11-19 09:08:36,079 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Got while writing region .META.,,1 log java.io.EOFException
> 2009-11-19 09:08:36,081 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
> hlog file splitting completed in 70 millis for 
> hdfs://b0:9000/hbase/.logs/b4,60020,1258637492773

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
