[ 
https://issues.apache.org/jira/browse/HBASE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845532#action_12845532
 ] 

Todd Lipcon commented on HBASE-2312:
------------------------------------

Maybe we can actually avoid this situation. If the Master sees an "intent to 
roll" record that refers to a file it didn't see in the listStatus() it knows 
that the region server has actually made progress since it was originally 
detected dead. Thus, it can abort the whole recovery operation since the RS is 
known to be alive again.

> Possible data loss when RS goes into GC pause while rolling HLog
> ----------------------------------------------------------------
>
>                 Key: HBASE-2312
>                 URL: https://issues.apache.org/jira/browse/HBASE-2312
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.20.3
>            Reporter: Karthik Ranganathan
>
> There is a very corner case when bad things could happen(ie data loss):
> 1)    RS #1 is going to roll its HLog - not yet created the new one, old one 
> will get no more writes
> 2)    RS #1 enters GC Pause of Death
> 3)    Master lists HLog files of RS#1 that is has to split as RS#1 is dead, 
> starts splitting
> 4)    RS #1 wakes up, created the new HLog (previous one was rolled) and 
> appends an edit - which is lost
> The following seems like a possible solution:
> 1)    Master detects RS#1 is dead
> 2)    The master renames the /hbase/.logs/<regionserver name>  directory to 
> something else (say /hbase/.logs/<regionserver name>-dead)
> 3)    Add mkdir support (as opposed to mkdirs) to HDFS - so that a file 
> create fails if the directory doesn't exist. Dhruba tells me this is very 
> doable.
> 4)    RS#1 comes back up and is not able create the new hlog. It restarts 
> itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to