[ 
https://issues.apache.org/jira/browse/HBASE-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087899#comment-13087899
 ] 

Ted Yu commented on HBASE-4222:
-------------------------------

@Gary:
Can you rebase the patch now that HBASE-4095 has been integrated? It no longer applies cleanly:
{code}
Hunk #7 succeeded at 1055 (offset 21 lines).
1 out of 7 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java.rej
patching file src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
Hunk #1 FAILED at 19.
Hunk #2 FAILED at 67.
Hunk #3 succeeded at 122 (offset -2 lines).
Hunk #4 succeeded at 378 with fuzz 2 (offset 42 lines).
2 out of 4 hunks FAILED -- saving rejects to file src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java.rej
{code}
Thanks

> Make HLog more resilient to write pipeline failures
> ---------------------------------------------------
>
>                 Key: HBASE-4222
>                 URL: https://issues.apache.org/jira/browse/HBASE-4222
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Gary Helmling
>            Assignee: Gary Helmling
>             Fix For: 0.92.0
>
>
> The current implementation of HLog rolling to recover from transient errors 
> in the write pipeline seems to have two problems:
> # When {{HLog.LogSyncer}} hits an {{IOException}} during time-based sync 
> operations, it triggers a log rolling request in the corresponding catch 
> block, but only after escaping from the internal while loop.  As a result, 
> the {{LogSyncer}} thread exits and, as far as I can tell, is never restarted, 
> even if the log rolling was successful.
> # Log rolling requests triggered by an {{IOException}} in {{sync()}} or 
> {{append()}} never happen if no entries have yet been written to the log.  
> This means that write errors are not recovered from immediately, which 
> extends the exposure to more errors occurring in the pipeline.
> In addition, it seems like we should be able to better handle transient 
> problems, like a rolling restart of DataNodes while the HBase RegionServers 
> are running.  Currently this will reliably cause RegionServer aborts during 
> log rolling: either an append or time-based sync triggers an initial 
> {{IOException}}, initiating a log rolling request.  However, the log rolling 
> then fails in closing the current writer ("All datanodes are bad"), causing a 
> RegionServer abort.  In this case, it seems like we should at least provide 
> an option to continue with the new writer and only abort on subsequent errors.
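
The first problem in the description above (the catch block sitting outside the sync loop) can be illustrated with a minimal sketch. This is not the HBase code: the class SyncLoopSketch, the Syncable interface, and both method names are hypothetical, and a bounded for loop stands in for the syncer thread's while loop so the example terminates.

```java
import java.io.IOException;

/** Hypothetical illustration only; none of these names come from the HBase source. */
final class SyncLoopSketch {

    /** Assumed stand-in for the log writer's sync call. */
    interface Syncable {
        void sync() throws IOException;
    }

    /**
     * Shape described in the issue: the catch block is outside the loop,
     * so the first IOException requests a roll and then ends the loop.
     * Returns the number of successful syncs.
     */
    static int catchOutsideLoop(Syncable log, Runnable requestLogRoll, int maxIterations) {
        int synced = 0;
        try {
            for (int i = 0; i < maxIterations; i++) {
                log.sync();
                synced++;
            }
        } catch (IOException e) {
            // The roll is requested, but the loop has already been left.
            requestLogRoll.run();
        }
        return synced;
    }

    /**
     * Alternative shape: the catch block is inside the loop, so a failed
     * sync requests a roll and the loop carries on with the next iteration.
     * Returns the number of successful syncs.
     */
    static int catchInsideLoop(Syncable log, Runnable requestLogRoll, int maxIterations) {
        int synced = 0;
        for (int i = 0; i < maxIterations; i++) {
            try {
                log.sync();
                synced++;
            } catch (IOException e) {
                // Request a roll and keep the loop alive.
                requestLogRoll.run();
            }
        }
        return synced;
    }
}
```

With a Syncable that fails only on its second call and maxIterations set to 5, catchOutsideLoop stops after one successful sync, while catchInsideLoop completes four; each requests exactly one roll.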

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
