[ 
https://issues.apache.org/jira/browse/HBASE-15265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153676#comment-15153676
 ] 

Duo Zhang commented on HBASE-15265:
-----------------------------------

There are two problems here; see the comments in this test file.

https://github.com/Apache9/hbase/blob/HBASE-15265/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestAsyncLogRolling.java

First, {{FanOutOneBlockAsyncDFSOutput}} is fail-fast, which means its creation 
is fail-fast too. But in the current log rolling architecture, we abort the RS 
if log rolling fails. With the old {{FSHLog}} implementation this was not a 
problem, because {{DFSClient}} and {{DFSOutputStream}} retry many times when a 
call to the namenode or a connection to a datanode fails. Now we just throw the 
exception out, so a single failure can abort the RS. We need to solve this: 
maybe change the abort logic of {{LogRoller}}, or add retries in {{AsyncFSWAL}}?
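
To make the retry option concrete, here is a rough sketch of a bounded retry 
around a fail-fast creation step. This is not code from the patch: 
{{RetryingCreate}}, {{IOSupplier}}, {{createWithRetry}} and the linear backoff 
are all hypothetical names and choices made up for illustration, and where the 
retry would actually live is still an open question.

```java
import java.io.IOException;

class RetryingCreate {
    /** Hypothetical stand-in for a fail-fast creation step, e.g. opening a new WAL writer. */
    interface IOSupplier<T> {
        T get() throws IOException;
    }

    /**
     * Calls create up to maxAttempts times, sleeping a little longer after each
     * failure. Rethrows the last IOException once all attempts are used up, so
     * the caller can still decide whether to abort.
     */
    static <T> T createWithRetry(IOSupplier<T> create, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return create.get();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Linear backoff; an assumption, not a tuned policy.
                    Thread.sleep(pauseMillis * attempt);
                }
            }
        }
        throw last;
    }
}
```

With something like this, only a failure that survives all attempts would reach 
the abort path.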

Second, {{AsyncFSWAL}} never fails a sync request; instead, it rolls the 
WALWriter and tries again. But in the testcase, this can lead to an infinite 
wait at shutdown. The shutdown ordering is a little strange. We first mark the 
RS as stopped, and then close all regions on this RS. If the abort flag is 
false, we flush each region, which needs to write something to the WAL. If the 
WAL writer breaks just at this time, {{AsyncFSWAL}} will try to roll the WAL 
writer. But as said above, the RS is already marked as stopped, so 
{{LogRoller}} may have already exited; the roll will never succeed and the 
shutdown process hangs.
Yes, I think {{AsyncFSWAL}} should be able to quit the infinite wait when we 
know it will never succeed, but I also think we should revisit the shutdown 
ordering, since lots of modules in the RS depend on the stopped flag of the RS.
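
A minimal sketch of what "quit the infinite wait" could look like, assuming the 
waiter can observe the stopped flag. {{RollWaiter}}, {{awaitRoll}}, the latch 
and the polling interval are hypothetical and only illustrate the idea; the 
real {{AsyncFSWAL}} is not structured like this.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

class RollWaiter {
    /**
     * Waits for a pending WAL roll (modelled here as a latch) to finish, but
     * checks the stopped flag after every poll interval. Once the server is
     * stopped nobody is left to perform the roll, so the wait fails instead
     * of blocking forever.
     */
    static void awaitRoll(CountDownLatch rolled, BooleanSupplier stopped, long pollMillis)
            throws IOException, InterruptedException {
        while (!rolled.await(pollMillis, TimeUnit.MILLISECONDS)) {
            if (stopped.getAsBoolean()) {
                throw new IOException("WAL roll will never complete: server is stopped");
            }
        }
    }
}
```

The caller (the flush during region close, in the scenario above) would then 
see an exception and could fail the close instead of hanging.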

Thanks.

> Implement an asynchronous FSHLog
> --------------------------------
>
>                 Key: HBASE-15265
>                 URL: https://issues.apache.org/jira/browse/HBASE-15265
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
