Xiaolin Ha created HBASE-26347:
----------------------------------

             Summary: Support detect and exclude slow DNs in fan-out of WAL
                 Key: HBASE-26347
                 URL: https://issues.apache.org/jira/browse/HBASE-26347
             Project: HBase
          Issue Type: New Feature
          Components: wal
    Affects Versions: 2.0.0, 3.0.0-alpha-2
            Reporter: Xiaolin Ha
            Assignee: Xiaolin Ha


We all know that WAL sync performance directly affects RPC processing time.

We use our own FanOutOneBlockAsyncDFSOutput to sync WAL entries, which 
connects directly to all the DNs that hold the block. But when even one of 
those DNs is slow, e.g. because of a disk hardware failure, WAL syncs become 
slow. What's more, the lower-layer HDFS system is not very sensitive in 
detecting such hardware failures.

We can detect slow DNs by the ACK time of packets in 
FanOutOneBlockAsyncDFSOutput, and exclude them when adding new blocks after 
the log is rolled (a log roll can also be triggered by slow syncs). This info 
can also be shown in the UI. We can also invalidate the cache of excluded DNs 
after a duration, so that we notice when those DNs have recovered.
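
To make the proposal concrete, here is a minimal sketch of the detection and 
exclusion logic, not the actual patch. The class name SlowDataNodeTracker, its 
methods, and the three tuning parameters (ACK latency threshold, number of 
slow ACKs before exclusion, and exclusion duration) are all hypothetical and 
only illustrate the idea. Timestamps are passed in explicitly to keep the 
sketch easy to test.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: tracks slow DNs by packet ACK latency. Not HBase code. */
class SlowDataNodeTracker {
  private final long slowAckThresholdMs; // assumed: ACKs at or above this are "slow"
  private final int minSlowAcks;         // assumed: slow ACKs needed before excluding
  private final long excludeTtlMs;       // assumed: how long a DN stays excluded

  // Slow ACKs seen per DN since it was last excluded.
  private final Map<String, Integer> slowAckCounts = new ConcurrentHashMap<>();
  // DN -> timestamp (ms) at which its exclusion expires.
  private final Map<String, Long> excludedUntil = new ConcurrentHashMap<>();

  SlowDataNodeTracker(long slowAckThresholdMs, int minSlowAcks, long excludeTtlMs) {
    this.slowAckThresholdMs = slowAckThresholdMs;
    this.minSlowAcks = minSlowAcks;
    this.excludeTtlMs = excludeTtlMs;
  }

  /** Record the ACK latency of one packet from one DN. */
  void onAck(String dataNode, long ackLatencyMs, long nowMs) {
    if (ackLatencyMs < slowAckThresholdMs) {
      return; // fast enough, nothing to record
    }
    int count = slowAckCounts.merge(dataNode, 1, Integer::sum);
    if (count >= minSlowAcks) {
      // Exclude the DN for a limited time and reset its counter.
      excludedUntil.put(dataNode, nowMs + excludeTtlMs);
      slowAckCounts.remove(dataNode);
    }
  }

  /** DNs to avoid when asking for a new block; expired entries are dropped. */
  Set<String> excludedDataNodes(long nowMs) {
    excludedUntil.values().removeIf(expiry -> expiry <= nowMs);
    return new HashSet<>(excludedUntil.keySet());
  }
}
```

In this sketch, the output stream would call onAck for each packet ACK, and 
the set returned by excludedDataNodes would be handed to the block allocation 
request after a log roll. Expiring entries after a fixed duration gives a 
recovered DN a chance to be used again.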

In any case, this idea can quickly reduce the impact of slow DNs and improve 
service availability.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
