[ https://issues.apache.org/jira/browse/HDFS-13103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351236#comment-16351236 ]

Wei-Chiu Chuang commented on HDFS-13103:
----------------------------------------

Posted an initial patch to illustrate solution #1. If the approach makes 
sense, I can add a test for it.
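
For context, a rough sketch of what solution #1 could look like in hdfs-site.xml. The property name {{dfs.client.write.ack.timeout}} below is hypothetical (the actual name is whatever the attached patch introduces), and the values are examples only:

{code:xml}
<!-- Hypothetical sketch of solution #1: decouple the two timeouts. -->
<property>
  <!-- NOTE: property name is illustrative, not the one from the patch. -->
  <name>dfs.client.write.ack.timeout</name>
  <value>60000</value>
  <description>Timeout in milliseconds for the client to wait for a
  write acknowledgement from the DataNode pipeline. Independent of the
  read timeout.</description>
</property>
<property>
  <name>dfs.client.socket-timeout</name>
  <value>10000</value>
  <description>Client read timeout. Lowering this to detect DataNode
  crashes faster would no longer shrink the write ack timeout.</description>
</property>
{code}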

> HDFS Client write acknowledgement timeout should not depend on read timeout
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-13103
>                 URL: https://issues.apache.org/jira/browse/HDFS-13103
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs-client
>    Affects Versions: 2.8.0, 3.0.0-alpha1
>         Environment: CDH5.7.0 and above + Cloudera Manager. HBase Region 
> Server.
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>         Attachments: HDFS-13103.001.patch
>
>
> HDFS-8311 added a timeout for client write acknowledgement for both
>  # transferring blocks
>  # writing to a DataNode.
> The timeout shares the same configuration as the client read timeout 
> (dfs.client.socket-timeout).
> While I agree having a timeout is good, *it does not make sense for the write 
> acknowledgement timeout to depend on the read timeout*. We saw a case where a 
> cluster admin wanted to reduce the HBase RegionServer read timeout so as to 
> detect DataNode crashes quickly, but did not realize it also affects the write 
> acknowledgement timeout.
> In the end, the effective DataNode write timeout is shorter than the 
> effective client write acknowledgement timeout. If the last two DataNodes in 
> the write pipeline crash, the client would think the first DataNode is faulty 
> (the DN appears unresponsive because it is still waiting for the ack from the 
> downstream DNs) and drop it, and then the HBase RS would crash because it is 
> unable to write to any good DataNode. This scenario is possible during a rack 
> failure.
> This problem is even worse for Cloudera Manager-managed clusters. By default, 
> a CM-managed HBase RegionServer sets 
> {{dfs.client.block.write.replace-datanode-on-failure.enable = false}}, so 
> even one unresponsive DataNode could crash the HBase RegionServer.
> I am raising this Jira to discuss two possible solutions:
>  # add a new configuration for the write acknowledgement timeout, so that it 
> does not depend on the read timeout
>  # or, update the description of dfs.client.socket-timeout in 
> core-default.xml so that admins are aware that the write acknowledgement 
> timeout depends on this configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
