Íñigo Goiri commented on HDFS-13103:

I agree that the current approach is tricky to set up and that a separate 
configuration makes sense.
While reviewing this patch, I looked for documentation on this parameter; the 
only documentation available is the (incomplete) entry in {{hdfs-default.xml}} 
and the code itself.
It may make sense to file a separate JIRA to document the client setup.
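As a sketch of option 1 (the property name {{dfs.client.write.ack.timeout}} below is a placeholder for illustration, not an existing key), a dedicated write-acknowledgement timeout could sit alongside the read timeout in {{hdfs-site.xml}}, e.g.:

{code:xml}
<!-- Hypothetical example: dfs.client.write.ack.timeout is a placeholder
     name for the proposed new key, not a real configuration property. -->
<property>
  <name>dfs.client.socket-timeout</name>
  <value>10000</value>
  <!-- Short read timeout so a DataNode crash is detected quickly. -->
</property>
<property>
  <name>dfs.client.write.ack.timeout</name>
  <value>60000</value>
  <!-- Write acknowledgement timeout decoupled from the read timeout;
       could default to dfs.client.socket-timeout for compatibility. -->
</property>
{code}

With such a split, lowering the read timeout would no longer silently shorten the write acknowledgement timeout.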

> HDFS Client write acknowledgement timeout should not depend on read timeout
> ---------------------------------------------------------------------------
>                 Key: HDFS-13103
>                 URL: https://issues.apache.org/jira/browse/HDFS-13103
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs-client
>    Affects Versions: 2.8.0, 3.0.0-alpha1
>         Environment: CDH5.7.0 and above + Cloudera Manager. HBase Region 
> Server.
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>         Attachments: HDFS-13103.001.patch
> HDFS-8311 added a timeout for client write acknowledgement for both
>  # transferring blocks
>  # writing to a DataNode.
> The timeout shares the same configuration as client read timeout 
> (dfs.client.socket-timeout).
> While I agree having a timeout is good, *it does not make sense for the write 
> acknowledgement timeout to depend on the read timeout*. We saw a case where a 
> cluster admin wanted to reduce the HBase RegionServer read timeout so as to 
> detect DataNode crashes quickly, but did not realize it also affects the 
> write acknowledgement timeout.
> In the end, the effective DataNode write timeout is shorter than the 
> effective client write acknowledgement timeout. If the last two DataNodes in 
> the write pipeline crash, the client would think the first DataNode is faulty 
> (the DN appears unresponsive because it is still waiting for the ack from 
> downstream DNs) and drop it, and then the HBase RS would crash because it is 
> unable to write to any good DataNode. This scenario is possible during a rack 
> failure.
> This problem is even worse for a Cloudera Manager-managed cluster. By 
> default, a CM-managed HBase RegionServer sets 
> {{dfs.client.block.write.replace-datanode-on-failure.enable = true}}. Even 
> one unresponsive DataNode could crash the HBase RegionServer.
> I am raising this Jira to discuss two possible solutions:
>  # add a new config for the write acknowledgement timeout that does not 
> depend on the read timeout
>  # or, update the description of dfs.client.socket-timeout in 
> core-default.xml so that admins are aware that the write acknowledgement 
> timeout depends on this configuration.
