[ 
https://issues.apache.org/jira/browse/HDFS-15650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224911#comment-17224911
 ] 

Wei-Chiu Chuang commented on HDFS-15650:
----------------------------------------

Sounds like a possible scenario... thanks for sharing that. I think in the 
current DataNode design there's no good way to prevent that from happening in 
general. A workaround is to increase the DN thread pool size, but I am aware of 
other cases where the DN exhausts its thread pool, even to the point of 
exceeding the maximum number of open file descriptors. 

A quick remedy for this is to update the description of dfs.client.socket-timeout in hdfs-default.xml
{noformat}
<property>
  <name>dfs.client.socket-timeout</name>
  <value>60000</value>
  <description>
    Default timeout value in milliseconds for all sockets.
  </description>
</property>
{noformat}

and add a reminder that in an erasure-coded cluster it is recommended to keep 
dfs.client.socket-timeout consistent on both the client and the DataNodes.
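
For example, an override in hdfs-site.xml (the 120000 ms value here is purely illustrative) would then need to be applied identically on both the client side and the DataNodes:
{noformat}
<property>
  <name>dfs.client.socket-timeout</name>
  <value>120000</value>
  <description>
    Overridden socket timeout in milliseconds. In an erasure-coded
    cluster, keep this value identical in the client and DataNode
    configurations.
  </description>
</property>
{noformat}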


> Make the socket timeout for computing checksum of striped blocks configurable
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-15650
>                 URL: https://issues.apache.org/jira/browse/HDFS-15650
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, ec, erasure-coding
>            Reporter: Yushi Hayasaka
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a DataNode fetches the checksums of EC internal blocks from other 
> DataNodes in order to compute the checksum of a striped block, the timeout is 
> currently hard-coded; it should be configurable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
