[
https://issues.apache.org/jira/browse/HDFS-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454865#comment-13454865
]
Hudson commented on HDFS-3703:
------------------------------
Integrated in Hadoop-Hdfs-trunk #1164 (See
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1164/])
HDFS-3703. Datanodes are marked stale if heartbeat is not received in
configured timeout and are selected as the last location to read from.
Contributed by Jing Zhao. (Revision 1384209)
Result = FAILURE
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1384209
Files :
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeInfo.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestGetBlocks.java
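
For readers skimming this notification, the behaviour named in the commit message (stale datanodes are selected as the last location to read from) can be sketched roughly as below. This is only an illustration, not the committed code: the Replica class, the readOrder method and the 30s stale interval used in main are all hypothetical stand-ins, and the real patch works on DatanodeInfo inside the namenode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only: move stale replicas to the end of the read order. */
final class StaleLastOrdering {

    /** Hypothetical stand-in for a replica location (not the real DatanodeInfo). */
    static final class Replica {
        final String host;
        final long lastHeartbeatMs;

        Replica(String host, long lastHeartbeatMs) {
            this.host = host;
            this.lastHeartbeatMs = lastHeartbeatMs;
        }
    }

    /** A node is treated as stale when no heartbeat arrived within the interval. */
    static boolean isStale(Replica r, long nowMs, long staleIntervalMs) {
        return nowMs - r.lastHeartbeatMs > staleIntervalMs;
    }

    /** Returns host names with fresh nodes first and stale nodes last. */
    static List<String> readOrder(List<Replica> replicas, long nowMs, long staleIntervalMs) {
        List<Replica> sorted = new ArrayList<>(replicas);
        // List.sort is stable and false sorts before true, so the original
        // order is preserved inside the fresh group and inside the stale group.
        sorted.sort(Comparator.comparing((Replica r) -> isStale(r, nowMs, staleIntervalMs)));
        List<String> hosts = new ArrayList<>();
        for (Replica r : sorted) {
            hosts.add(r.host);
        }
        return hosts;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        long staleIntervalMs = 30_000L; // assumed value, for illustration only
        List<Replica> replicas = Arrays.asList(
                new Replica("dn1", 0L),       // last heartbeat 100s ago: stale
                new Replica("dn2", 95_000L),  // last heartbeat 5s ago: fresh
                new Replica("dn3", 99_000L)); // last heartbeat 1s ago: fresh
        System.out.println(readOrder(replicas, now, staleIntervalMs));
    }
}
```

Stale nodes are kept in the list rather than dropped, so a reader can still fall back to them if every fresh replica fails.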
> Decrease the datanode failure detection time
> --------------------------------------------
>
> Key: HDFS-3703
> URL: https://issues.apache.org/jira/browse/HDFS-3703
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node, name-node
> Affects Versions: 1.0.3, 2.0.0-alpha, 3.0.0
> Reporter: nkeywal
> Assignee: Jing Zhao
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: 3703-hadoop-1.0.txt, HDFS-3703-branch2.patch,
> HDFS-3703.patch, HDFS-3703-trunk-read-only.patch,
> HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch,
> HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch,
> HDFS-3703-trunk-read-only.patch, HDFS-3703-trunk-read-only.patch,
> HDFS-3703-trunk-with-write.patch
>
>
> By default, if a box dies, the namenode marks its datanode as dead only
> after 10 minutes 30 seconds. In the meantime, the namenode still proposes
> this datanode for writing blocks and for reading replicas. The same happens
> when the datanode process crashes: there are no shutdown hooks to tell the
> namenode that it is gone.
> This is especially an issue with HBase. The HBase regionserver timeout in
> production is often 30s. So with these configs, when a box dies HBase starts
> to recover after 30s while, for 10 minutes, the namenode still considers the
> blocks on that box to be available. Beyond the write errors, this triggers a
> lot of failed reads:
> - during the recovery, HBase needs to read the blocks used on the dead box
> (the ones in the 'HBase Write-Ahead-Log')
> - after the recovery, reading these data blocks (the 'HBase region') will
> fail 33% of the time with the default number of replicas, slowing down data
> access, especially when the errors are socket timeouts (i.e. around 60s most
> of the time).
> Overall, it would be ideal if the HDFS timeouts could be set below the HBase
> ones.
> As a side note, HBase relies on ZooKeeper to detect regionserver issues.
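
The default delay of 10 minutes 30 seconds quoted above can be reproduced with a little arithmetic. The sketch below assumes the namenode computes the expiry as twice the heartbeat recheck interval plus ten heartbeat intervals, with defaults of 300000 ms for dfs.namenode.heartbeat.recheck-interval and 3 s for dfs.heartbeat.interval; treat both the formula and the defaults as my reading of the code rather than something stated in this issue. The class and method names are made up for the example.

```java
import java.time.Duration;

/** Illustrative arithmetic for the dead-datanode timeout; names are hypothetical. */
final class DeadNodeTimeout {

    /**
     * Assumed formula: 2 * recheck interval + 10 * heartbeat interval.
     * The recheck interval is in milliseconds, the heartbeat interval in seconds.
     */
    static long expireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSec) {
        return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSec;
    }

    /** Formats a millisecond duration as minutes:seconds. */
    static String format(long ms) {
        Duration d = Duration.ofMillis(ms);
        return String.format("%d:%02d", d.toMinutes(), d.getSeconds() % 60);
    }

    public static void main(String[] args) {
        // Assumed defaults: 5 minute recheck interval, 3 second heartbeat.
        long ms = expireIntervalMs(300_000L, 3L);
        System.out.println(format(ms)); // prints 10:30
    }
}
```

Under these assumptions the recheck interval dominates the result: shortening the heartbeat alone barely changes the timeout, which is why a separate, much shorter stale threshold is a more practical fit for a 30s HBase regionserver timeout.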