[
https://issues.apache.org/jira/browse/HADOOP-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640088#action_12640088
]
dhruba borthakur commented on HADOOP-4278:
------------------------------------------
Thanks Sameer for your suggestion.
I followed Raghu's suggestion of investigating why the client never recovered.
There were three datanodes A, B and C in the pipeline. The test killed B. The
client thought that C was killed.
The client designated A as the primary datanode. It made a recoverBlock RPC to
the primary datanode. The primary datanode, in turn, should have made a
recoverBlock RPC to itself and B. However, this did not occur. Looking at the
logs, it appears that the primary datanode tried to make an RPC only to B. This
failed (again and again) and the primary datanode returned an error to the
client. The client then aborted.
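
To make the sequence above concrete, here is a toy model of it. It is only a
sketch and not the Hadoop code: the names recover_block, recovery_targets and
RecoveryError are hypothetical, and the rule that recovery succeeds when at
least one contacted datanode responds is an assumption made for illustration.

```python
class RecoveryError(Exception):
    """Raised when no contacted datanode responds (assumed failure rule)."""


def recovery_targets(pipeline, believed_dead):
    # The client drops the node it believes is dead and sends the rest.
    return [node for node in pipeline if node != believed_dead]


def recover_block(contacted, alive):
    # Toy model: the primary contacts each node in `contacted`; nodes that
    # are not alive fail their RPC. Assumed rule: at least one must respond.
    responders = [node for node in contacted if node in alive]
    if not responders:
        raise RecoveryError("no contacted datanode responded")
    return responders


pipeline = ["A", "B", "C"]
alive = {"A", "C"}                            # the test killed B
targets = recovery_targets(pipeline, "C")     # the client thought C was killed
primary = targets[0]                          # A is the primary datanode

# Expected behaviour: the primary contacts itself and B; A responds.
expected = recover_block(targets, alive)

# Behaviour seen in the logs: the primary contacts only B, which is dead.
others = [node for node in targets if node != primary]
try:
    recover_block(others, alive)
    observed_failed = False
except RecoveryError:
    observed_failed = True
```

In this model the expected path recovers using A alone, while the path seen
in the logs can never succeed, which matches the repeated failures and the
error returned to the client.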
> TestDatanodeDeath failed occasionally
> -------------------------------------
>
> Key: HADOOP-4278
> URL: https://issues.apache.org/jira/browse/HADOOP-4278
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: dhruba borthakur
> Priority: Blocker
> Fix For: 0.19.0
>
>
> TestDatanodeDeath keeps failing occasionally. For example, see
> http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3365/testReport/