[
https://issues.apache.org/jira/browse/HDFS-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901267#action_12901267
]
Todd Lipcon commented on HDFS-1056:
-----------------------------------
bq. This fix could impact other code paths too, especially since the DN
comparison is used by many code paths. Maybe a unit test would be good.
Are you suggesting that the change be made to the equals() call instead of
locally in the DataNode code? As is, the patch Nicolas uploaded is scoped to
just that bit of code where it's been tested a lot and it's clear what the
correct semantics are. I think changing equals() itself would be dangerous as
it might break things in FSNamesystem, replication policy, etc.
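To illustrate the scoping argument, here is a minimal Java sketch. It is not the patch itself: NodeId, its fields, and isSameIpcEndpoint are hypothetical stand-ins, and the set of fields compared by equals() is an assumption made only for the example. The point is that a narrower comparison can live in a local helper at the one call site that needs it, so every other caller of equals() keeps its existing behavior.

```java
import java.util.Objects;

public class LocalComparisonSketch {

    /** Hypothetical stand-in for a datanode identity; NOT the real Hadoop class. */
    static final class NodeId {
        final String host;
        final int dataPort;
        final int ipcPort;
        final String storageId;

        NodeId(String host, int dataPort, int ipcPort, String storageId) {
            this.host = host;
            this.dataPort = dataPort;
            this.ipcPort = ipcPort;
            this.storageId = storageId;
        }

        // Assumed "global" semantics: identity includes the storage ID.
        // Other callers (hypothetically) depend on this, so it is left untouched.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof NodeId)) return false;
            NodeId other = (NodeId) o;
            return host.equals(other.host)
                && dataPort == other.dataPort
                && storageId.equals(other.storageId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, dataPort, storageId);
        }
    }

    /** Local check used at a single call site: do both IDs name the same IPC endpoint? */
    static boolean isSameIpcEndpoint(NodeId a, NodeId b) {
        return a.host.equals(b.host) && a.ipcPort == b.ipcPort;
    }

    public static void main(String[] args) {
        NodeId self = new NodeId("dn1", 50010, 50020, "DS-1");
        // Same node, but the caller did not fill in the storage ID.
        NodeId fromPipeline = new NodeId("dn1", 50010, 50020, "");

        System.out.println(self.equals(fromPipeline));             // false
        System.out.println(isSameIpcEndpoint(self, fromPipeline)); // true
    }
}
```

With the helper, the "is this me?" question is answered correctly at that call site while equals() and hashCode() stay consistent for everything else that relies on them.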
bq. also, does this problem exist in trunk?
Yep, it does. The same fix applies there.
> Multi-node RPC deadlocks during block recovery
> ----------------------------------------------
>
> Key: HDFS-1056
> URL: https://issues.apache.org/jira/browse/HDFS-1056
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node
> Affects Versions: 0.20.2, 0.21.0, 0.22.0
> Reporter: Todd Lipcon
> Fix For: 0.20-append
>
> Attachments:
> 0013-HDFS-1056.-Fix-possible-multinode-deadlocks-during-b.patch
>
>
> Believe it or not, I'm seeing HADOOP-3657 / HADOOP-3673 in a 5-node 0.20
> cluster. I have many concurrent writes on the cluster, and when I kill a DN,
> some percentage of the time I get one of these cross-node deadlocks among 3
> of the nodes (replication 3). All of the DN RPC server threads are tied up
> waiting on RPC clients to other datanodes.
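The failure mode described above can be modeled in a few lines of Java. This is a simplified sketch, not DataNode code: fixed-size thread pools stand in for each node's RPC handler pool, and the class and method names (HandlerStarvationSketch, crossCallCompletes) are made up for the example. Each node's handler makes a blocking call to the other node; if every handler is already occupied doing that, nobody is left to serve the incoming call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandlerStarvationSketch {

    /**
     * Two "nodes", each with a fixed pool of handler threads. One handler on
     * each node makes a blocking call to the other node. Returns true if both
     * calls finish, false if they are still stuck after one second.
     */
    static boolean crossCallCompletes(int handlersPerNode) throws Exception {
        ExecutorService nodeA = Executors.newFixedThreadPool(handlersPerNode);
        ExecutorService nodeB = Executors.newFixedThreadPool(handlersPerNode);
        // Ensures both handlers are running before either calls the other node.
        CountDownLatch bothBusy = new CountDownLatch(2);
        try {
            Future<String> a = nodeA.submit(() -> {
                bothBusy.countDown();
                bothBusy.await();
                return nodeB.submit(() -> "ok").get(); // blocking "RPC" to node B
            });
            Future<String> b = nodeB.submit(() -> {
                bothBusy.countDown();
                bothBusy.await();
                return nodeA.submit(() -> "ok").get(); // blocking "RPC" to node A
            });
            a.get(1, TimeUnit.SECONDS);
            b.get(1, TimeUnit.SECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // every handler is waiting on the other node
        } finally {
            nodeA.shutdownNow(); // interrupt any stuck handlers so the JVM can exit
            nodeB.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(crossCallCompletes(1)); // false: each sole handler is blocked
        System.out.println(crossCallCompletes(2)); // true: a spare handler serves the call
    }
}
```

Adding handlers only raises the threshold at which the pools fill up under load; it does not remove the cycle of nodes waiting on each other.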