[ 
https://issues.apache.org/jira/browse/HDFS-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901267#action_12901267
 ] 

Todd Lipcon commented on HDFS-1056:
-----------------------------------

bq. This fix could impact other code paths too, especially since the DN 
comparison is used by many code paths. Maybe a unit test would be good.

Are you suggesting that the change be made to the equals() call instead of 
locally in the DataNode code? As is, the patch Nicolas uploaded is scoped to 
just that bit of code where it's been tested a lot and it's clear what the 
correct semantics are. I think changing equals() itself would be dangerous as 
it might break things in FSNamesystem, replication policy, etc.
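
To make the scoping argument concrete, here is a minimal sketch in Java. It is 
not the code from the attached patch: NodeId, its fields, and 
isSameRpcEndpoint() are hypothetical stand-ins, and the ports are only 
illustrative values. It shows a looser comparison kept in a local helper 
while equals() keeps its existing semantics for every other caller.

```java
import java.util.Objects;

public class ScopedNodeComparison {

    /** Hypothetical stand-in for a datanode identifier; not the real HDFS class. */
    static final class NodeId {
        final String host;
        final int dataPort;
        final int ipcPort;
        final String storageId;

        NodeId(String host, int dataPort, int ipcPort, String storageId) {
            this.host = host;
            this.dataPort = dataPort;
            this.ipcPort = ipcPort;
            this.storageId = storageId;
        }

        // Global equality: every caller depends on it, so changing it
        // changes behavior everywhere at once.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof NodeId)) return false;
            NodeId other = (NodeId) o;
            return host.equals(other.host)
                && dataPort == other.dataPort
                && storageId.equals(other.storageId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, dataPort, storageId);
        }
    }

    /**
     * Scoped check used only by the one code path that needs looser
     * semantics: two ids refer to the same RPC endpoint if host and
     * IPC port match, regardless of storage id.
     */
    static boolean isSameRpcEndpoint(NodeId a, NodeId b) {
        return a.host.equals(b.host) && a.ipcPort == b.ipcPort;
    }

    public static void main(String[] args) {
        NodeId registered = new NodeId("10.0.0.5", 50010, 50020, "DS-1");
        NodeId reported = new NodeId("10.0.0.5", 50010, 50020, "");

        System.out.println(registered.equals(reported));             // false
        System.out.println(isSameRpcEndpoint(registered, reported)); // true
    }
}
```

With this shape, only the caller that opts into isSameRpcEndpoint() sees the 
relaxed comparison; anything that relies on equals() or hashCode() is 
unaffected.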

bq. also, does this problem exist in trunk?

Yep, it does; the same fix applies there.

> Multi-node RPC deadlocks during block recovery
> ----------------------------------------------
>
>                 Key: HDFS-1056
>                 URL: https://issues.apache.org/jira/browse/HDFS-1056
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>             Fix For: 0.20-append
>
>         Attachments: 
> 0013-HDFS-1056.-Fix-possible-multinode-deadlocks-during-b.patch
>
>
> Believe it or not, I'm seeing HADOOP-3657 / HADOOP-3673 in a 5-node 0.20 
> cluster. I have many concurrent writes on the cluster, and when I kill a DN, 
> some percentage of the time I get one of these cross-node deadlocks among 3 
> of the nodes (replication 3). All of the DN RPC server threads are tied up 
> waiting on RPC clients to other datanodes.
