[ 
https://issues.apache.org/jira/browse/HADOOP-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609989#action_12609989
 ] 

Hairong Kuang commented on HADOOP-3673:
---------------------------------------

I do not like the idea of spawning a new thread for every incoming RPC. An RPC 
server does need to limit the amount of resources it uses. What if there are 
thousands of blocks that need to be recovered concurrently?

Sameer's idea is simple, but it seems to work. It means that all datanodes 
need to have a global order, for example the alphabetical order of their 
names. A client simply sorts the datanodes in its pipeline before contacting 
the primary datanode.
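
A minimal sketch of the ordering idea, not the actual patch. It assumes that 
datanodes are identified by a name string, that the first datanode in sorted 
order acts as the primary, and that plain String ordering is an acceptable 
global order. The class name, method names and host names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PipelineOrder {
    // Returns a copy of the pipeline sorted by datanode name.
    // Natural String ordering stands in for the "global order" (assumption).
    static List<String> sortPipeline(List<String> datanodes) {
        List<String> sorted = new ArrayList<>(datanodes);
        Collections.sort(sorted);
        return sorted;
    }

    // Assumption: the primary is the first datanode in the global order, so
    // a primary only ever calls datanodes that sort after it.
    static String primaryFor(List<String> datanodes) {
        return sortPipeline(datanodes).get(0);
    }

    public static void main(String[] args) {
        // Two clients see the same two datanodes in different pipeline order.
        List<String> client1 = Arrays.asList("dn1.example.com", "dn2.example.com");
        List<String> client2 = Arrays.asList("dn2.example.com", "dn1.example.com");
        // Both pick the same primary, so the two datanodes never wait on
        // each other at the same time.
        System.out.println(primaryFor(client1));
        System.out.println(primaryFor(client2));
    }
}
```

Under these assumptions, blocking calls between datanodes always go from a 
lower-ordered datanode to a higher-ordered one, so a cycle of datanodes 
waiting on each other cannot form.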

> Deadlock in Datanode RPC servers
> --------------------------------
>
>                 Key: HADOOP-3673
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3673
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> There is a deadlock scenario in the way Lease Recovery is triggered using the 
> Datanode RPC server via HADOOP-3283.
> Each Datanode has dfs.datanode.handler.count handler threads (default of 3). 
> These handler threads are used to support the generation-stamp-dance protocol 
> as described in HADOOP-1700.
> Let me try to explain the scenario with an example. Suppose a cluster has 
> two datanodes. Also, let's assume that dfs.datanode.handler.count is set to 
> 1. Suppose that there are two clients, each writing to a separate file with a 
> replication factor of 2. Let's assume that both clients encounter an IO error 
> and trigger the generation-stamp-dance protocol. The first client may invoke 
> recoverBlock on the first datanode while the second client may invoke 
> recoverBlock on the second datanode. Now, each of the datanodes will try to 
> make a getBlockMetaDataInfo() call to the other datanode. But since each 
> datanode has only 1 server handler thread, both threads will block for 
> eternity. Deadlock!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
