Deadlock in Datanode RPC servers
--------------------------------

                 Key: HADOOP-3673
                 URL: https://issues.apache.org/jira/browse/HADOOP-3673
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
            Reporter: dhruba borthakur


There is a deadlock scenario in the way Lease Recovery is triggered using the 
Datanode RPC server via HADOOP-3283.

Each Datanode has dfs.datanode.handler.count handler threads (default of 3). 
These handler threads are used to support the generation-stamp-dance protocol 
as described in HADOOP-1700.

Let me explain the scenario with an example. Suppose a cluster has two 
datanodes and dfs.datanode.handler.count is set to 1. Suppose there are two 
clients, each writing to a separate file with a replication factor of 2. 
Both clients encounter an IO error and trigger the generation-stamp-dance 
protocol. The first client may invoke recoverBlock on the first datanode while 
the second client invokes recoverBlock on the second datanode. Each datanode, 
while still inside recoverBlock, will then try to make a getBlockMetaDataInfo() 
call to the other datanode. But since each datanode has only 1 server handler 
thread, and that thread is already occupied by recoverBlock, neither call can 
be served and both threads will block forever. Deadlock!
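
To make the scenario concrete, here is a minimal sketch in plain Java. It is 
not Hadoop code: each "datanode" is modelled as a fixed-size thread pool 
standing in for its RPC handler threads. The names HandlerDeadlockSketch, Node, 
recoveryCompletes and the latch are hypothetical; only recoverBlock and 
getBlockMetaDataInfo are borrowed from the description above, and their bodies 
here are placeholders.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandlerDeadlockSketch {

    /** Stand-in for a datanode: a fixed pool plays the role of the RPC handler threads. */
    static final class Node {
        final ExecutorService handlers;
        Node peer;

        Node(int handlerCount) {
            handlers = Executors.newFixedThreadPool(handlerCount);
        }

        /** Runs on one of this node's handlers and calls back into the peer node. */
        Future<String> recoverBlock(CountDownLatch bothStarted) {
            return handlers.submit(() -> {
                // Wait until both nodes have a handler busy in recoverBlock,
                // so the interleaving from the description is reproduced every time.
                bothStarted.countDown();
                bothStarted.await();
                // Blocks until a handler on the peer is free to serve the call.
                return peer.getBlockMetaDataInfo().get();
            });
        }

        /** Needs a free handler on this node to run at all. */
        Future<String> getBlockMetaDataInfo() {
            return handlers.submit(() -> "meta");
        }
    }

    /** Returns true if both recoveries finish within the timeout, false otherwise. */
    public static boolean recoveryCompletes(int handlerCount, long timeoutMillis)
            throws Exception {
        Node a = new Node(handlerCount);
        Node b = new Node(handlerCount);
        a.peer = b;
        b.peer = a;
        CountDownLatch bothStarted = new CountDownLatch(2);
        Future<String> fa = a.recoverBlock(bothStarted);
        Future<String> fb = b.recoverBlock(bothStarted);
        try {
            fa.get(timeoutMillis, TimeUnit.MILLISECONDS);
            fb.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // both handlers are stuck waiting on each other
        } finally {
            // Interrupt any stuck handlers so the JVM can exit.
            a.handlers.shutdownNow();
            b.handlers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 handler:  completes = " + recoveryCompletes(1, 500));
        System.out.println("2 handlers: completes = " + recoveryCompletes(2, 500));
    }
}
```

In this model, a second handler per node is enough for the two-client case to 
complete, but the same cycle reappears once the number of concurrent 
recoverBlock calls on each node reaches the handler count. So raising the count 
only moves the threshold in the sketch.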

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
