[ 
https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357324#comment-14357324
 ] 

Yongjun Zhang commented on HDFS-7915:
-------------------------------------

Hi [~cmccabe],

Thanks for reporting the issue and providing a fix. The patch looks good in 
general. I have a couple of comments:

1. Can we add a log message when calling unregisterSlot below, stating that 
"slot x is unregistered due to ..."? I think this will help with debugging 
similar issues in the future.
{code}
      if ((!success) && (registeredSlotId != null)) {
        datanode.shortCircuitRegistry.unregisterSlot(registeredSlotId);
      }
{code}

2. I applied your patch, reverted the DataXceiver change, and ran the test. I 
expected the test to fail without the fix, but it passed. Did I miss anything?

Thanks.


> The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell 
> the DFSClient about it because of a network error
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7915
>                 URL: https://issues.apache.org/jira/browse/HDFS-7915
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7915.001.patch
>
>
> The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell 
> the DFSClient about it because of a network error.  In 
> {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first 
> part (mark the slot as used) and fail at the second part (tell the DFSClient 
> what it did). The "try" block for unregistering the slot only covers a 
> failure in the first part, not the second part. In this way, a divergence can 
> form between the views of which slots are allocated on DFSClient and on 
> server.
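
For illustration, the failure mode described above, together with the cleanup and the suggested log message, can be sketched roughly as follows. This is a minimal standalone sketch, not the actual DataXceiver code: {{SlotRegistry}}, {{ResponseSender}}, {{SlotRequestSketch}} and {{requestSlot}} are hypothetical stand-ins, and the log wording is only an example. The point is that the cleanup check covers both parts, so a failure while telling the client also releases the slot.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the two-part request; not the real DataXceiver logic.
class SlotRequestSketch {
    static boolean requestSlot(SlotRegistry registry, int slotId, ResponseSender sender) {
        boolean success = false;
        Integer registeredSlotId = null;
        try {
            // First part: mark the slot as used on the server side.
            registry.registerSlot(slotId);
            registeredSlotId = slotId;
            // Second part: tell the client what was done. This may fail on a network error.
            sender.send(slotId);
            success = true;
        } catch (IOException e) {
            System.err.println("Failed to send response for slot " + slotId + ": " + e.getMessage());
        } finally {
            // The cleanup covers both parts, so the server does not keep a slot
            // that the client never learned about.
            if ((!success) && (registeredSlotId != null)) {
                System.err.println("Unregistering slot " + registeredSlotId
                    + " because the request did not complete");
                registry.unregisterSlot(registeredSlotId);
            }
        }
        return success;
    }

    public static void main(String[] args) {
        SlotRegistry registry = new SlotRegistry();
        // Simulate a network error in the second part.
        boolean ok = requestSlot(registry, 7, id -> {
            throw new IOException("simulated network error");
        });
        System.out.println("success=" + ok + ", slot still registered=" + registry.isRegistered(7));
    }
}

// Hypothetical stand-in for the server-side slot registry.
class SlotRegistry {
    private final Set<Integer> registered = new HashSet<>();

    void registerSlot(int slotId) { registered.add(slotId); }

    void unregisterSlot(int slotId) { registered.remove(slotId); }

    boolean isRegistered(int slotId) { return registered.contains(slotId); }
}

// Hypothetical stand-in for sending the response to the client.
interface ResponseSender {
    void send(int slotId) throws IOException;
}
```

If the cleanup only guarded the registration step, the simulated error above would leave slot 7 registered on the server while the client believes it was never allocated, which is the divergence the description refers to.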



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
