[jira] [Commented] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error

Colin Patrick McCabe (JIRA) Thu, 12 Mar 2015 13:47:00 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359369#comment-14359369
 ]


Colin Patrick McCabe commented on HDFS-7915:
--------------------------------------------

I found another problem here.  To explain it, I need to explain how the 
communication happens now.

1. In the {{BlockReaderFactory}}, the DFSClient initiates the file descriptor 
request by sending:
{code}
        [2-byte] 28 [DATA_TRANSFER_VERSION]
        [1-byte] 87 [REQUEST_SHORT_CIRCUIT_FDS]
        [var] OpRequestShortCircuitAccessProto(blk, blockToken, slotId, tracing stuff)
{code}

2. On the DataNode, in {{DataXceiver}}, we read the 
{{OpRequestShortCircuitAccessProto}} that the client sent.  We call 
{{DataNode#requestShortCircuitFdsForRead}} to load the file descriptors.  If 
that succeeded, we send back a {{BlockOpResponseProto}} with status {{SUCCESS}}.

3. Back in the DFSClient, we read the {{BlockOpResponseProto}}.

4. If it contains a {{SUCCESS}} response, the DFSClient calls 
{{sock.recvFileInputStreams}}.  This reads a single byte and also passes the 
new file descriptor to us (the DFSClient).
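
For illustration, the fixed-size prefix of the step #1 request can be sketched as below.  The two constants come from the layout quoted above; the class name, the {{frame}} helper, and the big-endian byte order are assumptions for this sketch, and the protobuf body is treated as opaque bytes.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not the actual Hadoop sender code.
final class ShortCircuitRequestHeader {
    // Values taken from the message layout in step #1.
    static final short DATA_TRANSFER_VERSION = 28;
    static final byte REQUEST_SHORT_CIRCUIT_FDS = 87;

    /**
     * Prepends the 2-byte version and 1-byte opcode to an already
     * serialized request body. How the body itself is delimited is
     * out of scope here.
     */
    static byte[] frame(byte[] serializedRequest) {
        return ByteBuffer.allocate(3 + serializedRequest.length)
                .putShort(DATA_TRANSFER_VERSION)    // 2 bytes, big-endian (assumed)
                .put(REQUEST_SHORT_CIRCUIT_FDS)     // 1-byte opcode
                .put(serializedRequest)             // opaque protobuf bytes
                .array();
    }
}
```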

The problem is that if the DFSClient closes the socket after step #3, but 
before step #4, the DataNode thinks that the transfer was successful and never 
unregisters the slot.  This is what led to the unit test failures earlier.  It 
seems that there is a buffer in the UNIX domain socket that we are writing to, 
which lets the DataNode's write succeed immediately even before the DFSClient 
actually reads the data.
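
The buffering effect is easy to reproduce.  The sketch below uses a loopback TCP socket as a stand-in for the UNIX domain socket (an assumption; the real code path uses Hadoop's domain socket support), and the class and method names are made up.  The sender's write returns normally even though the receiver closes without ever reading.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative only: TCP over loopback stands in for the domain socket.
final class BufferedWriteDemo {
    /** Returns true if write() completed although the peer never read. */
    static boolean writeSucceedsWithoutReader() {
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket sender =
                 new Socket(server.getInetAddress(), server.getLocalPort());
             Socket receiver = server.accept()) {
            OutputStream out = sender.getOutputStream();
            out.write(1);      // lands in the kernel socket buffer
            out.flush();
            receiver.close();  // the peer goes away without calling read()
            return true;       // the sender saw no error from its write
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```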

To fix this, we can add a step #5: the DFSClient writes a byte for the DataNode 
to receive.  And step #6: the DataNode reads it.  That way, if a socket close 
or other error happens before step #5, the DataNode knows that the FD was not 
received and can unregister the slot.
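
A minimal sketch of steps #5 and #6, again over loopback TCP rather than a domain socket, and with no real file descriptor passing.  All names here are hypothetical and this is not the patch itself.  The sending side treats EOF, a reset, or a timeout while waiting for the receipt byte as "not received".

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch of the receipt handshake.
final class ReceiptVerificationSketch {
    /** Step #6: true only if the peer sent a receipt byte. */
    static boolean awaitReceipt(InputStream in) {
        try {
            return in.read() != -1;   // -1 means the peer closed first
        } catch (IOException e) {
            return false;             // reset or timeout: no receipt
        }
    }

    /** Runs one exchange; returns whether the sender may keep the slot. */
    static boolean exchange(boolean clientAcks) {
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client =
                 new Socket(server.getInetAddress(), server.getLocalPort());
             Socket dataNode = server.accept()) {
            client.setSoTimeout(5000);
            dataNode.setSoTimeout(5000);
            // Sender half of step #4: returns as soon as the byte is buffered.
            dataNode.getOutputStream().write(0);
            if (clientAcks) {
                client.getInputStream().read();    // step #4: receive
                client.getOutputStream().write(1); // step #5: confirm
            } else {
                client.close();                    // client gave up after step #3
            }
            return awaitReceipt(dataNode.getInputStream());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

When {{exchange}} returns false, the sending side would take its cleanup path, which in the DataNode's case means unregistering the slot.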

This can be done compatibly by adding a new boolean to the protobuf which 
indicates to the DataNode that the client supports "receipt verification."  New 
clients will set this bit and old ones will not.  Neither the DataNode nor 
the DFSClient will attempt to do receipt verification unless the other party 
supports it.
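
The compatibility rule amounts to a logical AND of the two sides' capabilities.  In the sketch below, a null {{Boolean}} stands in for a request from an old sender that never set the new field; the names are hypothetical, and how the DataNode signals its own support back to the client is not specified here.

```java
// Hypothetical sketch of the negotiation rule, not the actual patch.
final class ReceiptVerificationNegotiation {
    /** An absent field (old peer) is treated the same as false. */
    static boolean advertised(Boolean fieldOrNull) {
        return fieldOrNull != null && fieldOrNull;
    }

    /** An endpoint does the extra byte exchange only if both sides support it. */
    static boolean useReceiptVerification(boolean selfSupports,
                                          Boolean peerField) {
        return selfSupports && advertised(peerField);
    }
}
```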

> The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell 
> the DFSClient about it because of a network error
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7915
>                 URL: https://issues.apache.org/jira/browse/HDFS-7915
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch
>
>
> The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell 
> the DFSClient about it because of a network error.  In 
> {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first 
> part (mark the slot as used) and fail at the second part (tell the DFSClient 
> what it did). The "try" block for unregistering the slot only covers a 
> failure in the first part, not the second part. In this way, a divergence can 
> form between the views of which slots are allocated on DFSClient and on 
> server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
