[
https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359369#comment-14359369
]
Colin Patrick McCabe commented on HDFS-7915:
--------------------------------------------
I found another problem here. To explain it, I need to explain how the
communication happens now.
1. In the {{BlockReaderFactory}}, the DFSClient initiates the file descriptor
request by sending:
{code}
[2-byte] 28 [DATA_TRANSFER_VERSION]
[1-byte] 87 [REQUEST_SHORT_CIRCUIT_FDS]
[var] OpRequestShortCircuitAccessProto(blk, blockToken, slotId, tracing stuff)
{code}
2. On the DataNode, in {{DataXceiver}}, we read the
{{OpRequestShortCircuitAccessProto}} that the client sent. We call
{{DataNode#requestShortCircuitFdsForRead}} to load the file descriptors. If
that succeeds, we send back a {{BlockOpResponseProto}} with status {{SUCCESS}}.
3. Back in the DFSClient, we read the {{BlockOpResponseProto}}.
4. If it contains a {{SUCCESS}} response, the DFSClient calls
{{sock.recvFileInputStreams}}. This reads a single byte and also passes the
new file descriptor to us (the DFSClient).
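As a rough illustration of the framing in step 1, the fixed-size prefix of the request can be sketched in Java as follows. This is only a sketch: the class and method names are made up for this example, the protobuf payload is left out, and big-endian byte order is assumed because that is what {{DataOutputStream}} writes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not the actual HDFS sender code.
final class ShortCircuitRequestHeader {
    static final int DATA_TRANSFER_VERSION = 28;      // value quoted in step 1
    static final int REQUEST_SHORT_CIRCUIT_FDS = 87;  // value quoted in step 1

    /** Encodes the 2-byte version and 1-byte opcode that precede the protobuf. */
    static byte[] encodeHeader() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeShort(DATA_TRANSFER_VERSION);     // 2 bytes, big-endian (assumed)
            out.writeByte(REQUEST_SHORT_CIRCUIT_FDS);  // 1 byte
            out.flush();
            // The OpRequestShortCircuitAccessProto payload would follow here.
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encodeHeader()) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // prints "00 1c 57"
    }
}
```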
The problem is that if the DFSClient closes the socket after step #3, but
before step #4, the DataNode thinks that the transfer was successful and never
unregisters the slot. This is what led to the unit test failures earlier. It
seems that there is a buffer in the UNIX domain socket that we are writing to,
which lets the DataNode's write succeed immediately even before the DFSClient
actually reads the data.
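The buffering effect can be reproduced in miniature. The sketch below uses a {{java.nio.channels.Pipe}} as a stand-in for the UNIX domain socket (an assumption made purely to keep the example self-contained; it is not what the DataNode uses, and the class name is made up). The writer is told its byte was written even though the reader closes its end without ever reading.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

// Illustration only: a pipe stands in for the UNIX domain socket.
final class BufferedWriteDemo {
    /** Returns how many bytes the writer was told it wrote. */
    static int writeThenCloseReaderWithoutReading() {
        try {
            Pipe pipe = Pipe.open();
            // The write lands in the pipe's buffer and returns right away.
            int written = pipe.sink().write(ByteBuffer.wrap(new byte[] {0}));
            // The reader goes away without ever reading the byte.
            pipe.source().close();
            pipe.sink().close();
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes reported written: "
                + writeThenCloseReaderWithoutReading());
    }
}
```

From the writer's point of view nothing went wrong, which is the same position the DataNode is in after step #3.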
To fix this, we can add a step #5: the DFSClient writes a byte for the DataNode
to receive. And step #6: the DataNode reads it. That way, if a socket close
or other error happens before step #5, the DataNode knows that the FD was not
received and can unregister the slot.
This can be done compatibly by adding a new boolean to the protobuf which
indicates to the DataNode that the client supports "receipt verification." New
DFSClients will set this bit and old ones will not. Neither the DataNode nor
the DFSClient will attempt to do receipt verification unless the other party
supports it.
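A minimal sketch of how the proposed negotiation and steps #5 and #6 might fit together on the DataNode side is shown below. Everything here is hypothetical: the class, enum, and method names are invented for illustration, an in-memory stream stands in for the socket, and the real patch may structure this differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the proposal; not the code from the patch.
final class ReceiptVerificationSketch {
    enum SlotState { REGISTERED, UNREGISTERED }

    /** Verification is used only when both sides support it. */
    static boolean shouldVerify(boolean clientSupports, boolean dataNodeSupports) {
        return clientSupports && dataNodeSupports;
    }

    /**
     * DataNode side, after the FDs have been written (step #4).
     * With verification, the slot is kept only if the client's
     * receipt byte arrives (steps #5 and #6).
     */
    static SlotState afterSendingFds(boolean verify, InputStream fromClient) {
        if (!verify) {
            // Old behavior: a successful write is treated as delivery.
            return SlotState.REGISTERED;
        }
        try {
            int ack = fromClient.read(); // -1 means the client closed the socket
            return ack >= 0 ? SlotState.REGISTERED : SlotState.UNREGISTERED;
        } catch (IOException e) {
            return SlotState.UNREGISTERED;
        }
    }

    public static void main(String[] args) {
        boolean verify = shouldVerify(true, true);
        InputStream closedEarly = new ByteArrayInputStream(new byte[0]);
        InputStream acked = new ByteArrayInputStream(new byte[] {0});
        System.out.println("client closed early: " + afterSendingFds(verify, closedEarly));
        System.out.println("client acknowledged: " + afterSendingFds(verify, acked));
    }
}
```

With an old peer on either side, {{shouldVerify}} returns false and the exchange falls back to the existing four-step behavior.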
> The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell
> the DFSClient about it because of a network error
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-7915
> URL: https://issues.apache.org/jira/browse/HDFS-7915
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch
>
>
> The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell
> the DFSClient about it because of a network error. In
> {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first
> part (mark the slot as used) and fail at the second part (tell the DFSClient
> what it did). The "try" block for unregistering the slot only covers a
> failure in the first part, not the second part. In this way, a divergence can
> form between the views of which slots are allocated on DFSClient and on
> server.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)