[ 
https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359351#comment-14359351
 ] 

Colin Patrick McCabe commented on HDFS-7915:
--------------------------------------------

bq. Here bld is set to SUCCESS status, without checking whether fis is null or 
not. However, down in the code below: ... success is set to true only when fis 
is not null. I see a bit of inconsistency here. Is it success when fis is null? 
If not, then the first section has an issue. If yes, then we can probably change 
success to isFisObtained.

There is no inconsistency.  {{DataNode#requestShortCircuitFdsForRead}} cannot 
return null.  It can only throw an exception or return some fds.  There is a 
difference between attempting to send a SUCCESS response to the DFSClient and 
the whole function succeeding.  Just because we attempted to send a SUCCESS 
response doesn't mean it was actually sent.  For the function to succeed, both 
the fds and the response must actually be sent.

I will add a Precondition check to make it clearer that {{fis}} cannot be null 
when a SUCCESS response is being sent.
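
To illustrate the distinction, here is a minimal, self-contained sketch.  It is 
not the actual {{DataXceiver}} code: the class, the {{ResponseSender}} 
interface, and the slot set are hypothetical stand-ins, and a plain null check 
stands in for the precondition.  It only shows the shape: the slot is 
registered first, {{success}} becomes true only after the send returns, and the 
finally block unregisters the slot on any other path.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the DataNode side; not the real Hadoop classes.
final class SlotCleanupSketch {
    // Slots the "server" currently believes are allocated.
    static final Set<Integer> registeredSlots = new HashSet<>();

    // Hypothetical abstraction over writing the response (and fds) to the client.
    interface ResponseSender {
        void send(String status) throws IOException;
    }

    static void requestFds(int slotId, Object fis, ResponseSender sender)
            throws IOException {
        boolean success = false;
        registeredSlots.add(slotId);      // first part: mark the slot as used
        try {
            // Precondition: a SUCCESS response is never built with null fds.
            if (fis == null) {
                throw new IllegalStateException("fis must not be null");
            }
            sender.send("SUCCESS");       // second part: may fail on the network
            success = true;               // only reached if the send returned
        } finally {
            if (!success) {
                // Covers failures in either part, including the send.
                registeredSlots.remove(slotId);
            }
        }
    }
}
```

With this shape, a network error during the send leaves no slot registered on 
the server side, so the two views of allocated slots cannot diverge on that 
path.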

bq. The reason that we have to unregister a slot could be an exception recorded 
in bld, or because of an exception not currently caught in this method. I think 
we can add code to capture the currently uncaught exception, remember it, then 
re-throw it, so that when we do the logging above in the finally block, we can 
report this exception as the reason why we are un-registering the slot in this 
log.

I think this would add too much complexity.  If we catch Throwable, we can't 
re-throw Throwable.  So we'd have to have separate catch blocks for 
RuntimeException, IOException, and probably another block to catch other things.
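
For a sense of what that would look like, here is a rough sketch of the 
remember-and-rethrow pattern with separate catch blocks.  All names are 
hypothetical, and treating {{Error}} as the "other things" block is an 
assumption made for illustration.

```java
import java.io.IOException;

// Hypothetical illustration of the suggested pattern; not Hadoop code.
final class RethrowSketch {
    // Hypothetical stand-in for the body of the request handler.
    interface Body {
        void run() throws IOException;
    }

    // Stand-in for the log line written when a slot is unregistered.
    static String lastUnregisterReason;

    static void runAndRecord(Body body) throws IOException {
        lastUnregisterReason = null;
        Throwable cause = null;
        try {
            body.run();
        } catch (IOException e) {         // checked failures, e.g. network errors
            cause = e;
            throw e;
        } catch (RuntimeException e) {    // unchecked failures
            cause = e;
            throw e;
        } catch (Error e) {               // assumed "other things" block
            cause = e;
            throw e;
        } finally {
            if (cause != null) {
                lastUnregisterReason = "unregistering slot: " + cause;
            }
        }
    }
}
```

Each block repeats the same two lines only to keep the exception's static 
type, which is the extra complexity being weighed against a more detailed log 
message.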

> The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell 
> the DFSClient about it because of a network error
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7915
>                 URL: https://issues.apache.org/jira/browse/HDFS-7915
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch
>
>
> The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell 
> the DFSClient about it because of a network error.  In 
> {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first 
> part (mark the slot as used) and fail at the second part (tell the DFSClient 
> what it did). The "try" block for unregistering the slot only covers a 
> failure in the first part, not the second part. In this way, a divergence can 
> form between the views of which slots are allocated on DFSClient and on 
> server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
