[ 
https://issues.apache.org/jira/browse/HDFS-10311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-10311:
-----------------------------------
    Attachment: HDFS-10311.HDFS-8707.000.patch

Fix for this bug:

-Add lock guards to the DataNodeConnectionImpl methods because the underlying 
asio socket isn't thread safe.
-Refactor the deleter used by DataNodeConnectionImpl. Cancel can now call the 
same disconnect code, but it won't do the delete.
-The socket deleter checks if the socket is open before running SafeDisconnect 
to avoid false positive errors.
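
A minimal sketch of the three points above, not the code in the patch. 
FakeSocket is a hypothetical stand-in for the asio socket so the example 
builds without asio, and DataNodeConnectionSketch, SocketDeleter, state_lock_ 
and HasSocket are illustrative names. Only SafeDisconnect and Cancel are 
names taken from the notes above, and their bodies here are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the asio TCP socket (assumption, not asio's API).
class FakeSocket {
 public:
  void open() { open_ = true; }
  bool is_open() const { return open_; }
  void shutdown() { /* a real socket would stop both stream directions */ }
  void close() { open_ = false; }
  // Returns 1 while open and 0 once closed, mimicking a read after cancel.
  std::size_t read_some() const { return open_ ? 1 : 0; }

 private:
  bool open_ = false;
};

// Disconnect code shared by Cancel() and the deleter. It never frees memory.
inline void SafeDisconnect(FakeSocket *sock) {
  assert(sock != nullptr);
  sock->shutdown();
  sock->close();
}

// Deleter: disconnect only if the socket is still open, then free it.
struct SocketDeleter {
  void operator()(FakeSocket *sock) const {
    if (sock == nullptr) return;
    if (sock->is_open()) SafeDisconnect(sock);
    delete sock;
  }
};

class DataNodeConnectionSketch {
 public:
  DataNodeConnectionSketch() : conn_(new FakeSocket()) {}

  void Connect() {
    std::lock_guard<std::mutex> guard(state_lock_);
    conn_->open();
  }

  std::size_t Read() {
    std::lock_guard<std::mutex> guard(state_lock_);
    return conn_->read_some();
  }

  // Closes the socket but keeps the instance alive: no conn_.reset() here.
  void Cancel() {
    std::lock_guard<std::mutex> guard(state_lock_);
    if (conn_->is_open()) SafeDisconnect(conn_.get());
  }

  bool HasSocket() {
    std::lock_guard<std::mutex> guard(state_lock_);
    return conn_ != nullptr;
  }

 private:
  std::mutex state_lock_;  // serializes every access to conn_
  std::unique_ptr<FakeSocket, SocketDeleter> conn_;
};
```

In this sketch a Read() that races with Cancel() sees either an open or a 
closed socket, never a null pointer, because Cancel() leaves conn_ in place 
and both methods take the same lock.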

Tested by running 1K threads, each doing reads in a busy loop.  The test waits 
10 seconds to make sure the FS is connected and the files are opened properly 
before calling hdfsCancel on all file handles.  All threads stop with no 
segfaults and return -1 as expected.
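
The shape of that test can be sketched as follows. This is not the actual 
test code: FakeFile is a hypothetical stand-in for an open file handle, its 
Cancel method stands in for hdfsCancel, and RunCancelStress with its 
thread_count and warmup parameters is an illustrative name. The sketch uses 
no real libhdfs++ calls.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an open file handle (assumption).
class FakeFile {
 public:
  // Returns 1 while the handle is active and -1 after Cancel().
  int Read() {
    std::lock_guard<std::mutex> guard(lock_);
    return cancelled_ ? -1 : 1;
  }

  void Cancel() {
    std::lock_guard<std::mutex> guard(lock_);
    cancelled_ = true;
  }

 private:
  std::mutex lock_;
  bool cancelled_ = false;
};

// Starts thread_count readers in busy loops, waits for warmup, cancels every
// handle, and returns how many readers stopped because Read() returned -1.
inline std::size_t RunCancelStress(std::size_t thread_count,
                                   std::chrono::milliseconds warmup) {
  assert(thread_count > 0);
  std::vector<FakeFile> files(thread_count);
  std::atomic<std::size_t> stopped{0};
  std::vector<std::thread> readers;
  readers.reserve(thread_count);

  for (std::size_t i = 0; i < thread_count; ++i) {
    readers.emplace_back([&files, &stopped, i] {
      while (files[i].Read() != -1) {
        // busy loop: keep reading until the handle is cancelled
      }
      ++stopped;
    });
  }

  // Give the readers time to start before cancelling.
  std::this_thread::sleep_for(warmup);
  for (FakeFile &file : files) file.Cancel();
  for (std::thread &reader : readers) reader.join();
  return stopped.load();
}
```

Calling RunCancelStress(1000, std::chrono::seconds(10)) would mirror the 
thread count and wait time described above; smaller values are enough to 
exercise the cancel path.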

> libhdfs++: DatanodeConnection::Cancel should not delete the underlying socket
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-10311
>                 URL: https://issues.apache.org/jira/browse/HDFS-10311
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>         Attachments: HDFS-10311.HDFS-8707.000.patch
>
>
> DataNodeConnectionImpl calls reset on the unique_ptr that references the 
> underlying asio::tcp::socket.  If this happens after the continuation 
> pipeline checks the cancel state but before asio uses the socket, it will 
> segfault because unique_ptr::reset explicitly changes its value to 
> nullptr.
> Cancel should only call shutdown() and close() on the socket but keep the 
> instance of it alive.  The socket can probably also be turned into a member 
> of DataNodeConnectionImpl to get rid of the unique pointer and simplify 
> things a bit.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
