[jira] [Commented] (HDFS-13745) libhdfs++: Fix race in FileSystem destructor

James Clampffer (JIRA) Thu, 23 Aug 2018 12:06:08 -0700


    [ https://issues.apache.org/jira/browse/HDFS-13745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590702#comment-16590702 ]


James Clampffer commented on HDFS-13745:
----------------------------------------

Thanks for checking this out [~anatoli.shein].
{quote}Is there a possibility that some task executed by the IoService will run 
forever?
{quote}
Yes.  Someone can pass in a callback that does a long sleep or busy wait.  
There are plenty of comments saying you should never pass in a callback that 
can block for an indeterminate amount of time; there's nothing the library can 
do if someone chooses to ignore them.  All of the internal tasks that the 
library runs in the IoService context have timeouts to prevent them from 
running forever.
{quote}Should we add some timeout in BlockingStop method if we have been 
waiting too long?
{quote}
No. BlockingStop is only there to prevent a thread self-join (and only blocks 
if that's what would happen otherwise).  The only thing exiting the loop early 
can do is let the self-join happen. On the surface it looks like you could 
spawn another thread and run the dtor there, but then you're stuck with a 
similar issue when it comes to managing the lifetime of that thread.
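For illustration only, here is a minimal C++ sketch of one way a pool destructor can sidestep the self-join, assuming a hypothetical ThreadPool class rather than libhdfs++'s actual IoService/BlockingStop code. Calling std::thread::join on the calling thread throws std::system_error (resource_deadlock_would_occur), and an exception escaping a destructor calls std::terminate. Instead of blocking, this sketch detaches the calling worker. That is only safe because the worker loop co-owns the queue state and never touches `this` afterwards, and the detached thread still outlives the destructor, which is the same lifetime-management concern raised above.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool for illustration; this is NOT libhdfs++'s IoService.
class ThreadPool {
 public:
  explicit ThreadPool(std::size_t num_threads) : state_(std::make_shared<State>()) {
    for (std::size_t i = 0; i < num_threads; ++i)
      workers_.emplace_back(&ThreadPool::Run, state_);
  }

  ThreadPool(const ThreadPool&) = delete;
  ThreadPool& operator=(const ThreadPool&) = delete;

  void Post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(state_->mu);
      state_->tasks.push_back(std::move(task));
    }
    state_->cv.notify_one();
  }

  // May run on a worker thread if a callback held the last shared_ptr.
  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> lock(state_->mu);
      state_->stopping = true;
    }
    state_->cv.notify_all();
    for (std::thread& t : workers_) {
      if (t.get_id() == std::this_thread::get_id()) {
        // Self-join: join() would throw std::system_error
        // (resource_deadlock_would_occur) and terminate the process from
        // this destructor. Detach instead; Run() co-owns State and never
        // touches `this`, so this worker can finish safely.
        t.detach();
      } else {
        t.join();
      }
    }
  }

 private:
  struct State {
    std::mutex mu;
    std::condition_variable cv;
    std::deque<std::function<void()>> tasks;
    bool stopping = false;
  };

  // Static and handed its own shared_ptr so it never dereferences `this`.
  static void Run(std::shared_ptr<State> state) {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(state->mu);
        state->cv.wait(lock, [&] { return state->stopping || !state->tasks.empty(); });
        if (state->tasks.empty()) return;  // stopping and fully drained
        task.swap(state->tasks.front());   // leaves an empty function behind
        state->tasks.pop_front();
      }
      task();
      // `task`, and anything it captured (possibly the last shared_ptr to the
      // pool), is destroyed at the end of this iteration, outside the lock.
    }
  }

  std::shared_ptr<State> state_;
  std::vector<std::thread> workers_;
};
```

When the pool is destroyed from its owner's thread, every worker is joined after the queue drains; when a callback drops the last reference, the destructor joins the other workers and detaches only the one it is running on.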
{quote}In the hdfs_ioservice_test in longRunningCallback we sleep for just 1 
second, which might not be enough since if there is some sort of system delay 
longer than 1 second the test might fail. Even though with any amount of sleep 
there is a chance of this happening, it might make sense to increase it to 2-3 
seconds.
{quote}
Increasing the sleep to 2 or 3 seconds seems just as arbitrary as a 1-second 
sleep. I'll see if I can get rid of the sleep by adding an extra condition 
variable.
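A rough sketch of the condition-variable approach, assuming a hypothetical Event helper and a hypothetical RunHandshake() driver (the actual longRunningCallback test body isn't shown in this thread). The "long-running" callback announces that it has started and then blocks until the test releases it, so the ordering is decided explicitly rather than by a guessed sleep duration.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot event built on a condition variable.
class Event {
 public:
  void Set() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      set_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until Set() has been called.
  void Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return set_; });
  }

  // Returns whether Set() was called before the timeout expired. The timeout
  // only keeps a broken test from hanging; it is not used for ordering.
  template <class Rep, class Period>
  bool WaitFor(const std::chrono::duration<Rep, Period>& timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout, [this] { return set_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool set_ = false;
};

// Hypothetical driver: stands in for posting a long-running callback.
bool RunHandshake() {
  Event started;
  Event release;
  std::thread callback([&] {
    started.Set();   // the callback is now executing
    release.Wait();  // stays "busy" until the test releases it; no sleep
  });
  bool ok = started.WaitFor(std::chrono::seconds(10));
  // Checks that must hold while the callback is still running would go here.
  release.Set();
  callback.join();
  return ok;
}
```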
 
bq. Also, can we submit another CI run for this? Looks like the previous one 
didn't run for some reason.
Yeah.  I'll do that once I add the condition variable.
 

> libhdfs++: Fix race in FileSystem destructor
> --------------------------------------------
>
>                 Key: HDFS-13745
>                 URL: https://issues.apache.org/jira/browse/HDFS-13745
>             Project: Hadoop HDFS
>          Issue Type: Task
>          Components: native
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>            Priority: Major
>         Attachments: HDFS-13745.000.patch
>
>
> Whichever object happens to hold the last shared_ptr to the IoService will run 
> ~IoService when that shared_ptr goes out of scope.  IoService's destructor is 
> responsible for joining all worker threads in the pool.  Most callbacks now 
> own a weak_ptr<IoService> that can be promoted to a shared_ptr in order to post 
> new async tasks.  If a callback object is the last thing holding the 
> IoService shared_ptr, it's going to try to join the thread pool from inside 
> one of the pool's own threads.
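To make the race concrete, here is a small standalone C++ sketch using a hypothetical Service type as a stand-in for IoService and a hypothetical ThreadThatDestroyedService() driver: a callback thread promotes its weak_ptr with lock(), the original owner then drops its shared_ptr, and the destructor ends up running on the callback's thread. In libhdfs++ that destructor would be the one trying to join the pool from inside one of its own threads.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for IoService: records which thread ran its destructor.
struct Service {
  explicit Service(std::thread::id* dtor_thread) : dtor_thread_(dtor_thread) {}
  ~Service() { *dtor_thread_ = std::this_thread::get_id(); }
  std::thread::id* dtor_thread_;
};

// Hypothetical driver: returns the id of the thread that ran ~Service and
// reports the callback thread's id through `callback_thread`.
std::thread::id ThreadThatDestroyedService(std::thread::id* callback_thread) {
  std::thread::id dtor_thread;
  auto owner = std::make_shared<Service>(&dtor_thread);
  std::weak_ptr<Service> weak = owner;

  std::mutex mu;
  std::condition_variable cv;
  bool promoted = false;
  bool released = false;

  std::thread callback([&] {
    // Promote the weak_ptr, as a callback would before posting more work.
    std::shared_ptr<Service> strong = weak.lock();
    {
      std::unique_lock<std::mutex> lock(mu);
      promoted = true;
      cv.notify_all();
      cv.wait(lock, [&] { return released; });
    }
    *callback_thread = std::this_thread::get_id();
    // `strong` is now the last owner, so ~Service runs here, on this thread.
  });

  {
    std::unique_lock<std::mutex> lock(mu);
    cv.wait(lock, [&] { return promoted; });
    owner.reset();  // the original owner lets go first
    released = true;
  }
  cv.notify_all();
  callback.join();
  return dtor_thread;
}
```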



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
