[
https://issues.apache.org/jira/browse/HDFS-13745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590702#comment-16590702
]
James Clampffer commented on HDFS-13745:
----------------------------------------
Thanks for checking this out [~anatoli.shein].
{quote}Is there a possibility that some task executed by the IoService will run
forever?
{quote}
Yes. Someone can pass in a callback that does a long sleep or busy wait.
There are plenty of comments that say you should never pass in a callback that
can block for an indeterminate amount of time; there's nothing the library can
do if someone chooses to ignore those. All of the internal tasks that the
library runs in the ioservice context have timeouts to prevent them from
running forever.
{quote}Should we add some timeout in BlockingStop method if we have been
waiting too long?
{quote}
No. BlockingStop is only there to prevent a thread self-join (and only blocks
if that's what would happen otherwise). The only thing exiting the loop early
can do is let the self-join happen. On the surface it looks like you could
spawn another thread and run the dtor there, but then you're stuck with a
similar issue when it comes to managing the lifetime of that thread.
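To make the self-join concrete, here is a minimal sketch of how the destructor can end up running on a worker thread. TinyService and LastOwnerOnWorkerWouldSelfJoin are hypothetical names for illustration only; this is not the libhdfs++ IoService and not how BlockingStop is implemented. The destructor only records whether a join at that point would be a self-join.

```cpp
#include <atomic>
#include <future>
#include <memory>
#include <thread>

// Hypothetical stand-in for a service whose destructor would join a worker.
struct TinyService {
  std::thread::id worker_id;                    // id of the single worker thread
  std::atomic<bool>* would_self_join = nullptr; // out-flag read by the demo

  ~TinyService() {
    // A real destructor would join the worker here. If the destructor is
    // running on the worker itself, that join would be a self-join.
    if (would_self_join) {
      *would_self_join = (std::this_thread::get_id() == worker_id);
    }
  }
};

// Returns true if the destructor ran on the worker thread.
bool LastOwnerOnWorkerWouldSelfJoin() {
  std::atomic<bool> would_self_join{false};
  auto service = std::make_shared<TinyService>();
  service->would_self_join = &would_self_join;
  std::weak_ptr<TinyService> weak = service;

  std::promise<void> go, promoted, released;
  auto go_f = go.get_future();
  auto promoted_f = promoted.get_future();
  auto released_f = released.get_future();

  std::thread worker([&] {
    go_f.wait();                  // wait until worker_id has been recorded
    auto strong = weak.lock();    // callback promotes its weak_ptr
    promoted.set_value();
    released_f.wait();            // the caller drops its reference meanwhile
    // 'strong' is now the last owner; it is destroyed when this lambda
    // returns, so ~TinyService runs on the worker thread.
  });

  service->worker_id = worker.get_id();
  go.set_value();
  promoted_f.wait();
  service.reset();                // caller gives up ownership
  released.set_value();
  worker.join();                  // safe: the worker is joined from outside
  return would_self_join.load();
}
```

Calling LastOwnerOnWorkerWouldSelfJoin() returns true in this setup because the callback holds the last shared_ptr when it finishes.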
{quote}In the hdfs_ioservice_test in longRunningCallback we sleep for just 1
second, which might not be enough since if there is some sort of system delay
longer than 1 second the test might fail. Even though with any amount of sleep
there is a chance of this happening, it might make sense to increase it to 2-3
seconds.
{quote}
Increasing the sleep to 2 or 3 seconds seems just as arbitrary as a 1 second
sleep. I'll see if I can get rid of the sleep by adding an extra condition
variable.
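As a rough illustration of the idea (not the actual change to hdfs_ioservice_test), a test can wait for an explicit signal from the callback instead of sleeping. Latch and ObserveCallbackInFlight are hypothetical names introduced for this sketch.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot latch built from a mutex and a condition variable.
class Latch {
 public:
  void Signal() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      signaled_ = true;
    }
    cv_.notify_all();
  }

  void Wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    // The predicate guards against spurious wakeups and missed signals.
    cv_.wait(lock, [this] { return signaled_; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

// Runs a stand-in "long running callback" on another thread and checks that
// the caller can observe it in flight without relying on a timed sleep.
bool ObserveCallbackInFlight() {
  Latch started;
  Latch may_finish;
  std::atomic<bool> finished{false};

  std::thread worker([&] {
    started.Signal();     // tell the test the callback is running
    may_finish.Wait();    // stay "long running" until the test allows exit
    finished = true;
  });

  started.Wait();                       // no sleep: block until it has started
  bool in_flight = !finished.load();    // callback is still blocked here
  may_finish.Signal();
  worker.join();
  return in_flight && finished.load();
}
```

The test no longer depends on how long the scheduler takes to start the callback, so there is no sleep duration to tune.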
bq. Also, can we submit another CI run for this? Looks like the previous one
didn't run for some reason.
Yeah. I'll do that once I add the condition variable.
> libhdfs++: Fix race in FileSystem destructor
> --------------------------------------------
>
> Key: HDFS-13745
> URL: https://issues.apache.org/jira/browse/HDFS-13745
> Project: Hadoop HDFS
> Issue Type: Task
> Components: native
> Reporter: James Clampffer
> Assignee: James Clampffer
> Priority: Major
> Attachments: HDFS-13745.000.patch
>
>
> Whatever happens to have the last shared_ptr to the IoService will run
> ~IoService when the shared_ptr goes out of scope. IoService's destructor is
> responsible for joining all worker threads in the pool. Most callbacks now
> own weak_ptr<IoService> that can be promoted to a shared_ptr in order to post
> new async tasks. If a callback object is the last thing holding the
> IoService shared_ptr it's going to try to join the thread pool inside of one
> of the thread pool's threads.