Weston Pace created ARROW-12879:
-----------------------------------
Summary: [C++] Thread pool leaks memory when forking (and could
maybe deadlock) if threads exist at the time of fork
Key: ARROW-12879
URL: https://issues.apache.org/jira/browse/ARROW-12879
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 4.0.0
Reporter: Weston Pace
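While working on ARROW-12878 I made the leak more obvious. After a fork, the
child cannot delete any remaining std::thread objects: the threads they refer
to do not exist in the child, and destroying a still-joinable std::thread calls
std::terminate. In addition, the child cannot safely use any mutex that a
worker thread might have held at the time of the fork.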
The existing implementation works around this by creating a new
ThreadPool::State instance. However, shared_ptrs to the old instance are still
held by the (now defunct) std::thread instances, so the old state object is
never deleted (valgrind confirms this).
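A minimal, hypothetical reproduction of the leak (POSIX only, not Arrow code). `PoolState` and `StateLeaksInChild` are illustrative names: `PoolState` stands in for ThreadPool::State, a worker's closure holds a shared_ptr copy of it, and after fork() the child drops its own reference and uses a weak_ptr to check whether the object could ever be freed.

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for ThreadPool::State; not Arrow's actual type.
struct PoolState {
  std::mutex mutex;
  std::condition_variable cv;
  bool stop = false;
};

// Returns true if, in a forked child, the state object is still referenced
// after the child releases its own shared_ptr, i.e. it can never be freed.
bool StateLeaksInChild() {
  auto state = std::make_shared<PoolState>();
  std::weak_ptr<PoolState> observer = state;

  // The worker's closure owns a shared_ptr copy until the thread exits.
  std::thread worker([state] {
    std::unique_lock<std::mutex> lock(state->mutex);
    state->cv.wait(lock, [&] { return state->stop; });
  });

  pid_t pid = fork();
  if (pid == 0) {
    // Child: the worker thread was not copied, so its closure (and its
    // shared_ptr) is never destroyed. Dropping our copy cannot free state.
    state.reset();
    // _exit skips destructors: destroying the joinable `worker` here would
    // call std::terminate.
    _exit(observer.expired() ? 1 : 0);  // exit code 0 means "leaked"
  }

  // Parent: the real worker exists here, so shut it down normally.
  {
    std::lock_guard<std::mutex> lock(state->mutex);
    state->stop = true;
  }
  state->cv.notify_all();
  worker.join();

  int status = 0;
  if (pid < 0 || waitpid(pid, &status, 0) < 0) return false;
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The function is expected to return true: in the child, the only remaining reference belongs to a thread that no longer exists, which mirrors the old ThreadPool::State surviving forever.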
Furthermore, if the fork happens while a task running on a pool thread holds
some mutex (e.g. any of the ones used in the datasets API), that mutex will
never be released in the child, and any attempt to lock it there will
deadlock.
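A similarly hypothetical sketch of the mutex problem: `MutexStuckInChild` (an illustrative name, not Arrow code) forks while another thread holds a std::mutex. Calling lock() in the child would hang, so the sketch uses try_lock() to observe that the mutex cannot be acquired there.

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <atomic>
#include <mutex>
#include <thread>

// Returns true if a mutex held by another thread at fork() time is still
// locked in the child, where no thread exists that could ever unlock it.
bool MutexStuckInChild() {
  std::mutex mu;
  std::atomic<bool> holding{false};
  std::atomic<bool> release{false};

  // Stand-in for a pool task that has captured some mutex.
  std::thread task([&] {
    std::lock_guard<std::mutex> lock(mu);
    holding = true;
    while (!release) std::this_thread::yield();
  });
  while (!holding) std::this_thread::yield();  // fork only while mu is held

  pid_t pid = fork();
  if (pid == 0) {
    // Child: mu.lock() would block forever here; try_lock() shows the mutex
    // is still owned by a thread that was not copied into this process.
    _exit(mu.try_lock() ? 1 : 0);  // exit code 0 means "stuck"
  }

  release = true;  // parent: the real task thread unlocks mu and exits
  task.join();
  int status = 0;
  if (pid < 0 || waitpid(pid, &status, 0) < 0) return false;
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```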
A more correct workaround would be to hook into pthread_atfork: shut down all
threads before the fork (without waiting for all queued jobs to complete), then
restart the threads in BOTH the child and the parent after the fork (today we
restart only in the child and leave the parent running).
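The pthread_atfork approach proposed above can be sketched with a toy pool. This is a hypothetical illustration, not Arrow's ThreadPool: `ForkAwarePool`, `RunOnPool` and `PoolWorksAcrossFork` are made-up names, it assumes C++17, a single pool per process, and that fork() is never called from one of the pool's own workers (the prepare hook would otherwise try to join the calling thread).

```cpp
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal pool, not Arrow's ThreadPool. Assumes one pool per
// process and that fork() is never called from one of the pool's workers.
class ForkAwarePool {
 public:
  explicit ForkAwarePool(int num_threads) : num_threads_(num_threads) {
    Start();
    instance_ = this;
    static std::once_flag registered;
    std::call_once(registered, [] { pthread_atfork(&Prepare, &Restart, &Restart); });
  }
  ~ForkAwarePool() {
    instance_ = nullptr;
    Stop();
  }
  void Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  // pthread_atfork hooks: stop before fork(), restart in BOTH parent and child.
  static void Prepare() { if (instance_) instance_->Stop(); }
  static void Restart() { if (instance_) instance_->Start(); }
  void Start() {
    assert(workers_.empty());
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = false;
    }
    for (int i = 0; i < num_threads_; ++i) workers_.emplace_back([this] { WorkerLoop(); });
  }
  // Joins all workers: a task already running finishes first, but queued
  // tasks are not drained; they stay queued until the next Start().
  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (std::thread& worker : workers_) worker.join();
    workers_.clear();
  }
  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (stopping_) return;
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();
    }
  }
  static inline ForkAwarePool* instance_ = nullptr;
  const int num_threads_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  std::vector<std::thread> workers_;
  bool stopping_ = false;
};

// Runs one task on the pool and blocks until its result is ready.
int RunOnPool(ForkAwarePool& pool, int value) {
  auto result = std::make_shared<std::promise<int>>();
  std::future<int> done = result->get_future();
  pool.Submit([result, value] { result->set_value(value * 2); });
  return done.get();
}
// Forks, then checks that both the child and the parent can still run tasks.
bool PoolWorksAcrossFork(ForkAwarePool& pool) {
  pid_t pid = fork();
  if (pid == 0) _exit(RunOnPool(pool, 21) == 42 ? 0 : 1);  // child: new workers
  if (pid < 0) return false;
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) return false;
  bool child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
  return child_ok && RunOnPool(pool, 5) == 10;
}
```

Because Stop() joins the workers, no pool thread can hold mutex_ or be waiting on cv_ when fork() runs, which is what makes reusing them in the child safe. The trade-off is that fork() now blocks until every in-flight task returns, so a task that never finishes would stall the fork.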
--
This message was sent by Atlassian Jira
(v8.3.4#803005)