Adam Jakubek created THRIFT-5127:
------------------------------------
Summary: Race condition in TNonblockingServer
Key: THRIFT-5127
URL: https://issues.apache.org/jira/browse/THRIFT-5127
Project: Thrift
Issue Type: Bug
Components: C++ - Library
Affects Versions: 0.13.0
Reporter: Adam Jakubek
Attachments: thrift_deadlock.cpp
When the {{TNonblockingServer::stop}} method is called from a different thread
shortly after {{TNonblockingServer::serve}} has started, the server occasionally
fails to terminate.
The following sequence of events has been observed with Thrift 0.13:
# {{TNonblockingServer::serve}} starts spawning listener threads.
# Another thread calls {{TNonblockingServer::stop}} before all listeners are
created.
A shutdown request is sent only to those IO threads which have already been
initialized (not to all of them).
# {{TNonblockingServer::serve}} completes spawning the remaining listener
threads (including the primary IO thread with index 0).
# {{TNonblockingServer::serve}} continues to run despite the stop request,
since the main thread and some of the listener threads are still active.
The issue seems to be caused by late initialization of
{{TNonblockingIOThread}}'s state.
The server's listener threads are spawned in the {{TNonblockingServer::serve}}
method (in a nested call to {{registerEvents}}). They finish initializing
some of their state in the {{TNonblockingIOThread::run}} method (part of the
{{Runnable}} interface).
One of the fields initialized at that stage is the
{{notificationPipeFDs_}} array, which, as far as I can tell, is used to pass
messages between threads.
It seems that the thread which invokes {{TNonblockingServer::stop}} might
attempt to use the notification pipe to request shutdown while the
{{notificationPipeFDs_}} descriptor array is still uninitialized.
In that case, the message is lost (the {{TNonblockingIOThread::notify}} call
will return immediately) and the target thread never exits.
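The lost-notification scenario described above can be sketched in isolation. This is a simplified, single-threaded model and not Thrift's actual code: {{IoThreadModel}}, {{lateInit}}, {{notifyStop}}, {{stopPending}} and {{stopSurvivesEarlyNotify}} are hypothetical names, and the {{eagerInit}} flag illustrates one possible mitigation (creating the pipe before the object is visible to other threads), which is an assumption rather than a confirmed fix.

```cpp
#include <poll.h>    // poll
#include <unistd.h>  // pipe, read, write, close

#include <cassert>

// Hypothetical model of an IO thread's notification pipe; not Thrift's class.
class IoThreadModel {
public:
  // eagerInit == true models one possible fix: create the pipe up front.
  explicit IoThreadModel(bool eagerInit) {
    if (eagerInit) createPipe();
  }
  ~IoThreadModel() {
    for (int fd : fds_) {
      if (fd >= 0) ::close(fd);
    }
  }
  IoThreadModel(const IoThreadModel&) = delete;
  IoThreadModel& operator=(const IoThreadModel&) = delete;

  // Models the late initialization performed at the start of run().
  void lateInit() {
    if (fds_[0] < 0) createPipe();
  }

  // Models notify(): returns false when the request is dropped because the
  // pipe does not exist yet.
  bool notifyStop() {
    if (fds_[1] < 0) return false;  // silent early return: message is lost
    const char msg = 'S';
    return ::write(fds_[1], &msg, 1) == 1;
  }

  // Non-blocking check for a pending stop request on the read end.
  bool stopPending() {
    if (fds_[0] < 0) return false;
    pollfd pfd{fds_[0], POLLIN, 0};
    if (::poll(&pfd, 1, 0) <= 0) return false;
    char msg = 0;
    return ::read(fds_[0], &msg, 1) == 1 && msg == 'S';
  }

private:
  void createPipe() {
    const int rc = ::pipe(fds_);
    assert(rc == 0);
    (void)rc;  // avoid an unused-variable warning when NDEBUG is defined
  }

  int fds_[2] = {-1, -1};  // -1 marks "not initialized yet"
};

// Replays the "stop arrives before run()" ordering and reports whether the
// IO thread would still see the stop request afterwards.
bool stopSurvivesEarlyNotify(bool eagerInit) {
  IoThreadModel io(eagerInit);
  io.notifyStop();  // stop() called from another thread, before run()
  io.lateInit();    // run() starts and finishes initialization
  return io.stopPending();
}
```

In this model, {{stopSurvivesEarlyNotify(false)}} returns false (the request is dropped, so the thread would keep running), while {{stopSurvivesEarlyNotify(true)}} returns true because the byte is already waiting in the pipe when the loop starts.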
Btw. the {{threadId_}} field of {{TNonblockingIOThread}} is also accessed
concurrently by multiple threads without synchronization:
- the field is written in {{TNonblockingIOThread::registerEvents}} after
creation of the listener thread,
- there is a read in {{TNonblockingIOThread::breakLoop}} when the server is being
stopped.
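One way to make such a write/read pair well-defined is to store the id in an atomic. The sketch below is only an illustration under assumptions: {{ThreadIdHolder}} and its methods are hypothetical names, not part of Thrift, and the "is this the current thread" check is a guess at how such a field might be read during shutdown.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical holder for a worker's thread id; not Thrift's class.
class ThreadIdHolder {
public:
  // Called by the spawning thread once the worker thread exists.
  void set(std::thread::id id) {
    assert(id != std::thread::id());  // a default id means "no thread"
    id_.store(id, std::memory_order_release);
  }

  // Safe to call from any thread, e.g. while the server is stopping.
  std::thread::id get() const { return id_.load(std::memory_order_acquire); }

  // True once set() has published a real thread id.
  bool isSet() const { return get() != std::thread::id(); }

  // True only if the stored id belongs to the calling thread.
  bool isCurrentThread() const { return get() == std::this_thread::get_id(); }

private:
  // std::thread::id is trivially copyable, so it can be held in std::atomic.
  std::atomic<std::thread::id> id_{std::thread::id()};
};
```

A reader that runs before {{set}} observes a default-constructed id rather than a torn or indeterminate value, so the caller can treat "not set yet" as an explicit state. A mutex around the field would work equally well; the atomic is chosen here only to keep the sketch short.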
I'm attaching sample code which can reproduce the issue (although not
deterministically).
Some tweaking of the {{STOPPING_THREAD_DELAY}} constant might be necessary to
observe the deadlock.