[
https://issues.apache.org/jira/browse/THRIFT-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264254#comment-15264254
]
ASF GitHub Bot commented on THRIFT-3768:
----------------------------------------
Github user tpcwang commented on a diff in the pull request:
https://github.com/apache/thrift/pull/980#discussion_r61601611
--- Diff: lib/cpp/src/thrift/concurrency/ThreadManager.cpp ---
@@ -421,7 +416,7 @@ void ThreadManager::Impl::removeWorker(size_t value) {
{
Synchronized s(workerMonitor_);
- while (workerCount_ != workerMaxCount_) {
+ while (workerCount_ > goalCount) {
--- End diff --
I had missed that it is using different monitors and thus different
mutexes. Thanks for clarifying.
I would still argue that this new implementation is worse than the existing
implementation. With the existing implementation, concurrent **writes** to
`workerMaxCount_` and `workerCount_` is synchronized, but the reading of those
values is not synchronized. Reading an aligned integer is most likely atomic on
most architectures, so this might not even be that unsafe (?). Getting either
the old value or the new value will not invalidate what it is trying to do.
On the other hand, the new implementation has a problem with concurrent
removeWorker and addWorker as far as I can tell. A removeWorker can be stuck
forever if a addWorker call comes in after the removeWorker sets the local
`goalCount` because it will never hit that goal with new workers that are
produced by addWorker.
Hopefully I didn't miss another detail. Thanks for working with me to
address these concerns.
> TThreadedServer may crash if it is destroyed immediately after it returns
> from serve(); TThreadedServer disconnects clients when they connec
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: THRIFT-3768
> URL: https://issues.apache.org/jira/browse/THRIFT-3768
> Project: Thrift
> Issue Type: Bug
> Components: C++ - Library
> Affects Versions: 0.9.3
> Reporter: Ted Wang
> Assignee: James E. King, III
> Priority: Minor
>
> Here's a sequence that shows the race:
> Thread-1 (Users of TThreadedServer): Calls TThreadedServer::stop(), which
> calls interruptChildren and initiates the tearing down of client connections.
> Thread-2: In TServerFramework::serve(), broke out of accept, and now blocks
> in TThreadedServer::serve() waiting to drain all the clients.
> Thread-3 (The connected client thread created by TThreadedServer): In
> disposeConnectedClient, running because the server is shutting down and the
> shared_ptr specified this function to be the cleanup function for the client.
> This thread just returned from onClientDisconnected and now context switches.
> Thread-2: TThreadedServer::serve() is notified that all of the clients have
> disconnected and completes.
> Thread-1: Joins on Thread-2 and destroys the server object because it is done.
> Thread-3: Finally gets a chance to run, but now encounters undefined behavior
> because it is still executing a member function of an object that has been
> destroyed.
> You can force this race in action if you put sleep(1) before
> onClientDisconnected() in disposeConnectedClient
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)