[ https://issues.apache.org/jira/browse/THRIFT-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229593#comment-15229593 ]
ASF GitHub Bot commented on THRIFT-3768: ---------------------------------------- Github user tpcwang commented on a diff in the pull request: https://github.com/apache/thrift/pull/980#discussion_r58814042 --- Diff: lib/cpp/src/thrift/server/TThreadedServer.cpp --- @@ -92,29 +97,42 @@ TThreadedServer::~TThreadedServer() { void TThreadedServer::serve() { TServerFramework::serve(); - // Drain all clients - no more will arrive - try { - Synchronized s(clientsMonitor_); - while (getConcurrentClientCount() > 0) { - clientsMonitor_.wait(); - } - } catch (TException& tx) { - string errStr = string("TThreadedServer: Exception joining workers: ") + tx.what(); - GlobalOutput(errStr.c_str()); + // Ensure post-condition of no active clients + Synchronized s(clientMonitor_); + while (!clientMap_.empty()) { + clientMonitor_.wait(); } } void TThreadedServer::onClientConnected(const shared_ptr<TConnectedClient>& pClient) { - threadFactory_->newThread(pClient)->start(); + Synchronized sync(clientMonitor_); + clientMap_.insert(ClientMap::value_type(pClient.get(), boost::make_shared<TConnectedClientTracker>(pClient))); + + // We do not track the threads themselves + ClientMap::const_iterator it = clientMap_.find(pClient.get()); + threadFactory_->newThread(it->second)->start(); } void TThreadedServer::onClientDisconnected(TConnectedClient* pClient) { - THRIFT_UNUSED_VARIABLE(pClient); - Synchronized s(clientsMonitor_); - if (getConcurrentClientCount() == 0) { - clientsMonitor_.notify(); + Synchronized sync(clientMonitor_); + clientMap_.erase(pClient); + if (clientMap_.empty()) { --- End diff -- This doesn't fix the issue because as soon as you notify clientMonitor_, TThreadedServer::serve() will exit and clients are free to release TThreadedServer, but the thread that initiated onClientDisconnected is still in the middle of disposeConnectedClient, which it should not be executing any more because the object is already destroyed. Try adding one second sleep after the delete in disposeConnectedClient and run the tests. I see the same segfault in TServerIntegrationTest. I am not sure we can get away with not joining the threads. > TThreadedServer may crash if it is destroyed immediately after it returns > from serve(); TThreadedServer disconnects clients when they connec > -------------------------------------------------------------------------------------------------------------------------------------------- > > Key: THRIFT-3768 > URL: https://issues.apache.org/jira/browse/THRIFT-3768 > Project: Thrift > Issue Type: Bug > Components: C++ - Library > Affects Versions: 0.9.3 > Reporter: Ted Wang > Assignee: James E. King, III > Priority: Minor > > Here's a sequence that shows the race: > Thread-1 (Users of TThreadedServer): Calls TThreadedServer::stop(), which > calls interruptChildren and initiates the tearing down of client connections. > Thread-2: In TServerFramework::serve(), broke out of accept, and now blocks > in TThreadedServer::serve() waiting to drain all the clients. > Thread-3 (The connected client thread created by TThreadedServer): In > disposeConnectedClient, running because the server is shutting down and the > shared_ptr specified this function to be the cleanup function for the client. > This thread just returned from onClientDisconnected and now context switches. > Thread-2: TThreadedServer::serve() is notified that all of the clients have > disconnected and completes. > Thread-1: Joins on Thread-2 and destroys the server object because it is done. > Thread-3: Finally gets a chance to run, but now encounters undefined behavior > because it is still executing a member function of an object that has been > destroyed. > You can force this race in action if you put sleep(1) before > onClientDisconnected() in disposeConnectedClient -- This message was sent by Atlassian JIRA (v6.3.4#6332)