Hi Pavel, I closed this phabracator, I've been working on this on-and-off and I think a different approach would be better.
I am working up a patch where I change tools/lldb-server/lldb-platform.cpp from using a fork() model to using std::thread's. The main thread listens for incoming connections, and when one is received, it starts up a new std::thread to handle that connection. I have a PortCoordinator singleton object that manages the ports we can use for communications. Newly created threads request ports from (& free them when the thread is finished) so that it would be possible to run multiple tests at the same time. The current code fork()'s when it receives a connection, and gives the child process a copy of all the ports that are available. Because it's in a separate process, if we were to receive a second connection and fork off a new process, it would have the same list of ports listed as available. When debugserver is being used, this is a problem - when the lldb-server platform thread/process gets a qLaunchGDBServer packet, we run debugserver saying "hey debugserver listen for a connection on port N" and then tell the remote lldb process "there's a debugserver waiting to hear from you on port N" - so lldb-server can't test whether port N is in use or not. (there was also a problem in GDBRemoteCommunication::StartDebugserverProcess where it has a url like localhost:1111 and then it tries to run debugserver in --named-pipe mode even though we already have a port # to use in the url.) The final problem is that TCPSocket::Accept waits for an incoming connection on the assigned port #, and when it comes in it gets a new file descriptor for that connection. It should resume listening to that assigned port for the next connection that comes in ... but today TCPSocket::Accept calls CloseListenSockets() so after opening the first lldb-server platform connection that comes in, when we go to accept() the second, the socket has been closed and we exit immediately. To quickly recap the POSIX API (I never do network programming so I have to remind myself of this every time), the workflow for a server like this usually looks like parentfd = socket(AF_INET, SOCK_STREAM, 0); optval = 1; setsockopt(parentfd, SOL_SOCKET, SO_REUSEADDR, (const void *)&optval , sizeof(int)); serveraddr.sin_family = AF_INET; serveraddr.sin_addr.s_addr = htonl(INADDR_ANY); serveraddr.sin_port = htons((unsigned short)portno); bind(parentfd, (struct sockaddr *) &serveraddr, sizeof(serveraddr); listen(parentfd, 100); // allow 100 connections to queue up -- whatever childfd = accept(parentfd, (struct sockaddr *) &clientaddr, &clientlen); and then we go back to accept()'ing on parentfd after we've spun off something to talk to childfd. I've been doing all of my work on an old branch for reasons that are boring, so some/all of this may be fixed on top of tree already! But this is what I had to do to get my branch to work correctly on a remote system. I also noticed that dotest.py won't run multiple debug sessions with a remote target. That was part of my goal, to speed these up, but it seems to run in --no-multiprocess mode currently. I'll be picking this up next week - my changes right now are a mess, and they're against this old branch that I have to work on, but I'll get them cleaned up & updated to top of tree and make up a patchset. But I wanted to give you a heads up on where I'm headed as this touches a lot of your code. Thanks! J > On Jan 18, 2018, at 3:44 AM, Pavel Labath via Phabricator > <revi...@reviews.llvm.org> wrote: > > labath added a comment. > > In https://reviews.llvm.org/D42210#979608, @jasonmolenda wrote: > >> Jim remembers some problem with logging across a fork()'ed process, Pavel >> does this ring a bell? This change might be completely bogus -- but it's >> very difficult to debug the child side of an lldb-server platform today. > > > I believe Jim is thinking of https://reviews.llvm.org/D38938. The issue is > that if another thread holds the logging mutex while we fork(), the mutex > will remain in a bogus state in the child process. This would mean that any > operation on the Log subsystem (such as enabling logging) could hang. We hold > the mutex for just a couple of instructions (just enough to copy a > shared_ptr), but after some code reshuffles, we were hitting this case very > often in liblldb. > > Now, I don't think this can happen here as at the point where we are forking, > the platform process is single-threaded. So, enabling logging here should be > safe, but only accidentally. It should be possible to make this always safe, > but that will need to be done with a very steady hand. My opinion is we > should not bother with that (i.e., just check this in as-is) until we > actually need more threads in the platform, particularly as I think the > long-term direction here should be to replace the fork with a new thread for > handling the connection. > > As for testing, the problem we have now is that we have no direct platform > tests today -- remote tests will test platform by the virtue of all > connections going through it but a standard check-lldb will not even run this > code. I have been planning to add tests like these, but I haven't got around > to that yet, and I'm not sure when will that happen. If you want to take a > stab at it, I can give you some pointers. > > BTW: for most debugging needs you should be able to just start the platform > without the --server argument, which will disable forking and handle each > connection in the main process. > > > Repository: > rL LLVM > > https://reviews.llvm.org/D42210 > > > _______________________________________________ lldb-commits mailing list email@example.com http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits