Hi Pavel, I closed this phabracator, I've been working on this on-and-off and I
think a different approach would be better.
I am working up a patch where I change tools/lldb-server/lldb-platform.cpp from
using a fork() model to using std::thread's. The main thread listens for
incoming connections, and when one is received, it starts up a new std::thread
to handle that connection. I have a PortCoordinator singleton object that
manages the ports we can use for communications. Newly created threads request
ports from (& free them when the thread is finished) so that it would be
possible to run multiple tests at the same time.
The current code fork()'s when it receives a connection, and gives the child
process a copy of all the ports that are available. Because it's in a separate
process, if we were to receive a second connection and fork off a new process,
it would have the same list of ports listed as available. When debugserver is
being used, this is a problem - when the lldb-server platform thread/process
gets a qLaunchGDBServer packet, we run debugserver saying "hey debugserver
listen for a connection on port N" and then tell the remote lldb process
"there's a debugserver waiting to hear from you on port N" - so lldb-server
can't test whether port N is in use or not.
(there was also a problem in GDBRemoteCommunication::StartDebugserverProcess
where it has a url like localhost:1111 and then it tries to run debugserver in
--named-pipe mode even though we already have a port # to use in the url.)
The final problem is that TCPSocket::Accept waits for an incoming connection on
the assigned port #, and when it comes in it gets a new file descriptor for
that connection. It should resume listening to that assigned port for the next
connection that comes in ... but today TCPSocket::Accept calls
CloseListenSockets() so after opening the first lldb-server platform connection
that comes in, when we go to accept() the second, the socket has been closed
and we exit immediately. To quickly recap the POSIX API (I never do network
programming so I have to remind myself of this every time), the workflow for a
server like this usually looks like
parentfd = socket(AF_INET, SOCK_STREAM, 0);
optval = 1;
setsockopt(parentfd, SOL_SOCKET, SO_REUSEADDR,
(const void *)&optval , sizeof(int));
serveraddr.sin_family = AF_INET;
serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);
serveraddr.sin_port = htons((unsigned short)portno);
bind(parentfd, (struct sockaddr *) &serveraddr, sizeof(serveraddr);
listen(parentfd, 100); // allow 100 connections to queue up -- whatever
childfd = accept(parentfd, (struct sockaddr *) &clientaddr, &clientlen);
and then we go back to accept()'ing on parentfd after we've spun off something
to talk to childfd.
I've been doing all of my work on an old branch for reasons that are boring, so
some/all of this may be fixed on top of tree already! But this is what I had
to do to get my branch to work correctly on a remote system.
I also noticed that dotest.py won't run multiple debug sessions with a remote
target. That was part of my goal, to speed these up, but it seems to run in
--no-multiprocess mode currently.
I'll be picking this up next week - my changes right now are a mess, and
they're against this old branch that I have to work on, but I'll get them
cleaned up & updated to top of tree and make up a patchset. But I wanted to
give you a heads up on where I'm headed as this touches a lot of your code.
> On Jan 18, 2018, at 3:44 AM, Pavel Labath via Phabricator
> <revi...@reviews.llvm.org> wrote:
> labath added a comment.
> In https://reviews.llvm.org/D42210#979608, @jasonmolenda wrote:
>> Jim remembers some problem with logging across a fork()'ed process, Pavel
>> does this ring a bell? This change might be completely bogus -- but it's
>> very difficult to debug the child side of an lldb-server platform today.
> I believe Jim is thinking of https://reviews.llvm.org/D38938. The issue is
> that if another thread holds the logging mutex while we fork(), the mutex
> will remain in a bogus state in the child process. This would mean that any
> operation on the Log subsystem (such as enabling logging) could hang. We hold
> the mutex for just a couple of instructions (just enough to copy a
> shared_ptr), but after some code reshuffles, we were hitting this case very
> often in liblldb.
> Now, I don't think this can happen here as at the point where we are forking,
> the platform process is single-threaded. So, enabling logging here should be
> safe, but only accidentally. It should be possible to make this always safe,
> but that will need to be done with a very steady hand. My opinion is we
> should not bother with that (i.e., just check this in as-is) until we
> actually need more threads in the platform, particularly as I think the
> long-term direction here should be to replace the fork with a new thread for
> handling the connection.
> As for testing, the problem we have now is that we have no direct platform
> tests today -- remote tests will test platform by the virtue of all
> connections going through it but a standard check-lldb will not even run this
> code. I have been planning to add tests like these, but I haven't got around
> to that yet, and I'm not sure when will that happen. If you want to take a
> stab at it, I can give you some pointers.
> BTW: for most debugging needs you should be able to just start the platform
> without the --server argument, which will disable forking and handle each
> connection in the main process.
> rL LLVM
lldb-commits mailing list