Hi Martin,

On Mon, May 13, 2019 at 2:08 PM Martin Buchholz <marti...@google.com> wrote:
>> I am happy this is resolved and the intermittent behavior explained. Yes,
>> we could improve exception messages, especially since analyzing fork
>> scenarios is cumbersome.
>>
>
> I tried hard back in 2005 to provide pretty good java-level diagnostics
> when subprocess starting failed somehow (see WhyCantJohnnyExec). At least
> the errno did get reported.
>
>

I know your code. For many years I wondered who Johnny is :)

We have a very similar solution in our port: we have our own error codes
(plus errno mixed in where it makes sense) for the many things that can go
wrong in the forkhelper. Maybe we can improve upon your solution a bit,
and/or add tracing for the environment etc.

But here is one thing that I still do not understand about Remi's problem.
The theory is that the first exec(), starting jspawnhelper, failed with
NOACCESS, yes? The man page for posix_spawn() states:

<quote>
Upon successful completion, posix_spawn() and posix_spawnp() place the PID
of the child process in pid, and return 0. If there is an error before or
during the fork(2), then no child is created, the contents of *pid are
unspecified, and these functions return an error number as described below.

Even when these functions return a success status, the child process may
still fail for a plethora of reasons related to its pre-exec()
initialization. In addition, the exec(3) may fail. In all of these cases,
the child process will exit with the exit value of 127.
</quote>

To me this looks as if the following should have happened: posix_spawn()
should have returned success, since the fork() went through. Then the child
process (still inside posix_spawn()) attempts the exec and gets NOACCESS.
Then the child process should have exited with exit code 127. Your fail
pipe would never have read an error code, since we never entered the main
function of jspawnhelper. To the Java caller it should have looked like a
very short-lived process with exit code 127.
Obviously this is not what happened, since Remi reported an IOException
with an errno. So where is my understanding wrong?

> I've had this little script around for ages:
>
> #!/bin/bash
> # -v: Print unabbreviated versions of environment, etc
>
> exec /usr/bin/strace -f -v -s 256 -e signal=none -e trace=process "$@"
>
>

We had all this as part of our spawn traces, but this is a nice and neat
idea. Does it print the current directory?

Cheers,
Thomas