On Mon, May 13, 2019 at 3:42 PM Thomas Stüfe <thomas.stu...@gmail.com> wrote:
>
> Hi Martin,
>
> On Mon, May 13, 2019 at 2:08 PM Martin Buchholz <marti...@google.com>
> wrote:
>
>>> I am happy this is resolved and the intermittent behavior explained. Yes,
>>> we could improve exception messages, especially since analyzing fork
>>> scenarios is cumbersome.
>>>
>>
>> I tried hard back in 2005 to provide pretty good java-level diagnostics
>> when subprocess starting failed somehow (see WhyCantJohnnyExec). At least
>> the errno did get reported.
>>
>>
> I know your code. For many years I wondered who Johnny is :)
>
> We have a very similar solution in our port: we have our own error codes
> (plus errno mixed in where it makes sense) for the many things that can go
> wrong in the forkhelper. Maybe we can improve upon your solution a bit.
> And/or add tracing for environment etc.
>
> But here is one thing that I still do not understand with Remi's problem:
>
> The theory is that the first exec(), starting jspawnhelper, went wrong
> with NOACCESS, yes?
>
> Man page for posix_spawn() states:
>
> <quote>
>        Upon successful completion, posix_spawn() and posix_spawnp() place
>        the PID of the child process in pid, and return 0. If there is an
>        error before or during the fork(2), then no child is created, the
>        contents of *pid are unspecified, and these functions return an
>        error number as described below.
>
>        Even when these functions return a success status, the child process
>        may still fail for a plethora of reasons related to its pre-exec()
>        initialization. In addition, the exec(3) may fail. In all of these
>        cases, the child process will exit with the exit value of 127.
> </quote>
>
> To me this looks as if what should have happened is: posix_spawn() should
> have returned with success, since the fork() went thru. Then, the child
> process (still inside posix_spawn()) attempts exec and gets a NOACCESS.
> Then, child process should have ended with exit code 127.
> Your fail pipe would never read an error code since we never entered the
> main function of jspawnhelper. For the java caller it should have looked
> like a very short lived process with exit code 127.
>
> Obviously this is not what happened, since Remi reported an IOException
> with an errno. So, where do I understand this wrong?
>

Hmm this looks wrong. Just tested (Ubuntu 16.04): removing execute permission
from jspawnhelper does not result in an IOException. Instead, Runtime.exec()
seemingly succeeds. strace shows the exec() for jspawnhelper to fail as
expected:

[pid 13796] execve("/shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/lib/jspawnhelper", ["11:14"], [/* 79 vars */]) = -1 EACCES (Permission denied)
[pid 13796] exit_group(127) = ?
[pid 13780] <... vfork resumed> ) = 13796
[pid 13796] +++ exited with 127 +++
[pid 13780] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13796, si_uid=1027, si_status=127, si_utime=0, si_stime=0} ---

But we completely fail to notice. This is bad. We should fix it.

One more thing, not sure if this is libc specific? The OpenGroup manpage for
posix_spawn() states:

<quote>
If *posix_spawn*() or *posix_spawnp*() fail for any of the reasons that
would cause *fork*()
<http://pubs.opengroup.org/onlinepubs/007904875/functions/fork.html> or one
of the *exec
<http://pubs.opengroup.org/onlinepubs/007904875/functions/exec.html>* family
of functions to fail, an error value shall be returned as described by
*fork*()
<http://pubs.opengroup.org/onlinepubs/007904875/functions/fork.html> and
*exec
<http://pubs.opengroup.org/onlinepubs/007904875/functions/exec.html>*,
respectively (or, if the error occurs after the calling process successfully
returns, the child process shall exit with exit status 127).
</quote>

which I interpret as: the standard leaves it open whether exec() errors are
communicated back to the caller of posix_spawn().
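
For anyone who wants to check which of the two behaviours their libc shows, here is a minimal sketch in Python (not the JDK code). It spawns a file that has no execute bit and reports whether the failure comes back from posix_spawn() itself or only as the child's exit status. The file name "helper" is a hypothetical stand-in for jspawnhelper, and the function name is made up for illustration; os.posix_spawn needs Python 3.8 or later.

```python
import errno
import os
import stat
import tempfile


def spawn_non_executable():
    """Spawn a file without execute permission and report how the failure surfaces.

    Returns ("spawn_error", errno) if posix_spawn() itself reports the failure,
    or ("child_exit", code) if the call succeeds and the child exits afterwards.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for jspawnhelper.
        path = os.path.join(tmp, "helper")
        with open(path, "w") as f:
            f.write("#!/bin/sh\nexit 0\n")
        # rw------- : readable and writable, but no execute bit.
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)

        try:
            pid = os.posix_spawn(path, [path], dict(os.environ))
        except OSError as e:
            # The libc passed the exec() failure back to the caller.
            return ("spawn_error", e.errno)

        # posix_spawn() returned success; the failure is only visible
        # through the exit status of the short-lived child.
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status):
            return ("child_exit", os.WEXITSTATUS(status))
        return ("child_signaled", os.WTERMSIG(status))


if __name__ == "__main__":
    print(spawn_non_executable())
```

Depending on the libc, I would expect either a spawn error carrying EACCES or a child that exits with 127, matching the two cases the standard allows. A caller that only checks the return value of posix_spawn() will miss the second case, which is what the strace output above shows.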

..Thomas

>> I've had this little script around for ages:
>>
>> #!/bin/bash
>> # -v: Print unabbreviated versions of environment, etc
>>
>> exec /usr/bin/strace -f -v -s 256 -e signal=none -e trace=process "$@"
>>
>>
> We had all this as part of spawn traces. But this is a nice and neat idea.
> Does it print current directory?
>
> Cheers, Thomas