On Mon, May 13, 2019 at 4:11 PM Thomas Stüfe <thomas.stu...@gmail.com> wrote:
> On Mon, May 13, 2019 at 3:42 PM Thomas Stüfe <thomas.stu...@gmail.com>
> wrote:
>
>> Hi Martin,
>>
>> On Mon, May 13, 2019 at 2:08 PM Martin Buchholz <marti...@google.com>
>> wrote:
>>
>>>> I am happy this is resolved and the intermittent behavior explained. Yes,
>>>> we could improve exception messages, especially since analyzing fork
>>>> scenarios is cumbersome.
>>>>
>>>
>>> I tried hard back in 2005 to provide pretty good Java-level diagnostics
>>> when subprocess starting failed somehow (see WhyCantJohnnyExec). At least
>>> the errno did get reported.
>>>
>>
>> I know your code. For many years I wondered who Johnny is :)
>>
>> We have a very similar solution in our port: we have our own error codes
>> (plus errno mixed in where it makes sense) for the many things that can go
>> wrong in the forkhelper. Maybe we can improve upon your solution a bit,
>> and/or add tracing for the environment etc.
>>
>> But here is one thing that I still do not understand about Remi's problem:
>>
>> The theory is that the first exec(), starting jspawnhelper, went wrong
>> with EACCES, yes?
>>
>> The man page for posix_spawn() states:
>>
>> <quote>
>> Upon successful completion, posix_spawn() and posix_spawnp() place
>> the PID of the child process in pid, and return 0. If there is an
>> error before or during the fork(2), then no child is created, the
>> contents of *pid are unspecified, and these functions return an
>> error number as described below.
>>
>> Even when these functions return a success status, the child process
>> may still fail for a plethora of reasons related to its pre-exec()
>> initialization. In addition, the exec(3) may fail. In all of these
>> cases, the child process will exit with the exit value of 127.
>> </quote>
>>
>> To me this looks as if what should have happened is: posix_spawn() should
>> have returned with success, since the fork() went through.
>> Then the child process (still inside posix_spawn()) attempts the exec and
>> gets EACCES, and the child process should then have ended with exit code
>> 127. Your fail pipe would never read an error code, since we never entered
>> the main function of jspawnhelper. For the Java caller it should have
>> looked like a very short-lived process with exit code 127.
>>
>> Obviously this is not what happened, since Remi reported an IOException
>> with an errno. So, where am I getting this wrong?
>>
>
> Hmm, this looks wrong. Just tested (Ubuntu 16.04): removing execute
> permission from jspawnhelper does not result in an IOException. Instead,
> Runtime.exec() seemingly succeeds. strace shows the exec() of jspawnhelper
> failing as expected:
>
> [pid 13796] execve("/shared/projects/openjdk/jdk-jdk/output-fastdebug/images/jdk/lib/jspawnhelper", ["11:14"], [/* 79 vars */]) = -1 EACCES (Permission denied)
> [pid 13796] exit_group(127) = ?
> [pid 13780] <... vfork resumed> ) = 13796
> [pid 13796] +++ exited with 127 +++
> [pid 13780] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13796, si_uid=1027, si_status=127, si_utime=0, si_stime=0} ---
>
> But we completely fail to notice this.
>
> This is bad. We should fix it.
>
> One more thing; I am not sure whether this is libc-specific.
> The OpenGroup man page for posix_spawn() states:
>
> <quote>
> If posix_spawn() or posix_spawnp() fail for any of the reasons that would
> cause fork() <http://pubs.opengroup.org/onlinepubs/007904875/functions/fork.html>
> or one of the exec
> <http://pubs.opengroup.org/onlinepubs/007904875/functions/exec.html> family
> of functions to fail, an error value shall be returned as described by
> fork() and exec, respectively (or, if the error occurs after the calling
> process successfully returns, the child process shall exit with exit
> status 127).
> </quote>
>
> I interpret this as: the standard leaves it open whether exec() errors are
> communicated to the caller of posix_spawn().
>
> ..Thomas
>
> I opened https://bugs.openjdk.java.net/browse/JDK-8223777 to track this.
>
>>> I've had this little script around for ages:
>>>
>>> #!/bin/bash
>>> # -v: Print unabbreviated versions of environment, etc
>>>
>>> exec /usr/bin/strace -f -v -s 256 -e signal=none -e trace=process "$@"
>>>
>>
>> We had all this as part of spawn traces. But this is a nice and neat
>> idea. Does it print the current directory?
>>
>> Cheers, Thomas
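The question discussed above, whether posix_spawn() reports an exec() failure to its caller or only through a child exit status of 127, can be probed with a small C program. This is a minimal sketch, not JDK code: spawn_probe(), make_non_executable() and enum spawn_outcome are hypothetical names made up for this illustration, and it assumes a POSIX system with a writable /tmp. It spawns an empty file that has no execute permission and classifies which of the two behaviours permitted by the man pages actually occurred.

```c
/* Sketch only: spawn_probe(), make_non_executable() and enum spawn_outcome
   are hypothetical names for this illustration, not part of any library.
   Assumes a POSIX system with a writable /tmp. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

enum spawn_outcome {
    SPAWN_REPORTED_ERROR, /* posix_spawn() itself returned an error number */
    SPAWN_CHILD_EXIT_127, /* posix_spawn() returned 0; the child exited with 127 */
    SPAWN_OTHER           /* anything else (exec succeeded, waitpid failed, ...) */
};

/* Spawns `path` with no extra arguments and classifies the outcome.
   On SPAWN_REPORTED_ERROR, *err receives posix_spawn()'s return value. */
enum spawn_outcome spawn_probe(const char *path, int *err)
{
    pid_t pid;
    char *const argv[] = { (char *)path, NULL };

    int rc = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (rc != 0) {
        *err = rc; /* the exec (or fork) error was reported to the caller */
        return SPAWN_REPORTED_ERROR;
    }

    /* posix_spawn() claimed success: reap the child and inspect its status. */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return SPAWN_OTHER;
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) == 127)
        return SPAWN_CHILD_EXIT_127;
    return SPAWN_OTHER;
}

/* Creates an empty temporary file with mode 0600 (no execute bit), so an
   execve() on it fails with EACCES. Writes its path into buf; the caller
   should unlink() it. Returns 0 on success, -1 on failure. */
int make_non_executable(char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s", "/tmp/spawn-probe-XXXXXX");
    if (n < 0 || (size_t)n >= len)
        return -1;
    int fd = mkstemp(buf); /* mkstemp() creates the file with mode 0600 */
    if (fd == -1)
        return -1;
    close(fd);
    return 0;
}
```

If the libc reports the exec error itself, spawn_probe() returns SPAWN_REPORTED_ERROR (with err typically EACCES); if it only signals the failure through the child, as in the strace output above where the child ran exit_group(127), it returns SPAWN_CHILD_EXIT_127. In the second case a caller cannot reliably tell a failed exec apart from a program that legitimately exits with 127, which is why a separate error channel such as the fail pipe mentioned in the thread is useful.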