Hi Roger, thanks for the feedback. Please find my answers inline.
On Tue, Jun 4, 2019 at 5:09 PM Roger Riggs <roger.ri...@oracle.com> wrote: > Hi Thomas, > > A minor concern is the impact of the extra write and read that can cause > rescheduling > of the parent and child processes. But that's probably in the noise > compared to the > real work of exec. > I would think so too. exec() needs to read the binary from the file system, so there is some IO at least to come. > It would raise the complexity quite a bit/too much to code a single read > in the parent that could expect 0/4/8 bytes. > > I guess I could do that.. but I am not sure it is worthwhile. In our propietary port we have a more involved solution with some more read-write cycles between child and parent. I am not saying this is better - there are pros and cons - but I never saw any performance issues with our solution compared to OpenJDK. Based on that I believe one more read would be fine. > At ProcessImpl_md.c: 708: the "Read failed" is less than informative. > (Though it is the same as the pre-existing one at 720). > Yes. I have a follow up issue open (linked in the JBS issue) to improve reporting on errors. That may improve matters a bit. > But I suppose it has never happened. The 'Exec failed' is more specific > than 'read'. > And it has probably never been seen. > You know this is interesting. I thought exec errors should be not that uncommon or difficult to provoke. I did a test. I maimed an executable such that I cannot execute it without getting an exec format error (ENOEXEC). Then I tried to spawn it with Runtime.exec. But that appeared to be successful, but the child process died immediately. Happened regardless of launchMechanism. Then I ran an strace over it and saw this: 5332 [pid 3911] execve("./sleep2", ["./sleep2"], [/* 78 vars */] <unfinished ...> .. 5342 [pid 3911] <... execve resumed> ) = -1 ENOEXEC (Exec format error) 5343 [pid 3911] execve("/bin/sh", ["/bin/sh", "./sleep2"], [/* 78 vars */] <unfinished ...> 5347 [pid 3911] <... execve resumed> ) = 0 So, if the first exec fails for whatever reason, we try again, passing the executable file name as argument to the system shell. This does not feel right? Do you know why we do this? > > Thanks, Roger > > Thank you Roger. Can I consider the patch to be reviewed by you? ..Thomas > On 06/04/2019 02:06 AM, Thomas Stüfe wrote: > > Hi all, > > may I please have reviews/opinions on this fix? > > Fix has been live in our test landscape since some weeks. > > If we do not want this fix to be in JDK13, we may want to revert the > posix_spawn-by-default-on-Linux change. > > Thanks, Thomas > > > On Mon, May 20, 2019 at 4:15 PM Thomas Stüfe <thomas.stu...@gmail.com> > wrote: > >> Hi all, >> >> (old mail thread: >> https://mail.openjdk.java.net/pipermail/core-libs-dev/2019-May/060139.html >> ) >> >> May I please have your reviews and opinions for the following bug fix: >> >> issue: https://bugs.openjdk.java.net/browse/JDK-8223777 >> cr: >> http://cr.openjdk.java.net/~stuefe/webrevs/8223777-posix_spawn-no-exec-error-alternate-impl/webrev.00/webrev/ >> >> --- >> >> The fix implements Florians proposal: the jspawnhelper will signal its >> aliveness to the parent process the moment it gains control. If the parent >> process does not get the signal, it assumes exec'ing the jspawnhelper never >> worked. >> >> I only do this for posix_spawn mode, out of cautiousness. >> >> (Note that I kept the fix as minimal as possible. I found a minor bug and >> some improvement possibilities and opened follow up issues to track them: >> JDK-8224180 and JDK-8224181). >> >> Thanks, Thomas >> >> >> >