Hi David,

On Tue, Sep 11, 2018 at 8:29 PM, David Lloyd <david.ll...@redhat.com> wrote:
> I think this is a cool idea.

Thanks. I did not come up with it, though; I think the technique was
already known.

> Do you have any performance numbers?

Sure: a small program that just spawns /bin/true 1000 times, measured on
my t450s running Ubuntu 16.04:

Number of open files:   <none>   1000    100000

openjdk8:               305ms    1.5s    115s
sapjvm8:                721ms    2.3s    142s
factor                  2.4      1.53    1.23

So it starts off with a factor of 2.4, but the penalty diminishes with the
number of open files. This comparison is imprecise, however, since we
compare different JVMs with completely different Runtime.exec()
implementations. We do more checks in our JVM, which may mean more
syscalls per fork().

..Thomas

> On Tue, Sep 11, 2018 at 12:52 PM Thomas Stüfe <thomas.stu...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I wanted to gauge opinions on the following issue:
>>
>> Runtime.exec, on Linux, uses vfork(2) by default. It gives us better
>> performance compared with fork() and robustness in constrained memory
>> situations.
>>
>> But as we know vfork() can be dangerous if used incorrectly. In the
>> child process before exec'ing, we live in the memory of the parent
>> process. If we are not very careful we can influence or crash the
>> parent process.
>>
>> According to POSIX pretty much the only thing the child process is
>> allowed to do after vfork(2) is to exec(3) immediately; if that fails,
>> you must call _exit(2).
>>
>> http://pubs.opengroup.org/onlinepubs/009604599/functions/vfork.html
>>
>> However, in the openjdk we do a number of things beyond that:
>>
>> - stdin,out,err pipe handling business
>> - closing all file descriptors
>> - we change the working directory
>> - we may actually modify errno manually
>> - in case exec fails, we communicate the error back to the parent
>>   using a pipe.
>>
>> This involves calling a number of libc functions beyond exec(), namely
>> read, close, dup2, opendir/readdir, write, chdir... It also needs a
>> bit of stack, since we assemble path names.
>>
>> --
>>
>> I was curious whether there were any real issues, so I tested (on
>> Ubuntu 16.04) and found:
>>
>> 1) A crash - any crash - in the child process before exec() will kill
>> the parent jvm dead. Weirdly enough, we do not even enter our error
>> handling, but seem to die instantly with the default "Segmentation
>> Fault".
>>
>> 2) Signals received by the child process before exec() influence the
>> parent process. For example:
>> - SIGINT sent to the child ends both parent and child, immediately
>> - SIGABRT aborts both child and parent
>> - any error signal sent to the child leads to the behavior described
>>   at (1)
>>
>> 3) A stack overflow in the child before exec() also kills the parent.
>> Unsurprising, since guard page hit -> segfault -> see (1).
>>
>> 4) More amusing, setting errno in the child before exec() changes the
>> errno in the parent process.
>> But since errno is thread local and the thread in the parent process
>> is waiting in vfork() and will, upon return, not look at errno (after
>> all, vfork succeeded) this causes no trouble.
>>
>> There may be more issues, but these were the ones I tested.
>>
>> In all cases I counter-tested with fork() instead of vfork() and as
>> expected with fork() the parent process stays unaffected as it should
>> be.
>>
>> -------------
>>
>> Whether you think these issues are worth solving is an open question.
>>
>> All these cases may happen in the wild (well, apart from
>> crash-by-programming-error if one assumes the program to be really bug
>> free) albeit with a very small probability. But once these bugs occur,
>> they can be very difficult to analyse. So fixing this may be
>> worthwhile.
>>
>> At SAP, we opted for robustness, so we changed the Runtime.exec()
>> implementation to deal with vfork() issues. Basically, we employ the
>> exec-twice technique:
>>
>> - in the child, after the vfork(), we immediately exec() into a little
>> bootstrap binary ("forkhelper").
>> - Now we are safe in the sense that we do not share memory with the
>> parent process anymore
>> - Then, the parent process communicates with the child via pipes and
>> gives it all information needed to do the "real" exec: environ,
>> current dir, arguments...
>> - Now the child exec's a second time, this time into the real target
>> binary.
>>
>> The point of this technique is that we minimize the window in the
>> child between vfork and the first exec. In fact, we are now fully
>> POSIX compliant. This solves the described pathological cases.
>>
>> It has some other advantages too, e.g. allowing for better error
>> handling and tracing in the Runtime.exec() area. Performance-wise it
>> depends: we exec twice, so we pay twice. However, since the parent
>> continues execution after the first exec, it spends less time waiting
>> on the child process, which can make a difference if there are many
>> file descriptors open.
>>
>> ---
>>
>> Checking opinions here. Do you think we are okay with our current
>> implementation or would a change as described above be welcome in the
>> OpenJDK too?
>>
>> Thanks, and Best Regards, Thomas
>
>
>
> --
> - DML
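
The exec-twice steps quoted above can be illustrated with a small sketch. This is not the SAP implementation or any JDK code: it is written in Python for brevity, the function name spawn_via_helper and the JSON request format are made up for illustration, and a `python -c` snippet stands in for the "forkhelper" bootstrap binary. It only shows the protocol (first exec into a helper, hand over cwd/env/arguments through a pipe, second exec into the real target). It does not show the vfork() memory-sharing issue itself.

```python
import json
import os
import subprocess
import sys

# Hypothetical stand-in for the "forkhelper" bootstrap binary: it reads one
# JSON request from an inherited pipe, then performs the "real" exec.
HELPER_SOURCE = r"""
import json, os, sys
fd = int(sys.argv[1])
with os.fdopen(fd, "rb") as pipe:      # closed again before the second exec
    request = json.loads(pipe.read())  # read() returns once the parent closes its end
try:
    os.chdir(request["cwd"])
    os.execve(request["argv"][0], request["argv"], request["env"])
except OSError as err:
    sys.stderr.write("exec failed: %s\n" % err)
    os._exit(127)
"""


def spawn_via_helper(argv, cwd, env):
    """Run argv via the helper; return (exit status, captured stdout)."""
    read_fd, write_fd = os.pipe()
    with os.fdopen(write_fd, "wb") as pipe:
        try:
            # First exec: the child becomes the helper right away and only
            # inherits the read end of the request pipe.
            child = subprocess.Popen(
                [sys.executable, "-c", HELPER_SOURCE, str(read_fd)],
                pass_fds=(read_fd,),
                stdout=subprocess.PIPE,
            )
        finally:
            os.close(read_fd)
        # The parent is no longer blocked on the child; it now hands over
        # everything needed for the second ("real") exec.
        request = {"argv": argv, "cwd": cwd, "env": env}
        pipe.write(json.dumps(request).encode())
    out, _ = child.communicate()
    return child.returncode, out
```

For example, spawn_via_helper(["/bin/sh", "-c", "echo hello"], "/", {"PATH": "/usr/bin:/bin"}) runs the shell as the second exec and returns its exit status and output. If the second exec fails, this sketch reports it through exit status 127; a real helper would more likely report the errno back over a pipe, as the quoted mail describes for the current OpenJDK code.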