On Thu, 26 Feb 2026 15:16:33 GMT, Thomas Stuefe <[email protected]> wrote:
> When starting child processes from Java, we bootstrap the child process after
> fork and before exec. As part of that process, up to five pipes are handed to
> the child: three for stdin/out/err, respectively, and two internal
> communication pipes (fail and childenv).
>
> If, concurrently with our invocation of `ProcessBuilder.start()`, third-party
> native code forks a child of its own, the natively forked child carries
> copies of these pipes. It then may keep these pipes open. This results in
> various forms of communication errors, most likely hangs - either in
> `ProcessBuilder.start()`, or in customer code.
>
> In the customer case that started this investigation,
> `ProcessBuilder.start()` hung intermittently when using a third-party Eclipse
> library that happened to perform forks natively.
>
> The JVM has no full control over what happens in its process, since we allow
> native code to run. Therefore, native forks can happen, and we have to work
> around them.
>
> The fix makes sure that the pipes we use in ProcessBuilder are always tagged
> with CLOEXEC. Since forks are typically followed by execs, this will close
> any file descriptors that were accidentally inherited.
>
> ### FORK/VFORK mode
>
> Here, it is sufficient to open all our pipes with O_CLOEXEC.
>
> The caveat here is that only Linux offers an API to do that cleanly:
> `pipe2(2)` ([1]). On MacOS and AIX, we don't have `pipe2(2)`, so we need to
> emulate that behavior using `pipe(2)` and `fcntl(2)` in quick succession.
> That is still racy, since we did not completely close the time window within
> which pipe file descriptors are not O_CLOEXEC. But this is the best we can do.
>
> ### POSIX_SPAWN mode
>
> Creating the pipes with CLOEXEC alone is not sufficient. With
> `posix_spawn(3)`, we exec twice: first to load the jspawnhelper (inside
> `posix_spawn(3)`), a second time to load the target binary. Pipes created
> with O_CLOEXEC would not survive the first exec.
>
> Therefore, instead of manually `dup2(2)`'ing our file descriptors after the
> first exec in jspawnhelper itself, we set up dup2 file actions to let
> posix_spawn do the dup'ing. According to POSIX, these dup2 file actions will
> be processed before the kernel closes the inherited CLOEXEC file descriptors.
>
> Unfortunately, macOS is again not POSIX-compliant, since the macOS kernel can
> close CLOEXEC file descriptors before posix_spawn processes them in its dup2
> file actions. As a workaround for that bug, we create temporary copies of the
> pipe file descriptors that are untagged with CLOEXEC and use ...

Hi Thomas,

Thanks a lot for finding this issue, describing it in full detail, and creating regression tests for it.

At first glance the changes look OK, but I'll have to take a closer look next week.

I am just a little concerned about the ever-increasing code complexity in this area. Have you thought about using Unix domain sockets with `socketpair()` instead of pipes for the parent/child communication? That might be simpler and more portable, although I haven't really tried it out yet.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29939#issuecomment-3974413937
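
For illustration only, here is a minimal sketch of what creating a close-on-exec `socketpair()` could look like. It is not taken from the PR or the JDK sources, and the helper names `set_cloexec` and `socketpair_cloexec` are hypothetical. It assumes that `SOCK_CLOEXEC` may be missing on some platforms, in which case the `fcntl(2)` fallback leaves the same short race window as the `pipe(2)` plus `fcntl(2)` emulation described in the quoted text.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: tag one descriptor with FD_CLOEXEC. */
int set_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical helper: create a connected AF_UNIX stream pair whose
 * two ends are both close-on-exec. Returns 0 on success, -1 on error. */
int socketpair_cloexec(int sv[2]) {
    int type = SOCK_STREAM;
#ifdef SOCK_CLOEXEC
    /* Where available, the flag is applied atomically at creation. */
    type |= SOCK_CLOEXEC;
#endif
    if (socketpair(AF_UNIX, type, 0, sv) == -1) {
        return -1;
    }
    /* Fallback for platforms without SOCK_CLOEXEC. This is racy: a
     * concurrent native fork can still see untagged descriptors.
     * Where SOCK_CLOEXEC was applied, this merely sets the flag again. */
    if (set_cloexec(sv[0]) == -1 || set_cloexec(sv[1]) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    return 0;
}
```

Note that this only covers creating the descriptors. Whether a socket pair would actually simplify the POSIX_SPAWN path, where the descriptors must survive the first exec, is left open here.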
