On Thu, 26 Feb 2026 15:16:33 GMT, Thomas Stuefe <[email protected]> wrote:
> When starting child processes from Java, we bootstrap the child process after
> fork and before exec. As part of that process, up to five pipes are handed to
> the child: three for stdin/out/err, respectively, and two internal
> communication pipes (fail and childenv).
>
> If, concurrently with our invocation of `ProcessBuilder.start()`, third-party
> native code forks a child of its own, the natively forked child carries
> copies of these pipes and may keep them open. This results in
> various forms of communication errors, most commonly hangs, either in
> `ProcessBuilder.start()` or in customer code.
>
> In the customer case that started this investigation,
> `ProcessBuilder.start()` hung intermittently when using a third-party Eclipse
> library that happened to perform forks natively.
>
> The JVM does not have full control over what happens in its process, since
> we allow native code to run. Native forks can therefore happen, and we have
> to work around them.
>
> The fix makes sure that the pipes we use in ProcessBuilder are always tagged
> with CLOEXEC. Since forks are typically followed by execs, any of these
> descriptors that a native child inherits by accident are closed when it execs.
>
> ### FORK/VFORK mode
>
> Here, it is sufficient to open all our pipes with O_CLOEXEC.
>
> The caveat here is that only Linux offers an API to do that cleanly:
> `pipe2(2)` ([1]). On macOS and AIX, we don't have `pipe2(2)`, so we need to
> emulate that behavior using `pipe(2)` and `fcntl(2)` in quick succession.
> That is still racy, since it does not completely close the window during
> which the pipe file descriptors are not yet tagged with CLOEXEC. But this is
> the best we can do.
>
> ### POSIX_SPAWN mode
>
> Creating the pipes with CLOEXEC alone is not sufficient. With
> `posix_spawn(3)`, we exec twice: first to load jspawnhelper (inside
> `posix_spawn(3)`), and a second time to load the target binary. Pipes created
> with O_CLOEXEC would not survive the first exec.
>
> Therefore, instead of manually `dup2(2)`'ing our file descriptors in
> jspawnhelper itself after the first exec, we set up dup2 file actions and let
> posix_spawn do the dup'ing. According to POSIX, these dup2 file actions are
> processed before the inherited CLOEXEC file descriptors are closed by the
> exec, and the duplicates they create have FD_CLOEXEC cleared, so they survive
> the exec of jspawnhelper.
>
> Unfortunately, macOS is again not POSIX-compliant here: the macOS kernel can
> close CLOEXEC file descriptors before posix_spawn has processed its dup2
> file actions. As a workaround for that bug, we create temporary copies of the
> pipe file descriptors that are not tagged with CLOEXEC and use ...
Except for my question on `ERR_FD_SETUP`, this looks good.
src/java.base/unix/native/jspawnhelper/jspawnhelper.c line 55:
> 53: * file-descriptor errors. We may have no other way of
> 54: * communicating those errors to the parent. */
> 55: #define ERR_FD_SETUP 245
Why do we need this? It does not seem to be checked anywhere. Also, because
only the least significant byte of the `exit()` status is passed to the parent,
we might return `0` (i.e. "*success*") if the offending file descriptor
happens to be 11.
src/java.base/unix/native/jspawnhelper/jspawnhelper.c line 150:
> 148: if (!fdIsValid(fd)) {
> 149: printf("Invalid fd: %d (%s)\n", fd, strerror(errno));
> 150: exit(ERR_FD_SETUP + fd);
Why do we need this special error handling here instead of simply calling
`shutItDown()` like we do for other errors?
src/java.base/unix/native/jspawnhelper/jspawnhelper.c line 157:
> 155: if (!fdIsPipe(fd)) {
> 156: printf("Not a pipe? %d\n", fd);
> 157: exit(ERR_FD_SETUP + fd);
Why do we need this special error handling here instead of simply calling
`shutItDown()` like we do for other errors?
-------------
PR Review: https://git.openjdk.org/jdk/pull/29939#pullrequestreview-3875555823
PR Review Comment: https://git.openjdk.org/jdk/pull/29939#discussion_r2871873965
PR Review Comment: https://git.openjdk.org/jdk/pull/29939#discussion_r2871878233
PR Review Comment: https://git.openjdk.org/jdk/pull/29939#discussion_r2871878903