On Fri, 13 Mar 2026 05:00:44 GMT, Thomas Stuefe <[email protected]> wrote:
> While working on a patch for https://bugs.openjdk.org/browse/JDK-8377907, we > saw that error analysis in this area can take an unreasonable amount of time, > especially for cases that are not directly reproducible on a developer > machine. > > This is because if an error occurs in the child process between the fork and > exec stages (FORK mode) or between the first and second exec stages > (POSIX_SPAWN mode), the ways in which we can communicate the errors are quite > limited: standard IO may not yet work, and there is no logging. All we have > is a single 32-bit value that we send back to the parent. > > Today, this 32-bit value contains errno or a handful of our own error codes > (which even overlap with errno values). > > Following an idea by @RogerRiggs, this patch enhances the returned code by > encoding more information: > > 1) An 8-bit step number: this shows us the exact step at which the child > program encountered an error. > 2) An optional 8-bit errno: like today, but only set for errors that are > directly associated with an API call > 3) An optional 16-bit "hint": Free-form information whose meaning depends on > the step. > > The system is clean and easily expandable with more detailed error > information, should we need it. We can also bump the error code to 64-bit to > get more encoding space, but I left it at 32-bit for now. > > ------ > > Error handling: > > When an error occurs, we now attempt to send the error code back to the > parent as before, but if that fails (e.g., because the fail pipe was not > established), we also print the error code and exit the child process with an > exit code corresponding to the step number. > > Where we print the error code, we use the form `(<step>-<hint>-<errno>)`. For > example, "(2-0-0)" is a jspawnhelper version mismatch. "(3-17-9)" means that > in step 3 (early jspawnhelper setup), we found that file descriptor 17 was > invalid (9=EBADF). We only do this for internal errors that are meant to be > read by us, not by the end user. > > As a by-product of this patch, we now get higher error granularity for some > user-facing errors. E.g., if the caller handed in an invalid working > directory, the IOE exception message now reads "Invalid working directory" > instead of the generic "Exec failed". > > ------ > > Implementation notes: > > - The "CHILD_ALIVE" ping was changed to be a special errcode with a special > step, `ESTEP_CHILD_ALIVE`; but it works the same. > - Where functions were changed to return error codes in case of an error, I > always followed the same scheme. See comment in childproc... Looks good, A bit more detailed than I was thinking but is should be worth it when things take an unexpected turn. test/jdk/java/lang/ProcessBuilder/InvalidWorkDir.java line 30: > 28: * @library /test/lib > 29: * @run main/othervm -Xmx64m -Djdk.lang.Process.launchMechanism=FORK > InvalidWorkDir > 30: * @summary Check that passing an invalid work dir yields a corresponding > IOE text. The tag order is usually `@test`, `@bug`, `@summary`, `@library`, `@requires`, `@run`. test/jdk/java/lang/ProcessBuilder/InvalidWorkDir.java line 50: > 48: > 49: public static void main(String[] args) { > 50: ProcessBuilder bld = new ProcessBuilder("ls").directory(new > File("/doesnotexist")); Using the `/` root directory seems unusual, the current directory (for tests) is safe and the directory is cleared before running. ------------- PR Review: https://git.openjdk.org/jdk/pull/30232#pullrequestreview-3945131079 PR Review Comment: https://git.openjdk.org/jdk/pull/30232#discussion_r2932167279 PR Review Comment: https://git.openjdk.org/jdk/pull/30232#discussion_r2932182903
