On Fri, 13 Mar 2026 16:35:43 GMT, Thomas Stuefe <[email protected]> wrote:

>> While working on a patch for https://bugs.openjdk.org/browse/JDK-8377907, we 
>> saw that error analysis in this area can take an unreasonable amount of 
>> time, especially for cases that are not directly reproducible on a developer 
>> machine.
>> 
>> This is because if an error occurs in the child process between the fork and 
>> exec stages (FORK mode) or between the first and second exec stages 
>> (POSIX_SPAWN mode), the ways in which we can communicate the errors are 
>> quite limited: standard IO may not yet work, and there is no logging. All we 
>> have is a single 32-bit value that we send back to the parent.
>> 
>> Today, this 32-bit value contains errno or a handful of our own error codes 
>> (which even overlap with errno values).
>> 
>> Following an idea by @RogerRiggs, this patch enhances the returned code by 
>> encoding more information:
>> 
>> 1) An 8-bit step number: this shows us the exact step at which the child 
>> program encountered an error.
>> 2) An optional 8-bit errno: like today, but only set for errors that are 
>> directly associated with an API call
>> 3) An optional 16-bit "hint": Free-form information whose meaning depends on 
>> the step.
>> 
>> The system is clean and easily expandable with more detailed error 
>> information, should we need it. We can also bump the error code to 64-bit to 
>> get more encoding space, but I left it at 32-bit for now.
>> 
>> ------
>> 
>> Error handling:
>> 
>> When an error occurs, we now attempt to send the error code back to the 
>> parent as before, but if that fails (e.g., because the fail pipe was not 
>> established), we also print the error code and exit the child process with 
>> an exit code corresponding to the step number.
>> 
>> Where we print the error code, we use the form `(<step>-<hint>-<errno>)`. 
>> For example, "(2-0-0)" is a jspawnhelper version mismatch. "(3-17-9)" means 
>> that in step 3 (early jspawnhelper setup), we found that file descriptor 17 
>> was invalid (9=EBADF). We only do this for internal errors that are meant to 
>> be read by us, not by the end user.
>> 
>> As a by-product of this patch, we now get higher error granularity for some 
>> user-facing errors. E.g., if the caller handed in an invalid working 
>> directory, the IOE exception message now reads "Invalid working directory" 
>> instead of the generic "Exec failed".
>> 
>> ------
>> 
>> Implementation notes:
>> 
>> - The "CHILD_ALIVE" ping was changed to be a special errcode with a special 
>> step, `ESTEP_CHILD_ALIVE`; but it works the same.
>> - Where functions were changed to return error codes in case of an error,...
>
> Thomas Stuefe has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   feedback roger

Thank you, Roger!

Let's see if Skara lets me integrate without re-approval.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30232#issuecomment-4059677364

Reply via email to