On Wed, 24 May 2023 14:30:10 GMT, Volker Simonis <simo...@openjdk.org> wrote:
>>> > Sorry, but I don't understand this argument. If we do a short read we >>> > will work with corrupted ChildStuff and SpawnInfo >>> > structures. This can in the extreme case execute arbitrary code (e.g. if >>> > ChildStuff.argv is not fully read from the parent). You are >>> > basically saying it is better to work on corrupted data rather than >>> > reporting an error. >>> >>> No I am simply pointing out that this has changed more than just the issue >>> with close. And maybe a short-read does indicate data "corruption" and >>> maybe it should be a fatal error. But I don't know exactly how this might >>> manifest so perhaps there are benign short-reads that actually do happen. >>> Regardless it might be better to split this part out and focus on the close >>> issue here. >> >> Given the purpose and implementation of the `readFully` function, I don't >> see how it can return anything other than an error or the full requested >> read length. > > @RogerRiggs , @tstuefe sorry for bothering you one more time, but I think > @mlichtblau brought up an interesting case yesterday which isn't fully > resolved by the current fix. In @mlichtblau example, > `Java_java_lang_ProcessImpl_forkAndExec()` got stuck waiting on the ping from > `jspawnhelper` which itself blocks in `readFully()` because of a truncated > `write()` from the parent. > > The current fix already replaces `write()` with `restartableWrite()` which > handles the case where a `write()` call was interrupted **before** writing a > single byte. But there's a second case, namely when `write()` was interrupted > after it already wrote some (but not all) of the requested bytes. The > `write()` man page states: > >> Note that a successful `write()` may transfer fewer than `count` bytes. >> Such partial writes can occur for various reasons; for example, because .., >> or because a blocked `write()` to a socket, pipe, or similar was >> interrupted by a signal handler after it had transferred some, but before it >> had transferred all of the requested bytes. In the event of a partial >> write, the caller can make another `write()` call to transfer the remaining >> bytes. > > So in order to safely handle the second case as well, we have to replace > `restartableWrite()` with something like `writeFully()` and that's exactly > what this new commit does. I've also added a new test case which reproduces > the issue by simulating a truncated `write()`. > > For your convenience I've tried to explain all the additional changes (except > for trivial cleanups and renamings in the test) below: > > Thanks in advance, > Volker > > ##### `jspawnhelper.c`: > > > + // The file descriptor for reporting errors back to our parent we got on > the command > + // line should be the same like the one in the ChildStuff struct we've > just read. > + assert(c.fail[1] == fdout); > > > Just trying to be overly cautious. > > ##### childproc.{c,h} > > > -ssize_t restartableWrite(int fd, const void *buf, size_t count); > +ssize_t writeFully(int fd, const void *buf, size_t count); > > > > -ssize_t > -restartableWrite(int fd, const void *buf, size_t count) > ... > -} > > +ssize_t > +writeFully(int fd, const void *buf, size_t nbyte) > +#ifdef DEBUG > ... > +#endif > ... > +} > > > Replace `restartableWrite()` with `writeFully()` which is basically a copy of > `readFully()` (with additional testing code for jtreg). This handles both > cases, when `write()` is interrupted before having written a single byte and > returns EINTR, as well as the case where... @simonis I try to look at this before the weekend, but no guarantees. Your analysis looks sound, but the code is sensitive and I want to give it my full attention. One remark though, you should change JBS title and issue text to reflect the growing number of issues this patch is fixing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13956#issuecomment-1561750301