John Hammond wrote:
> On 07/08/2010 08:53 AM, Kevin Van Maren wrote:
>> Hi David,
>>
>> I've also seen short writes on local file systems -- can't even count
>> the number of times I've modified codes to use wrappers that handle
>> short reads/writes. Not at all surprised you see them when suspending
>> the app.
>>
>> http://www.opengroup.org/onlinepubs/000095399/functions/write.html
>> "If write() is interrupted by a signal after it successfully writes some
>> data, it shall return the number of bytes written."
>> Similar language exists for read as well. I always thought libc should
>> handle the retry for you by default, but I didn't write the spec.
>>
>> Signals are relatively rare, and the window is a bit smaller for a local
>> file system, which may be why they haven't seen it/properly dealt with
>> it yet.
>
> It also says "The issue of which files or file types are interruptible
> is considered an implementation design issue. This is often affected
> primarily by hardware and reliability issues."
>
> For Linux, the signal(7) manpage indicates that read(2), readv(2),
> write(2), writev(2), and ioctl(2) calls on "slow" devices should
> return -EINTR when interrupted by a signal, and goes on to say that
> "slow" devices are ones "where the I/O call may block for an
> indefinite time, for example, a terminal, pipe, or socket. (A disk is
> not a slow device according to this definition.)"

How about a network file system waiting for server failover (especially
if it is not automatic)?

> Nowhere does it say something really helpfully clear like "Writing to
> a regular file shall suspend the calling process until such time
> as..." But, I interpret this to mean that operations on regular files
> are not interruptible, and should not return -EINTR. Moreover, I
> understand that this is the consensus among those unlucky enough to care.
>
> On the other hand, there are some explicitly specified situations
> which will result in short writes to a regular file, like file size
> limits.

With NFS, "hard,intr" is the most sane configuration. For Lustre,
operations (should) become interruptible after the initial timeout
period has passed.

Kevin
_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
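
For reference, the kind of wrapper that handles short writes, as mentioned
in the thread, might look roughly like the sketch below. It is not code
from any of the posters: write_all is a hypothetical name, and the sketch
assumes a blocking file descriptor. It retries when write() fails with
EINTR before transferring any data, and it continues from the right offset
when write() returns fewer bytes than requested.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_all (hypothetical helper): keep calling write() until all len
 * bytes are written or a real error occurs.
 * Returns len on success, -1 on error with errno set by write().
 */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved: retry */
            return -1;          /* real error; caller inspects errno */
        }
        if (n == 0)
            break;              /* no progress; report the partial count */
        p += n;                 /* short write: skip what was accepted */
        left -= (size_t)n;
    }
    return (ssize_t)(len - left);
}
```

A caller would use write_all(fd, buf, len) in place of a bare write() and
treat any return value other than len as a failure. A matching read-side
wrapper would follow the same loop, with a return of 0 from read() meaning
end of file.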
