On Wed, Apr 27, 2022 at 7:13 PM Linus Torvalds <torva...@linux-foundation.org> wrote:
> On Wed, Apr 27, 2022 at 5:29 AM Andreas Gruenbacher <agrue...@redhat.com>
> wrote:
> >
> > Regular (buffered) reads and writes are expected to be atomic with
> > respect to each other.
>
> Linux has actually never honored that completely broken POSIX
> requirement, although I think some filesystems (notably XFS) have
> tried.

Okay, I can happily live with that. I wonder if this could be
documented in the read and write manual pages. Or would that be asking
too much?

> It's a completely broken concept. It's not possible to honor atomicity
> with mmap(), and nobody has *ever* cared.
>
> And it causes huge amounts of problems and basically makes any sane
> locking entirely impossible.
>
> The fact that you literally broke regular file writes in ways that are
> incompatible with (much MUCH more important) POSIX file behavior to
> try to get that broken read/write atomicity is only one example among
> many for why that alleged rule just has to be ignored.
>
> We do honor the PIPE_BUF atomicity on pipes, which is a completely
> different kind of atomicity wrt read/write, and doesn't have the
> fundamental issues that arbitrary regular file reads/writes have.
>
> There is absolutely no sane way to do that file atomicity wrt
> arbitrary read/write calls (*), and you shouldn't even try.
>
> That rule needs to be forgotten about, and buried 6ft deep.
>
> So please scrub any mention of that idiotic rule from documentation,
> and from your brain.
>
> And please don't break "partial write means disk full or IO error" due
> to trying to follow this broken rule, which was apparently what you
> did.
>
> Because that "regular file read/write is done in full" is a *MUCH*
> more important rule, and there is a shitton of applications that most
> definitely depend on *that* rule.
>
> Just go to debian code search, and look for
>
>     "if (write("
>
> and you'll get thousands of hits, and on the first page of hits 9 out
> of 10 of the hits are literally about that "partial write is an
> error", eg code like this:
>
>     if (write(fd,&triple,sizeof(triple)) != sizeof(triple))
>         reporterr(1,NULL);
>
> from libreoffice.
>
>                Linus
>
> (*) Yeah, if you never care about performance(**) of mixed read/write,
> and you don't care about mmap, and you have no other locking issues,
> it's certainly possible.
> The old rule came about from original UNIX
> literally taking an inode lock around the whole IO access, because
> that was simple, and back in the days you'd never have multiple
> concurrent readers/writers anyway.
>
> (**) It's also instructive how O_DIRECT literally throws that rule
> away, and then some direct-IO people said for years that direct-IO is
> superior and used this as one of their arguments. Probably the same
> people who thought that "oh, don't report partial success", because we
> can't deal with it.
>

Thanks a lot,
Andreas
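
For illustration, the "partial write is an error" convention discussed
in the quoted text can be sketched roughly as below. This is a minimal
sketch and not code from the thread: write_full() is a hypothetical
helper name, and returning -1 for a short write is an assumption about
how a caller might want to fold "disk full or IO error" into a single
failure path.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): issue a single write() and
 * treat anything other than a complete write as a failure, the way the
 * "if (write(...) != size)" idiom does for regular files.
 *
 * Returns 0 if all len bytes were written, -1 otherwise. On a short
 * write, errno is not meaningful; the caller only learns that the
 * write did not complete.
 */
int write_full(int fd, const void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL || len == 0);    /* precondition on the caller */

    n = write(fd, buf, len);
    if (n < 0)
        return -1;                      /* write() failed, errno is set */
    if ((size_t)n != len)
        return -1;                      /* short write: treated as an error */
    return 0;
}
```

A caller would then use it much like the libreoffice snippet quoted
above, for example "if (write_full(fd, &triple, sizeof(triple)) != 0)"
followed by its own error reporting.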