On Wed, Apr 27, 2022 at 5:29 AM Andreas Gruenbacher <agrue...@redhat.com> wrote:
>
> Regular (buffered) reads and writes are expected to be atomic with
> respect to each other.

Linux has actually never honored that completely broken POSIX
requirement, although I think some filesystems (notably XFS) have
tried.

It's a completely broken concept. It's not possible to honor atomicity
with mmap(), and nobody has *ever* cared.

And it causes huge amounts of problems and basically makes any sane
locking entirely impossible.

The fact that you literally broke regular file writes in ways that are
incompatible with (much MUCH more important) POSIX file behavior to
try to get that broken read/write atomicity is only one example among
many for why that alleged rule just has to be ignored.

We do honor the PIPE_BUF atomicity on pipes, which is a completely
different kind of atomicity wrt read/write, and doesn't have the
fundamental issues that arbitrary regular file reads/writes have.

There is absolutely no sane way to do that file atomicity wrt
arbitrary read/write calls (*), and you shouldn't even try.

That rule needs to be forgotten about, and buried 6ft deep.

So please scrub any mention of that idiotic rule from documentation,
and from your brain.

And please don't break "partial write means disk full or IO error" due
to trying to follow this broken rule, which was apparently what you
did.

Because that "regular file read/write is done in full" is a *MUCH*
more important rule, and there is a shitton of applications that most
definitely depend on *that* rule.

Just go to debian code search, and look for

    "if (write("

and you'll get thousands of hits, and on the first page of hits 9 out
of 10 of the hits are literally about that "partial write is an
error", eg code like this:

    if (write(fd,&triple,sizeof(triple)) != sizeof(triple))
        reporterr(1,NULL);

from libreoffice.

                Linus

(*) Yeah, if you never care about performance(**) of mixed read/write,
and you don't care about mmap, and you have no other locking issues,
it's certainly possible.
The old rule came about from original UNIX literally taking an inode
lock around the whole IO access, because that was simple, and back in
the days you'd never have multiple concurrent readers/writers anyway.

(**) It's also instructive how O_DIRECT literally throws that rule
away, and then some direct-IO people said for years that direct-IO is
superior and used this as one of their arguments. Probably the same
people who thought that "oh, don't report partial success", because we
can't deal with it.
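
For readers who want to see the "partial write is an error" convention
spelled out, here is a minimal C sketch. It is an illustration only and
is not taken from the thread or from libreoffice: write_record(),
demo_roundtrip() and struct triple are hypothetical names, the /tmp path
is an assumption about the test environment, and reporting a short write
as ENOSPC is an assumption that simply mirrors the "disk full or IO
error" reading above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical record type, standing in for the "triple" in the quoted code. */
struct triple { int a, b, c; };

/*
 * Hypothetical helper: write one whole record to a regular file.
 * Returns 0 only if write() accepted every byte; anything else is
 * reported as a failure, which is the pattern the post describes.
 */
static int write_record(int fd, const void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    n = write(fd, buf, len);
    if (n < 0)
        return -1;          /* real error, errno already set by write() */
    if ((size_t)n != len) {
        errno = ENOSPC;     /* assumption: treat a short write as "no space" */
        return -1;
    }
    return 0;
}

/* Write a record to a temporary file and read it back; 0 on success. */
static int demo_roundtrip(void)
{
    char path[] = "/tmp/write_record_XXXXXX";
    struct triple out = { 1, 2, 3 }, in;
    int ok = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (write_record(fd, &out, sizeof(out)) == 0 &&
        pread(fd, &in, sizeof(in), 0) == (ssize_t)sizeof(in) &&
        memcmp(&in, &out, sizeof(in)) == 0)
        ok = 0;
    close(fd);
    unlink(path);
    return ok;
}
```

A caller would use write_record(fd, &rec, sizeof(rec)) and bail out on a
non-zero return, exactly as the one-line check in the quoted snippet
does. Note that this sketch does not retry after a short write; code
that writes to pipes or sockets, where short writes are routine, would
need a loop instead.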