On Wed, Apr 27, 2022 at 2:26 PM Andreas Gruenbacher <agrue...@redhat.com> wrote:
>
> Well, POSIX explicitly mentions those atomicity expectations, e.g.,
> for read [1]:
Yes. I'm aware. And my point is that we've never done _that_ kind of atomicity.

It's also somewhat ambiguous what it actually means, since what it then talks about is "all bytes that started out together ends together" and "interleaving". That all implies that it's about the *position* of the reads and writes being atomic, not the *data* of the reads and writes.

That, btw, was something we honored even before we had the locking around f_pos accesses - a read or write system call would get its own local *copy* of the file position, the read or write would then do the IO based on that copied position - so that things that "started out together ends together" - and then after the operation is done it would *update* the file position atomically.

Note that that is exactly so that data would end up "together". But it would mean that two concurrent reads using the same file position might read the *same* area of the file. Which still honors that "the read is atomic wrt the range", but obviously the actual value of "f_pos" is basically random after the read (ie is it the end of the first read, or the end of the second read?).

The same paragraph also explicitly mentions pipes and FIFOs, despite an earlier paragraph dismissing them, which is all just a sign of things being very confused.

Anyway, I'm not objecting very strenuously to making it very clear in some documentation that this "data atomicity" is not what Linux has ever done. If you do overlapping IO, you get what you deserve.

But I do have objections.

On one hand, it's not all that different from some of the other notes we have in the man-pages (ie documenting that whole "just under 2GB" limit on the read size, although that's actually using the wrong constant: it's not 0x7ffff000 bytes, it's MAX_RW_COUNT, which is "INT_MAX & PAGE_MASK", and that constant in the man-page is as such only true on a system with 4kB page sizes).

BUT! I'm 100% convinced that NOBODY HAS EVER given the kind of atomicity guarantees that you would see from reading that document as a language-lawyer.

For example, that section "2.9.7 Thread Interactions with Regular File Operations" says that "fstat()" is atomic wrt "write()", and that you should see "all or nothing".

I *GUARANTEE* that no operating system has ever done that, and I further claim that reading it the way you read it is not only against reality, it's against sanity.

Example: if I do a big write to a file that I just created, do you really want "fstat()" in another thread or process to not even be able to see how the file grows as the write happens?

It's not what anybody has *EVER* done, I'm pretty sure.

So I really think

 (a) you are mis-reading the standard by attributing too strong logic to paperwork that is English prose and not so exact

 (b) documenting Linux as not doing what you are mis-reading it for is only encouraging others to mis-read it too

The whole "arbitrary writes have to be all-or-nothing wrt all other system calls" is simply not realistic, and never has been. Not just not in Linux, but in *ANY* operating system that POSIX was meant to describe.

And equally importantly: if some crazy person were to actually try to implement such "true atomicity" things, the end result would be objectively worse. Because you literally *want* to see a big write() updating the file length as the write happens.

The fact that the standard then doesn't take those kinds of details into account is simply because the standard isn't meant to be read as a language lawyer, but as a "realistically .."

               Linus
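PS: as a rough userspace sketch of that old "copy the position, do the IO, publish the position at the end" behaviour - the shared_pos variable and the function name here are made up purely for illustration, this is not the actual VFS code - the pattern was essentially:

    #include <unistd.h>
    #include <sys/types.h>

    static off_t shared_pos;    /* stands in for the shared file position */

    /* read() roughly the way it behaved before the f_pos locking:
     * snapshot the position, do the IO against that private copy,
     * and only publish the new position once the IO has completed. */
    ssize_t read_like_old_vfs(int fd, void *buf, size_t count)
    {
        off_t pos = __atomic_load_n(&shared_pos, __ATOMIC_RELAXED);
        ssize_t ret = pread(fd, buf, count, pos);
        if (ret > 0)
            __atomic_store_n(&shared_pos, pos + ret, __ATOMIC_RELAXED);
        return ret;
    }

Two threads racing through that can snapshot the same position and read the *same* bytes, and whichever store lands last decides the final position - which is exactly the "f_pos is basically random afterwards" situation described above.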
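PPS: to spell out why that man-page constant is page-size dependent: INT_MAX is 0x7fffffff, and MAX_RW_COUNT just masks off the low bits of one page, so

    INT_MAX & PAGE_MASK = 0x7fffffff & ~0xfff  = 0x7ffff000   (4kB pages)
    INT_MAX & PAGE_MASK = 0x7fffffff & ~0xffff = 0x7fff0000   (64kB pages)

ie the 0x7ffff000 number is only right for the 4kB-page case.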