On Mon, May 2, 2022 at 8:58 PM Linus Torvalds <torva...@linux-foundation.org> wrote: > On Mon, May 2, 2022 at 11:31 AM Linus Torvalds > <torva...@linux-foundation.org> wrote: > > > > NOTE! This patch is entirely untested. I also didn't actually yet go > > look at what gfs2 does when 'bytes' and 'copied' are different. > > Oh, it's a lot of generic iomap_write_end() code, so I guess it's just > as well that I brought in the iomap people. > > And the iomap code has many different cases. Some of them do > > if (unlikely(copied < len && !folio_test_uptodate(folio))) > return 0; > > to force the whole IO to be re-done (looks sane - that's the "the > underlying folio wasn't uptodate, because we expected the write to > make it so"). > > And that might not have happened before, but it looks like gfs2 does > actually try to deal with that case. > > But since Andreas said originally that the IO wasn't aligned, I don't > think that "not uptodate" case is what is going on, and it's more > about some "partial write in the middle of a buffer succeeded" > > And the code also has things like > > if (ret < len) > iomap_write_failed(iter->inode, pos, len); > > which looks very wrong - it's not that the write failed, it's just > incomplete because it was done with page faults disabled. It seems to > try to do some page cache truncation based on the original 'len', but > not taking the successful part into account. Which all sounds > horrifically wrong. > > But I don't know the code well enough to really judge. It just makes > me uncomfortable, and I do suspect this code may be quite buggy if the > copy of the full 'len' doesn't succeed.
This has thrown me off in the past as well; it should be changed to iomap_write_failed(iter->inode, pos + ret, len - ret) for legibility. However, iomap_write_failed() only truncates past EOF and is preceded by i_size_write(iter->inode, pos + ret) here, so it's not strictly a bug. > Again, the patch I sent only _hides_ any issues and makes them > practically impossible to see. It doesn't really _fix_ anything, since > - as mentioned - regardless of fault_in_iov_iter_readable() > succeeding, racing with page-out could then cause the later > copy_page_from_iter_atomic() to have a partial copy anyway. Indeed. Let's see what we'll get with it. In the meantime, we've reproduced with 5.18-rc4 + commit 296abc0d91d8 ("gfs2: No short reads or writes upon glock contention"), and it still has the data corruption. > And hey, maybe there's something entirely different going on, and my > "Heureka! It might be explained by that partial write_end that > generally didn't happen before" is only my shouting at windmills. > > Linus >