On 17/09/17 17:38, Al Viro wrote:
On Sun, Sep 17, 2017 at 09:34:01AM -0700, Linus Torvalds wrote:
Now, I suspect most (all?) do, but that's a historical artifact rather
than "design". In particular, the VFS layer used to do the locking for
the filesystems, to guarantee the POSIX requirements (POSIX requires
that writes be seen atomically).

But that lock was pushed down into the filesystems, since some
filesystems really wanted to have parallel writes (particularly for
direct IO, where that POSIX serialization requirement doesn't exist).

That's all many years ago, though. New filesystems are likely to have
copied the pattern from old ones, but even then..

Also, it's worth noting that "inode->i_rwsem" isn't even well-defined
as a lock. You can have the question of *which* inode gets talked
about when you have things like overlayfs etc. Normally it would be
obvious, but sometimes you'd use "file->f_mapping->host" (which is the
same thing in the simple cases), and sometimes it really wouldn't be
obvious at all..

So... I'm really not at all convinced that i_rwsem is sensible. It's
one of those things that are "mostly right for the simple cases",

The thing pretty much common to all of them is that write() might need
to modify permissions (suid removal), which brings ->i_rwsem in one
way or another - notify_change() needs that held...

For GFS2, if we are to hold the inode info constant while it is checked,
we would need to take a glock (read lock in this case) across the
relevant operations. The glock will be happy under i_rwsem, since we
have a lock ordering that takes local locks ahead of cluster locks. I've
not dug into this enough, though, to figure out whether the current
proposal will allow this to work with GFS2. Does IMA cache the results
from the ->read_integrity() operation?

