Re: rfc: [patch] change attribute for ext3

2006-12-14 Thread Trond Myklebust
On Wed, 2006-12-13 at 20:52 -0500, J. Bruce Fields wrote:
> > What kind of requirements does NFSv4 place on the version?  Monotonic is
> > probably a good bet.
>
> The only requirement is that it be unique (assuming a file is never
> modified 2^64 times).  Clients can't compare them except for equality.

The other requirement is that they be updated in more or less any
situation where you would normally see a 'ctime' update. In other words,
any time the file metadata or data changes, and any time the ACL
changes.
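As a rough illustration of these two rules, here is a minimal C sketch. It assumes a hypothetical in-memory `struct sketch_inode` with a 64-bit `change` counter. Every path that would update ctime also bumps the counter, and the only meaningful client-side operation is an equality test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical inode: only the fields needed for this sketch. */
struct sketch_inode {
    time_t   ctime;   /* status-change time */
    uint64_t change;  /* NFSv4 change attribute */
};

/* Call wherever the filesystem would update ctime: data writes,
 * metadata changes (chmod, chown, link count), ACL changes. */
static void sketch_touch_ctime(struct sketch_inode *ino)
{
    assert(ino != NULL);
    ino->ctime = time(NULL);
    ino->change++;  /* new value must differ from every earlier one */
}

/* Clients may only test for equality; ordering carries no meaning. */
static bool sketch_change_equal(uint64_t a, uint64_t b)
{
    return a == b;
}
```

Incrementing is just the simplest way to get uniqueness here. Per the requirement quoted above, any scheme that never repeats a value for the same inode would do.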

(NB: I'm not sure what we should do w.r.t. xattr changes since those are
not really covered by RFC3530.)

Atomicity is not a hard requirement; however, the server is required to
know whether or not the update was atomic. If the update is atomic, a
careful client may perform certain optimisations based on knowing that
no other changes to the inode have raced with this one. For instance,
if it knows that a file creation atomically updated the change
attribute of the directory, then it can determine that it does not need
to check for other changes to that directory.

> > Does it need to be global for the filesystem
>
> Nope.
>
> > or is a per-inode version sufficient?
>
> Yes.

Yes. If your filesystem wants to support Solaris or Reiser4-like
subfiles, then it is expected that each subfile should have its own
change attribute (whereas changes to the subfile 'directory' will be
reflected by the parent inode's change attribute).

Change attribute values may be reused if the inode number is reused (as
long as the filesystem has something like a generation counter that
allows it to distinguish between different instances of the same inode
number).

> > What functionality of NFSv4 needs the version?
>
> Clients use it to revalidate their caches.

Yup. It is used to detect changes made on the NFS server itself
(possibly by other NFS clients, possibly by local processes on the
server), so that the client can flush out any stale cached data.

Cheers
  Trond

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] [patch 2/3] change attribute for ext4: ext4 specific code

2006-12-14 Thread Andreas Dilger
On Dec 14, 2006  11:03 -0500, Theodore Tso wrote:
> There was discussion on yesterday's call about whether or not 32 bits
> were enough for NFSv4, or whether the RFCs also required 64 bits of
> change notification.  So one of the questions is whether this is
> something that would justify requiring 64 bits --- and if so, maybe we
> need to require that big inodes be used and store the entire 64-bit
> value beyond 128 bytes.  This would mean that NFSv4 cache management
> couldn't be fully implemented without big inodes, or we'd have to make
> do by using the inode ctime as a partial substitute.

Per Trond's and Bruce Fields' replies to my email, it seems that NFSv4
only needs to compare the version for inequality.  If the change numbers
are sequential for a given inode, the client can OPTIONALLY extract
additional information about the server (e.g. that it still has an
up-to-date cache because it was the only one that updated a given file).

So I think that for basic NFSv4 setups 2^32 is sufficient (per Bull's
original patch), but 2^64 is desirable to avoid collisions and to let
the sequential-update logic work properly for long-lived files.
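The sequential-update inference and the collision risk can be sketched in C. The function names are hypothetical. The sketch assumes the server increments the counter by exactly one per change, which the RFC does not require (only uniqueness is guaranteed). A client that saw `cached` and then made exactly one change itself concludes that nobody else wrote the file if the server now reports `cached + 1`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit counter: arithmetic wraps modulo 2^32. */
static bool only_our_update32(uint32_t cached, uint32_t now)
{
    return now == (uint32_t)(cached + 1u);
}

/* 64-bit counter: wraparound is practically unreachable. */
static bool only_our_update64(uint64_t cached, uint64_t now)
{
    return now == cached + 1u;
}
```

With a 32-bit counter, 2^32 intervening updates by other writers bring the value back around to `cached + 1`, so the check is fooled. The same wrap also makes two observations 2^32 updates apart compare equal. A 64-bit counter removes both problems in practice.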

A 32-bit field in the small inode plus an additional 32-bit field in
the large inode would therefore be perfect.  It allows this functionality
to work with existing ext3 filesystems, if not quite optimally.

In addition, for Lustre, could we get a 64-bit field in the superblock
that contains the fs-wide version number?

I'm proposing that, per the original Bull patch, l_i_reserved1 be changed
to be i_version for Linux, and that we add i_version_hi after cr_time_extra
in the large inode.  The disk i_version would be stored in the vfs_inode
i_version (which is already used for this same purpose).  It would be good
for NFSv4 if the i_version field could be expanded to 64 bits to avoid
the need for fs-specific operations, but failing that, we can put the
high word into ext4_inode_info and NFS can access it via
export_operations, I think.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
