Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-11 Thread Mingming Cao
On Tue, 2007-07-10 at 22:17 -0700, Andrew Morton wrote:
 On Tue, 10 Jul 2007 22:09:08 -0400 Mingming Cao [EMAIL PROTECTED] wrote:
 
  David Chinneer pointed that we need to journal the version number
  updates together with the operations that causes the change of the inode
  version number, in order to survive server crashes so clients won't see
  the counter go backwards.
  
  So increment i_version in fs code is probably the place to ensure the
  inode version changes are stored to disk. It's seems update the ext4
  inode version in every ext4_mark_inode_dirty() is the easiest way.
 
 That still makes us dependent upon _something_ changing the inode.  For
 overwrites the only something is mtime.
 
 If we don't want to have a peculiar dependency upon s_time_gran=1e9 (and
 I don't think we do) then I guess we'll need new code in or around
 file_update_time() to do this.

do you mean mark inode dirty all the times in file_update_time()? Not
sure about the overhead for ext3/4.

Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-11 Thread Andrew Morton
On Tue, 10 Jul 2007 23:18:50 -0400 Mingming Cao [EMAIL PROTECTED] wrote:

 On Tue, 2007-07-10 at 22:17 -0700, Andrew Morton wrote:
  On Tue, 10 Jul 2007 22:09:08 -0400 Mingming Cao [EMAIL PROTECTED] wrote:
  
   David Chinneer pointed that we need to journal the version number
   updates together with the operations that causes the change of the inode
   version number, in order to survive server crashes so clients won't see
   the counter go backwards.
   
   So increment i_version in fs code is probably the place to ensure the
   inode version changes are stored to disk. It's seems update the ext4
   inode version in every ext4_mark_inode_dirty() is the easiest way.
  
  That still makes us dependent upon _something_ changing the inode.  For
  overwrites the only something is mtime.
  
  If we don't want to have a peculiar dependency upon s_time_gran=1e9 (and
  I don't think we do) then I guess we'll need new code in or around
  file_update_time() to do this.
 
 do you mean mark inode dirty all the times in file_update_time()? Not
 sure about the overhead for ext3/4.
 

hm, I hadn't thought about it in any detail.

Maybe something like

--- a/fs/inode.c~a
+++ a/fs/inode.c
@@ -1238,6 +1238,11 @@ void file_update_time(struct file *file)
sync_it = 1;
}
 
+   if (IS_I_VERSION_64(inode)) {
+   inode_inc_iversion(inode);  /* Takes i_lock on 32-bit */
+   sync_it = 1;
+   }
+
if (sync_it)
mark_inode_dirty_sync(inode);
 }
_

?
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-11 Thread Andreas Dilger
On Jul 10, 2007  23:34 -0400, Trond Myklebust wrote:
 On Wed, 2007-07-11 at 13:21 +1000, Neil Brown wrote:
  So my vote is to increment i_version in common code every time any
  change is made to the file, and alloc_inode should initialise it to
  current time, which might be changed by the filesystem before it calls
  unlock_new_inode. 
  ... but doesn't lustre want to control its i_version... so maybe not :-(
 
 If lustre wants to be exportable via pNFS (as Peter Braam has suggested
 it should), then it had better be able to return a change attribute that
 is compatible with the NFSv4.1 spec...

The Lustre use of i_version is a superset of what NFSv4 needs - the Lustre
version can be used to compare the updates of two inodes.  It is set
to be the Lustre transaction number (which is sequentially incremented
on a per filesystem basis), so that we can use the per-inode version
to do replay of client operations even if they have been disconnected for
a long time, which is why we want to be able to control the actual value.
We don't want the version to be updated for e.g. file defragmentation
or other similar internal-only changes which need ext4_mark_inode_dirty().

We had a patch to disable ext4 inode versioning by a flag the superblock,
but we dropped it at the last minute because it needed some updates and
we didn't want to wait on that for submitting these changes upstream.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-11 Thread J. Bruce Fields
On Wed, Jul 11, 2007 at 01:21:55PM +1000, Neil Brown wrote:
 And just by-the-way, the server doesn't really have the option of not
 sending the attribute.  If i_version isn't defined, it has to fake
 something using mtime, and hope that is good enough.

ctime, actually--the change attribute is also supposed to be updated on
attribute updates.

 Alternately we could mandate that i_version is always kept up-to-date
 and if a filesystem doesn't have anything to load from storage, it
 just sets it to the current time in nanoseconds.
 
 That would mean that a client would need to flush it's cache whenever
 the inode fell out of cache on the server, but I don't think we can
 reliably do better than that.
 
 I think I like that approach.
 
 So my vote is to increment i_version in common code every time any
 change is made to the file, and alloc_inode should initialise it to
 current time, which might be changed by the filesystem before it calls
 unlock_new_inode. 

So the client would be invalidating its cache more often than necessary,
rather than failing to invalidate it when it should.  I agree that
that's probably the better tradeoff, although I wish I had a better idea
of the downside.  I don't know, for example, whether users might see
unpleasant results if every client has to reread its cached data on a
reboot.

The currently proposed change--just providing a model change attribute
implementation for ext4 and leaving other filesystems untouched--is a
more conservative step.

So I'm inclined to just do this ext4 thing first, and then look into
further change attribute experiments next time around

--b.
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-11 Thread J. Bruce Fields
On Wed, Jul 11, 2007 at 09:28:06AM -0500, Dave Kleikamp wrote:
 On Wed, 2007-07-11 at 15:05 +1000, Neil Brown wrote:
  It just occurred to me:
  
   If i_version is 64bit, then knfsd would need to be careful when
   reading it on a 32bit host.  What are the locking rules?
 
 How does knfsd use i_version?  I would think that if all it was doing
 was to compare (i_version == previous_version)

That's correct.  (Though it's the client that's doing the comparison,
actually--the server is just reporting the value.)

 then locking wouldn't really matter.  Well, theoretically,
 previous_version could be 0x1, and i_version could be
 0x1, knfsd checks the high word, then ext4 updates i_version
 to 0x2, then knfsd checks the low word, detecting no change.
 How likely is this?

The choice of upper word in your example is arbitrary, but other than
that I believe your example is essentially the only one.  So this would
only happen when *both*

- the read of the new value of the low word happens precisely
  2^32 i_version updates after the word was read on the client's
  previous cache revalidation, and
- the value of i_version itself is close enough to a 32-bit
  boundary that wraparound can happen between the reads of the
  high and low words.

 (I don't understand why i_version even needs to be 64 bits in the
 first place.)

A 32-bit i_version could in theory wrap pretty quickly, couldn't it?
That's not a problem in itself--the problem would only arise if two
subsequent client queries of the change attribute happened a multiple of
2^32 i_version increments apart.

This is more likely than the previous scenario, but still very unlikely.
I would have guessed that even in situations with a very high rate of
updates and a low rate of client revalidations, the chance of two
revalidations happening exactly 2^32 updates apart would still be no
more than 1 in 2^32.  (Could odd characteristics of the workloads (like
updates that tend to happen in power-of-2 groups?) make it any more
likely?)

I'd be happier if ext4 at least allowed the possibility of 64 bits in
the future.  And there's always the chance someone would find a use for
an i_version that was nondecreasing, even if nfs didn't care.

   Presumably it is only updated under i_mutex protection, but having to
   get i_mutex to read it would seem a little heavy handed.
 
 How does knfsd protect itself from the inode changing after i_version is
 checked?  Is any locking being done otherwise?

If the client always requests the change attribute before reading, and
the i_version is always updated after data is modified, I think we're
OK.  Admittedly this is a little subtle.

--b.
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-10 Thread Andrew Morton
On Sun, 01 Jul 2007 03:37:04 -0400
Mingming Cao [EMAIL PROTECTED] wrote:

 This patch converts the 32-bit i_version in the generic inode to a 64-bit
 i_version field.
 

That's obvious from the patch.  But what was the reason for making this
(unrelated to ext4) change?

Please update the changelog for this.

 Index: linux-2.6.21/include/linux/fs.h
 ===
 --- linux-2.6.21.orig/include/linux/fs.h
 +++ linux-2.6.21/include/linux/fs.h
 @@ -549,7 +549,7 @@ struct inode {
   uid_t   i_uid;
   gid_t   i_gid;
   dev_t   i_rdev;
 - unsigned long   i_version;
 + u64 i_version;
   loff_t  i_size;
  #ifdef __NEED_I_SIZE_ORDERED
   seqcount_t  i_size_seqcount;

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-10 Thread Neil Brown
On Tuesday July 10, [EMAIL PROTECTED] wrote:
 
 Yes, thanks.  It doesn't actually tell us why we want to implement
 this attribute and it doesn't tell us what the implications of failing
 to do so are, but I guess we can take that on trust from the NFS guys.

You would like to think so, but remember NFSv4 was designed by a
committee :-)

The 'change' number is used for cache consistency, and as the spec
makes very strong statements about the 'change' number, it is very
hard (or impossible) to implement a server correctly without storing a
change number in stable storage (just one of my grips about V4).

 
 But I suspect the ext4 implementation doesn't actually do this.  afaict we
 won't update i_version for file overwrites (especially if s_time_gran can
 indeed be 1,000,000,000) and of course for MAP_SHARED modifications.  What
 would be the implications of this?

The first part sounds like a bug - i_version should really be updated
by every call to -commit_write (if that is still what it is called).

The MAP_SHARED thing is less obvious.  I guess every time we notice
that the page might have been changed, we need to increment i_version.

 
 And how does the NFS server know that the filesystem implements i_version? 
 Will a zero-value of i_version have special significance, telling the
 server to not send this attribute, perhaps?

That is a very important question.  Zero probably makes sense, but
what ever it is needs to be agreed and documented.
And just by-the-way, the server doesn't really have the option of not
sending the attribute.  If i_version isn't defined, it has to fake
something using mtime, and hope that is good enough.

Alternately we could mandate that i_version is always kept up-to-date
and if a filesystem doesn't have anything to load from storage, it
just sets it to the current time in nanoseconds.

That would mean that a client would need to flush it's cache whenever
the inode fell out of cache on the server, but I don't think we can
reliably do better than that.

I think I like that approach.

So my vote is to increment i_version in common code every time any
change is made to the file, and alloc_inode should initialise it to
current time, which might be changed by the filesystem before it calls
unlock_new_inode. 
... but doesn't lustre want to control its i_version... so maybe not :-(

NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-10 Thread Trond Myklebust
On Wed, 2007-07-11 at 13:21 +1000, Neil Brown wrote:
 On Tuesday July 10, [EMAIL PROTECTED] wrote:
  
  Yes, thanks.  It doesn't actually tell us why we want to implement
  this attribute and it doesn't tell us what the implications of failing
  to do so are, but I guess we can take that on trust from the NFS guys.
 
 You would like to think so, but remember NFSv4 was designed by a
 committee :-)
 
 The 'change' number is used for cache consistency, and as the spec
 makes very strong statements about the 'change' number, it is very
 hard (or impossible) to implement a server correctly without storing a
 change number in stable storage (just one of my grips about V4).

Well... It reflects a requirement that was just as present in the
caching models that we use for NFSv2/v3, but that we glossed over for
other reasons.
The real difference here is that v2/v3 caching model assumes that you
have sufficient resolution in the ctime/mtime to allow clients to detect
any changes to the file/directory contents, whereas NFSv4 allows you to
use an arbitrary variable (that may be the ctime, if it has sufficient
resolution) for the same purposes.

  But I suspect the ext4 implementation doesn't actually do this.  afaict we
  won't update i_version for file overwrites (especially if s_time_gran can
  indeed be 1,000,000,000) and of course for MAP_SHARED modifications.  What
  would be the implications of this?
 
 The first part sounds like a bug - i_version should really be updated
 by every call to -commit_write (if that is still what it is called).
 
 The MAP_SHARED thing is less obvious.  I guess every time we notice
 that the page might have been changed, we need to increment i_version.

You need to increment is any time that you detect remotely visible
changes.
IOW: any change that posix mandates should result in a ctime update,
must also result in an update of i_version. The only difference is that
i_version must not suffer from the time resolution issues that ctime
does.

  And how does the NFS server know that the filesystem implements i_version? 
  Will a zero-value of i_version have special significance, telling the
  server to not send this attribute, perhaps?
 
 That is a very important question.  Zero probably makes sense, but
 what ever it is needs to be agreed and documented.
 And just by-the-way, the server doesn't really have the option of not
 sending the attribute.  If i_version isn't defined, it has to fake
 something using mtime, and hope that is good enough.
 
 Alternately we could mandate that i_version is always kept up-to-date
 and if a filesystem doesn't have anything to load from storage, it
 just sets it to the current time in nanoseconds.
 
 That would mean that a client would need to flush it's cache whenever
 the inode fell out of cache on the server, but I don't think we can
 reliably do better than that.
 
 I think I like that approach.
 
 So my vote is to increment i_version in common code every time any
 change is made to the file, and alloc_inode should initialise it to
 current time, which might be changed by the filesystem before it calls
 unlock_new_inode. 
 ... but doesn't lustre want to control its i_version... so maybe not :-(

If lustre wants to be exportable via pNFS (as Peter Braam has suggested
it should), then it had better be able to return a change attribute that
is compatible with the NFSv4.1 spec...

Trond

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-10 Thread Neil Brown

It just occurred to me:

 If i_version is 64bit, then knfsd would need to be careful when
 reading it on a 32bit host.  What are the locking rules?

 Presumably it is only updated under i_mutex protection, but having to
 get i_mutex to read it would seem a little heavy handed.

 Should it use a seqlock like i_size?
 Could we use the same seqlock that i_size uses, or would we need a
 separate one?

NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-10 Thread Andrew Morton
On Tue, 10 Jul 2007 22:09:08 -0400 Mingming Cao [EMAIL PROTECTED] wrote:

 David Chinneer pointed that we need to journal the version number
 updates together with the operations that causes the change of the inode
 version number, in order to survive server crashes so clients won't see
 the counter go backwards.
 
 So increment i_version in fs code is probably the place to ensure the
 inode version changes are stored to disk. It's seems update the ext4
 inode version in every ext4_mark_inode_dirty() is the easiest way.

That still makes us dependent upon _something_ changing the inode.  For
overwrites the only something is mtime.

If we don't want to have a peculiar dependency upon s_time_gran=1e9 (and
I don't think we do) then I guess we'll need new code in or around
file_update_time() to do this.
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-10 Thread Andrew Morton
On Wed, 11 Jul 2007 15:05:27 +1000 Neil Brown [EMAIL PROTECTED] wrote:

 
 It just occurred to me:
 
  If i_version is 64bit, then knfsd would need to be careful when
  reading it on a 32bit host.  What are the locking rules?
 
  Presumably it is only updated under i_mutex protection, but having to
  get i_mutex to read it would seem a little heavy handed.
 
  Should it use a seqlock like i_size?
  Could we use the same seqlock that i_size uses, or would we need a
  separate one?
 

seqlocks are a bit of a pain to use (we've had plenty of deadlocks on the
i_size one).  We could reuse inode.i_lock for this modification.  Its
mandate is general purpose innermost lock to protect stuff in this inode.


-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-10 Thread Mingming Cao
On Tue, 2007-07-10 at 21:22 -0700, Andrew Morton wrote:
 On Tue, 10 Jul 2007 20:19:16 -0400 Mingming Cao [EMAIL PROTECTED] wrote:
 
  On Tue, 2007-07-10 at 18:22 -0700, Andrew Morton wrote:
   On Tue, 10 Jul 2007 18:09:40 -0400 Mingming Cao [EMAIL PROTECTED] wrote:
   
On Tue, 2007-07-10 at 16:30 -0700, Andrew Morton wrote:
 On Sun, 01 Jul 2007 03:37:04 -0400
 Mingming Cao [EMAIL PROTECTED] wrote:
 
  This patch converts the 32-bit i_version in the generic inode to a 
  64-bit
  i_version field.
  
 
 That's obvious from the patch.  But what was the reason for making 
 this
 (unrelated to ext4) change?
 

The need is came from NFSv4

On Fri, 2007-05-25 at 18:25 +0200, Jean noel Cordenner wrote: 
 Hi,
 
 This is an update of the i_version patch.
 The i_version field is a 64bit counter that is set on every inode
 creation and that is incremented every time the inode data is modified
 (similarly to the ctime time-stamp).
 The aim is to fulfill a NFSv4 requirement for rfc3530:
 5.5.  Mandatory Attributes - Definitions
 Name  #   DataType   Access   Description
 ___
 change3   uint64   READ A value created 
 by the
   server that the client can use to determine if file
   data, directory contents or attributes of the object
   have been modified.  The servermay return the object's
   time_metadata attribute for this attribute's value but
   only if the filesystem object can not be updated more
   frequently than the resolution of time_metadata.
 
 

 Please update the changelog for this.
 

Is above description clear to you?

   
   Yes, thanks.  It doesn't actually tell us why we want to implement
   this attribute and it doesn't tell us what the implications of failing
   to do so are, but I guess we can take that on trust from the NFS guys.
   
   But I suspect the ext4 implementation doesn't actually do this.  afaict we
   won't update i_version for file overwrites (especially if s_time_gran can
   indeed be 1,000,000,000) and of course for MAP_SHARED modifications.  What
   would be the implications of this?
   
  
  In the case of overwrite (file date updated), I assume the ctime/mtime
  is being updated and the inode is being dirtied, so the version number
  is being updated.
  
   vfs_write()-..
  -__generic_file_aio_write_nolock()
  -file_update_time()
  -mark_inode_dirty_sync()
  -__mark_inode_dirty(I_DIRTY_SYNC)
  -ext4_dirty_inode()
  -ext4_mark_inode_dirty()
 
 That assumes an mtime update for every write().  OK, so two writes in a
 single nanosecond won't be happening.  But in that case why is this code:
 
 static inline struct timespec ext4_current_time(struct inode *inode)
 {
   return (inode-i_sb-s_time_gran  NSEC_PER_SEC) ?
   current_fs_time(inode-i_sb) : CURRENT_TIME_SEC;
 }
 
 checking (s_time_gran  NSEC_PER_SEC) ??
 

Ext4 can still load/read ext3 fs (which by default with 128 bytes old
inode size, means doens't have support nanosecond timestamps), so it's
not always gurantee nanosecond timestamps granularity.(it depends on the
size of the inode (128 bytes), by default, a fresh ext4  increase inode
size to 256 bytes to have the room to store nanoseond timestamps, inode
versioning etc)

 Overall it is a bit unpleasing to rely upon mtime updates for a correct NFS
 server implementation: if we were to later decrease s_time_gran (as we
 might do, for performance reasons), the NFS server implementation starts
 reporting incorrect information.
 

:( that is a problem...

   And how does the NFS server know that the filesystem implements 
   i_version? 
   Will a zero-value of i_version have special significance, telling the
   server to not send this attribute, perhaps?
  
  Bruce raised up this question a few days back when he reviewed this
  patch, I think the solution is add a superblock flag for fs support
  inode versioning, probably at VFS layer?
 
 That would work.
 -
 To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-09 Thread Mingming Cao
On Fri, 2007-07-06 at 16:53 -0600, Andreas Dilger wrote:
 On Jul 06, 2007  09:51 -0400, J. Bruce Fields wrote:
  The use of a mount option means the change attribute could be
  inconsistent across mounts.  If we really need this, wouldn't it make
  more sense for it to be a persistent feature of the filesystem, set at
  mkfs time?
 
 Yes, having it stored into the superblock in s_flags is probably a good
 idea.  Kalpak, do you think you could get a patch that adds e.g.
 EXT4_FLAGS_NO_INODE_VERSION (like EXT4_FLAGS_SIGNED_HASH in e2fsprogs).
 
Per our ext4 interlock discussion today, I have dropped the
ext4-no-inode version-mount-option patch from ext4 patch queue. 

Thanks,
Mingming

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-06 Thread J. Bruce Fields
On Tue, Jul 03, 2007 at 05:32:00PM -0600, Andreas Dilger wrote:
 On Jul 03, 2007  18:15 -0400, J. Bruce Fields wrote:
  How will nfsd tell whether it can really on a given filesystem's
  i_version, or whether it should fall back on ctime?
 
 Good question.

Well, we don't need anything particularly complicated--just a one-bit
flag on the superblock would be enough.

  So what's the motivation for the noversion mount option?
 
 Lustre needs to be able to control the version number directly (version
 number needs to be ordered between all inodes, is set by Lustre to be a
 transaction number).  Instead of trying to incorporate this unused code
 into ext4 we just turn off the ext4 version code and let Lustre control
 this directly.  It may even be that NFSv4 will need to control the version
 numbers itself...

I can't think of any reason we would need to in the near future, but
maybe I'm insufficiently creative.

The use of a mount option means the change attribute could be
inconsistent across mounts.  If we really need this, wouldn't it make
more sense for it to be a persistent feature of the filesystem, set at
mkfs time?

--b.
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-06 Thread Andreas Dilger
On Jul 06, 2007  09:51 -0400, J. Bruce Fields wrote:
 The use of a mount option means the change attribute could be
 inconsistent across mounts.  If we really need this, wouldn't it make
 more sense for it to be a persistent feature of the filesystem, set at
 mkfs time?

Yes, having it stored into the superblock in s_flags is probably a good
idea.  Kalpak, do you think you could get a patch that adds e.g.
EXT4_FLAGS_NO_INODE_VERSION (like EXT4_FLAGS_SIGNED_HASH in e2fsprogs).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-03 Thread Andreas Dilger
On Jul 03, 2007  10:24 -0400, Trond Myklebust wrote:
 It looks OK to me, but you might want to strip out the now redundant
 i_version updates in add_dirent_to_buf(), ext4_rmdir(), ext4_rename().

Agreed, and I thought we discussed that already on the ext4 list.

 I also have some questions about how this will affect the readdir code:
 unless I missed something, the filp-f_version is still unsigned long,
 so the comparisons and assignments in ext4_readdir()/ext4_dx_readdir()
 no longer make sense.

I don't see them as any worse than existing checks.  For 32-bit systems
we only ever had a 32-bit in-memory version anyway so using only the
low 32 bits of i_version in f_version is no more racy than in the past.
For 64-bit systems using the full on-disk i_version is possible.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-03 Thread J. Bruce Fields
On Mon, Jul 02, 2007 at 10:58:33AM -0400, Mingming Cao wrote:
 Trond or Bruce, can you please review these patch series and ack if you
 agrees?

Thanks, looks like what we need!

How will nfsd tell whether it can really on a given filesystem's
i_version, or whether it should fall back on ctime?

 As to performance concerns that raise before the inode version counter
 (at least for ext4) is done inside ext4_mark_inode_dirty), so there is
 no extra IO work to store this counter to disk.

So what's the motivation for the noversion mount option?

--b.
-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [EXT4 set 4][PATCH 1/5] i_version:64 bit inode version

2007-07-03 Thread Andreas Dilger
On Jul 03, 2007  18:15 -0400, J. Bruce Fields wrote:
 How will nfsd tell whether it can really on a given filesystem's
 i_version, or whether it should fall back on ctime?

Good question.

  As to performance concerns that raise before the inode version counter
  (at least for ext4) is done inside ext4_mark_inode_dirty), so there is
  no extra IO work to store this counter to disk.
 
 So what's the motivation for the noversion mount option?

Lustre needs to be able to control the version number directly (version
number needs to be ordered between all inodes, is set by Lustre to be a
transaction number).  Instead of trying to incorporate this unused code
into ext4 we just turn off the ext4 version code and let Lustre control
this directly.  It may even be that NFSv4 will need to control the version
numbers itself...

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line unsubscribe linux-ext4 in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html