On 2010-10-29, at 21:20, Daniel Kobras wrote:
> On Fri, Oct 29, 2010 at 09:40:33AM +0100, Frederik Ferner wrote:
>> Doing a 'strace -T -e file ls -n' on one directory with about 750 files, 
>> while users were seeing the hanging ls, showed lstat calls taking 
>> seconds, up to 23s.
> 
> The (l)stat() calls determine the exact size of all files in the displayed 
> directory. This means that each OST needs to revoke client write locks for 
> all these files, i.e. client-side write caches for all files in the directory 
> are flushed before the (l)stat() returns. This can easily take several 
> seconds if there is heavy write activity on the file.

Actually, unlike most other cluster filesystems, Lustre does not need to revoke 
the OST write locks in order to determine the file size.  The OST extent locks 
are revoked only if the client holding them is no longer using them.  If they 
are in use, the clients holding those locks return only a "glimpse" of the 
current file size to the OST, which in turn returns the size to the client 
doing the (l)stat() call.
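
To make the decision above concrete, here is a toy model in Python.  It is 
not Lustre code: ToyOST, ClientLock and glimpse() are hypothetical names 
invented for illustration, and taking the maximum over all in-use locks is a 
simplification of how the OST really collects the size.

```python
from dataclasses import dataclass, field


@dataclass
class ClientLock:
    """Hypothetical stand-in for an extent lock held by a client."""
    client: str
    in_use: bool       # is the holder still actively using the lock?
    cached_size: int   # file size as the holder sees it, incl. cached writes


@dataclass
class ToyOST:
    """Hypothetical stand-in for one OST object; not a real Lustre structure."""
    size_on_disk: int
    locks: list = field(default_factory=list)

    def glimpse(self) -> int:
        """Return the size to report to a client doing (l)stat()."""
        size = self.size_on_disk
        kept = []
        for lock in self.locks:
            if lock.in_use:
                # Busy lock: the holder keeps it and only reports its size.
                size = max(size, lock.cached_size)
                kept.append(lock)
            else:
                # Idle lock: revoked; assume the holder flushes its cache.
                self.size_on_disk = max(self.size_on_disk, lock.cached_size)
                size = max(size, lock.cached_size)
        self.locks = kept
        return size
```

In this sketch a busy writer never loses its lock because of a stat; only 
idle locks are dropped.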

Since the (l)stat() call is itself not atomic (i.e. the size may be out of date 
even before the system call returns to userspace, even for local filesystems), 
this glimpse behaviour is fine for (l)stat() calls.  For system calls where the 
client needs to know the actual file size (e.g. writes to a file opened with 
O_APPEND, or truncate()), the client does need to get the extent lock that 
covers the end of the file, and of course it does so.
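
If it helps to see which files are slow without reading strace output, the 
per-file lstat() latency can also be measured directly.  The following is a 
minimal sketch using only the Python standard library; time_lstat() is a 
name made up here, and the numbers will not match strace -T exactly since 
they include interpreter overhead.

```python
import os
import sys
import time


def time_lstat(directory):
    """Return a list of (seconds, name) for each entry, slowest first."""
    timings = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        start = time.perf_counter()
        try:
            os.lstat(path)
        except OSError:
            # Entry vanished or is unreadable; skip it.
            continue
        timings.append((time.perf_counter() - start, name))
    return sorted(timings, reverse=True)


if __name__ == "__main__":
    # Directory to scan: first argument, or the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for seconds, name in time_lstat(target)[:10]:
        print(f"{seconds:9.6f}s  {name}")
```

Running it against the affected directory while the hang is happening should 
show whether the delay is spread across all files or concentrated on a few 
that are being written to.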

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss