On 2010-08-03, at 12:45, Kevin Van Maren wrote:
> Since Bug 22492 hit a lot of people, it sounds like opencache isn't generally 
> useful unless enabled on every node. Is there an easy way to force files out 
> of the cache (i.e., echo 3 > /proc/sys/vm/drop_caches)?

For Lustre you can run "lctl set_param ldlm.namespaces.*.lru_size=clear",
which drops all the DLM locks on the client and thereby flushes all pages
from the cache.
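
For example, to clear the locks for a single filesystem only, and then check
how many locks each namespace still holds (a sketch; the "testfs" filesystem
name is an assumption):

    client# lctl set_param ldlm.namespaces.testfs-*.lru_size=clear
    client# lctl get_param ldlm.namespaces.*.lock_count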

>> I just looked into strace of cp -fal (which I guess you meant instead of 
>> just -fa that would just copy everything).
>> 
>> so we traverse the tree down creating a dir structure in parallel first (or 
>> just doing it in readdir order)
>> 
>> open("/mnt/lustre/a/b/c/d/e/f", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
>> +1 RPC
>> 
>> fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
>> +1 RPC (if no opencache)
>> 
>> fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
>> getdents(3, /* 4 entries */, 4096)      = 96
>> getdents(3, /* 0 entries */, 4096)      = 0
>> +1 RPC

Having large readdir RPCs would help for directories with more than about 170 
entries.
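
(Back-of-envelope from the trace above: getdents() returned 4 entries in 96
bytes, i.e. about 24 bytes per entry, so one 4096-byte readdir page holds
roughly 4096 / 24 ≈ 170 entries.)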

>> close(3)                                = 0
>> +1 RPC (if no opencache)
>> 
>> lstat("/mnt/lustre/a/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, 
>> ...}) = 0
>> (should be cached, so no RPC)
>> 
>> mkdir("/mnt/lustre/blah2/b/c/d/e/f/g", 040755) = 0
>> +1 RPC
>> 
>> lstat("/mnt/lustre/blah2/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, 
>> ...}) = 0
>> +1 RPC

If we do the mkdir(), the client does not cache the entry?

>> stat("/mnt/lustre/blah2/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, 
>> ...}) = 0
>> (should be cached, so no RPC)
>> 
>> Then we get to files:
>> link("/mnt/lustre/a/b/c/d/e/f/g/k/8", "/mnt/lustre/blah2/b/c/d/e/f/g/k/8") = >> 0
>> +1 RPC
>> 
>> futimesat(AT_FDCWD, "/mnt/lustre/blah2/b/c/d/e/f/g/k", {{1280856246, 0}, 
>> {1280856291, 0}}) = 0
>> +1 RPC
>> 
>> then we start traversing the just created tree up and chowning it:
>> chown("/mnt/lustre/blah2/b/c/d/e/f/g/k", 0, 0) = 0
>> +1 RPC 
>> 
>> getxattr("/mnt/lustre/a/b/c/d/e/f/g/k", "system.posix_acl_access", 
>> 0x7fff519f0950, 132) = -1 ENODATA (No data available)
>> +1 RPC

This is gone in 1.8.4.

>> stat("/mnt/lustre/a/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, 
>> ...}) = 0
>> (not sure why another stat here, we already did it on the way up. Should be 
>> cached)
>> 
>> setxattr("/mnt/lustre/blah2/b/c/d/e/f/g/k", "system.posix_acl_access", 
>> "\x02\x00\x00\x00\x01\x00\x07\x00\xff\xff\xff\xff\x04\x00\x05\x00\xff\xff\xff\xff
>>  \x00\x05\x00\xff\xff\xff\xff", 28, 0) = 0
>> +1 RPC

Strange that it is setting an ACL when it didn't read one?

>> getxattr("/mnt/lustre/a/b/c/d/e/f/g/k", "system.posix_acl_default", 
>> 0x7fff519f0950, 132) = -1 ENODATA (No data available)
>> +1 RPC
>> 
>> stat("/mnt/lustre/a/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, 
>> ...}) = 0
>> Hm, stat again? Didn't we do it a few syscalls back?

Gotta love those GNU file utilities.  They are very stat happy.

>> stat("/mnt/lustre/blah2/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, 
>> ... }) = 0
>> stat of the target. +1 RPC (the cache got invalidated by link above).
>> 
>> setxattr("/mnt/lustre/blah2/b/c/d/e/f/g/k", "system.posix_acl_default", 
>> "\x02\x00\x00\x00", 4, 0) = 0
>> +1 RPC

Here it is also setting an ACL even though it didn't get one from the source.

>> So I guess there are a number of stat RPCs that would not be present on 
>> NFS due to differences in how the caching works, plus all the getxattrs. 
>> Not sure if this is enough to explain a 4x rate difference.
>> 
>> Also you can try disabling debug (if you did not already) to see how big 
>> an impact that makes. It used to be that debug affected metadata loads a 
>> lot, though with recent debug level adjustments I think it has somewhat 
>> improved.
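
For reference, debug logging can be turned off on a client at runtime with
lctl (a sketch; saving the current mask first so it can be restored later):

    client# lctl get_param debug     # record the current debug mask
    client# lctl set_param debug=0   # disable all debug logging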

It would be useful to run "strace -tttT" to get timestamps for each 
operation, to see which operations are slower on Lustre than on NFS.
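
For example (the paths are illustrative; -f also follows forked children):

    client# strace -f -tttT -o /tmp/cp.trace cp -fal /mnt/lustre/a /mnt/lustre/blah2

Here -ttt prints wall-clock timestamps with microseconds and -T appends the
time spent inside each syscall, so slow RPC round-trips stand out directly.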

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
