On 2010-08-03, at 12:45, Kevin Van Maren wrote:
> Since Bug 22492 hit a lot of people, it sounds like opencache isn't generally
> useful unless enabled on every node. Is there an easy way to force files out
> of the cache (i.e., echo 3 > /proc/sys/vm/drop_caches)?
For Lustre, running "lctl set_param ldlm.namespaces.*.lru_size=clear" will
drop all the DLM locks on the clients, which will flush all pages from the
cache.
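A minimal sketch of both approaches, wrapped in a helper for repeated benchmark runs (assumes root on a Lustre client; "flush_client_caches" is just an illustrative name, not an existing tool):

```shell
# Flush cached data on a Lustre client before re-running a test.
flush_client_caches() {
    # Drop all client DLM locks; releasing the locks also flushes the
    # pages and metadata they cover:
    lctl set_param ldlm.namespaces.*.lru_size=clear
    # The generic Linux page-cache flush, for comparison with local
    # filesystems:
    echo 3 > /proc/sys/vm/drop_caches
}
```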
>> I just looked at an strace of cp -fal (which I guess you meant, instead of
>> just -fa, which would copy everything).
>>
>> So we traverse the tree downward first, creating the directory structure in
>> parallel (or just doing it in readdir order):
>>
>> open("/mnt/lustre/a/b/c/d/e/f", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
>> +1 RPC
>>
>> fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
>> +1 RPC (if no opencache)
>>
>> fcntl(3, F_SETFD, FD_CLOEXEC) = 0
>> getdents(3, /* 4 entries */, 4096) = 96
>> getdents(3, /* 0 entries */, 4096) = 0
>> +1 RPC
Having large readdir RPCs would help for directories with more than about 170
entries.
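As a back-of-envelope illustration of that cutoff (taking the ~170-entries-per-RPC figure above at face value; "readdir_rpcs" is an illustrative helper, not a real tool):

```shell
# Rough estimate of readdir RPCs needed for a directory with N entries,
# at roughly 170 entries per RPC. At least one round trip is needed even
# for an empty directory, just to learn that it is empty.
readdir_rpcs() {
    n=$1
    rpcs=$(( (n + 169) / 170 ))   # ceiling division
    [ "$rpcs" -lt 1 ] && rpcs=1
    echo "$rpcs"
}

readdir_rpcs 4      # the 4-entry directory in the trace above -> 1
readdir_rpcs 1000   # a wider directory needs multiple round trips -> 6
```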
>> close(3) = 0
>> +1 RPC (if no opencache)
>>
>> lstat("/mnt/lustre/a/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096,
>> ...}) = 0
>> (should be cached, so no RPC)
>>
>> mkdir("/mnt/lustre/blah2/b/c/d/e/f/g", 040755) = 0
>> +1 RPC
>>
>> lstat("/mnt/lustre/blah2/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096,
>> ...}) = 0
>> +1 RPC
If we do the mkdir(), the client does not cache the entry?
>> stat("/mnt/lustre/blah2/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096,
>> ...}) = 0
>> (should be cached, so no RPC)
>>
>> Then we get to files:
>> link("/mnt/lustre/a/b/c/d/e/f/g/k/8", "/mnt/lustre/blah2/b/c/d/e/f/g/k/8") = 0
>> +1 RPC
>>
>> futimesat(AT_FDCWD, "/mnt/lustre/blah2/b/c/d/e/f/g/k", {{1280856246, 0},
>> {1280856291, 0}}) = 0
>> +1 RPC
>>
>> then we start traversing the just created tree up and chowning it:
>> chown("/mnt/lustre/blah2/b/c/d/e/f/g/k", 0, 0) = 0
>> +1 RPC
>>
>> getxattr("/mnt/lustre/a/b/c/d/e/f/g/k", "system.posix_acl_access",
>> 0x7fff519f0950, 132) = -1 ENODATA (No data available)
>> +1 RPC
This is gone in 1.8.4.
>> stat("/mnt/lustre/a/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096,
>> ...}) = 0
>> (not sure why there is another stat here; we already did it on the way up.
>> Should be cached)
>>
>> setxattr("/mnt/lustre/blah2/b/c/d/e/f/g/k", "system.posix_acl_access",
>> "\x02\x00\x00\x00\x01\x00\x07\x00\xff\xff\xff\xff\x04\x00\x05\x00\xff\xff\xff\xff
>> \x00\x05\x00\xff\xff\xff\xff", 28, 0) = 0
>> +1 RPC
Strange that it is setting an ACL when it didn't read one?
>> getxattr("/mnt/lustre/a/b/c/d/e/f/g/k", "system.posix_acl_default",
>> 0x7fff519f0950, 132) = -1 ENODATA (No data available)
>> +1 RPC
>>
>> stat("/mnt/lustre/a/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096,
>> ...}) = 0
>> Hm, stat again? Didn't we do it a few syscalls back?
Gotta love those GNU file utilities. They are very stat happy.
>> stat("/mnt/lustre/blah2/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096,
>> ... }) = 0
>> stat of the target. +1 RPC (the cache got invalidated by link above).
>>
>> setxattr("/mnt/lustre/blah2/b/c/d/e/f/g/k", "system.posix_acl_default",
>> "\x02\x00\x00\x00", 4, 0) = 0
>> +1 RPC
Here it is also setting an ACL even though it didn't get one from the source.
>> So I guess there are a certain number of stat RPCs that would not be present
>> on NFS, due to the different ways the caching works, plus all the getxattrs.
>> Not sure if this is enough to explain a 4x rate difference.
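Tallying up the RPCs annotated in the trace above gives a rough per-directory count (a back-of-envelope sketch, not a measurement; it assumes opencache is disabled, so open/fstat/close each cost an RPC, and counts only the operations shown in this trace):

```shell
# Per directory copied:
#   open + fstat + getdents + close      = 4  (read the source dir)
#   mkdir + lstat                        = 2  (create the target dir)
#   chown + futimesat                    = 2  (fix up ownership and times)
#   2x getxattr + 2x setxattr            = 4  (ACL round trips)
#   stat after link                      = 1  (cache invalidated by link)
per_dir_rpcs=$(( 4 + 2 + 2 + 4 + 1 ))
# Per hardlinked file, just the link() itself:
per_file_rpcs=1
echo "$per_dir_rpcs RPCs per directory, $per_file_rpcs per file"
```

So on this accounting a tree of mostly-full directories costs about 13 RPCs per directory plus one per file, which is where the extra latency relative to NFS would accumulate.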
>>
>> Also, you can try disabling debug (if you have not already) to see how big
>> an impact that makes. Debug used to affect metadata-heavy loads a lot,
>> though with the recent adjustments to the debug levels I think it has
>> improved somewhat.
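For example, something along these lines (a sketch assuming lctl is on PATH and you are root; "without_debug" is an illustrative wrapper, not an existing tool):

```shell
# Run a workload with Lustre debug logging disabled, restoring the
# previous debug mask afterwards.
without_debug() {
    old_debug=$(lctl get_param -n debug)
    lctl set_param debug=0
    "$@"
    lctl set_param debug="$old_debug"
}
```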
It would also be useful to run "strace -tttT" to get a timestamp and duration
for each operation, to see which operations are slower on Lustre than on NFS.
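Something like the following will surface the slowest calls from such a log ("slowest_calls" is an illustrative helper; it relies on -T appending each call's wall time as "<seconds>" at the end of the line):

```shell
# Print the 10 slowest syscalls from an "strace -tttT" log, slowest first.
slowest_calls() {
    awk '{ t = $NF; gsub(/[<>]/, "", t); print t, $0 }' "$1" |
        sort -rn | head -10
}
```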
Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss