Well, you can drop all locks on a given FS, which in effect drops all metadata caches but leaves data caches intact:

echo clear > /proc/fs/lustre/ldlm/namespaces/your_MDC_namespace/lru_size
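To do this for every MDC namespace on a client at once, a small loop works (a sketch; the actual namespace directory names depend on the filesystem and target names, so the glob below is an assumption):

# Clear the DLM lock LRU for all MDC namespaces on this client,
# dropping cached metadata locks; data locks/caches are untouched.
# (namespace directory names vary by site - adjust the glob as needed)
for ns in /proc/fs/lustre/ldlm/namespaces/*mdc*; do
    echo clear > "$ns/lru_size"
done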
On Aug 3, 2010, at 2:45 PM, Kevin Van Maren wrote:

> Since Bug 22492 hit a lot of people, it sounds like opencache isn't generally
> useful unless enabled on every node. Is there an easy way to force files out
> of the cache (i.e., echo 3 > /proc/sys/vm/drop_caches)?
>
> Kevin
>
> On Aug 3, 2010, at 11:50 AM, Oleg Drokin <[email protected]> wrote:
>
>> Hello!
>>
>> On Aug 3, 2010, at 12:49 PM, Daire Byrne wrote:
>>>>> So even with the metadata going over NFS, the opencache in the client
>>>>> seems to make quite a difference (I'm not sure how much the NFS client
>>>>> caches, though). As expected, I see no MDT activity for the NFS export
>>>>> once cached. I think it would be really nice to be able to enable the
>>>>> opencache on any Lustre client. A couple of potential workloads that I
>>>> A simple workaround for you to enable opencache on a specific client would
>>>> be to add cr_flags |= MDS_OPEN_LOCK; in mdc/mdc_lib.c:mds_pack_open_flags()
>>> Yea, that works - cheers. FYI, some comparisons with a simple find on a
>>> remote client (~33,000 files):
>>>
>>> find /mnt/lustre (not cached) = 41 secs
>>> find /mnt/lustre (cached) = 19 secs
>>> find /mnt/lustre (opencache) = 3 secs
>>
>> Hm, initially I was going to say that find is not open-intensive, so it
>> should not benefit from opencache at all. But then I realized that if you
>> have a lot of dirs, then indeed there would be a positive impact on
>> subsequent reruns. I assume the opencache result is a second run, and the
>> first run produces the same 41 seconds?
>>
>> BTW, another unintended side-effect you might experience on a mixed
>> opencache-enabled/disabled network: if you run something (or open it for
>> write) on an opencache-enabled client, you might have problems writing (or
>> executing) that file from non-opencache-enabled nodes for as long as the
>> file handle remains cached on that client. This is because if the open lock
>> was not requested, we don't try to invalidate current ones (expensive), and
>> the MDS would think the file is genuinely open for write/execution and
>> disallow conflicting accesses with EBUSY.
>>
>>> performance when compared to something simpler like NFS. Slightly off
>>> topic (and I've kinda asked this before), but is there a good reason
>>> why link() speeds in Lustre are so slow compared to something like NFS?
>>> A quick comparison of doing a "cp -al" from a remote Lustre client and
>>> an NFS client (to a fast NFS server):
>>>
>>> cp -fa /mnt/lustre/blah /mnt/lustre/blah2 = ~362 files/sec
>>> cp -fa /mnt/nfs/blah /mnt/nfs/blah2 = ~1863 files/sec
>>>
>>> Is it just the extra depth of the Lustre stack/code path? Is there
>>> anything we could do to speed this up if we know that no other client
>>> will touch these dirs while we hardlink them?
>>
>> Hm, this is the first complaint about this that I have heard.
>> I just looked at an strace of cp -fal (which I guess you meant rather than
>> -fa, which would just copy everything).
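The per-syscall RPC accounting in the walkthrough that follows can be cross-checked from the client. A rough sketch, assuming the client exports per-target RPC counters under mdc.*.stats and that writing to a stats file resets it (both vary by Lustre version):

# Reset the MDC RPC counters (assumption: stats files clear on write)
lctl set_param mdc.*.stats=clear
# Count the syscalls the copy issues ...
strace -c -f cp -al /mnt/lustre/blah /mnt/lustre/blah2
# ... then see how many MDS RPCs they turned into
lctl get_param mdc.*.stats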
>>
>> So we traverse the tree down, creating the dir structure in parallel first
>> (or just doing it in readdir order):
>>
>> open("/mnt/lustre/a/b/c/d/e/f", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
>> +1 RPC
>>
>> fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
>> +1 RPC (if no opencache)
>>
>> fcntl(3, F_SETFD, FD_CLOEXEC) = 0
>> getdents(3, /* 4 entries */, 4096) = 96
>> getdents(3, /* 0 entries */, 4096) = 0
>> +1 RPC
>>
>> close(3) = 0
>> +1 RPC (if no opencache)
>>
>> lstat("/mnt/lustre/a/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
>> (should be cached, so no RPC)
>>
>> mkdir("/mnt/lustre/blah2/b/c/d/e/f/g", 040755) = 0
>> +1 RPC
>>
>> lstat("/mnt/lustre/blah2/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
>> +1 RPC
>>
>> stat("/mnt/lustre/blah2/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
>> (should be cached, so no RPC)
>>
>> Then we get to files:
>> link("/mnt/lustre/a/b/c/d/e/f/g/k/8", "/mnt/lustre/blah2/b/c/d/e/f/g/k/8") = 0
>> +1 RPC
>>
>> futimesat(AT_FDCWD, "/mnt/lustre/blah2/b/c/d/e/f/g/k", {{1280856246, 0}, {1280856291, 0}}) = 0
>> +1 RPC
>>
>> Then we start traversing the just-created tree back up and chowning it:
>> chown("/mnt/lustre/blah2/b/c/d/e/f/g/k", 0, 0) = 0
>> +1 RPC
>>
>> getxattr("/mnt/lustre/a/b/c/d/e/f/g/k", "system.posix_acl_access", 0x7fff519f0950, 132) = -1 ENODATA (No data available)
>> +1 RPC
>>
>> stat("/mnt/lustre/a/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
>> (not sure why another stat here, we already did it on the way up. Should be cached)
>>
>> setxattr("/mnt/lustre/blah2/b/c/d/e/f/g/k", "system.posix_acl_access", "\x02\x00\x00\x00\x01\x00\x07\x00\xff\xff\xff\xff\x04\x00\x05\x00\xff\xff\xff\xff\x20\x00\x05\x00\xff\xff\xff\xff", 28, 0) = 0
>> +1 RPC
>>
>> getxattr("/mnt/lustre/a/b/c/d/e/f/g/k", "system.posix_acl_default", 0x7fff519f0950, 132) = -1 ENODATA (No data available)
>> +1 RPC
>>
>> stat("/mnt/lustre/a/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
>> Hm, stat again? Didn't we do it a few syscalls back?
>>
>> stat("/mnt/lustre/blah2/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
>> stat of the target. +1 RPC (the cache got invalidated by the link above).
>>
>> setxattr("/mnt/lustre/blah2/b/c/d/e/f/g/k", "system.posix_acl_default", "\x02\x00\x00\x00", 4, 0) = 0
>> +1 RPC
>>
>> So I guess there are a number of stat RPCs that would not be present on NFS
>> due to the different ways the caching works, plus all the getxattrs. I'm not
>> sure that is enough to explain a 4x rate difference.
>>
>> Also, you can try disabling debug (if you have not already) to see how big
>> an impact that makes. It used to be that debug affected metadata loads a
>> lot, though with recent debug-level adjustments I think that has somewhat
>> improved.
>>
>> Bye,
>>     Oleg
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
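On Oleg's closing suggestion to disable debug: the client's debug mask is runtime-tunable, so it is easy to turn off for a test run (a sketch; lctl set_param is the newer spelling, and the /proc path applies to older clients):

# Record the current Lustre debug mask so it can be restored afterwards
lctl get_param debug
# Disable debug logging for the benchmark run
lctl set_param debug=0
# (on older clients: echo 0 > /proc/sys/lnet/debug)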
