Since Bug 22492 hit a lot of people, it sounds like opencache isn't generally useful unless it's enabled on every node. Is there an easy way to force files out of the cache (e.g., echo 3 > /proc/sys/vm/drop_caches)?

Kevin


On Aug 3, 2010, at 11:50 AM, Oleg Drokin <[email protected]> wrote:

Hello!

On Aug 3, 2010, at 12:49 PM, Daire Byrne wrote:
So even with the metadata going over NFS the opencache in the client seems to make quite a difference (I'm not sure how much the NFS client caches though). As expected I see no mdt activity for the NFS export once cached. I think it would be really nice to be able to enable the opencache on any lustre client. A couple of potential workloads that I
A simple workaround for you to enable opencache on a specific client would be to add cr_flags |= MDS_OPEN_LOCK; in mdc/mdc_lib.c:mds_pack_open_flags()
Yea that works - cheers. FYI some comparisons with a simple find on a
remote client (~33,000 files):

find /mnt/lustre (not cached) = 41 secs
find /mnt/lustre (cached) = 19 secs
find /mnt/lustre (opencache) = 3 secs
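For scale, the quoted timings work out to roughly these per-file costs (a back-of-the-envelope sketch; the ~33,000-file count and all three timings are taken from the measurements above):

```python
# Rough per-file metadata cost derived from the quoted "find" timings.
FILES = 33_000

timings = {
    "not cached": 41.0,   # seconds, from the measurements above
    "cached":     19.0,
    "opencache":   3.0,
}

for label, secs in timings.items():
    per_file_ms = secs / FILES * 1000
    print(f"{label:>10}: {secs:5.1f} s total, {per_file_ms:.3f} ms/file")

speedup = timings["not cached"] / timings["opencache"]
print(f"opencache speedup vs. cold: {speedup:.1f}x")
```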

Hm, initially I was going to say that find is not open-intensive, so it should
not benefit from opencache at all.
But then I realized that if you have a lot of dirs, then indeed there would be a
positive impact on subsequent reruns.
I assume the opencache result is from a second run, and the first run produces
the same 41 seconds?

BTW, another unintended side-effect you might experience on a network with a mix of opencache-enabled and -disabled clients: if you run something (or open it for write) on an opencache-enabled client, you might have problems writing to (or executing)
that file from non-opencache-enabled nodes for as long as the file handle
remains cached on that client. This is because if the open lock was not requested, we don't try to invalidate the current ones (expensive), so the MDS thinks the file is genuinely open for write/execution and disallows conflicting accesses
with EBUSY.
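A toy model of that interaction (hypothetical class and method names, purely illustrative of the decision described above — not real MDS code):

```python
import errno

class MdsSketch:
    """Toy model of the open-lock/EBUSY interaction described above.

    All names here are hypothetical; this only illustrates the
    decision logic, not actual Lustre MDS internals.
    """
    def __init__(self):
        self.cached_write_opens = {}   # path -> client holding an open lock

    def open_write(self, client, path, request_open_lock):
        if request_open_lock:
            # opencache client: the handle stays cached under an open lock
            self.cached_write_opens[path] = client
        return 0

    def open_exec(self, client, path, request_open_lock):
        holder = self.cached_write_opens.get(path)
        if holder and holder != client:
            if not request_open_lock:
                # No open lock requested, so the MDS does not go through
                # the (expensive) invalidation of the cached handle; it
                # assumes the file is genuinely open for write.
                return -errno.EBUSY
            # With an open lock requested, the MDS would revoke the
            # holder's cached open first (not modelled here).
            del self.cached_write_opens[path]
        return 0

mds = MdsSketch()
mds.open_write("clientA", "/mnt/lustre/prog", request_open_lock=True)
rc = mds.open_exec("clientB", "/mnt/lustre/prog", request_open_lock=False)
print(rc == -errno.EBUSY)   # the conflicting exec from the non-opencache node fails
```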

performance when compared to something simpler like NFS. Slightly off
topic (and I've kinda asked this before), but is there a good reason
why link() speeds in Lustre are so slow compared to something like NFS? A quick comparison of doing a "cp -al" from a remote Lustre client and
an NFS client (to a fast NFS server):

cp -fa /mnt/lustre/blah /mnt/lustre/blah2 = ~362 files/sec
cp -fa /mnt/nfs/blah /mnt/nfs/blah2 = ~1863 files/sec

Is it just the extra depth of the lustre stack/code path? Is there
anything we could do to speed this up if we know that no other client
will touch these dirs while we hardlink them?

Hm, this is the first complaint about this that I've heard.
I just looked into an strace of cp -fal (which I guess you meant instead of just -fa, which
would just copy everything).

So we traverse the tree down, creating the directory structure in parallel first (or just doing it in readdir order):

open("/mnt/lustre/a/b/c/d/e/f", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
+1 RPC

fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
+1 RPC (if no opencache)

fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
getdents(3, /* 4 entries */, 4096)      = 96
getdents(3, /* 0 entries */, 4096)      = 0
+1 RPC

close(3)                                = 0
+1 RPC (if no opencache)

lstat("/mnt/lustre/a/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
(should be cached, so no RPC)

mkdir("/mnt/lustre/blah2/b/c/d/e/f/g", 040755) = 0
+1 RPC

lstat("/mnt/lustre/blah2/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
+1 RPC

stat("/mnt/lustre/blah2/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
(should be cached, so no RPC)

Then we get to files:
link("/mnt/lustre/a/b/c/d/e/f/g/k/8", "/mnt/lustre/blah2/b/c/d/e/f/g/k/8") = 0
+1 RPC

futimesat(AT_FDCWD, "/mnt/lustre/blah2/b/c/d/e/f/g/k", {{1280856246, 0}, {1280856291, 0}}) = 0
+1 RPC

Then we start traversing the just-created tree back up and chowning it:
chown("/mnt/lustre/blah2/b/c/d/e/f/g/k", 0, 0) = 0
+1 RPC

getxattr("/mnt/lustre/a/b/c/d/e/f/g/k", "system.posix_acl_access", 0x7fff519f0950, 132) = -1 ENODATA (No data available)
+1 RPC

stat("/mnt/lustre/a/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 (not sure why another stat here, we already did it on the way up. Should be cached)

setxattr("/mnt/lustre/blah2/b/c/d/e/f/g/k", "system.posix_acl_access", "\x02\x00\x00\x00\x01\x00\x07\x00\xff\xff\xff\xff\x04\x00\x05\x00\xff\xff\xff\xff \x00\x05\x00\xff\xff\xff\xff", 28, 0) = 0
+1 RPC

getxattr("/mnt/lustre/a/b/c/d/e/f/g/k", "system.posix_acl_default", 0x7fff519f0950, 132) = -1 ENODATA (No data available)
+1 RPC

stat("/mnt/lustre/a/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
Hm, stat again? Didn't we do it a few syscalls back?

stat("/mnt/lustre/blah2/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
Stat of the target. +1 RPC (the cache got invalidated by the link above).

setxattr("/mnt/lustre/blah2/b/c/d/e/f/g/k", "system.posix_acl_default", "\x02\x00\x00\x00", 4, 0) = 0
+1 RPC


So I guess there is a certain number of stat RPCs that would not be present on NFS due to the different ways the caching works, plus all the getxattrs. Not sure if this
is enough to explain the ~5x rate difference (1863 vs. 362 files/sec).
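Tallying the +1 RPC annotations from the strace walk above gives a rough per-directory and per-file RPC budget (a sketch based only on the syscalls shown; real counts will vary with cache state):

```python
# Per-syscall RPC tally for one directory in the "cp -al" walk above,
# as annotated in the strace. With opencache, the fstat and close
# RPCs are saved because the open handle stays cached.
per_dir_rpcs = {
    "open":                    1,
    "fstat":                   1,   # saved with opencache
    "getdents":                1,
    "close":                   1,   # saved with opencache
    "mkdir":                   1,
    "lstat (new dir)":         1,
    "futimesat":               1,
    "chown":                   1,
    "getxattr (acl_access)":   1,
    "setxattr (acl_access)":   1,
    "getxattr (acl_default)":  1,
    "stat (target)":           1,
    "setxattr (acl_default)":  1,
}
per_file_rpcs = {"link": 1}

def total_per_dir(opencache):
    saved = {"fstat", "close"} if opencache else set()
    return sum(n for name, n in per_dir_rpcs.items() if name not in saved)

print("RPCs per directory, no opencache:", total_per_dir(False))
print("RPCs per directory, opencache:   ", total_per_dir(True))
print("RPCs per hard-linked file:       ", sum(per_file_rpcs.values()))
```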

Also, you can try disabling debug (if you haven't already) to see how big an impact that makes. Debug used to affect metadata loads a lot, though
with recent debug-level adjustments I think it has somewhat improved.

Bye,
   Oleg
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
