Since Bug 22492 hit a lot of people, it sounds like opencache isn't generally useful unless it's enabled on every node. Is there an easy way to force files out of the cache (e.g., echo 3 > /proc/sys/vm/drop_caches)?

Kevin


On Aug 3, 2010, at 11:50 AM, Oleg Drokin <[email protected]> wrote:

Hello!

On Aug 3, 2010, at 12:49 PM, Daire Byrne wrote:
So even with the metadata going over NFS the opencache in the client seems to make quite a difference (I'm not sure how much the NFS client caches though). As expected I see no mdt activity for the NFS export once cached. I think it would be really nice to be able to enable the opencache on any lustre client. A couple of potential workloads that I
A simple workaround for you to enable opencache on a specific client would be to add cr_flags |= MDS_OPEN_LOCK; in mdc/mdc_lib.c:mds_pack_open_flags()
Yea that works - cheers. FYI some comparisons with a simple find on a
remote client (~33,000 files):

find /mnt/lustre (not cached) = 41 secs
find /mnt/lustre (cached) = 19 secs
find /mnt/lustre (opencache) = 3 secs
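For scale, the quoted timings work out to roughly these per-file costs (a back-of-the-envelope sketch; the ~33,000-file count and all three timings are taken from the measurements above):

```python
# Rough per-file metadata cost derived from the quoted "find" timings.
FILES = 33_000

timings = {
    "not cached": 41.0,   # seconds, from the measurements above
    "cached":     19.0,
    "opencache":   3.0,
}

for label, secs in timings.items():
    per_file_ms = secs / FILES * 1000
    print(f"{label:>10}: {secs:5.1f} s total, {per_file_ms:.3f} ms/file")

speedup = timings["not cached"] / timings["opencache"]
print(f"opencache speedup vs. cold: {speedup:.1f}x")
```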

Hm, initially I was going to say that find is not open-intensive, so it should
not benefit from opencache at all.
But then I realized that if you have a lot of dirs, then indeed there would be a
positive impact on subsequent reruns.
I assume the opencache result is from a second run, and the first run produces
the same 41 seconds?

BTW, another unintended side-effect you might experience on a network with a mix of opencache-enabled and -disabled clients: if you run something (or open it for write) on an opencache-enabled client, you might have problems writing to (or executing)
that file from non-opencache-enabled nodes for as long as the file handle
remains cached on that client. This is because if the open lock was not requested, we don't try to invalidate the current ones (expensive), so the MDS thinks the file is genuinely open for write/execution and disallows conflicting accesses
with EBUSY.
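A toy model of that interaction (hypothetical class and method names, purely illustrative of the decision described above — not real MDS code):

```python
import errno

class MdsSketch:
    """Toy model of the open-lock/EBUSY interaction described above.

    All names here are hypothetical; this only illustrates the
    decision logic, not actual Lustre MDS internals.
    """
    def __init__(self):
        self.cached_write_opens = {}   # path -> client holding an open lock

    def open_write(self, client, path, request_open_lock):
        if request_open_lock:
            # opencache client: the handle stays cached under an open lock
            self.cached_write_opens[path] = client
        return 0

    def open_exec(self, client, path, request_open_lock):
        holder = self.cached_write_opens.get(path)
        if holder and holder != client:
            if not request_open_lock:
                # No open lock requested, so the MDS does not go through
                # the (expensive) invalidation of the cached handle; it
                # assumes the file is genuinely open for write.
                return -errno.EBUSY
            # With an open lock requested, the MDS would revoke the
            # holder's cached open first (not modelled here).
            del self.cached_write_opens[path]
        return 0

mds = MdsSketch()
mds.open_write("clientA", "/mnt/lustre/prog", request_open_lock=True)
rc = mds.open_exec("clientB", "/mnt/lustre/prog", request_open_lock=False)
print(rc == -errno.EBUSY)   # the conflicting exec from the non-opencache node fails
```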

performance when compared to something simpler like NFS. Slightly off
topic (and I've kinda asked this before), but is there a good reason
why link() speeds in Lustre are so slow compared to something like NFS? A quick comparison of doing a "cp -al" from a remote Lustre client and
an NFS client (to a fast NFS server):

cp -fa /mnt/lustre/blah /mnt/lustre/blah2 = ~362 files/sec
cp -fa /mnt/nfs/blah /mnt/nfs/blah2 = ~1863 files/sec

Is it just the extra depth of the lustre stack/code path? Is there
anything we could do to speed this up if we know that no other client
will touch these dirs while we hardlink them?

Hm, this is the first complaint about this that I've heard.
I just looked into an strace of cp -fal (which I guess you meant instead of just -fa, which
would just copy everything).

So we traverse the tree down, creating the directory structure in parallel first (or just doing it in readdir order):

open("/mnt/lustre/a/b/c/d/e/f", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
+1 RPC

fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
+1 RPC (if no opencache)

fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
getdents(3, /* 4 entries */, 4096)      = 96
getdents(3, /* 0 entries */, 4096)      = 0
+1 RPC

close(3)                                = 0
+1 RPC (if no opencache)

lstat("/mnt/lustre/a/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
(should be cached, so no RPC)

mkdir("/mnt/lustre/blah2/b/c/d/e/f/g", 040755) = 0
+1 RPC

lstat("/mnt/lustre/blah2/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
+1 RPC

stat("/mnt/lustre/blah2/b/c/d/e/f/g", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
(should be cached, so no RPC)

Then we get to files:
link("/mnt/lustre/a/b/c/d/e/f/g/k/8", "/mnt/lustre/blah2/b/c/d/e/f/g/k/8") = 0
+1 RPC

futimesat(AT_FDCWD, "/mnt/lustre/blah2/b/c/d/e/f/g/k", {{1280856246, 0}, {1280856291, 0}}) = 0
+1 RPC

Then we start traversing the just-created tree back up and chowning it:
chown("/mnt/lustre/blah2/b/c/d/e/f/g/k", 0, 0) = 0
+1 RPC

getxattr("/mnt/lustre/a/b/c/d/e/f/g/k", "system.posix_acl_access", 0x7fff519f0950, 132) = -1 ENODATA (No data available)
+1 RPC

stat("/mnt/lustre/a/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 (not sure why another stat here, we already did it on the way up. Should be cached)

setxattr("/mnt/lustre/blah2/b/c/d/e/f/g/k", "system.posix_acl_access", "\x02\x00\x00\x00\x01\x00\x07\x00\xff\xff\xff\xff\x04\x00\x05\x00\xff\xff\xff\xff \x00\x05\x00\xff\xff\xff\xff", 28, 0) = 0
+1 RPC

getxattr("/mnt/lustre/a/b/c/d/e/f/g/k", "system.posix_acl_default", 0x7fff519f0950, 132) = -1 ENODATA (No data available)
+1 RPC

stat("/mnt/lustre/a/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
Hm, stat again? Didn't we do it a few syscalls back?

stat("/mnt/lustre/blah2/b/c/d/e/f/g/k", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
Stat of the target. +1 RPC (the cache got invalidated by the link above).

setxattr("/mnt/lustre/blah2/b/c/d/e/f/g/k", "system.posix_acl_default", "\x02\x00\x00\x00", 4, 0) = 0
+1 RPC


So I guess there is a certain number of stat RPCs that would not be present on NFS due to the different ways the caching works, plus all the getxattrs. Not sure if this
is enough to explain the ~5x rate difference (1863 vs. 362 files/sec).
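Tallying the +1 RPC annotations from the strace walk above gives a rough per-directory and per-file RPC budget (a sketch based only on the syscalls shown; real counts will vary with cache state):

```python
# Per-syscall RPC tally for one directory in the "cp -al" walk above,
# as annotated in the strace. With opencache, the fstat and close
# RPCs are saved because the open handle stays cached.
per_dir_rpcs = {
    "open":                    1,
    "fstat":                   1,   # saved with opencache
    "getdents":                1,
    "close":                   1,   # saved with opencache
    "mkdir":                   1,
    "lstat (new dir)":         1,
    "futimesat":               1,
    "chown":                   1,
    "getxattr (acl_access)":   1,
    "setxattr (acl_access)":   1,
    "getxattr (acl_default)":  1,
    "stat (target)":           1,
    "setxattr (acl_default)":  1,
}
per_file_rpcs = {"link": 1}

def total_per_dir(opencache):
    saved = {"fstat", "close"} if opencache else set()
    return sum(n for name, n in per_dir_rpcs.items() if name not in saved)

print("RPCs per directory, no opencache:", total_per_dir(False))
print("RPCs per directory, opencache:   ", total_per_dir(True))
print("RPCs per hard-linked file:       ", sum(per_file_rpcs.values()))
```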

Also, you can try disabling debug (if you haven't already) to see how big an impact that makes. Debug used to affect metadata loads a lot, though
with recent debug-level adjustments I think it has somewhat improved.

Bye,
   Oleg
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
