J.R.,

On Thu, Dec 17, 2009 at 2:36 AM, <[email protected]> wrote:
>
> Daire Byrne:
>> I am looking to create a cache of commonly used read-only data but
>> with an underlying NFS filesystem which we resort to only if it is not
>> in the local cache. The problem now is that aufs still goes to the
>> network even if the files are in the top (local) branch.
>
> Let me make sure,
> - you have two branches
> - the upper branch is a local filesystem, commonly used one instead of
>   exotic rare fs
> - the lower is NFS
> - the target is a regular file, instead of a directory
>
> When "fileA" exists on both of two branches and you access it through
> aufs, you observed aufs accesses the lower fileA on NFS.
> If so, that is weird. Give me these info.

I am more concerned with the actual metadata going across the network when the files are local. Say I have thousands of files in a dirtree that are also on the remote filesystem and are all exactly the same - I don't want to even stat the remote files, even when I will then read from the local file. That said, if there is one file remotely that is not local then obviously go and look at it remotely. The metadata lookups for many small files on a high latency link (think VPN) can slow things down a lot.

Here is roughly what I am doing to test this:

  mount server:/test /mnt/test
  mksquashfs /mnt/test/ /tmp/test.sqfs
  mount -o loop /tmp/test.sqfs /mnt/test-local
  touch /mnt/test/new
  mount -t aufs -o ro,br:/mnt/test-local aufs /mnt/test-aufs
  /usr/sbin/time -f %e find /mnt/test-aufs
  mount -t aufs -o remount,append:/mnt/test aufs /mnt/test-aufs
  /usr/sbin/time -f %e find /mnt/test-aufs

I can watch the network traffic with tcpdump and see that for every (?) file there are lots of NFS ops (plenty of "readdirplus" for example) to the NFS server. The time taken to list all the files increases (0.4s -> 15s in my simple test).

I'm wondering if it is possible to tell aufs never to check the NFS branch if the file exists on the local branch. I understand that in order to make a union the directory contents of the remote filesystem need to be known, but is there any way to minimise the traffic so that this operation is only done once (for each dir?). I can probably turn up the NFS attribute caching but this doesn't help much with the very first read. Even with something like FS-Cache for NFS there is always this network metadata overhead which kills performance across slow links.

If I know what portions of the data are read-only then it would be great if I could use aufs to only read it locally even when unioned with an NFS filesystem. I could then also use the local data when "offline" and add the NFS branch when "online".

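A rough sketch of the online/offline switch I have in mind. It assumes aufs accepts "del:" on remount as the mirror of the "append:" used in my test; the helper name aufs_switch is purely illustrative and not part of aufs, and the script only prints the commands instead of running them:

```shell
#!/bin/sh
# Sketch only: print (do not run) the remount command that would add or
# drop the NFS branch of the union. Paths are the ones from the test setup.
NFS_BRANCH=/mnt/test
UNION=/mnt/test-aufs

# aufs_switch is a hypothetical helper name, not something aufs provides.
aufs_switch() {
    case "$1" in
        online)  opt="append:$NFS_BRANCH" ;;  # add NFS as the lowest branch
        offline) opt="del:$NFS_BRANCH" ;;     # assumes the aufs del: remount option
        *) echo "usage: aufs_switch online|offline" >&2; return 1 ;;
    esac
    echo "mount -t aufs -o remount,$opt aufs $UNION"
}

aufs_switch online
aufs_switch offline
```

Printing the command keeps the sketch safe to run without root; the echoed line would have to be executed as root on a box with aufs loaded.
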
And conversely I could transparently delete the branch again when going offline.

I can send you the extra debug data but I suspect that AUFS was never designed to be able to do what I am describing so it is not a "bug". Is it possible though?

Daire
