J.R.,

On Thu, Dec 17, 2009 at 2:36 AM,  <[email protected]> wrote:
>
> Daire Byrne:
>> I am looking to create a cache of commonly used read-only data but
>> with an underlying NFS filesystem which we resort to only if it is not
>> in the local cache. The problem now is that aufs still goes to the
>> network even if the files are in the top (local) branch.
>
> Let me make sure,
> - you have two branches
> - the upper branch is a local filesystem, commonly used one instead of
>  exotic rare fs
> - the lower is NFS
> - the target is a regular file, instead of a directory
>
> When "fileA" exists on both of two branches and you access it through
> aufs, you observed aufs accesses the lower fileA on NFS.
> If so, that is weird. Give me these info.

I am more concerned with the actual metadata going across the network
when the files are local. Say I have thousands of files in a dirtree
that are also on the remote filesystem and are all exactly the same -
I don't want to even stat the remote files when I am only going to read
from the local copy. That said, if there is one file remotely that
is not local then obviously go and look at it remotely. The metadata
lookups for many small files on a high latency link (think VPN) can
slow things down a lot. Here is roughly what I am doing to test this:

  mount server:/test /mnt/test
  mksquashfs /mnt/test/ /tmp/test.sqfs
  mount -o loop /tmp/test.sqfs /mnt/test-local
  touch /mnt/test/new
  mount -t aufs -o ro,br:/mnt/test-local aufs /mnt/test-aufs
  /usr/bin/time -f %e find /mnt/test-aufs
  mount -t aufs -o remount,append:/mnt/test aufs /mnt/test-aufs
  /usr/bin/time -f %e find /mnt/test-aufs

I can watch the network traffic with tcpdump and see that for every
(?) file there are lots of NFS ops (plenty of "readdirplus", for
example) to the NFS server. The time taken to list all the files
increases (0.4s -> 15s in my simple test). I'm wondering if it is
possible to tell aufs never to check the NFS branch if the file exists
on the local branch. I understand that in order to make a union the
directory contents of the remote filesystem need to be known, but is
there any way to minimise the traffic so that this operation is only
done once (for each dir?)? I can probably turn up the NFS attribute
caching but this doesn't help much with the very first read.
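
For illustration, turning up the NFS attribute caching might look like
the sketch below. It only prints the mount command rather than running
it. actimeo and nocto are standard NFS client mount options; the
3600 second timeout and the helper name nfs_cache_opts are assumptions
made for this example, not values taken from the test above.

```shell
#!/bin/sh
# Sketch only: build NFS mount options with a long attribute cache timeout.
# The timeout value is an illustrative assumption, not a recommendation.
nfs_cache_opts() {
    # $1 = attribute cache timeout in seconds (sets all four ac* timers)
    printf 'ro,actimeo=%s,nocto' "$1"
}

# Print the command instead of running it (mounting needs root).
echo "mount -t nfs -o $(nfs_cache_opts 3600) server:/test /mnt/test"
```

Even with such options the first walk over the tree still has to fetch
the attributes once, so this would only be expected to help repeat runs.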

Even with something like FS-Cache for NFS there is always this network
metadata overhead which kills performance across slow links. If I know
what portions of the data are read-only then it would be great if I
could use aufs to only read it locally even when unioned with an NFS
filesystem. I could then also use the local data when "offline" and
add the NFS branch when "online". And conversely I could transparently
delete the branch again when going offline.
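
The online/offline switching described above could be sketched as
follows, using the aufs remount options append: and del: with the paths
from the test setup. The function names are hypothetical, and the script
only prints the commands so that nothing is mounted by accident.

```shell
#!/bin/sh
# Sketch only: print the aufs remount commands for going online/offline.
# Paths follow the test setup earlier in this mail; run the output as root.
AUFS_MNT=/mnt/test-aufs
NFS_BRANCH=/mnt/test

go_online_cmd() {
    # Append the NFS branch below the existing local branch.
    echo "mount -t aufs -o remount,append:$NFS_BRANCH aufs $AUFS_MNT"
}

go_offline_cmd() {
    # Remove the NFS branch again; this may fail while the branch is busy.
    echo "mount -t aufs -o remount,del:$NFS_BRANCH aufs $AUFS_MNT"
}

go_online_cmd
go_offline_cmd
```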

I can send you the extra debug data but I suspect that AUFS was never
designed to be able to do what I am describing so it is not a "bug".
Is it possible though?

Daire
