I implemented a working low-level FUSE proxy fs, which is usable for simple
directory traversal. It's a simple single-threaded, read-only proxyfs and
doesn't have the whole suite of optimisations that the fuse guys have
implemented.
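
To give an idea of what the low-level side looks like, here is a stripped-down
sketch of the setup it is built around (libfuse 2.x style; the backing path and
the proxy_* names are made up for this example, and unlike the real code it
only handles the root inode):

/* Minimal sketch of a read-only low-level FUSE pass-through, assuming
 * libfuse 2.x. "/srv/backing" and the proxy_* names are placeholders. */
#define FUSE_USE_VERSION 26
#include <fuse_lowlevel.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <errno.h>

static const char *backing = "/srv/backing";

static void proxy_getattr(fuse_req_t req, fuse_ino_t ino,
                          struct fuse_file_info *fi)
{
    struct stat st;
    (void) fi;
    if (ino != FUSE_ROOT_ID) {          /* this sketch proxies root only */
        fuse_reply_err(req, ENOENT);
        return;
    }
    if (stat(backing, &st) == -1)
        fuse_reply_err(req, errno);
    else
        fuse_reply_attr(req, &st, 1.0); /* 1s attribute timeout */
}

static struct fuse_lowlevel_ops proxy_ops = {
    .getattr = proxy_getattr,
    /* lookup, readdir, open, read ... omitted in this sketch */
};

int main(int argc, char *argv[])
{
    struct fuse_args args = FUSE_ARGS_INIT(argc, argv);
    char *mountpoint = NULL;
    int err = -1;

    /* Standard libfuse 2.x single-threaded session setup */
    if (fuse_parse_cmdline(&args, &mountpoint, NULL, NULL) != -1) {
        struct fuse_chan *ch = fuse_mount(mountpoint, &args);
        if (ch) {
            struct fuse_session *se =
                fuse_lowlevel_new(&args, &proxy_ops, sizeof(proxy_ops), NULL);
            if (se) {
                if (fuse_set_signal_handlers(se) != -1) {
                    fuse_session_add_chan(se, ch);
                    err = fuse_session_loop(se);   /* single-threaded loop */
                    fuse_remove_signal_handlers(se);
                    fuse_session_remove_chan(ch);
                }
                fuse_session_destroy(se);
            }
            fuse_unmount(mountpoint, ch);
        }
    }
    fuse_opt_free_args(&args);
    free(mountpoint);
    return err ? 1 : 0;
}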

The low-level implementation is still highly unstable, and the deadlocks are
really hard to debug. I had a really tough time finding the problem: every
time, the process went into uninterruptible sleep, making it impossible to
kill! Even gdb wouldn't respond, and the only option left was to restart the
PC. It turned out that the kernel deadlocks in a recursive syscall into the
FUSE fs, and I still need to find a fix for it.
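
To show the kind of recursion that bites here (this is just an illustration,
not my actual handler): if a request handler touches a path that resolves
inside the proxy's own mountpoint, the kernel queues a new request for the
same single-threaded daemon that is already blocked serving the current one,
so the syscall never returns.

/* Illustration of the recursion hazard, not the real proxyfs code.
 * Assumes libfuse 2.x; "/mnt/proxy" stands in for this fs's own mountpoint. */
#define FUSE_USE_VERSION 26
#include <fuse_lowlevel.h>
#include <sys/stat.h>
#include <errno.h>

static void bad_getattr(fuse_req_t req, fuse_ino_t ino,
                        struct fuse_file_info *fi)
{
    struct stat st;
    (void) ino; (void) fi;

    /* This stat() resolves inside our own mount, so the kernel sends a new
     * FUSE request to the single thread that is already blocked right here:
     * the call never returns and the process sits in uninterruptible sleep. */
    if (stat("/mnt/proxy/some/file", &st) == -1)
        fuse_reply_err(req, errno);
    else
        fuse_reply_attr(req, &st, 1.0);
}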

I do realize that going down this route could take some effort, since it means
reimplementing all of the optimizations that the default FUSE high-level API
already provides, so I might end up copying over source files from the libfuse
project, or statically compiling the project against a modified libfuse. I
have also started to look into maintaining an inode table on top of the
high-level API and building caching on top of that, so that the optimisations
from FUSE come for free in exchange for some overhead from duplicating the
inode-to-path mappings, both at the FUSE level and at our fs level. The
codebase is pretty modular, so for now I can switch between the two APIs very
easily and am currently maintaining implementations over both of them.
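
To sketch what I mean by an inode table over the high-level API (names and
layout are made up for illustration; a real table would also need a reverse
path-to-inode lookup so entries aren't duplicated):

/* Rough sketch: a growable array mapping our own inode numbers to backing
 * paths, kept on top of the path-based high-level API. */
#include <stdlib.h>
#include <string.h>

struct inode_entry {
    char *path;              /* backing path this inode refers to           */
    unsigned long nlookup;   /* lookup count, mirrors what the kernel holds */
};

struct inode_table {
    struct inode_entry *entries;
    size_t used, cap;
};

/* Register a path and hand back an inode number (index + 1, since 0 is
 * never used as an inode here). */
static size_t itab_insert(struct inode_table *t, const char *path)
{
    if (t->used == t->cap) {
        size_t ncap = t->cap ? t->cap * 2 : 64;
        struct inode_entry *ne = realloc(t->entries, ncap * sizeof(*ne));
        if (!ne)
            return 0;                    /* allocation failure */
        t->entries = ne;
        t->cap = ncap;
    }
    t->entries[t->used].path = strdup(path);
    t->entries[t->used].nlookup = 1;
    return ++t->used;
}

/* Resolve one of our inode numbers back to the backing path. */
static const char *itab_path(struct inode_table *t, size_t ino)
{
    return (ino >= 1 && ino <= t->used) ? t->entries[ino - 1].path : NULL;
}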

At the same time I have also started to benchmark the different approaches. I
got Monkey running over the FUSE fs and over the standard fs and then compared
them, but the initial benchmarks weren't very good: the numbers are really bad
with proxyfs. The fuse guys have an overlay fs in their codebase, and even
running the benchmark over that gives some really bad numbers. At concurrent
connection counts in the hundreds, it seems like the context-switching
overhead of the FUSE fs really starts to push the throughput down.

I used wrk as the benchmarking tool (https://github.com/wg/wrk), running with
400 concurrent connections for a duration of 30s. Using the standard fs over
Monkey we get:
Requests/sec:  29043.93
Transfer/sec:    739.06MB

But over proxyfs we get:

Requests/sec:   1517.46
Transfer/sec:     38.66MB

And with the fusexmp_fh overlay fs from the libfuse sources we get results
similar to proxyfs:

Requests/sec:   1418.41
Transfer/sec:     36.12MB

Something is going terribly wrong in the FUSE world. Maybe it's the concurrent
I/O requests, but I don't really know yet; running callgrind over the program
suggests that most of the time is spent in the libfuse world. I have started a
discussion on the libfuse-devel mailing list and am looking to get their
advice.

I will continue to track the perf numbers from now on. Anyway, I have started
initial work on a caching proxyfs. Currently it keeps file descriptors open
for longer periods of time (even after the server has closed them) so that the
kernel doesn't drop its caches, and I will soon land an implementation using
mmap and continue the experiments. The modular structure of the project turned
out to be really useful, as I have multiple approaches and experiments all in
the same codebase (using both the high-level FUSE API and the low-level one),
and they can easily be turned on and off individually from the makefile and
compiled into a FUSE fs.
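
Here is roughly what the fd-caching part looks like in spirit (a simplified
sketch with made-up names and a fixed-size table, not the actual
implementation): release() parks the descriptor instead of closing it, so the
kernel keeps its page cache for that file warm.

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define FD_CACHE_SLOTS 128

struct fd_slot {
    char path[512];
    int  fd;          /* -1 means the slot is empty */
};

static struct fd_slot fd_cache[FD_CACHE_SLOTS];

/* Call once at startup to mark every slot as empty. */
static void fd_cache_init(void)
{
    for (int i = 0; i < FD_CACHE_SLOTS; i++)
        fd_cache[i].fd = -1;
}

/* Return a cached descriptor for path, opening (and caching) it on a miss. */
static int fd_cache_open(const char *path)
{
    int free_slot = -1;
    for (int i = 0; i < FD_CACHE_SLOTS; i++) {
        if (fd_cache[i].fd >= 0 && strcmp(fd_cache[i].path, path) == 0)
            return fd_cache[i].fd;              /* hit: reuse the open fd */
        if (fd_cache[i].fd < 0 && free_slot < 0)
            free_slot = i;
    }
    int fd = open(path, O_RDONLY);
    if (fd >= 0 && free_slot >= 0) {
        strncpy(fd_cache[free_slot].path, path,
                sizeof(fd_cache[free_slot].path) - 1);
        fd_cache[free_slot].path[sizeof(fd_cache[free_slot].path) - 1] = '\0';
        fd_cache[free_slot].fd = fd;
    }
    return fd;
}

/* Called from the fs release handler: keep cached descriptors open, and only
 * close fds that never made it into the cache. */
static void fd_cache_release(int fd)
{
    for (int i = 0; i < FD_CACHE_SLOTS; i++)
        if (fd_cache[i].fd == fd)
            return;                             /* cached: keep it open */
    close(fd);
}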

Permanent link: http://ziahamza.wordpress.com/2013/06/30/weekly-progress-3/