[9fans] Re: Fwd: Reading from FS with inaccurate file sizes?

Amit Singh Thu, 29 Mar 2007 01:19:59 -0800

On Mar 27, 6:20 am, [EMAIL PROTECTED] (Russ Cox) wrote:
> To be fair, these are the kinds of mistakes I would expect any
> Unix-mindset implementation to make, and it surprised me quite
> a bit that Linux FUSE got so much of this right from the start
> (or at least from when I started using it).  I wonder how many
> of these mistakes BSD FUSE makes.


You're assuming quite a bit here, especially in concluding that these
are "mistakes" that you "expect" because of a "Unix-mindset"
implementation.

BTW, I don't know when you started using FUSE on Linux, but it's been
there on Linux at least since 2001. MacFUSE came out in 2007, so your
surprise is surprising.

> Synthetic file systems tend not to care about the
> offset on writes anyway.

And the Mac OS X VFS kernel extension environment isn't exactly geared
towards synthetic file systems. OS X may have an open source kernel,
*but* it's not practical to write kernel extensions that require
kernel changes. Therefore, what a kernel extension can do is limited
by the interfaces and data available to it in a stock kernel. In the
case of reads/writes when the advertised size is 0, you run into the
unified buffer cache, which really wants to believe the file size. To
get around this, MacFUSE must explicitly implement separate read/write
paths from the vnode operations to user space and back. Release 0.2.2
does this for reads if you use the 'direct_io' option. In other words,
if you add the 'direct_io' option when mounting, what you are looking
for should already work. Note that you will have no buffer cache
(which is what you'd want anyway in this case).
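To make the size-0 problem concrete, here is a small user-space C model of the two read paths described above. It is a hypothetical simulation, not MacFUSE or kernel code: `synth_file`, `daemon_read`, `cached_read` and `direct_read` are invented names. `cached_read` stands in for a buffer cache that trusts the advertised size, and `direct_read` for a 'direct_io'-style path that forwards every read to the user-space daemon.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical model, not MacFUSE source: a synthetic file whose
 * contents are produced on demand but whose advertised size is 0. */
typedef struct {
    off_t advertised_size;   /* size reported by getattr */
    const char *contents;    /* bytes the user-space daemon would return */
} synth_file;

/* Stand-in for the daemon's read handler: serves the real contents. */
static ssize_t daemon_read(const synth_file *f, char *buf, size_t len, off_t off)
{
    assert(f != NULL && buf != NULL);
    size_t total = strlen(f->contents);
    if (off < 0 || (size_t)off >= total)
        return 0;
    size_t n = total - (size_t)off;
    if (n > len)
        n = len;
    memcpy(buf, f->contents + off, n);
    return (ssize_t)n;
}

/* Cached path (modelled): trusts the advertised size and never asks
 * the daemon for bytes beyond it. */
static ssize_t cached_read(const synth_file *f, char *buf, size_t len, off_t off)
{
    if (off >= f->advertised_size)
        return 0;                       /* looks like EOF to the caller */
    size_t avail = (size_t)(f->advertised_size - off);
    if (len > avail)
        len = avail;                    /* never fetch past the believed size */
    return daemon_read(f, buf, len, off);
}

/* Direct path (modelled): every read goes to the daemon; size is ignored. */
static ssize_t direct_read(const synth_file *f, char *buf, size_t len, off_t off)
{
    return daemon_read(f, buf, len, off);
}
```

With an advertised size of 0 and contents `"hello\n"`, `cached_read` at offset 0 returns 0 (an apparent EOF), while `direct_read` returns all 6 bytes.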

'direct_io' doesn't do anything for writes in Release 0.2.2. It'd be
straightforward to expand the write implementation. A future release
of MacFUSE might have it.

> MacFUSE also seems to employ some subterfuge where fds
> do not map one-to-one with FUSE file handles.  Another bug I've
> filed: http://code.google.com/p/macfuse/issues/detail?id=133

The subterfuge is intentional and necessary in the current design. The
open() and close() vnode operations of MacFUSE *do not* have access to
the file descriptor in question. The data structures involved are
opaque, so it'd be quite ugly and unmaintainable to try to get at the
descriptor by brute force. Without the descriptor, you can't
match opens and closes. Along the same lines, MacFUSE can only look at
the vnode, *not* at file structures, which are inaccessible, so it
can't track connections between file structures and FUSE file handles.
Therefore, as a matter of feasibility and simplicity, MacFUSE shares
file handles when possible, with reference counting. For multiple
opens of a single given file, you won't see every open invocation go
up to user space unless the open flags are different from a previous
invocation.
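One way to picture the sharing scheme is the C sketch below. It is hypothetical: `vnode_handles`, the three access-mode slots and `vn_open`/`vn_close` are illustrative names rather than MacFUSE internals, and the upcall to the daemon is simulated by a counter. It shows the idea described above: opens with the same access mode reuse one FUSE file handle, and a reference count decides when the last close releases it, so no per-descriptor bookkeeping is required.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>

/* Hypothetical per-vnode handle table, one slot per access mode. */
enum { SLOT_RDONLY, SLOT_WRONLY, SLOT_RDWR, NSLOTS };

typedef struct {
    uint64_t fh;        /* FUSE file handle obtained from the daemon */
    int refcount;       /* kernel-level opens currently sharing it */
} handle_slot;

typedef struct {
    handle_slot slots[NSLOTS];
    uint64_t next_fh;   /* simulates the daemon handing out handles */
    int upcalls;        /* opens that actually reached user space */
} vnode_handles;

static int slot_for_flags(int flags)
{
    switch (flags & O_ACCMODE) {
    case O_WRONLY: return SLOT_WRONLY;
    case O_RDWR:   return SLOT_RDWR;
    default:       return SLOT_RDONLY;
    }
}

/* Open: reuse the handle for this access mode if one exists;
 * otherwise simulate an OPEN upcall to the daemon. */
static uint64_t vn_open(vnode_handles *vh, int flags)
{
    handle_slot *s = &vh->slots[slot_for_flags(flags)];
    if (s->refcount == 0) {
        s->fh = ++vh->next_fh;
        vh->upcalls++;
    }
    s->refcount++;
    return s->fh;
}

/* Close: only the last close for a mode drops the handle.
 * Returns 1 if a RELEASE upcall would be sent, 0 otherwise. */
static int vn_close(vnode_handles *vh, int flags)
{
    handle_slot *s = &vh->slots[slot_for_flags(flags)];
    assert(s->refcount > 0);
    if (--s->refcount == 0) {
        s->fh = 0;
        return 1;
    }
    return 0;
}
```

In this model, two `O_RDONLY` opens followed by an `O_WRONLY` open cause only two upcalls; the first `O_RDONLY` close returns 0 and the second returns 1.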

> On Linux apparently things happen the other way around:
> O_TRUNC is never sent, but O_APPEND is sent for >> opens.
> MacFUSE doesn't send either, which is another bug I've
> filed: http://code.google.com/p/macfuse/issues/detail?id=132

Right now, MacFUSE distinguishes between 3 types of open for a given
file: O_RDONLY, O_WRONLY, and O_RDWR. Since write handles could be
shared, adding O_APPEND to the mix means we essentially have two
additional types of open that MacFUSE must track. This isn't a big
deal in the long run, but the extra complexity wasn't justified in
MacFUSE's nascent days, even though it meant sacrificing some arguably
contrived semantics. I say contrived because O_APPEND is still handled
correctly by the kernel (if you report the correct file size); the
flag simply isn't passed to user space. So, as you said in your bug
report, this matters in cases like "shared append-only files on the
server side".
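The C sketch below illustrates the bookkeeping cost described above. It is hypothetical and uses invented names (`open_kind`, `classify_open`, `append_write_offset`): keeping O_APPEND write handles separate from plain ones grows the set of handle kinds from three to five, and a kernel-style append takes the write offset from the file size the filesystem reports, which is why O_APPEND still behaves correctly without the flag reaching user space, provided that size is right.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical: the handle kinds a filesystem would track if O_APPEND
 * write handles must not be shared with non-append ones. */
enum open_kind {
    KIND_RDONLY,
    KIND_WRONLY,
    KIND_RDWR,
    KIND_WRONLY_APPEND,   /* additional kind 1 */
    KIND_RDWR_APPEND,     /* additional kind 2 */
    NKINDS
};

static enum open_kind classify_open(int flags)
{
    int append = (flags & O_APPEND) != 0;
    switch (flags & O_ACCMODE) {
    case O_WRONLY: return append ? KIND_WRONLY_APPEND : KIND_WRONLY;
    case O_RDWR:   return append ? KIND_RDWR_APPEND : KIND_RDWR;
    default:       return KIND_RDONLY;   /* O_APPEND has no effect on reads */
    }
}

/* Modelled kernel-side append: with O_APPEND the offset of each write is
 * the current file size as reported, so a wrong size gives a wrong offset. */
static off_t append_write_offset(int flags, off_t requested, off_t reported_size)
{
    assert(reported_size >= 0);
    return (flags & O_APPEND) ? reported_size : requested;
}
```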

Hope this clarifies things.
