Hello there,

Some of you may remember a message I sent last millennium detailing the
problems we've been having with poor write performance on AFS, caused
by the AFS cache manager insisting on doing a series of read RPC calls
before it does a bunch of write RPC calls ... even for newly created
files.

Anyway, I did some tiptoeing through the AFS sources, and I found what
may be part of the reason.

In afs_dcache.c:afs_GetDCache(), I found this very interesting comment:

    /*
     * Not a newly created file so we need to check the file's length and
     * compare data versions since someone could have changed the data or we're
     * reading a file written elsewhere. We only want to bypass doing no-op
     * read rpcs on newly created files (dv of 0) since only then we guarantee
     * that this chunk's data hasn't been filled by another client.
     */
    if (!hiszero(avc->m.DataVersion))
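
For anyone without the sources handy, here's roughly the shape of that
decision.  Everything below is a simplified stand-in of my own (the
struct, the helper names), not the literal AFS code:

    /* Stand-ins, not the real AFS declarations. */
    struct hyper { unsigned int high, low; };  /* AFS's 64-bit hyper */
    #define hiszero(h) ((h).high == 0 && (h).low == 0)

    static void fetch_chunk(void)     { /* the read RPC */ }
    static void zero_fill_chunk(void) { /* purely local fill */ }

    static void get_dcache_sketch(struct hyper dv)
    {
        if (!hiszero(dv)) {
            /* dv != 0: another client may have written this chunk,
             * so go to the fileserver; this is the "no-op" read RPC
             * we keep seeing. */
            fetch_chunk();
        } else {
            /* dv == 0: the file was just created here, the chunk is
             * known to be empty, and the fetch can be skipped. */
            zero_fill_chunk();
        }
    }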

And sure enough, using the AFS trace facility I was able to see that the
DataVersion of the newly created file is nonzero (in fact, it's 1).

So, where does this get set?  As far as I could tell, it comes from
the results of the AFS CreateFile RPC call: the DataVersion returned
by that call is 1, and that value seems to get propagated up through
the various structures until it reaches the afs_GetDCache() call.
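
To see why that matters, here's a toy model of the merge.  The
structures and the merge function are simplified stand-ins of my own
(the real work appears to happen in the status-merging code,
afs_ProcessFS if I'm reading it right), but the net effect is the same:

    #include <stdio.h>

    /* Simplified stand-ins; the real layouts live in the AFS headers
     * and RPC stubs. */
    struct fetch_status { unsigned int DataVersion; };  /* RPC reply */
    struct vcache_meta  { unsigned int DataVersion; };  /* like avc->m */

    /* Whatever the fileserver returned is copied straight into the
     * vcache, so a CreateFile reply carrying dv == 1 leaves the
     * vcache at 1. */
    static void merge_status(struct vcache_meta *m,
                             const struct fetch_status *st)
    {
        m->DataVersion = st->DataVersion;
    }

    int main(void)
    {
        struct fetch_status create_reply = { 1 };  /* CreateFile result */
        struct vcache_meta m = { 0 };

        merge_status(&m, &create_reply);
        /* DataVersion is now 1, so the hiszero() test above fails and
         * the read RPCs get issued for a brand-new file. */
        printf("DataVersion after create: %u\n", m.DataVersion);
        return 0;
    }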

At this point I'm not really sure what the real problem is.  I think
either (a) the fileserver shouldn't return a 1 for the DataVersion,
(b) the cache manager should set the dv to 0 for newly created files,
or (c) the test in afs_GetDCache() should be changed to something
else.
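
To give option (c) some shape, here's one way the test could change,
again sketched against simplified stand-ins.  The CNewlyCreated flag is
entirely hypothetical (no such bit exists in the sources as far as I
know); the idea is to key the bypass off "we created this file" rather
than off dv == 0:

    struct hyper { unsigned int high, low; };  /* stand-in, as before */
    #define hiszero(h) ((h).high == 0 && (h).low == 0)

    #define CNewlyCreated 0x8000  /* HYPOTHETICAL bit in avc->states */

    struct vcache_sketch {
        struct hyper DataVersion;  /* stands in for avc->m.DataVersion */
        unsigned int states;       /* stands in for avc->states */
    };

    /* Create path: remember that *we* made this file. */
    static void note_created_locally(struct vcache_sketch *avc)
    {
        avc->states |= CNewlyCreated;
    }

    /* GetDCache path: skip the fetch when dv is zero (the current
     * test) or when the file is one we just created ourselves. */
    static int must_fetch(const struct vcache_sketch *avc)
    {
        return !hiszero(avc->DataVersion)
            && !(avc->states & CNewlyCreated);
    }

The flag would of course have to be cleared as soon as another client
could have touched the file (callback break, vcache reuse, and so on),
otherwise we'd skip fetches we actually need.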

Does anyone have any ideas?  I'm willing to entertain the notion that
my analysis is incomplete/wrong, but it's the best I could come up with.

BTW, does anyone from Transarc read this list?  It would seem to me that
this is a serious problem, but I haven't really gotten a sense that anyone
from Transarc cares about it.

--Ken
