I always knew I did those tcpdump patches for AFS for a reason ... and I
finally found that reason!

I've just discovered some rather odd behavior in the AFS cache manager;
it appears in both 3.4a and 3.5 clients.  When a file is written, the
cache manager issues fetch RPCs that are clearly unnecessary, and they
end up hurting write performance significantly.

Here's an example.  In this trace, "q" is running an AFS 3.4a
fileserver (though the server version shouldn't matter), and "janeway"
is running an AFS 3.5 client.

Things start out normally:

11:38:06.150427 janeway.afscb > q.afsfs: rx data fs call create-file fid 
536872319/5/8094 "outfile" (84)
11:38:06.151098 q.afsfs > janeway.afscb: rx challenge (44)
11:38:06.153274 janeway.afscb > q.afsfs: rx response (132)
11:38:06.660034 q.afsfs > janeway.afscb: rx ack (61)
11:38:07.873589 q.afsfs > janeway.afscb: rx data fs reply create-file new fid 
536872319/16262/19711 (244)
11:38:07.878316 janeway.afscb > q.afsfs: rx data fs call store-status fid 
536872319/16262/19711 (68)
11:38:07.879635 q.afsfs > janeway.afscb: rx data fs reply store-status (136)

The standard sort of things that happen when you create a file.  No
surprises there.

Here's where it gets weird.

11:38:07.890567 janeway.afscb > q.afsfs: rx data fs call fetch-data fid 
536872319/16262/19711 offset 0 length 16384 (52)
11:38:07.891496 q.afsfs > janeway.afscb: rx data fs reply fetch-data (152)
11:38:07.897972 janeway.afscb > q.afsfs: rx data fs call fetch-data fid 
536872319/16262/19711 offset 65536 length 16384 (52)

Now, this is odd ... why are we issuing fetch RPCs when we've just created
this file?

This happens to be a large file (700 MB), larger than the cache size on
this machine, but the same behavior occurs no matter what the file size is.
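
My working theory, sketched below in Python: before writing into a chunk,
the cache manager insists the chunk be resident in the cache, and fetches
it from the fileserver if it isn't -- even when the file was just created
and the chunk can't contain any useful data.  All names here (CacheManager,
fetch_rpc, CHUNK_SIZE) are illustrative assumptions, not the real AFS
client code.

```python
CHUNK_SIZE = 64 * 1024  # chunk granularity suggested by the fetch offsets below

class CacheManager:
    def __init__(self, fetch_rpc):
        self.cached = set()         # chunk indices currently in the disk cache
        self.fetch_rpc = fetch_rpc  # issues a fetch-data RPC to the fileserver
        self.fetches = 0            # count of fetch RPCs issued

    def write(self, offset, data):
        first = offset // CHUNK_SIZE
        last = (offset + len(data) - 1) // CHUNK_SIZE
        for chunk in range(first, last + 1):
            if chunk not in self.cached:
                # The suspected bug: the chunk must be resident before it
                # can be written, so a fetch RPC goes out -- even though the
                # file was just created and holds no useful data there.
                self.fetch_rpc(chunk * CHUNK_SIZE, CHUNK_SIZE)
                self.fetches += 1
                self.cached.add(chunk)
        # ... the data is then copied into the now-resident chunk(s) ...
```

Under this model, sequentially writing a freshly created file generates one
needless fetch RPC per 64 KB chunk, which matches the 64 KB spacing of the
fetch offsets in the trace.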

Anyway, these fetches continue until the cache is full on the client:

11:41:07.986469 janeway.afscb > q.afsfs: rx data fs call fetch-data fid 
536872319/16262/19711 offset 409468928 length 16384 (52)
11:41:07.987155 q.afsfs > janeway.afscb: rx data fs reply fetch-data (152)
11:41:08.003740 janeway.afscb > q.afsfs: rx data fs call fetch-data fid 
536872319/16262/19711 offset 409534464 length 16384 (52)
11:41:08.004459 q.afsfs > janeway.afscb: rx data fs reply fetch-data (152)
11:41:08.746114 janeway.afscb > q.afsfs: rx ack (65)
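
Assuming one fetch per 64 KB of file (the spacing between successive fetch
offsets in the trace), the number of needless fetch RPCs in this first round
works out to:

```python
CHUNK = 64 * 1024            # spacing between successive fetch offsets
first_offset = 0             # first fetch offset in the trace
last_offset = 409534464      # last fetch offset before the stores begin
rpcs = (last_offset - first_offset) // CHUNK + 1
print(rpcs, "fetch RPCs before the first store-data call")
```

That's roughly 6250 round trips to the fileserver before a single byte of
the new file has been stored.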

And then store RPCs start happening:

11:41:08.898095 janeway.afscb > q.afsfs: rx data fs call store-data fid 
536872319/16262/19711 offset 0 length 4194304 flen 409600000 (1444)
11:41:08.898435 q.afsfs > janeway.afscb: rx ack (62)
11:41:08.901104 janeway.afscb > q.afsfs: rx data (1444)
11:41:08.901149 janeway.afscb > q.afsfs: rx data (1444)
11:41:08.901708 q.afsfs > janeway.afscb: rx ack (62)
11:41:08.903953 janeway.afscb > q.afsfs: rx data (1444)
11:41:08.904027 janeway.afscb > q.afsfs: rx data (1444)
[...]
11:42:02.229224 q.afsfs > janeway.afscb: rx ack (62)
11:42:02.230162 q.afsfs > janeway.afscb: rx ack (62)
11:42:02.231116 q.afsfs > janeway.afscb: rx ack (62)
11:42:02.231440 janeway.afscb > q.afsfs: rx data (5692)
11:42:02.231522 janeway.afscb > q.afsfs: rx data (5552)
11:42:02.232079 q.afsfs > janeway.afscb: rx ack (62)
11:42:02.240724 q.afsfs > janeway.afscb: rx data fs reply store-data (136)

Okay, that's what should be happening.  But now, we start getting more fetches
for the new chunk of data that's being written to the cache:

11:42:02.394504 janeway.afscb > q.afsfs: rx data fs call fetch-data fid 
536872319/16262/19711 offset 409600000 length 16384 (52)
11:42:02.396163 q.afsfs > janeway.afscb: rx data fs reply fetch-data (152)
11:42:02.409630 janeway.afscb > q.afsfs: rx data fs call fetch-data fid 
536872319/16262/19711 offset 409665536 length 16384 (52)
[...]
11:44:10.334524 janeway.afscb > q.afsfs: rx data fs call fetch-data fid 
536872319/16262/19711 offset 707526656 length 16384 (52)
11:44:10.335163 q.afsfs > janeway.afscb: rx data fs reply fetch-data (152)

and start doing stores after we're done:

11:44:18.274130 janeway.afscb > q.afsfs: rx data fs call store-data fid 
536872319/16262/19711 offset 409600000 length 1441792 flen 707568190 (5692)
11:44:18.274615 q.afsfs > janeway.afscb: rx ack (62)
11:44:18.276722 janeway.afscb > q.afsfs: rx data (5692)
[...]
11:44:53.195868 janeway.afscb > q.afsfs: rx data (2574)
11:44:53.206400 q.afsfs > janeway.afscb: rx data fs reply store-data (136)

To say that this behavior is inefficient is being kind.

The first set of fetches starts at 11:38:07.890567 and continues until
11:41:08.004459.  That's 180.113892 seconds.

The first set of stores starts at 11:41:08.898095 and finishes at
11:42:02.240724.  That's 53.342629 seconds.

The second set of fetches starts at 11:42:02.240724 and continues until
11:44:10.335163.  That's 128.094439 seconds.

The second set of stores starts at 11:44:18.274130 (I guess the 8-second
gap was spent copying data into the cache?  Or maybe cache cleaning) and
finishes at 11:44:53.206400.  That's 34.932270 seconds.
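
Those elapsed times can be double-checked by converting the trace
timestamps to seconds since midnight:

```python
def secs(ts):
    # "hh:mm:ss.uuuuuu" -> seconds since midnight
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

spans = [
    ("first fetches",  "11:38:07.890567", "11:41:08.004459"),
    ("first stores",   "11:41:08.898095", "11:42:02.240724"),
    ("second fetches", "11:42:02.240724", "11:44:10.335163"),
    ("second stores",  "11:44:18.274130", "11:44:53.206400"),
]
for name, start, end in spans:
    print(f"{name}: {secs(end) - secs(start):.6f} s")
```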

So we have a total of approximately 308 seconds worth of fetch RPCs
and approximately 88 seconds worth of store RPCs.  By time, stores
account for only about 22% of all the RPC traffic.

This explains why we have been seeing such bad write performance when
using AFS.
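
For scale, a rough back-of-the-envelope figure: taking the flen from the
last store-data call and the wall time from the create-file call to the
final store-data reply gives the effective write throughput:

```python
flen = 707568190                     # final file length (flen) from the trace
start = 11*3600 + 38*60 + 6.150427   # create-file call
end = 11*3600 + 44*60 + 53.206400    # final store-data reply
throughput = flen / (end - start)    # bytes per second
print(f"{throughput / 1e6:.2f} MB/s effective write rate")
```

About 1.74 MB/s end to end, with most of the wall time spent on fetches
of data we're about to overwrite anyway.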

Questions:

- Has anyone else seen this behavior?
- Is it fixable?

--Ken
