> an AFS cache with fewer files will always perform faster than
> one with more. it is simple to write a tool that analyzes the average
> size and age of the files in your AFS cache, and adjust your cache
> parameters to optimize performance.
I can see how this is easy to write, but which cache parameters are
you talking about adjusting, and how? My understanding is that you
want the cache to be the size of the typical 'working set' of data the
users on that machine use over some period like a day. Theoretically,
this working set is a window of data that moves slowly enough through
time that cache hits are very frequent. Are you talking about
measuring the working set? Would it be used to set -dcache, -volumes,
-chunksize, cachesize, -stat...? How do you determine the cache hit
rate? I guess I haven't thought about the problem enough.
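For what it's worth, the "average size and age" analysis from the quoted
text is only a few lines of Python. This is just a sketch under the
assumption of the classic disk-cache layout, where chunk files are named
V0, V1, ... in the cache directory (e.g. /usr/vice/cache); adjust the
naming test if your client lays things out differently:

```python
import os
import time

def analyze_cache(cache_dir):
    """Report the count, average size, and average age of AFS
    disk-cache chunk files (V<number> files) in cache_dir."""
    now = time.time()
    sizes, ages = [], []
    for name in os.listdir(cache_dir):
        # Only V0, V1, ... chunk files; this skips CacheItems,
        # VolumeItems, CellItems, and other bookkeeping files.
        if not (name.startswith("V") and name[1:].isdigit()):
            continue
        st = os.stat(os.path.join(cache_dir, name))
        sizes.append(st.st_size)
        ages.append(now - st.st_mtime)  # age since last modification
    n = len(sizes)
    if n == 0:
        return None
    return {
        "files": n,
        "avg_size_bytes": sum(sizes) / n,
        "avg_age_hours": sum(ages) / n / 3600.0,
    }
```

How you turn those numbers into -chunksize or -dcache settings is
exactly the open question above.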
> AFS home directories) can get by great with a 32M cache and 1800 files.
> using the same tools i used on these machines, i looked at dave's
Really? I thought the working set of that many users would be much
larger.
> the biggest problem with AFS caches is the number of disk writes.
> these are generally slow and synchronous (even though they are
> buffered), so if you are hitting the cache hard, disk write request
> queuing will slow you down. one of the performance problems with
We have a web server which is almost exclusively read-only with
respect to AFS. Does this change the situation enough that larger
cache sizes (>150MB) will indeed be more helpful?
> a) using memory cache (still see heavy write request load;
> haven't figured out why)
I remember reading a paper about this. I wish I could remember which
one. Basically, I think the code path to look up this information is
very long and unoptimized (no hints).
It beats the heck out of me why AFS can't have O(1) cache lookups.
Hashing based on unique identifiers (AFS's fids plus chunk offset would
make a nice key) is something from my undergraduate years. Someone
made a comment about hardware vs. software, and AFS was limited by
being all in software. This doesn't really matter when it comes down
to O(f(n)) calculations. Hardware only improves by a constant
factor. It's the *algorithm* that matters.
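To make the point concrete, here is a toy Python sketch of the kind of
O(1) index I mean: a hash table keyed on (fid, chunk offset). The class
and field names are mine, not AFS's, and this is an illustration of the
algorithm, not the cache manager's actual data structures:

```python
class ChunkIndex:
    """Toy cache index: maps (fid, chunk) -> cache file name
    with expected O(1) lookup, independent of cache size."""

    def __init__(self):
        self._index = {}  # (fid, chunk_offset) -> cache file name

    def insert(self, fid, chunk, cache_file):
        self._index[(fid, chunk)] = cache_file

    def lookup(self, fid, chunk):
        # One hash, one probe -- no linear scan of dcache entries.
        return self._index.get((fid, chunk))


# A fid is a tuple such as (cell, volume, vnode, uniquifier);
# any hashable representation works as a dictionary key.
idx = ChunkIndex()
idx.insert(("my.cell", 536870912, 42, 1), 0, "V17")
print(idx.lookup(("my.cell", 536870912, 42, 1), 0))
```

Doubling the CPU speed halves every probe; switching from a linear scan
to a hash halves nothing and instead removes the dependence on n
entirely.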
If there's some other limitation I'm not aware of, please let me know.
Optimizedly yours,
--
Daniel Bromberg, Co-op M/S 171-300 (818) 354-4122
[EMAIL PROTECTED] FAX: (818) 393-5009
Metrics Engineer, EIS project 4800 Oak Grove Dr.
Jet Propulsion Laboratory Pasadena, CA 91109