This is just an FYI. I'm not asserting that any of this information
will make you feel better about your frustrating week, or that it
will be helpful to you in the short term.
>* The fs flush/flushv commands do not work reliably. Our cell looks
> like: (root.afs)(root.cell)(users)(user.username). We recently
> replicated the volume users and *nothing* short of a reboot got
> clients to recognize this properly (in some cases, the afs
> cache had to be "rm -rf"ed too).
I *believe* the command you wanted was fs checkv. The checkv command
breaks the pseudo callback on RO volumes (in pre-3.3 releases),
forcing the cache manager to refetch RO data from the server. Waiting
an hour would do the same thing. I can't be certain this holds for a
newly released volume, but I suspect it does. When I did
administration, I never had to reboot or rm caches to make clients
start using ROs. (Of course, I did administration a long, long time
ago.)
Also, I'm not sure of the 3.3 implications, given that there are now
real callbacks at the volume level for ROs.
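For what it's worth, here is the sequence I would have expected to
work, as a sketch (the cell and volume names are placeholders; check
the syntax against your release's documentation):

    # on the server side: push the RW changes out to the RO sites
    vos release users

    # on each client: drop the pseudo callbacks so the cache
    # manager refetches RO data at the next reference
    fs checkvolumes

    # if something still looks stale, flush the cached data for
    # the volume containing a given path
    fs flushvolume /afs/yourcell.edu/users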
>* With 18 drives/partitions per server and several hundred volumes per
> partition it takes an awfully long time for the fileserver to get started.
> Why can't it attach volumes from multiple partitions in parallel (like
> the salvager works in parallel)?
Good suggestion. The answer may be tied to the question of "why does
the fileserver have to shut down in order to salvage fewer than all
partitions." Not knowing the internals of the fileserver well, I
can't comment on how difficult a job this may be. If it requires a
rewrite of the fileserver, development would have to weigh these
fixes against all the others they could do in that amount of time.
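As a point of reference, the parallelism mentioned above is the
salvager's. If I remember the option correctly (treat the flag name
as an assumption and verify it against your documentation), a
whole-server salvage can work on several partitions at once:

    # salvage every partition on the server, up to 4 at a time
    bos salvage -server fs1.yourcell.edu -all -parallel 4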
In DFS, because volumes (filesets) reside on a physical file system
created with them in mind, corruption is much less likely to happen,
and salvaging should only be necessary when there is a media defect.
Server recovery (rather than salvage) runs when a server machine
crashes or is rebooted. Recovery is log-based and takes tens of
seconds instead of tens of minutes, regardless of the amount of data
on the server.
>* The per-cell setuid/no-setuid is not very useful. This should
> be a per-volume flag too.
This is the case in DFS.
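For contrast, here is roughly what the per-cell control in AFS looks
like today (the cell name is a placeholder):

    # honor or forbid setuid programs for an entire cell at once
    fs setcell -cell yourcell.edu -nosuid
    fs getcellstatus -cell yourcell.edu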
>* Why are quotas set/shown with fs and not vos?
Years ago, back in AFS's infancy, there was no such thing as vos
commands, and volume manipulation was truly black magic. But users
needed to get quota information. fs existed, and fs lq was born.
Later, when the vos command suite was developed, it was expected that
users would never use it or even know about it - so quota stuff stayed
in fs. In hindsight, it would be better in vos, but moving it would
have required an interface change that CMU (the original AFS site)
probably wouldn't have been willing to take on.
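So the quota interface stayed with the client-side commands, where it
still is today; a quick sketch (the paths are placeholders, and
setquota requires administrator privileges):

    # show the quota on the volume containing a directory
    fs listquota /afs/yourcell.edu/user/jdoe     # alias: fs lq

    # raise it; -max is in 1-kilobyte blocks
    fs setquota /afs/yourcell.edu/user/jdoe -max 20000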
In DFS, all quota and volume (fileset) commands are rolled into fts
(the vos replacement). You'll find the commands to manage mountpoints
there too. The remaining fs commands that truly deal only with the
client are in the cm command suite. So fs goes away entirely - its
subcommands are rolled into cm or fts as appropriate.
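From my memory of the DFS command suites (take the exact subcommand
names as assumptions and check the DFS documentation), the mapping
looks something like:

    fs lq /path            ->   fts lsquota /path
    fs mkmount dir vol     ->   fts crmount dir fileset
    fs checkvolumes        ->   cm checkfilesets
    fs flushvolume /path   ->   cm flushfileset /path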
Again, knowing what is happening in DFS may not make you feel better
about what happened to you in AFS. At least you know that many
of your opinions/observations are shared and are being acted upon.
It's a lot easier to make drastic changes in a product that is in
development or is newly released. Deciding what changes can and
should be made in AFS is much more difficult due to the large
established customer base and the wide variety of requests being
made. But we are hearing you.
Pierette VanRyzin
Transarc Educational Services