jhutz writes:
...
> Secondly, there's an error number problem. The fileserver supports only
> non-blocking locks; blocking locks are done entirely within the cache manager.
> First, let's assume the simple case - you want to do a non-blocking lock.
> You call flock/lockf/fcntl, and expect to get back either success or
> EWOULDBLOCK. The problem is, no one does error code translation, so you
> get back the SERVER's value for EWOULDBLOCK. If the server and client
> aren't using the same value, you get confused.
>
> To make life even more fun, the cache manager implements blocking locks
> by retrying the lock periodically, as long as it gets EWOULDBLOCK. If
> it gets some other error code, that gets returned to you. Net result:
> blocking locks are impossible if the fileserver and client don't use
> the same value for EWOULDBLOCK.
>
> Values for EWOULDBLOCK from some popular OS's:
>
> *_mach 35
> alpha_osf20 35
> hp700_ux90 246
> pmax_ul43a 35
> rs_aix32 54
> sgi_53 1101
> sun4c_411 35
> sun4c_54 11
>
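A minimal C sketch of the retry scheme described above. It assumes a hypothetical try_lock() callback that stands in for the fileserver's non-blocking lock request and returns the server's raw, untranslated error number. The loop keeps retrying only while it sees the client's own EWOULDBLOCK. So a server whose EWOULDBLOCK differs (11 on sun4c_54 versus 35 on a BSD-derived client, per the list above) ends the "blocking" lock on its very first busy reply. The max_tries bound and the fake server are illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the fileserver's non-blocking lock request:
 * returns 0 on success, or the error number exactly as the server sent
 * it, with no translation. */
typedef int (*try_lock_fn)(void *ctx);

/* Emulate a blocking lock by retrying the non-blocking request for as
 * long as it reports THIS host's EWOULDBLOCK.  Any other value, including
 * a server's different EWOULDBLOCK, is returned to the caller at once.
 * max_tries is an assumption that keeps the sketch finite. */
int blocking_lock(try_lock_fn try_lock, void *ctx,
                  unsigned retry_secs, int max_tries)
{
    assert(try_lock != NULL && max_tries > 0);
    int code = EWOULDBLOCK;
    for (int i = 0; i < max_tries; i++) {
        code = try_lock(ctx);
        if (code != EWOULDBLOCK)
            return code;            /* 0 = locked, else some "real" error */
        if (retry_secs > 0)
            sleep(retry_secs);      /* wait before asking again */
    }
    return code;                    /* still busy after max_tries */
}

/* Hypothetical fake server: reports busy_code busy_left times, then
 * grants the lock. */
struct fake_server {
    int busy_left;
    int busy_code;
};

int fake_try_lock(void *ctx)
{
    struct fake_server *s = ctx;
    if (s->busy_left > 0) {
        s->busy_left--;
        return s->busy_code;
    }
    return 0;
}
```

With a fake server that answers with the client's own EWOULDBLOCK, the loop waits until the lock is granted. With one that answers 11 against a client that expects 35 (or vice versa), blocking_lock() hands the foreign number straight back after one try.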
Actually, recent versions of AFS, or rather RX, include functions
in librx.a (rx_errconvert.o) that are supposed to convert certain error
codes between the local host's values and a "network" error code range.
Unfortunately, as jhutz observes, they don't handle EWOULDBLOCK; they
only handle:

    manifest name:
    local       network
    ENOSPC      VDISKFULL
    EDQUOT      VOVERQUOTA

VDISKFULL and VOVERQUOTA are 108 and 109 respectively, so they could
clash with native error codes. hp700_ux90 seems to skip around in its
error numbering, as does Solaris. Dunno about sgi_53. On the RS/6K,
109 is ENOSYS, while 108 is reserved for the PS/2. On the bright side,
sys_nerr can't be *that* large, unless the implementation wants to
allocate a really sparse array for sys_errlist[].
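
To make the clash concrete, here is a rough C sketch of the kind of two-way mapping described above. Only the names VDISKFULL and VOVERQUOTA and their values 108 and 109 come from the text. The function names are hypothetical and are not the actual rx_errconvert.c interface. Every code other than the two mapped ones passes through untouched, and that is exactly where the trouble starts: a raw 109 from a server where 109 is some unrelated native error (ENOSYS on the RS/6K) is read by the client as a quota failure.

```c
#include <errno.h>

/* Network values as quoted in the text. */
#define VDISKFULL  108
#define VOVERQUOTA 109

/* Hypothetical: local errno -> value put on the wire. */
int local_to_net_err(int code)
{
    switch (code) {
    case ENOSPC: return VDISKFULL;
    case EDQUOT: return VOVERQUOTA;
    default:     return code;      /* unmapped codes travel raw */
    }
}

/* Hypothetical: value from the wire -> local errno. */
int net_to_local_err(int code)
{
    switch (code) {
    case VDISKFULL:  return ENOSPC;
    case VOVERQUOTA: return EDQUOT;
    default:         return code;  /* assumed to be a local errno already */
    }
}
```

The round trip is only safe for codes that never collide with 108 or 109. Once a native 108 or 109 is on the wire, nothing distinguishes it from the network code.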

There is still more real nastiness with EWOULDBLOCK. In 4.3BSD,
EAGAIN is 11, while EWOULDBLOCK is 35 and EDEADLK is 80. In 4.4BSD,
EWOULDBLOCK and EAGAIN are both 35, and EDEADLK is 11. Other vendors
have adopted various slightly different conventions for folding
EAGAIN, EWOULDBLOCK, and EDEADLK together. In Solaris, EAGAIN and
EWOULDBLOCK are both 11, while EDEADLK is 45. In AIX, as in Solaris,
EAGAIN and EWOULDBLOCK are really both 11, and EDEADLK is 45. The
interesting error code jhutz found, 54, is reserved for EWOULDBLOCK,
but only so that case statements listing both EAGAIN and EWOULDBLOCK
compile without a duplicate-label error; the kernel isn't supposed
to actually return that value.

My guess is that because some systems fold error codes together,
there isn't any really fair way to return the same value everywhere,
and the cache manager may just have to learn to handle a few
well-chosen error codes, carefully coordinated with the logic in
rx_errconvert.c and very likely in the fileserver proper as well.
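
For the case-statement problem, a common C idiom (not something taken from the AFS sources) is to guard the second label with the preprocessor. That way the same switch compiles whether or not the platform folds EWOULDBLOCK into EAGAIN, without needing a dummy value like AIX's 54. The helper name is hypothetical.

```c
#include <errno.h>

/* Hypothetical helper: does this error mean "busy, try again later"?
 * Where EWOULDBLOCK and EAGAIN are the same number (Solaris, 4.4BSD),
 * a second case label would be a duplicate and fail to compile, so it
 * is only included when the two values actually differ. */
int is_try_again(int code)
{
    switch (code) {
    case EAGAIN:
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return 1;
    default:
        return 0;
    }
}
```

This only tidies things up on a single host. It does nothing about a server whose numbers differ from the client's, which is why the cache manager and fileserver still have to agree on what goes over the wire.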

I suppose the "right" solution would be to reserve a block of
numbers starting at 1 for "network" error codes, then reserve
another range for unrecognized OS-specific errors. The error code
mapping functions could then map "supported" OS-specific error
codes to the "network" range starting at 1. *Most* OSes seem to
agree on the first few error codes, so the "network" range could
be defined as the "most popular" values for those first few codes.
For particular applications such as the fileserver, which may need
to force a particular network error code even when the local OS
doesn't distinguish the underlying codes (EWOULDBLOCK vs. EAGAIN),
a third range may also be necessary.
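
A rough C sketch of that three-range layout follows. Every NET_* constant, both range bases, and all function names are hypothetical, chosen only for illustration. A handful of traditional low Unix numbers form the network range. Unrecognized codes are offset into a second range. A third range carries codes such as "would block" that the fileserver sets explicitly, regardless of how its own OS folds EAGAIN and EWOULDBLOCK. Turning another OS's private code into EIO on the receiving side is an arbitrary choice made here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical network range: traditional low Unix numbers. */
enum {
    NET_EPERM  = 1,
    NET_ENOENT = 2,
    NET_EIO    = 5,
    NET_EACCES = 13,
    NET_EEXIST = 17,
    NET_EINVAL = 22,
    NET_ENOSPC = 28
};

/* Hypothetical range bases. */
#define NET_OS_BASE     0x10000  /* unrecognized OS-specific code: base + raw */
#define NET_FORCED_BASE 0x20000  /* codes the server sets explicitly */
#define NET_WOULDBLOCK  (NET_FORCED_BASE + 1)

static const struct { int local, net; } net_map[] = {
    { EPERM, NET_EPERM },   { ENOENT, NET_ENOENT }, { EIO, NET_EIO },
    { EACCES, NET_EACCES }, { EEXIST, NET_EEXIST }, { EINVAL, NET_EINVAL },
    { ENOSPC, NET_ENOSPC },
};
#define NET_MAP_LEN (sizeof net_map / sizeof net_map[0])

/* Local errno -> wire value. */
int errno_to_net(int code)
{
    /* Raw codes must fit below the range width, or base + raw would
     * spill into the forced range. */
    assert(code >= 0 && code < NET_OS_BASE);
    if (code == 0)
        return 0;                          /* success stays 0 */
    for (size_t i = 0; i < NET_MAP_LEN; i++)
        if (net_map[i].local == code)
            return net_map[i].net;
    return NET_OS_BASE + code;
}

/* What a fileserver might send for a failed non-blocking lock: force
 * the "would block" code even where EAGAIN and EWOULDBLOCK are folded. */
int lock_result_to_net(int code)
{
    if (code == EWOULDBLOCK || code == EAGAIN)
        return NET_WOULDBLOCK;
    return errno_to_net(code);
}

/* Wire value -> local errno. */
int net_to_errno(int net)
{
    if (net == NET_WOULDBLOCK)
        return EWOULDBLOCK;                /* this host's own value */
    for (size_t i = 0; i < NET_MAP_LEN; i++)
        if (net_map[i].net == net)
            return net_map[i].local;
    if (net > NET_OS_BASE && net < NET_FORCED_BASE)
        return EIO;                        /* another OS's private code */
    return net;                            /* 0 (success) or unrecognized */
}
```

A cache manager that ran every reply through net_to_errno() before testing for EWOULDBLOCK would keep retrying a busy lock no matter which numbers the server's OS uses internally.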
Isn't it fun trying to make heterogeneous operating systems work
well together?
-Marcus Watts
UM ITD PD&D Umich Systems Group