I can't claim to have actually tried the MIT kerberos server
with AFS, but I have stared at a lot of the code, while rummaging
around various other problems, including that accursed null
error code. Here is my tentative list of steps (it's probably
not complete):
________list of steps
(1) figure out a way to switch databases
(a) copy AFS database over to MIT, and continue to use
AFS string-to-key function.
- or -
(b) start over with a new MIT database, and make everyone switch
to new keys.
(2) stop and delete kaserver instances
(3) if "restart new database" is selected, then the following needs to be done:
new key assigned to "afs".
new keyfile built and distributed to every fileserver and database server
(Now would also be a good time to implement any special
action to be done with pt entries if the corresponding ka
entries aren't created.)
(4) start the new MIT kerberos daemon "wherever". If the concern is
weaknesses in the RX and other layers of the AFS server code,
then you'll want to run the kerberos database on a separate set
of dedicated servers that run no software other than what
kerberos itself needs.
(5) This switch-over will break:
        uss
        kas
        login, rlogind, ftpd, klog, ... - anything that uses RX to get tickets.
        kpasswd
There are two reasons why these things won't work: (A) the string-to-key
function differs, and (B) most of these utilities use RX to talk to
the ka database. MIT Kerberos supports something somewhat like
kas (only completely different) for administrative access,
and of course the "original" non-AFS versions of klog and kpasswd.
For the others, it's a matter of slogging through all the
programs and fixing them; since some of these are client-end
programs, there is a massive distribution problem,
compounded by having to do it "all at once".
While using the AFS string-to-key function with MIT's daemon
won't fix any of these utilities, it will at least permit
the information from the old key database to be used with
the new daemon. That still means the client software
has to be updated "somehow", but it saves having to
distribute new passwords to "everyone" all at once.
Another possibility is to distribute client software
ahead of time that works "dual mode" - either with
the old stuff, or the new stuff. Even changing the
string-to-key function is not an impossible problem
here -- it's no great effort to try both string-to-key
functions on a kerberos ticket received via UDP.
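Something like this, as a minimal sketch of the "dual mode" idea; it
leans on the fact that the kerberos 4 library lets the caller supply
its own key_proc to krb_get_in_tkt. "mit_passwd_to_key" and
"afs_passwd_to_key" here are just stand-ins for whatever MIT and AFS
string-to-key wrappers are available locally -- don't expect them to
exist under exactly those names:

    #include <krb.h>

    /* hypothetical key_procs wrapping the two string-to-key functions */
    extern int mit_passwd_to_key(char *user, char *instance, char *realm,
                                 char *passwd, C_Block key);
    extern int afs_passwd_to_key(char *user, char *instance, char *realm,
                                 char *passwd, C_Block key);

    int
    dual_mode_get_in_tkt(char *user, char *realm, char *passwd, int life)
    {
        int code;

        /* first try the key the MIT daemon would expect... */
        code = krb_get_in_tkt(user, "", realm, "krbtgt", realm, life,
                              mit_passwd_to_key, NULL, passwd);
        if (code == INTK_BADPW)
            /* ...and if the password "fails", retry with the AFS
             * string-to-key before deciding the password is wrong. */
            code = krb_get_in_tkt(user, "", realm, "krbtgt", realm, life,
                                  afs_passwd_to_key, NULL, passwd);
        return code;
    }

A client built this way keeps working no matter which string-to-key
function was used to populate the KDC's database.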
________why not
I would not care to do this switchover on a large active cell if
I could possibly avoid it. If I had to do it, I would try it first
on a separate small test cell until I felt *very* comfortable with it,
and knew the reason behind each step above by heart (as well as any
steps I might have overlooked). I would not want to blindly
follow anyone else's directions here if I could possibly
help it, because the consequences of any mistake are so monstrous.
I'd much rather make the mistakes somewhere else first,
and come up with all the solutions possible ahead of time.
I would also want to have a good understanding of just why I'm switching
over, because there are a lot of costs and trade-offs involved in the switch.
Just for starters, I would have to support a much wider range
of client software - so my site had better have source for that
software if those services are important to users. I might seriously
consider some kind of compatibility operation (perhaps a "read-only" RX
compatibility stub in MIT's kerberos daemon, somewhat like the "udp" stub in
kaserver) that would ease the conversion effort and avoid the "everything
breaks one midnight" scenario that would otherwise ensue.
________more on error codes
But, before doing any of this, I'd first consider rattling Transarc's
cage a bit. I mean, really, null error codes?! Sadly, the
MIT kerberos library behaves no better when it receives these
errors -- if I had to assign blame, I'd blame MIT and Transarc equally.
But as long as I'm thinking about error codes:
For kaserver, the line in the routine "err_packet" in krb_udp.c that clears
its result code if it's out of range is just plain *Wrong* -- in fact, it
should consider 0 to be out of range and never set a 0, something like this:
if (code <= 0 || code > 255) code = 70;
(It wouldn't hurt if the routines that call it made some
effort to pass more appropriate codes to err_packet
as well; or perhaps err_packet could always assume it's
given an "rx" style error code, which would give it a better shot
at filling in the error message from the ka error tables at
the same time.)
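To spell out what I mean, here is that check as a stand-alone helper --
a sketch only, since err_packet itself is Transarc source I won't try
to reproduce here; sanitize_err_code is just a made-up name, and if I
remember krb.h right, 70 is INTK_ERR ("other error"):

    #include <krb.h>

    /* clamp a result code to something err_packet can safely send back */
    static int
    sanitize_err_code(int code)
    {
        /* 0 means "no error" to the client, so treat it as out of
         * range too and never send it back as a failure code */
        if (code <= 0 || code > 255)
            code = INTK_ERR;        /* 70, the generic catch-all */
        return code;
    }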
For libkrb, in the files get_ad_tkt.c and krb_get_in_tkt.c,
the cases for AUTH_MSG_ERR_REPLY should include a line
something like this:
if (!rep_err_code) rep_err_code = INTK_ERR;
just after the bcopy to rep_err_code and before the return.
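In other words -- and this is only a sketch, with a made-up helper name,
since I'm not going to reproduce the library source here -- the cases
would return something like this instead of handing back rep_err_code
unexamined:

    #include <krb.h>

    /* what the AUTH_MSG_ERR_REPLY cases ought to return once the error
     * code has been bcopy'd (and byte-swapped) out of the reply packet */
    static int
    fixed_rep_err_code(long rep_err_code)
    {
        /* 0 means "no error", so never return it as a failure code */
        if (!rep_err_code)
            rep_err_code = INTK_ERR;
        return (int)rep_err_code;
    }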
I have now learned, however, that this error reply almost always
means I have the wrong time conversion factor, and that
I can save much time by just assuming that the date and
timezone information on the client machine is wrong.
"kaserver" also does weird things when it should return
KDC_PR_UNKNOWN. This is important to me because it means
I can write programs that say "unknown user" vs "wrong password";
on a campus with over 35,000 users, the more precise
the feedback on errors I can provide to users, the better.
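For instance, here is the kind of feedback I want to be able to give,
using plain krb4 library calls (try_login and login_failure_message are
just illustrative names; whether kaserver actually hands back these
codes is exactly the problem):

    #include <stdio.h>
    #include <krb.h>

    static char *
    login_failure_message(int code)
    {
        switch (code) {
        case KDC_PR_UNKNOWN:
            return "unknown user";      /* principal not in the database */
        case INTK_BADPW:
            return "wrong password";    /* reply would not decrypt */
        default:
            return (code > 0 && code < MAX_KRB_ERRORS)
                ? krb_err_txt[code] : "unknown error";
        }
    }

    int
    try_login(char *user, char *realm, char *password)
    {
        int code;

        code = krb_get_pw_in_tkt(user, "", realm, "krbtgt", realm,
                                 96 /* ticket life, 5-minute units */,
                                 password);
        if (code != KSUCCESS)
            fprintf(stderr, "login failed: %s\n",
                    login_failure_message(code));
        return code;
    }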
Since some of this code has to run in yucky 32K-byte
environments (Macintosh DA's and the like), the last thing
I want is to suck in yet another API library to do X.500 lookups,
and then wonder what to do if X.500 is down or has different data.
So, for my purposes, it's acceptable, if ugly, that kaserver
returns null-length tickets, and I trust Transarc won't
break kaserver further by trying to return better invalid
tickets (but I'd be happy if they returned KDC_PR_UNKNOWN).
-Marcus Watts
UM ITD RS Umich Systems Group