[EMAIL PROTECTED] (Leonard J. Peirce) writes:

> We're seeing something very strange on our KDC. We have approximately
> 46,000 total principals. When we propagate (kdb5_util dump) or do
> getprincs in kadmin to get a list of all principal names the resulting
> output (in both cases) is missing over half of the principals that we
> know are in the database. Our slave server is pretty much useless at
> this point until we get this working again.

Ouch. This sounds rather like a form of database corruption I've seen
once before while I was at Cygnus, though it was with a different
database back end (BSD NDBM, if I remember correctly). One of the
failure modes (there were many we encountered in the investigation) was
that in certain overflow cases, iterating over the database entries
would miss some, or would even loop indefinitely over a small set of
entries. The overflow was caused by a large number of collisions in the
NDBM hash function (or the set of bits from the hash value actually used
for selecting a position in the file).

It's not trivial to recover; you'll probably have to write code that
operates on the database directly, rather than through the so-called
higher-level interfaces that understand the Kerberos database format
and that we use for most operations. We don't have any such code lying
around at the moment.

First of all, I'd recommend disabling kadmind, turning off execute
permission on kadmin.local, and whatever else it takes to stop changes
from being made. If the database is in fact corrupted, you don't want
to risk breaking it further.

If you can extract the database contents, you may be able to create a
database with the same data in a different format. Our db2 (Sleepycat's
DB 1.85 with patches, actually, and we really need to look into updating
once we figure out a more specific policy on what licenses we can accept
on code we import) supports at least two back end formats. If yours is
hash, I'd strongly recommend switching to btree. If you're using btree,
you could debug the problem, or switch to hash, but the hash back end
has tended to be more buggy. Or, if you're really psyched to dive into
it, you could try updating your tree to use a more recent Sleepycat
release or some other back end.

Have you dumped and reloaded your master database any time recently?
We switched to btree format a while back, but if you never dumped and
reloaded, you may still be using hash format, which would not be good.
You can tell the database type by the magic number in the first four
bytes -- 0x053162 is btree, 0x061561 is hash.

The more interesting part right now is how to get the data out, so that
you can stuff it back into a database in a different format. Even though
you can't walk through the database sequentially, there are still a
couple of ways you may be able to extract the data.

First, if you have a complete list of current principal names, write a
little program to walk over that list, generate the correct database key
for each name, and extract the data from the database through the db2
interface. Then write it into another, freshly-created database.

Or, second, if the above approach doesn't work, open the database file
as a plain file, and simply scan through it, taking note of anything
that looks like it might be a database record: in the key, an ASCII
string for the principal name with a limited range of characters; in
the database record, key data with reasonable key types and correct
lengths, reasonable-looking flags set on the principals, etc. Once you
get names, maybe you can use the db2 interface to read out the data. If
not, you'll have to decipher the database format enough to locate the
data yourself and pull it out.

Oh yes... If it's btree that's broken, and you don't want to debug the
btree code yourself, please send us, or me, a copy of the data you get
out -- with actual key data overwritten, of course -- in a bug report so
we can try to fix it, assuming we don't switch database formats.
Changing principal names or sizes of records may make it impossible to
reproduce the problem.

> The really odd part is that the principals that don't show up are in the
> database and continue to work fine. Users can get tickets, use them for
> rlogin/telnet/ftp, and change their passwords.
> We can do getprinc for any one of the missing entries and they show up
> just fine. But running getprincs to list the entire database or
> kdb5_util dump both fail to list them.

Yes, this is consistent. Random and sequential access often use very
different code paths.

> BTW, I tried using
>
>     kdb5_util dump dump.out <principal>
>
> to dump a single principal and didn't get the principal dumped. Instead,
> it appeared to dump just the policies that we have defined. Am I misreading
> the man page? I had hoped to be able to dump each individual principal,
> append to a file, and possibly reload the database.

Sorry.... The implementation of that form is basically, "while you're
walking through the database, ignore entries not matching one of these
principal names". So if the normal dump doesn't see the principal, this
form won't either.

> Any suggestions on troubleshooting this? Could it be a buffer being
> overrun someplace?

There is a chance that it's just a bug with sequentially retrieving data
from the database. The only way that helps you, though, is that *if* you
go and find the bug and fix it, then you no longer have the problem of
extracting what data you can from a broken database. The problem still
has to be fixed for your slave KDCs to become useful again.

Ken
________________________________________________
Kerberos mailing list
[EMAIL PROTECTED]
http://mailman.mit.edu/mailman/listinfo/kerberos
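
A rough sketch of the magic-number check described above, in Python.
The two constants are the ones quoted in the message; the function names
are made up for illustration, and trying both byte orders is an
assumption on my part (that the magic is a 32-bit integer written in the
byte order of the machine that created the file).

```python
import struct

# Magic numbers as quoted in the message.
BTREE_MAGIC = 0x053162
HASH_MAGIC = 0x061561


def db_format_from_header(header):
    """Classify a database file from its first four bytes."""
    if len(header) < 4:
        return "unknown"
    # Assumption: the magic is a 32-bit integer in the creating
    # machine's byte order, so try little-endian and big-endian.
    for byte_order in ("<", ">"):
        (magic,) = struct.unpack(byte_order + "I", header[:4])
        if magic == BTREE_MAGIC:
            return "btree"
        if magic == HASH_MAGIC:
            return "hash"
    return "unknown"


def db_format(path):
    """Read the first four bytes of the file at `path` and classify it."""
    with open(path, "rb") as f:
        return db_format_from_header(f.read(4))
```

Run `db_format()` against a copy of the principal database file, not the
live one.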
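
For the second extraction approach (scanning the database as a plain
file), something along these lines might be a starting point for
collecting candidate principal names. The character set in the pattern
and the function name are assumptions for illustration; adjust them to
whatever naming rules your site actually uses. It only finds names, not
the records themselves, and every hit still needs to be checked with a
real lookup.

```python
import re


def candidate_principals(raw, realm):
    """Return sorted, de-duplicated strings in `raw` that look like
    principals in `realm` (for example "user@REALM" or
    "service/host@REALM")."""
    # Assumption: name components use only letters, digits, '.', '_'
    # and '-', separated by '/'. This is a site-specific guess.
    component = rb"[A-Za-z0-9._-]+"
    pattern = re.compile(
        component + rb"(?:/" + component + rb")*@"
        + re.escape(realm.encode("ascii"))
    )
    found = {m.group().decode("ascii") for m in pattern.finditer(raw)}
    return sorted(found)


def scan_file(path, realm):
    """Scan a (copied) database file for candidate principal names."""
    with open(path, "rb") as f:
        return candidate_principals(f.read(), realm)
```

Reading the whole file into memory is fine for a database of this size;
the resulting list can then feed the first approach.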
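
And the first approach (walking a known list of names and copying each
record by key) has roughly this shape. This is only a sketch: `src` and
`dst` are stand-ins for whatever keyed-lookup handles you end up with
(plain dicts work for trying it out), not a real db2 binding, and the
key encoding in `default_key` (name plus a trailing NUL byte) is an
assumption you should verify against your KDC source before trusting it.

```python
def default_key(name):
    # Assumption: the database key is the principal name as ASCII text
    # followed by one NUL byte. Verify before relying on this.
    return name.encode("ascii") + b"\x00"


def copy_known_principals(names, src, dst, make_key=default_key):
    """Copy records for `names` from `src` to `dst` using keyed lookups
    only (no sequential iteration over `src`).

    Returns the list of names whose keys were not found in `src`.
    """
    missing = []
    for name in names:
        key = make_key(name)
        try:
            record = src[key]
        except KeyError:
            missing.append(name)
            continue
        dst[key] = record
    return missing
```

Any names that come back as missing are worth a second look: either the
name list is stale or the key encoding guess is wrong.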