The following email is about an issue in the Heimdal libraries, and how it affects OpenAFS fileservers. If you know that you don't use Heimdal anywhere at your site, or if you only use Heimdal on Linux, then you are not affected by this and you can probably skip this. If you use Heimdal on Solaris or AIX (or any such "commercial Unix"), you probably do want to read this.
Anyway, we've recently become aware that Heimdal on certain platforms does not report errors correctly in threaded environments. This means that certain functions in Heimdal's libraries will, in certain conditions, return 0 (success) even when they have encountered an error and bailed out. I believe this happens with all currently existing Heimdal versions. At the time of this writing, the newest stable release of Heimdal is 1.5.3, and the head of the 1.6 branch is 005f69c0cbe3538cbdb2f8808114b48995e0ca32. I can confirm that the relevant issue does _not_ appear on Linux or FreeBSD, but it _does_ appear on Solaris and AIX. I haven't tested every version and variation of those platforms of course, but I don't expect it to vary much between versions.

This issue is not OpenAFS-specific; it is an issue with Heimdal itself. However, I believe this is of particular interest to OpenAFS, because OpenAFS did not make libkrb5 calls from within threads until recently; we started doing that in OpenAFS 1.6.5. So, this issue does not occur with OpenAFS versions before 1.6.5, but it can happen with 1.6.5 and later.

The most obvious way this issue manifests is that the fileserver can crash very quickly if you are not using rxkad.keytab. This is the issue reported in <https://rt.central.org/rt/Ticket/Display.html?id=131852&user=guest&pass=guest>, and is easily worked around by just creating an rxkad.keytab file.

However, if you are using rxkad.keytab with a problematic Heimdal library, you won't see such a crash, but there may still be other issues. Since the underlying problem is that errors are not reported properly, there may be less obvious problems in other areas of code. Although I don't know of any particular issues, running the fileserver in such an environment would make me personally very nervous; I would consider such an environment to effectively be "undefined behavior" at this point.
So, if you are on a problematic platform, I would personally recommend avoiding running OpenAFS >= 1.6.5 servers with Heimdal right now. Either downgrade OpenAFS, link to a different libkrb5 library, or don't link to a libkrb5 library at all (not linking to libkrb5 removes rxkad-k5 support, so you can only use DES). And of course, if this issue concerns you, contact your support vendor and/or go talk to Heimdal.

It should be noted that running Heimdal on those platforms is probably very uncommon, so we (OpenAFS) aren't making a big fuss about it. (At least Heimdal 1.5.3 and 1.5.2 don't build on those platforms without modification.) But I wanted to at least send something to give anyone a chance to notice this, if they are for some reason running such an environment.

As for what OpenAFS will do about this, the current plan is that we'll include a workaround in OpenAFS to avoid the crash when not using rxkad.keytab, but nothing more. We could possibly detect the issue when using rxkad.keytab as well, and turn off rxkad-k5 when we detect it, but right now it doesn't seem "worth it" to do that (and personally to me doing that feels pretty ridiculous).

Some brief technical details:

Heimdal doesn't build with -mt/-pthread in CFLAGS, so anything that uses 'errno' doesn't work properly on Solaris/AIX. This is because _REENTRANT changes the definition of 'errno' from a regular global int to a function call to give a thread-specific storage location; so on Solaris/AIX you get only the "main thread" errno when you reference errno without -D_REENTRANT. On Linux/FreeBSD/others, errno is always defined as using a function call, so it still works well enough even without -pthread.

The crash occurs because accessing rxkad.keytab fails with ENOENT, but Heimdal errors out with the error code '0', leaving some pointers in a structure set to NULL or something similarly weird/wrong.
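As a rough illustration of that pattern (this is not Heimdal's actual code), here is a minimal C sketch. Everything in it is hypothetical: 'stale_global_errno' stands in for the "main thread" errno that a library built without -D_REENTRANT would read, and 'struct keytab_handle', 'open_keytab' and 'caller_is_fooled' are made-up names. The real failure only happens in threaded code on the affected platforms; the sketch just simulates reading the wrong variable so the effect is visible anywhere.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the process-wide ("main thread") errno that
 * code built without -D_REENTRANT would read on Solaris/AIX. Nothing in
 * this sketch ever stores an error in it, so it stays 0. */
static int stale_global_errno = 0;

/* Hypothetical handle type; this is not a Heimdal structure. */
struct keytab_handle {
    FILE *fp;
};

/* Mimics a library routine that reports failure by returning errno.
 * With use_stale_errno set, it reads the wrong variable, the way a
 * library built without -D_REENTRANT might when called from a thread. */
int
open_keytab(const char *path, struct keytab_handle *kt, int use_stale_errno)
{
    assert(path != NULL && kt != NULL);

    kt->fp = fopen(path, "r");
    if (kt->fp == NULL) {
        int real_code = errno;  /* ENOENT when the file does not exist */
        return use_stale_errno ? stale_global_errno : real_code;
    }
    return 0;
}

/* Returns 1 if a caller that trusts the return code alone would go on
 * to use a handle that was never set up, 0 otherwise. */
int
caller_is_fooled(const char *path, int use_stale_errno)
{
    struct keytab_handle kt = { NULL };
    int code = open_keytab(path, &kt, use_stale_errno);
    int fooled = (code == 0 && kt.fp == NULL);

    if (kt.fp != NULL)
        fclose(kt.fp);
    return fooled;
}
```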
We see the function return success, so we think everything is fine, and calling subsequent libkrb5 functions on that structure segfaults.

See <https://rt.central.org/rt/Ticket/Display.html?id=131852&user=guest&pass=guest> for any more details, of course.

--
Andrew Deason
adea...@sinenomine.net

_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info