Hi Ben,

On Sun, May 06, 2018 at 06:56:08PM +0100, Ben Hutchings wrote:
> I've cloned this bug as #898073 and reassigned that to krb5.
> 
> krb5 is using the new(ish) getrandom() system call to read random bits,
> with the code comment "This ensures strong randomness while only
> blocking during first system boot."
> 
> While this is a regression, the kernel is only doing what krb5 was
> asking for (whereas previously it could wrongly provide weak random
> bits).
> 
> We might still revert this change in the kernel temporarily.  However,
> the krb5 developers need to decide what they really want, and if that's
> strong randomness then they need to configure the service to allow for
> a longer delay at boot.

I read through the history on #898073 and am still not sure I have
the backstory quite right.  This is what it sounds like has
happened:

The kernel in stable has for some time provided a getrandom() system
call that could return "weak" (more on this later) random numbers
early in boot, though its output did eventually converge to "strong"
randomness (after a few minutes?).  The kernel 4.9.88-1 upload fixed
the bug that let getrandom() return "weak" output (getrandom() is
supposed to block until strong output is ready), and this in turn
caused the krb5 KDC to block at boot until the RNG was ready, long
enough that systemd timed out the unit and marked it as failed.
We're now talking about the proper way to improve the situation.

If the above is correct, I'm not yet sure that I see a krb5-specific
bug.  It is definitely true that krb5 is specifically requesting the
getrandom() semantics of blocking until the RNG is fully seeded, but
krb5 is hardly expected to be the only consumer of getrandom().  As
such, why should krb5 be responsible for increasing the systemd
timeout at boot -- could not systemd be responsible for increasing
the default timeout to allow for entropy seeding as used by multiple
applications?  Arguably more preferable would be to have a systemd
target that indicates the RNG is seeded, and then krb5 could have
its KDC service depend on this "RNG-available" service.  So far as I
know, no such service currently exists, so again, there would need
to be some systemd effort (potentially in cooperation with the
kernel) to provide such a service.
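
For concreteness, here is a rough sketch of the two getrandom()
behaviours in question, written against Python's os.getrandom()
wrapper rather than the C call krb5 actually makes.  The function
names rng_is_seeded() and strong_random_bytes() are made up for
illustration; this is not krb5 code, and it assumes Linux with a
kernel and Python (3.6+) that expose getrandom().

```python
import os


def rng_is_seeded() -> bool:
    """Probe (hypothetical helper): is the kernel RNG initialized yet?

    With GRND_NONBLOCK, getrandom() fails with EAGAIN instead of
    blocking while the RNG is unseeded; Python surfaces that as
    BlockingIOError.
    """
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False
    return True


def strong_random_bytes(n: int) -> bytes:
    """Return n random bytes, blocking until the RNG is seeded.

    With flags=0 the call blocks until the kernel RNG has been
    initialized, which is the semantics described above.  The loop
    covers the case where a call returns fewer bytes than requested.
    """
    buf = b""
    while len(buf) < n:
        buf += os.getrandom(n - len(buf))
    return buf
```

A non-blocking probe along the lines of rng_is_seeded() is roughly
what a hypothetical "RNG-available" unit could poll before letting
dependent services start, while strong_random_bytes() is the
behaviour that currently runs into the unit start timeout.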

To rephrase in a different way, "getrandom() is a system service,
and the system's init system should not penalize other services for
using system services -- why should the onus of adapting be placed on
individual consumers of that system service?"


Back to the "weak" random numbers.  How weak are we talking about?
The krb5 KDC and kadmind are used (among other things) to generate
shared symmetric keys, used to encrypt and authenticate traffic over
the network.  Some of these keys are long-lived, and an
insufficiently random long-lived key could have rather disastrous
consequences for deployments unlucky enough to have generated them.
Are we looking at a repeat of the openssl RNG fiasco where piles of
ssh keys and TLS certificates had to be regenerated?  If there's a
real issue here of weak randomness, we may need to publicize this
issue much more widely.

Thanks,

Ben
