On Sun, 2018-05-06 at 14:02 -0500, Benjamin Kaduk wrote:
> Hi Ben,
> 
> On Sun, May 06, 2018 at 06:56:08PM +0100, Ben Hutchings wrote:
> > I've cloned this bug as #898073 and reassigned that to krb5.
> > 
> > krb5 is using the new(ish) getrandom() system call to read random bits,
> > with the code comment "This ensures strong randomness while only
> > blocking during first system boot."
> > 
> > While this is a regression, the kernel is only doing what krb5 was
> > asking for (whereas previously it could wrongly provide weak random
> > bits).
> > 
> > We might still revert this change in the kernel temporarily.  However,
> > the krb5 developers need to decide what they really want, and if that's
> > strong randomness then they need to configure the service to allow for
> > a longer delay at boot.
> 
> I read through the history on #898073 and am still not sure I have
> the backstory quite right.  This is what it sounds like has
> happened:
> 
> The kernel in stable has for some time provided a getrandom() system
> call that provided "weak" (more on this later) random numbers for
> some time after startup, though did eventually converge to "strong"
> randomness after some time (a few minutes?).  The kernel 4.9.88-1
> upload fixed the bug that getrandom() could provide "weak" output
> (since getrandom() is supposed to block until strong output is
> ready), and this in turn caused the krb5 KDC to block at boot until
> the RNG was ready, blocking long enough that systemd timed out the
> unit and marked it as failed.  We're now talking about the proper
> way to improve the situation.

Right.

> If the above is correct, I'm not yet sure that I see a krb5-specific
> bug.  It is definitely true that krb5 is specifically requesting the
> getrandom() semantics of blocking until the RNG is fully seeded, but
> krb5 is hardly expected to be the only consumer of getrandom().  As
> such, why should krb5 be responsible for increasing the systemd
> timeout at boot -- could not systemd be responsible for increasing
> the default timeout to allow for entropy seeding as used by multiple
> applications?

How would systemd determine which systems require this?

> Arguably more preferable would be to have a systemd
> target that indicates the RNG is seeded, and then krb5 could have
> its KDC service depend on this "RNG-available" service.  So far as I
> know, no such service currently exists, so again, there would need
> to be some systemd effort (potentially in cooperation with the
> kernel) to provide such a service.

Yes, that certainly seems like a good approach.

> To rephrase in a different way, "getrandom() is a system service,
> and the system's init system should not penalize other services for
> using system services -- why should the onus of adapting be placed on
> individual consumers of that system service?"
> 
> 
> Back to the "weak" random numbers.  How weak are we talking about?

If I'm reading the code correctly, the previous condition for
successful return of getrandom() (without the GRND_RANDOM flag) was
that at least 64 bits of raw random data have been added to the random
pool.  The raw random data might come from a high quality hardware
random number generator but might be much weaker.  The current
condition is that at least 128 bits of entropy have been added (based
on a conservative estimate of entropy).
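
For anyone who wants to check which state a given machine is in, here is a
minimal sketch of a probe. It assumes Linux and Python 3.6 or later;
rng_is_seeded() is a name made up for this illustration and is not part of
krb5 or the kernel. It relies only on the documented behaviour that
getrandom() with GRND_NONBLOCK fails instead of blocking when the pool is
not yet initialized.

```python
import os


def rng_is_seeded():
    """Return True if getrandom() would return without blocking.

    Hypothetical helper for illustration only (Linux, Python 3.6+).
    """
    try:
        # GRND_NONBLOCK turns "block until initialized" into an error.
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        # The kernel has not yet credited enough entropy to the pool.
        return False
    return True


if __name__ == "__main__":
    print("kernel RNG seeded:", rng_is_seeded())
```

Run early in boot, this should report whether a service calling getrandom()
without GRND_NONBLOCK would stall at that moment.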

> The krb5 KDC and kadmind are used (among other things) to generate
> shared symmetric keys, used to encrypt and authenticate traffic over
> the network.  Some of these keys are long-lived, and an
> insufficiently random long-lived key could have rather disastrous
> consequences for deployments unlucky enough to have generated them.
> Are we looking at a repeat of the openssl RNG fiasco where piles of
> ssh keys and TLS certificates had to be regenerated?  If there's a
> real issue here of weak randomness, we may need to publicize this
> issue much more widely.

The real issue is that k5_get_os_entropy() silently falls back to
reading /dev/urandom, which has never waited, and never will wait, for
a reasonable amount of entropy to be available.

Worse still, it ignores the "strong" flag when calling getrandom().
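
As a rough illustration of the alternative, the sketch below honours a
"strong" request and refuses to fall back. It is not krb5's code:
get_os_entropy() and the meaning given to its strong parameter are
assumptions made for this example (Linux, Python 3.6 or later). Here
strong=True means "block until the kernel RNG is initialized, and raise
rather than degrade", while strong=False means the caller has explicitly
accepted /dev/urandom semantics.

```python
import os


def get_os_entropy(length, strong):
    """Hypothetical helper; not krb5's k5_get_os_entropy()."""
    if strong:
        chunks = []
        remaining = length
        while remaining > 0:
            # flags=0: blocks until the kernel RNG is initialized.
            # Any OSError (e.g. getrandom() unsupported) propagates to
            # the caller; there is deliberately no /dev/urandom fallback.
            chunk = os.getrandom(remaining)
            chunks.append(chunk)
            # getrandom() may return fewer bytes than requested.
            remaining -= len(chunk)
        return b"".join(chunks)
    # The caller opted out of waiting, so this is an explicit choice
    # rather than a silent fallback.
    with open("/dev/urandom", "rb") as f:
        return f.read(length)


if __name__ == "__main__":
    key = get_os_entropy(32, strong=True)
    print(len(key), "bytes of key material")
```

The design point is only that the weaker path is taken when the caller
asks for it, never as a hidden consequence of an error on the strong path.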

If you're serious about the quality of your random numbers, you need to
deal with those issues rather than quibbling about whether the kernel
issue (CVE-2018-1108) is a "fiasco" or not.

Ben.

-- 
Ben Hutchings
If more than one person is responsible for a bug, no one is at fault.

