On Sun, May 06, 2018 at 08:43:13PM +0100, Ben Hutchings wrote:
> On Sun, 2018-05-06 at 14:02 -0500, Benjamin Kaduk wrote:
> > Hi Ben,
> > 
> > On Sun, May 06, 2018 at 06:56:08PM +0100, Ben Hutchings wrote:
> > > I've cloned this bug as #898073 and reassigned that to krb5.
> > > 
> > > krb5 is using the new(ish) getrandom() system call to read random bits,
> > > with the code comment "This ensures strong randomness while only
> > > blocking during first system boot."
> > > 
> > > While this is a regression, the kernel is only doing what krb5 was
> > > asking for (whereas previously it could wrongly provide weak random
> > > bits).
> > > 
> > > We might still revert this change in the kernel temporarily.  However,
> > > the krb5 developers need to decide what they really want, and if that's
> > > strong randomness then they need to configure the service to allow for
> > > a longer delay at boot.
> > 
> > I read through the history on #898073 and am still not sure I have
> > the backstory quite right.  This is what it sounds like has
> > happened:
> > 
> > The kernel in stable has long provided a getrandom() system
> > call that returned "weak" (more on this later) random numbers for
> > a while after startup, though it did eventually converge to "strong"
> > randomness (after a few minutes?).  The kernel 4.9.88-1
> > upload fixed the bug that getrandom() could provide "weak" output
> > (since getrandom() is supposed to block until strong output is
> > ready), and this in turn caused the krb5 KDC to block at boot until
> > the RNG was ready, blocking long enough that systemd timed out the
> > unit and marked it as failed.  We're now talking about the proper
> > way to improve the situation.
> 
> Right.
> 
> > If the above is correct, I'm not yet sure that I see a krb5-specific
> > bug.  It is definitely true that krb5 is specifically requesting the
> > getrandom() semantics of blocking until the RNG is fully seeded, but
> > krb5 is hardly expected to be the only consumer of getrandom().  As
> > such, why should krb5 be responsible for increasing the systemd
> > timeout at boot -- could not systemd be responsible for increasing
> > the default timeout to allow for entropy seeding as used by multiple
> > applications?
> 
> How would systemd determine which systems require this?

I didn't have anything in mind other than globally increasing the
default timeout.

> > Arguably preferable would be to have a systemd
> > target that indicates the RNG is seeded, and then krb5 could have
> > its KDC service depend on this "RNG-available" service.  So far as I
> > know, no such service currently exists, so again, there would need
> > to be some systemd effort (potentially in cooperation with the
> > kernel) to provide such a service.
> 
> Yes, that certainly seems like a good approach.

Do you know who would be the right person to talk to about getting
that work done?
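
To make the "RNG-available" idea a bit more concrete, here is a
minimal sketch of the check such a unit could run before services
like the KDC are started.  It is only an illustration: the names
rng_is_seeded() and wait_for_rng() and the timeout values are my own
assumptions, not anything that exists in systemd or krb5.  It relies
on getrandom() with GRND_NONBLOCK failing rather than blocking while
the pool is uninitialized.

```python
import os
import time


def rng_is_seeded():
    """Report whether the kernel RNG is initialized.

    Returns True or False, or None if getrandom() is not usable here
    (for example a non-Linux platform or a kernel without the call).
    """
    if not hasattr(os, "getrandom"):
        return None
    try:
        # GRND_NONBLOCK makes getrandom() fail instead of blocking
        # while the pool is still uninitialized.
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False
    except OSError:
        return None


def wait_for_rng(timeout=300.0, poll_interval=1.0):
    """Poll until the RNG is seeded; return False on timeout or if unknown."""
    deadline = time.monotonic() + timeout
    while True:
        state = rng_is_seeded()
        if state is True:
            return True
        if state is None or time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A real implementation could equally well make a single blocking
getrandom() call; polling is used here only so that the wait can be
bounded by a timeout.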

> > To rephrase in a different way, "getrandom() is a system service,
> > and the system's init system should not penalize other services for
> > using system services -- why should the onus of adapting be placed on
> > individual consumers of that system service?"
> > 
> > 
> > Back to the "weak" random numbers.  How weak are we talking about?
> 
> If I'm reading the code correctly, the previous condition for
> successful return of getrandom() (without the GRND_RANDOM flag) was
> that at least 64 bits of raw random data have been added to the random
> pool.  The raw random data might come from a high quality hardware
> random number generator but might be much weaker.  The current
> condition is that at least 128 bits of entropy have been added (based
> on a conservative estimate of entropy).

Thanks for sharing your interpretation.  Hmm, 64 bits is not very
much (e.g., 64^W56-bit single-DES keys are brute-forceable at
relatively low cost these days), though I don't have a sense of
what the weakest source that could be used is.  Of course it's not
as simple as just the first 64 bits, since other input is
continually added, but it sounds like there is a chance, larger
than the normal security margin allows, that an attacker could
reproduce a key that was generated on a user system.  It sounds like
we should try to get some additional eyes on this.
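
For a rough sense of scale, the arithmetic can be sketched as below.
The guessing rate is a made-up round number, not a measurement of
any real attacker, and the bit counts are upper bounds: 64 bits of
raw input from a weak source may carry far less than 64 bits of
entropy.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Hypothetical attacker speed, chosen only to make the comparison concrete.
ASSUMED_GUESSES_PER_SECOND = 1e12


def expected_guesses(bits):
    """Average guesses needed to hit a uniformly random value of `bits` bits."""
    return 2 ** (bits - 1)


def years_to_search(bits, guesses_per_second=ASSUMED_GUESSES_PER_SECOND):
    """Expected search time in years at the given guessing rate."""
    return expected_guesses(bits) / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for bits in (56, 64, 128):
        print(f"{bits:3d} bits: about {years_to_search(bits):.3g} years")
```

At that assumed rate the expected search takes roughly hours for 56
bits and months for 64 bits, while 128 bits stays far out of reach.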

> > The krb5 KDC and kadmind are used (among other things) to generate
> > shared symmetric keys, used to encrypt and authenticate traffic over
> > the network.  Some of these keys are long-lived, and an
> > insufficiently random long-lived key could have rather disastrous
> > consequences for deployments unlucky enough to have generated them.
> > Are we looking at a repeat of the openssl RNG fiasco where piles of
> > ssh keys and TLS certificates had to be regenerated?  If there's a
> > real issue here of weak randomness, we may need to publicize this
> > issue much more widely.
> 
> The real issue is that k5_get_os_entropy() silently falls back to
> reading /dev/urandom, which has never, and will never, wait for a
> reasonable amount of entropy to be available.
> 
> Worse still, it ignores the "strong" flag when calling getrandom().

I think this risks reopening the debate about the design of the
kernel PRNG, which I don't wish to do at this time.  Suffice it to
say that I believe the upstream krb5 developers are aware of the
interface limitations of getrandom and /dev/[u]random, and made an
informed decision on the usage of those APIs within krb5.  (In
essence, one could read the 'strong' flag of
k5_get_os_entropy() as meaning "yes, really ensure the PRNG is
fully seeded", reflecting a belief that, given the kernel
implementation, there is no other qualitative difference in PRNG
"quality".)
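
To illustrate the interface difference being discussed (and only
that), here is a sketch of one way a 'strong' flag could map onto
the kernel interfaces.  This is not the krb5 code: read_os_entropy()
is a hypothetical name, and the policy of refusing to fall back in
the strong case is my own assumption for the example.

```python
import errno
import os


def read_os_entropy(length, strong=True):
    """Return `length` random bytes from the OS.

    strong=True:  blocking getrandom(), which waits until the kernel
                  RNG is seeded; no fallback is attempted.
    strong=False: non-blocking getrandom(), falling back to
                  /dev/urandom, which never waits for seeding.
    """
    getrandom = getattr(os, "getrandom", None)
    if getrandom is not None:
        flags = 0 if strong else os.GRND_NONBLOCK
        try:
            buf = b""
            while len(buf) < length:
                # getrandom() may return fewer bytes than requested.
                buf += getrandom(length - len(buf), flags)
            return buf
        except BlockingIOError:
            pass  # pool not seeded yet; only reachable when strong is False
        except OSError as exc:
            if exc.errno != errno.ENOSYS:
                raise
    if strong:
        raise RuntimeError("getrandom() unavailable; cannot guarantee seeding")
    with open("/dev/urandom", "rb") as dev:
        return dev.read(length)
```

The only distinction the flag makes here is whether the caller
insists on a fully seeded PRNG; once the pool is seeded, both paths
draw from the same generator.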

> If you're serious about the quality of your random numbers, you need to
> deal with those issues rather than quibbling about whether the kernel
> issue (CVE-2018-1108) is a "fiasco" or not.

It sounds like my first mail came off badly.  I was just trying to
use the openssl issue as a metric by which to assess the current
issue's severity, not trying to imply that they were of comparable
severity.  Given my duties on the IESG I don't expect to be able to
look at the actual kernel changes in the near future, and the
summary attached to (e.g.)
https://access.redhat.com/security/cve/cve-2018-1108 is
insufficiently informative, so I don't know how else to assess the
details.

Thanks again,

Ben

