Re: [blfs-support] Unbound slow to start with recent kernels on some machines

Richard Melville Thu, 19 Jul 2018 02:29:07 -0700

On 19 July 2018 at 04:47, Bruce Dubbs <bruce.du...@gmail.com> wrote:

> On 07/18/2018 08:04 PM, Ken Moffat wrote:
>
>> On Sat, Jun 02, 2018 at 10:02:39PM +0100, Ken Moffat wrote:
>>
>>> I've been seeing problems on some of my machines with recent kernels
>>> (first noticed in 4.17-rc, but it also now happens in 4.16.4 or
>>> later).  The problem is that instead of unbound taking a handful of
>>> seconds to start (often, it is all-but immediate), on the affected
>>> machines it now takes up to two and a half minutes.
>>>
>>>
>> Finally, making slow progress on this.  The problem is caused by the
>> fix for CVE-2018-1108.  A little while ago Ted Ts'o offered a patch,
>> possibly as an RFC, to use entropy from the hwrng (unsafe for
>> critical things like key generation, but it allows less-important
>> things, e.g. in systemd units, to run and therefore it lets the box
>> boot in the absence of real entropy).
>>
>> Apparently he did this because fedora are starting to derive
>> "entropy" from jitter so that e.g. VMs can boot in a meaningful
>> time.
>>
>> For my haswell that was great, but for my kaveri it made no
>> difference - turns out that the kaveri does NOT have a hwrng (I
>> enabled the option, and /dev/hwrng exists, but reading it with dd
>> reports 'No such file').
>>
>> And the patch which introduced this fix can no longer be reverted:
>> parts of the file, at least in 4.18-rc5, have been rewritten.
>>
>> What I will now be looking at is twofold:
>>
>> 1. start the random bootscript earlier (currently it is S25, but
>> unbound is S21; S15 - just after sysklogd - looks likely).
>> For systemd, I've no idea how to change the dependencies.
>>
>>   AND
>>
>> 2. persuade unbound to use /dev/urandom.
>>
>> Googling, mostly unsuccessfully, I found that NixOS creates
>> /var/lib/unbound/dev/random (sic) with /var/lib/unbound as the home
>> directory for the unbound user, and binds /dev/urandom to it.  They
>> also seem to move the root key, and perhaps unbound.conf, to that
>> directory.  So, as well as moving the random script, the unbound
>> bootscript needs to be modified (and unmount afterwards).
>>
>> To recap, only some of my machines with an SSD (and no 'spinning
>> rust') are affected.
>>
>> The alternative for the second part is to hack unbound.  In 1.7.1,
>> the compat/getentropy_linux.c file has:
>>
>> #if defined(SYS_getrandom) && defined(__NR_getrandom)
>>          /*
>>           * Try descriptor-less getrandom()
>>           */
>>          ret = getentropy_getrandom(buf, len);
>>          if (ret != -1)
>>                  return (ret);
>>          if (errno != ENOSYS)
>>                  return (-1);
>> #endif
>>
>>          /*
>>           * Try to get entropy with /dev/urandom
>>           *
>>           * This can fail if the process is inside a chroot or if file
>>           * descriptors are exhausted.
>>           */
>>          ret = getentropy_urandom(buf, len);
>>          if (ret != -1)
>>                  return (ret);
>>
>> #ifdef SYS__sysctl
>>          /*
>>           * Try to use sysctl CTL_KERN, KERN_RANDOM, RANDOM_UUID.
>>           * sysctl is a failsafe API, so it guarantees a result.  This
>>           * should work inside a chroot, or when file descriptors are
>>           * exhausted.
>>           *
>>           * However this can fail if the Linux kernel removes support
>>           * for sysctl.  Starting in 2007, there have been efforts to
>>           * deprecate the sysctl API/ABI, and push callers towards use
>>           * of the chroot-unavailable fd-using /proc mechanism --
>>           * essentially the same problems as /dev/urandom.
>>           *
>>           * Numerous setbacks have been encountered in their deprecation
>>           * schedule, so as of June 2014 the kernel ABI still exists on
>>           * most Linux architectures. The sysctl() stub in libc is missing
>>           * on some systems.  There are also reports that some kernels
>>           * spew messages to the console.
>>           */
>>          ret = getentropy_sysctl(buf, len);
>>          if (ret != -1)
>>                  return (ret);
>> #endif /* SYS__sysctl */
>>
>> If it gets to this point, on linux it then uses
>> getentropy_fallback().
>>
>> What is happening is that it hangs until hammering on the keyboard
>> has generated enough entropy, so I'm currently assuming that the
>> initial ret = getentropy_getrandom(buf, len); now blocks until
>> sufficient entropy is available - and that is the expected behaviour
>> on linux.
>>
>> To be honest, deleting that chunk of code looks easiest, but it
>> brings an ongoing maintenance commitment (1.7.1 is no longer
>> current, and whatever else happens there will probably be newer
>> versions in the future).  This is the sort of case where I like
>> patches, they either apply to a new version, or they don't (whereas
>> deleting lines in sed might remove the wrong content).
>>
>> For the unbound systemd unit, again I have no idea what to change.
>>
>> Opinions on whether it is better to change the bootscript (assuming
>> that works) or hack the code ?  In either case, urandom needs to be
>> seeded earlier.
>>
>> Either way, this is not my number one priority.  But it would be
>> nice to fix it before 8.3.
>>
>
> Have you tried using haveged?  Its boot order is S21 and will start
> slightly before unbound.  That still leaves the problem of unbound using
> /dev/urandom, but it may help.
>
I already suggested that -- Ken doesn't like it.  I use SSDs and it works
for me.
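
For anyone who wants to test Ken's assumption that the initial
getentropy_getrandom() call is what blocks, here is a minimal sketch in C.
It is not unbound's code: the helper names pool_is_ready(),
fill_from_urandom() and get_bytes_without_blocking() are made up for
illustration, and it assumes a glibc new enough (2.25 or later) to provide
getrandom() in <sys/random.h>.  With GRND_NONBLOCK, getrandom() fails with
EAGAIN instead of waiting when the kernel pool has not been initialised yet,
so the probe shows whether a plain getrandom() would hang at that point in
the boot.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>   /* getrandom(), GRND_NONBLOCK (assumes glibc >= 2.25) */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: 1 if the kernel pool is initialised (a plain
 * getrandom() would return at once), 0 if it would block, -1 on any
 * other error (e.g. the syscall is not available).
 */
int pool_is_ready(void)
{
    unsigned char probe;
    ssize_t n = getrandom(&probe, sizeof probe, GRND_NONBLOCK);

    if (n == (ssize_t)sizeof probe)
        return 1;
    if (n == -1 && errno == EAGAIN)
        return 0;
    return -1;
}

/*
 * Hypothetical helper: fill buf from /dev/urandom, which does not wait
 * for the pool to be initialised.  Returns 0 on success, -1 on failure.
 */
int fill_from_urandom(void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t done = 0;
    int fd = open("/dev/urandom", O_RDONLY);

    if (fd == -1)
        return -1;
    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n == -1 && errno == EINTR)
            continue;               /* interrupted, retry the read */
        if (n <= 0) {               /* real error or unexpected EOF */
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    close(fd);
    return 0;
}

/*
 * Hypothetical helper: try getrandom() without blocking; if it does not
 * deliver the full request for any reason, fall back to /dev/urandom.
 * Bytes obtained before the pool is seeded are NOT fit for key generation.
 */
int get_bytes_without_blocking(void *buf, size_t len)
{
    ssize_t n = getrandom(buf, len, GRND_NONBLOCK);

    if (n == (ssize_t)len)
        return 0;
    return fill_from_urandom(buf, len);
}
```

Calling pool_is_ready() from a tiny test program run just before unbound in
the boot sequence would show whether the pool is still uninitialised on the
affected machines.  The fallback trades the hang for weaker early-boot
randomness, which is the same compromise as the hwrng patch described above.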


Richard

-- 
http://lists.linuxfromscratch.org/listinfo/blfs-support
FAQ: http://www.linuxfromscratch.org/blfs/faq.html
Unsubscribe: See the above information page
