Andy LoPresto created NIFI-3313:
-----------------------------------
Summary: First deployment of NiFi can hang on VMs without
sufficient entropy if using /dev/random
Key: NIFI-3313
URL: https://issues.apache.org/jira/browse/NIFI-3313
Project: Apache NiFi
Issue Type: Bug
Components: Core Framework
Affects Versions: 1.1.1
Reporter: Andy LoPresto
Assignee: Andy LoPresto
Priority: Critical
h1. Analysis of Issue
h2. Statement of Problem:
NiFi deployed on a headless VM (little user interaction by way of keyboard and
mouse I/O) can take 5-10 minutes (reported) to start up. User reports this
occurs on a "secure" cluster. Further examination is required to determine
which specific process requires the large amount of random input (no steps to
reproduce, configuration files, logs, or VM environment information provided).
h2. Context
The likely cause of this issue is that a process is attempting to read from
_/dev/random_, a \*nix "device" providing a pseudo-random number generator
(PRNG). Also available is _/dev/urandom_, a related PRNG. Despite common
misconceptions, _/dev/urandom_ is not "less secure" than _/dev/random_ for
general use cases. _/dev/random_ blocks if the entropy *estimate* (a "guess" of
the existing entropy introduced into the pool) is lower than the amount of
random data requested by the caller. In contrast, _/dev/urandom_ does not
block, but provides the output of the same cryptographically-secure PRNG
(CSPRNG) that _/dev/random_ reads from \[myths\]. After as little as 256 bytes
of initial seeding, reading from _/dev/random_ and reading from _/dev/urandom_
are functionally equivalent, because the CSPRNG's long period means it will not
need re-seeding before sufficient entropy is available again.
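As a rough illustration of the entropy *estimate* described above, the sketch below reads the value Linux exposes at {{/proc/sys/kernel/random/entropy_avail}}. The class and method names are hypothetical, and it assumes a Linux host with procfs mounted; on other systems it simply reports that no estimate is available.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

// Hypothetical helper, not part of NiFi.
class EntropyEstimate {
    // Linux-specific procfs entry exposing the kernel's entropy estimate, in bits.
    static final Path ENTROPY_AVAIL = Paths.get("/proc/sys/kernel/random/entropy_avail");

    /** Parses the text of entropy_avail; empty if it is not a plain integer. */
    static OptionalInt parseEstimate(String raw) {
        try {
            return OptionalInt.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    /** Reads the current estimate; empty on non-Linux systems or if unreadable. */
    static OptionalInt readEstimate() {
        try {
            byte[] bytes = Files.readAllBytes(ENTROPY_AVAIL);
            return parseEstimate(new String(bytes, StandardCharsets.UTF_8));
        } catch (IOException | SecurityException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt estimate = readEstimate();
        if (estimate.isPresent()) {
            System.out.println("Kernel entropy estimate (bits): " + estimate.getAsInt());
        } else {
            System.out.println("Entropy estimate not available on this system");
        }
    }
}
```

Running this on the affected VM shortly after boot, and again after the slow startup, may help confirm whether a low estimate coincides with the hang.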
As mentioned earlier, further examination is required to determine if the
process requiring random input occurs at application boot or only at "machine"
(hardware or VM) boot. On the first deployment of the system with certificates,
the certificate generation process will require substantial random input.
However, on application launch and connection to a cluster, even the TLS/SSL
protocol requires some amount of random input.
h2. Proposed Solutions
h3. rngd
A software toolset called _rng-tools_ \[rngtools\] exists on Linux for accessing
dedicated hardware RNGs (*true* RNGs, or TRNGs). Specialized hardware, as well
as Intel chips from Ivy Bridge (2012) onward, can provide hardware-generated
random input to the kernel. Using the daemon _rngd_ to seed the entropy pool
behind _/dev/random_ and _/dev/urandom_ is the simplest solution.
*Note: Do not use _/dev/urandom_ to seed _/dev/random_ using _rngd_. This is
like running a garden hose from a car's exhaust back into its gas tank and
trying to drive.*
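Before relying on _rngd_, it may be worth checking whether the host exposes a hardware source at all. The sketch below is one possible check, assuming a Linux host: it looks for the {{rdrand}} CPU flag in {{/proc/cpuinfo}} and for the {{/dev/hwrng}} device node. The class and method names are hypothetical, and a VM may hide either indicator depending on the hypervisor configuration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of NiFi or rng-tools.
class HardwareRngCheck {

    /** True if any "flags" line of /proc/cpuinfo-style text lists the given CPU flag. */
    static boolean hasCpuFlag(List<String> cpuinfoLines, String flag) {
        for (String line : cpuinfoLines) {
            int colon = line.indexOf(':');
            if (colon < 0 || !line.substring(0, colon).trim().equals("flags")) {
                continue; // not a flags line
            }
            String[] flags = line.substring(colon + 1).trim().split("\\s+");
            if (Arrays.asList(flags).contains(flag)) {
                return true;
            }
        }
        return false;
    }

    /** Reads /proc/cpuinfo; returns an empty list if it is missing or unreadable. */
    static List<String> readCpuinfo() {
        try {
            return Files.readAllLines(Paths.get("/proc/cpuinfo"));
        } catch (IOException | SecurityException e) {
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        boolean rdrand = hasCpuFlag(readCpuinfo(), "rdrand");
        boolean hwrng = Files.exists(Paths.get("/dev/hwrng"));
        System.out.println("CPU advertises rdrand: " + rdrand);
        System.out.println("/dev/hwrng present:    " + hwrng);
    }
}
```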
h3. Instruct Java to use /dev/urandom
The Java Runtime Environment (JRE) can be instructed to use _/dev/urandom_ for
all invocations of {{SecureRandom}}, either on a per-Java process basis
\[jdk-urandom\] or in the JVM configuration \[oracle-urandom\], which means it
will not block on server startup. The NiFi {{bootstrap.conf}} file can be
modified to contain an additional Java argument directing the JVM to use
_/dev/urandom_.
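To see which source a given JVM is configured with, a small check along the following lines could be run with and without {{-Djava.security.egd=file:/dev/urandom}}. This is a sketch under assumptions, not NiFi code: the class and method names are hypothetical, and the {{java.arg.N}} index mentioned in the comment stands for any unused index in {{bootstrap.conf}}. The timing uses {{nextBytes}}, so it only hints at blocking and does not prove where the startup delay comes from.

```java
import java.security.SecureRandom;
import java.security.Security;

// Hypothetical diagnostic, not part of NiFi.
// In bootstrap.conf the equivalent setting would be a line such as
//   java.arg.N=-Djava.security.egd=file:/dev/urandom
// where N is an unused argument index (assumption).
class SecureRandomSourceCheck {

    /** Returns the configured entropy source, preferring the command-line override. */
    static String configuredSource() {
        String override = System.getProperty("java.security.egd");
        if (override != null && !override.isEmpty()) {
            return override;
        }
        // Falls back to the value from the JVM's java.security configuration file.
        String fromSecurityFile = Security.getProperty("securerandom.source");
        return fromSecurityFile != null ? fromSecurityFile : "(unset)";
    }

    /** Times one SecureRandom draw of numBytes, in milliseconds. */
    static long timeNextBytesMillis(int numBytes) {
        SecureRandom random = new SecureRandom();
        byte[] buffer = new byte[numBytes];
        long start = System.nanoTime();
        random.nextBytes(buffer); // first call triggers any seeding the provider needs
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        System.out.println("Entropy source: " + configuredSource());
        System.out.println("32 random bytes took " + timeNextBytesMillis(32) + " ms");
    }
}
```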
h2. Other Solutions
h3. Entropy Gathering Tools
Tools that gather entropy from non-standard sources (audio card noise, video
capture from webcams, etc.), such as audio-entropyd \[wagner\], have been
developed, but they are not verified or well examined. When tested at all,
they are usually tested only for the strength of their PRNG, not for their
ability to capture entropy and generate random data that is unavailable to an
attacker who may be able to determine the internal state.
h3. haveged
A solution has been proposed to use {{haveged}} \[haveged\], a user-space
daemon relying on the HAVEGE (HArdware Volatile Entropy Gathering and
Expansion) construct to continually increase the entropy on the system,
allowing _/dev/random_ to run without blocking.
However, on further investigation, multiple sources indicate this solution may
be insecure \[dice\]\[leek-havege\].
Michael Kerrisk:
bq. Having read a number of papers about HAVEGE, Peter \[Anvin\] said he had
been unable to work out whether this was a "real thing". Most of the papers
that he has read run along the lines, "we took the output from HAVEGE, and ran
some tests on it and all of the tests passed". The problem with this sort of
reasoning is the point that Peter made earlier: there are no tests for
randomness, only for non-randomness.
bq. One of Peter's colleagues replaced the random input source employed by
HAVEGE with a constant stream of ones. All of the same tests passed. In other
words, all that the test results are guaranteeing is that the HAVEGE developers
have built a very good PRNG. It is possible that HAVEGE does generate some
amount of randomness, Peter said. But the problem is that the proposed source
of randomness is simply too complex to analyze; thus it is not possible to make
a definitive statement about whether it is truly producing randomness. (By
contrast, the HWRNGs that Peter described earlier have been analyzed to produce
a quantum theoretical justification that they are producing true randomness.)
"So, while I can't really recommend it, I can't not recommend it either." If
you are going to run HAVEGE, Peter strongly recommended running it together
with rngd, rather than as a replacement for it.
Tom Leek:
bq. Of course, the whole premise of HAVEGE is questionable. For any practical
security, you need a few "real random" bits, no more than 200, which you use as
seed in a cryptographically secure PRNG. The PRNG will produce gigabytes of
pseudo-\[data\] indistinguishable from true randomness, and that's good enough
for all practical purposes.
bq. Insisting on going back to the hardware for every bit looks like yet
another outbreak of that flawed idea which sees entropy as a kind of gasoline,
which you burn up when you look at it.
h2. Next Steps
As described above, further investigation is necessary, but moving forward,
barring new information, I would propose directing the JVM to use
_/dev/urandom_ and making _rngd_ available to systems that support a TRNG.
[myths] http://www.2uo.de/myths-about-urandom/
[rngtools] https://git.kernel.org/cgit/utils/kernel/rng-tools/rng-tools.git/about/
[jdk-urandom] http://stackoverflow.com/a/2325109/70465
[oracle-urandom] https://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configwlss/jvmrand.html
[wagner] https://people.eecs.berkeley.edu/~daw/rnd/
[haveged] http://www.issihosts.com/haveged/
[dice] https://lwn.net/Articles/525459/
[leek-havege] http://security.stackexchange.com/a/34552/16485