Andy LoPresto created NIFI-3313:
-----------------------------------

             Summary: First deployment of NiFi can hang on VMs without sufficient entropy if using /dev/random
                 Key: NIFI-3313
                 URL: https://issues.apache.org/jira/browse/NIFI-3313
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
    Affects Versions: 1.1.1
            Reporter: Andy LoPresto
            Assignee: Andy LoPresto
            Priority: Critical


h1. Analysis of Issue

h2. Statement of Problem

NiFi deployed on a headless VM (little user interaction by way of keyboard and 
mouse I/O) can reportedly take 5-10 minutes to start up. The user reports this 
occurs on a "secure" cluster. Further examination is required to determine 
which specific process requires the large amount of random input (no steps to 
reproduce, configuration files, logs, or VM environment information were 
provided). 

h2. Context

The likely cause of this issue is that a process is attempting to read from 
_/dev/random_, a \*nix "device" providing a pseudo-random number generator 
(PRNG). Also available is _/dev/urandom_, a related PRNG. Despite common 
misperceptions, _/dev/urandom_ is no less secure than _/dev/random_ for 
general use cases. _/dev/random_ blocks if the entropy *estimate* (a "guess" of 
the existing entropy introduced into the pool) is lower than the amount of 
random data requested by the caller. In contrast, _/dev/urandom_ does not 
block, but provides the output of the same cryptographically-secure PRNG 
(CSPRNG) that _/dev/random_ reads from \[myths\]. After as little as 256 bytes 
of initial seeding, reads from _/dev/random_ and _/dev/urandom_ are functionally 
equivalent: the CSPRNG's output has a period long enough that it will not need 
re-seeding before further entropy becomes available. 
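
To check whether a slow host is actually low on estimated entropy, the kernel's estimate can be read directly. The following is a minimal sketch, assuming a Linux host that exposes the estimate (in bits) at {{/proc/sys/kernel/random/entropy_avail}}; the class and method names are illustrative and not part of NiFi.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

class EntropyEstimate {
    // Assumption: Linux procfs location of the kernel's entropy estimate (in bits).
    static final Path ENTROPY_AVAIL = Paths.get("/proc/sys/kernel/random/entropy_avail");

    /** Parses the single integer the kernel writes to entropy_avail. */
    static int parse(String contents) {
        return Integer.parseInt(contents.trim());
    }

    /** Returns the current estimate, or empty if the file is missing or unreadable. */
    static OptionalInt read() {
        try {
            byte[] raw = Files.readAllBytes(ENTROPY_AVAIL);
            return OptionalInt.of(parse(new String(raw, StandardCharsets.US_ASCII)));
        } catch (IOException | NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt bits = read();
        System.out.println(bits.isPresent()
                ? "Kernel entropy estimate: " + bits.getAsInt() + " bits"
                : "Entropy estimate not available on this platform");
    }
}
```

Sampling this value on the affected VM shortly before NiFi starts may help confirm or rule out a blocked read from _/dev/random_ as the cause of the delay.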

As mentioned earlier, further examination is required to determine if the 
process requiring random input occurs at application boot or only at "machine" 
(hardware or VM) boot. On the first deployment of the system with certificates, 
the certificate generation process will require substantial random input. 
However, on application launch and connection to a cluster, even the TLS/SSL 
protocol requires some amount of random input. 

h2. Proposed Solutions

h3. rngd

A software toolset for accessing dedicated hardware RNGs (*true* RNG, or TRNG) 
called _rng-tools_ \[rngtools\] exists for Linux. Specialized hardware, as well 
as Intel chips from Ivy Bridge (2012) onward, can provide hardware-generated 
random input to the kernel. Using the daemon _rngd_ to seed the _/dev/random_ 
and _/dev/urandom_ entropy pool is the simplest solution. 

*Note: Do not use _/dev/urandom_ to seed _/dev/random_ using _rngd_. This is 
like running a garden hose from a car's exhaust back into its gas tank and 
trying to drive.*
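
Whether a host can benefit from _rngd_ depends on a hardware source being visible to the system. As a rough sketch, assuming an x86 Linux host where the CPU's hardware RNG instruction is advertised as the {{rdrand}} flag in {{/proc/cpuinfo}}, the check below looks for that flag. The class name is illustrative, and a VM may hide the flag even when the physical CPU supports it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

class RdrandCheck {
    /** True if any "flags" line of the given cpuinfo text lists the rdrand flag. */
    static boolean hasRdrand(String cpuinfo) {
        return Arrays.stream(cpuinfo.split("\n"))
                .filter(line -> line.startsWith("flags"))
                // Keep only the part after "flags :", which is a space-separated list.
                .map(line -> line.substring(line.indexOf(':') + 1))
                .flatMap(flags -> Arrays.stream(flags.trim().split("\\s+")))
                .anyMatch("rdrand"::equals);
    }

    public static void main(String[] args) {
        try {
            // Assumption: x86 Linux; other platforms lack this file or the "flags" line.
            byte[] raw = Files.readAllBytes(Paths.get("/proc/cpuinfo"));
            String cpuinfo = new String(raw, StandardCharsets.US_ASCII);
            System.out.println("rdrand advertised: " + hasRdrand(cpuinfo));
        } catch (IOException e) {
            System.out.println("Could not read /proc/cpuinfo: " + e.getMessage());
        }
    }
}
```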

h3. Instruct Java to use /dev/urandom

The Java Runtime Environment (JRE) can be instructed to use _/dev/urandom_ for 
all invocations of {{SecureRandom}}, either on a per-Java process basis 
\[jdk-urandom\] or in the JVM configuration \[oracle-urandom\], which means it 
will not block on server startup. The NiFi {{bootstrap.conf}} file can be 
modified to contain an additional Java argument directing the JVM to use 
_/dev/urandom_. 
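
To verify which source a given JVM is configured with, something like the following sketch could be run with and without the extra argument. It reads the {{java.security.egd}} system property (the per-process override) and falls back to the {{securerandom.source}} entry from the JVM's {{java.security}} file, then times a first draw from {{SecureRandom}}. The class and method names are illustrative. In {{bootstrap.conf}} the flag would be supplied as one of the numbered {{java.arg}} entries; the index to use is not specified here.

```java
import java.security.SecureRandom;
import java.security.Security;

class SecureRandomSource {
    /** Reports the configured seed source: the -D override if set, else the java.security default. */
    static String configuredSource() {
        String override = System.getProperty("java.security.egd");
        if (override != null && !override.isEmpty()) {
            return override;
        }
        String fromSecurityFile = Security.getProperty("securerandom.source");
        return fromSecurityFile != null ? fromSecurityFile : "(unset)";
    }

    /** Milliseconds taken to construct a SecureRandom and draw numBytes from it. */
    static long timeFirstDrawMillis(int numBytes) {
        long start = System.nanoTime();
        byte[] buf = new byte[numBytes];
        new SecureRandom().nextBytes(buf);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Example invocation (assumed): java -Djava.security.egd=file:/dev/urandom SecureRandomSource
        System.out.println("Configured source: " + configuredSource());
        System.out.println("First 32-byte draw took " + timeFirstDrawMillis(32) + " ms");
    }
}
```

This only shows what the JVM is configured to use; which algorithm and device are actually read can vary by JDK version and platform, so the timing is an indicator rather than proof.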

h2. Other Solutions

h3. Entropy Gathering Tools

Tools such as audio-entropyd \[wagner\] have been developed to gather entropy 
from non-standard sources (audio card noise, video capture from webcams, etc.), 
but these tools are not verified or well-examined. When they are tested, it is 
usually only for the strength of their PRNG output, not for the tool's ability 
to capture entropy and generate sufficiently random data that is unavailable to 
an attacker who may be able to determine the internal state. 

h3. haveged

A solution has been proposed to use {{haveged}} \[haveged\], a user-space 
daemon relying on the HAVEGE (HArdware Volatile Entropy Gathering and 
Expansion) construct to continually increase the entropy on the system, 
allowing _/dev/random_ to run without blocking. 

However, on further investigation, multiple sources indicate this solution may 
be insecure \[dice\]\[leek-havege\]. 

Michael Kerrisk: 

bq. Having read a number of papers about HAVEGE, Peter \[Anvin\] said he had 
been unable to work out whether this was a "real thing". Most of the papers 
that he has read run along the lines, "we took the output from HAVEGE, and ran 
some tests on it and all of the tests passed". The problem with this sort of 
reasoning is the point that Peter made earlier: there are no tests for 
randomness, only for non-randomness.
bq. One of Peter's colleagues replaced the random input source employed by 
HAVEGE with a constant stream of ones. All of the same tests passed. In other 
words, all that the test results are guaranteeing is that the HAVEGE developers 
have built a very good PRNG. It is possible that HAVEGE does generate some 
amount of randomness, Peter said. But the problem is that the proposed source 
of randomness is simply too complex to analyze; thus it is not possible to make 
a definitive statement about whether it is truly producing randomness. (By 
contrast, the HWRNGs that Peter described earlier have been analyzed to produce 
a quantum theoretical justification that they are producing true randomness.) 
"So, while I can't really recommend it, I can't not recommend it either." If 
you are going to run HAVEGE, Peter strongly recommended running it together 
with rngd, rather than as a replacement for it.

Tom Leek:

bq. Of course, the whole premise of HAVEGE is questionable. For any practical 
security, you need a few "real random" bits, no more than 200, which you use as 
seed in a cryptographically secure PRNG. The PRNG will produce gigabytes of 
pseudo-\[data\] indistinguishable from true randomness, and that's good enough 
for all practical purposes.
bq. Insisting on going back to the hardware for every bit looks like yet 
another outbreak of that flawed idea which sees entropy as a kind of gasoline, 
which you burn up when you look at it.

h2. Next Steps

As described above, further investigation is necessary, but moving forward, 
barring new information, I would propose directing the JVM to use 
_/dev/urandom_ and making _rngd_ available to systems that support a TRNG. 

[myths] http://www.2uo.de/myths-about-urandom/
[rngtools] https://git.kernel.org/cgit/utils/kernel/rng-tools/rng-tools.git/about/
[jdk-urandom] http://stackoverflow.com/a/2325109/70465
[oracle-urandom] https://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configwlss/jvmrand.html
[wagner] https://people.eecs.berkeley.edu/~daw/rnd/
[haveged] http://www.issihosts.com/haveged/
[dice] https://lwn.net/Articles/525459/
[leek-havege] http://security.stackexchange.com/a/34552/16485




