On Fri, Sep 01, 2017 at 09:38:14PM -0500, Mario Castelán Castro wrote:
> On 01/09/17 18:43, Zenaan Harkness wrote:
> > (Probably obvious, but as long as you're reading from urandom,
> > "entropy" is the wrong word, in this context, better to say "128 bits
> > of cryptographically secure numbers" as that which has been said e.g.
> > by the Linux kernel urandom developers as being "cryptographically
> > secure" has changed a few times, and may change again in the future -
> > if it truly were entropy (as /dev/random suggests it provides), the
> > ongoing changes for "security" would not be necessary.)
>
> No. Entropy is the appropriate word. Please recall that “entropy” is
> just a different scale

Use of the word "scale" is one example of the things that lead people
to use loose terms like "stretching of entropy". Such terms, though
useful in certain contexts, not only readily give rise to imprecise
comprehension in the mind of someone who has no robust definition of
the term, but are mathematically bogus on the face of it, unless one
gets really, really precise in each and every definition of every term
in one's "turtles on turtles" stack of terms. We humans are in general
woefully untrained in axiomatic communication and thus abundant
confusions and misunderstandings arise (and I'm no less guilty of
misunderstandings than anyone else - this is in the nature of human
communication).

> for probability and quantities comparable to
> probability (like expected probability). Nothing more, nothing less.

https://en.wikipedia.org/wiki/Entropy_(information_theory)
"Information entropy is defined as the average amount of information
produced by a probabilistic stochastic source of data."

(See also for disambiguation:
https://en.wikipedia.org/wiki/Entropy_(disambiguation)#Information_theory_and_mathematics
)

Now let's go to that first link's second sentence:
"The measure of information entropy associated with each possible
data value is the negative logarithm of the probability mass function
for the value."

I am not mathematically literate enough to even properly parse that
sentence!

The last two sentences of that first paragraph sound a little more
comprehensible/promising:
"Generally, entropy refers to disorder or uncertainty, and the
definition of entropy used in information theory is directly analogous
to the definition used in statistical thermodynamics. The concept of
information entropy was introduced by Claude Shannon in his 1948 paper
"A Mathematical Theory of Communication"."

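To unpack that "negative logarithm" sentence with a toy example (a
sketch of my own, not taken from the Wikipedia article; the function
names and the example distributions are made up for illustration): the
negative base-2 logarithm of an outcome's probability is how many bits
of "surprise" that one outcome carries, and the entropy of a source is
the probability-weighted average of that over all its outcomes.

```python
import math

def surprisal_bits(p: float) -> float:
    """Information carried by one outcome of probability p: -log2(p)."""
    return -math.log2(p)

def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy: the probability-weighted average surprisal."""
    return sum(p * surprisal_bits(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]      # each outcome carries exactly 1 bit
biased_coin = [0.9, 0.1]    # more predictable, so less than 1 bit per flip

# Entropy of independent sources adds up, so 128 fair flips give 128 bits.
uniform_128 = 128 * entropy_bits(fair_coin)

print(entropy_bits(fair_coin))    # 1.0
print(entropy_bits(biased_coin))  # roughly 0.47
print(uniform_128)                # 128.0
```

Read this way, "128 bits of entropy" just means "as unpredictable as
128 independent fair coin flips", assuming the attacker has no better
model of the source than the stated probabilities.
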
From my naive comprehension of what I read here, the term "entropy
stretching" kind of makes sense - there's a statistical "amount" of
randomness, and that randomness is "spread" over the Linux kernel's
"entropy pool" by the mixing function (ChaCha or something these days),
and so there may only be 1 bit of entropy fed into that pool in say a
5 minute period, which could make it easier for an attacker to
reverse-calculate a primary key generated some minutes ago if say he is
able to suck out a high rate of numbers directly from the kernel's
/dev/urandom "output source".

Yet as soon as one more bit is fed ("randomly" by the mixing function)
into that pool about 5 minutes later (and thereafter further bits each
5 minutes), reverse-calculating what /dev/urandom may have delivered to
some key-generating program ~5 minutes ago steadily becomes
exponentially more difficult - even IF you were able to extract a high
rate of output from /dev/urandom.

On this basis of "understanding", I can understand why Ted Ts'o appears
to be saying "just use /dev/urandom, even when you're generating
primary keys for highly important data" - there's enough "entropy
spread 'randomly' throughout the kernel's 'entropy' pool" that
/dev/urandom is nowhere near the weak link in your security (software)
stack, let alone your hardware stack with e.g. Intel RME etc etc.

> Also note that all the theoretical (and very unrealistic) attacks on
> /dev/urandom apply only when the attacker knows part of the *past*
> output of /dev/urandom, and he uses this to predict the *current* and
> *future* output of /dev/urandom. This is not applicable in our scenario.

In general, absolutely yes.

Extracting any rate of numbers out of your target's remote server's
/dev/urandom device is a significant challenge in and of itself, and
if you've achieved that, you almost certainly have 0wned the machine
already, and it's at that point one hell of a lot easier to just scan
the memory of that server to extract the keys of interest directly -
thus in real world scenarios, such /dev/urandom attacks are (have
always been?) pretty close to "entirely theoretical, no one would ever
bother anyway given any reasonable implementation of /dev/urandom".

> In short: Given that the state of the CSPRNG is larger than the amount
> of bits read[1], the bits can be assumed to be distributed at random.

Ack.

> Longer answer:
>
> According to my reading of
> <https://github.com/torvalds/linux/blob/master/drivers/char/random.c>,
> /dev/urandom uses a variation of ChaCha20 which is periodically
> re-seeded from the “entropy pool”.
>
> In a reasonable scenario for password generation, the attacker does not
> know the state of the 512-bit CRNG state, and so the best he can do in
> practice is to model it with uniform probability distribution.

Ack.

> According to my understanding, the output of /dev/urandom when reading
> with my command will be truncate(ChaCha20(X)) where (X) is the aforesaid
> 512-bit state and “truncate” is the function that returns the first 128
> bits of its input. The processing with ChaCha20 and truncation skew the
> distribution a bit, but this is negligible.

Interesting - I thought ChaCha was being used because it was such a
good (non-skewing, suitably crypto-random mixing, reasonably
performant) algorithm. Even theoretical attacks will undoubtedly focus
on this skewing, if indeed ChaCha20, or the implementation of it in the
kernel, is actually skewing.

> As a side note, I noticed that Linux uses weird constants in the
> ChaCha20 input for the aforesaid CSPRNG: the ASCII text “expand 32-byte
> k”.
> This looks like a bad choice, but I doubt that it has any security
> impact in practice.

I assume the opposite - almost always, such constants will and do
affect the security of the algorithm, AIUI. It may be that this
constant is a "more recent/more recommended" constant to use over the
one in "the official ChaCha20 standard" (whatever/wherever that is), or
that ChaCha20 says something like "you can generate constants for this
constant value in the following way, and it is (or is not, or doesn't
matter) recommended to do so".

> Anyway, they should have used the constants
> recommended by D. J. Bernstein (the designer of ChaCha20).

I have not read his ChaCha paper for a long time - I don't remember
what he actually recommends in this regard. It is unwise for either of
us to express certainty on the matter when what he actually says can be
readily looked up.

> [1]: 384 bits according to my understanding, since 128 of the 512 bits
> fed to ChaCha seem to be fixed to the ASCII “expand 32-byte k”.

Good luck,
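
PS: the constants question can be checked rather than argued. As far
as I can tell, the ChaCha20 description in RFC 8439 gives the first
four state words as 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574.
The sketch below (my own; the helper name is hypothetical and nothing
here is taken from the kernel source) just decodes the ASCII string
from the thread as four little-endian 32-bit words so the two can be
compared, and repeats the arithmetic from footnote [1].

```python
import struct

# The ASCII constant quoted in the thread: 16 bytes, i.e. 128 bits.
SIGMA = b"expand 32-byte k"

def constant_words(text: bytes) -> list[int]:
    """Decode 16 bytes as four little-endian 32-bit words (hypothetical helper)."""
    return list(struct.unpack("<4I", text))

words = constant_words(SIGMA)
print([hex(w) for w in words])

# Footnote [1] arithmetic: a 16-word (512-bit) ChaCha input minus the
# 128 bits that are fixed by the constant.
STATE_BITS = 16 * 32
CONSTANT_BITS = len(SIGMA) * 8
print(STATE_BITS - CONSTANT_BITS)
```

If the decoded words match the four values listed above, then the
string is simply the textual form of the constant in that
specification rather than a Linux-specific choice; what D. J.
Bernstein's own paper says is still best confirmed by reading it.
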

