Re: [Cryptography] randomness +- entropy

Jerry Leichter Wed, 06 Nov 2013 01:35:54 -0800

On Nov 4, 2013, at 2:21 PM, John Denker wrote:
> Some people have been throwing around the word "entropy" 
> rather carelessly.
> 
> Entropy means something very special.
Well, yes, and in its technical definitions - either in thermodynamics where it 
arose or information theory where it was imported because of a similarity in 
formalisms - it plays virtually no role in cryptographic discussion.  In 
cryptography, especially when discussing random number generators, the word has 
a loosy-goosy meaning that's somehow tied to lack of predictability - but in an 
unspecified way.  When people make it a regular habit to "guess" the entropy of 
various processes and then go on to build systems based on guesses, you know 
the word has lost any formal meaning it ever had.


While the distinctions you draw are valid and important, I'm afraid "entropy" 
no longer has the capability to distinguish them.

> For a great many cryptological purposes, a high-quality 
> PSEUDO-random distribution is good enough, even though its 
> entropy density is very low.  Note the contrast:
> 
>  TRNG entropy density = 1 - epsilon
>  PRNG entropy density =   epsilon
I don't know what this means.

> As another way of emphasizing the distinction:  a PRNG places 
> orders-of-magnitude harsher demands on the strength of the 
> cryptological primitives it uses.
I don't know what *this* means either.  I drew a distinction in an earlier post 
- a distinction you can find in many papers on cryptographic primitives - 
between random values (unpredictable to the attackers being considered) and 
nonces (never repeated in a given cryptographic context, but there are no 
assumptions about unpredictability).  Where a random n-bit value is specified, 
I can think of no paper that does not assume what we call "n bits of entropy" - 
though a better way to say it is "a value chosen uniformly at random from the 
set of n-bit strings".  Sometimes the value is supposed to be chosen uniformly 
at random from some other set - e.g., from Z/p, i.e., between 0 and p-1.  
Trying to state this in terms of entropy is a losing game - and in fact it 
isn't actually trivial, given a random *bit* source, to produce such a 
distribution.  People have gotten it wrong in the past.  (The obvious technique 
of choosing a k with 2^k > p, choosing a "random" k-bit value, and the
 n reducing it mod p, if described in terms of "entropy", looks fine - I have k 
bits of entropy, which is more than enough to cover the choice I need to make.  
But the output is biased.)

> This can be quantified 
> in terms of classical cryptologic ideas such as unicity 
> distance, but for present purposes I prefer the "entropy 
> density" language.
"Unicity distance" is a nice but different concept - and one with little 
bearing on cryptography today.  (If I draw a cipher E at random from some a 
collection of functions - e.g., by choosing a key - then the unicity distance 
is the unicity distance is just the number of pairs (x, E(x)) that specify E 
uniquely within the set.  I suppose you can stretch this to talk about how many 
samples from the "random" generator are needed to specify *it* uniquely - i.e., 
be able to determine all past and future states - but I've seen the term used 
that way.)  A broader - hence more useful - classical term (which may have been 
introduced by Shannon) is "equivocation".  I don't recall the formal 
definition, but it attempts to capture the uncertainty an attacker has about 
the next plaintext, given all the information he already has.  If I know a 
message was encrypted using a one-time pad, my uncertainty about the first 
character is not absolute - it's much more likely to be "t" than "z".  T
 o provide information-theoretic security, a one-time-pad must contain "enough 
randomness" to keep my a priori and a postiori equivocation equal.  This *is* 
something that can be given in terms of the entropies of the message and 
one-time-pad sources, but the deracinated notion of "entropy" one sees in most 
discussions is way too weak to say anything useful here.)

> The rubber meets the road here:  Consider the contrast:
> 
> PRNG:  I am quite sure that on startup the machine needs to 
>  have on board a crypographically strong, well-seeded PRNG.
>  This needs to be up and running very, very early in the 
>  boot-up process.  Some things that need the PRNG cannot
>  wait.
> 
> TRNG:  At the moment I have no firm opinions as to how much 
>  actual entropy the machine needs on start-up.  I look 
>  forward to having a discussion on this topic, with use-case 
>  scenarios et cetera.
A BBS generator is "indistinguishable" from a true random number generator.  
What's missing from that statement - and from the distinction you're drawing 
above - is a specification of the attack model.

The BBS generator has inherent limitations that are given in its proof of 
correctness:  The attacker has polynomially bounded resources (in fact, 
"attacker" literally means a polynomially-bounded probabilistic TM), which in 
turn implies that it only has access to a polynomially-bounded number of 
outputs.  These are generally fine.  The proof doesn't bother to state (though 
it's implicit) the obvious:  That the attacker doesn't have access to the 
internal state of the generator.  This isn't an "entropy" assumption - the 
internal state must, to the attacker, appear to have been chosen uniformly at 
random from all possible internal states, and is not available to the attacker. 
 If you want to use "entropy-talk", all the entropy in a BBS generator is 
there, at the start, in the choice of the initial state.  And yet there's a 
profound difference between the output of a BBS generator and the output of, 
say, a linear congruential PRNG starting with exactly the same state.  One is pr
 edictable given a few successive samples; the other is secure given any 
polynomially-bounded number of them.  "Entropy" simply cannot capture this 
distinction.  And it's in fact exactly the distinction - in the transition from 
Shannon's classic notions of information-based security to modern 
computability-based ones - that make it possible to say anything useful at all 
about encryption algorithms other than one-time-pads.

While we have no proofs like those for BBS for PRNG's built on practical 
cryptographic primitives, we generally assume that they have similar 
properties.  (More correctly, we can prove some properties given assumptions 
about others.  But those are the same assumptions we make about the actual 
encryption and hashing and signature algorithms we use.  If they fail, the 
whole system falls down regardless of the source of "random" values.)

To summarize:  The distinction between cryptographic PRNG's and "true" RNG's 
has to do with the attack model.  The attacker considered in the former is 
polynomially bounded (OK) and doesn't receive as input some special piece that 
we label "internal state" (this one can be violated).  The attacker against a 
"true" RNG is ... what?  Formalizing that is equivalent to formalizing 
randomness - which no one has managed to do.  (In fact, modern developments of 
probability theory don't even try - random distributions are simply among the 
axioms.  Even more, if you believe John Conway, the "Free Will" Theorem he and 
he and Simon Kochen proved show that "quantum unpredictability" is *stronger* 
than randomness!)  In practical cryptography, this translates into "whatever I 
can imagine".  In my other thread on plausible attacks, I tried to focus on 
limiting the attacker to, well, plausible operations in a particular real-world 
setting.  If you *don't* do that, you can make no progress at
  all.  A hardware generator - even Turbid - is vulnerable if I include as 
plausible an attacker who's infiltrated every supplier of electronic components 
in the world and has slipped a small transmitter into everything built, 
including every diode, resistor - hell, maybe every piece of wire!

I have no problem with designing strong, practical random number generators.  
I'd love to see them deployed more widely.  But clever designs of low-level 
primitives, as important as they are, are not a substitute for *system* 
designs; in fact, they can blind us to the necessary system properties.  If a 
cryptographic primitive needs some unpredictable input, we need to go further 
and ask (a) how much? (b) unpredictable under what attack model?  Only then can 
we begin to answer the *system* design question of "what's a suitable generator 
for such inputs?"  If we can, for a reasonable cost (measured in any 
appropriate way), get our hands on a generator whose attack model assumes way 
more attacker resources than attacks on multiple uses of those values - great, 
let's make use of it.

Too much discussion of "random number generators" is the equivalent of "I'm not 
sure AES is strong enough, so I'll do a ROT-13 encoding first - it can't hurt". 
 And it can't - until you run into Tony Hoare's comment to the effect that "You 
can make a system so simple that you can see at a glance that it's correct, or 
so complex that you can't see at a glance that it's *not* correct."
                                                        -- Jerry



_______________________________________________
The cryptography mailing list
[email protected]
http://www.metzdowd.com/mailman/listinfo/cryptography

Re: [Cryptography] randomness +- entropy

Reply via email to