On Nov 4, 2013, at 2:21 PM, John Denker wrote:
> Some people have been throwing around the word "entropy"
> rather carelessly.
>
> Entropy means something very special.
Well, yes, and in its technical definitions - either in thermodynamics where it
arose or information theory where it was imported because of a similarity in
formalisms - it plays virtually no role in cryptographic discussion. In
cryptography, especially when discussing random number generators, the word has
a loosy-goosy meaning that's somehow tied to lack of predictability - but in an
unspecified way. When people make it a regular habit to "guess" the entropy of
various processes and then go on to build systems based on guesses, you know
the word has lost any formal meaning it ever had.
While the distinctions you draw are valid and important, I'm afraid "entropy"
no longer has the capability to distinguish them.
> For a great many cryptological purposes, a high-quality
> PSEUDO-random distribution is good enough, even though its
> entropy density is very low. Note the contrast:
>
> TRNG entropy density = 1 - epsilon
> PRNG entropy density = epsilon
I don't know what this means.
> As another way of emphasizing the distinction: a PRNG places
> orders-of-magnitude harsher demands on the strength of the
> cryptological primitives it uses.
I don't know what *this* means either. I drew a distinction in an earlier post
- a distinction you can find in many papers on cryptographic primitives -
between random values (unpredictable to the attackers being considered) and
nonces (never repeated in a given cryptographic context, but there are no
assumptions about unpredictability). Where a random n-bit value is specified,
I can think of no paper that does not assume what we call "n bits of entropy" -
though a better way to say it is "a value chosen uniformly at random from the
set of n-bit strings". Sometimes the value is supposed to be chosen uniformly
at random from some other set - e.g., from Z/p, i.e., between 0 and p-1.
Trying to state this in terms of entropy is a losing game - and in fact it
isn't actually trivial, given a random *bit* source, to produce such a
distribution. People have gotten it wrong in the past. (The obvious technique
of choosing a k with 2^k > p, choosing a "random" k-bit value, and the
n reducing it mod p, if described in terms of "entropy", looks fine - I have k
bits of entropy, which is more than enough to cover the choice I need to make.
But the output is biased.)
> This can be quantified
> in terms of classical cryptologic ideas such as unicity
> distance, but for present purposes I prefer the "entropy
> density" language.
"Unicity distance" is a nice but different concept - and one with little
bearing on cryptography today. (If I draw a cipher E at random from some a
collection of functions - e.g., by choosing a key - then the unicity distance
is the unicity distance is just the number of pairs (x, E(x)) that specify E
uniquely within the set. I suppose you can stretch this to talk about how many
samples from the "random" generator are needed to specify *it* uniquely - i.e.,
be able to determine all past and future states - but I've seen the term used
that way.) A broader - hence more useful - classical term (which may have been
introduced by Shannon) is "equivocation". I don't recall the formal
definition, but it attempts to capture the uncertainty an attacker has about
the next plaintext, given all the information he already has. If I know a
message was encrypted using a one-time pad, my uncertainty about the first
character is not absolute - it's much more likely to be "t" than "z". T
o provide information-theoretic security, a one-time-pad must contain "enough
randomness" to keep my a priori and a postiori equivocation equal. This *is*
something that can be given in terms of the entropies of the message and
one-time-pad sources, but the deracinated notion of "entropy" one sees in most
discussions is way too weak to say anything useful here.)
> The rubber meets the road here: Consider the contrast:
>
> PRNG: I am quite sure that on startup the machine needs to
> have on board a crypographically strong, well-seeded PRNG.
> This needs to be up and running very, very early in the
> boot-up process. Some things that need the PRNG cannot
> wait.
>
> TRNG: At the moment I have no firm opinions as to how much
> actual entropy the machine needs on start-up. I look
> forward to having a discussion on this topic, with use-case
> scenarios et cetera.
A BBS generator is "indistinguishable" from a true random number generator.
What's missing from that statement - and from the distinction you're drawing
above - is a specification of the attack model.
The BBS generator has inherent limitations that are given in its proof of
correctness: The attacker has polynomially bounded resources (in fact,
"attacker" literally means a polynomially-bounded probabilistic TM), which in
turn implies that it only has access to a polynomially-bounded number of
outputs. These are generally fine. The proof doesn't bother to state (though
it's implicit) the obvious: That the attacker doesn't have access to the
internal state of the generator. This isn't an "entropy" assumption - the
internal state must, to the attacker, appear to have been chosen uniformly at
random from all possible internal states, and is not available to the attacker.
If you want to use "entropy-talk", all the entropy in a BBS generator is
there, at the start, in the choice of the initial state. And yet there's a
profound difference between the output of a BBS generator and the output of,
say, a linear congruential PRNG starting with exactly the same state. One is pr
edictable given a few successive samples; the other is secure given any
polynomially-bounded number of them. "Entropy" simply cannot capture this
distinction. And it's in fact exactly the distinction - in the transition from
Shannon's classic notions of information-based security to modern
computability-based ones - that make it possible to say anything useful at all
about encryption algorithms other than one-time-pads.
While we have no proofs like those for BBS for PRNG's built on practical
cryptographic primitives, we generally assume that they have similar
properties. (More correctly, we can prove some properties given assumptions
about others. But those are the same assumptions we make about the actual
encryption and hashing and signature algorithms we use. If they fail, the
whole system falls down regardless of the source of "random" values.)
To summarize: The distinction between cryptographic PRNG's and "true" RNG's
has to do with the attack model. The attacker considered in the former is
polynomially bounded (OK) and doesn't receive as input some special piece that
we label "internal state" (this one can be violated). The attacker against a
"true" RNG is ... what? Formalizing that is equivalent to formalizing
randomness - which no one has managed to do. (In fact, modern developments of
probability theory don't even try - random distributions are simply among the
axioms. Even more, if you believe John Conway, the "Free Will" Theorem he and
he and Simon Kochen proved show that "quantum unpredictability" is *stronger*
than randomness!) In practical cryptography, this translates into "whatever I
can imagine". In my other thread on plausible attacks, I tried to focus on
limiting the attacker to, well, plausible operations in a particular real-world
setting. If you *don't* do that, you can make no progress at
all. A hardware generator - even Turbid - is vulnerable if I include as
plausible an attacker who's infiltrated every supplier of electronic components
in the world and has slipped a small transmitter into everything built,
including every diode, resistor - hell, maybe every piece of wire!
I have no problem with designing strong, practical random number generators.
I'd love to see them deployed more widely. But clever designs of low-level
primitives, as important as they are, are not a substitute for *system*
designs; in fact, they can blind us to the necessary system properties. If a
cryptographic primitive needs some unpredictable input, we need to go further
and ask (a) how much? (b) unpredictable under what attack model? Only then can
we begin to answer the *system* design question of "what's a suitable generator
for such inputs?" If we can, for a reasonable cost (measured in any
appropriate way), get our hands on a generator whose attack model assumes way
more attacker resources than attacks on multiple uses of those values - great,
let's make use of it.
Too much discussion of "random number generators" is the equivalent of "I'm not
sure AES is strong enough, so I'll do a ROT-13 encoding first - it can't hurt".
And it can't - until you run into Tony Hoare's comment to the effect that "You
can make a system so simple that you can see at a glance that it's correct, or
so complex that you can't see at a glance that it's *not* correct."
-- Jerry
_______________________________________________
The cryptography mailing list
[email protected]
http://www.metzdowd.com/mailman/listinfo/cryptography