On Sep 7, 2013, at 4:13 AM, Jon Callas wrote:
>> Take the plaintext and the ciphertext, and XOR them together.  Does the 
>> result reveal anything about the key or the painttext?
> It better not. That would be a break of amazing simplicity that transcends 
> broken. 
The question is much more subtle than that, getting deep into how to define a 
the security of a cipher.

Consider a very simplified and limited, but standard, way you'd want to state a 
security result:  A Turing machine with an oracle for computing the encryption 
of any input with any key, when given as input the cyphertext and allowed to 
run for time T polynomial in the size of the key, has no more than an 
probability P less than (something depending on the key size) of guessing any 
given bit of the plaintext.  (OK, I fudged on how you want to state the 
probability - writing this stuff in English rather than mathematical symbols 
rapidly becomes unworkable.)  The fundamental piece of that statement is in 
"given as input..." part:  If the input contains the key itself, then obviously 
the machine has no problem at all producing the plaintext!  Similarly, of 
course, if the input contains the plaintext, the machine has an even easier 
time of it.

You can, and people long ago did, strengthen the requirements.  They allow for 
probabilistic machines as an obvious first step.  Beyond that, you want 
semantic security:  Not only shouldn't the attacking machine be unable to get 
an advantage on any particular bit of plaintext; it shouldn't be able to get an 
advantage on, say, the XOR of the first two bits.  Ultimately, you want so say 
that given any boolean function F, the machine's a postiori probability of 
guessing F(cleartext) should be identical (within some bounds) to its a priori 
probability of guessing F(cleartext).  Since it's hard to get a handle on the 
prior probability, another way to say pretty much the same thing is that the 
probability of a correct guess for F(cleartext) is the same whether the machine 
is given the ciphertext, or a random sequence of bits.  If you push this a bit 
further, you get definitions related to indistinguishability:  The machine is 
simply expected to say "the input is the result of apply
 ing the cipher to some plaintext" or "the input is random"; it shouldn't even 
be able to get an advantage on *that* simple question.

This sounds like a very strong security property (and it is) - but it says 
*nothing at all* about the OP's question!  It can't, because the machine *can't 
compute the XOR of the plaintext and the ciphertext*.  If we *give* it that 
information ... we've just given it the plaintext!

I can't, in fact, think of any way to model the OP's question.  The closest I 
can come is:  If E(K,P) defines a strong cipher (with respect to any of the 
variety of definitions out there), does E'(K,P) = E(K,P) XOR P *also* define a 
strong cipher?  One would think the answer is yes, just on general principles: 
To someone who doesn't know K and P, E(K,P) is "indistinguishable from random 
noise", so E'(K,P) should be the same.  And yet there remains the problem that 
it's not a value that can be computed without knowing P, so it doesn't fit into 
the usual definitional/proof frameworks.  Can anyone point to a proof?

The reason I'm not willing to write this off as "obvious" is an actual failure 
in a very different circumstance.  There was work done at DEC SRC many years 
ago on a system that used a fingerprint function to uniquely identify modules.  
The fingerprints were long enough to avoid the birthday paradox, and were 
computed based on the result of a long series of coin tosses whose results were 
baked into the code.  There was a proof that the fingerprint "looked random".  
And yet, fairly soon after the system went into production, collisions started 
to appear.  They were eventually tracked down to a "merge fingerprints" 
operation, which took the fingerprints of two modules and produces a 
fingerprint of the pair by some simple technique like concatenating the inputs 
and fingerprinting that.  Unfortunately, that operation *violated the 
assumptions of the theorem*.  The theorem said that the outputs of the 
fingerprint operation would look random *if chosen "without knowledge" of the 
 n tosses*.  But the inputs were outputs of the same algorithm, hence "had 
knowledge" of the coin tosses.  (And ... I just found the reference to this.  
See ftp://ftp.dec.com/pub/dec/SRC/research-reports/SRC-113.pdf, documentation 
of the Fingerprint interface, page 42.)

                                                        -- Jerry

The cryptography mailing list

Reply via email to