# Re: [Cryptography] XORing plaintext with ciphertext

On Sep 7, 2013, at 4:13 AM, Jon Callas wrote:
>> Take the plaintext and the ciphertext, and XOR them together.  Does the
>> result reveal anything about the key or the painttext?
>
> It better not. That would be a break of amazing simplicity that transcends
> broken.
The question is much more subtle than that, getting deep into how to define a
the security of a cipher.

Consider a very simplified and limited, but standard, way you'd want to state a
security result:  A Turing machine with an oracle for computing the encryption
of any input with any key, when given as input the cyphertext and allowed to
run for time T polynomial in the size of the key, has no more than an
probability P less than (something depending on the key size) of guessing any
given bit of the plaintext.  (OK, I fudged on how you want to state the
probability - writing this stuff in English rather than mathematical symbols
rapidly becomes unworkable.)  The fundamental piece of that statement is in
"given as input..." part:  If the input contains the key itself, then obviously
the machine has no problem at all producing the plaintext!  Similarly, of
course, if the input contains the plaintext, the machine has an even easier
time of it.

You can, and people long ago did, strengthen the requirements.  They allow for
probabilistic machines as an obvious first step.  Beyond that, you want
semantic security:  Not only shouldn't the attacking machine be unable to get
an advantage on any particular bit of plaintext; it shouldn't be able to get an
advantage on, say, the XOR of the first two bits.  Ultimately, you want so say
that given any boolean function F, the machine's a postiori probability of
guessing F(cleartext) should be identical (within some bounds) to its a priori
probability of guessing F(cleartext).  Since it's hard to get a handle on the
prior probability, another way to say pretty much the same thing is that the
probability of a correct guess for F(cleartext) is the same whether the machine
is given the ciphertext, or a random sequence of bits.  If you push this a bit
further, you get definitions related to indistinguishability:  The machine is
simply expected to say "the input is the result of apply
ing the cipher to some plaintext" or "the input is random"; it shouldn't even
be able to get an advantage on *that* simple question.

This sounds like a very strong security property (and it is) - but it says
*nothing at all* about the OP's question!  It can't, because the machine *can't
compute the XOR of the plaintext and the ciphertext*.  If we *give* it that
information ... we've just given it the plaintext!

I can't, in fact, think of any way to model the OP's question.  The closest I
can come is:  If E(K,P) defines a strong cipher (with respect to any of the
variety of definitions out there), does E'(K,P) = E(K,P) XOR P *also* define a
strong cipher?  One would think the answer is yes, just on general principles:
To someone who doesn't know K and P, E(K,P) is "indistinguishable from random
noise", so E'(K,P) should be the same.  And yet there remains the problem that
it's not a value that can be computed without knowing P, so it doesn't fit into
the usual definitional/proof frameworks.  Can anyone point to a proof?

The reason I'm not willing to write this off as "obvious" is an actual failure
in a very different circumstance.  There was work done at DEC SRC many years
ago on a system that used a fingerprint function to uniquely identify modules.
The fingerprints were long enough to avoid the birthday paradox, and were
computed based on the result of a long series of coin tosses whose results were
baked into the code.  There was a proof that the fingerprint "looked random".
And yet, fairly soon after the system went into production, collisions started
to appear.  They were eventually tracked down to a "merge fingerprints"
operation, which took the fingerprints of two modules and produces a
fingerprint of the pair by some simple technique like concatenating the inputs
and fingerprinting that.  Unfortunately, that operation *violated the
assumptions of the theorem*.  The theorem said that the outputs of the
fingerprint operation would look random *if chosen "without knowledge" of the
coi
n tosses*.  But the inputs were outputs of the same algorithm, hence "had
knowledge" of the coin tosses.  (And ... I just found the reference to this.
See ftp://ftp.dec.com/pub/dec/SRC/research-reports/SRC-113.pdf, documentation
of the Fingerprint interface, page 42.)

-- Jerry

_______________________________________________
The cryptography mailing list
cryptography@metzdowd.com
http://www.metzdowd.com/mailman/listinfo/cryptography