Re: [Cryptography] Evaluating draft-agl-tls-chacha20poly1305

2013-09-11 Thread Adam Langley
On Tue, Sep 10, 2013 at 10:59 PM, William Allen Simpson
william.allen.simp...@gmail.com wrote:
 I suggest:

ChaCha20 is run with the given key and sequence number nonce and with

the two counter words set to zero.  The first 32 bytes of the 64 byte
output are saved to become the one-time key for Poly1305.  The next 8
bytes of the output are saved to become the per-record input nonce
for this ChaCha20 TLS record.

 Or you could use 16 bytes, and cover all the input fields  There's no
 reason the counter part has to start at 1.

 Of course, this depends on not having a related-key attack, as mentioned
 in my previous messages

It is the case that most of the bottom row bits will be zero. However,
ChaCha20 is assumed to be secure at a 256-bit security level when used
as designed, with the bottom row being counters. If ChaCha/Salsa were
not secure in this formulation then I think they would have to be
abandoned completely.

Nobody worries that AES-CTR is weak when the counter starts at zero, right?

Taking 8 bytes from the initial block and using it as the nonce for
the plaintext encryption would mean that there would be a ~50% chance
of a collision after 2^32 blocks. This issue affects AES-GCM, which is
why the sequence number is used here.

Using 16 bytes from the initial block as the full bottom row would
work, but it still assumes that we're working around a broken cipher
and it prohibits implementations which pipeline all the ChaCha blocks,
including the initial one. That may be usefully faster, although it's
not the implementation path that I've taken so far.

There is an alternative formulation of Salsa/ChaCha that is designed
for random nonces, rather than counters: XSalsa/XChaCha. However,
since we have a sequence number already in TLS I've not used it.


Cheers

AGL
___
The cryptography mailing list
cryptography@metzdowd.com
http://www.metzdowd.com/mailman/listinfo/cryptography


Re: [Cryptography] Evaluating draft-agl-tls-chacha20poly1305

2013-09-11 Thread Adam Langley
[attempt two, because I bounced off the mailing list the first time.]

On Tue, Sep 10, 2013 at 9:35 PM, William Allen Simpson
william.allen.simp...@gmail.com wrote:
ChaCha20 is run with the given key and nonce and with the two counter
words set to zero.  The first 32 bytes of the 64 byte output are
saved to become the one-time key for Poly1305.  The remainder of the
output is discarded.

 Why generate the ICV key this way, instead of using a longer key blob
 from TLS and dividing it?  Is there a related-key attack?

The keying material from the TLS handshake is per-session information.
However, for a polynomial MAC, a unique key is needed per-record and
must be secret. Using stream cipher output as MAC key material is a
trick taken from [1], although it is likely to have more history than
that. (As another example, UMAC/VMAC runs AES-CTR with a separate key
to generate the per-record keys, as did Poly1305 in its original
paper.)

Authenticated decryption is largely the reverse of the encryption
process: the Poly1305 key is generated and the authentication tag
calculated.  The calculated tag is compared against the final 16
bytes of the authenticated ciphertext in constant time.  If they
match, the remaining ciphertext is decrypted to produce the
plaintext.

 If AEAD, aren't the ICV and cipher text generated in parallel?  So how do
 you check the ICV first, then decipher?

The Poly1305 key (ICV in your terms?) is taken from a prefix of the
ChaCha20 stream output. Thus the decryption proceeds as:

1) Generate one block of ChaCha20 keystream and use the first 32 bytes
as a Poly1305 key.
2) Feed Poly1305 the additional data and ciphertext, with the length
prefixing as described in the draft.
3) Verify that the Poly1305 authenticator matches the value in the
received record. If not, the record can be rejected immediately.
4) Run ChaCha20, starting with a counter value of one, to decrypt the
ciphertext.

An alternative implementation is possible where ChaCha20 is run in one
go on a buffer that consists of 64 zeros followed by the ciphertext.
The advantage of this is that it may be faster because the ChaCha20
blocks can be pipelined. The disadvantage is that it may need memory
copies to setup the input buffer correctly. A moot advantage, in the
case of TLS, of the steps that I outlined is that forgeries are
rejected faster.

 Needs a bit more implementation details.  I assume there's an
 implementation in the works.  (Always helps define things with
 something concrete.)

I currently have Chrome talking to OpenSSL, although the code needs
cleanup of course.

[1] http://cr.yp.to/highspeed/naclcrypto-20090310.pdf


Cheers

AGL
___
The cryptography mailing list
cryptography@metzdowd.com
http://www.metzdowd.com/mailman/listinfo/cryptography


Re: [Cryptography] Evaluating draft-agl-tls-chacha20poly1305

2013-09-11 Thread Alexandre Anzala-Yamajako
2013/9/11 William Allen Simpson william.allen.simp...@gmail.com

 It bugs me that so many of the input words are mostly zero.  Using the
 TLS Sequence Number for the nonce is certainly going to be mostly zero
 bits.  And the block counter is almost all zero bits, as you note,

(In the case of the TLS, limits on the plaintext size mean that the
first counter word will never overflow in practice.)

 [...]



 In my PPP ChaCha variant of this that I started several months ago, the
 nonce input words were replaced with my usual CBCS formulation.  That is,
invert the lower 32-bits of the sequence number,
xor with the upper 32-bits,
add (mod 2**64) both with a 64-bit secret IV,
count the bits, and
variably rotate.
 [...]


Chacha20 being  a stream cipher, the only requirement we have on the ICV is
that it doesn't repeat isn't ?
This means that if there's a problem with setting 'mostly zeroed out' ICV
for Chacha20 we shouldn't use it at all period.
As far as your proposition is concerned, the performance penalty seems to
largely depend on the target platform. Wouldn't using the same set of
operations as Chacha prevent an unexpected performance drop in case of lots
of short messages ?

Cheers
___
The cryptography mailing list
cryptography@metzdowd.com
http://www.metzdowd.com/mailman/listinfo/cryptography

Re: [Cryptography] Evaluating draft-agl-tls-chacha20poly1305

2013-09-11 Thread William Allen Simpson

On 9/11/13 6:00 AM, Alexandre Anzala-Yamajako wrote:

Chacha20 being  a stream cipher, the only requirement we have on the ICV is 
that it doesn't repeat isn't ?


You mean IV, the Initialization Vector.  ICV is the Integrity Check Value,
usually 32-64 bits appended to the packet.  Each is separately keyed.



This means that if there's a problem with setting 'mostly zeroed out' ICV for 
Chacha20 we shouldn't use it at all period.


I strongly disagree.  In my network protocol security designs, I always
try to think about weaknesses in the implementation and potential future
attacks on the algorithm -- and try to strengthen the security margin.

For example, IP-MAC fills every available zero space with randomness,
while H-MAC (defined more than a year later) uses constants instead.
IP-MAC was proven stronger than H-MAC.

Sadly, in the usual standards committee-itis, newer is often assumed to
be improved and better.  So H-MAC was adopted instead.  Of course, we
know that H-MAC was chosen by an NSA mole in the IETF, so I don't trust it.

Also, there's a certain silliness in formal cryptology that assumes we
shouldn't have longer randomness keying than the formal strength of the
algorithm.  That might have been true in the days of silk and cyanide,
where keying was a hard problem, but modern computing can generate lots of
longer nonces without much effort.

In reality, adding longer nonces may not improve the strength of the
algorithm itself, but it improves the margin against attack.  A nearly
practical attack of order 2**80 could be converted to an impractical
attack of order 2**96



As far as your proposition is concerned, the performance penalty seems to 
largely depend on the target platform. Wouldn't using the same set of 
operations as Chacha prevent an unexpected performance drop in case of lots of 
short messages ?


I don't understand this part of your message.  My ancient CBCS
formulation that I'll probably use for PPP (Xor'ing a per-session key
with a per-packet unique value) is demonstrably much faster than using
ChaCha itself to do that same thing.

We've been using stream ciphers and pseudo-stream ciphers (made by
chaining MACs or chaining block ciphers) to create per-packet nonces
for as long as I can remember (over 20 years).  You'll see that in CHAP
and Photuris and CBCS.

So I'm not arguing with Adam's use of ChaCha for it.  It just bugs me
that we aren't filling in as much randomness as we could!

___
The cryptography mailing list
cryptography@metzdowd.com
http://www.metzdowd.com/mailman/listinfo/cryptography


Re: [Cryptography] Evaluating draft-agl-tls-chacha20poly1305

2013-09-11 Thread William Allen Simpson

On 9/11/13 10:27 AM, Adam Langley wrote:

[attempt two, because I bounced off the mailing list the first time.]

On Tue, Sep 10, 2013 at 9:35 PM, William Allen Simpson
william.allen.simp...@gmail.com wrote:

Why generate the ICV key this way, instead of using a longer key blob
from TLS and dividing it?  Is there a related-key attack?


The keying material from the TLS handshake is per-session information.
However, for a polynomial MAC, a unique key is needed per-record and
must be secret.


Thanks, this part I knew, although it would be good explanatory text to
add to the draft.

I meant a related-key attack against the MAC-key generated by TLS?

Thereby causing you to discard it and not key the ICV with it?


Using stream cipher output as MAC key material is a
trick taken from [1], although it is likely to have more history than
that. (As another example, UMAC/VMAC runs AES-CTR with a separate key
to generate the per-record keys, as did Poly1305 in its original
paper.)


Oh sure.  We used hashes long ago.  Using AES is insane, but then
UMAC is -- to be kind -- not very efficient.

My old formulation from CBCS was developed during the old IPsec
discussions.  It's just simpler and faster to xor the per-packet counter
with the MAC-key than using the ChaCha cipher itself to generate
per-packet key expansion.

I was simply wondering about the rationale for doing it yourself.  And
worrying a little about the extra overhead on back-to-back packets.



If AEAD, aren't the ICV and cipher text generated in parallel?  So how do
you check the ICV first, then decipher?


The Poly1305 key (ICV in your terms?) is taken from a prefix of the
ChaCha20 stream output. Thus the decryption proceeds as:

1) Generate one block of ChaCha20 keystream and use the first 32 bytes
as a Poly1305 key.
2) Feed Poly1305 the additional data and ciphertext, with the length
prefixing as described in the draft.
3) Verify that the Poly1305 authenticator matches the value in the
received record. If not, the record can be rejected immediately.
4) Run ChaCha20, starting with a counter value of one, to decrypt the
ciphertext.


ICV = Integrity Check Value at the end of the packet.  So ICV-key.
Sometimes MAC-key.

Anyway, good explanation!  Please add it to the draft.



An alternative implementation is possible where ChaCha20 is run in one
go on a buffer that consists of 64 zeros followed by the ciphertext.
The advantage of this is that it may be faster because the ChaCha20
blocks can be pipelined. The disadvantage is that it may need memory
copies to setup the input buffer correctly. A moot advantage, in the
case of TLS, of the steps that I outlined is that forgeries are
rejected faster.


Depends on how swamped the processor.  I'm a big fan of rejecting
forgeries (and replay attacks) before decrypting.  Not everybody is
Google with unlimited processing power. ;)



Needs a bit more implementation details.  I assume there's an
implementation in the works.  (Always helps define things with
something concrete.)


I currently have Chrome talking to OpenSSL, although the code needs
cleanup of course.


Excellent

___
The cryptography mailing list
cryptography@metzdowd.com
http://www.metzdowd.com/mailman/listinfo/cryptography


Re: [Cryptography] Evaluating draft-agl-tls-chacha20poly1305

2013-09-11 Thread William Allen Simpson

On 9/11/13 10:37 AM, Adam Langley wrote:

On Tue, Sep 10, 2013 at 10:59 PM, William Allen Simpson
william.allen.simp...@gmail.com wrote:

Or you could use 16 bytes, and cover all the input fields  There's no
reason the counter part has to start at 1.


It is the case that most of the bottom row bits will be zero. However,
ChaCha20 is assumed to be secure at a 256-bit security level when used
as designed, with the bottom row being counters. If ChaCha/Salsa were
not secure in this formulation then I think they would have to be
abandoned completely.


I kinda covered this in a previous message.  No, we should design with
the expectation that there's something wrong with every cipher (and
every implementation), and strengthen it as best we know how.

It's the same principle we learned (often the hard way) in school:
 * Software designers, assume the hardware has intermittent failures.
 * Hardware designers, assume the software has intermittent failures.



Taking 8 bytes from the initial block and using it as the nonce for
the plaintext encryption would mean that there would be a ~50% chance
of a collision after 2^32 blocks. This issue affects AES-GCM, which is
why the sequence number is used here.


Sorry, you're correct there -- my mind is often still thinking of DES
with its unicity distance of 2**32, so you had to re-key anyway.



Using 16 bytes from the initial block as the full bottom row would
work, but it still assumes that we're working around a broken cipher
and it prohibits implementations which pipeline all the ChaCha blocks,
including the initial one. That may be usefully faster, although it's
not the implementation path that I've taken so far.


OK.  I see the pipeline stall.  But does poly1305 pipeline anyway?



There is an alternative formulation of Salsa/ChaCha that is designed
for random nonces, rather than counters: XSalsa/XChaCha. However,
since we have a sequence number already in TLS I've not used it.


Aha, I hadn't found this (XSalsa, there doesn't seem to be an XChaCha).
Good reading, and some of the same points I was trying to make here.

___
The cryptography mailing list
cryptography@metzdowd.com
http://www.metzdowd.com/mailman/listinfo/cryptography


Re: [Cryptography] Evaluating draft-agl-tls-chacha20poly1305

2013-09-10 Thread William Allen Simpson

It bugs me that so many of the input words are mostly zero.  Using the
TLS Sequence Number for the nonce is certainly going to be mostly zero
bits.  And the block counter is almost all zero bits, as you note,

   (In the case of the TLS, limits on the plaintext size mean that the
   first counter word will never overflow in practice.)

Heck, since the average IP packet length is 43, the average TLS record
is likely shorter than that!  At least half the connection directions,
it's going to be rare that the counter itself exceeds 1!

In my PPP ChaCha variant of this that I started several months ago, the
nonce input words were replaced with my usual CBCS formulation.  That is,
   invert the lower 32-bits of the sequence number,
   xor with the upper 32-bits,
   add (mod 2**64) both with a 64-bit secret IV,
   count the bits, and
   variably rotate.

This gives more diffusion, at least 2 bits change for every packet,
ensure a bit changes in the first 32-bits (highly predictable and
vulnerable), and varies the bits affected among 64 positions.

Note that I use a secret IV, a cipher key, and an ICV key for CBCS.

However, to adapt your current formulation for making your ICV key,

   ChaCha20 is run with the given key and nonce and with the two counter
   words set to zero.  The first 32 bytes of the 64 byte output are
   saved to become the one-time key for Poly1305.  The remainder of the
   output is discarded.

I suggest:

   ChaCha20 is run with the given key and sequence number nonce and with
   the two counter words set to zero.  The first 32 bytes of the 64 byte
   output are saved to become the one-time key for Poly1305.  The next 8
   bytes of the output are saved to become the per-record input nonce
   for this ChaCha20 TLS record.

Or you could use 16 bytes, and cover all the input fields  There's no
reason the counter part has to start at 1.

Of course, this depends on not having a related-key attack, as mentioned
in my previous message.

___
The cryptography mailing list
cryptography@metzdowd.com
http://www.metzdowd.com/mailman/listinfo/cryptography