Re: [Cryptography] Evaluating draft-agl-tls-chacha20poly1305
On Tue, Sep 10, 2013 at 10:59 PM, William Allen Simpson william.allen.simp...@gmail.com wrote: I suggest: ChaCha20 is run with the given key and sequence number nonce and with the two counter words set to zero. The first 32 bytes of the 64 byte output are saved to become the one-time key for Poly1305. The next 8 bytes of the output are saved to become the per-record input nonce for this ChaCha20 TLS record. Or you could use 16 bytes, and cover all the input fields There's no reason the counter part has to start at 1. Of course, this depends on not having a related-key attack, as mentioned in my previous messages It is the case that most of the bottom row bits will be zero. However, ChaCha20 is assumed to be secure at a 256-bit security level when used as designed, with the bottom row being counters. If ChaCha/Salsa were not secure in this formulation then I think they would have to be abandoned completely. Nobody worries that AES-CTR is weak when the counter starts at zero, right? Taking 8 bytes from the initial block and using it as the nonce for the plaintext encryption would mean that there would be a ~50% chance of a collision after 2^32 blocks. This issue affects AES-GCM, which is why the sequence number is used here. Using 16 bytes from the initial block as the full bottom row would work, but it still assumes that we're working around a broken cipher and it prohibits implementations which pipeline all the ChaCha blocks, including the initial one. That may be usefully faster, although it's not the implementation path that I've taken so far. There is an alternative formulation of Salsa/ChaCha that is designed for random nonces, rather than counters: XSalsa/XChaCha. However, since we have a sequence number already in TLS I've not used it. Cheers AGL ___ The cryptography mailing list cryptography@metzdowd.com http://www.metzdowd.com/mailman/listinfo/cryptography
Re: [Cryptography] Evaluating draft-agl-tls-chacha20poly1305
[attempt two, because I bounced off the mailing list the first time.] On Tue, Sep 10, 2013 at 9:35 PM, William Allen Simpson william.allen.simp...@gmail.com wrote: ChaCha20 is run with the given key and nonce and with the two counter words set to zero. The first 32 bytes of the 64 byte output are saved to become the one-time key for Poly1305. The remainder of the output is discarded. Why generate the ICV key this way, instead of using a longer key blob from TLS and dividing it? Is there a related-key attack? The keying material from the TLS handshake is per-session information. However, for a polynomial MAC, a unique key is needed per-record and must be secret. Using stream cipher output as MAC key material is a trick taken from [1], although it is likely to have more history than that. (As another example, UMAC/VMAC runs AES-CTR with a separate key to generate the per-record keys, as did Poly1305 in its original paper.) Authenticated decryption is largely the reverse of the encryption process: the Poly1305 key is generated and the authentication tag calculated. The calculated tag is compared against the final 16 bytes of the authenticated ciphertext in constant time. If they match, the remaining ciphertext is decrypted to produce the plaintext. If AEAD, aren't the ICV and cipher text generated in parallel? So how do you check the ICV first, then decipher? The Poly1305 key (ICV in your terms?) is taken from a prefix of the ChaCha20 stream output. Thus the decryption proceeds as: 1) Generate one block of ChaCha20 keystream and use the first 32 bytes as a Poly1305 key. 2) Feed Poly1305 the additional data and ciphertext, with the length prefixing as described in the draft. 3) Verify that the Poly1305 authenticator matches the value in the received record. If not, the record can be rejected immediately. 4) Run ChaCha20, starting with a counter value of one, to decrypt the ciphertext. An alternative implementation is possible where ChaCha20 is run in one go on a buffer that consists of 64 zeros followed by the ciphertext. The advantage of this is that it may be faster because the ChaCha20 blocks can be pipelined. The disadvantage is that it may need memory copies to setup the input buffer correctly. A moot advantage, in the case of TLS, of the steps that I outlined is that forgeries are rejected faster. Needs a bit more implementation details. I assume there's an implementation in the works. (Always helps define things with something concrete.) I currently have Chrome talking to OpenSSL, although the code needs cleanup of course. [1] http://cr.yp.to/highspeed/naclcrypto-20090310.pdf Cheers AGL ___ The cryptography mailing list cryptography@metzdowd.com http://www.metzdowd.com/mailman/listinfo/cryptography
Re: [Cryptography] Evaluating draft-agl-tls-chacha20poly1305
2013/9/11 William Allen Simpson william.allen.simp...@gmail.com It bugs me that so many of the input words are mostly zero. Using the TLS Sequence Number for the nonce is certainly going to be mostly zero bits. And the block counter is almost all zero bits, as you note, (In the case of the TLS, limits on the plaintext size mean that the first counter word will never overflow in practice.) [...] In my PPP ChaCha variant of this that I started several months ago, the nonce input words were replaced with my usual CBCS formulation. That is, invert the lower 32-bits of the sequence number, xor with the upper 32-bits, add (mod 2**64) both with a 64-bit secret IV, count the bits, and variably rotate. [...] Chacha20 being a stream cipher, the only requirement we have on the ICV is that it doesn't repeat isn't ? This means that if there's a problem with setting 'mostly zeroed out' ICV for Chacha20 we shouldn't use it at all period. As far as your proposition is concerned, the performance penalty seems to largely depend on the target platform. Wouldn't using the same set of operations as Chacha prevent an unexpected performance drop in case of lots of short messages ? Cheers ___ The cryptography mailing list cryptography@metzdowd.com http://www.metzdowd.com/mailman/listinfo/cryptography
Re: [Cryptography] Evaluating draft-agl-tls-chacha20poly1305
On 9/11/13 6:00 AM, Alexandre Anzala-Yamajako wrote: Chacha20 being a stream cipher, the only requirement we have on the ICV is that it doesn't repeat isn't ? You mean IV, the Initialization Vector. ICV is the Integrity Check Value, usually 32-64 bits appended to the packet. Each is separately keyed. This means that if there's a problem with setting 'mostly zeroed out' ICV for Chacha20 we shouldn't use it at all period. I strongly disagree. In my network protocol security designs, I always try to think about weaknesses in the implementation and potential future attacks on the algorithm -- and try to strengthen the security margin. For example, IP-MAC fills every available zero space with randomness, while H-MAC (defined more than a year later) uses constants instead. IP-MAC was proven stronger than H-MAC. Sadly, in the usual standards committee-itis, newer is often assumed to be improved and better. So H-MAC was adopted instead. Of course, we know that H-MAC was chosen by an NSA mole in the IETF, so I don't trust it. Also, there's a certain silliness in formal cryptology that assumes we shouldn't have longer randomness keying than the formal strength of the algorithm. That might have been true in the days of silk and cyanide, where keying was a hard problem, but modern computing can generate lots of longer nonces without much effort. In reality, adding longer nonces may not improve the strength of the algorithm itself, but it improves the margin against attack. A nearly practical attack of order 2**80 could be converted to an impractical attack of order 2**96 As far as your proposition is concerned, the performance penalty seems to largely depend on the target platform. Wouldn't using the same set of operations as Chacha prevent an unexpected performance drop in case of lots of short messages ? I don't understand this part of your message. My ancient CBCS formulation that I'll probably use for PPP (Xor'ing a per-session key with a per-packet unique value) is demonstrably much faster than using ChaCha itself to do that same thing. We've been using stream ciphers and pseudo-stream ciphers (made by chaining MACs or chaining block ciphers) to create per-packet nonces for as long as I can remember (over 20 years). You'll see that in CHAP and Photuris and CBCS. So I'm not arguing with Adam's use of ChaCha for it. It just bugs me that we aren't filling in as much randomness as we could! ___ The cryptography mailing list cryptography@metzdowd.com http://www.metzdowd.com/mailman/listinfo/cryptography
Re: [Cryptography] Evaluating draft-agl-tls-chacha20poly1305
On 9/11/13 10:27 AM, Adam Langley wrote: [attempt two, because I bounced off the mailing list the first time.] On Tue, Sep 10, 2013 at 9:35 PM, William Allen Simpson william.allen.simp...@gmail.com wrote: Why generate the ICV key this way, instead of using a longer key blob from TLS and dividing it? Is there a related-key attack? The keying material from the TLS handshake is per-session information. However, for a polynomial MAC, a unique key is needed per-record and must be secret. Thanks, this part I knew, although it would be good explanatory text to add to the draft. I meant a related-key attack against the MAC-key generated by TLS? Thereby causing you to discard it and not key the ICV with it? Using stream cipher output as MAC key material is a trick taken from [1], although it is likely to have more history than that. (As another example, UMAC/VMAC runs AES-CTR with a separate key to generate the per-record keys, as did Poly1305 in its original paper.) Oh sure. We used hashes long ago. Using AES is insane, but then UMAC is -- to be kind -- not very efficient. My old formulation from CBCS was developed during the old IPsec discussions. It's just simpler and faster to xor the per-packet counter with the MAC-key than using the ChaCha cipher itself to generate per-packet key expansion. I was simply wondering about the rationale for doing it yourself. And worrying a little about the extra overhead on back-to-back packets. If AEAD, aren't the ICV and cipher text generated in parallel? So how do you check the ICV first, then decipher? The Poly1305 key (ICV in your terms?) is taken from a prefix of the ChaCha20 stream output. Thus the decryption proceeds as: 1) Generate one block of ChaCha20 keystream and use the first 32 bytes as a Poly1305 key. 2) Feed Poly1305 the additional data and ciphertext, with the length prefixing as described in the draft. 3) Verify that the Poly1305 authenticator matches the value in the received record. If not, the record can be rejected immediately. 4) Run ChaCha20, starting with a counter value of one, to decrypt the ciphertext. ICV = Integrity Check Value at the end of the packet. So ICV-key. Sometimes MAC-key. Anyway, good explanation! Please add it to the draft. An alternative implementation is possible where ChaCha20 is run in one go on a buffer that consists of 64 zeros followed by the ciphertext. The advantage of this is that it may be faster because the ChaCha20 blocks can be pipelined. The disadvantage is that it may need memory copies to setup the input buffer correctly. A moot advantage, in the case of TLS, of the steps that I outlined is that forgeries are rejected faster. Depends on how swamped the processor. I'm a big fan of rejecting forgeries (and replay attacks) before decrypting. Not everybody is Google with unlimited processing power. ;) Needs a bit more implementation details. I assume there's an implementation in the works. (Always helps define things with something concrete.) I currently have Chrome talking to OpenSSL, although the code needs cleanup of course. Excellent ___ The cryptography mailing list cryptography@metzdowd.com http://www.metzdowd.com/mailman/listinfo/cryptography
Re: [Cryptography] Evaluating draft-agl-tls-chacha20poly1305
On 9/11/13 10:37 AM, Adam Langley wrote: On Tue, Sep 10, 2013 at 10:59 PM, William Allen Simpson william.allen.simp...@gmail.com wrote: Or you could use 16 bytes, and cover all the input fields There's no reason the counter part has to start at 1. It is the case that most of the bottom row bits will be zero. However, ChaCha20 is assumed to be secure at a 256-bit security level when used as designed, with the bottom row being counters. If ChaCha/Salsa were not secure in this formulation then I think they would have to be abandoned completely. I kinda covered this in a previous message. No, we should design with the expectation that there's something wrong with every cipher (and every implementation), and strengthen it as best we know how. It's the same principle we learned (often the hard way) in school: * Software designers, assume the hardware has intermittent failures. * Hardware designers, assume the software has intermittent failures. Taking 8 bytes from the initial block and using it as the nonce for the plaintext encryption would mean that there would be a ~50% chance of a collision after 2^32 blocks. This issue affects AES-GCM, which is why the sequence number is used here. Sorry, you're correct there -- my mind is often still thinking of DES with its unicity distance of 2**32, so you had to re-key anyway. Using 16 bytes from the initial block as the full bottom row would work, but it still assumes that we're working around a broken cipher and it prohibits implementations which pipeline all the ChaCha blocks, including the initial one. That may be usefully faster, although it's not the implementation path that I've taken so far. OK. I see the pipeline stall. But does poly1305 pipeline anyway? There is an alternative formulation of Salsa/ChaCha that is designed for random nonces, rather than counters: XSalsa/XChaCha. However, since we have a sequence number already in TLS I've not used it. Aha, I hadn't found this (XSalsa, there doesn't seem to be an XChaCha). Good reading, and some of the same points I was trying to make here. ___ The cryptography mailing list cryptography@metzdowd.com http://www.metzdowd.com/mailman/listinfo/cryptography
Re: [Cryptography] Evaluating draft-agl-tls-chacha20poly1305
It bugs me that so many of the input words are mostly zero. Using the TLS Sequence Number for the nonce is certainly going to be mostly zero bits. And the block counter is almost all zero bits, as you note, (In the case of the TLS, limits on the plaintext size mean that the first counter word will never overflow in practice.) Heck, since the average IP packet length is 43, the average TLS record is likely shorter than that! At least half the connection directions, it's going to be rare that the counter itself exceeds 1! In my PPP ChaCha variant of this that I started several months ago, the nonce input words were replaced with my usual CBCS formulation. That is, invert the lower 32-bits of the sequence number, xor with the upper 32-bits, add (mod 2**64) both with a 64-bit secret IV, count the bits, and variably rotate. This gives more diffusion, at least 2 bits change for every packet, ensure a bit changes in the first 32-bits (highly predictable and vulnerable), and varies the bits affected among 64 positions. Note that I use a secret IV, a cipher key, and an ICV key for CBCS. However, to adapt your current formulation for making your ICV key, ChaCha20 is run with the given key and nonce and with the two counter words set to zero. The first 32 bytes of the 64 byte output are saved to become the one-time key for Poly1305. The remainder of the output is discarded. I suggest: ChaCha20 is run with the given key and sequence number nonce and with the two counter words set to zero. The first 32 bytes of the 64 byte output are saved to become the one-time key for Poly1305. The next 8 bytes of the output are saved to become the per-record input nonce for this ChaCha20 TLS record. Or you could use 16 bytes, and cover all the input fields There's no reason the counter part has to start at 1. Of course, this depends on not having a related-key attack, as mentioned in my previous message. ___ The cryptography mailing list cryptography@metzdowd.com http://www.metzdowd.com/mailman/listinfo/cryptography