Apologies for crossposting, but I am finding it impossible to follow the same conversation with the same set of people split across five different lists.
So as I tried to understand the difference between ML-DSA pure and pre-hashed, I realized that we might have a problem in the use of Ed448 and Ed25519 *IN APPLICATIONS*, and so I want to explain what I think is a possible hole in some of them.

The concern is a downgrade attack in which the attacker substitutes a weak digest for a strong one. Some people seem to think this is an implausible attack, but it really is not, because there is no way for the signer to control the set of digests accepted by the verifier. For example, Alice signs a PDF document and sends it to Bob, who checks it using an application to which Mallet has added his own digest verifier. Yes, this assumes a degree of platform compromise, but so does every privilege escalation attack, and those are something we worry about A LOT. The reason we spent so much time on RSA signature modes was precisely because this is a critical security concern and the RSA padding matters.

So the rule is that when you have a signature over a digest value, you MUST always sign the data itself or a manifest that includes both the digest value and the digest algorithm. The problem comes in the definition of 'data', because if you are looking at the problem from the application side, well, the digest is data, isn't it? And I am pretty sure this is a problem because we seem to get into a lot of confused discussions whenever the topic is raised.

ML-DSA offers two signature mechanisms, 'pure' and 'preHashed'. Pure is the version to use when you are signing the data itself or a manifest such as the one specified in OpenPGP; JOSE has the JWS Protected Header, and so on. Yes, we do tell people not to roll their own crypto, but that doesn't mean we should make design choices that include landmines, because we don't know who is going to walk on 'em.

The problem in my view lies with the 'preHashed' version, where ML-DSA specifies a drop-in replacement for the RSA signature API. RSA can only sign a short message and, as Rogaway pointed out, the security varies over the payload. So we have to use one of the manifest formats specified by a version of PKCS #1. ML-DSA does the job right. A program that was using RSA for signatures can easily switch to ML-DSA: all it needs to do is use ML-DSA preHashed, which has the exact same interface as RSA sign: message digest OID, message digest value.

Ed448 and Ed25519 specify 'preHash' modes, but they do not support the RSA API. They insist that the implementation use a particular digest. Ed25519 isn't that bad, because the hash is SHA-512, which is the only hash I use for content anyway. But Ed448 uses SHAKE256, which isn't a hash you would use for content except when doing Ed448.

The preHash modes of Ed448 and Ed25519 do solve the problem of avoiding the need to stream all the data being signed through the HSM. But from my perspective as a designer of cryptographic protocols, telling me which digest algorithm I am going to use for content is not the job of the signature algorithm designer. The choice of content digest is MY choice, and I have to think about other signatures I might be making over the same content.

So what I propose is that we specify a proper preHash mode for Ed25519, Ed448, and any other algorithm that doesn't bind a chosen digest to the algorithm used to produce it, by specifying a manifest format that can be applied generically.
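To make the rule above concrete, here is a minimal sketch of the difference between signing a bare digest value and signing a manifest that binds the digest algorithm. The key, content, and layout are purely illustrative, not a proposed wire format; I am using the Python 'cryptography' package's Ed25519 interface only as a stand-in signer.

    import hashlib
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    key = Ed25519PrivateKey.generate()
    content = b"the document Alice is signing"

    # Bad pattern: the signature covers only the digest bytes. Nothing in the
    # signed data says which digest function produced them, so a verifier that
    # has been talked into accepting a weaker digest can be fed a substitute.
    bare_digest_sig = key.sign(hashlib.sha512(content).digest())

    # Safer pattern: the signature covers the digest value AND an identifier
    # of the digest algorithm (here the DER-encoded OID for SHA-512), so a
    # substituted digest algorithm changes the signed bytes and fails to verify.
    SHA512_OID = bytes.fromhex("0609608648016503040203")  # 2.16.840.1.101.3.4.2.3
    manifest_sig = key.sign(SHA512_OID + hashlib.sha512(content).digest())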
It does not have to be fancy; in fact I would just take the ML-DSA construction:

    M' = BytesToBits(IntegerToBytes(1, 1) || IntegerToBytes(|ctx|, 1) || ctx || OID || PH_M)

The only thing I would change there is to replace the ML-DSA version identifier with an OID off an IETF arc denoting the manifest construction. A big advantage of this approach is that we get the context input added into our scheme as well.

One area where I disagree with FIPS 204 is that I don't think it is a problem to share keys across preHashed and pure. If that is in fact a problem, it is because we have done it all wrong. The real issue is sharing keys across different applications, which makes a semantic substitution attack possible. And the problem there is that we need to come up with a theory of how to use the context parameter consistently. I will post on that separately.
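To show how little machinery this needs, here is a sketch of that construction applied over Ed25519. The leading OID bytes are a made-up placeholder for whatever value would be assigned off an IETF arc, the context string is invented, and the Python 'cryptography' package's Ed25519 calls stand in for any underlying signature algorithm.

    import hashlib
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Placeholder for the OID naming the manifest construction itself --
    # example OID 1.2.3.4 in DER, NOT an assigned value.
    MANIFEST_OID = bytes.fromhex("06032a0304")

    # DER-encoded OID for the content digest the protocol designer chose.
    SHA512_OID = bytes.fromhex("0609608648016503040203")  # 2.16.840.1.101.3.4.2.3

    def prehash_manifest(ctx: bytes, digest_oid: bytes, ph_m: bytes) -> bytes:
        # Same shape as the FIPS 204 M' encoding, with the version byte
        # replaced by the manifest-construction OID.
        if len(ctx) > 255:
            raise ValueError("ctx must be at most 255 bytes")
        return MANIFEST_OID + bytes([len(ctx)]) + ctx + digest_oid + ph_m

    # Signer: hash the content with the digest of MY choosing, then sign the
    # manifest rather than the bare digest.
    content = b"example content"
    ph_m = hashlib.sha512(content).digest()
    key = Ed25519PrivateKey.generate()
    sig = key.sign(prehash_manifest(b"example-app-context", SHA512_OID, ph_m))

    # Verifier: rebuilds the manifest from the digest algorithm it expects;
    # a substituted weaker digest changes the OID bytes and verification fails.
    key.public_key().verify(sig, prehash_manifest(b"example-app-context", SHA512_OID, ph_m))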
