Apologies for crossposting but I am finding it impossible to follow the
same conversation with the same set of people split across five different
lists.


So as I tried to understand the difference between ML-DSA pure and
pre-hashed, I realized that we might have a problem in the use of Ed448 and
Ed25519 *IN APPLICATIONS*, so I want to explain what I think is a possible
hole in some of them.

The concern is a downgrade attack where the attacker substitutes a weak
digest for a strong one. Some people seem to think this is an implausible
attack but it really is not because there is no way for the signer to
control the set of digests accepted by the verifier.

For example, Alice signs a PDF document and sends it to Bob, who checks it
using an application to which Mallet has added his own digest verifier. Yes,
this assumes a degree of platform compromise, but so does every
privilege escalation attack, and those are something we worry about A LOT.
The reason we spent so much time on RSA signature modes was precisely
because this is a critical security concern and the RSA padding matters.

So the rule is that when you have a signature over a digest value, you MUST
always sign the data itself or a manifest that includes the digest value
and the digest algorithm.
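To make that concrete, here is a minimal sketch in Python of the two
patterns. The manifest encoding is ad hoc and purely illustrative, and
'sign' stands in for whatever signature primitive you are using:

    import hashlib

    # BROKEN pattern: the signature covers only the digest value, so the
    # verifier has no idea which algorithm produced it and can be handed a
    # weaker one by Mallet's substitute digest verifier.
    def sign_digest_only(sign, data):
        return sign(hashlib.sha512(data).digest())

    # The rule: sign a manifest that binds the digest algorithm to the
    # digest value, so the verifier cannot be downgraded.
    def sign_manifest(sign, data):
        manifest = b"alg=sha-512;" + hashlib.sha512(data).digest()
        return sign(manifest)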


The problem comes in the definition of 'data' because if you are looking at
the problem from the application side, well, the digest is data, isn't it?
And I am pretty sure this is a problem because we seem to get into a lot of
confused discussions whenever the topic is raised.

ML-DSA offers two signature mechanisms, 'pure' and 'pre-hashed'. Pure is the
version to use when you are signing the data itself or a manifest such as
the one specified in OpenPGP. JOSE has the JWS Protected Header and so
on. Yes, we do tell people not to roll their own crypto, but that doesn't
mean we should make design choices that include landmines because we don't
know who is going to walk on 'em.
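In the JOSE case the binding comes from the fact that the signature is
computed over the Protected Header together with the payload. A minimal
sketch of the RFC 7515 signing input, for illustration only:

    import base64, json

    def b64url(data: bytes) -> bytes:
        return base64.urlsafe_b64encode(data).rstrip(b"=")

    # JWS signing input: BASE64URL(UTF8(Protected Header)) '.' BASE64URL(Payload).
    # The algorithm identifier lives inside the signed header, so a verifier
    # cannot quietly be pointed at a different one.
    protected = {"alg": "EdDSA"}
    payload = b"example payload"
    signing_input = b64url(json.dumps(protected).encode()) + b"." + b64url(payload)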


The problem in my view lies with the 'pre-hashed' version, where ML-DSA
specifies a drop-in replacement for the RSA signature API. RSA can only
sign a short message and, as Rogaway pointed out, the encryption strength
varies over the payload. So we have to use one of the manifest formats
specified by a version of PKCS #1.
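The 'manifest' in RSASSA-PKCS1-v1_5 is the DER-encoded DigestInfo, which is
exactly an (algorithm OID, digest value) pair. A sketch for SHA-256, using
the prefix constant listed in RFC 8017:

    import hashlib

    # DER prefix for DigestInfo { AlgorithmIdentifier(id-sha256), OCTET STRING }
    SHA256_DIGESTINFO_PREFIX = bytes.fromhex(
        "3031300d060960864801650304020105000420")

    def digest_info_sha256(message: bytes) -> bytes:
        # This is what actually gets padded and signed: the digest algorithm
        # and the digest value travel together under the signature.
        return SHA256_DIGESTINFO_PREFIX + hashlib.sha256(message).digest()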

ML-DSA does the job right. A program that was using RSA for signatures can
easily switch to ML-DSA: all it needs to do is use ML-DSA
pre-hashed, which has the exact same interface as RSA sign: message digest
OID, message digest value.

Ed448 and Ed25519 specify 'pre-hash' modes but they do not support the RSA
API. They insist that the implementation use a particular digest. Ed25519
isn't that bad because the hash is SHA-512, which is the only hash I use
for content anyway. But Ed448 uses SHAKE256, which isn't a hash you would
use for content except when doing Ed448.
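Per RFC 8032 the prehash function is fixed by the algorithm: PH(M) =
SHA-512(M) for Ed25519ph and PH(M) = SHAKE256(M, 64) for Ed448ph. A sketch
of just the prehash step (the signing itself omitted):

    import hashlib

    def ph_ed25519ph(message: bytes) -> bytes:
        # Ed25519ph: PH(M) = SHA-512(M), 64 bytes
        return hashlib.sha512(message).digest()

    def ph_ed448ph(message: bytes) -> bytes:
        # Ed448ph: PH(M) = SHAKE256(M, 64), i.e. 64 bytes of SHAKE256 output
        return hashlib.shake_256(message).digest(64)

    # There is no parameter to supply a different content digest; if your
    # protocol already digests content with something else, that value
    # cannot be fed to Ed448ph as-is.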

The 'pre-hash' modes of Ed448 and Ed25519 do solve the problem of avoiding
the need to stream all the data being signed through the HSM. But from my
perspective as the designer of cryptographic protocols, telling me which
digest algorithm I am going to use for content is not the job of the
signature algorithm designer. The choice of content digest is MY choice and
I have to think about other signatures I might be making over the same
content.


So what I propose is that we specify a proper pre-hash mode for Ed25519,
Ed448, and any other algorithm that does not bind the chosen digest to the
algorithm used to produce it, by specifying a manifest format that can be
applied generically.

It does not have to be fancy; in fact I would just take the ML-DSA
construction:

M' ← BytesToBits(IntegerToBytes(1, 1) || IntegerToBytes(|ctx|, 1) ||
ctx || OID || PH_M)

And the only thing I would change there is to replace the leading ML-DSA
mode byte with an OID off an IETF arc to denote the manifest construction.

A big advantage of this approach is that we get the context input added
into our scheme as well.
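To make the proposal concrete, here is a rough Python sketch of that
manifest encoding. The manifest identifier is a placeholder for whatever
OID would get registered on an IETF arc, and the DER-encoded SHA-512 OID is
just one example of a content digest the application might have chosen:

    import hashlib

    # Placeholder for the manifest-construction OID to be assigned off an
    # IETF arc; NOT a real registered value.
    MANIFEST_ID = bytes.fromhex("0000")

    # DER-encoded OID for SHA-512 (2.16.840.1.101.3.4.2.3), as an example
    # of the content digest chosen by the application.
    SHA512_OID = bytes.fromhex("0609608648016503040203")

    def generic_prehash_manifest(ctx: bytes, digest_oid: bytes,
                                 digest_value: bytes) -> bytes:
        # Mirrors the ML-DSA pre-hash encoding, with the leading byte
        # replaced by the manifest identifier:
        #   manifest_id || len(ctx) || ctx || OID || PH(M)
        if len(ctx) > 255:
            raise ValueError("context must be at most 255 bytes")
        return MANIFEST_ID + bytes([len(ctx)]) + ctx + digest_oid + digest_value

    # The signer then runs the 'pure' mode of Ed25519, Ed448 or ML-DSA over
    # this manifest instead of over a bare digest value.
    manifest = generic_prehash_manifest(
        b"example application context",
        SHA512_OID,
        hashlib.sha512(b"the content").digest())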


One area where I disagree with FIPS 204 is that I don't think it is a
problem to share keys across pre-hash and pure modes. If that is in fact a
problem, it is because we have done it all wrong. The real issue is sharing
keys across different applications, which makes a semantic substitution
attack possible. And the problem there is that we need to come up with a
theory of how to use the Context parameter consistently.

I will post on that separately.