You can use the comment on Algorithm 7, line 6: "message representative
that may optionally be computed in a different cryptographic module".
The hash function is SHAKE256(SHAKE256(pk, 64) || 0x00 || 0x00 || m, 64),
which already takes care of the context (by setting it to the empty string,
as it always should be, since that is a question for the protocol, not the
signature scheme). You can argue that this breaks the cryptographic
module boundary (even though NIST's comment explicitly allows it), but the
same is true for HashML-DSA: Algorithm 5 takes an arbitrary-size message and
a hash function, not the hash of the message itself, so computing the hash
outside of the cryptographic module similarly breaks the module boundary,
this time without the standard explicitly allowing it.
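
For what it's worth, here is a minimal sketch of computing that message
representative outside the module, using Python's hashlib. The function
name external_mu and the chunked-input interface are my own illustration,
not anything from FIPS 204, and it assumes pk is the encoded public key and
that the signer exposes an interface accepting the 64-byte value directly.

```python
import hashlib
from typing import Iterable


def external_mu(pk: bytes, chunks: Iterable[bytes], ctx: bytes = b"") -> bytes:
    """Compute SHAKE256(SHAKE256(pk, 64) || 0x00 || len(ctx) || ctx || m, 64).

    Hypothetical helper: with the default empty context this is the
    construction quoted above, i.e. the prefix bytes are 0x00 || 0x00.
    """
    if len(ctx) > 255:
        raise ValueError("context must be at most 255 bytes")

    # tr = SHAKE256(pk, 64): the 64-byte hash of the public key.
    tr = hashlib.shake_256(pk).digest(64)

    h = hashlib.shake_256()
    h.update(tr)
    # Domain separator 0x00, then the context length and the context itself.
    h.update(b"\x00" + bytes([len(ctx)]) + ctx)
    # Absorb the message piece by piece; only the sponge state is kept.
    for chunk in chunks:
        h.update(chunk)
    return h.digest(64)
```

Since the SHAKE256 state is updated incrementally, the message never has to
be held in memory or handed to the module in full; only the 64-byte output
crosses the boundary.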

On Thu, Sep 26, 2024 at 9:28 AM Phillip Hallam-Baker <[email protected]>
wrote:

> Right now, we are discussing this in LAMPS, JOSE, COSE, OpenPGP and the
> NIST-PQC list. Which is really not a good way to go about things because
> this is a systems architecture issue and what we really need is for all the
> groups to arrive at the same approach so that we avoid issues of semantic
> substitution, digest substitution, etc. So I added LAMPS because the
> conclusion is relevant there.
>
>
> The issue is that ML-DSA-pure doesn't use SHAKE(content), it uses
> SHAKE(0x00 + content + context). And there is no way to divide that
> operation securely with half taking place inside the HSM and half taking
> place outside.
>
> From the NIST reference code:
>
>         // 5: domain separator 0x00, then context length and context
>         int i = 0;
>         bytes[i++] = 0;
>         if (context is null) {
>             bytes[i++] = 0;
>         }
>         else {
>             bytes[i++] = (byte)clen;
>             Array.Copy(context, 0, bytes, i, clen);
>             i += clen;
>         }
>
> Since my code is designed to support multiple signatures over the same
> content under different signature algorithms, being told which algorithm to
> use doesn't work for me because I am only hashing once.
>
> The NIST division is actually rather clever because it allows the HSM to
> have a bit more intelligence than in the past because the body is always at
> least a manifest. And the HSM can be configured to only sign specific types
> of content with specific authorization and log what it did.
>
>
> ML-DSA-hash is really only relevant for CMS/PKCS#7. Everything else we do
> either has a manifest already (COSE, JOSE, OpenPGP, XML-Signature) or is
> short enough that it can be done inside the HSM (except CRLs without
> distribution points). And most of the things that are short enough are
> exactly the sort of thing we would want an HSM to validate and log.
>
> So for example, I might have my HSM configured in such a way that it will
> only sign an object with the "PKIX Certificate" extension, if and only if
> TBSCertificate is well formatted DER and the requisite set of proofs (proof
> of right, validation assertion, CT insertion proof) have been supplied.
>
> And yes, that might be more complexity than you would want in a FIPS-140-3
> module which is why you might have a second module acting as a front end to
> a 'signer only' module.
>
>
> So we should define a set of context strings to separate the COSE, JOSE,
> XML-Signature, SAML, Certificate, CRL, etc. domains which are all atomic
> strings with no parameters.
>
> CMS is special in that we have all this infrastructure already committed
> based on the RSA approach and so there we should probably allow the context
> string to be a prefix followed by the application specific context
> separator.
>
> So two new IANA registries. One for signature format context strings, one
> for CMS application contexts.
>
>
> I will write up a draft proposing this and a second draft proposing adding
> the ML-DSA hash digest to Ed-448 and Ed-25519 so we can use all the
> algorithms in the same fashion with the same API.
>
>
>
> On Thu, Sep 26, 2024 at 11:25 AM Sophie Schmieg <[email protected]>
> wrote:
>
>> SHAKE256 supports streaming. It's a sponge construction, you only need to
>> keep the sponge as state, and can stream in the data.
>>
>> On Wed, Sep 25, 2024 at 4:24 PM Phillip Hallam-Baker <
>> [email protected]> wrote:
>>
>>> ML-DSAhash is designed to support streaming. That isn't necessary if you
>>> only work at the packet layer but you certainly don't want to sign 1TB
>>> files using the SHAKE256 construct and pushing all that data into a
>>> FIPS-140-3 HSM.
>>>
>>> For COSE and JOSE, ML-DSAhash is unnecessary because you are always
>>> signing over a manifest. So if you are going to sign a 1TB file, you hash
>>> with your favorite digest, specify the digest value and digest algorithm ID
>>> in the SignedHeader field and then sign over that. That is exactly what
>>> ML-DSA-pure is intended for.
>>>
>>> For other systems, the reason we need ML-DSAhash is that we have a lot
>>> of APIs that are built around the RSA interface of 'sign digest value and
>>> OID'. And ML-DSA needs to support a mode where it is a direct drop in
>>> substitute.
>>>
>>> If the digest algorithm identifier is omitted, there is a possibility of
>>> a digest substitution attack. So it is an important consideration.
>>>
>>>
>>> The other concern is semantic substitution, and there are two separate
>>> issues there. One is crafting a CMS package so that it is a legitimate
>>> COSE package, which seems far-fetched, but so did GIFs that render as
>>> JPGs...
>>>
>>> The second is crafting a COSE package for application A so that it is
>>> accepted as legitimate by application B. This is a much more plausible
>>> attack.
>>>
>>>
>>> If every signature algorithm supported context, we could use the
>>> signature context slot for both. Since they don't and since taking data
>>> provided by the signer and putting it into the signature envelope directly
>>> is 'icky' to say the least, the better way to do this in a standardized
>>> envelope that has a manifest is to use the context string to identify the
>>> envelope format, 'XML-DIG-SIG', 'JOSE', 'COSE', etc. and make a slot in the
>>> envelope manifest for application level separation.
>>>
>>> This is actually done in the SAML assertion format which has an Audience
>>> field for the express purpose of semantic binding to the terms and
>>> conditions of the signature.
>>>
>>> CMS/PKCS#7 is really rooted in the way we did things 30 years ago and
>>> the signature context is really the only option.
>>>
>>>
>>> This may seem unnecessarily complex but it is much easier to block this
>>> class of attack completely than to spend time auditing every application to
>>> see if there is a problem.
>>>
>>> The way we form the key agreement output doing ECDH (P-256 etc) is a lot
>>> more verbose than X25519 because it binds to the keys used for the key
>>> agreement. I am really not at all sure what the advantage of doing it that
>>> way is when your key names are 'Alice' and 'Bob', which is what we have in the
>>> RFC but that's what we did and maybe we should have done X25519 exactly the
>>> same way so it was the same...
>>>
>>>
>>> On Wed, Sep 25, 2024 at 5:19 PM Sophie Schmieg <sschmieg=
>>> [email protected]> wrote:
>>>
>>>> Note that there are two options for prehashing with ML-DSA: You can use
>>>> the comment on algorithm 7, line 6 and use the hash function
>>>> SHAKE256(SHAKE256(pk, 64) || 0x00 || 0x00 || m, 64), in which case it works
>>>> exactly the same as ECDSA (with a known hash function). I.e. the hash can
>>>> be computed elsewhere and transmitted to a signing oracle, producing a
>>>> signature that looks the same as if no prehashing has taken place, so from
>>>> the verifier's perspective this choice does not matter. Or you use the (in
>>>> my opinion strictly worse) option of using HashML-DSA, where you prehash
>>>> with, say, SHA512. In that case, the verifier needs to know that you did so.
>>>> By calling that algorithm HashML-DSA-SHA512 (and putting the algorithm
>>>> information in the public key), you can communicate that, but honestly I do
>>>> not see any reason to do so that would not be better served by just using
>>>> ML-DSA, prehashing with the SHAKE256 construction mentioned.
>>>>
>>>> On Thu, Sep 19, 2024 at 12:34 AM Ilari Liusvaara <
>>>> [email protected]> wrote:
>>>>
>>>>> On Wed, Sep 18, 2024 at 01:50:20PM -0700, Sophie Schmieg wrote:
>>>>> > On Tue, Sep 17, 2024 at 1:20 PM Ilari Liusvaara <[email protected]> wrote:
>>>>> >
>>>>> > >
>>>>> > > In case of signed JWT, the very first thing that needs to be parsed
>>>>> > > out is "iss".
>>>>> > >
>>>>> > > ... Which is a bit problematic.
>>>>> >
>>>>> > Yeah, I somewhat intentionally did not mention iss, because yeah, it
>>>>> > is a bit problematic, and forces the "authorization decision passed
>>>>> > down to downstream system" as a pattern.
>>>>>
>>>>> Dedicated JWT validation code could call back to map the issuer name to
>>>>> a keyset. But that runs into somewhat annoying function color issues in
>>>>> many languages (fortunately synchronous factorization does not seem to
>>>>> be too bad)...
>>>>>
>>>>>
>>>>> > > Unfortunately, that runs into problems with pre-hashing.
>>>>> > >
>>>>> > > Currently, that only gets problematic for RSA, but supporting
>>>>> > > pre-hashed ML-DSA would also introduce the problem there.
>>>>> > >
>>>>> > > ECDSA has essentially fixed prehash (ok), and EdDSA in COSE/JOSE
>>>>> > > does not support pre-hashing.
>>>>> > >
>>>>> >
>>>>> > I'm not sure I follow. The hash function used with a signature
>>>>> > scheme is part of the signature scheme as well, and so the public key
>>>>> > should allow you to derive that information. Several common public key
>>>>> > serialization formats unfortunately do not properly include the hash
>>>>> > function, maybe that is what you are referring to? Or do you have a
>>>>> > system where the decision which hash function to use is taken
>>>>> > independently of the decision of which key to use? In that case, yeah
>>>>> > you have lots of incompatibilities, especially in the case of ML-DSA
>>>>> > where the hash function is fixed to SHAKE256, and has to be prefixed
>>>>> > with a hash of the public key, but I'm not sure why the algorithm has
>>>>> > to be part of the token to enable this use case.
>>>>>
>>>>> Because public keys frequently fail to include hash function, one would
>>>>> have to deduce the hash function from the key itself.
>>>>>
>>>>> That works in practice for ECDSA, EdDSA and HSS-LMS. But it does not
>>>>> work for RSA (then there is the PSS versus PKCS#1 v1.5 stuff...).
>>>>>
>>>>> For ML-DSA, supporting pre-hash mode breaks deducing hash function.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -Ilari
>>>>>
>>>>> _______________________________________________
>>>>> COSE mailing list -- [email protected]
>>>>> To unsubscribe send an email to [email protected]
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Sophie Schmieg | Information Security Engineer | ISE Crypto |
>>>> [email protected]
>>>>
>>>>
>>>
>>
>> --
>>
>> Sophie Schmieg | Information Security Engineer | ISE Crypto |
>> [email protected]
>>
>>

-- 

Sophie Schmieg | Information Security Engineer | ISE Crypto |
[email protected]
