The main principle is to assume that everything in the token is crafted specifically to lie to you, unless you have been able to confirm it came from an honest party. This means that until you have verified the signature, any data in the token can only be used for verification if you are entirely indifferent to which of its possible values it presents. This is unfortunately a bit more complicated than just saying "do not trust the header", but "do not trust the header" is a good first approximation. The one example I know of header information being safely used in verification is a key ID, in the very specific scenario where the key ID only selects between multiple *equally trusted* keys. I've written about that use case in a recent blog post [1]. If the keys are not equally trusted, this allows the attacker to select the key that is least trusted. That can still be okay if the downstream application takes that information into account and uses it for the authorization decisions it makes, but in my experience this rarely happens correctly unless it is enforced by the JWT library. In other words, having a clear separation of identification, authentication, and authorization is a good idea.
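To make this concrete, here is a minimal sketch of that one safe use of header data, written against the PyJWT and cryptography packages; the key IDs, the trusted_keys mapping, and the verify helper are purely illustrative, not a prescribed API:

import jwt
from cryptography.hazmat.primitives.asymmetric import ec

# Two *equally trusted* signing keys, e.g. the current and previous key of a
# rotation. The kid only selects among them; it never changes what we trust.
private_keys = {
    "2024-09": ec.generate_private_key(ec.SECP256R1()),
    "2023-11": ec.generate_private_key(ec.SECP256R1()),
}

# Verifier-side view: kid -> (algorithm, public key). The algorithm comes from
# our own key metadata, never from the token's header.
trusted_keys = {
    kid: ("ES256", key.public_key()) for kid, key in private_keys.items()
}

def verify(token: str) -> dict:
    kid = jwt.get_unverified_header(token).get("kid")
    if kid not in trusted_keys:
        raise ValueError("unknown key id")
    algorithm, public_key = trusted_keys[kid]
    # algorithms=[...] pins the expected algorithm; a token whose "alg" header
    # claims anything else is rejected instead of being reinterpreted.
    return jwt.decode(token, key=public_key, algorithms=[algorithm])

token = jwt.encode({"sub": "example"}, private_keys["2024-09"],
                   algorithm="ES256", headers={"kid": "2024-09"})
print(verify(token))

The important part is that the header only ever picks one entry out of a set of keys we would have accepted anyway; everything else, in particular the algorithm, is fixed by our own configuration rather than by the token.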
For the algorithm header field, I am not aware of any reason to have it in the token (other than the fact that it historically has been there). To see why, we first need to look at the public key: the public key can never be part of the token, since it is trivial to create a token with a valid signature if the attacker gets to choose the public key; they can simply create the key pair themselves. But conceptually speaking, the public key already includes the algorithm: an RSA key, an ECDSA key, and an ML-DSA key are not interchangeable, after all. So if a token says it uses ECDSA as its algorithm, but the public key that is supposed to be used for verification is an ML-DSA key, the token is clearly malformed. That makes the algorithm field one whose information is either superfluous (you already knew it has to be ML-DSA, because that is your public key's type) or invalid (the algorithm field does not align with the public key). Therefore, including the algorithm in the token is never useful.

But it gets worse: if the application is implemented the wrong way, it will take the algorithm field of the token as authoritative and essentially reinterpret_cast the public key bytes to the type the header field suggests. This is how you get vulnerabilities from casting, say, an ECDSA public key into an HMAC key, with the attacker now able to forge the MAC, since the public key is known. But even if you cast one public key type into another, the results are undefined and might very well be insecure. For example, if you cast an ML-DSA key into an ECDSA key (with the library truncating all the extra bytes), an adversary with a quantum computer has just disabled your post-quantum protections, and so on. Even switching between two modes of the same algorithm (say RSA PKCS#1 v1.5 and RSA-PSS, or ECDSA and Schnorr signatures) is not guaranteed to be secure, since it might be possible to use an artifact obtained in one mode in the other mode, while the security analysis only ever considers situations where all artifacts are created with a single mode.

Another important observation is that we cannot cure this problem by including the header in question in the signature. Since all these attacks are about manipulating the decisions leading up to the signature verification, the attacker has either already successfully abused them by the time the signature verifies, or has already failed at abusing them. Fields like the algorithm that only have one possible valid value can be checked to have that one value whether or not they are part of the signature; indeed, that is what Tink's JWT library does, for example [2]. Key IDs switching between equally trusted keys are implicitly verified, since a wrong value would select the wrong key, causing the signature verification to fail. Note that this does not extend to header fields that are used after verification of the signature, only to header fields used in the decision making involved in signature verification. From a cryptographic standpoint, there is no difference between header and payload fields after verification; their meaning is whatever the application ascribes to them.

[1] https://bughunters.google.com/blog/6182336647790592/cryptographic-agility-and-key-rotation
[2] https://github.com/tink-crypto/tink-cc/blob/main/tink/jwt/internal/jwt_format.cc#L137

On Tue, Sep 17, 2024 at 9:11 AM Orie Steele <[email protected]> wrote:

> I agree with much of what you wrote.
>
> Let's walk through an example of building an application layer protocol for HPKE to see where parameters show up, if we were designing from scratch and with 2020 hindsight.
>
> ## HPKE Crypto Layer
>
> recipientPublicKey, recipientPrivateKey = keyGen( ciphersuite )
> contentCipherText, kemCipherText = encrypt(plaintext, recipientPublicKey)
> recoveredPlaintext = decrypt(contentCipherText, kemCipherText, recipientPrivateKey)
>
> HPKE has been built with the benefit of learning from ECDH-ES / KDFs / PartyU / PartyV.
>
> It internalizes a lot of things that we would have put in headers, previously.
>
> However, you still need to convey contentCipherText, kemCipherText... and handle errors that might be produced if kemCT is tampered with:
> https://datatracker.ietf.org/doc/html/rfc9180#base-crypto
>
> ## JOSE / COSE Application Protocol Layer
>
> At this point, you are ready to consider protocol specific context information; the purpose of this step is to ensure that sender and receiver agree they are using COSE, or JOSE... with the assumption they are already supporting HPKE.
>
> The first step is to construct a single message that contains both contentCipherText and kemCipherText... it could use base64url and "." or cbor major types.
>
> After this step the information conveyed is cborEnvelope or joseEnvelope... not contentCipherText, kemCipherText.
>
> ## Application Protocol Context Separation
>
> Before encrypting or decrypting, sender and receiver need to agree to use a serialization and an hpke ciphersuite.
>
> Here you can add protocol specific context separation:
>
> - https://datatracker.ietf.org/doc/html/rfc9052#section-5.3
> - https://datatracker.ietf.org/doc/html/rfc7516#section-3
>
> JOSE and COSE go about this step differently... It's even more confusing because in JOSE AEADs are mandatory, whereas in COSE they are not...
> The objective of this step is to commit some protocol information into the encryption step... AEAD AAD is used where it can be... KDF context info can also be used here:
>
> - https://datatracker.ietf.org/doc/html/rfc9053#name-context-information-structu
>
> ... in hindsight, this is a layer violation that forces both JOSE and COSE to maintain a separation between keys and algorithms... or if you want to think of it another way... it's the binding between algorithms and keys in both protocols.
>
> ... this is also the layer where we get "2 payloads", because in JOSE we have both the protected header and the payload... and you can put protocol parameters in either... Later this leads to JWT / CWT parameters in headers and payloads.
>
> ... it's inherited from ASN.1 supposedly... maintaining this design pattern is the "conservative approach", in that... it's doing what we have "always done".
>
> ## Key Discovery
>
> In the simple case that there is only 1 supported ciphersuite and each party only has 1 key, there is no need to communicate other information.
>
> If there are multiple keys, the key that is being encrypted to needs to be identified, to avoid the recipient having to try all their keys.
>
> At this stage we would add the key identifier as a parameter to the cborEnvelope or joseEnvelope.
>
> There is never a need to convey the algorithm or ciphersuite... because they are always included in the key representation, so the key identifier explicitly communicates them.
>
> In the pull request for ML-DSA key representations, we constructed a new key type for COSE and JOSE, called "algorithm key pair":
>
> https://github.com/cose-wg/draft-ietf-cose-dilithium/pull/5/files
>
> The algorithm property is mandatory for this key type, and the thumbprint is computed over it.
>
> ... some other comments
>
> The fork in the road happens in "Application Protocol Context Separation"... this is where we see the AEAD differences and the context info differences...
> This is where we get protected header parameters... and where we first get our chance to put "algorithm information" in a "header parameter"...
> Because of the design of JOSE and COSE, we are forced to take the same path through this step each time, and that is why we are always stuck handling algorithm identifiers and keys as separate things.
> In JOSE "alg" is a mandatory header parameter... in COSE it is not... but COSE ends up making it mandatory in a different way, and enabling non-AEADs at the same time.
>
> JOSE has alg none, which is also a problem at this layer of the design.
>
> The counter argument to "don't put algorithms in headers" is "never use an algorithm which you do not trust" and "with a key it is not meant for"... in code this means:
>
> - restricting keys to specific algorithms (even though the specs do not mandate this)
> - comparing algorithms in headers to algorithms in keys (even though they are not required to be present in either)
>
> I think time has shown that it would have been safer / simpler to just "not put algorithm identifiers in headers".
>
> There is also the issue of bulk encryption / splitting key establishment and content encryption up... both JOSE and COSE do this, and it leads to "intermediate keys" and, in JOSE, multiple algorithm identifiers in headers ("alg" and "enc").
>
> JOSE could have shuffled things around like COSE did and avoided "enc" altogether... or internalized things like HPKE does... but JOSE came first.
>
> ... final thoughts
>
> If I could wave a magic wand, I would 100% make algorithms part of keys, make identifiers committing to keys, and handle the layering differently.
> Regardless of the era in which these protocols were constructed, we have a responsibility to deprecate the parts of them that are problematic, and offer upgrade paths where possible.
>
> For a recent example of this, see:
>
> https://datatracker.ietf.org/doc/html/draft-ietf-lamps-cms-cek-hkdf-sha256-04#name-use-of-of-hkdf-with-sha-256
>
> COSE needs a draft that conceptually accomplishes the same thing.
>
> New COSE work needs to account for attacks that were discovered after COSE was constructed; it can't just say "we've always done it this way".
>
> If you got this far, thanks for reading.
>
> OS
>
> On Tue, Sep 17, 2024 at 3:33 AM <hannes.tschofenig=[email protected]> wrote:
>
>> Hi all,
>>
>> When I presented an update on the COSE HPKE draft at the last IETF meeting (see slides-120-cose-use-of-hpke-with-cose (ietf.org) <https://www.ietf.org>), Sophie made an insightful remark that got me rethinking the construction of the context information. She noted, "you cannot trust the information in the headers", in response to my presentation. This is particularly relevant because the current draft suggests placing all context information into the header so it is included in the authenticated data.
>>
>> Ideally, when a recipient processes the message, the first step involves using the key ID to retrieve the key required to decrypt the payload (or identify the key used by the key exchange mechanism to derive the content encryption key). Best practices dictate that different keys should be used for different purposes, meaning there should be a one-to-one relationship between the key and the associated algorithm information. For instance, a key designated as a KEK for AES-KW should not be used directly for content encryption.
>>
>> This implies that the parties involved in the communication should avoid including algorithm-related information in the message header. Instead, this information should be retrieved based on the key identifier. Thus, more than just the key ID and the key must be shared between the communicating parties; key-related metadata must also be exchanged.
>>
>> If I understood Sophie correctly, the current approach of relying on header-based context information is not useful. We should reconsider why we are embedding all of this information in the header in the first place, as it may actually weaken security.
>>
>> Ciao
>> Hannes
>>
>> [1] Interestingly, I had already advocated for using the key ID to select all other parameters back in 2015. See [COSE] alg Header Parameter (ietf.org) <https://mailarchive.ietf.org/arch/msg/cose/Ybou-lGY5C2DwYlorI8wRwxlmN0/>
>>
>> _______________________________________________
>> COSE mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
>
> --
> ORIE STEELE
> Chief Technology Officer
> www.transmute.industries
> <https://transmute.industries>

--
Sophie Schmieg | Information Security Engineer | ISE Crypto | [email protected]
_______________________________________________
COSE mailing list -- [email protected]
To unsubscribe send an email to [email protected]
