Sophie wrote:
For the algorithm header field, I am not aware of any reason to have that in 
the token (other than the fact that it historically has been there).

The primary reason that the “alg” is in the protected header is to 
cryptographically bind the algorithm used for the JWS or JWE to the computation 
so that an attacker cannot trick the implementation into using a different 
algorithm by providing a key using a different algorithm.  Having “alg” in the 
protected header enables this mismatch to be caught before any cryptographic 
operations are performed.

Note that the Key ID does not cryptographically bind the key to the token.  
Depending upon the circumstances, the attacker is free to craft a key with 
content of their choosing, including a Key ID matching that in the token.  The 
Key ID is a hint enabling optimized processing in the happy path but doesn’t 
add any security value.

For the record, these decisions are the result of discussions including 
cryptographers at the Internet Identity Workshop (IIW) 15 years ago this month, 
including cryptographers from Sun Microsystems.  The primary decisions are 
recorded at https://self-issued.info/?p=361, which include:

  *   There is an envelope (a.k.a. header) that completely describes the 
cryptographic algorithm(s) used
  *   What to sign (envelope.payload or just payload)?
Given that the envelope is extensible and therefore may contain 
security-sensitive information, we reached a consensus (with input from Ben 
Laurie<http://www.links.org/> via IM) that the combination envelope.payload 
must be signed.
The full set of decisions made with cryptographers and implementers 15 years 
ago that resulted in the JWS<https://www.rfc-editor.org/rfc/rfc7515.html>, 
JWE<https://www.rfc-editor.org/rfc/rfc7516.html>, 
JWK<https://www.rfc-editor.org/rfc/rfc7517.html>, 
JWA<https://www.rfc-editor.org/rfc/rfc7518.html>, and 
JWT<https://www.rfc-editor.org/rfc/rfc7519.html> specs we have today are 
recorded here:

  *   JSON Token Spec Results at IIW on Tuesday<https://self-issued.info/?p=361>

  *   JSON Token Encryption Spec Results at IIW on 
Wednesday<https://self-issued.info/?p=378>
  *   JSON Token Naming Spec Results at IIW on 
Wednesday<https://self-issued.info/?p=386>
  *   JSON Public Key Spec Results at IIW on 
Thursday<https://self-issued.info/?p=390>

Many of the COSE decisions followed the JOSE precedents for the same 
reasons.

                                                                Best wishes,
                                                                -- Mike

From: lgl island-resort.com <[email protected]>
Sent: Wednesday, September 18, 2024 10:37 AM
To: Sophie Schmieg <[email protected]>
Cc: Orie Steele <[email protected]>; Hannes Tschofenig 
<[email protected]>; cose <[email protected]>
Subject: [COSE] Re: Thoughts about the Context Information in COSE

It seems OK and useful to put an algorithm ID in a header to help with 
processing efficiency on the receiving side. For example, the ID of the hash 
algorithm in a signature format can allow for one-pass processing. I think 
Ilari referred to this as “pre-hashing”. It’s OK to do lots of processing as 
long as the processor is robust and the data is not used or trusted until the 
receiving side work is complete, right?

What seems important is that, in the end, the ID of the algorithm came from a 
trusted source, probably out of band relative to the message. Possibly that is 
from a trusted data structure describing the key, but it doesn’t have to be.

Understood, that even “protected” headers aren’t to be trusted with blind faith.

LL



On Sep 17, 2024, at 11:35 AM, Sophie Schmieg 
<[email protected]<mailto:[email protected]>>
 wrote:

The main principle is to assume that everything in the token is crafted 
specifically to lie to you, unless you have been able to confirm it came from 
an honest party.
This means that until you have verified the signature, any data in the token 
can only be used for the verification if you are entirely indifferent to which 
of the possible values it presents. This is unfortunately a bit more 
complicated than just saying "do not trust the header", but "do not trust the 
header" is a good first approximation.
The one example I know of header information safely being used in the 
verification is the use of a key ID, in the very specific scenario that the key 
ID allows the selection between multiple different equally trusted keys. I've 
written about that use case in a recent blog post [1]. If the keys are not 
equally trusted, this will allow the attacker to select the key that is least 
trusted. That still can be okay, if the downstream application takes that 
information into account and uses it for the authorization decisions it makes, 
but in my experience, this rarely happens correctly if it is not enforced by 
the JWT library. In other words, having a clear separation of identification, 
authentication, and authorization is a good idea.
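
A minimal sketch of the key ID use case described above, assuming a hypothetical keyset in which every key is equally trusted. The key IDs and key bytes are made up, and HMAC stands in for signature verification only so that the sketch runs with the Python standard library alone. The untrusted key ID merely selects among trusted keys; the token never supplies a key itself.

```python
import hashlib
import hmac

# Hypothetical keyset: every key here is equally trusted for the same purpose.
# Key IDs and key bytes are invented for illustration.
KEYSET = {
    "key-2023": b"\x01" * 32,
    "key-2024": b"\x02" * 32,
}


def sign(kid: str, payload: bytes) -> bytes:
    """Produce a tag with the key named by kid (HMAC stands in for a signature)."""
    return hmac.new(KEYSET[kid], payload, hashlib.sha256).digest()


def verify(kid: str, payload: bytes, tag: bytes) -> bool:
    """Use the untrusted kid only to select among equally trusted keys."""
    key = KEYSET.get(kid)
    if key is None:
        # Unknown kid: reject. Never accept or fetch a key named by the token.
        return False
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


payload = b'{"sub":"alice"}'
tag = sign("key-2024", payload)

ok_result = verify("key-2024", payload, tag)              # correct kid
wrong_kid_result = verify("key-2023", payload, tag)       # kid points at another key
unknown_kid_result = verify("attacker-key", payload, tag)  # kid not in the keyset
```

A wrong key ID simply selects a key under which verification fails, so the field needs no separate check as long as all keys in the set carry the same trust.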

For the algorithm header field, I am not aware of any reason to have that in 
the token (other than the fact that it historically has been there). To see 
why, we first need to look at the public key. The public key can never be part 
of the token: it is trivial to create a token with a valid signature if 
the attacker gets to choose the public key, since they can just create the 
public key themselves. But conceptually speaking, the public key already 
includes the algorithm; an RSA key, an ECDSA key, and an ML-DSA key are not 
interchangeable, after all. So if a token says it uses ECDSA as algorithm, but 
the public key that is 
supposed to be used for verification is an ML-DSA key, the token is clearly 
malformed, making the algorithm field a field that only ever has information 
that is either superfluous (you already knew it has to be ML-DSA because that 
is your public key's key type) or invalid (the algorithm field does not align 
with the public key). Therefore including the algorithm in the token is never 
useful.
But it gets worse: if the application is implemented in the wrong way, it will 
take the algorithm field of the token as authoritative and essentially 
reinterpret_cast the public key bytes to the type the header field suggested. 
This way, you get vulnerabilities such as casting an ECDSA public key into an 
HMAC key, with the attacker now able to forge the MAC, since the public key is 
known. But even if you cast different public keys into each other, the results 
are undefined, and might very well be insecure. For example, if you cast an 
ML-DSA key into an ECDSA key (with the library truncating all the extra stuff), 
an adversary with a quantum computer has disabled your post-quantum 
protections, etc. Even just switching between two modes of the same algorithm 
(say RSA PKCS1 and RSA PSS or ECDSA and Schnorr signatures) is not guaranteed 
to be secure, since it might be possible to use an artifact obtained in one 
mode in the other mode, with the security analysis only ever looking at 
situations where all artifacts are created with a single mode.
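
The ECDSA-to-HMAC confusion described above can be sketched as follows. This is an illustration under stated assumptions, not any real library's API: the function names, the placeholder public key bytes, and the `"HS256"` / `"ES256"` labels are chosen for the example, and actual ECDSA verification is left out so the sketch runs with the Python standard library alone.

```python
import hashlib
import hmac

# Stand-in for an encoded ECDSA public key. By definition the attacker knows it.
PUBLIC_KEY_BYTES = bytes.fromhex("04" + "ab" * 64)


def naive_verify(header_alg: str, key_bytes: bytes,
                 payload: bytes, sig: bytes) -> bool:
    """Anti-pattern: the token's own 'alg' decides how the key bytes are used."""
    if header_alg == "HS256":
        expected = hmac.new(key_bytes, payload, hashlib.sha256).digest()
        return hmac.compare_digest(expected, sig)
    # A real library would run ECDSA verification here; omitted in this sketch.
    return False


def key_bound_verify(key_alg: str, header_alg: str, key_bytes: bytes,
                     payload: bytes, sig: bytes) -> bool:
    """The algorithm comes from the trusted key record; the header may only agree."""
    if header_alg != key_alg:
        return False  # mismatch rejected before any cryptographic operation
    if key_alg == "HS256":
        expected = hmac.new(key_bytes, payload, hashlib.sha256).digest()
        return hmac.compare_digest(expected, sig)
    # ES256 verification omitted in this sketch.
    return False


# The attacker forges a MAC, using the public key bytes as the HMAC secret.
forged_payload = b'{"admin":true}'
forged_sig = hmac.new(PUBLIC_KEY_BYTES, forged_payload, hashlib.sha256).digest()

naive_accepts = naive_verify("HS256", PUBLIC_KEY_BYTES, forged_payload, forged_sig)
bound_accepts = key_bound_verify("ES256", "HS256", PUBLIC_KEY_BYTES,
                                 forged_payload, forged_sig)
```

The naive verifier accepts the forgery because the header chose the algorithm; the key-bound verifier rejects it because the key record says ES256 and the header disagrees.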

Another important observation is that we cannot cure this problem by including 
the header in question in the signature. Since all these attacks are about 
manipulating the decisions leading up to the signature verification, the 
attacker either has already successfully abused them by the time the signature 
verifies, or they already failed at abusing them. Fields like the algorithm, 
which have only one possible valid value, can be checked against that one 
value whether or not they are part of the signature; indeed, that is what 
Tink's JWT library does [2]. Key IDs switching between equally 
trusted keys are implicitly verified, since a wrong value would switch to the 
wrong key, causing the signature to fail to verify.

Note that this does not extend to header fields that are used 
after verification of the signature, only to header fields used in the 
decision making involved in signature verification. From a cryptographic 
standpoint, there is no difference between header and payload fields after 
verification; any such difference is meaning ascribed to them by the 
application.

[1] 
https://bughunters.google.com/blog/6182336647790592/cryptographic-agility-and-key-rotation
[2] 
https://github.com/tink-crypto/tink-cc/blob/main/tink/jwt/internal/jwt_format.cc#L137

On Tue, Sep 17, 2024 at 9:11 AM Orie Steele 
<[email protected]<mailto:[email protected]>> wrote:
I agree with much of what you wrote.

Let's walk through an example of building an application layer protocol for HPKE 
to see where parameters show up, if we were designing from scratch and with 
20/20 hindsight.

## HPKE Crypto Layer

recipientPublicKey, recipientPrivateKey = keyGen( ciphersuite )
contentCipherText, kemCipherText = encrypt(plaintext, recipientPublicKey)
recoveredPlaintext = decrypt(contentCipherText, kemCipherText, 
recipientPrivateKey)

HPKE has been built with the benefit of learning from ECDH-ES / KDFs / PartyU / 
PartyV.

It internalizes a lot of things that we would have put in headers, previously.

However, you still need to convey contentCipherText, kemCipherText... and 
handle errors that might be produced if kemCT is tampered with: 
https://datatracker.ietf.org/doc/html/rfc9180#base-crypto

## JOSE / COSE Application Protocol Layer

At this point, you are ready to consider protocol specific context information, 
the purpose of this step is to ensure that sender and receiver agree they are 
using COSE, or JOSE... with the assumption they are already supporting HPKE.

The first step is to construct a single message that contains both 
contentCipherText, kemCipherText ... it could use base64url and "." or cbor 
major types.

After this step the information conveyed is cborEnvelope or joseEnvelope... not 
contentCipherText, kemCipherText.
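
A minimal sketch of the base64url-and-"." option mentioned above, packing the two HPKE outputs into one string and recovering them again. The two-segment layout and the function names are assumptions made for illustration; this is not the JWE Compact Serialization, which has five segments.

```python
import base64
from typing import Tuple


def b64url_encode(data: bytes) -> str:
    """base64url without padding, as JOSE uses it."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def make_envelope(kem_ct: bytes, content_ct: bytes) -> str:
    """Join the two ciphertexts into a single '.'-separated message."""
    return b64url_encode(kem_ct) + "." + b64url_encode(content_ct)


def parse_envelope(envelope: str) -> Tuple[bytes, bytes]:
    """Split the message back into (kemCipherText, contentCipherText)."""
    parts = envelope.split(".")
    if len(parts) != 2:
        raise ValueError("expected exactly two '.'-separated segments")
    return b64url_decode(parts[0]), b64url_decode(parts[1])


# Placeholder byte strings standing in for real HPKE outputs.
kem_ct = b"\x00\xffkem-ciphertext"
content_ct = b"\xfe\x01content-ciphertext!"

envelope = make_envelope(kem_ct, content_ct)
recovered = parse_envelope(envelope)
```

Because "." is not in the base64url alphabet, the separator cannot collide with the encoded segments.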

## Application Protocol Context Separation

Before encrypting or decrypting, sender and receiver need to agree to use a 
serialization and an hpke ciphersuite.

Here you can add protocol specific context separation:

- https://datatracker.ietf.org/doc/html/rfc9052#section-5.3
- https://datatracker.ietf.org/doc/html/rfc7516#section-3

JOSE and COSE go about this step differently... It's even more confusing 
because in JOSE AEADs are mandatory, whereas in COSE they are not...
The objective of this step is to commit some protocol information, into the 
encryption step... AEAD AAD is used where it can be... KDF context info can 
also be used here:

- https://datatracker.ietf.org/doc/html/rfc9053#name-context-information-structu

... in hindsight, this is a layer violation that forces both JOSE and COSE to 
maintain a separation between keys and algorithms... or if you want to think of 
it another way... it's the binding between algorithms and keys in both 
protocols.

... this is also the layer where we get "2 payloads", because in JOSE we have 
both the protected header and the payload... and you can put protocol 
parameters in either... Later this leads to JWT / CWT parameters in headers and 
payloads.

... it's inherited from ASN.1 supposedly... maintaining this design pattern is 
the "conservative approach", in that... it's doing what we have "always done".
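
The context-separation step in this section, committing protocol information into key derivation through KDF context info, can be sketched with HKDF (RFC 5869) built from the Python standard library. The context strings and the placeholder shared secret are hypothetical; they are neither the COSE context information structure from RFC 9053 nor HPKE's own `info` encoding.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF with SHA-256 (RFC 5869): extract, then expand."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long for HKDF")
    if not salt:
        salt = b"\x00" * HASH_LEN  # RFC 5869 default when no salt is given
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Placeholder for a secret produced by key establishment.
shared_secret = b"\x11" * 32

# Hypothetical context strings naming the protocol the key is meant for.
jose_key = hkdf_sha256(shared_secret, b"", b"example-jose|HPKE-demo", 32)
cose_key = hkdf_sha256(shared_secret, b"", b"example-cose|HPKE-demo", 32)
jose_key_again = hkdf_sha256(shared_secret, b"", b"example-jose|HPKE-demo", 32)
```

The same secret yields unrelated keys under different context strings, so a message produced for one protocol does not decrypt under the other.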

## Key Discovery

In the simple case that there is only 1 supported ciphersuite and each party 
only has 1 key, there is no need to communicate other information.

If there are multiple keys, the key that is being encrypted to needs to be 
identified, to avoid the recipient having to try all their keys.

At this stage we would add the key identifier as a parameter to the 
cborEnvelope or joseEnvelope.

There is never a need to convey the algorithm or ciphersuite... because they 
are always included in the key representation, so the key identifier explicitly 
communicates them.

In the pull request for ML-DSA key representations, we constructed a new key 
type for COSE and JOSE, called "algorithm key pair" :

https://github.com/cose-wg/draft-ietf-cose-dilithium/pull/5/files

The algorithm property is mandatory for this key type, and the thumbprint is 
computed over it.
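
A rough sketch of what "the thumbprint is computed over it" means, in the style of RFC 7638 JWK thumbprints. The member names (`kty`, `alg`, `pub`), the `"AKP"` value, and the choice of required members are assumptions for illustration; the linked pull request is the authority on the actual key format.

```python
import base64
import hashlib
import json

# Assumed set of required members for the hypothetical key type.
REQUIRED = ("alg", "kty", "pub")


def thumbprint(key: dict, required: tuple = REQUIRED) -> str:
    """RFC 7638-style thumbprint: SHA-256 over the required members only,
    serialized as JSON with sorted member names and no whitespace."""
    subset = {name: key[name] for name in required}
    canonical = json.dumps(subset, sort_keys=True,
                           separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Two key records with the same (placeholder) public key bytes but different algorithms.
key_a = {"kty": "AKP", "alg": "ML-DSA-44", "pub": "AAAA", "kid": "not-hashed"}
key_b = dict(key_a, alg="ML-DSA-65")
key_c = dict(key_a, kid="different-label")

tp_a = thumbprint(key_a)
tp_b = thumbprint(key_b)
tp_c = thumbprint(key_c)
```

Since `alg` is among the hashed members, an identifier derived from the thumbprint commits to the algorithm as well as to the key material; optional members such as `kid` do not affect it.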

... some other comments

The fork in the road happens in "Application Protocol Context Separation"... 
this is where we see the AEAD differences and the context info differences...
This is where we get protected header parameters... and where we first get our 
chance to put "algorithm information" in a "header parameter"...
Because of the design of JOSE and COSE, we are forced to take the same path 
through this step each time, and that is why we are always stuck handling 
algorithm identifiers and keys as separate things.
In JOSE "alg" is a mandatory header parameter... in COSE it is not... but COSE 
ends up making it mandatory in a different way, and enabling non-AEADs at the 
same time.
JOSE has alg none, which is also a problem at this layer of the design.

The counter argument to "don't put algorithms in headers" is "never use an 
algorithm which you do not trust" and "with a key it is not meant for"... in 
code this means:

- restricting keys to specific algorithms (even though the specs do not mandate 
this)
- comparing algorithms in header to algorithms in keys (even though they are 
not required to be present in either)

I think time has shown that it would have been safer / simpler to just "not put 
algorithm identifiers in headers".

There is also the issue of bulk encryption / splitting key establishment and 
content encryption up... both JOSE and COSE do this, and it leads to 
"intermediate keys" and in JOSE, multiple algorithm identifiers in headers 
("alg" and "enc").

JOSE could have shuffled things around like COSE did and avoided "enc" 
altogether... or internalized things like HPKE does... but JOSE came first.

... final thoughts

If I could wave a magic wand, I would 100% make algorithms part of keys, and 
make identifiers committing to keys, and handle the layering differently.
Regardless of the era in which these protocols were constructed, we have a 
responsibility to deprecate the parts of them that are problematic, and offer 
upgrade paths where possible.

For a recent example of this, see:

https://datatracker.ietf.org/doc/html/draft-ietf-lamps-cms-cek-hkdf-sha256-04#name-use-of-of-hkdf-with-sha-256

COSE needs a draft that conceptually accomplishes the same thing.

New COSE work needs to account for attacks that were discovered after COSE was 
constructed; it can't just say "we've always done it this way".

If you got this far, thanks for reading.

OS





On Tue, Sep 17, 2024 at 3:33 AM 
<[email protected]<mailto:[email protected]>> 
wrote:
Hi all,

When I presented an update on the COSE HPKE draft at the last IETF meeting (see 
slides-120-cose-use-of-hpke-with-cose (ietf.org)<https://www.ietf.org/>), 
Sophie made an insightful remark that got me rethinking the construction of the 
context information. She noted, "you cannot trust the information in the 
headers", in response to my presentation. This is particularly relevant because 
the current draft suggests placing all context information into the header so 
it is included in the authenticated data.

Ideally, when a recipient processes the message, the first step involves using 
the key ID to retrieve the key required to decrypt the payload (or identify the 
key used by the key exchange mechanism to derive the content encryption key). 
Best practices dictate that different keys should be used for different 
purposes, meaning there should be a one-to-one relationship between the key and 
the associated algorithm information. For instance, a key designated as a KEK 
for AES-KW should not be used directly for content encryption.

This implies that the parties involved in the communication should avoid 
including algorithm-related information in the message header. Instead, this 
information should be retrieved based on the key identifier. Thus, more than 
just the key ID and the key must be shared between the communicating parties; 
key-related metadata must also be exchanged.

If I understood Sophie correctly, the current approach of relying on 
header-based context information is not useful. We should reconsider why we are 
embedding all of this information in the header in the first place, as it may 
actually weaken security.

Ciao
Hannes

[1] Interestingly, I had already advocated for using the key ID to select all 
other parameters back in 2015. See [COSE] alg Header Parameter 
(ietf.org)<https://mailarchive.ietf.org/arch/msg/cose/Ybou-lGY5C2DwYlorI8wRwxlmN0/>

_______________________________________________
COSE mailing list -- [email protected]<mailto:[email protected]>
To unsubscribe send an email to [email protected]<mailto:[email protected]>


--

ORIE STEELE
Chief Technology Officer
www.transmute.industries<http://www.transmute.industries/>


--

Sophie Schmieg | Information Security Engineer | ISE Crypto | 
[email protected]<mailto:[email protected]>
