Depending on the protocol you are using for work, it's possible this is better solved in different ways.
We would need a concrete protocol to say more, not just a location hint.

I could imagine placing a lot of protocol-specific metadata in the header and perhaps not even using the location and content type hints... And for some use cases that might be a much better solution, especially if you wanted to index on protected protocol-specific metadata consistently.

On Fri, Mar 7, 2025, 9:07 AM Robin Bryce <[email protected]> wrote:

> And actually, replying to myself
>
> > If you start with the data you always have a size. If you start with
> > metadata, paraphrasing some larger piece of data you do not.
>
> For the use case exploration I discuss above, I can get behind a
> collective view which says "In that case, put the size in the
> metadata". Possibly that is the answer here.
>
> On Fri, 7 Mar 2025 at 15:55, Robin Bryce <[email protected]> wrote:
> >
> > > I'm trying to understand the real problem you are trying to solve, and
> > > I'm rejecting the blind assertion that size is needed.
> >
> > Oh I do appreciate that!
> >
> > > I shared a concrete use case for it where location is not present, show
> > > me one where size and location are required.
> >
> > First, I have to acknowledge that the uses I have in mind can't be
> > satisfied by this id in isolation, and perhaps that makes my
> > interjection "not helpful".
> >
> > At the company I work for, pretty much every significant use case we
> > have encountered involves making statements about data.
> > We actively don't want to see the data - some customers could not use
> > us if we required that - we only care about the metadata.
> > The metadata may be summarizing, commenting on, flagging status for,
> > some action that has been taken in consideration of the "data".
> >
> > Consumers of the metadata are then typically starting from the
> > metadata and they are often not going to be the original recipient of
> > that data.
> > In many situations, the original data need not be fetched by consumers
> > at all, in order for automated processes to act responsibly. "Has the
> > right party attested to the metadata" is sufficient.
> >
> > But when things "go wrong" homework needs to be checked and checking
> > that the metadata is indeed supported by the data is required. And the
> > checker may only have the metadata to go on.
> >
> > I do appreciate the id we are discussing has no avenue for attaching
> > metadata and I'm not suggesting it should.
> > But as a primitive for systems that are "metadata" oriented, location
> > is the *reason* this id is interesting.
> >
> > If you start with the data you always have a size. If you start with
> > metadata, paraphrasing some larger piece of data you do not.
> >
> > I totally see you can make the connection, open the file, whatever,
> > and read bytes until it is exhausted, relying on the protocol to tell
> > you when to stop adding bytes to the hash.
> > As an implementer, I see that as an invitation to DoS my service at
> > worst, and at best to introduce unnecessarily unbounded operations.
> >
> > You asked about use cases, in addition to LLM models, and model cards,
> > two off the cuff examples would be VCons (was assent granted in the
> > call) and medically significant decisions based on patient data.
> >
> > Granted this id is *not* about metadata, but we would certainly see it
> > as a useful way to deal with the data that is the subject of metadata
> >
> > > How come x5u doesn't have a length?
> >
> > No idea :-)
> >
> > > Can you point me to some existing RFCs to read that include the length
> > > and that function the way you imagine?
> >
> > I can't but I'm going to spend some time looking on the expectation
> > that I will learn something!
> >
> > > How come every URL in a web page doesn't include the length of the
> > > file that will be dereferenced?
> >
> > I would guess that in the context of an interactive application there
> > are sensible timeouts and the user can hit cancel at any time. For
> > lights out operations this just isn't reasonable (imo).
> >
> > > Also note that git lfs does not include the location, or content type.
> >
> > Granted, that is interesting.
> >
> > > Please also consider my comment about version, should version be added
> > > to the spec?
> >
> > I'm not sure I have a useful opinion on this. I think that the id
> > would be useful without it and that introducing it, along with
> > location (what does it mean when the version associated with the same
> > location changes?) would make things quite a bit harder to reason
> > about.
> >
> > > Please show me evidence that size is needed.
> >
> > I don't know that I can meet that bar, but
> >
> > > I'm trying to understand the real problem you are trying to solve
> >
> > I've done my best for this, at least from the PoV that makes this ID
> > interesting to me.
> >
> > I apologize for the company specific link, but it does speak
> > specifically to the use case question and the problems we are trying
> > to solve:
> > https://docs.datatrails.ai/platform/overview/advanced-concepts/#attachments--content-integrity-protection
> >
> > Cheers,
> >
> > Robin
> >
> > On Fri, 7 Mar 2025 at 14:44, Orie Steele <[email protected]> wrote:
> > >
> > > Hi Robin,
> > >
> > > Let's get grounded in real use cases.
> > >
> > > On Fri, Mar 7, 2025, 6:03 AM Robin Bryce <[email protected]> wrote:
> > >>
> > >> Hi Orie,
> > >>
> > >> > The client could retry, the client might not even need to resolve
> > >> > the resource because of caching or having previously dereferenced the
> > >> > resource.
> > >> > The resource could be compressed, have different transfer encoding,
> > >> > etc..
> > >>
> > >> In the case where location is supplied the *reason* for that is (in
> > >> the uses I'd anticipate) that the verifier is expected to fetch and
> > >> hash the content from that location as part of verification.
> > >
> > >
> > > So you start with an envelope that has no payload, download a large
> > > file, hash the file in chunks, place the hash in the payload, and verify
> > > the signature.
> > >
> > > And the use cases for this are what exactly?
> > >
> > > Without specifics I'm left imagining you are installing a large signed
> > > binary from a small signature, is that the use case?
> > >
> > > Is this for signing LLM models distributed through BitTorrent, or
> > > some protocol that doesn't work when you don't know the size?
> > >
> > > Which protocols require size to start the download?
> > >
> > >> Especially because of the factors you mention impacting
> > >> Content-Length, I don't see how that verifier could know how many
> > >> bytes to fetch? What am I missing here?
> > >
> > >
> > > The transfer protocol.
> > > Last I checked, HTTP doesn't ask you how many bytes to download before
> > > giving up, though I will admit I'm not an expert on every protocol that
> > > might be used to resolve a file... Which is part of why I don't want to
> > > comment on this in the draft.
> > >
> > > Do all popular file transfer protocols require a separate byte size
> > > parameter to use?
> > >
> > >>
> > >> > Given that location and content type are already optional, how can
> > >> > you argue that lack of size makes the draft incomplete?
> > >>
> > >> Ok, the location being optional, and the size being mandatory does not
> > >> make sense to me either. size being mandatory when location is
> > >> supplied would make sense to me
> > >
> > >
> > > How come x5u doesn't have a length?
> > >
> > > Can you point me to some existing RFCs to read that include the length
> > > and that function the way you imagine?
> > >
> > > How come every URL in a web page doesn't include the length of the
> > > file that will be dereferenced?
> > >
> > > Which cases outside of git lfs use the size and the hash as a pointer
> > > for the content?
> > >
> > > Also note that git lfs does not include the location, or content type.
> > >
> > > Please also consider my comment about version, should version be added
> > > to the spec?
> > >
> > > I'm not trying to be difficult.
> > >
> > > I'm trying to understand the real problem you are trying to solve, and
> > > I'm rejecting the blind assertion that size is needed.
> > >
> > > Please show me evidence that size is needed.
> > >
> > > I shared a concrete use case for it where location is not present,
> > > show me one where size and location are required.
> > >
> > >>
> > >>
> > >> On Fri, 7 Mar 2025 at 04:09, Orie Steele <[email protected]> wrote:
> > >> >
> > >> > Hi Steve,
> > >> >
> > >> > So just to be clear, the denial of service (DOS) attack is caused
> > >> > by attempting to resolve a file or byte stream of arbitrary length?
> > >> >
> > >> > And the solution is to believe the issuer has accurately reflected
> > >> > the length of the pre-image and the location and content type hints?
> > >> >
> > >> > Does this consideration apply to any other URIs present in the
> > >> > header? Such as x5u?
> > >> >
> > >> > Depending on the URI or string and resolver software, the transfer
> > >> > could start, and then hang... Forever.
> > >> >
> > >> > The client could retry, the client might not even need to resolve
> > >> > the resource because of caching or having previously dereferenced the
> > >> > resource.
> > >> >
> > >> > The resource could be compressed, have different transfer encoding,
> > >> > etc...
> > >> >
> > >> > There are a lot of reasons that the original file might not be
> > >> > resolvable to the signed hash, and a lot of ways to handle resolution which
> > >> > could result in an unreasonable amount of work for a client or a server.
> > >> >
> > >> > I don't see how any of that is this draft's business.
> > >> >
> > >> > I can see how certain application use cases might rely on
> > >> > additional metadata in the protected header or unprotected header (such as
> > >> > a counter signature, or time stamp token), before attempting to use the
> > >> > verified payload and header parameters.
> > >> >
> > >> > You assert that a complete scenario requires downloading the
> > >> > original content.
> > >> >
> > >> > I believe that's where we disagree.
> > >> >
> > >> > I could send you a hash envelope, you could already have the
> > >> > content, you could compute the hash of it, and attach it as the payload,
> > >> > and verify the signature.
> > >> >
> > >> > Every use of a hash envelope with a detached payload starts with
> > >> > you computing the hash, and then verifying the signature.
> > >> >
> > >> > This is also how you would use hash envelope to verify a signature
> > >> > for some expected hardware/firmware measurement.
> > >> >
> > >> > You might have JWK/COSE Key... The hash is the COSE Key
> > >> > thumbprint... The signature proves it's still trusted.
> > >> >
> > >> > When I share with you a signature for a package URL, I'm not
> > >> > requiring you to download it.
> > >> >
> > >> > Package URLs don't include a size parameter.
> > >> >
> > >> > When I share with you a signed container layer, I'm not requiring
> > >> > you to download it, or even to necessarily rebuild and confirm the hash
> > >> > matches.
> > >> >
> > >> > If I'm hashing a large file or local configuration or tripwire
> > >> > system, and uploading hash envelopes, the server won't naturally have
> > >> > access to the original files, which could be sensitive, or unsuitable for
> > >> > transport over the network.
> > >> >
> > >> > The server doesn't care what the size of the file is, it won't ever
> > >> > see the original content.
> > >> >
> > >> > This is the SCITT use case for creating transparency around local
> > >> > LLM models, or other content that can't be transferred to the transparency
> > >> > service.
> > >> >
> > >> > Perhaps your intended use case is similar to git lfs?
> > >> >
> > >> > ```
> > >> > $ git lfs pointer --file=path/to/file
> > >> > Git LFS pointer for path/to/file:
> > >> >
> > >> > version https://git-lfs.github.com/spec/v1
> > >> > oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
> > >> > size 12345
> > >> > ```
> > >> >
> > >> > If you know that the size will always be present, you can sum over
> > >> > every pointer to compute the total size of stored objects.
> > >> >
> > >> > Perhaps we should consider a version parameter as well?
> > >> >
> > >> > Where is the download use case you are describing actually coming
> > >> > from?
> > >> >
> > >> > Given that location and content type are already optional, how can
> > >> > you argue that lack of size makes the draft incomplete?
> > >> >
> > >> > I'm not sure if I'll be available to attend the session at IETF
> > >> > 122, I'm hoping we can discuss the essential remaining blockers for this
> > >> > draft on list.
> > >> >
> > >> > Regards,
> > >> >
> > >> > OS
> > >> >
> > >> >
> > >> > On Thu, Mar 6, 2025, 1:40 PM Steve Lasker <[email protected]> wrote:
> > >> >>
> > >> >> > What happens if the resolved file has the correct hash, but
> > >> >> > incorrect file size?
> > >> >> > Fail transfer as soon as incorrect size is received (as soon
> > >> >> > as HTTP Content-Length is received, when file cuts off prematurely, or file
> > >> >> > continues past where it should end).
> > >> >> I'd agree fail is the expected outcome, if the actual length is
> > >> >> either too small or too big.
> > >> >>
> > >> >> > Is the file name important?
> > >> >> I’d suggest it's not "important", and if it was, be incorporated
> > >> >> into the location value. Beyond location, I’d suggest it’s beyond the scope
> > >> >> of this draft.
> > >> >>
> > >> >> Scope
> > >> >> I believe we’re all focused on keeping the scope minimal, engaging
> > >> >> other specs where needed, which can add other header values to an envelope.
> > >> >> We’re not trying to make a package manager, rather provide the
> > >> >> minimal required properties to make hash-envelope useful.
> > >> >> Since a complete scenario would involve downloading the content,
> > >> >> confirming the hash matches, and knowing the size/length would mitigate DOS
> > >> >> attacks, I do believe it meets the bar for scope, or the draft could be
> > >> >> argued as incomplete.
> > >> >>
> > >> >> 122 Meeting
> > >> >> I requested time from the chairs to discuss in the COSE meeting on
> > >> >> Wednesday
> > >> >>
> > >> >> Steve
> > >> >>
> > >> >>
> > >> >> -----Original Message-----
> > >> >> From: Henk Birkholz <[email protected]>
> > >> >> Sent: Wednesday, March 5, 2025 2:28 AM
> > >> >> To: Orie Steele <[email protected]>; Carsten Bormann <[email protected]>
> > >> >> Cc: Steve Lasker <[email protected]>; Ilari Liusvaara <[email protected]>; cose <[email protected]>
> > >> >> Subject: Re: [COSE] Re: I-D Action: draft-ietf-cose-hash-envelope-02.txt
> > >> >>
> > >> >> Hi Orie,
> > >> >>
> > >> >> in summary what I read is:
> > >> >>
> > >> >> * this is exciting but does not belong here
> > >> >> * adding hash-env-sigs to existing systems
> > >> >>
> > >> >> The question is, how much "convenience information" belongs in an
> > >> >> "un-profiled" cose hash envelope for existing systems, right?
> > >> >>
> > >> >> Pointing to an instance of the pre-image is already an option.
> > >> >> Indicating the intended size of the pre-image seems to be very
> > >> >> close, semantically.
> > >> >>
> > >> >> So the question is about scope creep vs. simplicity of resulting
> > >> >> RFC. Yes?
> > >> >>
> > >> >> We added the "length" (are we settled on the name? length? ...)
> > >> >> due to Ilari's feedback. If I want to convey metadata about a pre-image,
> > >> >> there is already RFC9393. In consequence, I am leaning ever so slightly to
> > >> >> Orie's point and "keeping it simple". I acknowledge the intended use for
> > >> >> that size... sorry length value, but I am not sure that this is the right
> > >> >> place (aka the right I-D) to address it.
> > >> >>
> > >> >> Are there any other strong proponents for an optional pre-image
> > >> >> "length" header parameter? If not, maybe we can come to an in-room decision
> > >> >> at IETF 122 meeting and not include it.
> > >> >>
> > >> >>
> > >> >> Viele Grüße,
> > >> >>
> > >> >> Henk
> > >> >>
> > >> >> On 05.03.25 03:49, Orie Steele wrote:
> > >> >> > Hi,
> > >> >> >
> > >> >> > I'm hesitant to start considering file transfer in scope for
> > >> >> > this draft.
> > >> >> >
> > >> >> > The original motivation was to create a simple standard syntax for
> > >> >> > signing hashes that are already used as identifiers, such as sha256 of
> > >> >> > SPDX SBOM, or container hashes...
> > >> >> > Delivery and integration for these is already a solved problem.
> > >> >> >
> > >> >> > We now seem to be imagining using hash envelope as part of some
> > >> >> > verifiable build system, that uses the optional location, content type,
> > >> >> > and a new file size parameter, to resolve large binaries from small
> > >> >> > signatures and verifiable metadata.
> > >> >> >
> > >> >> > That's exciting.
> > >> >> >
> > >> >> > I'd been imagining adding hash envelope signatures to existing systems,
> > >> >> > not using it to build new artifact repositories or package management
> > >> >> > systems.
> > >> >> >
> > >> >> > At a certain point, it's probably better to sign a CoRIM manifest (which
> > >> >> > as you can see also includes hashes)... And let the manifest carry the
> > >> >> > information necessary to download data.
> > >> >> >
> > >> >> > That's all exciting stuff, but I prefer to not include it in
> > >> >> > this draft.
> > >> >> >
> > >> >> > Simplicity is what makes successful standards.
> > >> >> >
> > >> >> > I'm not opposed to profiling hash envelope to build a package manager,
> > >> >> > especially one that works well in constrained environments, I would just
> > >> >> > prefer to address those requirements in a dedicated document.
> > >> >> >
> > >> >> > Regards,
> > >> >> >
> > >> >> > OS
> > >> >> >
> > >> >> > On Tue, Mar 4, 2025, 10:55 AM Carsten Bormann <[email protected]> wrote:
> > >> >> >
> > >> >> > Hi Orie,
> > >> >> >
> > >> >> > > What happens if the resolved file has the correct hash, but
> > >> >> > > incorrect file size?
> > >> >> >
> > >> >> > You invoke crypto agility and choose a better hash function :-)
> > >> >> > (I understand Ilari’s argument that being able to limit the file
> > >> >> > size before computing the hash can help mitigate DoS.)
> > >> >> >
> > >> >> > > I wonder if there is some CBOR related filesystem RFC that could
> > >> >> > > provide the file size and other relevant metadata.
> > >> >> >
> > >> >> >     file-entry = {
> > >> >> >       filesystem-item,
> > >> >> >       ? size => uint,
> > >> >> >       ? file-version => text,
> > >> >> >       ? hash => hash-entry,
> > >> >> >       * $$file-extension,
> > >> >> >       global-attributes,
> > >> >> >     }
> > >> >> >
> > >> >> > Not an RFC yet, but pretty advanced already:
> > >> >> > https://www.ietf.org/archive/id/draft-ietf-rats-corim-07.html#appendix-A-1
> > >> >> >
> > >> >> > Grüße, Carsten
> > >> >> >
> > >> >> >
> > >> >> > _______________________________________________
> > >> >> > COSE mailing list -- [email protected]
> > >> >> > To unsubscribe send an email to [email protected]
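For readers following the size discussion in this thread: the concern about reading bytes until the stream is exhausted, and the suggestion to fail as soon as the size turns out to be wrong, could be sketched roughly as below. This is only an illustration in Python, not something the draft defines. The names `bounded_sha256`, `SizeMismatch`, and `expected_size` are hypothetical, and the sketch assumes the expected size arrives in some trusted metadata alongside the hash.

```python
import hashlib
import io


class SizeMismatch(Exception):
    """Raised when the stream is shorter or longer than the declared size."""


def bounded_sha256(stream, expected_size, chunk_size=64 * 1024):
    """Hash exactly expected_size bytes from a binary stream, or fail.

    expected_size is a hypothetical value taken from trusted metadata;
    it is not a header parameter defined by the draft.
    """
    digest = hashlib.sha256()
    remaining = expected_size
    while remaining > 0:
        # Never ask for more than the bytes still owed.
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise SizeMismatch("stream ended before the declared size")
        digest.update(chunk)
        remaining -= len(chunk)
    # Probe a single extra byte so an over-long stream fails
    # without being read in full.
    if stream.read(1):
        raise SizeMismatch("stream continues past the declared size")
    return digest.hexdigest()


if __name__ == "__main__":
    data = b"example pre-image"
    print(bounded_sha256(io.BytesIO(data), len(data)))
```

With this shape the verifier reads at most `expected_size + 1` bytes regardless of what the transfer protocol reports, which is the bounded behavior being asked for; whether the size belongs in the envelope header or in protocol-specific metadata is the open question in the thread.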
