And actually, replying to myself

> If you start with the data you always have a size. If you start with
> metadata, paraphrasing some larger piece of data you do not.

For the use case exploration I discuss above, I can get behind a
collective view which says "In that case, put the size in the
metadata". Possibly that is the answer here.

On Fri, 7 Mar 2025 at 15:55, Robin Bryce <[email protected]> wrote:
>
> > I'm trying to understand the real problem you are trying to solve, and I'm
> > rejecting the blind assertion that size is needed.
>
> Oh I do appreciate that!
>
> >I shared a concrete use case for it where location is not present, show me
> >one where size and location are required.
>
> First, I have to acknowledge that the uses I have in mind can't be
> satisfied by this id in isolation, and perhaps that makes my
> interjection "not helpful".
>
> At the company I work for, pretty much every significant use case we
> have encountered involves making statements about data.
> We actively don't want to see the data - some customers could not use
> us if we required that - we only care about the metadata.
> The metadata may be summarizing, commenting on, flagging status for,
> some action that has been taken in consideration of the "data".
>
> Consumers of the metadata are then typically starting from the
> metadata and they are often not going to be the original recipient of
> that data.
> In many situations, the original data need not be fetched by consumers
> at all, in order for automated process to act responsibly. "Has the
> right party attested to the metadata" is sufficient.
>
> But when things "go wrong" homework needs to be checked and checking
> that the metadata is indeed supported by the data is required. And the
> checker may only have the metadata to go on.
>
> I do appreciate the id we are discussing has no avenue for attaching
> metadata and I'm not suggesting it should.
> But as a primitive for systems that are "metadata" oriented, location
> is the *reason* this id is interesting.
>
> If you start with the data you always have a size.
> If you start with metadata, paraphrasing some larger piece of data you do not.
>
> I totally see you can make the connection, open the file, whatever,
> and read bytes until it is exhausted, relying on the protocol to tell
> you when to stop adding bytes to the hash.
> As an implementer, I see that as an invitation to DoS my service at
> worst, and at best to introduce unnecessarily unbounded operations.
>
> You asked about use cases, in addition to LLM models, and model cards,
> two off the cuff examples would be VCons (was assent granted in the
> call) and medically significant decisions based on patient data.
>
> Granted this id is *not* about metadata, but we would certainly see it
> as a useful way to deal with the data that is the subject of metadata
>
> > How come x5u doesn't have a length?
>
> No idea :-)
>
> > Can you point me to some existing RFCs to read that include the length and
> > that function the way you imagine?
>
> I can't but I'm going to spend some time looking on the expectation
> that I will learn something!
>
> > How come every URL in a web page doesn't include the length of the file
> > that will be dereferenced?
>
> I would guess that in the context of an interactive application there
> are sensible timeouts and the user can hit cancel at any time. For
> lights out operations this just isn't reasonable (imo).
>
> > Also note that git lfs does not include the location, or content type.
>
> Granted, that is interesting.
>
> > Please also consider my comment about version, should version be added to
> > the spec?
>
> I'm not sure I have a useful opinion on this. I think that the id
> would be useful without it and that introducing it, along with
> location (what does it mean when the version associated with the same
> location changes ?) would make things quite a bit harder to reason
> about.
>
> > Please show me evidence that size is needed.
>
> I don't know that I can meet that bar, but
>
> > I'm trying to understand the real problem you are trying to solve
>
> I've done my best for this, at least from the PoV that makes this ID
> interesting to me.
>
> I apologize for the company specific link, but it does speak
> specifically to the use case question and the problems we are trying
> to solve:
> https://docs.datatrails.ai/platform/overview/advanced-concepts/#attachments--content-integrity-protection
>
> Cheers,
>
> Robin
>
> On Fri, 7 Mar 2025 at 14:44, Orie Steele <[email protected]> wrote:
> >
> > Hi Robin,
> >
> > Let's get grounded in real use cases.
> >
> > On Fri, Mar 7, 2025, 6:03 AM Robin Bryce <[email protected]> wrote:
> >>
> >> Hi Orie,
> >>
> >> > The client could retry, the client might not even need to resolve the
> >> > resource because of caching or having previously dereferenced the
> >> > resource.
> >> > The resource could be compressed, have different transfer encoding, etc..
> >>
> >> In the case where location is supplied the *reason* for that is (in
> >> the uses I'd anticipate) that the verifier is expected to fetch and
> >> hash the content from that location as part of verification.
> >
> >
> > So you start with an envelope that has no payload, download a large file,
> > hash the file in chunks, place the hash in the payload, and verify the
> > signature.
> >
> > And the use cases for this are what exactly?
> >
> > Without specifics I'm left imagining you are installing a large signed
> > binary from a small signature, is that the use case?
> >
> > Is this for signing LLM models distributed through bit torrent, or some
> > protocol that doesn't work when you don't know the size?
> >
> > Which protocols require size to start the download?
> >
> >> Especially because of the factors you mention impacting
> >> Content-Length, I don't see how that verifier could know how many
> >> bytes to fetch ? What am I missing here ?
> >
> >
> > The transfer protocol.
> > Last I checked, HTTP doesn't ask you how many bytes to download before
> > giving up, though I will admit I'm not an expert on every protocol that
> > might be used to resolve a file... Which is part of why I don't want to
> > comment on this in the draft.
> >
> > Do all popular file transfer protocols require a separate byte size
> > parameter to use?
> >
> >>
> >> > Given that location and content type are already optional, how can you
> >> > argue that lack of size makes the draft incomplete?
> >>
> >> Ok, the location being optional, and the size being mandatory does not
> >> make sense to me either. size being mandatory when location is
> >> supplied would make sense to me
> >
> >
> > How come x5u doesn't have a length?
> >
> > Can you point me to some existing RFCs to read that include the length and
> > that function the way you imagine?
> >
> > How come every URL in a web page doesn't include the length of the file
> > that will be dereferenced?
> >
> > Which cases outside of git lfs use the size and the hash as a point for the
> > content?
> >
> > Also note that git lfs does not include the location, or content type.
> >
> > Please also consider my comment about version, should version be added to
> > the spec?
> >
> > I'm not trying to be difficult.
> >
> > I'm trying to understand the real problem you are trying to solve, and I'm
> > rejecting the blind assertion that size is needed.
> >
> > Please show me evidence that size is needed.
> >
> > I shared a concrete use case for it where location is not present, show me
> > one where size and location are required.
> >
> >>
> >>
> >> On Fri, 7 Mar 2025 at 04:09, Orie Steele <[email protected]> wrote:
> >> >
> >> > Hi Steve,
> >> >
> >> > So just to be clear, the denial of service (DOS) attack is caused by
> >> > attempting to resolve a file or byte stream of arbitrary length?
> >> >
> >> > And the solution is to believe the issuer has accurately reflected the
> >> > length of the pre-image and the location and content type hints?
> >> >
> >> > Does this consideration apply to any other URIs present in the header?
> >> > Such as x5u?
> >> >
> >> > Depending on the URI or string and resolver software, the transfer could
> >> > start, and then hang... Forever.
> >> >
> >> > The client could retry, the client might not even need to resolve the
> >> > resource because of caching or having previously dereferenced the
> >> > resource.
> >> >
> >> > The resource could be compressed, have different transfer encoding,
> >> > etc...
> >> >
> >> > There are a lot of reasons that the original file might not be
> >> > resolvable to the signed hash, and a lot of ways to handle resolution
> >> > which could result in an unreasonable amount of work for a client or a
> >> > server.
> >> >
> >> > I don't see how any of that is this draft's business.
> >> >
> >> > I can see how certain application use cases might rely on additional
> >> > metadata in the protected header or unprotected header (such as a
> >> > counter signature, or time stamp token), before attempting to use the
> >> > verified payload and header parameters.
> >> >
> >> > You assert that a complete scenario requires downloading the original
> >> > content.
> >> >
> >> > I believe that's where we disagree.
> >> >
> >> > I could send you a hash envelope, you could already have the content,
> >> > you could compute the hash of it, and attach it as the payload, and
> >> > verify the signature.
> >> >
> >> > Every use of a hash envelope with a detached payload, starts with you
> >> > computing the hash, and then verifying the signature.
> >> >
> >> > This is also how you would use hash envelope to verify a signature for
> >> > some expected hardware/firmware measurement.
> >> >
> >> > You might have JWK/COSE Key... The hash is the cose key thumbprint...
> >> > The signature proves it's still trusted.
> >> >
> >> > When I share with you a signature for a package URL, I'm not requiring
> >> > you to download it.
> >> >
> >> > Package URLs don't include a size parameter.
> >> >
> >> > When I share with you a signed container layer, I'm not requiring you to
> >> > download it, or even to necessarily rebuild and confirm the hash matches.
> >> >
> >> > If I'm hashing a large file or local configuration or tripwire system,
> >> > and uploading hash envelopes, the server won't naturally have access to
> >> > the original files, which could be sensitive, or unsuitable for
> >> > transport over the network.
> >> >
> >> > The server doesn't care what the size of the file is, it won't ever see
> >> > the original content.
> >> >
> >> > This is the SCITT use case for creating transparency around local LLM
> >> > models, or other content that can't be transferred to the transparency
> >> > service.
> >> >
> >> > Perhaps your intended use case is similar to git lfs?
> >> >
> >> > ```
> >> > $ git lfs pointer --file=path/to/file
> >> > Git LFS pointer for path/to/file:
> >> >
> >> > version https://git-lfs.github.com/spec/v1
> >> > oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
> >> > size 12345
> >> > ```
> >> >
> >> > If you know that the size will always be present, you can sum over
> >> > every pointer to compute the total size of stored objects.
> >> >
> >> > Perhaps we should consider a version parameter as well?
> >> >
> >> > Where is the download use case you are describing actually coming from?
> >> >
> >> > Given that location and content type are already optional, how can you
> >> > argue that lack of size makes the draft incomplete?
> >> >
> >> > I'm not sure if I'll be available to attend the session at IETF 122, I'm
> >> > hoping we can discuss the essential remaining blockers for this draft on
> >> > list.
> >> >
> >> > Regards,
> >> >
> >> > OS
> >> >
> >> >
> >> > On Thu, Mar 6, 2025, 1:40 PM Steve Lasker <[email protected]>
> >> > wrote:
> >> >>
> >> >> > What happens if the resolved file has the correct hash, but incorrect
> >> >> > file size?
> >> >> > Fail transfer as soon as incorrect size is received (as soon as
> >> >> > HTTP Content-Length is received, when file cuts off prematurely, or
> >> >> > file continues past where it should end).
> >> >> I'd agree fail is the expected outcome, if the actual length is either
> >> >> too small or too big.
> >> >>
> >> >> > Is the file name important?
> >> >> I’d suggest it's not "important", and if it was, be incorporated into
> >> >> the location value. Beyond location, I’d suggest it’s beyond the scope
> >> >> of this draft.
> >> >>
> >> >> Scope
> >> >> I believe we’re all focused on keeping the scope minimal, engaging
> >> >> other specs where needed, which can add other header values to an
> >> >> envelope.
> >> >> We’re not trying to make a package manager, rather provide the minimal
> >> >> required properties to make hash-envelope useful.
> >> >> Since a complete scenario would involve downloading the content,
> >> >> confirming the hash matches, and knowing the size/length would mitigate
> >> >> DOS attacks, I do believe it meets the bar for scope, or the draft
> >> >> could be argued as incomplete.
> >> >>
> >> >> 122 Meeting
> >> >> I requested time from the chairs to discuss in the COSE meeting on
> >> >> Wednesday
> >> >>
> >> >> Steve
> >> >>
> >> >>
> >> >> -----Original Message-----
> >> >> From: Henk Birkholz <[email protected]>
> >> >> Sent: Wednesday, March 5, 2025 2:28 AM
> >> >> To: Orie Steele <[email protected]>; Carsten Bormann
> >> >> <[email protected]>
> >> >> Cc: Steve Lasker <[email protected]>; Ilari Liusvaara
> >> >> <[email protected]>; cose <[email protected]>
> >> >> Subject: Re: [COSE] Re: I-D Action: draft-ietf-cose-hash-envelope-02.txt
> >> >>
> >> >> Hi Orie,
> >> >>
> >> >> in summary what I read is:
> >> >>
> >> >> * this is exciting but does not belong here
> >> >> * adding hash-env-sigs to existing systems
> >> >>
> >> >> The question is, how much "convenience information" belongs in an
> >> >> "un-profiled" cose hash envelope for existing systems, right?
> >> >>
> >> >> Pointing to an instance of the pre-image is already an option.
> >> >> Indicating the intended size of the pre-image seems to be very close,
> >> >> semantically.
> >> >>
> >> >> So the question is about scope creep vs. simplicity of resulting RFC.
> >> >> Yes?
> >> >>
> >> >> We added the "length" (are we settled on the name? length? ...) due to
> >> >> Ilari's feedback. If I want to convey metadata about a pre-image, there
> >> >> is already RFC9393. In consequence, I am leaning ever so slightly to
> >> >> Orie's point and "keeping it simple". I acknowledge the intended use
> >> >> for that size... sorry length value, but I am not sure that this is the
> >> >> right place (aka the right I-D) to address it.
> >> >>
> >> >> Are there any other strong proponents for an optional pre-image "length"
> >> >> header parameter? If not, maybe we can come to an in-room decision at
> >> >> IETF 122 meeting and not include it.
> >> >>
> >> >>
> >> >> Viele Grüße,
> >> >>
> >> >> Henk
> >> >>
> >> >> On 05.03.25 03:49, Orie Steele wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I'm hesitant to start considering file transfer in scope for this
> >> >> > draft.
> >> >> >
> >> >> > The original motivation was to create a simple standard syntax for
> >> >> > signing hashes that are already used as identifiers, such as sha256 of
> >> >> > spdx sbom, or container hashes...
> >> >> > Delivery and integration for these is already a solved problem.
> >> >> >
> >> >> > We now seem to be imagining using hash envelope as part of some
> >> >> > verifiable build system, that uses the optional location, content type,
> >> >> > and a new file size parameter, to resolve large binaries from small
> >> >> > signatures and verifiable metadata.
> >> >> >
> >> >> > That's exciting.
> >> >> >
> >> >> > I'd been imagining adding hash envelope signatures to existing systems,
> >> >> > not using it to build new artifact repositories or package management
> >> >> > systems.
> >> >> >
> >> >> > At a certain point, it's probably better to sign a corim manifest (which
> >> >> > as you can see also includes hashes)... And let the manifest carry the
> >> >> > information necessary to download data.
> >> >> >
> >> >> > That's all exciting stuff, but I prefer to not include it in this
> >> >> > draft.
> >> >> >
> >> >> > Simplicity is what makes successful standards.
> >> >> >
> >> >> > I'm not opposed to profiling hash envelope to build a package manager,
> >> >> > especially one that works well in constrained environments, I would just
> >> >> > prefer to address those requirements in a dedicated document.
> >> >> >
> >> >> > Regards,
> >> >> >
> >> >> > OS
> >> >> >
> >> >> > On Tue, Mar 4, 2025, 10:55 AM Carsten Bormann <[email protected]> wrote:
> >> >> >
> >> >> > Hi Orie,
> >> >> >
> >> >> > > What happens if the resolved file has the correct hash, but
> >> >> > > incorrect file size?
> >> >> >
> >> >> > You invoke crypto agility and choose a better hash function :-)
> >> >> > (I understand Ilari’s argument that being able to limit the file
> >> >> > size before computing the hash can help mitigate DoS.)
> >> >> >
> >> >> > > I wonder if there is some CBOR related filesystem RFC that could
> >> >> > > provide the file size and other relevant metadata.
> >> >> >
> >> >> > file-entry = {
> >> >> >   filesystem-item,
> >> >> >   ? size => uint,
> >> >> >   ? file-version => text,
> >> >> >   ? hash => hash-entry,
> >> >> >   * $$file-extension,
> >> >> >   global-attributes,
> >> >> > }
> >> >> >
> >> >> > Not an RFC yet, but pretty advanced already:
> >> >> >
> >> >> > https://www.ietf.org/archive/id/draft-ietf-rats-corim-07.html#appendix-A-1
> >> >> >
> >> >> > Grüße, Carsten
> >> >> >
> >> >> >
> >> >> > _______________________________________________
> >> >> > COSE mailing list -- [email protected]
> >> >> > To unsubscribe send an email to [email protected]

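For reference, the bounded fetch-and-hash check debated in this thread (a verifier that knows the expected size up front stops reading as soon as the stream runs long, and fails if it ends short) could look roughly like the sketch below. It is only an illustration under stated assumptions: `verify_bounded`, its parameters, and the chunk size are hypothetical names, not anything defined by the draft, and the stream is any file-like object rather than a specific transfer protocol.

```python
import hashlib
import io


def verify_bounded(stream, expected_sha256_hex, expected_size, chunk_size=64 * 1024):
    """Hash a stream, refusing to read more than expected_size bytes.

    Hypothetical helper: returns True only if the stream holds exactly
    expected_size bytes and their SHA-256 digest matches.
    """
    digest = hashlib.sha256()
    remaining = expected_size
    while True:
        # Ask for at most one byte beyond what is still expected, so an
        # over-long stream is detected without draining it.
        chunk = stream.read(min(chunk_size, remaining + 1))
        if not chunk:
            break
        if len(chunk) > remaining:
            return False  # stream continues past the declared size
        digest.update(chunk)
        remaining -= len(chunk)
    if remaining != 0:
        return False  # stream ended before the declared size
    return digest.hexdigest() == expected_sha256_hex


# Usage with in-memory streams standing in for a fetched resource.
data = b"example pre-image"
good_hash = hashlib.sha256(data).hexdigest()
ok = verify_bounded(io.BytesIO(data), good_hash, len(data))
too_long = verify_bounded(io.BytesIO(data + b"x"), good_hash, len(data))
too_short = verify_bounded(io.BytesIO(data[:-1]), good_hash, len(data))
```

Without an expected size, the same loop has no stopping point other than the transport ending the stream, which is the unbounded case raised above.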