And actually, replying to myself

> If you start with the data you always have a size. If you start with
> metadata, paraphrasing some larger piece of data you do not.

For the use case exploration I discuss above, I can get behind a
collective view which says "In that case, put the size in the
metadata". Possibly that is the answer here.
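
A minimal sketch, assuming the size travels with the metadata, of how a
verifier might bound the fetch-and-hash step. The names `bounded_sha256`,
`SizeMismatch` and `expected_size` are hypothetical, chosen for illustration,
and are not taken from the draft:

```python
import hashlib
import io


class SizeMismatch(Exception):
    """Raised when the stream is shorter or longer than the declared size."""


def bounded_sha256(stream, expected_size, chunk_size=64 * 1024):
    """Hash exactly expected_size bytes from stream, never reading far past it.

    expected_size is assumed to come from trusted (signed) metadata.
    """
    digest = hashlib.sha256()
    remaining = expected_size
    while remaining > 0:
        # Never ask for more than the declared size allows.
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise SizeMismatch("stream ended before the declared size")
        digest.update(chunk)
        remaining -= len(chunk)
    # One extra byte is enough to detect an over-long stream.
    if stream.read(1):
        raise SizeMismatch("stream continues past the declared size")
    return digest.hexdigest()


# Usage: an in-memory stream stands in for a network response body.
data = b"example payload"
print(bounded_sha256(io.BytesIO(data), len(data)))
```

The work done is capped by the declared size, so a verifier that starts from
metadata alone does not depend on the transfer protocol to tell it when to stop.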

On Fri, 7 Mar 2025 at 15:55, Robin Bryce <[email protected]> wrote:
>
> > I'm trying to understand the real problem you are trying to solve, and I'm 
> > rejecting the blind assertion that size is needed.
>
> Oh I do appreciate that!
>
> > I shared a concrete use case for it where location is not present, show me 
> > one where size and location are required.
>
> First, I have to acknowledge that the uses I have in mind can't be
> satisfied by this id in isolation, and perhaps that makes my
> interjection "not helpful".
>
> At the company I work for, pretty much every significant use case we
> have encountered involves making statements about data.
> We actively don't want to see the data - some customers could not use
> us if we required that - we only care about the metadata.
> The metadata may be summarizing, commenting on, flagging status for,
> some action that has been taken in consideration of the "data".
>
> Consumers of the metadata are then typically starting from the
> metadata and they are often not going to be the original recipient of
> that data.
> In many situations, the original data need not be fetched by consumers
> at all, in order for an automated process to act responsibly. "Has the
> right party attested to the metadata" is sufficient.
>
> But when things "go wrong" homework needs to be checked and checking
> that the metadata is indeed supported by the data is required. And the
> checker may only have the metadata to go on.
>
> I do appreciate the id we are discussing has no avenue for attaching
> metadata and I'm not suggesting it should.
> But as a primitive for systems that are "metadata" oriented, location
> is the *reason* this id is interesting.
>
> If you start with the data you always have a size. If you start with
> metadata, paraphrasing some larger piece of data you do not.
>
> I totally see you can make the connection, open the file, whatever,
> and read bytes until it is exhausted, relying on the protocol to tell
> you when to stop adding bytes to the hash.
> As an implementer, I see that as an invitation to DoS my service at
> worst, and at best to introduce unnecessarily unbounded operations.
>
> You asked about use cases, in addition to LLM models, and model cards,
> two off the cuff examples would be VCons (was assent granted in the
> call) and medically significant decisions based on patient data.
>
> Granted this id is *not* about metadata, but we would certainly see it
> as a useful way to deal with the data that is the subject of metadata.
>
> > How come x5u doesn't have a length?
>
> No idea :-)
>
> > Can you point me to some existing RFCs to read that include the length and 
> > that function the way you imagine?
>
> I can't but I'm going to spend some time looking on the expectation
> that I will learn something!
>
> > How come every URL in a web page doesn't include the length of the file 
> > that will be dereferenced?
>
> I would guess that in the context of an interactive application there
> are sensible timeouts and the user can hit cancel at any time. For
> lights out operations this just isn't reasonable (imo).
>
> > Also note that git lfs does not include the location, or content type.
>
> Granted, that is interesting.
>
> > Please also consider my comment about version, should version be added to 
> > the spec?
>
> I'm not sure I have a useful opinion on this. I think that the id
> would be useful without it and that introducing it, along with
> location (what does it mean when the version associated with the same
> location changes?) would make things quite a bit harder to reason
> about.
>
> > Please show me evidence that size is needed.
>
> I don't know that I can meet that bar, but
>
> > I'm trying to understand the real problem you are trying to solve
>
> I've done my best for this, at least from the PoV that makes this ID
> interesting to me.
>
> I apologize for the company specific link, but it does speak
> specifically to the use case question and the problems we are trying
> to solve: 
> https://docs.datatrails.ai/platform/overview/advanced-concepts/#attachments--content-integrity-protection
>
> Cheers,
>
> Robin
>
> On Fri, 7 Mar 2025 at 14:44, Orie Steele <[email protected]> wrote:
> >
> > Hi Robin,
> >
> > Let's get grounded in real use cases.
> >
> > On Fri, Mar 7, 2025, 6:03 AM Robin Bryce <[email protected]> wrote:
> >>
> >> Hi Orie,
> >>
> >> > The client could retry, the client might not even need to resolve the 
> >> > resource because of caching or having previously dereferenced the 
> >> > resource.
> >> > The resource could be compressed, have different transfer encoding, etc..
> >>
> >> In the case where location is supplied the *reason* for that is (in
> >> the uses I'd anticipate) that the verifier is expected to fetch and
> >> hash the content from that location as part of verification.
> >
> >
> > So you start with an envelope that has no payload, download a large file, 
> > hash the file in chunks, place the hash in the payload, and verify the 
> > signature.
> >
> > And the use cases for this are what exactly?
> >
> > Without specifics I'm left imagining you are installing a large signed 
> > binary from a small signature, is that the use case?
> >
> > Is this for signing LLM models distributed through bit torrent, or some 
> > protocol that doesn't work when you don't know the size?
> >
> > Which protocols require size to start the download?
> >
> >> Especially because of the factors you mention impacting
> >> Content-Length, I don't see how that verifier could know how many
> >> bytes to fetch? What am I missing here?
> >
> >
> > The transfer protocol.
> > Last I checked, HTTP doesn't ask you how many bytes to download before 
> > giving up, though I will admit I'm not an expert on every protocol that 
> > might be used to resolve a file... Which is part of why I don't want to 
> > comment on this in the draft.
> >
> > Do all popular file transfer protocols require a separate byte size 
> > parameter to use?
> >
> >>
> >> > Given that location and content type are already optional, how can you 
> >> > argue that lack of size makes the draft incomplete?
> >>
> >> Ok, the location being optional, and the size being mandatory does not
> >> make sense to me either. Size being mandatory when location is
> >> supplied would make sense to me.
> >
> >
> > How come x5u doesn't have a length?
> >
> > Can you point me to some existing RFCs to read that include the length and 
> > that function the way you imagine?
> >
> > How come every URL in a web page doesn't include the length of the file 
> > that will be dereferenced?
> >
> > Which cases outside of git lfs use the size and the hash as a point for the 
> > content?
> >
> > Also note that git lfs does not include the location, or content type.
> >
> > Please also consider my comment about version, should version be added to 
> > the spec?
> >
> > I'm not trying to be difficult.
> >
> > I'm trying to understand the real problem you are trying to solve, and I'm 
> > rejecting the blind assertion that size is needed.
> >
> > Please show me evidence that size is needed.
> >
> > I shared a concrete use case for it where location is not present, show me 
> > one where size and location are required.
> >
> >>
> >>
> >> On Fri, 7 Mar 2025 at 04:09, Orie Steele <[email protected]> wrote:
> >> >
> >> > Hi Steve,
> >> >
> >> > So just to be clear, the denial of service (DOS) attack is caused by 
> >> > attempting to resolve a file or byte stream of arbitrary length?
> >> >
> >> > And the solution is to believe the issuer has accurately reflected the 
> >> > length of the pre-image and the location and content type hints?
> >> >
> >> > Does this consideration apply to any other URIs present in the header? 
> >> > Such as x5u?
> >> >
> >> > Depending on the URI or string and resolver software, the transfer could 
> >> > start, and then hang... Forever.
> >> >
> >> > The client could retry, the client might not even need to resolve the 
> >> > resource because of caching or having previously dereferenced the 
> >> > resource.
> >> >
> >> > The resource could be compressed, have different transfer encoding, 
> >> > etc...
> >> >
> >> > There are a lot of reasons that the original file might not be 
> >> > resolvable to the signed hash, and a lot of ways to handle resolution 
> >> > which could result in an unreasonable amount of work for a client or a 
> >> > server.
> >> >
> >> > I don't see how any of that is this draft's business.
> >> >
> >> > I can see how certain application use cases might rely on additional 
> >> > metadata in the protected header or unprotected header (such as a 
> >> > counter signature, or time stamp token), before attempting to use the 
> >> > verified payload and header parameters.
> >> >
> >> > You assert that a complete scenario requires downloading the original 
> >> > content.
> >> >
> >> > I believe that's where we disagree.
> >> >
> >> > I could send you a hash envelope, you could already have the content, 
> >> > you could compute the hash of it, and attach it as the payload, and 
> >> > verify the signature.
> >> >
> >> > Every use of a hash envelope with a detached payload, starts with you 
> >> > computing the hash, and then verifying the signature.
> >> >
> >> > This is also how you would use hash envelope to verify a signature for 
> >> > some expected hardware/firmware measurement.
> >> >
> >> > You might have a JWK/COSE Key... The hash is the COSE Key thumbprint... 
> >> > The signature proves it's still trusted.
> >> >
> >> > When I share with you a signature for a package URL, I'm not requiring 
> >> > you to download it.
> >> >
> >> > Package URLs don't include a size parameter.
> >> >
> >> > When I share with you a signed container layer, I'm not requiring you to 
> >> > download it, or even to necessarily rebuild and confirm the hash matches.
> >> >
> >> > If I'm hashing a large file or local configuration or tripwire system, 
> >> > and uploading hash envelopes, the server won't naturally have access to 
> >> > the original files, which could be sensitive, or unsuitable for 
> >> > transport over the network.
> >> >
> >> > The server doesn't care what the size of the file is, it won't ever see 
> >> > the original content.
> >> >
> >> > This is the SCITT use case for creating transparency around local LLM 
> >> > models, or other content that can't be transferred to the transparency 
> >> > service.
> >> >
> >> > Perhaps your intended use case is similar to git lfs?
> >> >
> >> > ```
> >> > $ git lfs pointer --file=path/to/file
> >> > Git LFS pointer for path/to/file:
> >> >
> >> > version https://git-lfs.github.com/spec/v1
> >> > oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
> >> > size 12345
> >> > ```
> >> >
> >> > If you know that the size will always be present, you can sum over 
> >> > every pointer to compute the total size of stored objects.
> >> >
> >> > Perhaps we should consider a version parameter as well?
> >> >
> >> > Where is the download use case you are describing actually coming from?
> >> >
> >> > Given that location and content type are already optional, how can you 
> >> > argue that lack of size makes the draft incomplete?
> >> >
> >> > I'm not sure if I'll be available to attend the session at IETF 122, I'm 
> >> > hoping we can discuss the essential remaining blockers for this draft on 
> >> > list.
> >> >
> >> > Regards,
> >> >
> >> > OS
> >> >
> >> >
> >> > On Thu, Mar 6, 2025, 1:40 PM Steve Lasker <[email protected]> 
> >> > wrote:
> >> >>
> >> >> > What happens if the resolved file has the correct hash, but incorrect 
> >> >> > file size?
> >> >>   > Fail transfer as soon as incorrect size is received (as soon as
> >> >>   > HTTP Content-Length is received, when file cuts off prematurely, or
> >> >>   > file continues past where it should end).
> >> >> I'd agree fail is the expected outcome, if the actual length is either 
> >> >> too small or too big.
> >> >>
> >> >> > Is the file name important?
> >> >> I’d suggest it's not "important", and if it was, be incorporated into 
> >> >> the location value. Beyond location, I’d suggest it’s beyond the scope 
> >> >> of this draft.
> >> >>
> >> >> Scope
> >> >> I believe we’re all focused on keeping the scope minimal, engaging 
> >> >> other specs where needed, which can add other header values to an 
> >> >> envelope.
> >> >> We’re not trying to make a package manager, rather provide the minimal 
> >> >> required properties to make hash-envelope useful.
> >> >> Since a complete scenario would involve downloading the content, 
> >> >> confirming the hash matches, and knowing the size/length would mitigate 
> >> >> DOS attacks, I do believe it meets the bar for scope, or the draft 
> >> >> could be argued as incomplete.
> >> >>
> >> >> 122 Meeting
> >> >> I requested time from the chairs to discuss in the COSE meeting on 
> >> >> Wednesday
> >> >>
> >> >> Steve
> >> >>
> >> >>
> >> >> -----Original Message-----
> >> >> From: Henk Birkholz <[email protected]>
> >> >> Sent: Wednesday, March 5, 2025 2:28 AM
> >> >> To: Orie Steele <[email protected]>; Carsten Bormann 
> >> >> <[email protected]>
> >> >> Cc: Steve Lasker <[email protected]>; Ilari Liusvaara 
> >> >> <[email protected]>; cose <[email protected]>
> >> >> Subject: Re: [COSE] Re: I-D Action: draft-ietf-cose-hash-envelope-02.txt
> >> >>
> >> >> Hi Orie,
> >> >>
> >> >> in summary what I read is:
> >> >>
> >> >> * this is exciting but does not belong here
> >> >> * adding hash-env-sigs to existing systems
> >> >>
> >> >> The question is, how much "convenience information" belongs in an 
> >> >> "un-profiled" cose hash envelope for existing systems, right?
> >> >>
> >> >> Pointing to an instance of the pre-image is already an option.
> >> >> Indicating the intended size of the pre-image seems to be very close, 
> >> >> semantically.
> >> >>
> >> >> So the question is about scope creep vs. simplicity of resulting RFC. 
> >> >> Yes?
> >> >>
> >> >> We added the "length" (are we settled on the name? length? ...) due to 
> >> >> Ilari's feedback. If I want to convey metadata about a pre-image, there 
> >> >> is already RFC9393. In consequence, I am leaning ever so slightly to 
> >> >> Orie's point and "keeping it simple". I acknowledge the intended use 
> >> >> for that size... sorry length value, but I am not sure that this is the 
> >> >> right place (aka the right I-D) to address it.
> >> >>
> >> >> Are there any other strong proponents for an optional pre-image "length"
> >> >> header parameter? If not, maybe we can come to an in-room decision at 
> >> >> IETF 122 meeting and not include it.
> >> >>
> >> >>
> >> >> Viele Grüße,
> >> >>
> >> >> Henk
> >> >>
> >> >> On 05.03.25 03:49, Orie Steele wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I'm hesitant to start considering file transfer in scope for this
> >> >> > draft.
> >> >> >
> >> >> > The original motivation was to create a simple standard syntax for
> >> >> > signing hashes that are already used as identifiers, such as sha256 of
> >> >> > spdx sbom, or container hashes...
> >> >> > Delivery and integration for these is already a solved problem.
> >> >> >
> >> >> > We now seem to be imagining using hash envelope as part of some
> >> >> > verifiable build system, that uses the optional location, content
> >> >> > type, and a new file size parameter, to resolve large binaries from
> >> >> > small signatures and verifiable metadata.
> >> >> >
> >> >> > That's exciting.
> >> >> >
> >> >> > I'd been imagining adding hash envelope signatures to existing
> >> >> > systems, not using it to build new artifact repositories or package
> >> >> > management systems.
> >> >> >
> >> >> > At a certain point, it's probably better to sign a corim manifest
> >> >> > (which as you can see also includes hashes)... And let the manifest
> >> >> > carry the information necessary to download data.
> >> >> >
> >> >> > That's all exciting stuff, but I prefer to not include it in this
> >> >> > draft.
> >> >> >
> >> >> > Simplicity is what makes successful standards.
> >> >> >
> >> >> > I'm not opposed to profiling hash envelope to build a package manager,
> >> >> > especially one that works well in constrained environments, I would
> >> >> > just prefer to address those requirements in a dedicated document.
> >> >> >
> >> >> > Regards,
> >> >> >
> >> >> > OS
> >> >> >
> >> >> > On Tue, Mar 4, 2025, 10:55 AM Carsten Bormann <[email protected]> wrote:
> >> >> >
> >> >> >     Hi Orie,
> >> >> >
> >> >> >      > What happens if the resolved file has the correct hash, but
> >> >> >     incorrect file size?
> >> >> >
> >> >> >     You invoke crypto agility and choose a better hash function :-)
> >> >> >     (I understand Ilari’s argument that being able to limit the file
> >> >> >     size before computing the hash can help mitigate DoS.)
> >> >> >
> >> >> >      > I wonder if there is some CBOR related filesystem RFC that could
> >> >> >     provide the file size and other relevant metadata.
> >> >> >
> >> >> >         file-entry = {
> >> >> >           filesystem-item,
> >> >> >           ? size => uint,
> >> >> >           ? file-version => text,
> >> >> >           ? hash => hash-entry,
> >> >> >           * $$file-extension,
> >> >> >           global-attributes,
> >> >> >         }
> >> >> >
> >> >> >     Not an RFC yet, but pretty advanced already:
> >> >> >     https://www.ietf.org/archive/id/draft-ietf-rats-corim-07.html#appendix-A-1
> >> >> >
> >> >> >     Grüße, Carsten
> >> >> >
> >> >> >
> >> >> > _______________________________________________
> >> >> > COSE mailing list -- [email protected]
> >> >> > To unsubscribe send an email to [email protected]
