Depending on the protocol you are using for your work, this may be
better solved in a different way.

We would need a concrete protocol to say more, not just a location hint.

I could imagine placing a lot of protocol-specific metadata in the header
and perhaps not even using the location and content type hints. For some
use cases that might be a much better solution, especially if you wanted
to index on protected, protocol-specific metadata consistently.

On Fri, Mar 7, 2025, 9:07 AM Robin Bryce <[email protected]> wrote:

> And actually, replying to myself
>
> > If you start with the data you always have a size. If you start with
> > metadata, paraphrasing some larger piece of data you do not.
>
> For the use case exploration I discuss above, I can get behind a
> collective view which says "In that case, put the size in the
> metadata". Possibly that is the answer here.
>
> On Fri, 7 Mar 2025 at 15:55, Robin Bryce <[email protected]> wrote:
> >
> > > I'm trying to understand the real problem you are trying to solve, and
> I'm rejecting the blind assertion that size is needed.
> >
> > Oh I do appreciate that!
> >
> > >I shared a concrete use case for it where location is not present, show
> me one where size and location are required.
> >
> > First, I have to acknowledge that the uses I have in mind can't be
> > satisfied by this id in isolation, and perhaps that makes my
> > interjection "not helpful".
> >
> > At the company I work for, pretty much every significant use case we
> > have encountered involves making statements about data.
> > We actively don't want to see the data - some customers could not use
> > us if we required that - we only care about the metadata.
> > The metadata may be summarizing, commenting on, flagging status for,
> > some action that has been taken in consideration of the "data".
> >
> > Consumers of the metadata are then typically starting from the
> > metadata and they are often not going to be the original recipient of
> > that data.
> > In many situations, the original data need not be fetched by consumers
> > at all, in order for automated processes to act responsibly. "Has the
> > right party attested to the metadata" is sufficient.
> >
> > But when things "go wrong" homework needs to be checked and checking
> > that the metadata is indeed supported by the data is required. And the
> > checker may only have the metadata to go on.
> >
> > I do appreciate the id we are discussing has no avenue for attaching
> > metadata and I'm not suggesting it should.
> > But as a primitive for systems that are "metadata" oriented, location
> > is the *reason* this id is interesting.
> >
> > If you start with the data you always have a size. If you start with
> > metadata, paraphrasing some larger piece of data you do not.
> >
> > I totally see you can make the connection, open the file, whatever,
> > and read bytes until it is exhausted, relying on the protocol to tell
> > you when to stop adding bytes to the hash.
> > As an implementer, I see that as an invitation to DoS my service at
> > worst, and at best to introduce unnecessarily unbounded operations.
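
A minimal sketch of the bounded read described above, assuming the verifier is given an expected size alongside the hash. The names `hash_bounded` and `SizeMismatch` are hypothetical and not part of the draft, and SHA-256 is chosen only for illustration. The loop stops as soon as the stream is shorter or longer than the declared size, rather than hashing until the transport ends the stream.

```python
import hashlib
import io


class SizeMismatch(Exception):
    """Raised when the stream is shorter or longer than the declared size."""


def hash_bounded(stream, expected_size, chunk_size=65536):
    """Hash exactly expected_size bytes; fail if the stream length differs.

    expected_size is an assumption: it stands for a size value carried with
    the hash (for example in metadata), which the draft does not define.
    """
    digest = hashlib.sha256()
    remaining = expected_size
    while remaining > 0:
        # Never ask for more bytes than the declared size allows.
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise SizeMismatch("stream ended before the declared size")
        digest.update(chunk)
        remaining -= len(chunk)
    # Read one extra byte to detect a stream that runs past the declared size.
    if stream.read(1):
        raise SizeMismatch("stream continues past the declared size")
    return digest.hexdigest()


# Usage with an in-memory stream standing in for a download.
data = b"example content" * 1000
result = hash_bounded(io.BytesIO(data), len(data))
```

The work done by the verifier is then bounded by the declared size plus one byte, whatever the remote end sends.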
> >
> > You asked about use cases, in addition to LLM models, and model cards,
> > two off the cuff examples would be VCons (was assent granted in the
> > call) and medically significant decisions based on patient data.
> >
> > Granted this id is *not* about metadata, but we would certainly see it
> > as a useful way to deal with the data that is the subject of metadata.
> >
> > > How come x5u doesn't have a length?
> >
> > No idea :-)
> >
> > > Can you point me to some existing RFCs to read that include the length
> and that function the way you imagine?
> >
> > I can't but I'm going to spend some time looking on the expectation
> > that I will learn something!
> >
> > > How come every URL in a web page doesn't include the length of the
> file that will be dereferenced?
> >
> > I would guess that in the context of an interactive application there
> > are sensible timeouts and the user can hit cancel at any time. For
> > lights out operations this just isn't reasonable (imo).
> >
> > > Also note that git lfs does not include the location, or content type.
> >
> > Granted, that is interesting.
> >
> > > Please also consider my comment about version, should version be added
> to the spec?
> >
> > I'm not sure I have a useful opinion on this. I think that the id
> > would be useful without it and that introducing it, along with
> > location (what does it mean when the version associated with the same
> > location changes ?) would make things quite a bit harder to reason
> > about.
> >
> > > Please show me evidence that size is needed.
> >
> > I don't know that I can meet that bar, but
> >
> > > I'm trying to understand the real problem you are trying to solve
> >
> > I've done my best for this, at least from the PoV that makes this ID
> > interesting to me.
> >
> > I apologize for the company specific link, but it does speak
> > specifically to the use case question and the problems we are trying
> > to solve:
> https://docs.datatrails.ai/platform/overview/advanced-concepts/#attachments--content-integrity-protection
> >
> > Cheers,
> >
> > Robin
> >
> > On Fri, 7 Mar 2025 at 14:44, Orie Steele <[email protected]>
> wrote:
> > >
> > > Hi Robin,
> > >
> > > Let's get grounded in real use cases.
> > >
> > > On Fri, Mar 7, 2025, 6:03 AM Robin Bryce <[email protected]> wrote:
> > >>
> > >> Hi Orie,
> > >>
> > >> > The client could retry, the client might not even need to resolve
> the resource because of caching or having previously dereferenced the
> resource.
> > >> > The resource could be compressed, have different transfer encoding,
> etc..
> > >>
> > >> In the case where location is supplied the *reason* for that is (in
> > >> the uses I'd anticipate) that the verifier is expected to fetch and
> > >> hash the content from that location as part of verification.
> > >
> > >
> > > So you start with an envelope that has no payload, download a large
> file, hash the file in chunks, place the hash in the payload, and verify the
> signature.
> > >
> > > And the use cases for this are what exactly?
> > >
> > > Without specifics I'm left imagining you are installing a large signed
> binary from a small signature, is that the use case?
> > >
> > > Is this for signing LLM models distributed through bit torrent, or
> some protocol that doesn't work when you don't know the size?
> > >
> > > Which protocols require size to start the download?
> > >
> > >> Especially because of the factors you mention impacting
> > >> Content-Length, I don't see how that verifier could know how many
> > >> bytes to fetch ? What am I missing here ?
> > >
> > >
> > > The transfer protocol.
> > > Last I checked, HTTP doesn't ask you how many bytes to download before
> giving up, though I will admit I'm not an expert on every protocol that
> might be used to resolve a file... Which is part of why I don't want to
> comment on this in the draft.
> > >
> > > Do all popular file transfer protocols require a separate byte size
> parameter to use?
> > >
> > >>
> > >> > Given that location and content type are already optional, how can
> you argue that lack of size makes the draft incomplete?
> > >>
> > >> Ok, the location being optional, and the size being mandatory does not
> > >> make sense to me either. size being mandatory when location is
> > >> supplied would make sense to me
> > >
> > >
> > > How come x5u doesn't have a length?
> > >
> > > Can you point me to some existing RFCs to read that include the length
> and that function the way you imagine?
> > >
> > > How come every URL in a web page doesn't include the length of the
> file that will be dereferenced?
> > >
> > > Which cases outside of git lfs use the size and the hash as a pointer
> for the content?
> > >
> > > Also note that git lfs does not include the location, or content type.
> > >
> > > Please also consider my comment about version, should version be added
> to the spec?
> > >
> > > I'm not trying to be difficult.
> > >
> > > I'm trying to understand the real problem you are trying to solve, and
> I'm rejecting the blind assertion that size is needed.
> > >
> > > Please show me evidence that size is needed.
> > >
> > > I shared a concrete use case for it where location is not present,
> show me one where size and location are required.
> > >
> > >>
> > >>
> > >> On Fri, 7 Mar 2025 at 04:09, Orie Steele <[email protected]>
> wrote:
> > >> >
> > >> > Hi Steve,
> > >> >
> > >> > So just to be clear, the denial of service (DOS) attack is caused
> by attempting to resolve a file or byte stream of arbitrary length?
> > >> >
> > >> > And the solution is to believe the issuer has accurately reflected
> the length of the pre-image and the location and content type hints?
> > >> >
> > >> > Does this consideration apply to any other URIs present in the
> header? Such as x5u?
> > >> >
> > >> > Depending on the URI or string and resolver software, the transfer
> could start, and then hang... Forever.
> > >> >
> > >> > The client could retry, the client might not even need to resolve
> the resource because of caching or having previously dereferenced the
> resource.
> > >> >
> > >> > The resource could be compressed, have different transfer encoding,
> etc...
> > >> >
> > >> > There are a lot of reasons that the original file might not be
> resolvable to the signed hash, and a lot of ways to handle resolution which
> could result in an unreasonable amount of work for a client or a server.
> > >> >
> > >> > I don't see how any of that is this draft's business.
> > >> >
> > >> > I can see how certain application use cases might rely on
> additional metadata in the protected header or unprotected header (such as
> a counter signature, or time stamp token), before attempting to use the
> verified payload and header parameters.
> > >> >
> > >> > You assert that a complete scenario requires downloading the
> original content.
> > >> >
> > >> > I believe that's where we disagree.
> > >> >
> > >> > I could send you a hash envelope, you could already have the
> content, you could compute the hash of it, and attach it as the payload,
> and verify the signature.
> > >> >
> > >> > Every use of a hash envelope with a detached payload, starts with
> you computing the hash, and then verifying the signature.
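
The detached-payload flow above can be sketched as follows, assuming the verifier already holds the content. `verify_with_local_content` and `verify_detached` are hypothetical names: the callable stands in for whatever COSE library performs the actual signature check, and SHA-256 is assumed only for illustration.

```python
import hashlib


def verify_with_local_content(content, verify_detached):
    """Compute the payload from content already held, then verify.

    verify_detached is a hypothetical callable standing in for a COSE
    library's detached-payload signature check. It receives the payload
    bytes and returns True or False.
    """
    # The payload of the envelope is the hash of the content, not the content.
    payload = hashlib.sha256(content).digest()
    return verify_detached(payload)


# Stand-in verifier for illustration only: it accepts one known payload.
# A real verifier would check a signature over the protected header and payload.
known_payload = hashlib.sha256(b"local file bytes").digest()


def fake_verifier(payload):
    return payload == known_payload


ok = verify_with_local_content(b"local file bytes", fake_verifier)
tampered = verify_with_local_content(b"other bytes", fake_verifier)
```

Nothing in this flow fetches anything, so neither a location nor a size is consulted.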
> > >> >
> > >> > This is also how you would use hash envelope to verify a signature
> for some expected hardware/firmware measurement.
> > >> >
> > >> > You might have JWK/COSE Key... The hash is the cose key
> thumbprint... The signature proves it's still trusted.
> > >> >
> > >> > When I share with you a signature for a package URL, I'm not
> requiring you to download it.
> > >> >
> > >> > Package URLs don't include a size parameter.
> > >> >
> > >> > When I share with you a signed container layer, I'm not requiring
> you to download it, or even to necessarily rebuild and confirm the hash
> matches.
> > >> >
> > >> > If I'm hashing a large file or local configuration or tripwire
> system, and uploading hash envelopes, the server won't naturally have
> access to the original files, which could be sensitive, or unsuitable for
> transport over the network.
> > >> >
> > >> > The server doesn't care what the size of the file is, it won't ever
> see the original content.
> > >> >
> > >> > This is the SCITT use case for creating transparency around local
> LLM models, or other content that can't be transferred to the transparency
> service.
> > >> >
> > >> > Perhaps your intended use case is similar to git lfs?
> > >> >
> > >> > ```
> > >> > $ git lfs pointer --file=path/to/file
> > >> > Git LFS pointer for path/to/file:
> > >> >
> > >> > version https://git-lfs.github.com/spec/v1
> > >> > oid
> sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
> > >> > size 12345
> > >> > ```
> > >> >
> > >> > If you know that the size will always be present, you can sum over
> every pointer to compute the total size of stored objects.
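
A small sketch of that summation, assuming pointer files in the three-line format shown above. `pointer_size` and `total_size` are hypothetical helper names, not Git LFS commands.

```python
def pointer_size(pointer_text):
    """Return the integer value of the `size` line in a Git LFS pointer."""
    for line in pointer_text.splitlines():
        key, _, value = line.strip().partition(" ")
        if key == "size":
            return int(value)
    raise ValueError("pointer has no size line")


def total_size(pointers):
    """Sum the declared sizes without touching the stored objects."""
    return sum(pointer_size(p) for p in pointers)


# The pointer from the example above, reused twice for illustration.
EXAMPLE = """version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
"""

total = total_size([EXAMPLE, EXAMPLE])
```

This only works because every pointer is guaranteed to carry a size, which is the property being discussed.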
> > >> >
> > >> > Perhaps we should consider a version parameter as well?
> > >> >
> > >> > Where is the download use case you are describing actually coming
> from?
> > >> >
> > >> > Given that location and content type are already optional, how can
> you argue that lack of size makes the draft incomplete?
> > >> >
> > >> > I'm not sure if I'll be available to attend the session at IETF
> 122, I'm hoping we can discuss the essential remaining blockers for this
> draft on list.
> > >> >
> > >> > Regards,
> > >> >
> > >> > OS
> > >> >
> > >> >
> > >> > On Thu, Mar 6, 2025, 1:40 PM Steve Lasker <[email protected]>
> wrote:
> > >> >>
> > >> >> > What happens if the resolved file has the correct hash, but
> incorrect file size?
> > >> >>   > Fail transfer as soon as incorrect size is received (as soon
> as HTTP Content-Length is received, when file cuts off prematurely, or file
> continues past where it should end).
> > >> >> I'd agree fail is the expected outcome, if the actual length is
> either too small or too big.
> > >> >>
> > >> >> > Is the file name important?
> > >> >> I’d suggest it's not "important", and if it was, be incorporated
> into the location value. Beyond location, I’d suggest it’s beyond the scope
> of this draft.
> > >> >>
> > >> >> Scope
> > >> >> I believe we’re all focused on keeping the scope minimal, engaging
> other specs where needed, which can add other header values to an envelope.
> > >> >> We’re not trying to make a package manager, rather provide the
> minimal required properties to make hash-envelope useful.
> > >> >> Since a complete scenario would involve downloading the content,
> confirming the hash matches, and knowing the size/length would mitigate DOS
> attacks, I do believe it meets the bar for scope, or the draft could be
> argued as incomplete.
> > >> >>
> > >> >> 122 Meeting
> > >> >> I requested time from the chairs to discuss in the COSE meeting on
> Wednesday
> > >> >>
> > >> >> Steve
> > >> >>
> > >> >>
> > >> >> -----Original Message-----
> > >> >> From: Henk Birkholz <[email protected]>
> > >> >> Sent: Wednesday, March 5, 2025 2:28 AM
> > >> >> To: Orie Steele <[email protected]>; Carsten Bormann <
> [email protected]>
> > >> >> Cc: Steve Lasker <[email protected]>; Ilari Liusvaara <
> [email protected]>; cose <[email protected]>
> > >> >> Subject: Re: [COSE] Re: I-D Action:
> draft-ietf-cose-hash-envelope-02.txt
> > >> >>
> > >> >> Hi Orie,
> > >> >>
> > >> >> in summary what I read is:
> > >> >>
> > >> >> * this is exciting but does not belong here
> > >> >> * adding hash-env-sigs to existing systems
> > >> >>
> > >> >> The question is, how much "convenience information" belongs in an
> "un-profiled" cose hash envelope for existing systems, right?
> > >> >>
> > >> >> Pointing to an instance of the pre-image is already an option.
> > >> >> Indicating the intended size of the pre-image seems to be very
> close, semantically.
> > >> >>
> > >> >> So the question is about scope creep vs. simplicity of resulting
> RFC. Yes?
> > >> >>
> > >> >> We added the "length" (are we settled on the name? length? ...)
> due to Ilari's feedback. If I want to convey metadata about a pre-image,
> there is already RFC9393. In consequence, I am leaning ever so slightly to
> Orie's point and "keeping it simple". I acknowledge the intended use for
> that size... sorry length value, but I am not sure that this is the right
> place (aka the right I-D) to address it.
> > >> >>
> > >> >> Are there any other strong proponents for an optional pre-image
> "length"
> > >> >> header parameter? If not, maybe we can come to an in-room decision
> at IETF 122 meeting and not include it.
> > >> >>
> > >> >>
> > >> >> Viele Grüße,
> > >> >>
> > >> >> Henk
> > >> >>
> > >> >> On 05.03.25 03:49, Orie Steele wrote:
> > >> >> > Hi,
> > >> >> >
> > >> >> > I'm hesitant to start considering file transfer in scope for
> this draft.
> > >> >> >
> > >> >> > The original motivation was to create a simple standard syntax
> for
> > >> >> > signing hashes that are already used as identifiers, such as
> sha256 of
> > >> >> > spdx sbom, or container hashes...
> > >> >> > Delivery and integration for these is already a solved problem.
> > >> >> >
> > >> >> > We now seem to be imagining using hash envelope as part of some
> > >> >> > verifiable build system, that uses the optional location,
> content type,
> > >> >> > and a new file size parameter, to resolve large binaries from
> small
> > >> >> > signatures and verifiable metadata.
> > >> >> >
> > >> >> > That's exciting.
> > >> >> >
> > >> >> > I'd been imagining adding hash envelope signatures to existing
> systems,
> > >> >> > not using it to build new artifact repositories or package
> management
> > >> >> > systems.
> > >> >> >
> > >> >> > At a certain point, it's probably better to sign a corim
> manifest (which
> > >> >> > as you can see also includes hashes)... And let the manifest
> carry the
> > >> >> > information necessary to download data.
> > >> >> >
> > >> >> > That's all exciting stuff, but I prefer to not include it in
> this draft.
> > >> >> >
> > >> >> > Simplicity is what makes successful standards.
> > >> >> >
> > >> >> > I'm not opposed to profiling hash envelope to build a package
> manager,
> > >> >> > especially one that works well in constrained environments, I
> would just
> > >> >> > prefer to address those requirements in a dedicated document.
> > >> >> >
> > >> >> > Regards,
> > >> >> >
> > >> >> > OS
> > >> >> >
> > >> >> > On Tue, Mar 4, 2025, 10:55 AM Carsten Bormann <[email protected]
> > >> >> > <mailto:[email protected]>> wrote:
> > >> >> >
> > >> >> >     Hi Orie,
> > >> >> >
> > >> >> >      > What happens if the resolved file has the correct hash,
> but
> > >> >> >     incorrect file size?
> > >> >> >
> > >> >> >     You invoke crypto agility and choose a better hash function
> :-)
> > >> >> >     (I understand Ilari’s argument that being able to limit the
> file
> > >> >> >     size before computing the hash can help mitigate DoS.)
> > >> >> >
> > >> >> >      > I wonder if there is some CBOR related filesystem RFC
> that could
> > >> >> >     provide the file size and other relevant metadata.
> > >> >> >
> > >> >> >         file-entry = {
> > >> >> >           filesystem-item,
> > >> >> >           ? size => uint,
> > >> >> >           ? file-version => text,
> > >> >> >           ? hash => hash-entry,
> > >> >> >           * $$file-extension,
> > >> >> >           global-attributes,
> > >> >> >         }
> > >> >> >
> > >> >> >     Not an RFC yet, but pretty advanced already:
> > >> >> >
> https://www.ietf.org/archive/id/draft-ietf-rats-corim-07.html#appendix-A-1
> > >> >> >
> > >> >> >     Grüße, Carsten
> > >> >> >
> > >> >> >
> > >> >> > _______________________________________________
> > >> >> > COSE mailing list -- [email protected]
> > >> >> > To unsubscribe send an email to [email protected]
> > >> >
>
