Re: [blink-dev] Intent to ship: Cache sharing for extremely-pervasive resources

Mike Taylor Mon, 27 Oct 2025 06:40:14 -0700

On 10/22/25 5:48 p.m., Patrick Meenan wrote:

The candidate list goes down to 20k occurrences in order to catch resources that were updated mid-crawl and may have multiple entries with different hashes that add up to 100k+ occurrences. In the candidate list, without any filtering, the 100k cutoff is around 600. I'd estimate that well less than 25% of the candidates make it through the filtering for stable pattern, correct resource type and reliable pattern. First release will likely be 100-200 and I don't expect it will ever grow above 500.

Thanks - I see the living document has been updated to mention 500 as a ceiling.

As far as cadence goes, I expect there will be a lot of activity for the next few releases as individual patterns are coordinated with the origin owners but then it will settle down to a much more bursty pattern of updates every few Chrome releases (likely linked with an origin changing their application and adding more/different resources). And yes, it is manual.

As far as the process goes, resource owners need to actively assert that their resource is appropriate for the single-keyed cache and that they would like it included (usually in response to active outreach from us but we have the external-facing list for owner-initiated contact as well). The design doc has the documentation for what it means to be appropriate (and the doc will be moved to a readme page in the repository next to the actual list so it's not a hard-to-find Google doc):

Will there be any kind of public record of this assertion? What happens if a site starts using query params or sending cookies? Does the person in charge of manual list curation discover that in the next release? Does that require a new release (I don't know if this lives in component updater, or in the binary itself)?


5. Require resource owner opt-in

For each URL to be included, reach out to the team/company responsible for the resource to validate the URL pattern and get assurances that the pattern will always serve the same content to all sites and not be abused for tracking (by using unique URLs within the pattern mask as a bit-mask for fingerprinting). They will also need to validate that the URLs covered by the pattern will not rely on being able to set cookies over HTTP using a Set-Cookie HTTP response header because they will not be re-applied across cache boundaries (the set-cookie is not cached with the resource).

On Wed, Oct 22, 2025 at 5:31 PM Mike Taylor <[email protected]> wrote:


    On 10/18/25 8:34 a.m., Patrick Meenan wrote:

    Sorry, I missed a step in making the candidate resource list
    public. I have moved it to my chromium account and made it public
    here:
    <https://docs.google.com/spreadsheets/d/1TgWhdeqKbGm6hLM9WqnnXLn-iiO4Y9HTjDXjVO2aBqI/edit?usp=sharing>.


    Not everything in that list meets all of the criteria - it's just
    the first step in the manual curation (same URL served the same
    content across > 20k sites in the HTTP Archive dataset).

    The manual steps from there for meeting the criteria are basically:

    - Cull the list down to scripts, stylesheets and compression
    dictionaries.
    - Remove any URLs that use query parameters.
    - Exclude any responses that set cookies.
    - Identify URLs that are not manually versioned by site embedders
    (i.e. the embedded resource can not get stale). This is either
    in-place updating resources or automatically versioned resources.
    - Only include URLs that can reliably target a single resource by
    pattern (i.e. ..../<hash>-common.js but not ..../<hash>.js)
    - Get confirmation from the resource owner that the given URL
    Pattern is and will continue to be appropriate for the
    single-keyed cache
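
    The curation steps above can be read as a sequence of checks on each
    candidate. What follows is only a rough sketch of that reading, not the
    actual curation tooling: the Candidate fields, the rejection_reason
    helper and the example URLs are all hypothetical, and several of the
    checks (versioning, pattern uniqueness, owner confirmation) are manual
    judgments that are modeled here as plain booleans.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Assumed labels for the three eligible resource types named in the thread.
ALLOWED_TYPES = {"script", "stylesheet", "compression-dictionary"}
# Candidate-list cutoff mentioned in the thread (same content on > 20k sites).
MIN_SITES = 20_000


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one row of the candidate list."""
    url: str
    resource_type: str
    site_count: int
    sets_cookie: bool          # response carried a Set-Cookie header
    embedder_versioned: bool   # embedders pin a version, so it can go stale
    pattern_is_unique: bool    # pattern reliably targets a single resource
    owner_confirmed: bool      # owner asserted the pattern is appropriate


def rejection_reason(c: Candidate) -> Optional[str]:
    """Return why a candidate is dropped, or None if it passes every step."""
    if c.site_count <= MIN_SITES:
        return "seen on too few sites"
    if c.resource_type not in ALLOWED_TYPES:
        return "resource type not eligible"
    if urlsplit(c.url).query:
        return "uses query parameters"
    if c.sets_cookie:
        return "response sets cookies"
    if c.embedder_versioned:
        return "manually versioned by embedders"
    if not c.pattern_is_unique:
        return "pattern does not target a single resource"
    if not c.owner_confirmed:
        return "no owner confirmation"
    return None


# Made-up example rows; the hosts and paths are placeholders.
ok = Candidate("https://cdn.example/abc123-common.js", "script",
               150_000, False, False, True, True)
with_query = Candidate("https://cdn.example/lib.js?v=2", "script",
                       150_000, False, False, True, True)
image = Candidate("https://cdn.example/logo.png", "image",
                  150_000, False, False, True, True)
```

    In this sketch the cheap mechanical checks run first and the owner
    confirmation last, mirroring the order of the list above.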


    A few questions on list curation:

    Can you clarify how big the list will be? The privacy review at
    https://chromestatus.com/feature/5202380930678784?gate=5174931459145728
    mentions ~500, while the design doc mentions 1000. I see the candidate
    resource list starts at ~5000, then presumably manual curation
    begins to get to one of those numbers.

    What is the expected list curation/update cadence? Is it actually
    manual?

    Is there any recourse process for owners of resources that don't
    want to be included? Do we have documentation on what it means to
    be appropriate for the single-keyed cache?

    thanks,
    Mike

