The list construction should already be completely objective. I changed the manual origin-owner validation to instead trust (and require) "Cache-Control: public". The rest of the criteria <https://docs.google.com/document/d/1xaoF9iSOojrlPrHZaKIJMK4iRZKA3AD6pQvbSy4ueUQ/edit?tab=t.0> should be well-defined and objective. I'm not sure they can be fully automated yet (though that might just be my pre-AI thinking).
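To make the enforcement half of that concrete, the check at cache-write time can be a simple predicate over the response. Here's a rough Python sketch of the logic (hypothetical helper, not the actual Chromium code; it assumes lower-cased header names):

import re
from urllib.parse import urlsplit

def eligible_for_shared_cache(url: str, headers: dict[str, str],
                              pervasive_patterns: list[re.Pattern]) -> bool:
    """Sketch: admit a response to the single-keyed (shared) cache only if it
    matches a curated pervasive-resource pattern AND explicitly opts in to
    shared caching; everything else stays in the partitioned cache."""
    # URLs with query parameters are never candidates.
    if urlsplit(url).query:
        return False
    # Hard requirement: "Cache-Control: public" is the explicit signal that
    # the response is cacheable in shared upstream caches.
    directives = {d.strip() for d in
                  headers.get("cache-control", "").lower().split(",")}
    if "public" not in directives:
        return False
    # Responses that set cookies are excluded (Set-Cookie is not cached with
    # the resource and would not be re-applied across cache boundaries).
    if "set-cookie" in headers:
        return False
    # Finally, the URL must match one of the curated patterns exactly.
    return any(p.fullmatch(url) for p in pervasive_patterns)

The failure mode is benign: a response that fails any of these checks just doesn't get the shared-cache treatment and is cached per-partition as today.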
The main need for humans in the loop right now is to create the patterns so that each one represents a "single" resource that stays stable over time as its URL changes (version/hash), and to distinguish those stable files from random hash bundles that aren't stable from release to release (concrete example below). That's fairly easy for a human to do (and get right).

On Fri, Nov 7, 2025 at 4:47 PM Rick Byers <[email protected]> wrote:

> Thanks Pat. I am personally a big fan of things which increase publisher ad revenue across the web broadly without hurting (or ideally improving) the user experience, and this seems likely to do exactly that. In particular, I recall all the debate around stale-while-revalidate <https://web.dev/articles/stale-while-revalidate> and am proud that we pushed <https://groups.google.com/a/chromium.org/g/blink-dev/c/rspPrQHfFkI/m/c5j3xJQRDAAJ?e=48417069> through it with urgency and confirmed it indeed increased publisher ad revenue across the web <https://web.dev/case-studies/ads-case-study-stale-while-revalidate>.
>
> Reading the Mozilla feedback carefully, the point that resonates most with me is the risk of "gatekeeping" and the potential to mitigate that by establishing objective rules for inclusion. Is it plausible to imagine a version of this where the list construction would be entirely objective? What would the tradeoffs be?
>
> Thanks,
> Rick
>
> On Thu, Oct 30, 2025 at 3:50 PM Patrick Meenan <[email protected]> wrote:
>
>> Reaching out to site owners was mostly a sanity check that the resource isn't expected to be partitioned for some reason (even though the payloads are known to be identical). If it helps, we can replace the reach-out step with a requirement that the responses be "Cache-Control: public" (and hard-enforce it in the browser by not writing the resource to the cache if it isn't). That is an explicit indicator that the resources are cacheable in shared upstream caches.
>>
>> I removed the two items from the design doc that were specifically targeted at direct fingerprinting, since that's moot with the 3PC link (as well as the fingerprinting bits from the validation with resource owners).
>>
>> On the site-preferencing concern: it doesn't actually favor large sites, but it does favor currently-popular third-party resources (most of which are provided by large corporations). The benefit is spread across all of the sites they are embedded in (funnily enough, most large sites won't benefit because they don't tend to use third parties).
>>
>> Determining the common resources at a local level exposes the same XS-leak issues as allowing all resources (e.g. your local map tiles would show up in multiple cache partitions because the sites you visit all reference your current location, but since those tiles are not globally common they could be used to identify your location). Instead of using the HTTP Archive to collect the candidates, we could presumably build a centralized list based on aggregated common resources seen across cache partitions by each user, but that feels like an awful lot of complexity for a very small number of resulting resources.
>>
>> On the test results: sorry, I thought I had included the experiment results in the I2S, but it looks like I may not have.
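To make the stable-pattern distinction from the top of this message concrete, this is the kind of judgment a curator makes (the URLs and patterns below are made up for illustration, not entries from the real list):

import re

# Automatically versioned, but with a distinctive name: the hash rotates from
# release to release, yet "-common.js" pins the pattern to a single logical
# resource, so the pattern stays valid over time.
STABLE = re.compile(r"https://cdn\.example\.com/lib/[0-9a-f]{16}-common\.js")

# Rejected: this matches any hash-named bundle, so it covers arbitrary
# per-build files rather than one resource that is stable over time.
TOO_BROAD = re.compile(r"https://cdn\.example\.com/lib/[0-9a-f]{16}\.js")

assert STABLE.fullmatch("https://cdn.example.com/lib/0123456789abcdef-common.js")
assert TOO_BROAD.fullmatch("https://cdn.example.com/lib/fedcba9876543210.js")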
>> The test was specifically just with the patterns for the Google ads scripts, because we aren't expecting this feature to impact the vitals for the main page/content: most of the pervasive resources are third-party content that is usually async already and not on the critical path. It's possible some video or map embeds might trigger LCP in some cases, but that's the exception more than the norm. This is more geared to making those supporting things work better while maintaining the user experience. Ads has the kind of instrumentation we'd need to get visibility into the success (or failure) of that assumption and to measure small changes.
>>
>> The results were stat-sig positive but relatively small. The ad iframes displayed their content slightly faster and transmitted fewer bytes for each frame (very low single-digit percentages).
>>
>> The guardrail metrics (including vitals) were all neutral, which is what we were hoping for (improvement without the cost of increased contention).
>>
>> If you'd feel more comfortable gathering more data, I wouldn't be opposed to running the full list at 1% to check the guardrail metrics again before fully launching. We won't necessarily expect to see positive movement to justify a launch, since the resources are still async, but we can at least validate that assumption with the full list (if that is the only remaining concern).
>>
>> On Thu, Oct 30, 2025 at 5:28 PM Rick Byers <[email protected]> wrote:
>>
>>> Thanks Erik and Patrick, of course that makes sense. Sorry for the naive question. My naive reading of the design doc suggested to me that a lot of the privacy mitigations were about preventing the cross-site tracking risk. Could the design be simplified by removing some of those mitigations? For example, the section about reaching out to the resource owners: to what extent is that really necessary when all we're trying to mitigate is XS leaks? Don't the popularity properties alone mitigate that sufficiently?
>>>
>>> What can you share about the magnitude of the performance benefit in practice in your experiments? In particular for LCP, since we know <https://wpostats.com/> that it correlates well with user engagement (and against abandonment) and so presumably user value.
>>>
>>> The concern about not wanting to further advantage more popular sites over less popular ones resonates with me. Part of that argument seems to apply broadly to the idea of any LRU cache (especially one with a reuse bias, which I believe ours has <https://www.chromium.org/developers/design-documents/network-stack/disk-cache/#eviction>?). But perhaps an important distinction here is that the benefits are determined globally vs. on a user-by-user basis? But I think any solution that worked on a user-by-user basis would have the XS-leak problem, right? Perhaps it's worth reflecting on our stance on using crowd-sourced data to try to improve the experience for all users while still being fair to sites broadly. In general, I think this is something Chromium is much more open to (where it brings significant user benefit) than other engines. For example, our Media Engagement Index <https://developer.chrome.com/blog/autoplay> system has some similar properties in terms of using aggregate user behaviour to help decide which sites have the power to play audio on page load and which don't.
>>> I was personally uncertain at the time whether the complexity would prove to be worth the benefit, but now I'm quite convinced it is. Playing audio on load is just something users and developers want in a few cases, but not most cases. I wonder if perhaps cross-site caching is similar?
>>>
>>> Rick
>>>
>>> On Thu, Oct 30, 2025 at 9:09 AM Matt Menke <[email protected]> wrote:
>>>
>>>> Note that even with Vary: Origin, we still have to load the HTTP request headers from the disk cache to apply the vary header, which leaks timing information, so "Vary: Origin" is not a sufficient security mechanism to prevent that sort of cross-site attack.
>>>>
>>>> On Wednesday, October 29, 2025 at 5:08:42 PM UTC-4 Erik Anderson wrote:
>>>>
>>>>> My understanding was that there was believed to be a meaningful security benefit to partitioning the cache. That’s because it would limit a party from being able to infer that you’ve visited some other site by measuring a side effect tied to how quickly a resource loads. That observation could potentially be made even if that specific adversary doesn’t have any of their own content loaded on the other site.
>>>>>
>>>>> Of course, if there is an entity with a resource loaded across both sites with a 3p cookie *and* they’re willing to share that info/collude, there’s not much benefit. And even when partitioned, if 3p cookies are enabled, there are potentially measurable side effects that differ based on whether the resource request had some specific state in a 3p cookie.
>>>>>
>>>>> Does that incremental security benefit of partitioning the cache justify the performance costs when 3p cookies are still enabled? I’m not sure.
>>>>>
>>>>> Even if partitioning were eliminated, a site could protect itself a bit by specifying Vary: Origin, but that probably doesn’t sufficiently cover iframe scenarios (nor would I expect most sites to get it right).
>>>>>
>>>>> *From:* Rick Byers <[email protected]>
>>>>> *Sent:* Wednesday, October 29, 2025 11:56 AM
>>>>> *To:* Patrick Meenan <[email protected]>
>>>>> *Cc:* Mike Taylor <[email protected]>; blink-dev <[email protected]>
>>>>> *Subject:* [EXTERNAL] Re: [blink-dev] Intent to ship: Cache sharing for extremely-pervasive resources
>>>>>
>>>>> If this is enabled only when 3PCs are enabled, then what are the tradeoffs of going through all this complexity and governance vs. just broadly coupling HTTP cache keying behavior to 3PC status in some way? What can a tracker credibly do with a single-keyed HTTP cache that they cannot do with 3PCs? Are there also concerns about accidental cross-site resource sharing which could be mitigated more simply by other means, e.g. by scoping just to ETag-based caching?
>>>>>
>>>>> I remember the controversy and some real evidence of harm to users and businesses in 2020 when we partitioned the HTTP cache, but I was convinced that we had to accept that harm in order to credibly achieve 3PCD. At the time I was personally a fan of a proposal like this (even for users without 3PCs) in order to mitigate the harm.
>>>>> But now it seems to me that if we're going to start talking about poking holes in that decision, perhaps we should be doing a larger review of the options in that space, with the knowledge that most Chrome users are likely to continue to have 3PCs enabled. WDYT?
>>>>>
>>>>> Thanks,
>>>>> Rick
>>>>>
>>>>> On Mon, Oct 27, 2025 at 10:27 AM Patrick Meenan <[email protected]> wrote:
>>>>>
>>>>> I don't believe the security/privacy protections actually rely on the assertions (and it's unlikely those would be public). It's more for awareness, and to make sure resource owners don't accidentally break something in their app if they were relying on the responses being partitioned by site.
>>>>>
>>>>> As far as query params go, the browser code already filters for requests with no query params, so any resources that do rely on query params won't get included anyway.
>>>>>
>>>>> The same goes for cookies. Since the feature is only enabled when third-party cookies are enabled, adding cookies to these responses or putting unique content in them won't actually pierce any new boundaries, but it goes against the intent of only using it for public/static resources, and they'd lose the benefit of the shared cache when it gets updated. Same goes for the fingerprinting risks if the pattern were abused.
>>>>>
>>>>> On Mon, Oct 27, 2025 at 9:39 AM Mike Taylor <[email protected]> wrote:
>>>>>
>>>>> On 10/22/25 5:48 p.m., Patrick Meenan wrote:
>>>>>
>>>>> The candidate list goes down to 20k occurrences in order to catch resources that were updated mid-crawl and may have multiple entries with different hashes that add up to 100k+ occurrences. In the candidate list, without any filtering, the 100k cutoff is around 600 entries. I'd estimate that well under 25% of the candidates make it through the filtering for stable pattern, correct resource type, and reliable pattern. The first release will likely be 100-200, and I don't expect it will ever grow above 500.
>>>>>
>>>>> Thanks - I see the living document has been updated to mention 500 as a ceiling.
>>>>>
>>>>> As far as cadence goes, I expect there will be a lot of activity for the next few releases as individual patterns are coordinated with the origin owners, but then it will settle down to a much more bursty pattern of updates every few Chrome releases (likely linked to an origin changing their application and adding more/different resources). And yes, it is manual.
>>>>>
>>>>> As far as the process goes, resource owners need to actively assert that their resource is appropriate for the single-keyed cache and that they would like it included (usually in response to active outreach from us, but we have the external-facing list for owner-initiated contact as well). The design doc has the documentation for what it means to be appropriate (and the doc will be moved to a readme page in the repository next to the actual list so it's not a hard-to-find Google doc):
>>>>>
>>>>> Will there be any kind of public record of this assertion? What happens if a site starts using query params or sending cookies? Does the person in charge of manual list curation discover that in the next release?
>>>>> Does that require a new release (I don't know if this lives in the component updater, or in the binary itself)?
>>>>>
>>>>> *5. Require resource owner opt-in*
>>>>> For each URL to be included, reach out to the team/company responsible for the resource to validate the URL pattern and get assurances that the pattern will always serve the same content to all sites and will not be abused for tracking (by using unique URLs within the pattern mask as a bit-mask for fingerprinting). They will also need to validate that the URLs covered by the pattern do not rely on being able to set cookies over HTTP using a Set-Cookie HTTP response header, because those cookies will not be re-applied across cache boundaries (the Set-Cookie is not cached with the resource).
>>>>>
>>>>> On Wed, Oct 22, 2025 at 5:31 PM Mike Taylor <[email protected]> wrote:
>>>>>
>>>>> On 10/18/25 8:34 a.m., Patrick Meenan wrote:
>>>>>
>>>>> Sorry, I missed a step in making the candidate resource list public. I have moved it to my chromium account and made it public here <https://docs.google.com/spreadsheets/d/1TgWhdeqKbGm6hLM9WqnnXLn-iiO4Y9HTjDXjVO2aBqI/edit?usp=sharing>.
>>>>>
>>>>> Not everything in that list meets all of the criteria - it's just the first step in the manual curation (same URL served the same content across > 20k sites in the HTTP Archive dataset).
>>>>>
>>>>> The manual steps from there for meeting the criteria are basically:
>>>>>
>>>>> - Cull the list to scripts, stylesheets, and compression dictionaries.
>>>>> - Remove any URLs that use query parameters.
>>>>> - Exclude any responses that set cookies.
>>>>> - Identify URLs that are not manually versioned by site embedders (i.e. the embedded resource cannot go stale). This means either in-place-updating resources or automatically versioned resources.
>>>>> - Only include URLs that can reliably target a single resource by pattern (i.e. ..../<hash>-common.js but not ..../<hash>.js).
>>>>> - Get confirmation from the resource owner that the given URL pattern is, and will continue to be, appropriate for the single-keyed cache.
>>>>>
>>>>> A few questions on list curation:
>>>>>
>>>>> Can you clarify how big the list will be? The privacy review at https://chromestatus.com/feature/5202380930678784?gate=5174931459145728 mentions ~500, while the design doc mentions 1000. I see the candidate resource list starts at ~5000, then presumably manual curation begins to get to one of those numbers.
>>>>>
>>>>> What is the expected list curation/update cadence? Is it actually manual?
>>>>>
>>>>> Is there any recourse process for owners of resources that don't want to be included? Do we have documentation on what it means to be appropriate for the single-keyed cache?
>>>>>
>>>>> thanks,
>>>>> Mike
