Just as a side question, do we have statistics on the extent of duplication in the ATS cache? Say, how many URLs point to the same object on average? It seems like a trade-off between duplication and computation (space and time).
On Wed, Aug 27, 2014 at 1:22 PM, Leif Hedstrom <zw...@apache.org> wrote:

> On Aug 27, 2014, at 1:51 PM, Nick Kew <n...@apache.org> wrote:
>
> > On Wed, 27 Aug 2014 16:17:17 +0000
> > Rasim Saltuk Alakuş <rala...@turksat.com.tr> wrote:
> >
> >> Hi All,
> >>
> >> ATS uses a URL hash for cache storage, and the CacheUrl plugin adds some more flexibility in the URL hashing strategy.
> >>
> >> We are thinking of creating a hash based on the packet content and using it as the key when storing and retrieving from the cache. This looks like a better solution, since URI changes won't hurt the caching system. One immediate benefit: if you cache YouTube, for example, each request for the same video can have a different URL, and the CacheUrl plugin does not always provide a good solution. Also, maintaining site-based hash filters does not look like an elegant solution.
> >>
> >> Is there any previous or active work on implementing content-based hashing? What kinds of problems and constraints would you expect? Would anyone volunteer to implement this feature together with us?
> >
> > Indeed, the whole scheme is BAD (Broken As Designed).
> > Using different URLs for common content breaks caching on
> > the Web at large, and hacking one agent (such as Trafficserver)
> > to work around it will gain you only a tiny fraction of what
> > you've thrown away. Indeed, if every agent on the Web -
> > from origin servers to desktop browsers - implemented this
> > caching scheme, you'd still lose MOST of the benefits of
> > caching, as the same content passes through different paths.
>
> I thought some more on this over a boring meeting; two more thoughts come to mind:
>
> 1) Cache poisoning. This could be a serious problem; at a minimum, some defenses such as using the Host: portion of the request for the cache key would be required. But I'm guessing it would still be possible to abuse this to poison HTTP caches (since the client request + origin response headers no longer dictate the cache lookup).
>
> 2) HTTP/2. Although it supports non-TLS, several browser vendors have indicated they will not support H2 over plain text. So, assuming we're moving towards TLS across the board, this sort of interaction will get trickier. I personally think it'll have to evolve in a way that content owners participate better with caches. It's too early to say, but maybe such a proposal would encourage the YouTubes and Netflixes to behave better (in some way that they can still control content, ad impressions, click tracking, etc., yet allow ISPs to cache the actual content).
>
> Just my $0.01,
>
> — Leif
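
To make the trade-off concrete, here is a rough sketch in plain Python (this is not ATS code; the hosts, paths, and function names are made up for illustration) contrasting a URL-based cache key, with the Host: header mixed in as Leif suggests, against a content-addressed key:

    import hashlib

    def url_cache_key(host, url):
        # Conventional lookup: hash of Host + effective URL. Two different
        # URLs for the same bytes produce two different keys, i.e. duplication.
        return hashlib.sha256((host + url).encode("utf-8")).hexdigest()

    def content_cache_key(body):
        # Content-addressed lookup: hash of the response body itself. Any URL
        # serving the same bytes maps to one key, but the key is only known
        # after the object has been fetched (and hashed) at least once.
        return hashlib.sha256(body).hexdigest()

    body = b"...same video payload..."
    k1 = url_cache_key("video.example.com", "/videoplayback?id=abc&sig=111")
    k2 = url_cache_key("video.example.com", "/videoplayback?id=abc&sig=222")
    assert k1 != k2                       # stored twice under URL hashing
    assert content_cache_key(body) == content_cache_key(body)  # one copy, either URL

Whatever the key, the content hash can only be computed after the body has arrived, so the client request alone no longer determines the cache lookup; that is exactly why the poisoning defenses Leif mentions (and some request-to-object mapping) would still be needed.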