Just as a side question, do we have statistics on the extent of duplication in the ATS cache? Say, how many URLs point to the same object on average? It seems like a trade-off between duplication and computation (space and time).
On Wed, Aug 27, 2014 at 1:22 PM, Leif Hedstrom <zw...@apache.org> wrote:

> On Aug 27, 2014, at 1:51 PM, Nick Kew <n...@apache.org> wrote:
>
> > On Wed, 27 Aug 2014 16:17:17 +0000
> > Rasim Saltuk Alakuş <rala...@turksat.com.tr> wrote:
> >
> >> Hi All,
> >>
> >> ATS uses a URL hash for cache storage, and the CacheUrl plugin adds some more flexibility in the URL hashing strategy.
> >>
> >> We are thinking of creating a hash based on the packet content and using it as the key when storing and retrieving from the cache. This looks like a better solution, since URI changes won't hurt the caching system. One immediate benefit: if you cache YouTube, for example, each request for the same video can have a different URL, and the CacheUrl plugin does not always provide a good solution. Also, maintaining site-based hash filters does not look like an elegant solution.
> >>
> >> Is there any previous or active work on implementing content-based hashing? What kinds of problems and constraints would you expect? Would anyone volunteer to implement this feature together with us?
> >
> > Indeed, the whole scheme is BAD (Broken As Designed).
> > Using different URLs for common content breaks caching on
> > the Web at large, and hacking one agent (such as Trafficserver)
> > to work around it will gain you only a tiny fraction of what
> > you've thrown away. Indeed, if every agent on the Web -
> > from origin servers to desktop browsers - implemented this
> > caching scheme, you'd still lose MOST of the benefits of
> > caching, as the same content passes through different paths.
>
> I thought some more on this over a boring meeting; two more thoughts come to mind:
>
> 1) Cache poisoning. This could be a serious problem; at a minimum, some defenses such as using the Host: portion of the request for the cache key would be required. But I'm guessing it would still be possible to abuse this to poison HTTP caches (since the client request + origin response headers no longer dictate the cache lookup).
>
> 2) HTTP/2. Although it supports non-TLS, several browser vendors have indicated they will not support H2 over plain text. So, assuming we're moving towards TLS across the board, this sort of interaction will get trickier. I personally think it'll have to evolve in a way that content owners participate better with caches. It's too early to say, but maybe such a proposal would encourage the YouTubes and Netflixes to behave better (in some way that they can still control content, ad impressions, click tracking, etc., yet allow ISPs to cache the actual content).
>
> Just my $0.01,
>
> — Leif
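
To make the trade-off concrete, here is a rough sketch in plain Python (this is not ATS code; the hosts, paths, and function names are made up for illustration) contrasting a URL-based cache key, with the Host: header mixed in as Leif suggests, against a content-addressed key:

    import hashlib

    def url_cache_key(host, url):
        # Conventional lookup: hash of Host + effective URL. Two different
        # URLs for the same bytes produce two different keys, i.e. duplication.
        return hashlib.sha256((host + url).encode("utf-8")).hexdigest()

    def content_cache_key(body):
        # Content-addressed lookup: hash of the response body itself. Any URL
        # serving the same bytes maps to one key, but the key is only known
        # after the object has been fetched (and hashed) at least once.
        return hashlib.sha256(body).hexdigest()

    body = b"...same video payload..."
    k1 = url_cache_key("video.example.com", "/videoplayback?id=abc&sig=111")
    k2 = url_cache_key("video.example.com", "/videoplayback?id=abc&sig=222")
    assert k1 != k2                       # stored twice under URL hashing
    assert content_cache_key(body) == content_cache_key(body)  # one copy, either URL

Whatever the key, the content hash can only be computed after the body has arrived, so the client request alone no longer determines the cache lookup; that is exactly why the poisoning defenses Leif mentions (and some request-to-object mapping) would still be needed.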