On 8/27/2014 3:22 PM, Leif Hedstrom wrote:
On Aug 27, 2014, at 1:51 PM, Nick Kew <n...@apache.org> wrote:

On Wed, 27 Aug 2014 16:17:17 +0000
Rasim Saltuk Alakuş <rala...@turksat.com.tr> wrote:

Hi All,

ATS uses a URL hash for cache storage, and the CacheUrl plugin adds some more
flexibility in the URL hashing strategy.

We are thinking of creating a hash based on packet content and using it as the
key when storing to and retrieving from the cache. This looks like a better
solution, since URI changes won't hurt the caching system. One immediate
benefit: if you cache YouTube, each request for the same video can have a
different URL, and the CacheUrl plugin does not always provide a good solution.
Also, maintaining site-based hash filters does not look like an elegant solution.

Is there any previous or active work on implementing content-based hashing?
What kinds of problems and constraints do you foresee? Is there any volunteer to
implement this feature together with us?

Indeed, the whole scheme is BAD (Broken As Designed).
Using different URLs for common content breaks cacheing on
the Web at large, and hacking one agent (such as Trafficserver)
to work around it will gain you only a tiny fraction of what
you've thrown away.  Indeed, if every agent on the Web -
from origin servers to desktop browsers - implemented this
cacheing scheme, you'd still lose MOST of the benefits of
cacheing, as the same content passes through different paths.


I thought some more about this during a boring meeting; two more thoughts come
to mind:

1) Cache poisoning. This could be a serious problem; at a minimum, some defenses
such as using the Host: portion of the request for the cache key would be
required. But I’m guessing it would still be possible to abuse this to poison
the HTTP caches (since the client request + origin response headers no longer
dictate the cache lookup).

Good point on the cache poisoning. If the attacker knew your hash generation strategy (e.g. hash the first 1000 bytes of the file) and had access to a legitimate copy of that data, he could indeed inject bogus data into the non-hashed portion of the file.
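
As a rough illustration of that scenario, here is a minimal Python sketch. It assumes a hypothetical key scheme that hashes only the first 1000 bytes of the body with SHA-256; `PREFIX_LEN`, `prefix_key`, `full_key`, and the dict standing in for the cache are invented for the example and are not ATS APIs.

```python
import hashlib

PREFIX_LEN = 1000  # assumption: only the first 1000 bytes are hashed


def prefix_key(body: bytes) -> str:
    """Hypothetical cache key: SHA-256 over the first PREFIX_LEN bytes only."""
    return hashlib.sha256(body[:PREFIX_LEN]).hexdigest()


def full_key(body: bytes) -> str:
    """Key over the whole body, shown for comparison."""
    return hashlib.sha256(body).hexdigest()


# The attacker holds a legitimate copy and keeps its hashed prefix intact.
legit = b"A" * PREFIX_LEN + b"genuine payload"
forged = legit[:PREFIX_LEN] + b"attacker payload"

cache = {}                           # stand-in for the cache store
cache[prefix_key(forged)] = forged   # the forged object is stored first
served = cache[prefix_key(legit)]    # a lookup for the real object hits it
```

Both objects map to the same prefix key, so the forged body is what gets served. Hashing the full body avoids this particular collision, at the cost of reading the whole object before the key is known.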

Given the large number of potential hosts for a CDN, I think you want to generalize the host name before you add it to the lookup key. If the host name matches your expectations for a CDN, you can use a fixed name as part of the key. Otherwise, you use the host name as-is.
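
A minimal sketch of that host generalization, in Python. The pattern table, the `*.cdn.example.com` host names, and the `generalize_host` / `lookup_key` helpers are all hypothetical; a real deployment would need its own list of expected CDN host patterns.

```python
import hashlib
from fnmatch import fnmatchcase

# Hypothetical mapping: CDN host pattern -> fixed name used in the key.
CDN_PATTERNS = {
    "*.cdn.example.com": "cdn.example.com",
}


def generalize_host(host: str) -> str:
    """Return the fixed name for a recognized CDN host, else the host itself."""
    host = host.lower().rstrip(".")
    for pattern, fixed_name in CDN_PATTERNS.items():
        if fnmatchcase(host, pattern):
            return fixed_name
    return host


def lookup_key(host: str, content_hash: str) -> str:
    """Combine the generalized host with a content hash into one lookup key."""
    material = f"{generalize_host(host)}\n{content_hash}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


content_hash = hashlib.sha256(b"video bytes").hexdigest()
key_a = lookup_key("edge1.cdn.example.com", content_hash)
key_b = lookup_key("EDGE7.cdn.example.com", content_hash)
key_c = lookup_key("other.example.org", content_hash)
```

Here `key_a` and `key_b` are equal because both hosts collapse to the same fixed name, while `key_c` stays distinct because an unrecognized host is used unchanged.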
