Hi,

On Tue, Mar 27, 2012 at 10:56:55AM +0100, Jonathan Matthews wrote:
> On 27 March 2012 03:24, fred hu <[email protected]> wrote:
> > Hi
> >
> > Actually this "problem" was not introduced in this version, it exists for a
> > long time.
> >
> > The scenario is using haproxy to load-balance http requests across different
> > cache servers based on a uri hash.
> >
> > This works well for normal request. But when end-users access via a http
> > proxy, the uri changed.
> >
> > The first line of a normal request:
> > GET /1.jpg HTTP/1.1
> >
> > The first line of a request via some http proxy:
> > GET http://www.google.com/1.jpg HTTP/1.1
> >
> > The result is that the same object is hashed to two different cache servers [A
> > and B].
> > But this is not the worst case, since only a little storage is wasted.
> >
> > The worst case is if the web administrator updates the original 1.jpg and
> > removes 1.jpg from cache server A.
> > The out-of-date object stored in cache server B will never be
> > removed/updated.
> >
> > I am not sure this is a bug or not, but I guess this might be one problem.
> 
> http://martinfowler.com/bliki/TwoHardThings.html
> 
> This is not a problem that HAProxy is in any place to help you with, IMHO :-)
> You will need to solve the cache invalidation problem another way.

Well, there would be a solution I think. A dirty one but a solution
nonetheless. Assuming that very few requests are emitted as an absolute
URI with a scheme, haproxy could be used to rewrite them before performing
the hash.

It would then look like this:

    reqrep ^([^:\ ]*\ )(http://[^/]*)(/.*) \1\3  if { url_beg http }

The effect will be that "http://hostname" would be stripped when found
(and only in this case, reducing the regex cost). Since URI hashing is
performed at the end, it would hash the rewritten URI. Your server will
get the rewritten URI BTW.
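
To illustrate what the rewrite does to the two request lines quoted above,
here is a minimal sketch in Python, not haproxy itself. It reuses the same
pattern as the reqrep rule (the escaped spaces become plain spaces), and the
function name normalize_request_line is made up for the example. The prefix
check only approximates the url_beg condition.

```python
import re

# Same pattern as the reqrep rule above, written for Python's re module.
ABSOLUTE_URI = re.compile(r"^([^: ]* )(http://[^/]*)(/.*)")


def normalize_request_line(line: str) -> str:
    """Strip "http://hostname" from a request line that uses an absolute URI."""
    parts = line.split(" ", 2)
    # Cheap prefix test first, so the regex only runs on absolute URIs
    # (an approximation of the "url_beg http" condition).
    if len(parts) < 2 or not parts[1].startswith("http"):
        return line
    # Keep the method (group 1) and the path plus version (group 3).
    return ABSOLUTE_URI.sub(r"\1\3", line, count=1)


direct = normalize_request_line("GET /1.jpg HTTP/1.1")
proxied = normalize_request_line("GET http://www.google.com/1.jpg HTTP/1.1")
```

Both calls yield the same request line, so a hash computed on the URI
afterwards would pick the same cache server for both.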

But it's dirty.

Willy

