------------------------------------------------------------------------
*From: *Patrick Hemmer <hapr...@stormcloud9.net>
*Sent: * 2013-10-22 23:32:31 E
*CC: *haproxy@formilux.org
*Subject: *Re: handling hundreds of reqrep statements

>
>
> ------------------------------------------------------------------------
> *From: *Patrick Hemmer <hapr...@stormcloud9.net>
> *Sent: * 2013-10-22 19:13:08 E
> *To: *haproxy@formilux.org
> *Subject: *handling hundreds of reqrep statements
>
>> I'm currently using haproxy (1.5-dev19) as a content based router. It
>> takes an incoming request, looks at the url, rewrites it, and sends
>> it on to the appropriate back end.
>> The difficult part is that we need to stop all parsing and rewriting after
>> the first match. This is because we might have a url such as
>> '/foo/bar' which rewrites to '/foo/baz', and another rewrite from
>> '/foo/b' to '/foo/c'. As you can see both rules would try to trigger
>> a rewrite on '/foo/bar/shot', and we'd end up with '/foo/caz/shot'.
>> Additionally there are hundreds of these rewrites (the config file is
>> generated from a mapping).
>>
>> There are 2 questions here:
>>
>> 1) I currently have this working using stick tables (it's unpleasant
>> but it works).
>> It basically looks like this:
>> frontend frontend1
>>     acl foo_bar path_reg ^/foo/bar
>>     use_backend backend1 if foo_bar
>>
>>     acl foo_b path_reg ^/foo/b
>>     use_backend backend1 if foo_b
>>
>> backend backend1
>>     stick-table type integer size 1 store gpc0 # create a stick table
>> to store one entry
>>     tcp-request content track-sc1 always_false # enable tracking on
>> sc1. The `always_false` doesn't matter, it just requires a key, so we
>> give it one
>>     acl rewrite-init sc1_clr_gpc0 ge 0 # ACL to clear gpc0
>>     tcp-request content accept if rewrite-init # clear gpc0 on the
>> start of every request
>>     acl rewrite-empty sc1_get_gpc0 eq 0 # ACL to check if gpc0 has
>> been set
>>     acl rewrite-set sc1_inc_gpc0 ge 0 # ACL to set gpc0 when a
>> rewrite has matched
>>
>>     acl foo_bar path_reg ^/foo/bar
>>     reqrep ^(GET|POST)\ /foo/bar(.*) \1\ /foo/baz\2 if rewrite-empty
>> foo_bar rewrite-set # the conditional first checks if another rewrite
>> has matched, then checks the foo_bar acl, and then performs the
>> rewrite-set only if foo_bar matched
>>
>>     acl foo_b path_reg ^/foo/b
>>     reqrep ^(GET|POST)\ /foo/b(.*) \1\ /foo/c\2 if rewrite-empty
>> foo_b rewrite-set # same procedure as above
>>
>> (my actual rules are a bit more complicated, but those examples
>> exhibit all the problem points I have).
>>
>> The cleaner way I thought of handling this was to instead do
>> something like this:
>> backend backend1
>>     acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found
>>
>>     acl foo_bar path_reg ^/foo/bar
>>     reqrep ^(GET|POST)\ /foo/bar(.*) \1\ /foo/baz\2\r\nX-Rewrite-ID:\
>> foo_bar if !rewrite-found foo_bar
>>
>>     acl foo_b path_reg ^/foo/b
>>     reqrep ^(GET|POST)\ /foo/b(.*) \1\ /foo/c\2\r\nX-Rewrite-ID:\
>> foo_b if !rewrite-found foo_b
>>
>> But this doesn't work. The rewrite-found acl never finds the header
>> and so both reqrep commands run. Is there any better way of doing
>> this than the nasty stick table?
>>
>>
>> 2) I would also like to add a field to the log indicating which rule
>> matched. I can't figure out a way to accomplish this bit.
>> Since the config file is automatically generated, I was hoping to
>> just assign a short numeric ID and stick that in the log somehow. The
>> only way I can think that this could work is by adding a header
>> conditionally using an acl (or use the header created by the
>> alternate idea above), and then using `capture request header` to add
>> that to the log. But it does not appear haproxy can capture headers
>> added by itself.
>>
>> -Patrick
>
> Ok, so I went home and resumed trying to figure this out, starting
> from scratch on a whole new machine. Well guess what, the "cleaner"
> way worked. After many proclamations of "WTF?" out loud (my dog was
> getting concerned), I think I found a bug. And I cannot begin to
> describe just how awesome this bug is.
>
> Here's how you can duplicate this awesomeness:
>
> Start a haproxy with the following config:
> defaults
>     mode http
>     timeout connect 1000
>     timeout client 1000
>     timeout server 1000
>
> frontend frontend
>     bind *:2082
>
>     maxconn 20000
>
>     acl rewrite-found req.hdr(X-Header-ID) -m found
>
>     reqrep ^(GET)\ /foo/(.*) \1\ /foo/\2\r\nX-Header-ID:\ bar if
> !rewrite-found
>     reqrep ^(GET)\ /foo/(.*) \1\ /foo/\2\r\nX-Header-ID:\ pop if
> !rewrite-found
>     reqrep ^(GET)\ /foo/(.*) \1\ /foo/\2\r\nX-Header-ID:\ tart if
> !rewrite-found
>
>     default_backend backend
>
> backend backend
>     server server 127.0.0.1:2090
>
>
>
> Start up a netcat:
> while true; do nc -l -p 2090; done
>
>
> Create a file with the following contents (I'll presume we call it
> "data"):
> GET /foo/ HTTP/1.1
> Accept: */*
> User-Agent: Agent
> Host: localhost:2082
>
>
> (with the empty line on the bottom)
>
> And now run:
> nc localhost 2082 < data
>
> In your listening netcat, notice you got 3 "X-Header-ID" headers.
>
> Now in your "data" file, move the "Accept: */*" down one line, so it's
> after the User-Agent and retry. Notice you only get 1 "X-Header-ID"
> back. It works!
>
> But wait, it gets even better. Put the "Accept: */*" line back where
> it was, and in the haproxy config, replace all "X-Header-ID" with
> "X-HeaderID" (just remove the second dash, really just remove any
> character, it seems to be length that matters). Then redo the request.
> Notice it works!
>
> I've said it numerous times tonight, WTF? :-)
>
>
> -Patrick
So I did a little more playing around, and this is what appears to be
going on:
It seems that when the request first comes in, haproxy allocates a
buffer for every header. If the header is "X-Foo: bar" it allocates a 10
character buffer. When you do `reqrep` on the request line, and add a
line at the end with the "\r\n" it moves every header down by one. So
"X-Foo: bar" ends up in the buffer for whatever header was after it. If
that buffer isn't big enough to hold the whole thing, then when haproxy
goes to look for a matching header it won't find it. So in my case,
"X-Header-ID: foo" got put in the buffer for "Accept: */*". Since
X-Header-ID is one character longer than that buffer, when haproxy went
looking for it, it was only finding "X-Header-I".

Now I thought I got the "\r\n" rewrite trick from the haproxy man page,
but it appears I'm nuts as I see nothing of the sort. I can't remember
where I got it from now.

So, going back to the drawing board with my original 2 questions, I
think I have a solution:

backend backend1
    acl rewrite-found req.hdr(X-Rewrite-ID) -m found
    http-request set-header X-Original-Path %[path]

    acl foo_bar req.hdr(X-Original-Path) -m reg ^/foo/bar
    http-request set-header X-Rewrite-ID foo_bar if !rewrite-found foo_bar
    acl foo_bar_header req.hdr(X-Rewrite-ID) -m str foo_bar
    reqrep ^(GET)\ /foo/bar(.*) \1\ /foo/baz\2 if foo_bar_header

    acl foo_b req.hdr(X-Original-Path) -m reg ^/foo/b
    http-request set-header X-Rewrite-ID foo_b if !rewrite-found foo_b
    acl foo_b_header req.hdr(X-Rewrite-ID) -m str foo_b
    reqrep ^(GET)\ /foo/b(.*) \1\ /foo/c\2 if foo_b_header


This way '/foo/barbaz' -> '/foo/bazbaz' and '/foo/ba' -> '/foo/ca'.

It's cumbersome, but less hackish than the stick table. It relies upon
the fact that `http-request set-header` happens before `reqrep`.
I would still prefer a simpler solution if possible.
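
To make the intended behaviour concrete, here is a minimal sketch in Python (not haproxy config) of the matching logic only. It assumes the two example rules from this thread; the names RULES, rewrite_naive and rewrite_first_match are made up for illustration and say nothing about how haproxy works internally. It contrasts applying every rule in turn with picking the first rule that matches the original path and rewriting exactly once.

```python
import re

# Ordered rules taken from the examples in this thread:
# (rule id, regex matched against the path, replacement).
RULES = [
    ("foo_bar", re.compile(r"^/foo/bar(.*)"), r"/foo/baz\1"),
    ("foo_b", re.compile(r"^/foo/b(.*)"), r"/foo/c\1"),
]


def rewrite_naive(path):
    """Apply every rule in order, so later rules see earlier rewrites."""
    for _rule_id, pattern, replacement in RULES:
        path = pattern.sub(replacement, path)
    return path


def rewrite_first_match(path):
    """Tag the first rule matching the ORIGINAL path, then rewrite once."""
    # Phase 1: choose a rule id, loosely analogous to setting X-Rewrite-ID
    # from X-Original-Path only when no id has been set yet.
    rewrite_id = None
    for rule_id, pattern, _replacement in RULES:
        if rewrite_id is None and pattern.search(path):
            rewrite_id = rule_id

    # Phase 2: apply only the rule whose id was chosen in phase 1.
    for rule_id, pattern, replacement in RULES:
        if rule_id == rewrite_id:
            path = pattern.sub(replacement, path)
    return path, rewrite_id


if __name__ == "__main__":
    print(rewrite_naive("/foo/bar/shot"))
    for p in ("/foo/bar/shot", "/foo/barbaz", "/foo/ba", "/other"):
        print(p, "->", rewrite_first_match(p))
```

The returned rule id is the same value one would want in the log for question 2; a path that matches no rule comes back unchanged with an id of None.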



-Patrick
