------------------------------------------------------------------------
From: Patrick Hemmer <hapr...@stormcloud9.net>
Sent: 2013-10-22 19:13:08 E
To: haproxy@formilux.org
Subject: handling hundreds of reqrep statements

> I'm currently using haproxy (1.5-dev19) as a content based router. It
> takes an incoming request, looks at the url, rewrites it, and sends it
> on to the appropriate back end.
> The difficult part is that we need to stop all parsing and rewriting after
> the first match. This is because we might have a url such as
> '/foo/bar' which rewrites to '/foo/baz', and another rewrite from
> '/foo/b' to '/foo/c'. As you can see both rules would try to trigger a
> rewrite on '/foo/bar/shot', and we'd end up with '/foo/caz/shot'.
> Additionally there are hundreds of these rewrites (the config file is
> generated from a mapping).
>
> There are 2 questions here:
>
> 1) I currently have this working using stick tables (it's unpleasant
> but it works).
> It basically looks like this:
> frontend frontend1
>     acl foo_bar path_reg ^/foo/bar
>     use_backend backend1 if foo_bar
>
>     acl foo_b path_reg ^/foo/b
>     use_backend backend1 if foo_b
>
> backend backend1
>     # create a stick table to store one entry
>     stick-table type integer size 1 store gpc0
>     # enable tracking on sc1. The `always_false` doesn't matter, it just
>     # requires a key, so we give it one
>     tcp-request content track-sc1 always_false
>     # ACL to clear gpc0
>     acl rewrite-init sc1_clr_gpc0 ge 0
>     # clear gpc0 on the start of every request
>     tcp-request content accept if rewrite-init
>     # ACL to check if gpc0 has been set
>     acl rewrite-empty sc1_get_gpc0 eq 0
>     # ACL to set gpc0 when a rewrite has matched
>     acl rewrite-set sc1_inc_gpc0 ge 0
>
>     acl foo_bar path_reg ^/foo/bar
>     # the conditional first checks if another rewrite has matched, then
>     # checks the foo_bar acl, and then performs the rewrite-set only if
>     # foo_bar matched
>     reqrep ^(GET|POST)\ /foo/bar(.*) \1\ /foo/baz\2 if rewrite-empty foo_bar rewrite-set
>
>     acl foo_b path_reg ^/foo/b
>     # same procedure as above
>     reqrep ^(GET|POST)\ /foo/b(.*) \1\ /foo/c\2 if rewrite-empty foo_b rewrite-set
>
> (my actual rules are a bit more complicated, but those examples
> exhibit all the problem points I have).
>
> The cleaner way I thought of handling this was to instead do something
> like this:
> backend backend1
>     acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found
>
>     acl foo_bar path_reg ^/foo/bar
>     reqrep ^(GET|POST)\ /foo/bar(.*) \1\ /foo/baz\2\r\nX-Rewrite-ID:\ foo_bar if !rewrite-found foo_bar
>
>     acl foo_b path_reg ^/foo/b
>     reqrep ^(GET|POST)\ /foo/b(.*) \1\ /foo/c\2\r\nX-Rewrite-ID:\ foo_b if !rewrite-found foo_b
>
> But this doesn't work. The rewrite-found acl never finds the header
> and so both reqrep commands run. Is there any better way of doing this
> than the nasty stick table?
>
>
> 2) I would also like to add a field to the log indicating which rule
> matched. I can't figure out a way to accomplish this bit.
> Since the config file is automatically generated, I was hoping to just
> assign a short numeric ID and stick that in the log somehow. The only
> way I can think that this could work is by adding a header
> conditionally using an acl (or use the header created by the alternate
> idea above), and then using `capture request header` to add that to
> the log. But it does not appear haproxy can capture headers added by
> itself.
>
> -Patrick
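
To make the double-rewrite problem from the quoted message concrete, here is a small Python sketch that has nothing to do with haproxy internals. It only mimics "apply every rule in order" versus "stop after the first match" on the two example rules. The names RULES, rewrite_all and rewrite_first_match are made up for illustration, and the real generated config has hundreds of rules, not two.

```python
import re

# Hypothetical rule list mirroring the two example rewrites from the
# quoted message (assumption: rules are tried in config order).
RULES = [
    (re.compile(r"^/foo/bar(.*)"), r"/foo/baz\1"),
    (re.compile(r"^/foo/b(.*)"), r"/foo/c\1"),
]


def rewrite_all(path):
    """Apply every rule in order, like back-to-back unconditional rewrites."""
    for pattern, replacement in RULES:
        path = pattern.sub(replacement, path)
    return path


def rewrite_first_match(path):
    """Apply only the first rule that matches, then stop."""
    for pattern, replacement in RULES:
        new_path, count = pattern.subn(replacement, path)
        if count:
            return new_path
    return path


if __name__ == "__main__":
    print(rewrite_all("/foo/bar/shot"))          # both rules fire
    print(rewrite_first_match("/foo/bar/shot"))  # only the first rule fires
```

The first call shows the unwanted chained result, the second is the behaviour the stick table and the X-Rewrite-ID header are both trying to get.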

Ok, so I went home and resumed trying to figure this out, starting from
scratch on a whole new machine. Well guess what, the "cleaner" way
worked. After many proclamations of "WTF?" out loud (my dog was getting
concerned), I think I found a bug. And I cannot begin to describe just
how awesome this bug is.

Here's how you can duplicate this awesomeness:

Start a haproxy with the following config:
defaults
    mode http
    timeout connect 1000
    timeout client 1000
    timeout server 1000

frontend frontend
    bind *:2082

    maxconn 20000

    acl rewrite-found req.hdr(X-Header-ID) -m found

    reqrep ^(GET)\ /foo/(.*) \1\ /foo/\2\r\nX-Header-ID:\ bar if !rewrite-found
    reqrep ^(GET)\ /foo/(.*) \1\ /foo/\2\r\nX-Header-ID:\ pop if !rewrite-found
    reqrep ^(GET)\ /foo/(.*) \1\ /foo/\2\r\nX-Header-ID:\ tart if !rewrite-found

    default_backend backend

backend backend
    server server 127.0.0.1:2090



Start up a netcat:
while true; do nc -l -p 2090; done


Create a file with the following contents (I'll presume we call it "data"):
GET /foo/ HTTP/1.1
Accept: */*
User-Agent: Agent
Host: localhost:2082


(with the empty line on the bottom)

And now run:
nc localhost 2082 < data

In your listening netcat, notice you got 3 "X-Header-ID" headers.
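
If eyeballing the netcat output gets tedious, the count can be automated. This is only a rough sketch: count_header is a made-up helper, and it assumes you have captured the raw bytes of what the backend received (for example by redirecting the listening netcat's output to a file). It accepts both CRLF and bare LF line endings, since the injected header uses \r\n while the "data" file may not.

```python
def count_header(raw_request: bytes, name: str) -> int:
    """Count header lines called `name` (case-insensitive) in a raw request."""
    wanted = name.lower().encode("ascii")
    count = 0
    lines = raw_request.splitlines()  # handles \r\n and bare \n alike
    for line in lines[1:]:            # skip the request line
        if not line:                  # a blank line ends the header block
            break
        field, sep, _value = line.partition(b":")
        if sep and field.strip().lower() == wanted:
            count += 1
    return count
```

Calling count_header(captured_bytes, "X-Header-ID") on the capture should give 3 in the broken case and 1 when only the first reqrep fired.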

Now in your "data" file, move the "Accept: */*" down one line, so it's
after the User-Agent and retry. Notice you only get 1 "X-Header-ID"
back. It works!
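
Since the only thing that changes between the two runs is the header order, it may be easier to generate both variants than to keep editing the "data" file by hand. A minimal sketch, with some assumptions: build_request, send, ORIGINAL and SWAPPED are names invented here, and the line ending defaults to a bare "\n" on the guess that the file was written in an ordinary text editor. Pass eol="\r\n" if yours uses CRLF.

```python
import socket

# Header order from the original "data" file, and the order after moving
# "Accept" below "User-Agent".
ORIGINAL = [("Accept", "*/*"), ("User-Agent", "Agent"), ("Host", "localhost:2082")]
SWAPPED = [("User-Agent", "Agent"), ("Accept", "*/*"), ("Host", "localhost:2082")]


def build_request(headers, path="/foo/", eol="\n"):
    """Return the raw request bytes, ending with one empty line."""
    lines = ["GET %s HTTP/1.1" % path]
    lines += ["%s: %s" % (key, value) for key, value in headers]
    return (eol.join(lines) + eol + eol).encode("ascii")


def send(data, host="localhost", port=2082):
    """Send the raw bytes to the haproxy frontend (not called automatically)."""
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(data)
```

Running send(build_request(ORIGINAL)) and then send(build_request(SWAPPED)) against the config above should reproduce the two cases. Both requests have exactly the same length, so only the order of the headers differs between them.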

But wait, it gets even better. Put the "Accept: */*" line back where it
was, and in the haproxy config, replace all "X-Header-ID" with
"X-HeaderID" (just remove the second dash, really just remove any
character, it seems to be length that matters). Then redo the request.
Notice it works!

I've said it numerous times tonight, WTF? :-)


-Patrick
