On Fri, Sep 18, 2009 at 05:17:39PM -0400, Miguel Pilar Vilagran wrote:
> Regardless of how big a cluster...@! This seems to be, you're much, much 
> better off using a regex.

Well, I don't agree. It depends on the ability to factor the rules,
and of course on the regex library. Having a lot of patterns in a
single ACL is pretty fast: they're all just checked in a loop, and
haproxy knows where the end is. Also, one thing that can be done
is to enumerate them from the most common to the least common.

Using a regex, you'd need to match a large set of values prefixed
with '.*' (bad) and separated by '|'. I think the regex might help
when all values have a large common prefix, because then they will
be evaluated as a tree. But here it's not the case at all.
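
To make the comparison concrete, here is a rough Python sketch of the two
approaches. It is only an illustration, not haproxy's actual code: the
function names and the short suffix list (ordered as if by frequency) are
assumptions made up for the example.

```python
import re

# Hypothetical subset of the blocked extensions, most common first.
SUFFIXES = [".config", ".cs", ".mdb", ".asax"]

def blocked_by_loop(path):
    """Check each pattern in order with a plain suffix comparison."""
    for suffix in SUFFIXES:
        if path.endswith(suffix):
            return True
    return False

# Regex equivalent: a '.*' prefix followed by a '|'-separated alternation.
BLOCK_RE = re.compile(
    r".*(" + "|".join(re.escape(s) for s in SUFFIXES) + r")$"
)

def blocked_by_regex(path):
    """Same decision as blocked_by_loop, made by the regex engine."""
    return BLOCK_RE.match(path) is not None
```

Both functions give the same answer; the difference is only in how much
work the matcher has to do per request.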

If someone is interested in doing the test, I'd really suggest building
with libpcre, which is blazingly fast.

Oh, I've just run this ACL in a test config as-is (without regex).
Without the ACL, the test runs at 25900 hits/s. With the ACL, the
performance drops to 25660, which means approximately 1% performance
hit. I think this is pretty much acceptable.

I have tried to add the ACL 10 times (450 patterns) to get a better
figure. Performance drops to 24500 hits/s, or about 5.4%. We're at
2.2 us for 450 patterns, which translates into 4.9 ns or 16 CPU cycles
per pattern (my machine is a C2D at 3.2 GHz). That means it could
test about 200 million patterns per second. I don't think we can
reach that level with a complex regex.
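
The arithmetic behind these figures can be checked with a few lines of
Python. The inputs are the numbers measured above; the variable names are
only illustrative.

```python
base_rate = 25900.0   # hits/s without the ACL
acl_rate = 24500.0    # hits/s with the ACL repeated 10 times
patterns = 450        # 45 patterns x 10
cpu_hz = 3.2e9        # C2D at 3.2 GHz

# Extra time spent per request once the 450 patterns are checked.
extra_per_request = 1.0 / acl_rate - 1.0 / base_rate   # seconds

per_pattern = extra_per_request / patterns             # seconds per pattern
cycles_per_pattern = per_pattern * cpu_hz
patterns_per_second = 1.0 / per_pattern
slowdown = 1.0 - acl_rate / base_rate

print(f"extra per request : {extra_per_request * 1e6:.2f} us")
print(f"per pattern       : {per_pattern * 1e9:.2f} ns")
print(f"cycles per pattern: {cycles_per_pattern:.1f}")
print(f"patterns per sec  : {patterns_per_second / 1e6:.0f} million")
print(f"slowdown          : {slowdown * 100:.1f} %")
```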

> acl url_block   path_end .ad  .adprototype .asa .asax .ascx .axd .browser .cd 
> .cdx .cer .compiled .config .cs .csproj .dd .exclude .idc .java .jsl .ldb 
> .ldd .lddprototype .ldf .licx .master .mdb .mdf .msgx .phps .refresh .rem 
> .resources .resx .sd .sdm .sdmDocument .sitemap .skin .soap .svc .vb .vbproj 
> .vjsproj .vsdisco .webinfo

Regards,
Willy

