Hi, the attached patch fixes the issue with missing ACL entries (verified) and streamlines the code a bit.
Performance-wise it improves a bit, parsing the same 1M-entry ACL in 19.4 seconds (17.8 seconds in userland). On Mon, Nov 14, 2016 at 11:04 AM, Kinkie <gkin...@gmail.com> wrote: > On Mon, Nov 14, 2016 at 5:59 AM, Amos Jeffries <squ...@treenet.co.nz> wrote: >> On 14/11/2016 6:30 p.m., Alex Rousskov wrote: >>> On 11/13/2016 05:11 PM, Kinkie wrote: >>> >>>> the attached patch moves away from hand-rolling a c-string onto >>>> joining a SBufList for optimizing regexes in RegexData.cc. >>> >>>> You can find attached as a test case the output of squidclient >>>> mgr:config taken on trunk and on the submitted code. It is slightly >>>> different in that the new code allows for a longer optimized regex. >>> >>> I see the following before/after difference: >>> >>> -acl bigacl dstdom_regex ... (pattern84) (pattern86) ... >>> +acl bigacl dstdom_regex ... (pattern84)|(pattern85) ... >>> >>> Does changing space into "|" result in one internal regex instead of >>> two? >> >> It does / should. >> >>> And that is the optimization you have implemented? >> >> No, that was implemented long ago by Marcus. We got a ~20% speed >> increase from that. More complexity in the individual regex test seemed >> to counteract the fewer tests done. >> >> >>> How is that >>> related to SBufs (i.e., why could not old c-strings accommodate longer >>> regexes)? >> >> FYI (to kinkie): The 100 patterns per aggregation was an arbitrary >> limit, but the total string length is limited by the ConfigParser buffers. >> >> The plain-text length of the aggregated regex string needs to fit within >> one squid.conf line length. With space for the directive name, label, >> type and flags being included in the limit. > > that's BUFSIZ, or 4k. We currently have 1K + separator, name and flags. > If anything, we could be more aggressive in having longer regexes, > modulo OOM in regcomp. > >>> And why is there no pattern85 in the "before" test?! >>> >>> >>>> The code is expected to cause a small performance regression on >>>> parse and reconfigure due to extra data copies. The regression is >>>> negligible: the attached test cases perform "squid -k parse" in 60 >>>> msec in trunk and 62 on the branch on a warm cache on my i5 macbook >>>> air. >>> >>> A 3% performance regression is not necessarily negligible. Have you >>> tested with more ACL lines (not larger individual ACL lines) that take a >>> few seconds to load (as opposed to 60ms) ? If not, please do unless that >>> test would be irrelevant somehow. >>> >>> And what is the expected performance improvement from having fewer >>> longer regexes? >> >> I assume you mean performance of normal request processing through the >> regex ACL. Agreed, that need to be checked/compared to current speeds. > > Time needed for squid -k parse on my Mac 1.3GHz i5 macbook air, for a > 1M-entry random ACL, 13Mb-big: > new: 20.8s > trunk: 18.3s > > Hopefully in time we'll move the config parser to SBuf.. > > -- > Francesco -- Francesco
wordlist-sbuflist-aclregexdata-v2.patch
Description: Binary data
_______________________________________________ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev