On 03/20/2011 08:44 PM, Karsten Bräckelmann forwarded
From: Matt Elson
> I have no idea why, but it seems:
> \s proceeded by three or more characters and tflags multiple
> regularly hits the problem for me.
I don't have much experience with non-production re2c; how do I properly
reproduce (and therefore test) this bug on svn trunk?
I would like to try this, which should be a faster regex anyway:
/free\s[ptc](?:ill|ablet|ap(?:sule|let))s/i
I also wanted to try a leading word boundary ("\b") in front of the regex,
though I don't know how many spams that would miss.
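A rough way to sanity-check both ideas before touching trunk is sketched below. It uses Python's re module as a stand-in for Perl (an assumption; the constructs involved behave the same in both for these patterns), the factored pattern is written with its parentheses balanced so that it compiles, and the sample strings are made up, not taken from real spam. Note that the factored form is not strictly the same rule: it also accepts strings such as "free tills".

```python
import re

# Original __PILL_PRICE_3 pattern from the rules.
ORIG = re.compile(r"free\s(?:pill|tablet|cap(?:sule|let))s", re.I)
# Factored variant, with the closing parenthesis balanced.
FACTORED = re.compile(r"free\s[ptc](?:ill|ablet|ap(?:sule|let))s", re.I)
# Factored variant with a leading word boundary.
BOUNDED = re.compile(r"\bfree\s[ptc](?:ill|ablet|ap(?:sule|let))s", re.I)

# Hypothetical sample strings, for illustration only.
SAMPLES = [
    "get free pills today",
    "FREE Tablets inside",
    "free capsules",
    "free caplets",
    "carefree pills",   # "free" is not at a word boundary here
    "free tills",       # only the factored form accepts this
    "free pill",        # no trailing "s", so nothing should hit
]


def hits(rx):
    """Return the samples that the compiled regex matches anywhere."""
    return [s for s in SAMPLES if rx.search(s)]


if __name__ == "__main__":
    for name, rx in (("orig", ORIG), ("factored", FACTORED), ("bounded", BOUNDED)):
        print(name, hits(rx))
```

On these samples the \b variant skips "carefree pills", which is the kind of hit a word boundary would give up.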
While looking at the PILL_PRICE rules, I noticed this one:
body __PILL_PRICE_1
m;\$?[\d\s.]{3,8}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let));i
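What is the point of leading with an optional piece? Since the match is
unanchored, neither the optional \$? nor the {3,8} upper bound changes
whether the rule hits, so that regex is equivalent to this simpler one: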
m;[\d\s.]{3}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let));i
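The equivalence claim can be spot-checked with a small script. This is only a sketch: it uses Python's re module in place of Perl (an assumption, though the syntax used here is common to both), it compares hit/no-hit only, and the sample strings are invented for illustration.

```python
import re

TAIL = r"(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let))"
# __PILL_PRICE_1 as it stands in the rules.
ORIG = re.compile(r"\$?[\d\s.]{3,8}" + TAIL, re.I)
# Simplified form: no optional lead, exact count of 3.
SIMPLE = re.compile(r"[\d\s.]{3}" + TAIL, re.I)

# Hypothetical sample strings, for illustration only.
SAMPLES = [
    "only $1.50 per pill",
    "0.99/tablet",
    "12345678 each capsule",
    "$ 2 each caplet",
    "99 per pill",
    "9 per pill",            # only two class characters before "per"
    "cheap pills per pill",  # no price at all
]


def same_verdict(text):
    """True if both patterns agree on whether the text hits."""
    return bool(ORIG.search(text)) == bool(SIMPLE.search(text))


if __name__ == "__main__":
    for s in SAMPLES:
        print(f"{s!r}: orig={bool(ORIG.search(s))} simple={bool(SIMPLE.search(s))}")
```

Agreement on a handful of strings is not a proof, but the argument is short: any hit of the original ends with at least three class characters before the separator, and the simplified pattern matches that suffix.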
Another point: what if we merge _1 and _3 from
_1 m;\$?[\d\s.]{3,8}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let));i
_2 /(?:pill|tablet|cap(?:sule|let))s\s\$?[\d\s.]{3,8}/i
_3 /free\s(?:pill|tablet|cap(?:sule|let))s/i
into (note removal of _1's optional lead)
m;(?:[\d\s.]{3}(?:/|per|each)|free)\s?(?:pill|tablet|cap(?:sule|let));i
Matt already showed that disabling _1 and _2 didn't prevent the problem
with _3, so this isn't as much of a potential remedy as it initially
seems, but it should be slightly more efficient and might avoid the re2c
bug.
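For anyone who wants to compare the merged pattern against _1 and _3 before trying it, here is a minimal sketch. It again uses Python's re module as a stand-in for Perl (an assumption), and the sample strings are hypothetical. One thing it makes visible: the merged form is looser than _3, because the whitespace after "free" becomes optional and the trailing "s" is no longer required.

```python
import re

WORD = r"(?:pill|tablet|cap(?:sule|let))"
# __PILL_PRICE_1 and __PILL_PRICE_3 as listed above.
P1 = re.compile(r"\$?[\d\s.]{3,8}(?:/|per|each)\s?" + WORD, re.I)
P3 = re.compile(r"free\s" + WORD + r"s", re.I)
# Proposed merge of _1 and _3.
MERGED = re.compile(r"(?:[\d\s.]{3}(?:/|per|each)|free)\s?" + WORD, re.I)

# Hypothetical sample strings, for illustration only.
SAMPLES = [
    "1.50 per pill",
    "free pills",
    "free pill",      # singular: _3 does not hit
    "freetablet",     # no whitespace: _3 does not hit
    "no drugs here",
]


def verdicts(text):
    """Return (hit by _1 or _3, hit by the merged pattern)."""
    separate = bool(P1.search(text) or P3.search(text))
    return separate, bool(MERGED.search(text))


if __name__ == "__main__":
    for s in SAMPLES:
        print(f"{s!r}: separate={verdicts(s)[0]} merged={verdicts(s)[1]}")
```

Whether the extra hits matter depends on how the meta rule combines these subrules, so the scores would need a fresh look if the merge goes in.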
