On 03/20/2011 08:44 PM, Karsten Bräckelmann forwarded a message from Matt Elson:
> I have no idea why, but it seems:
> \s preceded by three or more characters and tflags multiple
> regularly hits the problem for me.

I don't have much experience with non-production re2c; how do I properly
reproduce (and therefore test) this bug on svn trunk?

I would want to try this, which should be a faster regex anyway:

/free\s[ptc](?:ill|ablet|ap(?:sule|let))s/i

I also want to try a leading word boundary ("\b") in front of the regex,
though I don't know how many spams that would miss.
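
A quick way to sanity-check the factored pattern and the leading "\b"
outside SpamAssassin is a throwaway script.  This is only a sketch: it
uses Python's re module as a stand-in for Perl's regex engine (the
constructs used here should behave the same in both), the sample strings
are made up, and it says nothing about the re2c problem itself.

```python
import re

# _3 as it stands in the ruleset, the factored variant with balanced
# parentheses, and the factored variant with a leading word boundary.
ORIGINAL = re.compile(r"free\s(?:pill|tablet|cap(?:sule|let))s", re.I)
FACTORED = re.compile(r"free\s[ptc](?:ill|ablet|ap(?:sule|let))s", re.I)
BOUNDED = re.compile(r"\bfree\s[ptc](?:ill|ablet|ap(?:sule|let))s", re.I)

# Hypothetical sample bodies, chosen to show where the variants differ.
SAMPLES = [
    "Get FREE pills today",
    "free caplets inside",
    "carefree pills",
    "free tills",
]


def hits(pattern, samples=SAMPLES):
    """Return the samples in which the pattern matches somewhere."""
    return [s for s in samples if pattern.search(s)]


if __name__ == "__main__":
    for name, pat in (("original", ORIGINAL),
                      ("factored", FACTORED),
                      ("bounded", BOUNDED)):
        print(name, hits(pat))
```

On these samples the factored form also hits "free tills", because
[ptc] can pair with any of the three tails, and the "\b" variant no
longer hits "carefree pills".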

While looking at the PILL_PRICE rules,

body  __PILL_PRICE_1
m;\$?[\d\s.]{3,8}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let));i

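What is the point of leading with an optional piece?  Since the regex is
unanchored, neither the optional "\$?" nor the {3,8} upper bound changes
what it hits, so it should be equivalent to this simpler one: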

m;[\d\s.]{3}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let));i
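
The equivalence can be checked by brute force.  The sketch below again
uses Python's re module in place of Perl, and the alphabet, length limit
and function name are arbitrary choices made for illustration.  It only
compares hit versus no hit, not the matched text.

```python
import re
from itertools import product

TAIL = r"(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let))"
FULL = re.compile(r"\$?[\d\s.]{3,8}" + TAIL, re.I)    # __PILL_PRICE_1 as is
SIMPLE = re.compile(r"[\d\s.]{3}" + TAIL, re.I)       # proposed simplification


def disagreements(max_len=6, alphabet="$1 ./"):
    """Return the generated strings on which the two patterns disagree."""
    bad = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            # Every candidate ends in "pill" so that a match is possible.
            text = "".join(chars) + "pill"
            if bool(FULL.search(text)) != bool(SIMPLE.search(text)):
                bad.append(text)
    return bad


if __name__ == "__main__":
    print(disagreements())
```

An empty list means the two patterns agreed on every generated string,
which is what the argument above predicts.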

Another point: what if we merge _1 and _3 from

_1 m;\$?[\d\s.]{3,8}(?:/|per|each)\s?(?:pill|tablet|cap(?:sule|let));i
_2 /(?:pill|tablet|cap(?:sule|let))s\s\$?[\d\s.]{3,8}/i
_3 /free\s(?:pill|tablet|cap(?:sule|let))s/i

into (note removal of _1's optional lead):

m;(?:[\d\s.]{3}(?:/|per|each)|free)\s?(?:pill|tablet|cap(?:sule|let));i

Matt already showed that disabling _1 and _2 didn't prevent the problem
with _3, so this isn't as much of a potential remedy as it initially
seems, but it should be slightly more efficient and might avoid the re2c
bug.
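
For comparing the merged pattern against _1 and _3 before trying it on
trunk, something like the following may help.  It is a sketch with
made-up sample strings, using Python's re module rather than Perl, and
it does not exercise re2c at all.

```python
import re

WORD = r"(?:pill|tablet|cap(?:sule|let))"
RULE_1 = re.compile(r"\$?[\d\s.]{3,8}(?:/|per|each)\s?" + WORD, re.I)
RULE_3 = re.compile(r"free\s" + WORD + "s", re.I)
MERGED = re.compile(r"(?:[\d\s.]{3}(?:/|per|each)|free)\s?" + WORD, re.I)

# Hypothetical sample bodies.
SAMPLES = [
    "only $1.50 per pill",
    "0.99/tablet",
    "free capsules",
    "free pill",
    "freepill",
    "no drugs here",
]


def classify(text):
    """Return (hit by _1 or _3, hit by the merged pattern)."""
    separate = bool(RULE_1.search(text) or RULE_3.search(text))
    return (separate, bool(MERGED.search(text)))


if __name__ == "__main__":
    for s in SAMPLES:
        print(repr(s), classify(s))
```

On these samples the merged pattern hits everything that _1 or _3 hits,
plus "free pill" and "freepill": the merge makes the whitespace after
"free" optional and drops _3's trailing "s", so it is slightly broader
than the two rules it replaces.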
