(Apologies if this is duplicated)

> Sorry, despite *all* the initial research done by Matt (thanks again!)
> ended up in my Inbox, I somehow lost track of some details while also
> being busy on the users list...

Just as a heads up, the issue is less about case insensitive vs. case
sensitive and more about whether the CompiledRegExp module that
sa-compile makes flags a given regex as "lossy".  A case-sensitive
regex (e.g. /pill/) looks like it will *always* be lossy, but a
case-insensitive one may or may not be.  /pill/i was NOT lossy, but the
original __PILL_PRICE_[1-3] rules were all lossy in my debug output, and
I think they were all case insensitive.

I'm not sure what determines whether a given compiled regex is lossy
(or even whether its lossiness is consistent across versions of re2c
and architectures).

> See my previous post regarding test-cases. If it doesn't... That might
> be a workaround. As might be [Uu]ppercase, in some circumstances.

In my testing, a construction like l{2} apparently keeps the rule from
compiling correctly, which works around the issue in what feels like a
hacky way.

As in:

body        __PILL_PRICE_1        m;\$?[\d\s.]{3,8}(?:/|per|each)\s?(?:pil{2}|tablet|cap(?:sule|let));i

will actually cause the rule to not get compiled (gcc throws errors for
me), and when you scan a message the non-compiled rule is used, avoiding
the issue altogether:

Mar 23 09:04:23.924 [24906] dbg: rules: ran body rule __PILL_PRICE_1
======> got hit: "$2.15 per pill"
Mar 23 09:04:23.924 [24906] dbg: rules: ran body rule __PILL_PRICE_1
======> got hit: "$2.15 per pill"

Though I'm not sure how consistent that failure state is.
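
As a sanity check that writing pil{2} instead of pill only changes how
the pattern is spelled and not what it matches, here is a minimal
sketch.  It uses Python's re module rather than Perl, so it only
approximates how SpamAssassin evaluates the rule, and it says nothing
about what sa-compile or re2c do with either form.  The names
WORKAROUND, PLAIN and hits are made up for illustration, and the sample
strings other than the one from the debug output are my own.

```python
import re

# Sample body text taken from the debug output above.
SAMPLE = "$2.15 per pill"

# The rule's pattern as posted (with pil{2}) and the same pattern with a plain "pill".
WORKAROUND = r"\$?[\d\s.]{3,8}(?:/|per|each)\s?(?:pil{2}|tablet|cap(?:sule|let))"
PLAIN = WORKAROUND.replace("pil{2}", "pill")


def hits(pattern, text):
    """Return every non-overlapping match, case-insensitively (like the /i flag)."""
    return [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]


if __name__ == "__main__":
    # Print the matches of both spellings side by side for a few inputs.
    for text in (SAMPLE, "ONLY $0.99 EACH TABLET", "no prices here"):
        print(repr(text), hits(WORKAROUND, text), hits(PLAIN, text))
```

Both spellings should return the same matches for any input, so the
rewrite ought to be safe from a matching point of view; whether it
reliably keeps the rule out of the compiled set is a separate question.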

In the end, I think that to work around this without code updates on
the client side, the regexes need to either be non-lossy or fail to
compile.

> But for that, we need test-cases and testers(!) to check, if that's also
> covered by the Perl bug.

I can continue testing some things if necessary, but it sounds like the
work would be more about finding ways to work around the issue (other
than simply not pushing any body regexes with tflags multiple to
< 3.2.2).


-- 
  Matt Elson
  [email protected]
