On Mon, Dec 05, 2011 at 09:33:05AM -0500, [email protected] wrote:
>Forum: CFEngine Help
>Subject: problem with negative lookahead regex
>Author: svenXY
>Link to topic: https://cfengine.com/forum/read.php?3,24188,24188#msg-24188
>
>What I want is find out if a line starts with "log4j.rootLogger=", then
>something else, then does not contain the string ", SYSLOG". This works fine
>when specifying the whole string, but as soon as I use '(.*)' or even '(.*?)'
>in between, the regex fails.
>
>Here's some code to demonstrate:
>
>
>body common control
>{
> bundlesequence => { "test" };
>}
>
>
>bundle agent test
>{
>
> vars:
> "start" string => "log4j.rootLogger=";
> "startlong" string => "log4j.rootLogger=INFO, FILE";
> "end" string => ", SYSLOG";
Shouldn't this be "SYSLOG" without the comma and space characters? It
could be the only option listed, for example "log4j.rootLogger=SYSLOG".
Of course, that doesn't address the regex problem.
> classes:
> "matched_origin" expression => regcmp("^($(start)(.*?))(?!$(end))$",
> "log4j.rootLogger=INFO, FILE");
> "matched_whole" expression => regcmp("^($(start)(.*?))(?!$(end))$",
> "log4j.rootLogger=INFO, FILE, SYSLOG");
> "matched_l_origin" expression => regcmp("^($(startlong))(?!$(end))$",
> "log4j.rootLogger=INFO, FILE");
> "matched_l_whole" expression => regcmp("^($(startlong))(?!$(end))$",
> "log4j.rootLogger=INFO, FILE, SYSLOG");
I think the problem here is that the '.*?' construct, lazy or not, still
ends up consuming too much. You match the value of ${start} just fine,
then the trailing '$' forces '.*?' to keep expanding until it has matched
*everything else to the end of the string*, and only then is the negative
lookahead tested. The negative lookahead succeeds because there's nothing
left to match.
Consider this example where we want to find all rabbits not
chased by a dog:
$ cat test.txt
rabbit
rabbit dog
$ pcregrep 'rabbit.*(?!dog)' test.txt
rabbit
rabbit dog
It finds both. Now consider:
$ pcregrep 'rabbit(?!.*dog)' test.txt
rabbit
So maybe try this regex (note that I've moved the '.*'):
^($(start))(?!.*$(end)).*$
So I don't think this is a bug, but one of the dark and subtle corners
of regexes.
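
If you want to check the two patterns outside of CFEngine, here is a
minimal sketch in Python. It assumes Python's "re" module treats these
lookahead constructs the same way as the PCRE library behind regcmp(),
and it uses re.escape() on the literal parts, which the policy above
does not do. The variable names are just for illustration.

```python
import re

start = "log4j.rootLogger="
end = ", SYSLOG"

# Original pattern: the lookahead is only tested after '.*?' has been
# stretched far enough for the trailing '$' to match.
original = re.compile(
    "^(" + re.escape(start) + "(.*?))(?!" + re.escape(end) + ")$"
)

# Suggested pattern: the lookahead is tested right after the prefix and
# scans the rest of the line for the unwanted string.
suggested = re.compile(
    "^(" + re.escape(start) + ")(?!.*" + re.escape(end) + ").*$"
)

lines = [
    "log4j.rootLogger=INFO, FILE",
    "log4j.rootLogger=INFO, FILE, SYSLOG",
]

# Map each line to (original matched?, suggested matched?).
results = {
    line: (bool(original.match(line)), bool(suggested.match(line)))
    for line in lines
}

for line, (orig_hit, sugg_hit) in results.items():
    print(f"{line!r}: original={orig_hit} suggested={sugg_hit}")
```

The original pattern should match both lines, while the suggested one
should match only the line without ", SYSLOG".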
>
> reports:
> matched_origin::
> "this should match - start is not followed by end";
> matched_whole::
> "this should not match (best version)";
> matched_l_origin::
> "should match, but start contains the whole string";
> matched_l_whole::
> "this should not match (whole string version)";
>}
>
>
>
>output is:
>
>
>R: this should match - start is not followed by end
>R: this should not match (best version)
>R: should match, but start contains the whole string
>
>
>- the second one is my problem. It should not work, because I do the following
>there:
>
>"log4j.rootLogger=INFO, FILE, SYSLOG" matched against
>"^(log4j.rootLogger=(.*?))(?!, SYSLOG)$"
>
>--> and that should not match!!!
>
>Is that a bug or can someone enlighten my poor understanding of regexes here?
>
>Thanks a bunch,
>Sven
>
>_______________________________________________
>Help-cfengine mailing list
>[email protected]
>https://cfengine.org/mailman/listinfo/help-cfengine
--
Jesse Becker
NHGRI Linux support (Digicon Contractor)