I get the same behavior in TRE 8.0:

$ ./tre  "a(.*?)b(.*?)ab" abbbbbab
./tre a(.*?)b(.*?)ab abbbbbab (0,8)(1,5)(6,6)

I noticed that leaving the second quantifier greedy gives the result you
would expect:

$ ./tre  "a(.*?)b(.*)ab" abbbbbab
./tre a(.*?)b(.*)ab abbbbbab (0,8)(1,1)(2,6)
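
For comparison, here is a minimal sketch that prints the group spans a
backtracking engine reports for the same two patterns. It uses Python's re
module purely as a stand-in for Perl-style semantics. That is an assumption
about which behavior counts as "expected"; it says nothing about how TRE
works internally. The helper name group_spans is made up for this example.

```python
import re

def group_spans(pattern, text):
    """Return [(start, end), ...] for the whole match and each capturing
    group, in the same order the tre test program prints them.
    Returns None if the pattern does not match."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # m.re.groups is the number of capturing groups in the pattern.
    return [m.span(i) for i in range(m.re.groups + 1)]

if __name__ == "__main__":
    text = "abbbbbab"
    for pattern in (r"a(.*?)b(.*?)ab", r"a(.*?)b(.*)ab"):
        spans = group_spans(pattern, text)
        print(pattern, text, "".join("(%d,%d)" % s for s in spans))
```

With Python's re, both patterns report (0,8)(1,1)(2,6), i.e. the same spans
TRE gives only for the greedy variant.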


I've been trying to figure out how tagged regexes should deal with reluctant
quantifiers. Like Chris Kuklewicz, I'm writing a tagged regex package for
another language (Java, in my case). And like Chris, I read Ville's paper
and thesis, but not the TRE code.

I've come up with an interesting case, where the behavior of the tag for the
capturing group cannot be static: it depends on the input data.

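Consider (a+?b*)((?:a|b)+). The tag at the first right paren has to be either
reluctant or greedy depending on the input. Feed it a run of 'a's, and it
should be reluctant: the first group should close as early as possible. Feed
it an 'a' followed by many 'b's, and it should be greedy: the first group
should extend as far as it can while leaving something for the second group.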

TRE seems to have trouble in either case:

$ ./tre "(a+?b*)((?:a|b)+)" aaab
./tre (a+?b*)((?:a|b)+) aaab (0,4)(0,2)(2,4)

$ ./tre "(a+?b*)((?:a|b)+)" abbb
./tre (a+?b*)((?:a|b)+) abbb (0,4)(0,2)(2,4)

Odd that the first group should capture (0,2) in both cases...
