I get the same behavior in TRE 8.0:

$ ./tre "a(.*?)b(.*?)ab" abbbbbab
./tre a(.*?)b(.*?)ab abbbbbab
(0,8)(1,5)(6,6)

I noticed that leaving the second quantifier greedy results in what you would expect to be correct behavior:

$ ./tre "a(.*?)b(.*)ab" abbbbbab
./tre a(.*?)b(.*)ab abbbbbab
(0,8)(1,1)(2,6)

I've been trying to figure out how tagged regex should deal with reluctant quantifiers - like Chris Kuklewicz, I'm writing a tagged regex package for another language (Java, in my case). And like Chris, I read Ville's paper and thesis, but not the tre code.

I've come up with an interesting case, where the behavior of the tag for the capturing group cannot be static: it depends on the input data. Consider (a+?b*)((?:a|b)+)... the first right paren will be either reluctant or greedy depending on the input. Feed it a stream of 'a', and it should be reluctant. Feed it an 'a' followed by many 'b', and it should be greedy. TRE seems to have trouble in either case:

$ ./tre "(a+?b*)((?:a|b)+)" aaab
./tre (a+?b*)((?:a|b)+) aaab
(0,4)(0,2)(2,4)

$ ./tre "(a+?b*)((?:a|b)+)" abbb
./tre (a+?b*)((?:a|b)+) abbb
(0,4)(0,2)(2,4)

Odd that the first group should capture (0,2)...
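
For comparison, here is a minimal sketch that runs the same four cases through java.util.regex and prints the spans in the same (start,end) format as the tre test tool. Note the assumptions: java.util.regex is a backtracking matcher with Perl-style semantics, not a tagged automaton, so this only shows what a backtracking engine reports, not what TRE is specified to return. The class name ReluctantSpans and the helper spans() are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class ReluctantSpans {

    // Return the whole match and every capturing group as (start,end),
    // concatenated in group order, or "no match" if nothing is found.
    static String spans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int g = 0; g <= m.groupCount(); g++) {
            // start/end are -1 for a group that did not participate
            sb.append('(').append(m.start(g)).append(',')
              .append(m.end(g)).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"a(.*?)b(.*?)ab", "abbbbbab"},
            {"a(.*?)b(.*)ab", "abbbbbab"},
            {"(a+?b*)((?:a|b)+)", "aaab"},
            {"(a+?b*)((?:a|b)+)", "abbb"},
        };
        for (String[] c : cases) {
            System.out.println(c[0] + " " + c[1] + " " + spans(c[0], c[1]));
        }
    }
}
```

Under backtracking semantics, both abbbbbab cases come out as (0,8)(1,1)(2,6), aaab gives (0,4)(0,1)(1,4), and abbb gives (0,4)(0,3)(3,4). The last two show the input dependence described above: the first group ends as early as possible on a run of 'a', but takes all the 'b' it can while still leaving one character for the second group.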
_______________________________________________
TRE-general mailing list
tre-general@laurikari.net
http://laurikari.net/mailman/listinfo/tre-general
http://laurikari.net/tre/