From: Ito Kazumitsu <[EMAIL PROTECTED]> Date: Tue, 24 Jan 2006 07:25:29 +0900 (JST)
> This is an ad hoc fix and not carefully designed. I can think of problems > such as > > How should we compare an stingy match and a non-stingy match? I am beginning to think that gnu.regexp is doing more than is expected: Find all possible matches and select the best match. Jakarta Regexp seems much simpler (I have not seriously studied the source code, this is only an impression): Go forward. If something fails, backtrack and try another option. If there is nothing to be done, then we have found a match. If we have more to do but we cannot go ahead, the the matching has failed. Our java.util.regex.Matcher#find() and java.util.regex.Matcher#matches() both uses the same gnu.regexp.RE#getMatchImpl(), which tries to find all possible matches. So with the regexp /^[ab]{1,3}(ab*|b)/, our find() can find "aabbbbb" from "aabbbbb". But other implementations (Sun's JDK, Jakarta Regexp, Perl, Ruby) can find only "aabb". 1 [ab] : a 2 [ab] : a 3 [ab] : b 4 ab* : FAIL 5 b : b Other implementations stop here, but gnu.regexp goes back to 2 and tries another option. 3 ab* : FAIL 4 b : b Go back to 1 and try another. 2 ab* : abbbbb "aabbbbb" has been found. If we can simplufy the work of getMatchImpl(), we will be getting the same results as other implementations as well as improved performance. In that case, when matches() call getMatchImpl(), an implicit RETokenEnd should be added. Otherwise, "aaa" cannot match /a+?/. _______________________________________________ Classpath-patches mailing list Classpath-patches@gnu.org http://lists.gnu.org/mailman/listinfo/classpath-patches