Consider the following program:
package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizeTest {
    public static void main(String[] args) {
        String input = "The cat sat on the mat";
        int prevEnd = 0;
        // Separator: one or more spaces, newlines, carriage returns or tabs.
        Matcher matcher = Pattern.compile("[\\x20\\n\\r\\t]+").matcher(input);
        while (matcher.find()) {
            // Print the token between the previous separator and this one.
            System.err.println("Match at " + matcher.start() + ": " +
                    input.substring(prevEnd, matcher.start()));
            prevEnd = matcher.end();
        }
        // Print whatever follows the last separator.
        System.err.println("Remainder: " + input.substring(prevEnd));
    }
}
With the Sun JRE, the output is:
Match at 3: The
Match at 7: cat
Match at 11: sat
Match at 14: on
Match at 18: the
Remainder: mat
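The Sun JRE output can be pinned down as a self-checking regression test. The following sketch is not part of the original report: the class name TokenizeRegression and its tokenize helper are hypothetical, and the expected token list is copied from the Sun JRE output. It keeps the same pattern and the same find()/start()/end() loop as the program in the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizeRegression {
    // Same separator class as in the report: space, LF, CR, tab.
    private static final Pattern SEPARATOR = Pattern.compile("[\\x20\\n\\r\\t]+");

    /** Hypothetical helper: splits input on runs of SEPARATOR using Matcher.find(). */
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<String>();
        Matcher matcher = SEPARATOR.matcher(input);
        int prevEnd = 0;
        while (matcher.find()) {
            // Token between the previous separator and this one.
            tokens.add(input.substring(prevEnd, matcher.start()));
            prevEnd = matcher.end();
        }
        // Remainder after the last separator.
        tokens.add(input.substring(prevEnd));
        return tokens;
    }

    public static void main(String[] args) {
        // Expected tokens, taken from the Sun JRE output.
        List<String> expected = Arrays.asList("The", "cat", "sat", "on", "the", "mat");
        List<String> actual = tokenize("The cat sat on the mat");
        if (!expected.equals(actual)) {
            System.err.println("FAIL: expected " + expected + " but got " + actual);
            System.exit(1);
        }
        System.out.println("PASS");
    }
}
```

On a JRE that behaves like Sun's, this prints PASS. With the behaviour reported for GNU Classpath, find() never matches, so tokenize returns the whole input as a single token and the test exits with status 1.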
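With GNU Classpath, the output is:
Remainder: The cat sat on the mat
That is, matcher.find() returns false on its first call, so the loop body never runs and the whole input is reported as the remainder.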
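One way to narrow this down is to run several equivalent separator patterns side by side and see which ones Matcher.find() accepts. This diagnostic is hypothetical and not from the report: the class PatternVariants, its firstMatchStart helper, and the choice of variants are assumptions. They aim to separate the \x20 escape, the character class, and find() itself as possible causes. The sketch makes no claim about which variants fail on GNU Classpath 0.20. It only prints what each one does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternVariants {
    // Hypothetical variants. On a conforming implementation each one
    // finds the first space in "The cat sat on the mat" at index 3.
    static final String[] PATTERNS = {
        "[\\x20\\n\\r\\t]+", // the original: hex escape inside a character class
        "[ \\n\\r\\t]+",     // literal space instead of \x20
        "\\x20+",            // hex escape outside a character class
        " +",                // plain literal space, no class at all
        "\\s+"               // predefined class (also matches \x0B and \f)
    };

    /** Returns the start of the first match of regex in input, or -1 if find() fails. */
    public static int firstMatchStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String input = "The cat sat on the mat";
        for (String p : PATTERNS) {
            // Prints the regex source followed by the first match index (or -1).
            System.out.println(p + " -> " + firstMatchStart(p, input));
        }
    }
}
```

A variant that prints -1 on a given class library points at the construct it exercises.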
Michael Kay
--
Summary: Regex tokenizing
Product: classpath
Version: 0.20
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: classpath
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: mike at saxonica dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25976
_______________________________________________
Bug-classpath mailing list
http://lists.gnu.org/mailman/listinfo/bug-classpath