Hi,

I have rewritten a large part of the source code of the gnu.regexp package.

The important points of this change are:

(1) A new method REToken#matchThis.  This method tries to match the
    input string against the REToken itself and does not try to match
    the next RETokens chained to it.  The currently used REToken#match
    method should be defined in terms of REToken#matchThis.  This is
    useful for (3).

(2) A new method REToken#findMatch.  This is almost the same as the
    current REToken#match, but it returns the resulting REMatch instead
    of a boolean value.  This is useful for the depth-first search with
    backtracking.

(3) New methods REToken#returnsFixedLengthMatches and
    REToken#findFixedLengthMatches.  These speed up the search for
    repeated matches when the matched string is known to have a fixed
    length.

(4) RETokenOneOf and RETokenRepeated perform a depth-first search with
    backtracking.

After this change, the test attached below shows a 400% performance
improvement compared with the current CVS version.  The improvement
comes mainly from change (3).  Unfortunately, change (4) had a negative
effect on performance.

ChangeLog

2006-03-01  Ito Kazumitsu  <[EMAIL PROTECTED]>

	* gnu/regexp/BacktrackStack.java: New file.
	* gnu/regexp/RE.java(findMatch): New method.
	* gnu/regexp/REMatch.java(next,matchFlags,MF_FIND_ALL,
	REMatchList): Removed.
	(backtrackStack): New field.
	* gnu/regexp/REToken.java(match): Changed from an abstract method
	to an ordinary method defined with the new method matchThis.
	(matchThis, getNext, findMatch, returnsFixedLengthMatches,
	findFixedLengthMatches, backtrack, toString): New methods.
	* gnu/regexp/RETokenAny.java: Implemented new methods of REToken.
	* gnu/regexp/RETokenBackRef.java: Likewise.
	* gnu/regexp/RETokenChar.java: Likewise.
	* gnu/regexp/RETokenEnd.java: Likewise.
	* gnu/regexp/RETokenEndSub.java: Likewise.
	* gnu/regexp/RETokenIndependent.java: Likewise.
	* gnu/regexp/RETokenLookAhead.java: Likewise.
	* gnu/regexp/RETokenLookBehind.java: Likewise.
	* gnu/regexp/RETokenNamedProperty.java: Likewise.
	* gnu/regexp/RETokenPOSIX.java: Likewise.
	* gnu/regexp/RETokenRange.java: Likewise.
	* gnu/regexp/RETokenStart.java: Likewise.
	* gnu/regexp/RETokenWordBoundary.java: Likewise.
	* gnu/regexp/RETokenOneOf.java: Rewritten.
	* gnu/regexp/RETokenRepeated.java: Rewritten.

The performance test follows:

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class RegExTestCase {
  static final Pattern dynPatString = Pattern.compile(
    "(.*?)(<dynstr\\s+)(property)\\s*=\\s*\"(.*?)\"\\s*(/>)" );
  // | 1 ||<-   2   ->||<-  3 ->|           |<4>|      |5 |

  // Accumulated time in milliseconds spent in the timed find() calls.
  long time = 0;

  public RegExTestCase () {
    for (int i = 0; i < 10000; i++) {
      String s = replaceDynamicStringAll(testString, "Foo");
    }
    System.out.println("Elapsed time = " + time);
  }

  public String replaceDynamicStringAll(String inStr, String replaceStr) {
    StringBuffer sb = new StringBuffer();
    Matcher m = dynPatString.matcher(inStr);

    // Only the first find() of each call is timed.
    time -= System.currentTimeMillis();
    boolean b = m.find();
    time += System.currentTimeMillis();

    while (b) {
      sb.append(m.group(1));
      m.appendReplacement(sb, replaceStr);
      b = m.find();
    }
    m.appendTail(sb);
    return sb.toString();
  }

  private final static String testString = "ABC<dynstr property = \"X\"/>";

  public final static void main(String[] args) {
    new RegExTestCase();
  }
}

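
To make points (1), (3) and (4) above more concrete, here is a minimal
sketch of the idea.  It is NOT the gnu.regexp code: the names
SketchToken, CharToken, StarToken and RegexpSketchDemo are hypothetical,
positions are plain int indexes rather than REMatch objects, and the
backtracking uses recursion where the patch uses a BacktrackStack.  It
only illustrates how match can be built on matchThis, and how a
repetition can back off arithmetically when the repeated token always
consumes the same number of characters.

```java
// Hypothetical sketch: these classes are NOT the real gnu.regexp classes.
abstract class SketchToken {
    SketchToken next; // next token in the chain, or null at the end

    // Match this token alone at pos; return the end index, or -1 on failure.
    abstract int matchThis(CharSequence in, int pos);

    // Match this token and then the rest of the chain, built on matchThis.
    int match(CharSequence in, int pos) {
        int end = matchThis(in, pos);
        if (end < 0) return -1;
        return (next == null) ? end : next.match(in, end);
    }

    // True if every successful matchThis consumes the same number of chars.
    boolean returnsFixedLengthMatches() { return false; }
}

class CharToken extends SketchToken {
    private final char c;
    CharToken(char c) { this.c = c; }

    @Override int matchThis(CharSequence in, int pos) {
        return (pos < in.length() && in.charAt(pos) == c) ? pos + 1 : -1;
    }

    @Override boolean returnsFixedLengthMatches() { return true; }
}

// Greedy "zero or more" repetition of a single inner token.
class StarToken extends SketchToken {
    private final SketchToken inner;
    StarToken(SketchToken inner) { this.inner = inner; }

    // On its own, the repetition consumes as many copies as possible.
    @Override int matchThis(CharSequence in, int pos) {
        int p = pos;
        for (int e = inner.matchThis(in, p); e > p; e = inner.matchThis(in, p)) p = e;
        return p;
    }

    @Override int match(CharSequence in, int pos) {
        if (!inner.returnsFixedLengthMatches()) return matchDepthFirst(in, pos);
        // Fixed-length shortcut: count the repetitions once, then try the
        // rest of the chain at pos + k * len for k = count down to 0.
        int len = 0, count = 0, p = pos;
        for (int e = inner.matchThis(in, p); e > p; e = inner.matchThis(in, p)) {
            len = e - p;
            p = e;
            count++;
        }
        for (int k = count; k >= 0; k--) {
            int end = pos + k * len;
            int r = (next == null) ? end : next.match(in, end);
            if (r >= 0) return r;
        }
        return -1;
    }

    // General case: depth-first search that backtracks one repetition at a time.
    private int matchDepthFirst(CharSequence in, int pos) {
        int e = inner.matchThis(in, pos);
        if (e > pos) {
            int r = matchDepthFirst(in, e);
            if (r >= 0) return r;
        }
        return (next == null) ? pos : next.match(in, pos);
    }
}

class RegexpSketchDemo {
    // Builds the token chain for the pattern a*ab.
    static SketchToken build() {
        SketchToken star = new StarToken(new CharToken('a'));
        SketchToken a = new CharToken('a');
        SketchToken b = new CharToken('b');
        star.next = a;
        a.next = b;
        return star;
    }

    public static void main(String[] args) {
        SketchToken re = build();
        System.out.println(re.match("aaab", 0)); // end index of the match
        System.out.println(re.match("b", 0));    // -1 means no match
    }
}
```

In this sketch the repetition calls inner.matchThis, not inner.match, so
the inner token never drags the rest of the chain into each repetition.
That separation is what allows the fixed-length path to count the
repetitions once and then retry the remainder at computed offsets.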
--- gnu/java/nio/charset/iconv/IconvProvider.java.orig	Sat Jul 16 00:12:48 2005
+++ gnu/java/nio/charset/iconv/IconvProvider.java	Thu Oct 20 23:41:57 2005
@@ -62,7 +62,11 @@
       }
   }
 
-  private IconvProvider()
+  // Declaring the constructor public may violate the use of singleton.
+  // But it must be public so that an instance of this class can be
+  // created by Class.newInstance(), which is the case when this provider is
+  // defined in META-INF/services/java.nio.charset.spi.CharsetProvider.
+  public IconvProvider()
   {
     IconvMetaData.setup();
   }
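
The reason given in the comment above can be checked with a small
stand-alone program.  The sketch below is only an illustration: the
classes PrivateCtorProvider, PublicCtorProvider and NewInstanceDemo are
hypothetical stand-ins, not IconvProvider itself, and the sketch assumes
the classes are compiled as separate top-level classes.  It shows that
Class.newInstance() called from another class fails with
IllegalAccessException when the no-argument constructor is private, and
succeeds when it is public.

```java
// Hypothetical stand-ins for a provider class; not the real IconvProvider.
class PrivateCtorProvider {
    private PrivateCtorProvider() {}
}

class PublicCtorProvider {
    public PublicCtorProvider() {}
}

class NewInstanceDemo {
    // Returns true if c can be instantiated reflectively from this class.
    @SuppressWarnings("deprecation") // Class.newInstance() is deprecated since Java 9
    static boolean canInstantiate(Class<?> c) {
        try {
            c.newInstance();
            return true;
        } catch (InstantiationException | IllegalAccessException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiate(PrivateCtorProvider.class));
        System.out.println(canInstantiate(PublicCtorProvider.class));
    }
}
```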