Hi,
A micro benchmark[1] on regex shows our implementation is much
slower than RI, it cost Harmony ~23 seconds while RI ~7 seconds to
complete the test. I use windows XP, drlvm and Harmony M2 build, and
then replace regex classes with RI's while testing RI.
Regular expression is very improtant and core classlib for
Harmony, make it performs well will do benefit to Harmony performance.
We may find some way to make it faster.
According to Spec, regex use NFA algorithm[2] which is a little
different with perl regular expression. As a result, we can not change
the basic algorithm to DFA to improve performance. However still
there's alternative algorithm of NFA. I googled open source regular
experssion implementation, it is said that PCRE[3](Perl Compatible
Regular Expressions) algorithm is of very high performance and luckily
it is BSD licensed!
However the source is so large and C-written, and a quick glance
at complie function I find it rather complex, And java regular
expression is different from perl's so that a directly wrapping will
be surely un-workable. But we still may read its implementation and
find out some key point to improve, or some one already has some idea
on tuning?
I'm not a regular expression expert, please correct me if I miss
something. Any comments/suggestions?
[1]
public void test_Benchmark() throws Exception {
// warn up phase
for (int i = 0; i < 10000; i++) {
Pattern p = Pattern.compile("\\A\\p{Alpha}[\\p{Alnum}+-.]*\\z");
Matcher m = p.matcher("aaaaab");
m.matches();
}
// benchmark start
long time = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
Pattern p = Pattern.compile("\\A\\p{Alpha}[\\p{Alnum}+-.]*\\z");
Matcher m = p.matcher("aaaaab");
m.matches();
}
System.out.println(System.currentTimeMillis() - time);
}
[2] file:///C:/Spec/docs/api/java/util/regex/Pattern.html
[3] http://www.pcre.org/
--
Best Regards!
Jimmy, Jing Lv
China Software Development Lab, IBM