On Wed, Jul 7, 2010 at 09:55, Jesse Wilson <jessewil...@google.com> wrote: > On Mon, Jul 5, 2010 at 8:02 PM, Deven You <devyo...@gmail.com> wrote: > > I have written some simple benchmarks for harmony regex and find the > > performance of harmony is poor compared to RI. For example, Mathcer.find() > > only reach 60% of that of RI.
i'd be interested to see your benchmark code. our regular expression benchmarks actually focus on replacing use of Pattern/Matcher with simpler fast paths where the full complexity isn't needed. for example, most String.split calls don't actually use regular expressions. i'm not sure how much of this work made it into froyo, and it's certainly still a work in progress. one thing we want to do is use http://code.google.com/p/caliper/ benchmarks to track the relative performance of Java implementations of the stuff that's currently native code (but doesn't inherently need to be). we don't have anything like that for regular expressions yet. (our tool for conveniently running benchmarks is http://code.google.com/p/vogar/ and you can find some of our actual benchmarks at http://code.google.com/p/dalvik/.) > > I heard Android use icu4jni re-implement this > > module. Since icu4jni use native code I think it may has higher performance > > than harmony. I am trying to use icu4jni as the back-end of harmony regex > > but find icu4jni has no functions related to regex operations. although, as jesse said, regular expressions are a special case where there was no original icu4jni implementation, icu4jni is a dead project (http://site.icu-project.org/#TOC-ICU4JNI). Android used the icu4jni code, but it's been gradually rewritten since then. the process is incomplete, and the extent of the rewrite varies from class to class, but you're probably better off taking code from Android rather than icu4jni. > > I know there are some android guys in our community. So can anyone tell me > > some detail info for android's regex, like if it re-implement the regex > > logic using native code by android itself rather than icu4jni and really get > > higher performance compared to harmony regex? Thanks a lot! > > We actually use icu4c's pattern, which we access via JNI. Documentation is > here: > http://icu-project.org/apiref/icu4c/classRegexPattern.html > > Our regex-java integration code is Apache-licensed and available in our Git > repository here: > > http://android.git.kernel.org/?p=platform/libcore.git;a=tree;f=regex/src/main/java/java/util/regex;h=f9207d043a24d3dc034bf2cc3a5fdd57115692c3;hb=master actually, in shipping versions of Android we use the C API rather than the C++ API. see http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=icu/src/main/native/NativeRegEx.cpp;h=7b3cafc0779ef7f3ed69a32128af7f96e3498922;hb=master for the gory details. i've recently rewritten this to use the C++ APIs, and it's much cleaner for it. if you look at the current code you'll see that we copy the input to the native heap, because icu4c < 4.6 can't cope with its input moving about between calls (and a VM can't necessarily guarantee the Java char[] doesn't move around). in future, we'll use a new icu4c API to get the best of both worlds: http://sourceforge.net/mailarchive/forum.php?thread_name=aanlktim7cztafnebqm8to3lhddingvs3pshpw79xb...@mail.gmail.com&forum_name=icu-design if you're shipping your own copy of icu (rather than using some system one you have no control over), you might want to backport that change (which has been submitted). one final thing to be aware of is that icu4c's regular expression implementation isn't 100% compatible with the RI. most obviously, there's no support for CANON_EQ, but there are a variety of more subtle incompatibilities too. -- Elliott Hughes - http://who/enh - http://jessies.org/~enh/