Re: Perl5Util performance

2006-03-30 Thread Duke Tantiprasut
Thanks Daniel.

Sounds like I should be moving to java.util.regex. I do like the convenience
of the pattern caching but I guess it's easy enough to set that up myself
for java.util.regex.

Duke
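
For anyone landing on this thread later, a minimal sketch of the kind of pattern cache mentioned above, built on java.util.regex. The class name PatternCache and its get method are made up for illustration, it assumes Java 8 or later (for computeIfAbsent), and the cache is unbounded, so it only suits programs that use a limited set of expressions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Pattern;

/** Hypothetical helper: caches compiled java.util.regex patterns by source string. */
final class PatternCache {
    // Unbounded on purpose to keep the sketch short; a real cache may want a size limit.
    private static final ConcurrentMap<String, Pattern> CACHE = new ConcurrentHashMap<>();

    private PatternCache() {}

    /** Returns the cached Pattern for the expression, compiling it on first use. */
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    public static void main(String[] args) {
        Pattern first = PatternCache.get("(\\d+)-(\\d+)");
        Pattern second = PatternCache.get("(\\d+)-(\\d+)");
        // The second lookup returns the already compiled instance.
        System.out.println("same instance: " + (first == second));
        System.out.println("matches: " + first.matcher("10-20").matches());
    }
}
```

Pattern objects are immutable and safe to share between threads; each caller still needs its own Matcher.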


Perl5Util performance

2006-03-29 Thread Duke Tantiprasut
Hi All,

Below is an interesting benchmark result comparing a number of regex
engines.

http://tusker.org/regex/regex_benchmark.html

I'm curious why there is such a significant jump from the Perl5Matcher
compared to the java.util.regex?

The DFA-based engines such as JREXX look really fast, but I'm not sure
whether they let you get the matched group results.

Duke


Re: Perl5Util performance

2006-03-29 Thread Daniel F. Savarese

In message [EMAIL PROTECTED], Duke Tantiprasut writes:
> I'm curious why there is such a significant jump from the Perl5Matcher
> compared to the java.util.regex?

A hefty chunk of that time comes from converting strings to char[] before
matching.  I've tuned that benchmark before and trimmed 25% of the time
just by using PatternMatcherInput instead of String.  It's not exactly
a rigorous benchmark anyway.  Measurements I've made in the past show
that the performance of the packages depends heavily on the input and
how the regular expressions are written.  Two equivalent regular
expressions can have very different performance characteristics.
That said, ORO is behind the times on performance, having been designed
originally to get the most out of JDK 1.0.2.
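
To make the point about equivalent expressions concrete, here is a rough sketch using java.util.regex. The two expressions accept exactly the same strings but are written differently. The class name, input size and iteration count are arbitrary choices for illustration, and the timing loop is no more rigorous than the benchmark under discussion, so treat any numbers it prints as indicative only.

```java
import java.util.regex.Pattern;

/** Hypothetical demo: two equivalent expressions timed on the same input. */
final class EquivalentRegexTiming {
    static final Pattern ALTERNATION = Pattern.compile("(?:a|b|c)+");
    static final Pattern CHAR_CLASS = Pattern.compile("[abc]+");

    /** Builds "abc" repeated the given number of times (kept short deliberately). */
    static String sampleInput(int repeats) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repeats; i++) {
            sb.append("abc");
        }
        return sb.toString();
    }

    /** Returns the elapsed nanoseconds for the given number of full matches. */
    static long time(Pattern pattern, String input, int iterations) {
        long start = System.nanoTime();
        int matched = 0;
        for (int i = 0; i < iterations; i++) {
            if (pattern.matcher(input).matches()) {
                matched++;
            }
        }
        long elapsed = System.nanoTime() - start;
        if (matched != iterations) {
            throw new IllegalStateException("expected every iteration to match");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String input = sampleInput(50);
        // Warm up both patterns before measuring so JIT compilation skews less.
        time(ALTERNATION, input, 20000);
        time(CHAR_CLASS, input, 20000);
        System.out.println("alternation ns: " + time(ALTERNATION, input, 20000));
        System.out.println("char class  ns: " + time(CHAR_CLASS, input, 20000));
    }
}
```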

A question that bears revisiting is whether Perl5Matcher needs to bother
converting to char[] anymore.  In JDK 1.0.2 and 1.1 days it was a big
performance win, but unless you're working with your input as
char[] from the start, I bet these days it would be faster to not make
the conversion and work directly with String (or CharSequence) if we're
willing to abandon JDK 1.2/1.3 compatibility.  But now that there's
a java.util.regex, the primary reason to use ORO appears to be if you're
still on 1.2/1.3...
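
For comparison, java.util.regex matches against any CharSequence, so no conversion is needed in either direction. The sketch below uses only JDK classes; the class name, the firstHost helper and the sample text are invented for illustration. It runs the same Pattern over a String, a StringBuilder and an existing char[] wrapped with CharBuffer.wrap, which does not copy the array.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical demo: one Pattern applied to several CharSequence types. */
final class CharSequenceMatching {
    // Captures the part before and after an '@' sign.
    static final Pattern ADDRESS = Pattern.compile("\\b(\\w+)@(\\w+)\\b");

    /** Returns the second group of the first match, or null if nothing matches. */
    static String firstHost(CharSequence input) {
        Matcher m = ADDRESS.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String text = "mail duke@example today";
        char[] chars = text.toCharArray(); // stands in for input that already is a char[]

        System.out.println(firstHost(text));
        System.out.println(firstHost(new StringBuilder(text)));
        // CharBuffer.wrap exposes the array as a CharSequence without copying it.
        System.out.println(firstHost(CharBuffer.wrap(chars)));
    }
}
```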

In response to the email Subject, Perl5Util is a convenience class and
will always be slower than using Perl5Matcher directly because Perl5Util
has to parse the native Perl-style representation of expressions :(

daniel
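
To give a feel for the extra step described above, here is a rough sketch of turning a Perl-style expression such as m/foo/i into a compiled pattern. This is not ORO's implementation: PerlStyleExpression is a made-up class built on java.util.regex, it only understands the / delimiter and the i, m, s and x modifiers, and it does no caching. It is only meant to show that the delimiters and modifiers have to be picked apart before anything can be compiled.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: parses "m/body/flags" or "/body/flags" into a Pattern. */
final class PerlStyleExpression {
    private PerlStyleExpression() {}

    static Pattern compile(String expr) {
        // Locate the opening delimiter, with or without the leading 'm'.
        int start = expr.startsWith("m/") ? 2 : expr.startsWith("/") ? 1 : -1;
        // The closing delimiter is the last '/', since modifiers never contain one.
        int end = expr.lastIndexOf('/');
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("not a /.../ expression: " + expr);
        }
        String body = expr.substring(start, end);

        // Translate the trailing Perl modifiers into java.util.regex flags.
        int flags = 0;
        for (char c : expr.substring(end + 1).toCharArray()) {
            switch (c) {
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
                case 'm': flags |= Pattern.MULTILINE; break;
                case 's': flags |= Pattern.DOTALL; break;
                case 'x': flags |= Pattern.COMMENTS; break;
                default:
                    throw new IllegalArgumentException("unsupported modifier: " + c);
            }
        }
        return Pattern.compile(body, flags);
    }

    public static void main(String[] args) {
        Pattern p = PerlStyleExpression.compile("m/foo/i");
        System.out.println(p.pattern() + " matches FOO: " + p.matcher("FOO").matches());
    }
}
```

All of this work happens before any matching starts, which is the overhead a precompiled pattern avoids.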


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]