Hi Hilmar,

I have used regexes a lot in Perl and Java... I was also skeptical about 
regexes in Java when I started using them. 

From my own experience, I can tell you the following:
- It is MUCH easier to write regexes in Perl than in Java.
- Java regexes require more optimisation: a working regex and an optimal 
  regex are two different things.
- In Java, Patterns must be compiled first. So, if you iterate through a 
  large number of strings that you want to match, compile your pattern 
  outside the loop.
- If you use regexes in a large iteration, avoid the java.lang.String methods 
  that take a regex (String.replaceFirst, String.replaceAll, 
  String.matches, ...): your pattern will be compiled on each call.
- Avoid applying a regex to a large string. If possible, limit the matching 
  to the places where the pattern can occur. Methods like indexOf, 
  lastIndexOf and split from java.lang.String are very useful in this regard.
- It is easier to get the matching groups in Java than in Perl.
- Test first with an editor such as RegExhibit or your IDE's regex plugin.
- Finally, I recommend the book Java Regular Expressions by Mehran Habibi 
  (http://www.amazon.com/Java-Regular-Expressions-Taming-java-util-regex/dp/1590591070)
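
To illustrate the points above about compiling once and reading groups, here 
is a minimal sketch. The key=value pattern, the class name and the method 
names are made up for the example, so treat it as an illustration rather than 
a recipe:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrecompiledPatternSketch {

    // Hypothetical pattern (assumption for this example): a word, '=', digits.
    // It is compiled once, as a constant, instead of inside the loop.
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\d+)");

    // Variant to avoid in large iterations: String.matches compiles the
    // pattern again on every call.
    static int countMatchesRecompiling(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (line.matches("(\\w+)=(\\d+)")) {
                count++;
            }
        }
        return count;
    }

    // Preferred variant: reuse the compiled Pattern and create only a
    // Matcher for each line.
    static List<String> extractValues(List<String> lines) {
        List<String> values = new ArrayList<String>();
        for (String line : lines) {
            Matcher m = KEY_VALUE.matcher(line);
            if (m.matches()) {
                values.add(m.group(2)); // capturing groups are read by index
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("length=120", "no match here", "score=42");
        System.out.println(countMatchesRecompiling(lines));
        System.out.println(extractValues(lines));
    }
}
```

Both methods accept the same lines; the second one simply pays the compilation 
cost once instead of once per line.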

If your regexes are well optimised, you will not notice any difference 
between Perl and Java. 
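
As one example of such an optimisation, here is a rough sketch of limiting the 
match to the place where the pattern can occur, using indexOf before the 
regex is applied. The "ID=" marker, the pattern and the names are assumptions 
made up for this example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefilterSketch {

    // Hypothetical marker and pattern, chosen only for illustration.
    private static final String MARKER = "ID=";
    private static final Pattern ID = Pattern.compile("ID=(\\w+)");

    // Returns the identifier that follows the first marker, or null if the
    // marker is absent or is not followed by an identifier.
    static String findId(String text) {
        int start = text.indexOf(MARKER);   // cheap literal search first
        if (start < 0) {
            return null;                     // no marker: the regex is never run
        }
        Matcher m = ID.matcher(text);
        m.region(start, text.length());      // restrict matching to the relevant part
        return m.lookingAt() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(findId("some long header text ID=ABC123; more text"));
        System.out.println(findId("a line without the marker"));
    }
}
```

The regex engine only ever sees the part of the string that starts at the 
marker, so long inputs without the marker cost a single indexOf call.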

If you need to use regexes in a complex algorithm or in software combined 
with Java/BioJava, don't hesitate: Java regexes are excellent. If you just 
need regexes in a small script, go for Perl.

Best

khalil 



On 22 Oct 2012, at 18:00, [email protected] wrote:

> Today's Topics:
> 
>   1. regex performance in Java (Hilmar Lapp)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Mon, 22 Oct 2012 10:52:24 -0400
> From: Hilmar Lapp <[email protected]>
> Subject: [Biojava-l] regex performance in Java
> To: BioJava <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=us-ascii
> 
> I know that this is really a Java language topic, but since parsing biological 
> data formats is so rife with regular expression applications, I'm curious 
> what the experience is among the Biojava people with the use of regular 
> expressions in Java. 
> 
> They (at least as in java.util.regex) have been reported to me as performing 
> much slower (by several orders of magnitude) than the regex implementation in 
> Perl, and some simple benchmarking tests seem to bear that out. Even after 
> scrutinizing the benchmark and finding nothing obvious, I'm still skeptical 
> as to why this would be the case - naively I would have assumed that the 
> underlying runtime library is implemented in C in both cases. But perhaps 
> this is not true?
> 
> Any experience people have had here speed-wise (or tricks or things not to 
> do for Java regexes) would be appreciated.
> 
>       -hilmar
> -- 
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
> 
> ------------------------------
> 
> _______________________________________________
> Biojava-l mailing list  -  [email protected]
> http://lists.open-bio.org/mailman/listinfo/biojava-l
> 
> 
> End of Biojava-l Digest, Vol 117, Issue 10
> ******************************************








