Hi Hilmar,

I have looked at the test in a bit more detail. I can see what you are trying to test, but is there a real-life problem behind it? The test performs a large number of searches on very short strings. Is that what your real-life application does? I am asking because if your application uses regexps to search long strings, the performance might be totally different. What is your aim? 3 seconds for 500K searches does not seem particularly slow to me.
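To make the short-versus-long-string question concrete, here is a rough sketch of the kind of comparison I mean. It is not the code from the gist: the class name RegexTiming, the pattern and the generated input strings are all made up for illustration, and a serious measurement would want a proper harness such as JMH rather than System.nanoTime().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTiming {

    // Hypothetical pattern for illustration only; NOT one of the gist's expressions.
    public static final Pattern PATTERN = Pattern.compile("[A-Z]{2}\\d+");

    /** Counts all matches across the inputs, reusing a single Matcher. */
    public static int countMatches(Pattern pattern, String[] inputs) {
        Matcher matcher = pattern.matcher("");
        int hits = 0;
        for (String input : inputs) {
            matcher.reset(input); // avoids allocating a new Matcher per string
            while (matcher.find()) {
                hits++;
            }
        }
        return hits;
    }

    /** Builds made-up short strings, each containing exactly one match. */
    public static String[] shortInputs(int count) {
        String[] inputs = new String[count];
        for (int i = 0; i < count; i++) {
            inputs[i] = "id AB" + i;
        }
        return inputs;
    }

    /** Times one pass over the inputs and prints the match count and elapsed time. */
    static void report(String label, String[] inputs) {
        long start = System.nanoTime();
        int hits = countMatches(PATTERN, inputs);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%s: %d matches in %d ms%n", label, hits, millis);
    }

    public static void main(String[] args) {
        String[] many = shortInputs(500_000);
        // The same text joined into one long string.
        String[] one = { String.join(" ", many) };

        // Warm-up passes so the JIT has a chance to compile the hot loop.
        for (int i = 0; i < 5; i++) {
            countMatches(PATTERN, many);
            countMatches(PATTERN, one);
        }

        report("500K short strings", many);
        report("one long string", one);
    }
}
```

Both passes find the same number of matches, so a difference between the two timings should mostly reflect per-search overhead rather than the matching work itself. The actual numbers will depend on the JVM and the machine.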
Thanks
Peter

On 24 October 2012 19:10, P. Troshin <[email protected]> wrote:
> Hi Hilmar,
>
> Hmm, it looks like I spoke too soon; the previous run was doing
> nothing as all of the cases were commented out.
> I can now see that the results of my runs are not massively different
> from that of yours.
> It would help if you could encourage your student to write a few unit
> tests so that we know what you are trying to achieve and to simplify
> the testing.
>
> Just a thought
>
> Thanks,
> Peter
>
> On 24 October 2012 17:47, Hilmar Lapp <[email protected]> wrote:
>> Hi everyone,
>>
>> Thanks for all your responses. Indeed I know that the Java regex API isn't
>> an enjoyable one to program with, and if the underlying task were about
>> writing something from scratch, I'd be all for avoiding regex's too if the
>> same thing could be achieved by string comparison.
>>
>> However, and of course I failed to say that initially, the task from which
>> this query is originating is about converting a Perl script to Java (not
>> because Perl is somehow bad, but because those Perl scripts have shown to be
>> an obstacle to easy cross-platform installation of the - mostly Java -
>> software they are a part of). That doesn't mean one couldn't in the course
>> also rewrite the code that uses regular expressions to one that doesn't, but
>> I also think it wise not to introduce multiple variables as a source of
>> error at once.
>>
>> Some of the responses would be best answered by looking at the expressions
>> and the code that uses them, so here are the two "benchmark" scripts.
>>
>> Java: https://gist.github.com/3940931
>> Perl: https://gist.github.com/3940780
>>
>> I'm also copying Dongye Meng here, who is a CS student at UNC working with
>> us on the project - if anyone has further wisdom to share about how to
>> reduce the performance gap between the two versions, he'd surely appreciate.
>>
>> -hilmar
>>
>> On Oct 23, 2012, at 6:42 AM, Phillip Lord wrote:
>>
>>> Hilmar Lapp <[email protected]> writes:
>>>> They (at least as in java.util.regex) have been reported to me as
>>>> performing much slower (by several orders of magnitude) than the regex
>>>> implementation in Perl, and some simple benchmarking tests seem to
>>>> bear that out. Even after scrutinizing the benchmark and finding
>>>> nothing obvious, I'm still skeptical as to why this would be the case
>>>> - naively I would have assumed that the underlying runtime library is
>>>> implemented in C in both cases. But perhaps this is not true?
>>>
>>> Well, the difference is that Perl is perl, while Java is not; it all
>>> depends on the JVM, and libraries also. A quick shuftie at
>>> the source for the open-jdk libraries suggests that the regexp searching
>>> is done in Java -- it's not just a drop through to C. Always the problem
>>> with performance optimisation on Java -- you are only optimising for one
>>> situation. It might be interesting to see how much variation there is
>>> between JVMs.
>>>
>>> Like others, I would only use regexp as a last resort in Java anyway;
>>> compared to Perl, writing the code is painful. Still, I guess that you
>>> know this!
>>>
>>> Phil
>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>> ===========================================================
>>
>> _______________________________________________
>> Biojava-l mailing list - [email protected]
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
