Hi Hilmar,

Looked at the test in a bit more detail. I can see what you are
trying to test, but is there a real-life problem behind this?
What this test is doing is a lot of searches on very short strings. Is
this what your real-life application does? I am asking because if your
real-life application uses regexps to search long strings, the
performance might be totally different.
What is your aim? 3 seconds for 500K searches does not seem
particularly slow to me.
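
A minimal sketch of that distinction, for illustration only. The pattern
and the synthetic inputs below are made up and are not taken from the
gists in this thread; the class and method names are hypothetical too.
It runs the same precompiled java.util.regex.Pattern once per short
string, and then once over a single long string built from the same data:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSearchSketch {

    // Many independent searches, one per short string.
    // Returns how many of the strings contain at least one match.
    static int countShortSearches(Pattern pattern, String[] inputs) {
        int hits = 0;
        for (String s : inputs) {
            if (pattern.matcher(s).find()) {
                hits++;
            }
        }
        return hits;
    }

    // A single pass over one long string.
    // Returns the number of non-overlapping matches found.
    static int countLongScan(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int hits = 0;
        while (m.find()) {
            hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical pattern: three nucleotide letters followed by N.
        Pattern pattern = Pattern.compile("[ACGT]{3}N");

        // Synthetic data: 500K short strings, every other one matching.
        String[] inputs = new String[500000];
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < inputs.length; i++) {
            inputs[i] = (i % 2 == 0) ? "ACGTN" : "ACGTA";
            joined.append(inputs[i]).append(' ');
        }
        String longText = joined.toString();

        long t0 = System.nanoTime();
        int shortHits = countShortSearches(pattern, inputs);
        long t1 = System.nanoTime();
        int longHits = countLongScan(pattern, longText);
        long t2 = System.nanoTime();

        System.out.println("short strings: " + shortHits + " hits in "
                + (t1 - t0) / 1000000 + " ms");
        System.out.println("one long string: " + longHits + " hits in "
                + (t2 - t1) / 1000000 + " ms");
    }
}
```

The timings are only indicative: there is no JIT warm-up and each case
is measured once, so treat any numbers as a rough comparison rather than
a proper benchmark.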

Thanks
Peter


On 24 October 2012 19:10, P. Troshin <[email protected]> wrote:
> Hi Hilmar,
>
> Hmm, it looks like I spoke too soon; the previous run was doing
> nothing as all of the cases were commented out.
> I can now see that the results of my runs are not massively different
> from yours.
> It would help if you could encourage your student to write a few unit
> tests so that we know what you are trying to achieve and to simplify
> the testing.
>
> Just a thought
>
> Thanks,
> Peter
>
>
>
> On 24 October 2012 17:47, Hilmar Lapp <[email protected]> wrote:
>> Hi everyone,
>>
>> Thanks for all your responses. Indeed I know that the Java regex API isn't
>> an enjoyable one to program with, and if the underlying task were about
>> writing something from scratch, I'd be all for avoiding regexes too if the
>> same thing could be achieved by string comparison.
>>
>> However, and of course I failed to say that initially, the task from which
>> this query originates is about converting a Perl script to Java (not
>> because Perl is somehow bad, but because those Perl scripts have proven to be
>> an obstacle to easy cross-platform installation of the - mostly Java -
>> software they are a part of). That doesn't mean one couldn't, along the way,
>> also rewrite the code that uses regular expressions into code that doesn't, but
>> I also think it wise not to introduce multiple variables as a source of
>> error at once.
>>
>> Some of the responses would be best answered by looking at the expressions 
>> and the code that uses them, so here are the two "benchmark" scripts.
>>
>> Java: https://gist.github.com/3940931
>> Perl: https://gist.github.com/3940780
>>
>> I'm also copying Dongye Meng here, who is a CS student at UNC working with
>> us on the project - if anyone has further wisdom to share about how to
>> reduce the performance gap between the two versions, he'd surely appreciate it.
>>
>>         -hilmar
>>
>> On Oct 23, 2012, at 6:42 AM, Phillip Lord wrote:
>>
>>> Hilmar Lapp <[email protected]> writes:
>>>> They (at least as in java.util.regex) have been reported to me as
>>>> performing much slower (by several orders of magnitude) than the regex
>>>> implementation in Perl, and some simple benchmarking tests seem to
>>>> bear that out. Even after scrutinizing the benchmark and finding
>>>> nothing obvious, I'm still skeptical as to why this would be the case
>>>> - naively I would have assumed that the underlying runtime library is
>>>> implemented in C in both cases. But perhaps this is not true?
>>>
>>>
>>> Well, the difference is that Perl is perl, while Java is not; it all
>>> depends on the JVM, and libraries also. A quick shuftie at
>>> the source for the open-jdk libraries suggests that the regexp searching
>>> is done in Java -- it's not just a drop through to C. Always the problem
>>> with performance optimisation on Java -- you are only optimising for one
>>> situation. It might be interesting to see how much variation there is
>>> between JVMs.
>>>
>>> Like others, I would only use regexp as a last resort in Java anyway;
>>> compared to Perl, writing the code is painful. Still, I guess that you
>>> know this!
>>>
>>> Phil
>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>> _______________________________________________
>> Biojava-l mailing list  -  [email protected]
>> http://lists.open-bio.org/mailman/listinfo/biojava-l

