Re: Rtalk Performance for micro benchmarks (how to improve)

Charles Oliver Nutter Mon, 17 Sep 2012 20:11:07 -0700

An aside...can you post your RTALK and Java Hanoi impls? I'm always
looking for another benchmark to add to my suite.


- Charlie

On Mon, Sep 17, 2012 at 8:00 PM, Charles Oliver Nutter
<head...@headius.com> wrote:
> Oh, and to reiterate the point about jdk8...
>
> On JRuby fib, it's around 0.44s for fib(35) versus jdk7's 0.33 and
> fastruby's 0.19. So that's easily 2x slower, which isn't far off from
> what you're seeing...right?
>
> - Charlie
>
> On Mon, Sep 17, 2012 at 7:57 PM, Charles Oliver Nutter
> <head...@headius.com> wrote:
>> Your output is a little hard to parse, but I think I followed it.
>>
>> First off, it is currently expected that jdk8 is not optimizing as
>> well as the jdk7 indy logic. I have seen very few benchmarks that are
>> faster, and my reading of the inlining logs tells me it's just not
>> inlining everything the jdk7 logic did. I have not yet gotten an hsdis
>> build to work on OpenJDK8 (help!).
>>
>> If you still don't see performance comparing favorably to the fastruby
>> version (which, to be fair, is doing 100% virtual dispatch with no
>> indy, no user-created guards, etc), then could it simply be that your
>> indy guard logic and arbitrary precision logic are adding all that
>> overhead? It seems like a lot indeed.
>>
>> Your times definitely seem slower than they should be based on the
>> bytecode. My machine is a 2.2GHz i7, and you're on a 2.8GHz core
>> 2...that's probably about a toss-up on straight-line perf, in my
>> experience. So yeah, I guess I'd look at guard logic, your actual math
>> operations, and probably also the size of your numeric objects (my
>> RFixnum has exactly one field for the long value).
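
For illustration, here is a minimal Java sketch of what a one-field integer box with an overflow check might look like. SketchFixnum and its add method are hypothetical names, not JRuby's RFixnum or Rtalk's integer class. The point is only that the common path reads a single long field and falls back to BigInteger when the 64-bit sum overflows.

```java
import java.math.BigInteger;

// Hypothetical class for illustration; not JRuby's RFixnum or Rtalk's integer type.
final class SketchFixnum {
    final long value; // the only instance field

    SketchFixnum(long value) {
        this.value = value;
    }

    // Returns a SketchFixnum when the sum fits in 64 bits, otherwise a BigInteger.
    Object add(SketchFixnum other) {
        long a = value;
        long b = other.value;
        long sum = a + b;
        // Overflow happened iff both operands share a sign and the sum's sign differs.
        if (((a ^ sum) & (b ^ sum)) < 0) {
            return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
        }
        return new SketchFixnum(sum);
    }
}
```

Whether the real cost sits in a check like this or in the guards around it is something only a profile or the inlining logs can show.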
>>
>> FWIW, my experience with JRuby is that I can come within 1.5x slower
>> than Java doing the exact same operations through all virtual calls,
>> which matches almost exactly with fastruby's 30% improvement for this
>> contrived case.
>>
>> - Charlie
>>
>> On Mon, Sep 17, 2012 at 7:46 PM, Mark Roos <mr...@roos.com> wrote:
>>> After reading Charles' blog on 'fast' ruby I decided to look at how Rtalk
>>> was comparing.  At the same time I loaded the latest JDK8 just to compare.
>>>
>>> First, jdk 8 runs (excellent), with some things faster but most slower.
>>> But to my chagrin Rtalk running FIB(35) is much slower than Charles'
>>> effort: 700ms vs 300ms.  So I am wondering what I could be doing that I
>>> could improve.  At the end are my jvm bytecodes for fib (which look ok).
>>> I am doing all 64 bit and my integer code does do conversions to/from
>>> BigIntegers.  I also use a 4K small integer cache to help with object
>>> creation.
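
A small integer cache of the kind described could be sketched in Java roughly as follows. The class names, the 0..4095 range, and the valueOf entry point are all assumptions made for illustration; the Rtalk implementation was not posted in this thread.

```java
// Hypothetical boxed integer with a single long field.
final class BoxedInt {
    final long value;

    BoxedInt(long value) {
        this.value = value;
    }
}

// Hypothetical cache: preallocates boxes for 0..4095 (the range is an assumption).
final class SmallIntCache {
    static final int SIZE = 4096;
    private static final BoxedInt[] CACHE = new BoxedInt[SIZE];

    static {
        for (int i = 0; i < SIZE; i++) {
            CACHE[i] = new BoxedInt(i);
        }
    }

    // Returns the shared box for small values and allocates a fresh one otherwise.
    static BoxedInt valueOf(long v) {
        if (v >= 0 && v < SIZE) {
            return CACHE[(int) v];
        }
        return new BoxedInt(v);
    }
}
```

Note that the range check itself sits on the hot path of every arithmetic result, so a cache like this trades allocation for an extra branch and an array load.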
>>>
>>> To do some research I have two versions of Hanoi as well: the Rtalk
>>> version and an implementation in Java where I do the exact same work but
>>> without invokeDynamic.  Here I run 100% slower vs Charles' 30%.
>>>
>>> Suggestions?  Cmd line args, compiler choice? where to look?
>>>
>>> Thanks
>>> mark
>>>
>>>
>>> 64 bit OSX 2.8 GHz core 2 duo
>>> No cmd line options
>>> Times are the min of 10 runs.
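
A "min of 10 runs" measurement can be sketched in plain Java as below. This is not the harness used for the numbers in this thread; MinOfRuns, minMillis, and the plain-long fib are hypothetical stand-ins, and the output labels only mimic the format of the listing.

```java
// Hypothetical timing harness; not the one used to produce the posted numbers.
final class MinOfRuns {
    // Plain recursive fib on primitive longs, used only as a workload.
    static long fib(long n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    // Runs the task `runs` times and returns the fastest wall-clock time in ms.
    static long minMillis(Runnable task, int runs) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            best = Math.min(best, elapsedMs);
        }
        return best;
    }

    public static void main(String[] args) {
        final long[] sink = new long[1]; // keeps the result live so the call is not dead code
        long ms = minMillis(() -> sink[0] = fib(35), 10);
        System.out.println("Starting FIB 35");
        System.out.println("  Time (ms) " + ms);
    }
}
```

Taking the minimum rather than the mean discards the slower early runs, where the JIT is still compiling, which matters when comparing JDK builds.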
>>>
>>> JDK 7u2 Oracle site
>>> Starting FIB 35
>>>   Time (ms) 617
>>> Starting Fib 40
>>>   Time (ms) 6982
>>> Starting Java Hanoi 25 10X
>>>   Time (ms) 294
>>> Starting RTALK Hanoi 25
>>>   Time (ms) 629
>>>
>>> JDK 8 ( 9/14)  Google code b56
>>> Starting FIB 35
>>>   Time (ms) 776
>>> Starting Fib 40
>>>   Time (ms) 8668
>>> Starting Java Hanoi 25 10X
>>>   Time (ms) 292
>>> Starting RTALK Hanoi 25
>>>   Time (ms) 752
>>>
>>> Code generated by Rtalk compiler
>>> Push Constant is a ConstantCallSite
>>> Perform is an InvokeDynamic MutableCallSite with a single GWT
>>>
>>> CLASS                rtPbc/r957 extends Object
>>> METHOD               invoke RtObject,RtObject,RtObject, access=9
>>> FRAME                -1 localVarCnt=2 {ri/core/rtalk/RtObject, ri/core/rtalk/RtObject, null, null, null, null} stackCnt=0 {null, null}
>>> Push Constant        2
>>>                      aload 0
>>> Perform              <   line  7
>>>                      getstatic ri/core/rtalk/RtObject _false Lri/core/rtalk/RtObject;
>>> JUMP                 if_acmpeq LABEL 2
>>>                      aload 0
>>> JUMP                 goto LABEL 3
>>> LABEL                LABEL 2
>>> FRAME                -1 localVarCnt=2 {ri/core/rtalk/RtObject, ri/core/rtalk/RtObject, null, null, null, null} stackCnt=0 {null, null}
>>>                      aload 1
>>>                      astore 4
>>> Push Constant        1
>>>                      aload 0
>>> Perform              -   line  42
>>>                      aload 4
>>> Perform              supportFib:   line  60
>>>                      astore 4
>>>                      aload 1
>>>                      astore 5
>>> Push Constant        2
>>>                      aload 0
>>> Perform              -   line  87
>>>                      aload 5
>>> Perform              supportFib:   line  105
>>>                      aload 4
>>> Perform              +   line  123
>>> LABEL                LABEL 3
>>> FRAME                -1 localVarCnt=2 {ri/core/rtalk/RtObject, ri/core/rtalk/RtObject, null, null, null, null} stackCnt=1 {ri/core/rtalk/RtObject, null}
>>>                      areturn
>>>                      maxStack 2, maxLocals 6
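
Reading "a single GWT" as a single MethodHandles.guardWithTest, the shape of such a call site can be sketched with the standard java.lang.invoke API. Everything named here (GuardedSite, bothLong, fastLess, slowLess) is hypothetical, and Rtalk's real guards and fallback (which would normally relink the site) are not shown in this thread.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.math.BigInteger;

// Hypothetical sketch of a "<" call site guarded by one type test.
final class GuardedSite {
    // Guard: take the fast path only when both operands are boxed longs.
    static boolean bothLong(Object a, Object b) {
        return a instanceof Long && b instanceof Long;
    }

    // Fast path: primitive compare.
    static Object fastLess(Object a, Object b) {
        return ((Long) a).longValue() < ((Long) b).longValue();
    }

    // Slow path: arbitrary-precision compare (a real runtime would likely relink here).
    static Object slowLess(Object a, Object b) {
        return new BigInteger(a.toString()).compareTo(new BigInteger(b.toString())) < 0;
    }

    // Builds a MutableCallSite whose target is guardWithTest(test, fast, slow).
    static MutableCallSite link() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType binary = MethodType.methodType(Object.class, Object.class, Object.class);
            MethodHandle test = lookup.findStatic(GuardedSite.class, "bothLong",
                    MethodType.methodType(boolean.class, Object.class, Object.class));
            MethodHandle fast = lookup.findStatic(GuardedSite.class, "fastLess", binary);
            MethodHandle slow = lookup.findStatic(GuardedSite.class, "slowLess", binary);
            MutableCallSite site = new MutableCallSite(binary);
            site.setTarget(MethodHandles.guardWithTest(test, fast, slow));
            return site;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience wrapper so callers do not have to handle Throwable.
    static Object invoke(MutableCallSite site, Object a, Object b) {
        try {
            return (Object) site.dynamicInvoker().invoke(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

In the fib listing every Perform (<, -, +, supportFib:) pays for a test like this, so the cost of the guard and whether it inlines is worth checking first.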
>>> _______________________________________________
>>> mlvm-dev mailing list
>>> mlvm-dev@openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>>>