Re: [jvm-l] Re: bignum implementation [was: Trace compiler]

Charles Oliver Nutter Wed, 02 Jun 2010 23:18:04 -0700

Another data point for JRuby.

JRuby has only two integer data types: Fixnum (implemented by
RubyFixnum, always containing a long) and Bignum (implemented by
RubyBignum, using BigDecimal). All Fixnum math operations have an
overflow check, and the arbitrary-precision nature of integers will
probably be the biggest hindrance to lifting integer math to
primitives.


This is OS X Java 1.6.0_20, on a 2.66GHs Core 2 Duo.

~/projects/jruby ➔ jruby --server -J-d64 bench/bench_tak.rb 5
      user     system      total        real
  2.861000   0.000000   2.861000 (  2.793000)
  1.987000   0.000000   1.987000 (  1.987000)
  1.978000   0.000000   1.978000 (  1.978000)
  1.981000   0.000000   1.981000 (  1.982000)
  1.980000   0.000000   1.980000 (  1.980000)

Standard execution. A large part of the overhead here is managing a
Ruby frame, which has backtrace info, etc, for every Ruby call. Tak is
much more tak-call-heavy than math-heavy from what I've seen.

~/projects/jruby ➔ jruby --server -J-d64 --fast  bench/bench_tak.rb 5
      user     system      total        real
  1.984000   0.000000   1.984000 (  1.939000)
  0.786000   0.000000   0.786000 (  0.786000)
  0.782000   0.000000   0.782000 (  0.782000)
  0.793000   0.000000   0.793000 (  0.793000)
  0.781000   0.000000   0.781000 (  0.782000)

"fast" mode, which eliminates Ruby frames when they aren't needed (and
generates backtraces in another way), dispatches literal Fixnums as
long, and pulls the method objects all the way to the call site so
they can inline. Still several layers between each call, plus all
dynamic calling logic.

~/projects/jruby ➔ jruby --server -J-d64 -J-Djruby.compile.dynopt=true
bench/bench_tak.rb 5
      user     system      total        real
  0.891000   0.000000   0.891000 (  0.821000)
  0.325000   0.000000   0.325000 (  0.325000)
  0.323000   0.000000   0.323000 (  0.323000)
  0.322000   0.000000   0.322000 (  0.322000)
  0.323000   0.000000   0.323000 (  0.323000)

Experimental dynopt mode; based on previously-seen calls, generates
direct static dispatches to both Fixnum operations and for the
recursive calls to tak. The least Ruby compatible, so far: no dispatch
guards, no backtrace logic, and if it overflows into Bignum it would
ClassCastException. But it's nearly as fast as writing the same code
against RubyFixnums in Java.

If I go a tiny bit further and turn off some other Rubyisms (updating
a thread-local line number variable, checking for thread events
periodically) I can get it down to this:

~/projects/jruby ➔ jruby --server -J-d64 -J-Djruby.compile.dynopt=true
-J-Djruby.compile.threadless=true -J-Djruby.compile.positionless=true
bench/bench_tak.rb 5
      user     system      total        real
  0.946000   0.000000   0.946000 (  0.868000)
  0.272000   0.000000   0.272000 (  0.272000)
  0.277000   0.000000   0.277000 (  0.277000)
  0.274000   0.000000   0.274000 (  0.274000)
  0.274000   0.000000   0.274000 (  0.274000)

I'm curious why your base perf is so close to this final number, since
it seems pretty amazing to me, and still doesn't have the guards it
needs to be really valid.

Maybe you can post the tak bytecode for the final result in Ejang? I'd
love to see what you're doing...

Here's the bytecode for the "most optimized currently possible" Ruby
version of this:

(3, 4, and 5 are the incoming arguments...and yeah, there's obviously
room for improvement here)
*** Dumping ***
    ALOAD 3
    ASTORE 13
    ALOAD 4
    ASTORE 14
    ALOAD 5
    ASTORE 15
   L0
   L1
    LINENUMBER 2 L1
    ALOAD 14
    CHECKCAST org/jruby/RubyFixnum
    ALOAD 1
    ALOAD 13
    INVOKEVIRTUAL org/jruby/RubyFixnum.op_ge
(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
    INVOKEINTERFACE org/jruby/runtime/builtin/IRubyObject.isTrue ()Z
    IFEQ L2
   L3
    LINENUMBER 3 L3
    ALOAD 15
    ARETURN
    GOTO L4
   L2
   L5
    LINENUMBER 5 L5
    ALOAD 0
    ALOAD 1
    ALOAD 2
    ALOAD 0
    ALOAD 1
    ALOAD 2
    ALOAD 13
    CHECKCAST org/jruby/RubyFixnum
    ALOAD 1
    LDC 1
    INVOKEVIRTUAL org/jruby/RubyFixnum.op_minus
(Lorg/jruby/runtime/ThreadContext;J)Lorg/jruby/runtime/builtin/IRubyObject;
    ALOAD 14
    ALOAD 15
    ACONST_NULL
    INVOKESTATIC
ruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89.__file__
(Lruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89;Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;)Lorg/jruby/runtime/builtin/IRubyObject;
    ALOAD 0
    ALOAD 1
    ALOAD 2
    ALOAD 14
    CHECKCAST org/jruby/RubyFixnum
    ALOAD 1
    LDC 1
    INVOKEVIRTUAL org/jruby/RubyFixnum.op_minus
(Lorg/jruby/runtime/ThreadContext;J)Lorg/jruby/runtime/builtin/IRubyObject;
    ALOAD 15
    ALOAD 13
    ACONST_NULL
    INVOKESTATIC
ruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89.__file__
(Lruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89;Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;)Lorg/jruby/runtime/builtin/IRubyObject;
    ALOAD 0
    ALOAD 1
    ALOAD 2
    ALOAD 15
    CHECKCAST org/jruby/RubyFixnum
    ALOAD 1
    LDC 1
    INVOKEVIRTUAL org/jruby/RubyFixnum.op_minus
(Lorg/jruby/runtime/ThreadContext;J)Lorg/jruby/runtime/builtin/IRubyObject;
    ALOAD 13
    ALOAD 14
    ACONST_NULL
    INVOKESTATIC
ruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89.__file__
(Lruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89;Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;)Lorg/jruby/runtime/builtin/IRubyObject;
    ACONST_NULL
    INVOKESTATIC
ruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89.__file__
(Lruby/jit/tak_AF4F7383C1F97732589C2C05AE5BBD9AB6C81E89;Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;)Lorg/jruby/runtime/builtin/IRubyObject;
    ARETURN
   L4
    ARETURN
   L6
    LOCALVARIABLE x Lorg/jruby/runtime/builtin/IRubyObject; L0 L6 13
    LOCALVARIABLE y Lorg/jruby/runtime/builtin/IRubyObject; L0 L6 14
    LOCALVARIABLE z Lorg/jruby/runtime/builtin/IRubyObject; L0 L6 15
  @Lorg/jruby/anno/JRubyMethod;(name="__file__", frame=true,
required=3, optional=0, rest=-1)


On Sun, May 30, 2010 at 9:46 AM, Kresten Krab Thorup <[email protected]> wrote:
> OK, my current implementation for integers [int Erjang] is
>
>  class EInteger extends ENumber { ... }
>  class ESmall extends EInteger { int value; }
>  class EBig extends EInteger { BigInteger value; }
>
> With this, I run the bench_tak at
>
>   MacOSX 16.3-b01-279 -server 385ms/loop
>   soylatte16-i386-1.0.3 -server 500ms/loop
>
> I tried to implement your data structure, ...
>
>  class EInteger extends ENumber { int ival; BigInteger bival; ... }
>
>   MacOSX 16.3-b01-279 -server 485ms/loop
>   soylatte16-i386-1.0.3 -server 497ms/loop (-XX:+DoEscapeAnalysis)
>   soylatte16-i386-1.0.3 -server 606ms/loop (-XX:-DoEscapeAnalysis)
>
> With escape analysis it runs more stable, i.e. it looks like there is
> much less GC going on.
>
> Kresten
>
> Erlang test code looks like this
>
> -----------------------------------------------------------
> tak(X,Y,Z) when Y >= X ->
>    Z;
> tak(X,Y,Z) ->
>    tak( tak(X-1, Y, Z),
>         tak(Y-1, Z, X),
>         tak(Z-1, X, Y) ).
>
> main(N) ->
>    Body = fun() ->
>                   Before = erlang:now(),
>                   times(10, fun() -> tak(24,16,8) end),
>                   After = erlang:now(),
>                   Diff = timer:now_diff(After, Before),
>                   io:format("run: ~pms~n", [Diff div 1000])
>    end,
>
>    timer:tc(?MODULE, times, [N, Body]).
> -----------------------------------------------------------
>
>
>
>
>
> So I tried to re-do it with essentially your
>
>
>
> On May 29, 5:15 am, Per Bothner <[email protected]> wrote:
>> On 05/28/2010 03:24 AM, Kresten Krab Thorup wrote:
>>
>>
>>
>>
>>
>>
>>
>> > On May 27, 1:17 am, Per Bothner<[email protected]>  wrote:
>> >> This is another example of where structs would be helpful.  That would
>> >> allow:
>>
>> >> struct Integer {
>> >>     int ivalue;
>> >>     int[] iwords;
>>
>> >> }
>> > ....
>>
>> >> Kawa basically does this using a regular class gnu.math.Integer,
>> >> and "small" Integers pre-allocated.  That is one reason Kawa's
>> >> arbitrary-precision handling is (mostly) noticeably faster than 
>> >> BigInteger.
>> >> (Some of that could be achieved by further tweaking BigInteger.)
>>
>> > 1: I'm intrigued.  How much does this give you?
>>
>> I don't have numbers convenient, but even with the current JVM (i.e.
>> no "struct" support) the big advantage is that in most cases you
>> don't allocate a "data" array, as long as the integer fits in 32 bits.
>> That same you a lot of memory and gc time.  It means you have to explicitly
>> check for the immediate vs array modes (i.e. words==null or not), but
>> once determined you have a 32-bit number the actual work is quick,
>> and requires fewer memory accesses (and hence cache misses).
>>
>> BigInteger could (and perhaps should) do the same optimization.
>> But BigInteger has a some further overheads, including some
>> seldomly-used fields.
>>
>> > I can see that you avoid a virtual call for all math operators where
>> > you can determine the integer-ness of operands; so it does not have to
>> > choose between SmallInt and BigInt objects.  But it comes at the
>> > overhead of an extra word per integer.
>>
>> Right, but that is modest compared to the space used by BigInteger.
>>
>>
>>
>>
>>
>> > Integer arithmetic (most notably X+1, X-1, and X==<constant>) used in
>> > loops is currently a noticeable performance issue in Erjang (when
>> > comparing to the normal erlang implementation using tags).  In many
>> > such cases, I could avoid a virtual call and that does sound
>> > appealing.
>>
>> > 2: What is the motivation in Kawa to make your own bignum
>> > implementation.  Why not just have
>>
>> > class KawaInteger {
>> >     int ival;
>> >     BigInteger bval;
>>
>> >     ...
>> > }
>>
>> > i.e., fall back on the standard bignum implementation?
>>
>> That was not an option at the time: gnu.math.IntNum was implemented
>> before java.math.BigInteger was available.  If I started from scratch
>> that would probably have made sense to do what you're suggesting. But
>> since the current implementation is faster and more space efficient
>> than using BigInteger I don't see much point in ripping it out.
>>
>> You're free to use gnu.math.IntNum (and gnu.math in general); it has
>> no dependencies on the rest of Kawa.
>> --
>>         --Per Bothner
>> [email protected]  http://per.bothner.com/
>
> --
> You received this message because you are subscribed to the Google Groups 
> "JVM Languages" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/jvm-languages?hl=en.
>
>

-- 
You received this message because you are subscribed to the Google Groups "JVM 
Languages" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/jvm-languages?hl=en.

Re: [jvm-l] Re: bignum implementation [was: Trace compiler]

Reply via email to