The Haskell guys got their best performance improvement from moving to
LLVM on the hailstone benchmark, so it is interesting to examine this case
as well. I just added support for 64-bit ints to HLVM in order to
implement that benchmark, and my code is:

  let rec collatzLen ((c, n): int * int64) : int =
    if n = 1L then c else
      collatzLen (c+1, if n % 2L = 0L then n / 2L else 3L * n + 1L);;

  let rec loop ((i, (nlen, n)): int64 * (int * int64)) : int64 =
    if i = 1L then n else
      let ilen = collatzLen(1, i) in
      let nlen, n = if ilen > nlen then ilen, i else nlen, n in
      loop (i-1L, (nlen, n));;

When run without LLVM's optimization passes, this produces the following
output for limits of 10k, 100k and 1M, respectively:

  - : `Int64 = 6171L
  Live: 0
  0.00704098s total; 0s suspend; 0s mark; 0s sweep
  - : `Int64 = 77031L
  Live: 0
  0.087815s total; 0s suspend; 0s mark; 0s sweep
  - : `Int64 = 837799L
  Live: 0
  1.06907s total; 0s suspend; 0s mark; 0s sweep

With LLVM's default optimization passes enabled, I get:

  - : `Int64 = 6171L
  Live: 0
  0.00508595s total; 0s suspend; 0s mark; 0s sweep
  - : `Int64 = 77031L
  Live: 0
  0.0626569s total; 0s suspend; 0s mark; 0s sweep
  - : `Int64 = 837799L
  Live: 0
  0.759738s total; 0s suspend; 0s mark; 0s sweep

Note the ~30% performance improvement in this case. In other cases, LLVM's
default optimization passes can degrade performance significantly.

Here’s the equivalent OCaml code:

  let rec collatzLen(c, n) : int =
    if n = 1L then c else
      collatzLen (c+1, if Int64.rem n 2L = 0L then Int64.div n 2L
                       else Int64.add (Int64.mul 3L n) 1L);;

  let rec loop(i, (nlen, n)) =
    if i = 1L then n else
      let ilen = collatzLen(1, i) in
      let nlen, n = if ilen > nlen then ilen, i else nlen, n in
      loop (Int64.sub i 1L, (nlen, n));;
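For reference, a minimal driver for the OCaml version might look like the
following; it searches down from one million, matching the other
implementations and the cross-language timings below:

  (* Print the number up to 1,000,000 with the longest hailstone sequence. *)
  let () = Printf.printf "%Ld\n" (loop (1_000_000L, (1, 1L)));;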

and Haskell:

  import Data.Int

  collatzLen :: Int -> Int64 -> Int
  collatzLen c 1 = c
  collatzLen c n = collatzLen (c+1) $ if n `mod` 2 == 0 then n `div` 2
                                      else 3*n+1

  pmax x n = x `max` (collatzLen 1 n, n)

  main = print $ foldl pmax (1,1) [2..1000000]

and C99:

  #include <stdio.h>

  int collatzLen(int c, long long n) {
    return (n==1 ? c : collatzLen(c+1, (n%2==0 ? n/2 : 3*n+1)));
  }

  long long loop(long long m) {
    int nlen=1;
    long long n = 1;
    for (long long i=2; i<=m; ++i) {
      const int ilen = collatzLen(1, i);
      if (ilen > nlen) {
        nlen = ilen;
        n = i;
      }
    }
    return n;
  }

  int main() {
    printf("%lld", loop(1000000));
  }

And here are my timings:

OCaml:      24.3s   ocamlopt
Haskell:    24.0s   ghc6.10 --make -O2
F#.NET:      3.45s  fsc --optimize+ --platform:x86
C:           1.20s  gcc -O3 -std=c99
HLVM:        1.07s  ./repl --compile
HLVM (opt):  0.76s  opt -tailcallelim -std-compile-opts

These results really demonstrate two things:

1. Unboxing can give huge performance improvements on serial code, let alone
parallel code. The optimized HLVM is running 32× faster than the OCaml here
(see the native-int sketch after this list).

2. LLVM makes it easy to JIT fast code from OCaml. HLVM is using it to beat
GCC-compiled C code here.
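
To make the first point concrete, here is a rough sketch of the same
benchmark written against OCaml's native int, which is unboxed (and 63 bits
wide on a 64-bit platform, enough for the intermediate hailstone values
reached below 1M). It avoids most of the boxing that the Int64 version
above incurs:

  (* Hailstone benchmark over OCaml's native, unboxed int. *)
  let rec collatz_len c n =
    if n = 1 then c
    else collatz_len (c + 1) (if n mod 2 = 0 then n / 2 else 3 * n + 1)

  let loop m =
    let nlen = ref 1 and n = ref 1 in
    for i = 2 to m do
      let ilen = collatz_len 1 i in
      if ilen > !nlen then begin nlen := ilen; n := i end
    done;
    !n

  let () = Printf.printf "%d\n" (loop 1_000_000)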

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com
