The Haskell developers got their biggest performance improvement from moving to LLVM on the hailstone (Collatz) benchmark, so it is interesting to examine this case as well. I have just added support for 64-bit ints to HLVM in order to implement that benchmark, and my code is:
let rec collatzLen ((c, n): int * int64) : int =
  if n = 1L then c
  else collatzLen (c+1, if n % 2L = 0L then n / 2L else 3L * n + 1L);;

let rec loop ((i, (nlen, n)): int64 * (int * int64)) : int64 =
  if i = 1L then n
  else
    let ilen = collatzLen (1, i) in
    let nlen, n = if ilen > nlen then ilen, i else nlen, n in
    loop (i-1L, (nlen, n));;

When run without LLVM's optimization passes, this produces the following output for 10k, 100k and 1M, respectively:

- : `Int64 = 6171L
Live: 0
0.00704098s total; 0s suspend; 0s mark; 0s sweep
- : `Int64 = 77031L
Live: 0
0.087815s total; 0s suspend; 0s mark; 0s sweep
- : `Int64 = 837799L
Live: 0
1.06907s total; 0s suspend; 0s mark; 0s sweep

With LLVM's default optimization passes enabled, I get:

- : `Int64 = 6171L
Live: 0
0.00508595s total; 0s suspend; 0s mark; 0s sweep
- : `Int64 = 77031L
Live: 0
0.0626569s total; 0s suspend; 0s mark; 0s sweep
- : `Int64 = 837799L
Live: 0
0.759738s total; 0s suspend; 0s mark; 0s sweep

Note the ~30% performance improvement in this case. In other cases, LLVM's default optimization passes can degrade performance significantly.

Here's the equivalent OCaml code:

let rec collatzLen (c, n) : int =
  if n = 1L then c
  else collatzLen (c+1, if Int64.rem n 2L = 0L then Int64.div n 2L
                        else Int64.add (Int64.mul 3L n) 1L);;

let rec loop (i, (nlen, n)) =
  if i = 1L then n
  else
    let ilen = collatzLen (1, i) in
    let nlen, n = if ilen > nlen then ilen, i else nlen, n in
    loop (Int64.sub i 1L, (nlen, n));;

and Haskell:

import Data.Int

collatzLen :: Int -> Int64 -> Int
collatzLen c 1 = c
collatzLen c n = collatzLen (c+1) $ if n `mod` 2 == 0 then n `div` 2 else 3*n+1

pmax x n = x `max` (collatzLen 1 n, n)

main = print $ foldl pmax (1,1) [2..1000000]

and C99:

#include <stdio.h>

int collatzLen(int c, long long n) {
  return (n==1 ? c : collatzLen(c+1, (n%2==0 ? n/2 : 3*n+1)));
}

long long loop(long long m) {
  int nlen = 1;
  long long n = 1;
  for (long long i=2; i<=m; ++i) {
    const int ilen = collatzLen(1, i);
    if (ilen > nlen) { nlen = ilen; n = i; }
  }
  return n;
}

int main() { printf("%lld", loop(1000000)); }

And here are my timings:

OCaml:      24.3s   ocamlopt
Haskell:    24.0s   ghc 6.10 --make -O2
F#.NET:      3.45s  fsc --optimize+ --platform:x86
C:           1.20s  gcc -O3 -std=c99
HLVM:        1.07s  ./repl --compile
HLVM (opt):  0.76s  opt -tailcallelim -std-compile-opts

These results demonstrate two things:

1. Unboxing can give huge performance improvements on serial code, let alone parallel code. The optimized HLVM runs 32× faster than the OCaml code here.

2. LLVM makes it easy to JIT fast code from OCaml. HLVM is using it to beat GCC-compiled C code here.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs