On Sun, 12 Dec 2010 14:54:14 -0000 "Jon Harrop" <j...@ffconsultancy.com> wrote:
> The Haskell guys got their best performance improvement moving to > LLVM from the hailstone benchmark so it is interesting to examine > this case as well. I just added support for 64-bit ints to HLVM to > implement that benchmark and my code is: > > Here’s the equivalent OCaml code: > > let rec collatzLen(c, n) : int = > if n = 1L then c else > collatzLen (c+1, if Int64.rem n 2L = 0L then Int64.div n 2L else OK, but boxing itself has nothing to do with the performance degration here. It is the lack of compiler optimizations on the Int64 type. This could be solved by implementing compiler optimizations for it (or manually optimizing some integer arithmetic that is slow). Lets run the code under a profiler, or look at the assembly (I used 'perf record' and 'perf report'). 2 'idiv' instructions turn up as top offenders in the profile. Problem #1: Int64.rem n 2 -> another idiv instruction A C compiler would optimize this to an 'and' instruction. Change that to 'Int64.logand n 1L = 0L'/ Problem #2: Int64.div n 2 -> idiv instruction. A C compiler would optimize this to a right shift. Changing that to 'Int64.shift_right n 1' speeds up the code. With these changes I get almost the same speed as the C code: $ ocamlopt x.ml -o x && time ./x 837799 real 0m0.664s user 0m0.667s sys 0m0.000s $ gcc -O3 x.c && time ./a.out 837799 real 0m0.635s user 0m0.633s sys 0m0.000s Here's the OCaml code: let rec collatzLen(c, n) : int = if n = 1L then c else collatzLen (c+1, if Int64.logand n 1L = 0L then Int64.shift_right n 1 else Int64.add (Int64.mul 3L n) 1L);; let rec loop(i, (nlen, n)) = if i = 1L then n else let ilen = collatzLen(1, i) in let nlen, n = if ilen > nlen then ilen, i else nlen, n in loop (Int64.sub i 1L, (nlen, n));; let _ = let s = loop (1000000L, (1,1000000L)) in print_int (Int64.to_int s);; > 1. Unboxing can give huge performance improvements on serial code, s/Unboxing/arithmetic optimizations/ Please find an example where the performance benefit is due to unboxing, and not due to arithmetic optimizations performed on the unboxed code. > let alone parallel code. The optimized HLVM is running 32× faster > than the OCaml here. > > 2. LLVM makes it easy to JIT fast code from OCaml. HLVM is using it > to beat GCC-compiled C code here. > One advantage of using LLVM is that it would notice arithmetic optimizations like this and perform it itself (even if you use the boxed representation). Best regards, --Edwin _______________________________________________ Caml-list mailing list. Subscription management: http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list Archives: http://caml.inria.fr Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs