Re: [Caml-list] MetaOcaml and high-performance [was: AST versus Ocaml]

2009-11-10 Thread Jon Harrop
On Monday 09 November 2009 04:23:28 o...@okmij.org wrote:
 Because offshoring produces a portable C or Fortran code file, you can
 use the code on 32 or 64-bit platform. The reason the native MetaOCaml
 without offshoring does not work on amd64 is because at that time
 OCaml didn't emit PIC code for amd64. So, dynamic linking was
 impossible. That problem has long been fixed in later versions of
 OCaml...

Has the problem been fixed in MetaOCaml?

 Fortunately, some people have considered MetaOCaml to be a viable
 option for performance users and have reported good results. For
 example,

   Tuning MetaOCaml Programs for High Performance
   Diploma Thesis of Tobias Langhammer.
   http://www.infosun.fmi.uni-passau.de/cl/arbeiten/Langhammer.pdf

 Here is a good quotation from the Introduction:

 ``This thesis proposes MetaOCaml for enriching the domain of
 high-performance computing by multi-staged programming. MetaOCaml extends
 the OCaml language.
 ...
 Benchmarks for all presented implementations confirm that the
 execution time can be reduced significantly by high-level
 optimizations. Some MetaOCaml programs even run as fast as respective
 C implementations. Furthermore, in situations where optimizations in
 pure MetaOCaml are limited, computation hotspots can be explicitly or
 implicitly exported to C. This combination of high-level and low-level
 techniques allows optimizations which cannot be obtained in pure C
 without enormous effort.''

That thesis contains three benchmarks:

1. Dense float matrix-matrix multiply.

2. Blur of an int image matrix as convolution with a 3x3 stencil matrix.

3. Polynomial multiplication with distributed parallelism.

I don't know about polynomial multiplication (suffice to say that it is not 
leveraging shared-memory parallelism which is what performance users value in 
today's multicore era) but the code for the first two benchmarks is probably 
10-100x slower than any decent implementation. For example, his fastest 
2048x2048 matrix multiply takes 167s whereas Matlab takes only 3.6s here.

In essence, the performance gain (if any) from offshoring to C or Fortran is 
dwarfed by the lack of shared-memory parallelism.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


[Caml-list] MetaOcaml and high-performance [was: AST versus Ocaml]

2009-11-08 Thread oleg

Jon Harrop wrote:
 I did some experiments with MetaOCaml. Firstly, it is x86 only and not x64
 which means poor floating-point performance, many times slower than HLVM's.
 The language itself is also very restrictive, e.g. you cannot generate
 pattern matches dynamically so you cannot leverage the pattern match
 compiler, which is a huge loss. In essence, effective use of MetaOCaml
 requires writing in continuation passing style which OCaml and, therefore,
 MetaOCaml do not handle well (closure invocations are slow in OCaml and CPS
 is not optimized back into anything sane). So I do not consider MetaOCaml to
 be a viable option for performance users.

A few clarifications seems to be in order. First of all, the original
poster asked about _offshoring_ with MetaOCaml. When the generated
code is run with offshoring, a C of Fortran file is created, which can
be compiled with your favorite compiler and dynamically linked back
into the running OCaml program. Alternatively, you can use the
generated C/Fortran code as it is, as a part of a C/Fortran project. We
did exactly the latter in our FFT project: we used MetaOCaml to create C
files for FFT kernels, and plugged the files into the FFTW benchmarking
framework, which is pure C. It worked as expected.

Because offshoring produces a portable C or Fortran code file, you can
use the code on 32 or 64-bit platform. The reason the native MetaOCaml
without offshoring does not work on amd64 is because at that time
OCaml didn't emit PIC code for amd64. So, dynamic linking was
impossible. That problem has long been fixed in later versions of
OCaml. Offshoring is a good way to get around it, thanks Jan Kybic.

Offshoring could just as well produce Verilog or LLVM code. Alas,
we didn't get around to exploring that idea.

Regarding continuation-passing style (CPS): if your code generatOR
needs let-insertion, then the generatOR _may_ need to be encoded in
CPS. The generatED code is _not_ in CPS; it is in the `conventional',
so-called `direct style'. Even if we assume that CPS code is
difficult to compile efficiently (which is not certain: in CPS, all
`important' function calls are tail-calls, and OCaml is very good with
tail-calls), that difficulty affects only the code generator rather
than the generated code. Most of the time it is the generated code
that is performance-critical.

One may say that writing the generator in CPS is a bother. I have
heard such objections. Please see our PEPM2009 paper about a way to
address it
http://okmij.org/ftp/Computation/Generative.html#circle-shift
and write generators in the conventional style. Please see the example
of writing a flexible Gaussian Elimination code, paying no penalty for
abstractions. 

Fortunately, some people have considered MetaOCaml to be a viable
option for performance users and have reported good results. For
example,

Tuning MetaOCaml Programs for High Performance
Diploma Thesis of Tobias Langhammer.
http://www.infosun.fmi.uni-passau.de/cl/arbeiten/Langhammer.pdf

Here is a good quotation from the Introduction:

``This thesis proposes MetaOCaml for enriching the domain of high-performance
computing by multi-staged programming. MetaOCaml extends the OCaml
language.
...
Benchmarks for all presented implementations confirm that the
execution time can be reduced significantly by high-level
optimizations. Some MetaOCaml programs even run as fast as respective
C implementations. Furthermore, in situations where optimizations in
pure MetaOCaml are limited, computation hotspots can be explicitly or
implicitly exported to C. This combination of high-level and low-level
techniques allows optimizations which cannot be obtained in pure C
without enormous effort.''

___
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs