-------- Original Message --------
Subject:        Re: [Jchat] J subset compilation
Date:   Thu, 27 Feb 2014 23:34:39 -0500
From:   Robert Bernecky <[email protected]>
To:     [email protected]



Having been down this road several times (and still looking for
the exit sign...), I think it makes sense to start out by asking what
problem you are trying to solve. I ask this because the ILs you
picked are extremely low-level, and you will have roughly
zero opportunity to perform algebraic optimizations on any
but the most trivial array-level computations. E.g., consider whether you
could handle:

M2← ⍉⍉ M NB. Transpose transpose, if you can't read the APL...

and turn it into:

M2← M
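
For concreteness, here is a minimal sketch, in C, of the kind of rewrite
rule an array-level optimizer has to be able to express. The toy AST
(NodeTag, Node, simplify) is invented for illustration, not taken from
any real compiler:

#include <stddef.h>

/* Hypothetical toy AST, just big enough to state the rewrite. */
typedef enum { NODE_VAR, NODE_TRANSPOSE /* ...other primitives */ } NodeTag;

typedef struct Node {
    NodeTag tag;
    struct Node *arg;   /* operand of a monadic primitive; NULL for leaves */
} Node;

/* Rewrite transpose(transpose(x)) -> x.  Monadic transpose reverses the
   axis order, so doing it twice is the identity at any rank; dyadic
   transpose, with an arbitrary axis permutation, needs a separate check. */
Node *simplify(Node *n) {
    if (n == NULL) return NULL;
    n->arg = simplify(n->arg);
    if (n->tag == NODE_TRANSPOSE &&
        n->arg != NULL && n->arg->tag == NODE_TRANSPOSE)
        return n->arg->arg;         /* drop both transposes */
    return n;
}

The point is not the few lines of C: it is that in a word-at-a-time IL the
two transposes have already been lowered to loops and index arithmetic
before any pass is likely to notice that they cancel.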

Let's assume you stick with a primitive language, such as those
below. Oh, and remember that C does not have arrays. Neither
does D, which seems like a good candidate for an IL, at first
glance. They have ONLY vectors. If you think they have arrays,
try these things:

1. Write a function that accepts two array arguments of
arbitrary shape and returns an array of the sum of the first
and the transpose of the second. I ask this one because, as
far as I know, you can't pass an array to a C function unless
it is a vector or has statically declared shape.

2. Write a transpose function:

M2← ⍉M

3. Ensure that the function in (2) works on empty arrays, such as
an array of shape 3 0. This is where C falls apart.
Anytime someone tells you that
they can mimic arrays with vectors of vectors, ask them how
they handle empty arrays. (A rough C sketch of one workaround follows this list.)

4. Ensure that all your array-level optimizations work for the
function in (2).
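
To make points (1)-(3) concrete, here is a rough sketch of the usual
workaround: because C gives you only vectors, rank and shape have to be
carried by hand in a descriptor. The Array struct and the function names
are invented for illustration and error handling is omitted; note that the
empty case (shape 3 0) only works because every loop bound comes from the
shape vector, never from the data:

#include <stddef.h>
#include <stdlib.h>

/* Hand-rolled array descriptor: rank, shape, and ravelled data. */
typedef struct {
    size_t rank;       /* number of axes               */
    size_t *shape;     /* shape[0..rank-1]             */
    double *data;      /* ravelled elements, row-major */
} Array;

static size_t elem_count(const Array *a) {
    size_t n = 1;
    for (size_t i = 0; i < a->rank; i++) n *= a->shape[i];
    return n;          /* 0 for empties such as shape 3 0 */
}

/* Monadic transpose of a rank-2 array: M2 <- transpose M. */
Array *transpose2(const Array *m) {
    Array *r = malloc(sizeof *r);
    r->rank = 2;
    r->shape = malloc(2 * sizeof *r->shape);
    r->shape[0] = m->shape[1];
    r->shape[1] = m->shape[0];
    r->data = malloc(elem_count(m) * sizeof *r->data);  /* malloc(0) is fine */
    for (size_t i = 0; i < m->shape[0]; i++)
        for (size_t j = 0; j < m->shape[1]; j++)
            r->data[j * r->shape[1] + i] = m->data[i * m->shape[1] + j];
    return r;
}

The sum-plus-transpose function of point (1) falls out of the same
descriptor idea; the pain is that none of this bookkeeping is provided,
or checked, by the C language itself.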

Now, if you plan to ignore array-level optimizations, then you
end up with what is, effectively, an interpreter that has the
syntax analysis performed statically. This is A Good Thing for
iterative functions working on small arrays, but people like Dyalog
are doing some nice JITly things to achieve that same end,
transparently. And, you still have to write a LOT of your own run-time
code, or else (ta da!) invoke the J interpreter for the missing bits.

If you plan to ignore array-level optimizations, then you will
have great difficulty in coming near the performance of hand-written
scalar-oriented code, or array-optimized functional array language
code (e.g., SAC and APEX), because it is difficult to fuse loops
in such a way as to eliminate array-valued temps. I discussed
this in my MSc thesis, which you can find here:

http://www.snakeisland.com/apexup.htm

If you ignore array-level optimizations, then you will be nailed
to the wall on performance, due to the high latencies and
limited bandwidth of contemporary memory subsystems.
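
To illustrate what is at stake, here is a hand-written sketch (not
compiler output) of evaluating something like the sum of an elementwise
product both ways. The unfused version materializes an array-valued temp
and pays for the allocation plus an extra pass over memory, which is
exactly the bandwidth cost just mentioned:

#include <stddef.h>
#include <stdlib.h>

/* Unfused: build the whole product array, then reduce it. */
double dot_unfused(const double *a, const double *b, size_t n) {
    double *tmp = malloc(n * sizeof *tmp);   /* array-valued temp */
    for (size_t i = 0; i < n; i++) tmp[i] = a[i] * b[i];
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += tmp[i];
    free(tmp);
    return s;
}

/* Fused: one pass, no temp -- the code shape loop fusion is aiming for. */
double dot_fused(const double *a, const double *b, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}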

I suggest you look carefully at SAC. I am using it as my primary
back end for APEX, and am wrapping up (I hope) some research
into array-level optimizations for SAC (www.sac-home.org).
It has some excellent ideas for both language design (it also
has some very baroque ones, IMO) and optimizations.

An alternate approach that I have been considering is taking
the concepts of SAC and the SAC compiler, and designing a new
compiler (SAC is written in C) in J or APL. This would still
be a fair amount of work, but if done right (i.e., the compiler
has to be able to compile itself, which means you really
need support for structs and boxed arrays), it could be
a highly productive tool AND also serve as the basis for
serious research into optimizers. [I hate writing pages of C
in order to end up doing something like a slight variant on
set membership...]

BTW, if it's a toy you want, I recommend a yo-yo. They'll be back in
fashion soon. [Actually, we have one for mayor here in Toronto.]
Getting something trivial working is not extremely hard - we did
ACORN for the CRAY Y-MP in about two weeks, for
really simple things. Getting it to work on real applications, or even on
language subsets, is a lot more work. E.g., how do you plan to handle
file I/O? Array formatting?

Bob

P.S.: Don't forget 1-bit storage for Boolean arrays.
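
For reference, a minimal sketch of what that looks like at the C level;
the BoolVec name and the layout (bit i of the array in bit i mod 8 of
byte i/8) are my own choices for illustration, nothing prescribed:

#include <stddef.h>
#include <stdlib.h>

/* Packed Boolean vector: one bit per element, eight per byte. */
typedef struct {
    size_t n;              /* number of Boolean elements   */
    unsigned char *bits;   /* (n + 7) / 8 bytes of storage */
} BoolVec;

static BoolVec bmake(size_t n) {
    BoolVec v = { n, calloc((n + 7) / 8, 1) };
    return v;
}

static int bget(const BoolVec *v, size_t i) {
    return (v->bits[i >> 3] >> (i & 7)) & 1;
}

static void bset(BoolVec *v, size_t i, int x) {
    if (x) v->bits[i >> 3] |=  (unsigned char)(1u << (i & 7));
    else   v->bits[i >> 3] &= (unsigned char)~(1u << (i & 7));
}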

P.P.S.: Be sure to scalarize ALL small arrays, in your copious spare
time. Otherwise, code like Mike Jenkins' APL model for domino,
which contained this nifty expression for a pivot
operation to swap two matrix rows:

M[i,j;] ← M[j,i;]

will run about twice as long as it would if you didn't allocate,
fill, use, and then deallocate two 2-element vectors: i,j and j,i.
When I replaced the above line (by hand, at the time)
by this:

tmp←M[i;]
M[i;]←M[j;]
M[j;]←tmp

the whole domino function roughly doubled in speed.
This is one tip of the iceberg.
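
In C terms, the scalarized pivot should boil down to roughly the loop
below (a sketch over a row-major matrix of doubles; the name is invented).
The M[i,j;] ← M[j,i;] form instead makes the runtime allocate, fill, and
free the two 2-element index vectors, plus a gathered temporary, before
it can store a single element:

#include <stddef.h>

/* Swap rows i and j of an nrows x ncols matrix stored row-major,
   with no index vectors and no temporary matrix. */
void swap_rows(double *m, size_t ncols, size_t i, size_t j) {
    for (size_t k = 0; k < ncols; k++) {
        double t = m[i * ncols + k];
        m[i * ncols + k] = m[j * ncols + k];
        m[j * ncols + k] = t;
    }
}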

b

On 14-02-27 10:19 PM, Joe Bogner wrote:
 I started to toy with the idea of working on a subset of J that can be
 compiled. The idea has been brought up before. Yes, I
 realize it's a path fraught with peril and a likely premature demise.
 Still, as a toy, I figure it would be fun.

 Options for intermediate language:

 1. GCC RTL - It looks very challenging and, as far as I can tell, would
 require recompiling GCC, which is something I don't want to do.

 2. LLVM IR - I cobbled together a small example that emits IR which
 can then be compiled. Hypothetically, we could use the LLVM C bindings
 to generate IR from J words directly from J. For my proof of concept,
 I used the F# bindings since that seemed to be the easiest path on
 Windows. Julia, Rust, and Haskell have LLVM targets.

 3. C - Either clang, gcc, or tinycc. Generating C doesn't seem as
 stable and professional-grade as the earlier options. However, I have
 come across decent languages that do it (Chicken Scheme and others).
 ELI does it, http://fastarray.appspot.com/compile.html. APEX also does
 it, http://www.snakeisland.com/apexup.htm. It could also be used to
 JIT code from J for small functions.

 I'm currently edging towards the C path. Maybe that makes more sense
 as a little proof of concept since it seems like it'd be quicker to
 get up and running. My vision here is to implement a subset of verbs
 in C ({ i. # $) .. really basic .. and then have a parser that takes a
 series of J expressions and translates them into the corresponding
 running list of C function calls - all in a single main().
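
 A rough, self-contained sketch of what that generated C might boil down
 to for a sentence like +/ i. 10, with an invented two-verb runtime (JVec,
 j_iota, and j_sum are made-up names, not an actual design):

 #include <stdio.h>
 #include <stdlib.h>

 typedef struct { long n; double *v; } JVec;   /* toy: double vectors only */

 static JVec j_iota(long n) {                  /* i. n */
     JVec r = { n, malloc((size_t)n * sizeof(double)) };
     for (long i = 0; i < n; i++) r.v[i] = (double)i;
     return r;
 }

 static double j_sum(JVec y) {                 /* +/ y */
     double s = 0.0;
     for (long i = 0; i < y.n; i++) s += y.v[i];
     return s;
 }

 int main(void) {                              /* emitted for:  +/ i. 10 */
     JVec t0 = j_iota(10);
     double t1 = j_sum(t0);
     printf("%g\n", t1);                       /* prints 45 */
     free(t0.v);
     return 0;
 }

 In practice every emitted call would also have to carry shape and type
 information, which is where the bookkeeping gets heavy.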

 Some things I'm excited about trying out:

 1. clang/llvm vectorization -
 http://blog.llvm.org/2013/05/llvm-33-vectorization-improvements.html

 2. parallel computation and concurrency -
 http://opensource.mlba-team.de/xdispatch/docs/current/index.html



--
Robert Bernecky
Snake Island Research Inc
18 Fifth Street
Ward's Island
Toronto, Ontario M5J 2B9

[email protected]
tel: +1 416 203 0854



----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
