I assume that the intent is to compile to machine language, or possibly C. The thing is that Java and JavaScript are considered compiled, but they compile to virtual machines, not to hardware. Isn't j.dll like a virtual machine? It's relatively small by today's standards. I don't know how big the Java and JavaScript VMs are, but I suspect that they are not much smaller than j.dll.
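
To make the "flatten each statement into single-function calls" step in Bob's quoted message below a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how APEX does it: the nested-tuple input format, the `flatten` helper and the `t0`, `t1`, ... temporary names are all hypothetical, and the rows carry only the target/x/fn/y columns of his table.

```python
from itertools import count


def flatten(expr):
    """Flatten a nested (fn, x, y) expression into single-function rows.

    Leaves are variable names (str) or numeric constants; x is None for a
    monadic call. Each emitted row is (target, x, fn, y). This layout is a
    hypothetical simplification of the table in the quoted message.
    """
    rows = []
    temps = count()

    def walk(node):
        if not isinstance(node, tuple):
            return node  # a leaf: a name or a constant
        fn, x, y = node
        # APL evaluates right to left, so visit the right argument first.
        y_ref = walk(y)
        x_ref = walk(x) if x is not None else None
        target = f"t{next(temps)}"
        rows.append((target, x_ref, fn, y_ref))
        return target

    result = walk(expr)
    return rows, result


# r←+/LOGDERIV 0.5+siz⍴⍳100, written as nested (fn, x, y) tuples
expr = ("+/", None,
        ("LOGDERIV", None,
         ("+", 0.5,
          ("⍴", "siz",
           ("⍳", None, 100)))))

rows, result = flatten(expr)
for row in rows:
    print(row)
```

Each row is one function call whose arguments are either leaves or earlier temporaries, which is roughly the shape of rows 14 to 18 in Bob's table. Something in that form could be handed to a VM just as easily as to a C or SAC back end.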

On Wed, Feb 5, 2014 at 10:44 AM, Robert Bernecky <[email protected]> wrote:

> It's not all that hard, at least at first... 8^}
>
> Types: APEX infers, via data flow analysis (DFA) of the
> APL code, the type of each primitive's arguments and results.
> The slight hiccup in this is that the "main" function's arguments
> need to be declared. No big deal. Here, for example, is the
> declaration of n in such a function:
>
>     r←main n;⎕IO;⎕RL;⎕PP;⎕PW
>     ⍝ dcl integer scalar n
>     r←benlogd n
>     ...
>
> All other types and ranks are determined by DFA (or
> else the compiler complains.).
>
> Ranks: Detection and run-time support for scalars is crucial for good
> performance. Merely treating scalars as rank-0 arrays may be a good
> idea in theory, but not in practice. Case in point:
> I compiled Mike Jenkins' APL model of matrix inverse, aka
> domino to SISAL, many moons ago. It contained a pivot operation
> that was written this way, more or less:
>
>     mat[i,j;] ← mat[j,i;]
>
> to swap rows i and j of mat. Runtime performance was, umm,
> less than spectacular. Once I realized that the i,j was
> allocating two two-element arrays, doing very little with them, then
> deallocating them, on every pivot operation, I replaced the code
> with this ugliness:
>
>     tmp←mat[i;]
>     mat[i;]←mat[j;]
>     mat[j;]←tmp
>
> The entire matrix divide benchmark now ran twice as fast!
>
> The next step is to "flatten" each APL statement into single-function
> calls, so we can store the resulting abstract syntax tree as
> a boxed rank-2 matrix. E.g, we start with this signal processing
> benchmark function (logdAKD):
>
>     r←benlogd siz
>     ⍝ Bench Bates' LOGDERIV function
>     r←+/LOGDERIV 0.5+siz⍴⍳100
>
> The tokenizer/parser/syntax analyzer turns that into this
> delightful flattened array:
>
>     11 seeast D xx[1]
>     target    x    lop  fn   rop  y    type rank shape value class
>     benlogd   2    ¯1   ¯1   ¯1   3    ¯1   ¯1   ¯1    ¯1    M
>     r         ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1    ¯1    v
>     -         ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1    ¯1    v
>     siz       ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1    ¯1    v
>     LOGDERIV  ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1    ¯1    M
>     ⎕io       ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1    ¯1    v
>     ⎕pp       ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1    ¯1    v
>     ⎕pw       ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1    ¯1    v
>     ⎕ct       ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1    ¯1    v
>     ⎕rl       ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1    ¯1    v
>     ⎕wa       ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1    ¯1    v
>     0.5       ¯1   ¯1   ¯1   ¯1   ¯1   2    0          0.5   n
>     100       ¯1   ¯1   ¯1   ¯1   ¯1   1    0          100   n
>     END       ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1   ¯1    ¯1    ¯1
>     14        ¯1   ¯1   ⍳    ¯1   12   ¯1   ¯1   ¯1    ¯1    v
>     15        3    ¯1   ⍴    ¯1   14   ¯1   ¯1   ¯1    ¯1    v
>     16        11   ¯1   +    ¯1   15   ¯1   ¯1   ¯1    ¯1    v
>     17        ¯1   ¯1   4    ¯1   16   ¯1   ¯1   ¯1    ¯1    v
>     18        ¯1   21   /    ¯1   17   ¯1   ¯1   ¯1    ¯1    v
>     19        ¯1   ¯1   :PA  ¯1   ¯1   ¯1   ¯1   ¯1    ¯1    V
>     20        ¯1   ¯1   :PA  ¯1   ¯1   ¯1   ¯1   ¯1    ¯1    V
>     21        19   ¯1   +    ¯1   20   ¯1   ¯1   ¯1    ¯1    V
>     1         ¯1   ¯1   ←    ¯1   18   ¯1   ¯1   ¯1    ¯1    v
>
> E.g., line 15 is doing a siz reshape ⍳100, where x=3 is
> the ast row for siz, and y=14 is the where ⍳100 is computed.
>
> Next, a Static Single Assignment (SSA) phase makes
> the whole task of analysis MUCH easier.
>
> Then DFA, fills in type, rank, shape(where known),
> value(where known), and other properties, such as Array Predicates.
>
> So far, we know nothing about the target language
> (SISAL, SAC, D, so far). As it should be.
>
> The code generator then churns out really crude code, in
> this case for SAC:
>
>     inline double benlogdXID(int siz,int QUADio)
>     {
>       A_22=iotaXII( 100,QUADio);
>       A_23=rhoIII(siz,A_22);
>       /* dsf scalar(s) */
>       A_24=plusDID(0.5,A_23);
>       A_25=LOGDERIVXDD( A_24);
>       A_26=plusslXDDFOLD( A_25);
>       r_0=( A_26);
>       return(r_0);
>     }
>
> As you can see, the generated code is, for all practical purposes,
> just function calls. Everything of importance is inlined, so
> this does not affect performance.
> The code generator also emits customized
> code for each primitive definition. Here's what it does for iota.
> We start with this code fragment from a library:
>
>     %Fragment iota x01 x bidc i .
>     inline int[.] $FNAME($YTYPE y, int QUADio)
>     { /* Index generator on scalar */
>       z = QUADio+iota(toi(y));
>       return( z);
>     }
>
> The $xxx fields are essentially macros (think m4, more than cpp...)
> that get replaced with types, etc. This one is trivial, but things
> like inner products and semi-globals make things messier:
>
>     inline int[.] iotaXII(int y, int QUADio)
>     { /* Index generator on scalar */
>       z = QUADio+iota(toi(y));
>       return( z);
>     }
>
> So, there you have it: one APL compiler.
> Stuff that code through the SAC compiler
> and you'll get out C code for a single-thread, multi-thread, GPU,
> or other target system, as you like.
>
> You're on your own with RTL: I suggest generating high-level
> function array language code, because then things like the above
> multi-thread stuff comes for free. Those little things like
> memory management also come for free with high-level back
> end languages.
>
> I think you can see that the task of writing a compiler for J
> is essentially the same as what I did with APEX. It's straightforward,
> at least at first...
>
> Bob
>
>
> On 14-02-05 11:55 AM, Raul Miller wrote:
>
>> I would prefer to leave the funding and resource issues for another
>> time. But I would keep in mind that there's a lot of student out
>> there, looking for opportunities to learn. I would expect code
>> generated by students to be well below professional standards, but
>> mixed in with the clutter would be some really incredible work. So my
>> ideal of a development process would engage the academic community, to
>> help research and understand the issues and would back that up with
>> some professional support. DARPA might have some interest here, also,
>> if you are looking for seed money - but even more important would be
>> finding interested academics.
>>
>> The type promotion/demotion issues are a much more interesting issue,
>> from my point of view.
>>
>> My take here is that each of the primitive types (integer, float, ..)
>> would be a potential type of the type system, and also that some
>> "tagged union types" (number, character, ...) would also be potential
>> types of the type system. (But a first version might support only one
>> type or only a limited subset of types.)
>>
>> This leaves us with another problem: how do we indicate which type(s)
>> to be used in the compiled code.
>>
>> One possibility uses a cookbook set of assert. statements. If an
>> assert statement forces an error for all but one type, we can decide
>> that all code after that point involving that variable are constrained
>> to that type.
>>
>> Similarly, a complete lack of asserts might be taken as meaning that
>> that data element is the generic array type we know and love as a J
>> noun. We would have to come up with terminology to distinguish this
>> case from the more constrained case.
>>
>> Another issue, of course, is declaring the rules for result types from
>> supported verbs. You'd need a whole set of rules that would need to be
>> supported by the code. Or, you would need to write the code and then
>> extract the rules. Or maybe build a big chart of all the cases and
>> then attempt to refine that down to a concise set of memorable quips.
>>
>> I'd also be targeting RTL
>> (http://gcc.gnu.org/onlinedocs/gccint/RTL.html) rather than C.
>>
>> As for bigints, I'm not sure if the best approach would be to
>> incorporate GMP (https://gmplib.org/) or if it's better to implement
>> support directly. Any decision is going to have a cost, and -
>> especially in the early stages of development - we'd have to
>> understand that we may eventually want to jettison both the costs and
>> the benefits of some of this exploration.
>>
>> It seems to me that the hard part involves engaging interest of other
>> people. My above sketch assumes that academics and students would have
>> some interest in the language and its possibilities. And to get there,
>> I think we need to do a better job of demonstrating the capabilities
>> of the language and, at the same time, I think we also need to do a
>> better job of demonstrating how well J can work with other languages.
>>
>> And I think both of those are well within reach.
>>
>> Of course, this kind of thing will also tend to kick up a fair bit of
>> noise and odd problems, so we will need to tolerate some of that also.
>>
>> Thanks,
>>
>>
>
> --
> Robert Bernecky
> Snake Island Research Inc
> 18 Fifth Street
> Ward's Island
> Toronto, Ontario M5J 2B9
>
> [email protected]
> tel: +1 416 203 0854
>
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
