Felix currently optimised parallel assignments in just one
case: tail recursion. Simple cases like:

/////////////////////////////////////
#import <flx.flxh>

def var a,var b, var c = 1,2,3;
println (a,b,c);

a,b,c=c,a,b; // parallel assignment
println (a,b,c);

var x = a,b,c; // parallel assignment
x = c,b,a;

println x;
//////////////////////////////////////

are not optimised. Felix uses the horrible routine:

//////////////////////////////////////////////////////
  | `AST_assign (sr,fid,l,r) ->
    let rec aux (l,t) r =
      match l with
      | `Expr (sr,e) -> 
        begin match e with
        | `AST_tuple (_,ls) ->
          let n = seq() in
          let vn = "_" ^ si n in
          let sts = ref [] in
          let count = ref 0 in
          iter
          (fun l ->
            let r' = `AST_get_n (sr,(!count,`AST_name (sr,vn,[]))) in
            let l' = `Expr (sr,l),None in
            let asg = aux l' r' in
            sts := !sts @ asg;
            incr count
          )
          ls
          ;
          `AST_val_decl (sr,vn,dfltvs,t,Some r) :: !sts
        | _ -> 
          if fid = "_init"
          then
            match e with
            | `AST_coercion (_,(`AST_name (_,n,[]),t')) ->
              let t = match t with 
                | None -> Some t'
                | t -> t
              in
              [`AST_val_decl (sr,n,dfltvs,t,Some r)]

            | `AST_name (_,n,[]) ->
              [`AST_val_decl (sr,n,dfltvs,t,Some r)]
            | x -> clierr sr 
               ("identifier required in val init, got " ^ 
                 string_of_expr x)
          else
            [assign sr fid e r]
        end
      | `Val (sr,n) ->
          [`AST_val_decl (sr,n,dfltvs,t,Some r)]
      | `Var (sr,n) ->
          [`AST_var_decl (sr,n,dfltvs,t,Some r)]
      | `Skip (sr) ->  []
      | `Name (sr,n) -> 
        let n = `AST_name(sr,n,[]) in
          [assign sr fid n r]
      | `List ls ->
          let n = seq() in
          let vn = "_" ^ si n in
          let sts = ref [] in
          let count = ref 0 in
          iter
          (fun l ->
            let r' = `AST_get_n (sr,(!count,`AST_name (sr,vn,[]))) in
            let asg = aux l r' in
            sts := !sts @ asg;
            incr count
          )
          ls
          ;
          `AST_val_decl (sr,vn,dfltvs,t,Some r) :: !sts
    in
      let sts = aux l r in
      rsts name parent_vs access sts
//////////////////////////////////////////////////////////////

to implement assignments. The weird encoding of the LHS here is to
account for l-expressions (left expressions) aka ties, like this:

        def var a,var b, var c = 1,2,3;
        a,b,c=c,a,b;

Here, the LHS is NOT an expression of tuple type, but an l-expression.
It is implemented by (in the second case):

        tmp := c,a,b; 
        a = tmp.(0);
        b = tmp.(1);
        c = tmp.(2);

in the desugaring phase by the above Ocaml mess. This also takes
care of, for example:

        a,b,c = x;

the same way: first make a temporary of tuple type, then set the
LHS variables using tuple projections.

OTOH, an assignment like:

        x = a,b,c;

is done by:

        tmp := a,b,c; // initialisation
        x = tmp;      // assignment

The point of these temporaries is twofold: the decoupled assignments
are easy to code as an temporary initialisation and assignment,
where the LHS and RHS visibility as a list of entries or whole tuple
can be treated separately, AND, it is necessary, in cases like:

        a,b,c=c,a,b;

NOT to write this as

        a = c; b = a; c = b;

because the first a=c clobbers the old value of a required for
the second b = a.

The POINT of the parallel assignment routine is to minimise the
number of temporaries needed, which also reduces the number
of assignments + initialisations. Instead of the 6 required at
present, we COULD do:

        a' = c; c = b; b = a; a = a';

which only requires 1 temporary and 4 assignments.

One reason the Felix version of nbody is slow is that the 3-tuple used
as a vector is unravelled into component wise operations for the
functional calculations .. but the results are gathered into a tuple
for the assignments, instead of setting the individual LHS tuple 
components individually: that's safe, but it isn't necessary for
these vector ops.

The parallel assignment algorithm is designed to calculate that
this is actually safe. However at present, it only works when
the LHS consists of separate variables: that's the simplest
case, and enough for the tail recursion optimisation.

So there are two problems.

1. How to preserve the parallel assignment structure until the
   optimiser can look at it, without destroying it the way
   the desugaring algorithm does at present

2. How to generalise the parallel assignment algorithm so it
   can handle tuple components (and perhaps record and struct
   components, but they're all equivalent).


Step 2 is non-trivial. At present, the algorithm is present
with a list of the form:

        lhs1 = rhs1
        ...

where lhs terms are just variable indicies -- they can't be
expressions, including tuple components. The rhs is analysed
to get all the symbols used, which is not entirely trivial,
since it can call a child function which uses the variable.

[Note: the RHS is a function .. it cannot MODIFY anything.
If you cheat Felix, and put side effects in functions,
Felix can do unexpected things.. this is an example where
the algorithm assumes you do not do this and the optimiser
can change the calculated result unexpectedly]

Now, we have to generalise all this so we accept a list

        var,proj, proj, ...

with a variable index, and sequence of tuple indexes (they're
always constants) which identify a sub-structure of a tuple.
So changing

        x,mem0,mem2 // var x.(0).(2)

affects x, and x.mem0, and x.mem0.mem2. So the dependency calculation
is no longer a simple setwise elimination process, because you can
refer to a product as a whole, or just a component.

Problem 1 has a nice and horrible solution. The nice solution
is simply to add a new combinator 'parallel_assign' and eliminate
it in the optimiser as above, then fix it to do a better job
in some cases.

The nasty solution is to leave the above code as is, and do enough
trivial data flow analysis to detect code like:

        tmp := c,a,b; 
        a = tmp.(0);
        b = tmp.(1);
        c = tmp.(2);

The latter is somewhat appealing because it involves no changes
to the rest of the compiler, however the detector is not so easy to
write. In particular, Felix has 3 distinct ways to set variables:

        BEXE_assign, BEXE_init, BEXE_iinit

and i have trouble remembering which is used when .. :)

The actual bound code generated for the above (with the
prints removed) is:

  _1044<4603> := (1, 2, 3);
  a<4604> := (_1044<4603>).mem_0;
  b<4605> := (_1044<4603>).mem_1;
  c<4606> := (_1044<4603>).mem_2;


  _1045<4607> := (c<4606>, a<4604>, b<4605>);
  a<4604> = (_1045<4607>).mem_0;
  b<4605> = (_1045<4607>).mem_1;
  c<4606> = (_1045<4607>).mem_2;


  x<4608> := (a<4604>, b<4605>, c<4606>);
  x<4608> = (c<4606>, b<4605>, a<4604>);


where = is assignment, and := is initialisation.
It isn't clear why to bother with initialisation: initialisation
was intended for

        val x = 1;

where the LHS is necessarily local to the current function,
and also as a 'val' it cannot be assigned (but, the compiler
can break that rule of course .. just the user can't).

It was also supposedly a one-off .. but tail rec optimisations
and other loops plus switches mean that actually, initialisation 
is an assignment anyhow.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language

Reply via email to