Re: [Felix-language] [felix] [OT] So You Want To Write Your Own Language?

john skaller Wed, 05 Feb 2014 13:20:20 -0800

On 06/02/2014, at 5:34 AM, Raoul Duke wrote:

>> Should the user be forced to do this?
>> Felix is a scripting language so it automates a lot of these tasks,
>> but not without semantic consequences such as changing evaluation
>> strategies.
>> The design choice may well be wrong. I surely regret it a lot maintaining
>> the compiler. But the choice is deliberate. It's not an accident.
> 
> i am not a compiler writer so i do not have a leg to stand on there,
> but i am an end user who is opinionated and experienced wrt usability,
> so the thing that comes to mind is a middle ground of a block
> annotation that lets the user say "super power auto-conversion: ON!"
> for that feature, but otherwise it is not on and requires more ascii
> on the part of the user
> 
> because
> 
> this kind of semantic change is, i believe, big trouble in the long
> run. e.g. consider all the complaints about how haskell is impossible
> to use because any time you change a compile flag, the entire runtime
> of your program has possibly horribly different time & space
> consequences.




Generally this is a tough choice. In this case however there's no
particular semantic change due to switches or optimisation level:
the semantic change is due to the automatic generation of wrapper
functions. For example:

        fun add: int * int -> int = "$1+$2";

        fun foldem (f: int * int -> int) (lst:list[int]) = {
                var x = 0;
                proc incr (a:int)=> x = f (x, a);
                iter incr lst;
                return x;
        }

        var sum = foldem add (list (1,2,3,4));

Here add is normally a macro so

        add (x,y)

expands to

        x + y 

in C++. You cannot pass a macro as a value to foldem.
In C++, it needs a pointer to a closure object.
 
So Felix makes a wrapper in the compiler:

        fun add_wrapper (x:int,y:int) => add (x,y);

and passes that instead. A closure has a method:

        int add_wrapper::apply (int x, int y) { return x + y; }

As you can see this method eagerly evaluates its arguments.
Because its a C++ function. Whereas the macro add lazily
evaluates its arguments .. because that's what substitution does,
and substitution is how macros work.

So the issue here is that the compiler is doing something that
the user would have to do in the above case automatically,
which hides the fact that calling "add" directly has different 
semantics to calling it indirectly (via a closure).

Macros in Felix *usually* use lazy evaluation (substitution),
whereas closures *usually* use eager evaluation.

You can see in this case it is probably not going to make
any difference. But here's a case that does make a difference:

        fun cond : bool * int * int -> int = "$1?$2:$3";

        cond (x == 0,  0, 100 /  x );

If this one is lazily evaluated all is well. If it is eagerly evaluated
you get a division by zero. There's a term for this: if the semantics
differ we say the function is not strict. If a function is strict, the
evaluation strategy doesn't matter. Haskellers use the definition:

        f (error) = error

for strictness. In other words, the function will bug out if the parameter
evaluation bugs out. This is NOT the case for cond .. that's the whole
point of a "shortcut" operator: they use lazy evaluation.

By default Felix assumes all functions are strict, so the compiler is free
to choose whether to use eager or lazy evaluation, generally the idea being
to use lazy evaluation on direct calls for performance, and eager evaluation
for closures, also for performance.

If a function isn't strict, you have to write it to disable this assumption.
You can force eager evaluation with a var, and lazy evaluation
by explicitly passing a closure.

        fun conditional (x:bool, var y:int, var z:int) => cond (x,y,z);

will crash if calculating y or x leads to a division by zero. Whereas

        fun conditional (x: bool, y: 1 -> int, z: 1 -> int) =>
                if x then y() else z () endif
        ;

will only evaluate y or z depending on x, not both: the evaluation
is lazy.

In Haskell evaluation is always lazy, unless the compiler can prove
the function is strict. Then it can optimise by using eager evaluation
and get rid of the closures. But note, for direct calls, lazy evaluation
is almost always faster. Because you only calculate something when
you need it, and generally you avoid subroutine calling overhead
by inlining.

In Felix, the Haskell definition of strict is not enough because Felix has
variables. Consider the accessor function:

        var x  = 1;
        fun f () => x;

When you call f() it reports the current value of x. If is used in an argument:

        proc g ( y:int ) { ++x; println$ y; }

it clearly matters in the call

        g ( f () );

whether f () is evaluated eagerly or lazily. In the former case
it prints 1. In the latter 2.

Note that f is a valid function: it doesn't have any side effects.
But its value does depend on side effects, namely the modification
of the variable x. In this case you can rewrite g to force either strategy:
use var y for eager evaluation or pass y: 1 -> int for lazy and call 
println$ y();

It's not clear this is enough though because it may depend on the
arguments your using not the function! Who would have dreamed you
need to be explicit in proc g??  It is, rather, the *caller* of g using the
argument f () that knows it might matter, not the function writer,
but then the caller has to read the implementation of the function.
You're not supposed to have to do that, the type of the interface
is supposed to tell you everything you need to know, but it doesn't.

It gets even trickier if you're using a generator, which is a function
that IS allowed to have side effects. Felix always lifts direct calls
to generators out to ensure they're evaluated once, eagerly.

But if the generator is a closure, it cannot do that because the type
system doesn't distinguish generators from functions. And the macro
to closure wrapper will be applied automatically too so consider:

        fun rand : 1  = "rand()"; // random number

If you call:

        fun twice : int -> int = "$1 + $2";

        twice (rand () );

then rand is evaluated twice. But if you write

        noinline twice (x:int) => x + x;
        twice (rand () );

it is only evaluated once. Replace "noinline" with "inline"
and it will probably be evaluated twice but you can't be sure
unless you also write

        var x : int

as the parameter to force eager evaluation. If you write:

        gen rand : 1 = "rand()";

all this is fixed. Felix ALWAYS lifts calls to generators out and evaluates them
eagerly (before evaluating any expression containing the call).

If you wrap a generator in a closure, all bets are off because it no longer
knows if the value is a function or generator:

        var k = rand;

then k has type

        1 -> int

and Felix doesn't know if its a function or generator. So now

        twice ( k () )

no longer has determinate semantics. generators are not entirely
safe to use. Originally Felix didn't allow them at all. But I added them
because C programmers can't live without them. C is full of mutating
expressions like 

        ++x, a = b, ....

all these expressions are generators with side effects .. and in C,
the results are ALSO indeterminate in general. You have to understand
sequence points to predict the outcome and take care if you want
a determinate output, and you have to KNOW if a function is a function
or a macro. And in general for a function "f" you have to know if it has
side effects.

HUGE numbers of bugs occur in C because people have no idea
how to do functional programming and they pass around mutators
and mutable data structures in expressions where the order of evaluation
is hard to predict. The same occurs in Python, with stupid data structures
like Python lists (which are mutable, not functional).

So now, in summary .. there's a serious problem here.
Felix provides no notation to distinguish between functions and
accessors.  You cannot write:

        acc  f () => x;

to tell Felix to handle this thing differently to a function in direct calls.
You can write

        gen f () => ....

for generators though. Actually its a lie, you CAN write:

        pure fun f() => x; // ERROR it isn't pure!
        impure fun f() => x; // OK

although the compiler doesn't enforce this at the moment. But it isn't
much use because IMHO no one is going to annotate functions with
pure or impure. And if you use pointers it isn't clear. Just because a 
variable CAN vary doesn't mean it does. Certainly in functional
code you're not supposed to modify a variable.

The real problem is there is no difference in the type of a function,
accessor and generator. That's easy to fix, but then every higher
order function you write would need multiple variants, unless we
also introduce subtyping and automatic conversions  .. 

So actually the REAL problem is I did what people keep telling me
I should do and break from good theory and implement crap because
programmers expect it. 

You*** wanted mutating expressions. Suffer the consequences.
[I mean "programmers" not one particular person]

Felix didn't originally have them. You were FORCED to use a procedure.
Not very convenient.

Of course Felix always allowed functions to refer to variables :-)

In general THERE IS NO KNOWN SOLUTION.
Haskell's solution is sound theoretically but the result of determinate
semantics is indeterminate performance.


--
john skaller
skal...@users.sourceforge.net
http://felix-lang.org




------------------------------------------------------------------------------
Managing the Performance of Cloud-Based Applications
Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
Read the Whitepaper.
http://pubads.g.doubleclick.net/gampad/clk?id=121051231&iu=/4140/ostg.clktrk
_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language

Re: [Felix-language] [felix] [OT] So You Want To Write Your Own Language?

Reply via email to