----- Forwarded message from Kragen Javier Sitaker <[email protected]> -----

Date: Mon, 2 Jan 2012 02:58:56 -0500
From: Kragen Javier Sitaker <[email protected]>
To: Tomasz Rola <[email protected]>
Subject: Re: Arduino frustrations: C is not a higher-order language, and
        timing is not an optimization
User-Agent: Mutt/1.5.20 (2009-06-14)

On Mon, Jan 02, 2012 at 05:15:03AM +0100, Tomasz Rola wrote:
> On Sun, 1 Jan 2012, Kragen Javier Sitaker wrote:
> > So the upshot is that I end up doing a lot of stuff either by hand
> > or at runtime that I'd like to do automatically at compile time, and
> > then I have to debug problems by trial and error instead of getting
> > useful error messages.
> 
> Hello,

Hi!  May I post your response, and this response to it, to kragen-discuss?

> Perhaps you are enough frustrated to consider Forth?

Well, I've never done anything serious in Forth, but that's probably in part
because I haven't done much embedded programming.

> I'm no expert on Arduinos, and not much knowledge of Forth either, but 
> reading Arduinos' specifications so far lefts me with impression that 
> maybe C is a bit too big for them.

You may be right!

> Google says there is some work on making Forth, but I didn't tried it (it 
> would be hard without buying my piece of hardware, but buying would mean 
> dedication and I really don't have so much time).

You can probably get something running on an Arduino (using their IDE, not
Forth) with US$20 and half an hour.  Try it, it'll be fun!

> Unfortunately I don't have much more to say on this. I must sound lame :-)

Don't worry!

> but if there is a chance you never heard of Forth, here is my chance to be 
> helpful. Yes it looks strange. And I tried it a bit and it is quite good 
> thing for a strange language.

In some ways it could help.  For example, building a data structure to
represent a video frame type at compile time, then compiling that into a
machine-code routine to generate the video frame, could be done easily in
Forth.

But in others, it's kind of the opposite direction.  A big part of my complaint
is that I want the computer to detect bugs (e.g. your stack is going to
overwrite your heap) for me.  With C, the compiler detects if I pass the wrong
number of arguments to a function, or if I have a type error, but not if I have
unbounded stack depth or if my execution time is nondeterministic.  With Forth,
the compiler won't even detect if I pass the wrong number of arguments or try
to dereference an integer.  Forth's terseness makes it 

> Actually, I get many "aha moments" reading about Forth and many surprises...
> 
> If nothing else, your problem with memory management would go away in an 
> instant :-) because there is only stack AFAIK. There may be ways for using 
> other locations (never read that far) but the simplest subset of language 
> does everything using stack only.

This is one of the worst traps of Forth: there's the temptation to keep all
your data on the stack, because you know it's possible.  All you have to do is
write things like 4 PICK and 8 ROLL, and a lot of stack comments to keep track
of whether it's 4 PICK or 5 PICK at the moment.

This is the wrong way to do things, because within ten or twenty lines of code,
you have a terrible, terrible mess, because you actually *need* all those stack
comments to figure out what 5 PICK is supposed to mean, and whether it's
correct.  You get into a worse mess than you could ever create with assembly
language, where (except on the x87 and other stack machines!) things stay where
you put them.  You can read a compelling story about how an experienced
engineer rejected Forth entirely after he fell into this trap at
<http://www.yosefk.com/blog/my-history-with-forth-stack-machines.html>, after
ending up with 700 lines of code like `dup um* nip FRAC 2 * 32 - rshift -rot`.
In the comments, Samuel Falvo calls this "*the* classic beginners Forth problem
— way way WAY too many items on the data stack."

It turns out that Forth has not only the stack, but also access to memory, plus
another stack; and the right way to do things, in my view, is to use VARIABLEs
in Forth in almost all of the same cases where you would use variables in C.
This might sound like a terrible idea, because Forth VARIABLEs are global,
while C has local variables.  And we all know global variables are evil.

Here are some of the problems global variables cause in C and related
languages:

1. They live in a single global namespace, so if you try to define two
   variables with the same name, you get link errors.  Or you get a single
   variable, being used by two different pieces of code that shouldn't be
   interacting at all, causing bugs in them.  And it's hard to find all the
   references to the variable, because they could be anywhere in the program.
2. It's possible to have an execution path that fails to initialize them,
   leaving values from a previous execution, leading to subtle bugs.  In
   effect, they give persistent state to a part of the system that doesn't need
   it, and stateless code is always more reliable and easier to test.
3. In a recursive call, every activation record of a subroutine has its own
   instance of its local variables.  Recursion is still a somewhat tricky
   construct --- you have termination concerns that don't arise with
   nonrecursive calls, plus potentially unbounded stack usage potentially
   causing data corruption (on the Arduino and similar environments) or a
   program crash --- but it's less tricky when each call has its own local
   variables.
4. Multithreaded access to global variables requires synchronization between
   the threads, or you get subtle race-condition bugs.

(Note that I'm talking here about using global variables instead of local
variables, not, say, global variables instead of object instance variables.
There's a whole other set of problems associated with global variables that are
used for communication between different modules, or whose value isn't
overwritten upon entry to the module that owns them, but those problems aren't
relevant here.)

Here's what to do about those problems in Forth:

1. If you declare two VARIABLEs with the same name in Forth, they are two
   separate variables; the scope of the first one ends at the declaration of
   the second one.  Furthermore, Forth has had wordlists or vocabularies since
   the 1970s, and if you declare a VARIABLE in one wordlist, it's not
   accessible to code that isn't compiled with that wordlist.  It's sort of
   like `static` in C, which lets you make a "global" variable file-local.
2. This is still the same problem.
3. Because Forth words can only directly call words defined earlier than
   themselves, recursion in Forth is relatively awkward.  And you do have a
   stack where you can put values to save around a recursive call.  (You'll
   note that this is an example of my earlier complaint about how Forth makes
   you check a lot of correctness conditions by hand.)
4. The traditional Forth approach to multithreading gives each thread its own
   copy of all the VARIABLEs (or the ones definedi in the code it uses,
   anyway), and also cuts down on race-condition bugs by using cooperative
   multithreading.

Additionaly, Forth tends to be a lot terser than C, so all the problems that
scale with the size of the code (#1 and #2) are less severe.

So the problems associated with global variables are just a lot less severe in
Forth.

So, don't try to write code using the stack instead of local variables.  Define
the same variables you would in C, only as globals instead of locals, write the
code that way.  It'll be slightly suboptimal, but it'll be a lot better than if
you try to use the stack for everything.  Use the stack for expression
evaluation, for the values you wouldn't name in C either.  You'll have an empty
data stack at the end of every line of code in words with more than one line of
code, and you'll never use a DUP or SWAP.

Then, maybe, gently, remove a variable or two.  I can usually eliminate one
variable at any given time using the return stack and another one using the
data stack.  More than that usually ends up making the code hard to read and
hard to debug.

Since it's hard to understand stuff like this from people just saying stuff,
here's an example I posted in the comments of yosefk's article I mentioned
above.  Yossi's original code, an example of the trap I alluded to above:

    : mean_std ( sum2 sum inv_len — mean std )
      \ precise_mean = sum * inv_len;
      tuck u* \ sum2 inv_len precise_mean
      \ mean = precise_mean >> FRAC;
      dup FRAC rshift -rot3 \ mean sum2 inv_len precise_mean
      \ var = (((unsigned long long)sum2 * inv_len) >> FRAC) - (precise_mean * 
precise_mean >> (FRAC*2));
      dup um* nip FRAC 2 * 32 - rshift -rot \ mean precise_mean^2 sum2 inv_len
      um* 32 FRAC - lshift swap FRAC rshift or \ mean precise_mean^2 sum*inv_len
      swap - isqrt \ mean std
    ;

So, we have some fixed-point math here.  Let’s admit that, and define the
fixed-point math primitives. It looks like one of sum and inv_len is a
fixed-point fraction, while the other is an ordinary integer; I’m guessing that
sum2 is the sum of the squares, and inv_len is the reciprocal of the length,
which therefore must be a fraction, while sum and sum2 are ordinary integers.
Adopting "." as an indicator character for fixed-point arithmetic, even though
that conflicts with its usual Forth meaning of “output”:

    : s.* u* ;   : .>s FRAC rshift ;
    : .* um* 32 FRAC - lshift swap FRAC rshift or ; 

Now this should be quite straightforward.

    variable sumsq  variable sum  variable len_recip  variable .mean
    : variance  sumsq @  len_recip @ s.*  .mean @ dup .*  - .>s ;
    : mean_std  len_recip ! sum ! sumsq !
        sum @  len_recip @ s.*  .mean !
        .mean @  .>s  variance isqrt ;

You'll note that I'm avoiding stack manipulation operators entirely in this
code, with one exception, and it could probably be improved by using a stack
slot or two to eliminate one or two of the four (!) variables.

I still find the resulting code hard to read, compared to infix; consider

    mean = sum * len_recip;

in place of

    sum @  len_recip @ s.*  .mean !

or 

    sumsq * len_recip - mean * mean

in place of

    sumsq @  len_recip @ s.*  .mean @ dup .*  -

but not anything like the nightmarish `-rot3 dup` stuff above.  This remaining
notational gap is yet another example of Forth traditionally leaving an
error-prone mechanizable task --- in this case, translating well-known
mathematical formulas into RPN --- to humans.

Kragen

----- End forwarded message -----
-- 
To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-discuss

Reply via email to