Alan Kay wrote:
Hi Loup
Actually, your last guess was how we thought most of the optimizations
would be done (as separate code "guarded" by the meanings). […]
> In practice, the optimizations we did do are done in the translation
> chain and in the run-time, […]
Okay, thanks.
I can't recall the exact reference, but I once read about a middle
ground: mechanical optimization passes that are brittle in the face of
meaning change. That is, if you change the concise version of your
program, you may have to re-think your optimization passes, but you
don't necessarily have to re-write the optimized version directly.
Example
{
A student at university was assigned to write optimized
multiplication for big numbers. Each student would be graded
according to the speed of their program, with no restriction on the
programming language.
Everyone else started coding in C, but _he_ preferred to start with
Scheme. He coded a straightforward version of the algorithm, then
set out to manually (but mostly mechanically) apply a set of
correctness-preserving transformations, most notably a CPS
transformation, and a direct translation to C with gotos. His final
program, written in pure C, was the second fastest in his class (and
very close to the first, which used assembly heavily). Looking back
at what he could have done better, he saw that his program spent most
of its time in malloc(). He didn't know how to do it at the time,
but had he managed his memory directly, his program would have been
first.
Oh, and of course, he had much less trouble dealing with mistakes
than his classmates.
So, his conclusion was that speed comes from beautiful programs, not
"prematurely optimized" ones.
}
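To make the "mostly mechanical" part concrete, here is a small sketch (in Python rather than the student's Scheme and C, and using exponentiation as a stand-in for bignum multiplication) of the kind of correctness-preserving rewrites involved: a direct recursive version, its CPS form, and the loop shape that translates almost line-for-line to C with gotos. All names here are invented for illustration.

```python
# 1. Straightforward version: a direct transcription of the algorithm.
def power(b, n):
    if n == 0:
        return 1
    return b * power(b, n - 1)

# 2. Mechanical CPS transformation: every call now takes a
#    continuation `k` that receives the result.
def power_cps(b, n, k):
    if n == 0:
        return k(1)
    return power_cps(b, n - 1, lambda r: k(b * r))

# 3. The continuation turned into an accumulator and a loop --
#    the shape that maps directly onto C with a `goto loop;`.
def power_loop(b, n):
    acc = 1
    while n > 0:      # label `loop:` in the C version
        acc *= b
        n -= 1
    return acc
```

Each step preserves the meaning of the previous one, which is exactly why mistakes are easier to deal with: you debug the readable version and re-derive the fast one.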
About Frank, we may imagine using this method in a more automated way,
for instance by annotating the source and intermediate code with
specialized optimizations that would only work in certain cases. It
could be something roughly like this:
Nile Source <- Annotations to optimize the Nile program
|
| Compiler pass that checks the validity of the
| optimizations, then applies them.
v
Maru Intermediate code <- Annotations to optimize the Maru program
|
| Compiler pass like the above
v
C Backend code <- Annotations (though at this point…)
|
| <- GCC
v
Assembly <- (no, I won't touch that :-)
(Note that instead of annotating the programs, you could manually
control the compilers.)
Of course, the moment you change the Nile source, your annotations at
every level may stop working. But (i) you would only have to redo the
annotations, not the optimized code itself, and (ii) maybe not even all
of them, since there is a chance that your intermediate representation
didn't change too much when you changed your source code.
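A toy sketch of such "guarded" annotations, assuming a tiny expression IR represented as a list of tuples (all names here, `apply_annotations` included, are invented): each annotation carries the exact node it expects to find, and the pass applies it only while that guard still matches, silently dropping annotations that the source change has invalidated.

```python
def apply_annotations(ir, annotations):
    """Apply each (index, expected_node, replacement) annotation only
    if its guard still matches; stale annotations are skipped rather
    than mis-applied."""
    out = list(ir)
    applied, skipped = [], []
    for idx, expected, replacement in annotations:
        if idx < len(out) and out[idx] == expected:
            out[idx] = replacement     # guard holds: safe to optimize
            applied.append(idx)
        else:
            skipped.append(idx)        # source drifted: annotation stale
    return out, applied, skipped

ir = [("mul", "x", 2), ("add", "x", 0)]
annotations = [
    (0, ("mul", "x", 2), ("shl", "x", 1)),  # strength reduction, still valid
    (1, ("sub", "x", 0), ("nop",)),         # written against an older IR
]
```

The worst case is then a missed optimization, never a wrong program, which is the property that makes the scheme tolerable when the source keeps changing.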
I can think of one big caveat, however: if the generated code is too big
or too cryptic, this approach may not be feasible any more. I also
forgot to mention profiling your program first.
> But Dan Amelang did such a great job at the meanings that they ran
> fast enough [to] tempt us to use them directly [so] Cairo never
> the picture.
If I had to speculate from an outsider's point of view, I'd say such
good surprises probably apply to almost any domain-specific language:
the more specialized a language is, the more domain knowledge you can
embed in its compiler, and the more effective the optimizations can be.
I know it sounds like magic, but I recall a similar example with
Haskell, applied to bioinformatics (again, I can't find the paper).
Example
{
The goal was to implement a super-fast algorithm. The catch was that
the resulting program had to accept a rather large number of
parameters. Written directly in C, the fact that those parameters
could change meant the main loop had to make several checks, slowing
the whole thing down.
So they built an EDSL based on monads that generated a C program once
the parameters were read and known, then ran it. Not quite
just-in-time compilation, but close. The result was, of course,
noticeably faster than the original C program.
}
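The same trick can be sketched in a few lines of Python instead of a monadic Haskell EDSL: generate the inner-loop source only after the parameters are known, so the per-iteration checks simply don't exist in the emitted code. Everything here (`make_kernel` and its parameters) is invented for illustration, not taken from the paper.

```python
def make_kernel(scale, offset, clamp):
    """Generate a loop specialized to the given parameters.
    A generic version would test `clamp` on every element; here the
    decision is baked into the generated source instead."""
    body = f"y = x * {scale} + {offset}"
    if clamp:
        body += "\n        if y > 255: y = 255"
    src = (
        "def kernel(data):\n"
        "    out = []\n"
        "    for x in data:\n"
        f"        {body}\n"
        "        out.append(y)\n"
        "    return out\n"
    )
    namespace = {}
    exec(src, namespace)   # compile the specialized loop at run time
    return namespace["kernel"]
```

Calling `make_kernel(2, 3, True)` returns a function whose loop contains the clamp test, while `make_kernel(2, 3, False)` returns one whose loop doesn't; neither pays for a branch it doesn't need.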
Therefore, I'm quite optimistic. My best guess right now is that smart
compilers will be more than enough to make Frank "fast enough", "as fast
as C"[1], or possibly even faster, for two reasons:
1. As far as I know, most languages in Frank are quite specialized.
That prior knowledge can be exploited by compilers.
2. The code volume is sufficiently small that aggressive whole program
optimizations are actually feasible.
Such compilers may cost 10 to 100 thousand lines of code or more, but at
least those lines would be strictly contained. Potential end users then
wouldn't give up too much hackability in the name of performance.
[1]: http://c2.com/cgi/wiki?AsFastAsCee
Loup.
_______________________________________________
fonc mailing list
[email protected]
http://vpri.org/mailman/listinfo/fonc