Bjorn Lisper wrote:

A guess is that the first generation will support a shared memory model much
like the SMPs of today (shared main memory with on-chip caches, or some other
kind of local memory). Here, I think implicit parallelism in functional
languages can be a win in some situations, provided that spawning parallel
threads is cheap and local memory can be used efficiently (so control of
granularity is no problem, and dynamic parallelism is cheap). Cheap threads
are likely; efficient use of local memory is a question mark.


You only need one thread per CPU core, so the cost of spawning threads is a
red herring, just as GHC schedules Haskell threads from a single OS thread.
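
To make that concrete, here is a minimal sketch using par and pseq from GHC's
Control.Parallel. A spark is just a hint the runtime may pick up on an idle
core, so creating one per piece of work is cheap; the function and data names
below are only for illustration.

    import Control.Parallel (par, pseq)

    -- Spark the evaluation of `a` so an idle core can pick it up, force
    -- `b` on the current core, then combine the results.  Sparks are far
    -- cheaper than OS threads, so one per unit of work is fine.
    parSum :: [Int] -> [Int] -> Int
    parSum xs ys = a `par` (b `pseq` (a + b))
      where
        a = sum xs
        b = sum ys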

However, some obstacles:

Laziness, if used strictly (sorry), gets in the way of parallelism. Waiting
to spawn a parallel thread until we know for sure that its result is needed,
when it is needed 99% of the time anyway, is not an optimal strategy. I think
optimistic evaluation strategies a la Robert Ennals can be helpful here, but
designing them to work for new parallel architectures may be non-trivial.
I think there could be material for PhD theses here.


I suggest speculative execution, as used by current CPUs to take advantage of
their multiple execution units. If a CPU has spare capacity, functions whose
results might be needed are run. As soon as you know a result is not needed,
the thread can be killed and the time used for something else. A lot of work
has already been done on hardware superscalar architectures, and much of it
should be applicable to the software case.
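
With GHC sparks, the closest analogue today is speculation that fizzles
rather than a thread that is killed. A minimal sketch, where the predicate
and the expensive function are made up for illustration:

    import Control.Parallel (par)

    -- Speculatively spark the expensive result while the cheap test runs.
    -- If the test comes out False the spark is simply never demanded, so
    -- the runtime can discard it, which approximates killing the thread.
    speculate :: Int -> Int
    speculate x = e `par` (if even x then e else negate x)
      where
        e = expensiveComputation x
        expensiveComputation n = sum [1 .. n]   -- stand-in for real work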


If the program implements an inherently sequential algorithm then
parallelism won't help anyway. The use of lists often leads to sequential
algorithms: a prime example is list folds, which all translate the linear
spine structure of a list into a sequential dependence graph in the
resulting computation. Folds over tree-like structures are better in this
regard. The FP community will have a problem with legacy code that uses
inherently sequential computations over lists. There has been quite a lot of
research on how to parallelize list computations (skeletons, program
transformations using BMF, ...). However, these techniques typically rely on
knowing the length of the list. In a lazy language lists are commonly
evaluated dynamically, which makes it hard to apply these techniques
efficiently. Again, speculation may help.
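
For concreteness, a minimal sketch of that dependence point using GHC sparks:
a left fold over a list is inherently sequential, while a divide-and-conquer
fold over the same data exposes parallelism; note that the split needs the
list's length, which is exactly what a lazily produced list may not have yet.

    import Control.Parallel (par, pseq)

    -- foldl chains every addition on the previous one: a straight-line
    -- dependence graph with no parallelism to exploit.
    seqSum :: [Int] -> Int
    seqSum = foldl (+) 0

    -- A divide-and-conquer fold over the same data: the two halves are
    -- independent, so one can be sparked while the other is forced.
    -- Splitting requires the length, as noted above.
    treeSum :: [Int] -> Int
    treeSum []  = 0
    treeSum [x] = x
    treeSum xs  = l `par` (r `pseq` (l + r))
      where
        (as, bs) = splitAt (length xs `div` 2) xs
        l = treeSum as
        r = treeSum bs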


But in the worst case it's just a sequential computation, so any gain from
parallelism is still a gain... Also, the dependent part of a loop is often
separate from an independent part. For example, consider taking the length
of each string in a list of strings: the next iteration does not have to
wait for the previous length to be computed; it can start the next length
computation as soon as the pointer to the next list cell has been
dereferenced.
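
A minimal sketch of that pattern with GHC sparks (parLengths is a made-up
name): the spine is still walked sequentially, but each length is sparked as
soon as its cons cell is available, so the lengths can be computed in
parallel with the rest of the traversal.

    import Control.Parallel (par)

    -- Traverse the spine sequentially, sparking each length so it can be
    -- computed on another core while we move on to the next cell.
    parLengths :: [String] -> [Int]
    parLengths []     = []
    parLengths (s:ss) = l `par` (l : parLengths ss)
      where
        l = length s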


In the long run, as hardware dimensions shrink even further and communication
becomes relatively even more expensive, it is likely that we will have some
kind of on-chip, massively parallel, mesh-connected distributed-memory
machine. This will most likely favor static parallelization techniques, which
means an automatic approach with implicit parallelism will work well only for
certain classes of functional programs.


The next generation of CPUs seems to be using a shared-cache architecture,
so the overheads are extremely small...

Erm, by static I assume you mean programmer-specified parallelism. I would
tend to think of static parallelization as something that takes place at
compile time (and that would still count as implicit).


   Keean.

