Devon said:
Not to disagree with the general idea that array-processing languages
have the potential to take advantage of parallelism, but it's not a new
notion: I wrote a paper about this and took part in a panel discussion
about it some 20 years ago, and others had preceded me.
Skip says:
I agree completely; parallel processing of array languages is far from
a new idea.
I remember when the Analogic Corp. built a vector processing machine
using APL as the programming language in the early 80's. It was called
"The APL Machine". Analogic understood the potential of controlling
hardware parallelism with an array language like APL, even back then.
Skip said:
Most of J's primitives could take advantage of multiple parallel
processor threads. A simple example is the addition primitive.
Devon said:
Not a good example. The set-up time, which the rest of the paragraph
ignores, will utterly dominate the time required for _any_ simple,
scalar math function.
Skip says:
For small arrays, you are right. For very large arrays the answer isn't
so clear.
Devon said:
It also ignores memory allocation time, which is often a bottleneck.
This is particularly relevant when you talk about SIMD
(single-instruction, multiple-data) parallelism. Sure, in theory you
could add a bunch of numbers in parallel, with potentially greater gain
for larger arrays, but the time for memory allocation swamps that of
simple arithmetic, and allocation becomes more of a problem as the
arrays grow.
Skip says:
While the addition of two arrays can be defined as a SIMD operation,
it can also be thought of as MIMD: each core adds its own piece of the
two arrays.
I believe that one could take the J addition primitive and make it "smart"
about using parallel processes. If the data size is small, the
interpreter would keep all the computations in a single thread in one
processor. If the arrays are large, the interpreter could distribute the
work across several processors. It would be interesting to see just at
what array size the parallel process becomes more efficient than the
single-threaded process. Do you agree that parallel processes will
become more efficient at some array size? Or do you contend that the
parallel overhead is such that single-threading will be more efficient,
regardless of array size?
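To make the "smart" dispatch concrete, here is a sketch in Python rather
than J (the THRESHOLD value is a made-up placeholder; the real crossover
size would have to be measured). Small arrays are added in the calling
thread; large ones are split into chunks, MIMD-style, one per worker:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical crossover size -- the real value would have to be measured.
THRESHOLD = 100_000

def add_chunk(a, b, lo, hi):
    """Add a[lo:hi] and b[lo:hi] elementwise."""
    return [a[i] + b[i] for i in range(lo, hi)]

def smart_add(a, b, workers=4):
    """Elementwise a + b: single-threaded when small, chunked when large."""
    n = len(a)
    if n < THRESHOLD:
        return add_chunk(a, b, 0, n)           # small: stay in one thread
    step = (n + workers - 1) // workers        # large: one chunk per worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(add_chunk, a, b, i, min(i + step, n))
                   for i in range(0, n, step)]
        out = []
        for f in futures:                      # futures kept in chunk order
            out.extend(f.result())
        return out
```

Note that in CPython the global interpreter lock keeps pure-Python
arithmetic from actually running in parallel; a J interpreter doing this
internally, in C on raw arrays, would not have that limitation.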
Devon said:
Simply put, multi-core processors are too coarse-grained for an array
language to exploit at the level of individual array operations. The
substantial set-up cost points to exploiting parallelism at a higher
level than most of the language primitives. Remember, dual- or
quad-core means multiple multi-megatransistor processors: that's
firing up a lot of silicon just to add a couple of numbers!
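Devon's point about set-up cost can be illustrated with a rough
measurement, sketched here in Python (thread creation stands in for the
general overhead of engaging another core; the exact numbers depend on
the machine):

```python
import time
import threading

def noop():
    pass

# Cost of firing up one extra thread (a stand-in for parallel set-up).
t0 = time.perf_counter()
th = threading.Thread(target=noop)
th.start()
th.join()
thread_cost = time.perf_counter() - t0

# Average cost of one scalar addition, amortized over a million of them.
t0 = time.perf_counter()
s = 0
for i in range(1_000_000):
    s += i
add_cost = (time.perf_counter() - t0) / 1_000_000

print(f"thread set-up costs roughly {thread_cost / add_cost:.0f}x one addition")
```

On typical hardware the thread set-up runs several orders of magnitude
longer than a single addition, which is exactly Devon's point: the
overhead only pays off if each thread is handed a large amount of work.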
Skip says:
A broad statement. It could be true, but I haven't seen concrete
evidence to support your claim. I would certainly want to try to test
out some primitive-level parallelism before we give up on the idea. From
all accounts, the Analogic machine was quite efficient in its parallel
operations. It was the strangeness of the APL language to programmers
that eventually killed the project.
Devon said:
However, on the bright side, this coarse-grain parallelism means we
can take advantage of it at an application level right now, as some of
us are currently doing.
Having made the case against attempting to parallelize most J
primitives on a multi-core architecture,
Skip says:
You make a case, but I think we would need more concrete data before
discarding the whole idea. Modern multi-core processors have lots of
functionality designed to minimize the overhead in SIMD-type processes.
We should see how this architecture could fit into J's primitives before
dismissing it out of hand.
However, it could be true that primitive parallelization is not much
more efficient than the single-processor/single-threaded approach. In
that case, perhaps the way to go would be to create some primitives in
the language that would support parallel operations.
For example, the function "parallel" could run the function in its
right argument on a separate processor, and then continue execution
with the next statement in the script. So the script:
parallel A
parallel B
parallel C
would result in the functions A, B, and C running on separate threads,
with each thread placed on a separate processor. In this way your
"coarse-grain parallelism" could become an inherent part of the
language, instead of requiring a new session to be started for each
thread.
For that matter, this native "coarse-grain parallelism" would be useful
whether or not the primitives were parallelized.
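Skip's proposed "parallel" primitive can be sketched in Python with a
thread pool (the names A, B, C and the helper are illustrative, not
part of J or any existing library):

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor()   # worker threads, scheduled across cores by the OS

def parallel(fn, *args):
    """Start fn on a worker thread and return at once, like the proposed verb."""
    return _pool.submit(fn, *args)

def A(): return sum(range(1_000))
def B(): return max(range(1_000))
def C(): return len(range(1_000))

# The script "parallel A / parallel B / parallel C":
futures = [parallel(A), parallel(B), parallel(C)]   # all three start immediately
results = [f.result() for f in futures]             # wait for all to finish
```

The key design point is that "parallel" returns immediately, so the
calling script keeps executing; collecting the results is the explicit
join step.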
J should take on the challenge to lead the way in dealing with
multi-core architectures. It would be an excellent way to raise the
visibility of the language in programmer circles.
Skip
elliscave.com
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm