Devon said:
Not to disagree with the general idea that array-processing languages
have the potential to take advantage of parallelism, but it's not a new
notion: I wrote a paper on the subject and took part in a panel discussion
about 20 years ago, and others had preceded me.
Skip says:

I agree completely; parallel processing of array languages is far from a new idea.
I remember when Analogic Corp. built a vector-processing machine
using APL as the programming language in the early '80s. It was called "The APL Machine". Analogic understood the potential of controlling hardware parallelism with an array language like APL, even back then.

Skip said:
Most of J's primitives could take advantage of multiple parallel
processor threads. A simple example is the addition primitive.
Devon said:
Not a good example.  The set-up time ignored in the following part
of the paragraph will utterly dominate the time required for _any_ simple,
scalar math function.
Skip says:
For small arrays, you are right. For very large arrays the answer isn't so clear.

Devon said:
It also ignores memory-allocation time, which is often a bottleneck. This is particularly relevant for SIMD (single-instruction, multiple-data) parallelism. Sure, in theory you could add a bunch of numbers in parallel, with potentially greater gains for larger arrays, but the time for memory allocation swamps that of simple arithmetic, and allocation becomes more of a problem as arrays grow.
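A rough micro-benchmark sketch of this claim (not from the original exchange; the sizes and repetition counts are arbitrary, and the list-comprehension timing necessarily includes its own allocation, so this only roughly separates the two costs):

```python
# Compare the cost of allocating a result buffer against the cost of an
# element-wise addition that also allocates its result.
import timeit

N = 1_000_000
a = list(range(N))
b = list(range(N))

# Time allocating a fresh N-element list on its own.
alloc_time = timeit.timeit(lambda: [0] * N, number=10)

# Time the element-wise addition (allocation included).
add_time = timeit.timeit(lambda: [x + y for x, y in zip(a, b)], number=10)

print(f"allocation: {alloc_time:.4f}s  addition (incl. allocation): {add_time:.4f}s")
```

On a typical machine the allocation alone is a noticeable fraction of the total, which is the point Devon is making, though the exact ratio depends on the runtime and allocator.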
Skip says:

While the addition of two arrays can be defined as a SIMD operation, it can also
be thought of as MIMD - each core adding its own pieces of the two arrays.

I believe that one could take the J addition primitive and make it "smart"
about using parallel processes. If the data size is small, the interpreter would keep all the computations in a single thread in one processor. If the arrays are large, the interpreter could distribute the work across several processors. It would be interesting to see just at what array size the parallel process becomes more efficient than the single-threaded process. Do you agree that parallel processes will become more efficient at some array size? Or do you contend that the parallel overhead is such that single-threading will be more efficient, regardless of array size?

Devon said:
Simply put, multi-core processors are too coarse-grained for an array language to exploit at the level of individual array operations. The substantial set-up required points to taking advantage of parallelism at a higher level than most of the language's primitives. Remember, dual- or quad-core means multiple multi-megatransistor processors - that's firing up a lot of silicon just to add a couple of numbers!
Skip says:
A broad statement. It could be true, but I haven't seen concrete evidence to support your claim. I would certainly want to try to test out some primitive-level parallelism before we give up on the idea. From all accounts, the Analogic machine was quite efficient in its parallel operations. It was the strangeness of the APL language to programmers that eventually killed the project.
Devon said:
However, on the bright side, this coarse-grain parallelism means we can take
advantage of it, at an application level, right now, as some of us are
currently doing.

Having made the case against attempting to parallelize most J primitives on
a multi-core architecture,
Skip says:
You make a case, but I think we would need more concrete data before discarding the whole idea. Modern multi-core processors have lots of functionality designed to minimize the overhead in SIMD-type processes. We should see how this architecture could fit into J's primitives before dismissing it out of hand.

However, it could be true that primitive parallelization is not much more efficient than the single-processor/single-threaded approach. In that case, perhaps the way to go would be to create some primitives in the language that would support parallel operations.

For example, the function "parallel" could be used to run the function in its right argument on a specific processor, and then continue execution with the next statement in the script. So the script:

parallel A
parallel B
parallel C

would result in the functions A, B, and C being run on separate threads, with each thread placed on a separate processor. In this way your "coarse-grain parallelism" could become an inherent part of the language, instead of requiring a new session for each thread. For that matter, this native coarse-grain parallelism would be useful whether or not the primitives were parallelized.

J should take on the challenge to lead the way in dealing with multi-core architectures. It would be an excellent way to raise the visibility of the language in programmer circles.

Skip
elliscave.com


----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
