Devon said:
Not to disagree with the general idea that array-processing languages
have the potential to take advantage of parallelism, but it's not a new
notion: I wrote a paper about this and took part in a panel discussion
about it some 20 years ago, and others had preceded me.
Skip says:
I agree completely; parallel processing of array languages is far from
a new idea.
I remember when the Analogic Corp. built a vector processing machine
using APL as the programming language in the early 80's. It was called
"The APL Machine". Analogic understood the potential of controlling
hardware parallelism with an array language like APL, even back then.
Skip said:
Most of J's primitives could take advantage of multiple parallel
processor threads. A simple example is the addition primitive.
Devon said:
Not a good example. The set-up time, which the rest of the paragraph
ignores, will utterly dominate the time required for _any_ simple,
scalar math function.
Skip says:
For small arrays, you are right. For very large arrays the answer isn't
so clear.
Devon said:
It also ignores memory allocation time, which is often a bottleneck.
This is particularly relevant when you talk about SIMD
(single-instruction, multiple-data) parallelism. Sure, in theory you
could add a bunch of numbers in parallel, with potentially greater gain
for larger arrays, but the time for memory allocation swamps that of
simple arithmetic, and allocation becomes more of a problem as the
arrays grow.
Skip says:
While the addition of two arrays can be defined as a SIMD operation,
it can also be thought of as MIMD: each core adds its own piece of the
two arrays.
I believe that one could take the J addition primitive and make it "smart"
about using parallel processes. If the data size is small, the
interpreter would keep all the computations in a single thread in one
processor. If the arrays are large, the interpreter could distribute the
work across several processors. It would be interesting to see just at
what array size the parallel process becomes more efficient than the
single-threaded process. Do you agree that parallel processes will
become more efficient at some array size? Or do you contend that the
parallel overhead is such that single-threading will be more efficient,
regardless of array size?
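To make the "smart" dispatch concrete, here is a sketch in Python rather
than J (the THRESHOLD value is a made-up placeholder; the real crossover
size would have to be measured). Small arrays are added in the calling
thread; large ones are split into chunks, MIMD-style, one per worker:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical crossover size -- the real value would have to be measured.
THRESHOLD = 100_000

def add_chunk(a, b, lo, hi):
    """Add a[lo:hi] and b[lo:hi] elementwise."""
    return [a[i] + b[i] for i in range(lo, hi)]

def smart_add(a, b, workers=4):
    """Elementwise a + b: single-threaded when small, chunked when large."""
    n = len(a)
    if n < THRESHOLD:
        return add_chunk(a, b, 0, n)           # small: stay in one thread
    step = (n + workers - 1) // workers        # large: one chunk per worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(add_chunk, a, b, i, min(i + step, n))
                   for i in range(0, n, step)]
        out = []
        for f in futures:                      # futures kept in chunk order
            out.extend(f.result())
        return out
```

Note that in CPython the global interpreter lock keeps pure-Python
arithmetic from actually running in parallel; a J interpreter doing this
internally, in C on raw arrays, would not have that limitation.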
Devon said:
Simply put, multi-core processors are too coarse-grained for an array
language to exploit at the level of individual array operations. The
substantial set-up cost points to exploiting parallelism at a higher
level than most of the language primitives. Remember, dual- or
quad-core means multiple multi-megatransistor processors: that's
firing up a lot of silicon just to add a couple of numbers!
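Devon's point about set-up cost can be illustrated with a rough
measurement, sketched here in Python (thread creation stands in for the
general overhead of engaging another core; the exact numbers depend on
the machine):

```python
import time
import threading

def noop():
    pass

# Cost of firing up one extra thread (a stand-in for parallel set-up).
t0 = time.perf_counter()
th = threading.Thread(target=noop)
th.start()
th.join()
thread_cost = time.perf_counter() - t0

# Average cost of one scalar addition, amortized over a million of them.
t0 = time.perf_counter()
s = 0
for i in range(1_000_000):
    s += i
add_cost = (time.perf_counter() - t0) / 1_000_000

print(f"thread set-up costs roughly {thread_cost / add_cost:.0f}x one addition")
```

On typical hardware the thread set-up runs several orders of magnitude
longer than a single addition, which is exactly Devon's point: the
overhead only pays off if each thread is handed a large amount of work.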
Skip says:
A broad statement. It could be true, but I haven't seen concrete
evidence to support your claim. I would certainly want to try to test
out some primitive-level parallelism before we give up on the idea. From
all accounts, the Analogic machine was quite efficient in its parallel
operations. It was the strangeness of the APL language to programmers
that eventually killed the project.
Devon said:
However, on the bright side, this coarse-grain parallelism means we
can take advantage of it at an application level right now, as some of
us are currently doing.
Having made the case against attempting to parallelize most J
primitives on a multi-core architecture,
Skip says:
You make a case, but I think we would need more concrete data before
discarding the whole idea. Modern multi-core processors have lots of
functionality designed to minimize the overhead in SIMD-type processes.
We should see how this architecture could fit into J's primitives before
dismissing it out of hand.
However, it could be true that primitive parallelization is not much
more efficient than the single-processor/single-threaded approach. In
that case, perhaps the way to go would be to create some primitives in
the language that would support parallel operations.
For example, the function "parallel" could run the function in its
right argument on a separate processor, and then continue execution
with the next statement in the script. So the script:
parallel A
parallel B
parallel C
would result in the functions A, B, and C running on separate threads,
with each thread placed on a separate processor. In this way your
"coarse-grain parallelism" could become an inherent part of the
language, instead of requiring a new session to be started for each
thread.
For that matter, this native "coarse-grain parallelism" would be useful
whether or not the primitives were parallelized.
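Skip's proposed "parallel" primitive can be sketched in Python with a
thread pool (the names A, B, C and the helper are illustrative, not
part of J or any existing library):

```python
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor()   # worker threads, scheduled across cores by the OS

def parallel(fn, *args):
    """Start fn on a worker thread and return at once, like the proposed verb."""
    return _pool.submit(fn, *args)

def A(): return sum(range(1_000))
def B(): return max(range(1_000))
def C(): return len(range(1_000))

# The script "parallel A / parallel B / parallel C":
futures = [parallel(A), parallel(B), parallel(C)]   # all three start immediately
results = [f.result() for f in futures]             # wait for all to finish
```

The key design point is that "parallel" returns immediately, so the
calling script keeps executing; collecting the results is the explicit
join step.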
J should take on the challenge to lead the way in dealing with
multi-core architectures. It would be an excellent way to raise the
visibility of the language in programmer circles.
Skip
elliscave.com
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm