Whatever the arguments about primitives in general, I know that matrix product could be written to take full advantage of a multi-core architecture. If the same could be done for matrix divide, those two primitives alone would account for most of the machine cycles spent by quite a few real-world applications.
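To put a rough number on why an optimized matrix-product primitive matters so much, here is a minimal sketch - in Python/NumPy rather than J, purely as an illustration; the array sizes, timing harness, and use of NumPy's `@` are my assumptions, not anything from the thread - comparing a naive interpreted triple loop against a tuned implementation:

```python
# Illustration only: the thread concerns J's matrix-product primitive;
# NumPy's @ operator stands in here to show how much a well-optimized
# matrix product matters. Sizes and harness are assumptions.
import time

import numpy as np

n = 120
a = np.random.rand(n, n)
b = np.random.rand(n, n)

def naive_matmul(a, b):
    """Triple loop in interpreted code: what an unoptimized product costs."""
    rows, inner, cols = a.shape[0], a.shape[1], b.shape[1]
    c = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            s = 0.0
            for k in range(inner):
                s += a[i, k] * b[k, j]
            c[i, j] = s
    return c

t0 = time.perf_counter()
c1 = naive_matmul(a, b)
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
c2 = a @ b  # optimized primitive (BLAS-backed in NumPy)
t_fast = time.perf_counter() - t0

assert np.allclose(c1, c2)
print(f"naive: {t_naive:.3f}s  optimized: {t_fast:.6f}s")
```

On typical hardware the tuned product wins by several orders of magnitude, which is why cycles concentrated in one or two primitives are such attractive optimization targets.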
Pushing for multi-core support may be a step too far. If the interpreter were upgraded to support SSE3 instructions (with all the compatibility problems that would entail), matrix multiply could be improved severalfold in a single thread.

Henry Rich

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Devon McCormick
> Sent: Wednesday, February 07, 2007 12:14 PM
> To: General forum
> Subject: Re: [Jgeneral] speeding up J
>
> Not to disagree with the general idea that array-processing languages
> have the potential to take advantage of parallelism, but it's not a
> new notion: I wrote a paper about this and took part in a panel
> discussion about 20 years ago, and others had preceded me.
>
> On 2/7/07, Skip Cave <[EMAIL PROTECTED]> wrote:
> > However, it is clear the future of computation is rushing headlong
> > into multi-processing.
>
> ...and has been for many years. However, the problems continue to
> crop up in the "ancillary" issues.
>
> > Most of J's primitives could take advantage of multiple parallel
> > processor threads. A simple example is the addition primitive.
>
> Not a good example. The set-up time ignored in the following part of
> the paragraph will utterly dominate the time required for _any_
> simple, scalar math function.
>
> > Of course, this ignores set-up time to break up the arrays into
> > operable chunks for each processor, and then the time needed to put
> > the array pieces all back together again, ...
>
> It also ignores memory-allocation time, which is often a bottleneck.
> This is particularly relevant when you talk about SIMD
> (single-instruction, multiple-data) parallelism. Sure, in theory you
> could add a bunch of numbers in parallel, with potentially greater
> gain for larger arrays, but the time for memory allocation swamps
> that of simple arithmetic, and memory allocation becomes more of a
> problem with larger arrays.
> Simply put, multi-core processors are too coarse-grained for an array
> language to take advantage of at the level of individual array
> operations. The substantial set-up required points to taking
> advantage at a higher level than most of the language primitives.
> Remember, dual- or quad-core implies multiple, multi-megatransistor
> processors - that's firing up a lot of silicon to add a couple of
> numbers!
>
> However, on the bright side, this coarse-grained parallelism means we
> can take advantage of it, at an application level, right now, as some
> of us are currently doing.
>
> Having made the case against attempting to parallelize most J
> primitives on a multi-core architecture, I am currently running
> something in J which could potentially benefit from this - though I
> don't know the details of the Miller-Rabin algorithm (which I believe
> underlies q:) sufficiently well to say so for certain: I've been
> running q: on an 88-digit number for about the past two days. Until
> it finishes, I'm reluctant to shut down my machine. A potentially
> long-running algorithm like this is one of the few that might benefit
> from the current multi-core trends.
>
> --
> Devon McCormick
> ^me^ at acm.org is my preferred e-mail
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
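Devon's set-up-cost argument can be demonstrated concretely. The sketch below - Python with NumPy and a thread pool standing in for J and for real worker threads; the language, chunk count, and array size are my assumptions, not from the thread - splits a simple addition across workers and reassembles the result, which is exactly the overhead he describes:

```python
# Sketch of the set-up-cost argument: split a simple addition across a
# worker pool, then glue the pieces back together. Python/NumPy stand
# in for J here; chunk count and array size are assumptions.
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np

x = np.random.rand(100_000)
y = np.random.rand(100_000)

# Serial: the primitive just adds.
t0 = time.perf_counter()
serial = x + y
t_serial = time.perf_counter() - t0

# "Parallel": chunk both arrays, dispatch the chunks, reassemble.
with ThreadPoolExecutor(max_workers=4) as pool:
    t0 = time.perf_counter()
    xs = np.array_split(x, 4)
    ys = np.array_split(y, 4)
    parts = list(pool.map(lambda pair: pair[0] + pair[1], zip(xs, ys)))
    parallel = np.concatenate(parts)
    t_parallel = time.perf_counter() - t0

assert np.allclose(serial, parallel)
print(f"serial: {t_serial * 1e6:.0f} us  chunked: {t_parallel * 1e6:.0f} us")
```

For arrays of this modest size, the chunk/dispatch/reassemble cycle typically costs far more than the addition itself - the coarse-grain problem in miniature.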
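For reference, the Miller-Rabin test Devon mentions can be sketched in a few lines. This is illustrative Python only - the thread says nothing about how q: actually implements primality testing. The point relevant to the discussion is that the witness rounds in the main loop are mutually independent, which is what would make a long-running primality test a plausible candidate for multiple cores:

```python
# Minimal Miller-Rabin probabilistic primality test (illustrative only;
# not q:'s actual implementation). Each witness round is independent,
# so rounds could in principle run on separate cores.
import random

def is_probable_prime(n, rounds=20):
    """Return True if n is probably prime, False if certainly composite."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):  # independent rounds: the parallelizable part
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True

print(is_probable_prime(2**89 - 1))  # True: 2^89 - 1 is a Mersenne prime
print(is_probable_prime(2**89 + 1))  # False: divisible by 3
```

A false positive is possible but vanishingly unlikely at 20 rounds; the test never rejects an actual prime.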
