Whatever the arguments about primitives in general, I know that
matrix product could be written to take full advantage of
a multi-core architecture.  If the same could be done for matrix
divide, you would have in just those two primitives accounted
for most of the machine cycles spent by quite a few real-world
applications.
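To make the blocking concrete, here is a rough sketch (in Python rather than J, and illustrative only - the function names and the worker count are invented): split the left matrix into row blocks, compute each block's product on its own worker, then stitch the results back together.  A real interpreter would run the blocks in native threads on separate cores; plain Python threads only show the bookkeeping.

```python
# Sketch of a blocked, multi-worker matrix product.  Python threads
# illustrate the structure only; native threads would give real speedup.
from concurrent.futures import ThreadPoolExecutor

def matmul_block(a_rows, b):
    """Multiply a block of rows of A by the whole of B."""
    return [[sum(x * y for x, y in zip(row, col))
             for col in zip(*b)] for row in a_rows]

def parallel_matmul(a, b, workers=4):
    # Carve A into one row block per worker (the "set-up" cost).
    n = len(a)
    step = (n + workers - 1) // workers
    blocks = [a[i:i + step] for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(matmul_block, blocks, [b] * len(blocks))
    # Reassemble the row blocks into the full product.
    return [row for part in parts for row in part]
```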

Pushing for multi-core support may be a step too far.  If the
interpreter were upgraded to support SSE3 instructions (with all
the compatibility problems that entails), matrix multiply could be
improved severalfold in a single thread.
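The data flow SSE buys you can be sketched in ordinary code: a dot product run in two "lanes", one pair of elements per packed step, with a single horizontal add at the end - the step SSE3's HADDPD instruction supplies in hardware.  (Python here is purely for illustration; the real gain comes only when the lanes are actual SIMD registers.)

```python
# Two-lane dot product mimicking packed SSE arithmetic.
def dot_two_lanes(xs, ys):
    assert len(xs) == len(ys) and len(xs) % 2 == 0
    lane0 = lane1 = 0.0
    for i in range(0, len(xs), 2):      # one "packed" step per pair
        lane0 += xs[i] * ys[i]
        lane1 += xs[i + 1] * ys[i + 1]
    return lane0 + lane1                # the horizontal add
```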

Henry Rich 

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Devon McCormick
> Sent: Wednesday, February 07, 2007 12:14 PM
> To: General forum
> Subject: Re: [Jgeneral] speeding up J
> 
> Not to disagree with the general idea that array-processing languages
> have the potential to take advantage of parallelism, but it's not a
> new notion: I wrote a paper about this and took part in a panel
> discussion about 20 years ago, and others had preceded me.
> 
> On 2/7/07, Skip Cave <[EMAIL PROTECTED]> wrote:
> > However, it is clear the future of computation is rushing headlong
> > into multi-processing, and has been for many years.  Still, the
> > problems continue to crop up in the "ancillary" issues.
> 
> > Most of J's primitives could take advantage of multiple parallel
> > processor threads. A simple example is the addition primitive.
> 
> Not a good example.  The set-up time ignored in the following part of
> the paragraph will utterly dominate the time required for _any_
> simple, scalar math function.
> 
> > Of course, this ignores set-up time to break up the arrays into
> > operable chunks for each processor, and then the time needed to put
> > the array pieces all back together again, ...
> 
> It also ignores memory-allocation time, which is often a bottleneck.
> This is particularly relevant when you talk about SIMD
> (single-instruction, multiple-data) parallelism.  Sure, in theory you
> could add a bunch of numbers in parallel, with potentially greater
> gain for larger arrays, but the time for memory allocation swamps that
> of simple arithmetic, and the allocation becomes more of a problem as
> arrays grow.
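(The allocation point, sketched in Python: the natural "functional" add allocates a fresh result on every call, while an interpreter that writes into a preallocated output buffer pays the allocation cost once.  Pure Python, so only the shape of the two approaches is shown, not their relative speed; the function names are invented.)

```python
def add_alloc(xs, ys):
    # allocates a brand-new result list on every call
    return [x + y for x, y in zip(xs, ys)]

def add_into(xs, ys, out):
    # writes into a caller-supplied buffer; no new allocation
    for i, (x, y) in enumerate(zip(xs, ys)):
        out[i] = x + y
    return out
```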
> 
> Simply put, multi-core processors are too coarse-grained for an array
> language to take advantage of simply at the array level.  The
> substantial set-up required points to taking advantage at a higher
> level than most of the language primitives.  Remember, dual- or
> quad-core implies multiple multi-megatransistor processors - that's
> firing up a lot of silicon to add a couple of numbers!
> 
> However, on the bright side, this coarse-grained parallelism means we
> can take advantage of it, at an application level, right now - as some
> of us are currently doing.
> 
> Having made the case against attempting to parallelize most J
> primitives on a multi-core architecture, I am currently running
> something in J which could potentially benefit from this - though I
> don't know the details of the Miller-Rabin primality test (which I
> believe underlies q:) sufficiently well to say so for certain: I've
> been running q: on an 88-digit number for about the past two days.
> Until it finishes, I'm reluctant to shut down my machine.  A
> potentially long-running algorithm like this is one of the few that
> might benefit from the current multi-core trends.
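(For reference, a minimal Miller-Rabin sketch in Python - note it is a primality test rather than a factoring method; factoring routines use it to certify the prime factors they find.  Probabilistic, with Python's three-argument pow doing the modular exponentiation.)

```python
import random

def is_probable_prime(n, rounds=20):
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # write n - 1 as d * 2^s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False        # a witnesses that n is composite
    return True
```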
> 
> -- 
> Devon McCormick
> ^me^ at acm.org is my preferred e-mail
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
