I looked a bit deeper (i.e., found a machine where I have access to an Intel compiler, albeit not an up-to-date one; my shop is cursed by budget cuts). ICC breaks up a loop like

    for (i = 0; i < n; i++) { a[i] = exp(cos(b[i])); s += a[i]; }

into calls to vector math library functions plus a separate loop for the sum. The library is bundled with ICC; it's not MKL, but its domain overlaps with MKL's (hence my misapprehension), so your point stands. Something like blackscholes benefits from these vector library calls, and GCC doesn't do that.
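To make the transformation concrete, here is a hand-written sketch of what the compiler effectively produces. I'm using MKL's VML functions vdCos/vdExp as stand-ins for the bundled vector library (the real ICC output calls its own routines, so the names here are only illustrative):

    /* Rough illustration of ICC's loop distribution: the scalar
       exp(cos(...)) body becomes whole-array vector math calls, and
       the reduction is peeled off into its own loop.  vdCos/vdExp
       are MKL VML stand-ins for the library ICC actually bundles. */
    #include <mkl.h>

    double exp_cos_sum(const double *b, double *a, MKL_INT n)
    {
        double s = 0.0;
        vdCos(n, b, a);                  /* a[i] = cos(b[i]) for all i */
        vdExp(n, a, a);                  /* a[i] = exp(a[i]), in place */
        for (MKL_INT i = 0; i < n; i++)  /* separate loop for the sum  */
            s += a[i];
        return s;
    }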
It would be nice if Julia's LLVM pipeline included an optimization pass that invoked a vector math library when appropriate. I suspect that's a challenge outside the scope of ParallelAccelerator, but it might be good ground for some other project (see the sketch after the quoted thread; upstream clang already exposes a hook along these lines).

On Thursday, October 27, 2016 at 1:04:33 PM UTC-4, Todd Anderson wrote:
>
> That's interesting. I generally don't test with gcc, and my experiments
> with ICC/C have shown LLVM/native threads to be something like 20% slower
> for one class of benchmarks (like blackscholes) but 2-4x slower for some
> other benchmarks (like laplace-3d). The 20% may be attributable to ICC
> being better (including at vectorization, as you mention), but certainly
> not the 2-4x. These larger differences are still under investigation.
>
> I guess something we have said in the docs or our postings has created
> the impression that our performance gains are somehow related to MKL or
> BLAS in general. If you have MKL, you can compile Julia to use it through
> its LLVM path. ParallelAccelerator does not insert calls to MKL where
> they didn't exist in the incoming IR, and I don't think ICC does either.
> If MKL calls exist in the incoming IR, we don't modify them either.
>
> On Wednesday, October 26, 2016 at 7:51:33 PM UTC-7, Ralph Smith wrote:
>>
>> This is great stuff. Initial observations (under Linux/GCC) are that
>> native threads are about 20% faster than OpenMP, so I surmise you are
>> feeding LLVM some very tasty code. (I tested long loops with
>> straightforward memory access.)
>>
>> On the other hand, some of the earlier posts made me think that you
>> were leveraging the strong vector optimization of the Intel C compiler
>> and its tight coupling to MKL libraries. If so, is there any prospect
>> of getting LLVM to take advantage of MKL?
>>
>> On Wednesday, October 26, 2016 at 8:13:38 PM UTC-4, Todd Anderson wrote:
>>>
>>> Okay, METADATA with ParallelAccelerator version 0.2 has been merged,
>>> so if you do a standard Pkg.add() or update() you should get the
>>> latest version.
>>>
>>> For native threads, please note that we've identified some issues
>>> with reductions and stencils; the fixes will be released shortly in
>>> version 0.2.1. I will post here again when that release takes place.
>>>
>>> Again, please give it a try and report back with experiences or file
>>> bugs.
>>>
>>> thanks!
>>>
>>> Todd
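Postscript on the LLVM pass idea: upstream LLVM already has a hook of this kind. Clang's -fveclib flag tells the loop vectorizer (via TargetLibraryInfo) which vector math library it may call into, and sufficiently recent versions accept SVML, Intel's short vector math library. A minimal sketch, assuming a clang new enough to support -fveclib=SVML and an SVML implementation to link against:

    /* veclib_demo.c -- compile with:
     *     clang -O2 -ffast-math -fveclib=SVML -S veclib_demo.c
     * then look in the generated assembly for calls such as
     * __svml_cos4/__svml_exp4 in place of scalar cos/exp. */
    #include <math.h>

    double exp_cos_sum(const double *b, double *a, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++) {
            a[i] = exp(cos(b[i])); /* candidates for vector library calls */
            s += a[i];   /* reduction; -ffast-math lets it reassociate */
        }
        return s;
    }

Presumably Julia's code generator could populate the same TargetLibraryInfo mappings to get the same effect; whether that belongs in ParallelAccelerator or a separate package, I can't say.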