as an example, the data looks like this:
v = rand(3)
r = rand(6000,3)
x = linspace(1.0, 2.0, 300) * (v./sqrt(sumabs2(v)))'
my function in 0.4 looks like this:
function s04(xl, rl)
    result = zeros(size(xl,1))
    for i = 1:size(xl,1)
        dotprods = rl * xl[i,:]'
        #10000 loops, best of 3: 17.66 µs per loop
        imexp = exp(im .* dotprods)
        #1000 loops, best of 3: 172.33 µs per loop
        sumprod = sum(imexp) * sum(conj(imexp))
        #10000 loops, best of 3: 21.04 µs per loop
        result[i] = sumprod
    end
    return result
end
and using @timeit s04(x,r) gives
10 loops, best of 3: 67.52 ms per loop
where most of the time is spent in the exp() calls. Now in 0.5dev, the
individual parts have similar or even better timings, e.g. the dot product:
function s05(xl, rl)
    result = zeros(size(xl,1))
    for i = 1:size(xl,1)
        dotprods = rl * xl[i,:]
        #10000 loops, best of 3: 10.99 µs per loop
        imexp = exp(im .* dotprods)
        #1000 loops, best of 3: 158.50 µs per loop
        sumprod = sum(imexp) * sum(conj(imexp))
        #10000 loops, best of 3: 21.81 µs per loop
        result[i] = sumprod
    end
    return result
end
but @timeit s05(x,r) consistently gives a runtime about 70% worse:
10 loops, best of 3: 113.80 ms per loop
I then replaced the summing with the BLAS asum routine, for a modest speedup:
sumprod = Base.LinAlg.BLAS.asum(imexp) * Base.LinAlg.BLAS.asum(conj(imexp))
#10000 loops, best of 3: 17.02 µs per loop
#(caution: asum returns the sum of |re| + |im|, so the value differs from sum)
and the exp() call also runs a bit faster devectorized. But the picture on
my Fedora 23 workstation is always the same: individual calls inside the
function perform slightly better in 0.5dev, but the whole function is slower.
And oddly enough, only on my Fedora workstation! On an OS X laptop, the
0.5dev speedups of the parts inside the loop translate into the expected
speedup for the whole function!
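For reference, a devectorized variant of the loop body could look roughly like the sketch below. It is not the exact code that was timed. The name `s_devec` is made up for illustration, and it relies on two identities: `cis(d)` equals `exp(im*d)`, and `sum(z) * sum(conj(z))` equals `abs2(sum(z))`.

```julia
# Hypothetical devectorized version of s04/s05 (illustration only).
# xl: n-by-3 matrix of points, rl: m-by-3 matrix of positions.
function s_devec(xl, rl)
    n = size(xl, 1)
    m = size(rl, 1)
    result = zeros(n)
    for i = 1:n
        acc = zero(Complex{Float64})
        for j = 1:m
            # dot product of row j of rl with row i of xl
            d = 0.0
            for k = 1:size(rl, 2)
                d += rl[j, k] * xl[i, k]
            end
            acc += cis(d)          # cis(d) == exp(im*d), no temporary array
        end
        result[i] = abs2(acc)      # sum(z) * conj(sum(z)) == |sum(z)|^2
    end
    return result
end
```

This avoids allocating `dotprods` and `imexp` on every iteration, so it should return the same values as `s04(x, r)` up to floating-point rounding.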
So that puzzles me. Perhaps someone can reproduce this with the above
function and input?
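In case @timeit is not available, a rough stand-in for reproducing the numbers could be a small best-of-N helper built on `@elapsed`. This is only a sketch: `best_of` and its `reps`/`loops` keywords are made-up names, not part of any package.

```julia
# Hypothetical helper: best average wall time (seconds) of f(args...)
# over `reps` repetitions of `loops` calls each.
function best_of(f, args...; reps = 3, loops = 10)
    f(args...)                     # warm-up call so compilation is not timed
    best = Inf
    for r = 1:reps
        t = @elapsed for l = 1:loops
            f(args...)
        end
        best = min(best, t / loops)
    end
    return best
end
```

With the data above, `best_of(s04, x, r)` on 0.4 and `best_of(s05, x, r)` on 0.5dev should then be comparable to the "10 loops, best of 3" figures.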
cheers, Johannes
On Friday, February 26, 2016 at 4:28:05 PM UTC+1, Kristoffer Carlsson wrote:
>
> What code and where is it spending time? You talk about openblas, does it
> mean that blas got slower for you? How about peakflops() on the different
> versions?
>
> On Friday, February 26, 2016 at 4:08:06 PM UTC+1, Johannes Wagner wrote:
>>
>> hey guys,
>> I just experienced something weird. I have some code that runs fine on
>> 0.43, then I updated to 0.5dev to test the new Arrays, ran the same code and
>> noticed it got about 50% slower. Then I downgraded back to 0.43, ran the
>> old code, but speed remained slow. I noticed while reinstalling 0.43,
>> openblas-threads didn't get installed along with it. So I manually
>> installed it, but no change.
>> Does anyone have an idea what could be going on? LLVM on Fedora 23 is 3.7
>>
>> Cheers, Johannes
>>
>