Re: [julia-users] Re: Can this magic square function be further optimized?

Ivar Nesje Tue, 26 Aug 2014 10:46:18 -0700

PyPy can't run Numpy

There is a pure python numpypy for PyPy.


kl. 19:27:44 UTC+2 tirsdag 26. august 2014 skrev Jacob Quinn følgende:
>
> I know numpy has multi-threading in a bunch of their vectorized functions.
>
>
> On Tue, Aug 26, 2014 at 11:46 AM, Iain Dunning <[email protected] 
> <javascript:>> wrote:
>
>> Is it possible that PyPy is multithreading something? I struggle to 
>> believe its so much faster - although I'd believe it if it was about the 
>> same.
>>  On Aug 26, 2014 11:31 AM, "Simon Kornblith" <[email protected] 
>> <javascript:>> wrote:
>>
>>> Here's an interesting comparison:
>>>
>>> https://gist.github.com/simonster/6195af68c6df33ca965d
>>>
>>> idiv for 64-bit integers is one of the most expensive extant x86-64 
>>> instructions. For 32-bit integers, it is much cheaper, and this function 
>>> runs nearly twice as fast. When LLVM knows the divisor in advance, it can 
>>> avoid idiv and perform the division as a multiplication and bit shift, 
>>> which is faster still. But the biggest gain comes from avoiding rem in the 
>>> inner loop entirely.
>>>
>>> Of course, this is only tackling the odd case, and additional 
>>> performance can be gained for all of these benchmarks by avoiding 
>>> allocation.
>>>
>>> Simon
>>>
>>> On Tuesday, August 26, 2014 9:11:47 AM UTC-4, Phillip Berndt wrote:
>>>>
>>>>
>>>> (1) allocate the output M outside of the core algorithm, and pass it as 
>>>>> an 
>>>>> input, i.e., 
>>>>>
>>>>
>>>> I did that, though it can be argued that this is cheating given that 
>>>> the competitors also have to allocate an array for each loop. With that 
>>>> version (and some more slight optimization: Storing intermediate values in 
>>>> the for loops, using column-major indexing and @simd) [
>>>> https://gist.github.com/phillipberndt/7dc0aed7eb855f900f0d/
>>>> 21cce76664bdc59f6203ff6f3496e80e256f54cb], the overall time for the 
>>>> N=3..1000 test case is down to 3.67s.
>>>>
>>>> (2) @time (for i = 1:100; magic!(M); end). Did it allocate any memory? 
>>>>> Then 
>>>>> you have a problem. Use the profiler, or run julia with --track- 
>>>>> allocation=user, to find out where it occurs.  
>>>>
>>>>
>>>> It does, about 3 Mb on line 2 (if n % 2 == 1). Doesn't make much sense 
>>>> so I guess the profiler interfered with the optimizer here?! I doubt that 
>>>> trying to get rid of the 3Mb will gain another second though.
>>>>  
>>>>
>>>>> (3) Even if it's not allocating, you may have a bottleneck. Use the 
>>>>> profiler to 
>>>>> find it. 
>>>>>
>>>>
>>>> The line where the most time is spent is line 11, filling the array in 
>>>> the odd case. I don't see how it could be optimized any further, so that's 
>>>> probably as far as one gets?!
>>>>
>>>> - Phillip
>>>>
>>>
>

Re: [julia-users] Re: Can this magic square function be further optimized?

Reply via email to