I love reading this list. Aside from over and over again finding new ways 
in which Julia is awesome, I learn all kinds of stuff from all the side 
tracks you guys get into. If I could buy a beer for everyone who's taught me 
something by expanding "a little too much" on something vaguely related to 
what we're actually talking about, I'd be a really poor guy afterwards.

Keep it up, I don't want to stop learning! =)

// T

On Thursday, May 22, 2014 6:19:05 PM UTC+2, Dahua Lin wrote:
>
> As a side note, I just cleaned up the Devectorize.jl package (tagged as 
> v0.4). Now it works well under Julia v0.3.
>
> I am now working on a major upgrade to this package. This may lead to a 
> more transparent & extensible code generator and support for arrays of 
> arbitrary dimensions (with the help of the ArrayViews package). 
>
> Dahua
>
>
> On Thursday, May 22, 2014 8:38:15 AM UTC-5, gael....@gmail.com wrote:
>>
>>
>>
>> On Thursday, May 22, 2014 09:51:44 UTC+2, Tobias Knopp wrote:
>>>
>>> To give this discussion some facts, I have done some benchmarking of my 
>>> own.
>>>
>>> Matlab R2013a:
>>>
>>> function [ y ] = perf( )
>>>   N = 10000000;
>>>   x = rand(N,1);
>>>   y = x + x .* x + x .* x;
>>> end
>>>
>>> >> tic;y=perf();toc;
>>> Elapsed time is 0.177664 seconds.
>>>
>>> Julia 0.3 prerelease
>>>
>>>  function perf()
>>>    N = 10000000
>>>    x = rand(N)
>>>    y = x + x .* x + x .* x
>>>  end
>>>
>>> julia> @time perf()
>>> elapsed time: 0.232852894 seconds (400002808 bytes allocated)
>>>
>>> using Devectorize
>>>
>>>  function perf_devec()
>>>    N = 10000000
>>>    x = rand(N)
>>>    @devec y = x + x .* x + x .* x
>>>  end
>>>
>>> julia> @time perf_devec()
>>> elapsed time: 0.084605794 seconds (160000664 bytes allocated)
>>>
>>> So it all seems pretty consistent to me. Matlab is a little better on 
>>> vectorized code, as they presumably have better memory caching. But 
>>> explicit devectorization using the @devec macro still performs best. So 
>>> using vectorized code in Julia is fine and "reasonably fast". If someone 
>>> wants to do performance tweaking, I don't see the issue with telling them 
>>> about devectorization.
>>>
>>  
>> Ahah !!! I was sure of it: we're not talking about the same thing. To me,
>> @devec y = x + x .* x + x .* x
>> is actually *vectorized* code :). When I talk about devectorizing 
>> code, I only mean explicit loops. It's a shame that I only paid 
>> attention to Devectorize.jl last night. This thing is awesome and it 
>> should be a great place to contribute to.
>>
>> This should be the very first answer to "this part of my code is too 
>> slow".
>>
>>
>> Regarding the benchmarks you've done, thanks. Without evidence, no 
>> discussion. I agree.
>>
>> But there are two problems with your benchmarks. Firstly, you've not 
>> repeated them and therefore can't associate an uncertainty with them. Maybe 
>> the Matlab code is not actually faster. Secondly, what if Matlab or Julia 
>> actually spends most of its time generating the random vector?
>>
>> I'd recommend that you repeat your measurements and directly compare the 
>> estimated distribution functions. I've done just that for your simple code. 
>> I just moved the rand(...) call outside of the function each time. I also 
>> created devectorized versions of your code (I mean, with explicit loops 
>> written by myself), once with ".*" as the multiplication operator and once 
>> with "*". The resulting kernel densities can be found attached.
>>
>> As you can see, the resulting distributions are not even close to 
>> Gaussian. Normality tests failed for each of them. How to 
>> explain that? Easy: a typical desktop computer does nothing most of the 
>> time. Once you launch something, it spends its time on that, but every 
>> now and then, quite rarely, it needs to spend some CPU time on something 
>> else. Therefore, there is a mode (the most probable execution time) and a 
>> tail that is heavier on the side of longer times.
>>
>> I just got the time using tic(), toc(), so for each of the thousand 
>> repetitions of the exact same calculation, I could follow the execution 
>> time in real time. The fact that "Explicit loop (*)" has a bigger and 
>> stranger tail is directly related to the fact that I used my mouse quite 
>> intensively during that run. What does that mean? 
>>
>> 1) One must repeat calculations for benchmarking.
>> 2) Calculating the mean of the repetitions is useless because it is not a 
>> good estimator of the mode of the distribution.
>>
>> Point 1 is obvious, but point 2 is not: please consider the second plot 
>> attached, which is a zoomed-in part of the first one. As you can see, the 
>> mode of the "Explicit loop (*)" curve sits slightly to the left 
>> of the "Explicit loop (.*)" curve. This probably means that for 
>> scalars, "*" has less overhead than ".*" (maybe just a couple of extra 
>> "if"s or an extra function call). However, because I shook my mouse (on 
>> purpose) during that run (it was so funny to see the execution time bump in 
>> the terminal), it has a bigger asymmetric tail. 
>>
>> The result? The ".*" version is on "average" 3% faster than the "*" version.
>>
>> The problem? This is not a function benchmark, this is a system load 
>> benchmark.
>>
>> Oh my, I've done it again, sorry for the long post.
>>
>
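
For anyone who wants to try the approach from the quoted message, here is a minimal sketch, not the code that was actually run. It assumes Julia 1.x (where tic()/toc() no longer exist, so time_ns() is used instead), and the names perf_loop and sample_times are made up for illustration. The input vector is generated once outside the timed code, the calculation is repeated many times, and the minimum and median are printed next to the mean. Minimum and median are simple stand-ins for the mode of the distribution, which is an assumption of this sketch; the quoted post estimated the mode from kernel densities.

```julia
using Statistics  # median and mean (standard library in Julia 1.x)

# Explicit-loop ("devectorized") version of y = x + x .* x + x .* x.
# perf_loop is a hypothetical name used only in this sketch.
function perf_loop(x::Vector{Float64})
    y = similar(x)
    @inbounds for i in eachindex(x)
        y[i] = x[i] + x[i] * x[i] + x[i] * x[i]
    end
    return y
end

# Call f(x) `reps` times and return each wall-clock time in seconds.
# sample_times is also a hypothetical helper, not a library function.
function sample_times(f, x; reps = 1000)
    f(x)  # warm-up call so that JIT compilation is not timed
    times = Vector{Float64}(undef, reps)
    for k in 1:reps
        t0 = time_ns()
        f(x)
        times[k] = (time_ns() - t0) / 1e9
    end
    return times
end

x = rand(1_000_000)  # input generated once, outside the timed code
t = sample_times(perf_loop, x; reps = 200)

# Minimum and median resist the long right tail caused by system load;
# the mean is pulled upward by it.
println("min    = ", minimum(t))
println("median = ", median(t))
println("mean   = ", mean(t))
```

If the mean comes out noticeably larger than the median, that gap is the system-load tail rather than a property of the function being timed.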
