I love reading this list. Aside from finding, over and over again, new ways in which Julia is awesome, I learn all kinds of stuff from all the side tracks you guys get into. If I could buy a beer for everyone who's taught me something by expanding "a little too much" on something vaguely related to what we're actually talking about, I'd be a really poor guy afterwards.
Keep it up, I don't want to stop learning! =)

// T

On Thursday, May 22, 2014 6:19:05 PM UTC+2, Dahua Lin wrote:
>
> As a side note, I just cleaned up the Devectorize.jl package (tagged
> v0.4). It now works well under Julia v0.3.
>
> I am now working on a major upgrade to this package. This may lead to a
> more transparent & extensible code generator and support for arrays of
> arbitrary dimensions (with the help of the ArrayViews package).
>
> Dahua
>
> On Thursday, May 22, 2014 8:38:15 AM UTC-5, gael....@gmail.com wrote:
>>
>> On Thursday, May 22, 2014 09:51:44 UTC+2, Tobias Knopp wrote:
>>>
>>> To give this discussion some facts, I have done some benchmarking of
>>> my own.
>>>
>>> Matlab R2013a:
>>>
>>> function [ y ] = perf( )
>>>     N = 10000000;
>>>     x = rand(N,1);
>>>     y = x + x .* x + x .* x;
>>> end
>>>
>>> >> tic; y = perf(); toc;
>>> Elapsed time is 0.177664 seconds.
>>>
>>> Julia 0.3 prerelease:
>>>
>>> function perf()
>>>     N = 10000000
>>>     x = rand(N)
>>>     y = x + x .* x + x .* x
>>> end
>>>
>>> julia> @time perf()
>>> elapsed time: 0.232852894 seconds (400002808 bytes allocated)
>>>
>>> Using Devectorize.jl:
>>>
>>> function perf_devec()
>>>     N = 10000000
>>>     x = rand(N)
>>>     @devec y = x + x .* x + x .* x
>>> end
>>>
>>> julia> @time perf_devec()
>>> elapsed time: 0.084605794 seconds (160000664 bytes allocated)
>>>
>>> So this all seems pretty consistent to me. Matlab is a little better on
>>> vectorized code, presumably because it has better memory caching. But
>>> explicit devectorization using the @devec macro still performs best.
>>> So using vectorized code in Julia is fine and "reasonably fast". If
>>> someone wants to do performance tweaking, I don't see the issue with
>>> telling them about devectorization.
>>
>> Ahah! I was sure of it: we are not talking about the same thing. To me,
>>
>> @devec y = x + x .* x + x .* x
>>
>> is actually *vectorized* code :). When I talk about devectorizing code,
>> I mean only explicit loops.
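[To make the distinction concrete: a devectorized, explicit-loop version of the quoted perf() might look like the sketch below. This is my own illustration, not code from the thread, and perf_loop is a name I made up; the loop body mirrors the vectorized expression exactly.]

```julia
# Hypothetical explicit-loop ("devectorized") version of the quoted
# perf(): a single pass over x, allocating only the output array and
# no temporaries for the intermediate products.
function perf_loop(x)
    y = similar(x)
    for i = 1:length(x)
        y[i] = x[i] + x[i] * x[i] + x[i] * x[i]
    end
    return y
end

x = rand(10000000)
@time perf_loop(x)   # note: the first call includes JIT compilation
```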
>> It's a shame that I only paid attention to Devectorize.jl yesterday
>> night. This thing is awesome and it should be a great place to
>> contribute to.
>>
>> This should be the very first answer to "this part of my code is too
>> slow".
>>
>> Regarding the benchmarks you've done, thanks. Without evidence, no
>> discussion. I agree.
>>
>> But there are two problems with your benchmarks. Firstly, you have not
>> repeated them and therefore cannot associate an uncertainty with them.
>> Maybe the Matlab code is not actually faster. Secondly, what if Matlab
>> or Julia actually spends most of its time generating the random vector?
>>
>> I'd recommend you repeat your measurements and compare the estimated
>> distribution functions directly. I've done just that for your simple
>> code. I just moved x = rand(...) outside of the function each time. I
>> also created devectorized versions of your code (I mean, with explicit
>> loops written by myself): once with ".*" as the multiplication operator
>> and once with "*". The resulting kernel densities are attached.
>>
>> As you can see, the resulting distributions are not even close to
>> Gaussian. Normality tests failed for each of them. How to explain that?
>> Easy: a typical desktop computer does nothing most of the time. Once
>> you launch something, it spends its time on that, but from time to
>> time, very rarely, it needs to spend some CPU time on something else.
>> Hence there is a mode, the most probable execution time, and a tail
>> that is heavier on the side of increasing times.
>>
>> I just got the times using tic() and toc(), so for each of the thousand
>> repetitions of the exact same calculation, I could follow the execution
>> time in real time. The fact that "Explicit loop (*)" has a bigger and
>> stranger tail is directly related to the fact that I used my mouse
>> quite intensively during that run. What does that mean?
>>
>> 1) One must repeat calculations for benchmarking.
>> 2) Calculating the mean of the repetitions is useless, because it is
>> not a good estimator of the mode of the distribution.
>>
>> Point 1 is obvious, but point 2 is not: please consider the second plot
>> attached, which is a zoomed-in part of the first one. As you can see,
>> the mode of the "Explicit loop (*)" curve sits slightly to the left of
>> the "Explicit loop (.*)" curve. This certainly means that for scalars,
>> "*" has less overhead than ".*" (maybe just a couple of extra "if"s or
>> an extra function call). However, because I shook my mouse (on purpose)
>> during that run (it was so funny to watch the execution times bump up
>> in the terminal), it has a bigger asymmetric tail.
>>
>> The result? The ".*" version is on "average" 3% faster than the "*"
>> version.
>>
>> The problem? That is not a benchmark of the function; it is a benchmark
>> of the system load.
>>
>> Oh my, I've done it again, sorry for the long post.
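[gael's two points above can be sketched as follows. This is my own sketch, not code from the thread: collect many timings of the same call, then summarize with the minimum rather than the mean. Background load can only ever inflate a timing, never shrink it, so the minimum is a robust stand-in for the mode of the distribution.]

```julia
# Repeat a timing many times and summarize with the minimum instead of
# the mean: system noise is strictly additive, so the minimum tracks
# the mode of the timing distribution while the mean is dragged up by
# the tail.
perf(x) = x + x .* x + x .* x

x = rand(10000)
times = Float64[]
for _ in 1:1000
    t0 = time_ns()
    perf(x)
    push!(times, (time_ns() - t0) / 1e9)  # elapsed seconds
end

println("min:  ", minimum(times), " s")
println("mean: ", sum(times) / length(times), " s")
```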