On Fri, Sep 2, 2016 at 7:41 AM, Mauro <[email protected]> wrote:

> On Fri, 2016-09-02 at 13:34, Jong Wook Kim <[email protected]> wrote:
> > Hi Yichao, what a nice idea :)
> >
> > But even if I write in the C++ way, @time sqrt(1) yields 5 allocations of 176
> > bytes, and in inner loops this could be a bottleneck.
>

That allocation comes from doing things in the global scope. Try
`f() = @time sqrt(1); f()`


>
> Those are just allocations for the return value of sqrt.  Consider:
>
> julia> function f(n)
>        out = 0.0
>        for i=1:n
>        out += sin(n)
>        end
>        out
>        end
> f (generic function with 1 method)
>
> julia> @time f(10) # warmup
>   0.000008 seconds (149 allocations: 10.167 KB)
> -5.440211108893696
>
> julia> @time f(10)
>   0.000005 seconds (5 allocations: 176 bytes)
> -5.440211108893696
>
> julia> @time f(10000)
>   0.000849 seconds (5 allocations: 176 bytes)
> -3056.143888882987
>
>
> > Is this an inevitable overhead of using ccall, or is it just a bogus number
> > that I can ignore?
> >
> > Jong Wook
> >
> >
> >     On Sep 2, 2016, at 7:14 AM, Yichao Yu <[email protected]> wrote:
> >
> >
> >
> >     On Fri, Sep 2, 2016 at 7:03 AM, Jong Wook Kim <[email protected]> wrote:
> >
> >         Hi,
> >
> >         I'm using Julia 0.4.6 on OS X El Capitan, and was trying to normalize
> >         each column of a matrix, so that the norm of each column becomes 1.
> >         Below is an isolated and simplified version of what I'm doing:
> >
> >         function foo1()
> >             local a = rand(1000, 10000)
> >             @time for i in 1:size(a, 2)
> >                 a[:, i] /= norm(a[:, i])
> >             end
> >         end
> >
> >         foo1()
> >         0.165662 seconds (117.44 k allocations: 232.505 MB, 37.08% gc time)
> >
> >         I thought maybe the array copying was the problem, but this didn't
> >         help much:
> >
> >         function foo2()
> >             local a = rand(1000, 10000)
> >             @time for i in 1:size(a, 2)
> >                 a[:, i] /= norm(slice(a, :, i))
> >             end
> >         end
> >
> >         foo2()
> >         0.131377 seconds (98.47 k allocations: 155.921 MB, 36.66% gc time)
> >
> >         and then I figured that this ugly one runs the fastest:
> >
> >         function foo3()
> >             local a = rand(1000, 10000)
> >             @time for i in 1:size(a, 2)
> >                 setindex!(a, norm(slice(a, :, i)), :, i)
> >             end
> >         end
> >
> >         foo3()
> >         0.013814 seconds (49.49 k allocations: 1.365 MB, 4.86% gc time)
> >
> >         I've heard a few times that plain for-loops are faster than
> >         vectorized code in Julia. The loop version below does allocate
> >         slightly less memory, but it's slower than the above.
> >
> >         function foo4()
> >             local a = rand(1000, 10000)
> >             @time @inbounds for i in 1:size(a, 2)
> >                 n = norm(slice(a, :, i))
> >                 @inbounds for j in 1:size(a, 1)
> >                     a[j, i] /= n
> >                 end
> >             end
> >         end
> >
> >         foo4()
> >         0.055522 seconds (30.00 k allocations: 1.068 MB, 15.14% gc time)
> >
> >         Is there a solution that is faster and less ugly than foo3() and
> >         foo4()?
> >
> >         Thinking of an equivalent implementation in C/C++, I should be able to
> >         write this logic without any heap allocation. Is it possible in Julia?
> >
> >
> >     You can write it the way you'd write it in C++ and just not use `norm`.
> >
> >
> >
> >         Thanks,
> >         Jong Wook
>
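
For readers of the archive, here is a minimal sketch of what "write it the way
you'd write it in C++" could look like for the column normalization asked about
in the original post. It is an illustration, not code from the thread: the name
`normalize_columns!` is hypothetical, and the syntax assumes a current Julia 1.x
release rather than the 0.4.6 used above. It accumulates the sum of squares by
hand instead of calling `norm`, so it skips the overflow-safe scaling that
`norm` may do, and a column of all zeros would end up as NaN.

```julia
# Hypothetical helper: scale each column of `a` in place to unit Euclidean norm.
# Uses plain loops only, so no temporary arrays are created and `norm` is not called.
function normalize_columns!(a::Matrix{Float64})
    @inbounds for j in 1:size(a, 2)
        # First pass over column j: accumulate the sum of squares.
        s = 0.0
        for i in 1:size(a, 1)
            s += a[i, j]^2
        end
        # Second pass: multiply by the reciprocal of the norm.
        inv_n = 1 / sqrt(s)
        for i in 1:size(a, 1)
            a[i, j] *= inv_n
        end
    end
    return a
end

a = rand(1000, 10000)
normalize_columns!(a)
```

Both inner loops walk down a column, which matches Julia's column-major storage.
As noted earlier in the thread, timing this from inside a function rather than
at global scope should give a more representative allocation count.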
