The code is really short:

N=2000000

function doit1(N)
    Fe=zeros(3,1);    Ns=zeros(3,1)+1.0;    f=-6.0;    Jac=1.0;
    for i=1:N
        Fe += Ns *  (f * Jac); 
    end
    Fe
end

function doit2(N)
    Fe=zeros(3,1);    Ns=zeros(3,1)+1.0;    f=-6.0;    Jac=1.0;
    for i=1:N
        Fe += Ns .*  ([f] * Jac); 
    end
    Fe
end

function doit3(N)
    Fe=zeros(3,1);    Ns=zeros(3,1)+1.0;    f=-6.0;    Jac=1.0;
    fs=[1.0]
    for i=1:N
        fs= ([f] * Jac );   Fe += Ns .*  fs; 
    end
    Fe
end

function doit4(N)
    Fe=zeros(3,1);    Ns=zeros(3,1)+1.0;    f=-6.0;    Jac=1.0;
    fs=[1.0]
    for i=1:N                   # 
        fs= [f]; fs *= Jac;   Fe += Ns .*  fs; 
    end
    Fe
end
# 
@time doit1(N)
@time doit2(N)
@time doit3(N)
@time doit4(N)

Here are the measurements on my machine:

elapsed time: 0.461173619 seconds (384129944 bytes allocated, 44.45% gc 
time)
elapsed time: 6.905249901 seconds (1904151036 bytes allocated, 12.84% gc 
time)
elapsed time: 6.862259146 seconds (1904166072 bytes allocated, 13.23% gc 
time)
elapsed time: 6.842678542 seconds (1904166256 bytes allocated, 12.81% gc 
time)

I'll be grateful for any pointers as to how to structure the code so that 
the array operations not incur this horrendous penalty.

Petr

On Wednesday, December 10, 2014 7:06:43 PM UTC-8, Tim Holy wrote:
>
> Can you post short but complete code examples in a gist? 
> https://gist.github.com/ 
> That would make it easier to follow than chasing code examples across long 
> email chains. 
>
> --Tim 
>
>
> On Wednesday, December 10, 2014 02:35:38 PM Petr Krysl wrote: 
> > I don't know if this is correct, but here is a guess: 
> > 
> > Option 3 still requires a temp array ( to calculate the result of the 
> > paren fs= ([f] * Jac * w[j]); ), and option 4 eliminates that temp. The 
> > cost of the temp over the 2 million loops is ~200MB and 0.6 sec CPU 
> time. 
> > So WHY is the difference between 1 and 2 so HUUUGE? 
> > 
> > I think this calls for someone who wrote the compiler. Guys? 
> > 
> > Thanks a bunch, 
> > 
> > P 
> > 
> > On Wednesday, December 10, 2014 12:51:26 PM UTC-8, Petr Krysl wrote: 
> > > Actually: option (4) was also tested: 
> > > # 16.333766821 seconds (3008899660 bytes 
> > > fs[1]= f; fs *= (Jac * w[j]); 
> > > 
> > >             Fe += Ns[j] .*  fs; 
> > > 
> > > So, allocation of memory was reduced somewhat, runtime not so much. 
> > > 
> > > On Wednesday, December 10, 2014 12:45:20 PM UTC-8, Petr Krysl wrote: 
> > >> Well,  temporary array was also on my mind.  However,  things are I 
> > >> believe a little bit more complicated. 
> > >> 
> > >> Here is the code with three timed options.  As you can see, the first 
> two 
> > >> options are the fast one (multiplication with a scalar) and the slow 
> one 
> > >> (multiplication with a one by one matrix).   In the third option I 
> tried 
> > >> to 
> > >> avoid the creation of an  ad hoc temporary by allocating a variable 
> > >> outside 
> > >> of the loop.  The effect unfortunately is nil. 
> > >> 
> > >>     fs=[0.0]# Used only for option (3) 
> > >>     # Now loop over all fes in the block 
> > >>     for i=1:size(conns,1) 
> > >>     
> > >>         ... 
> > >>         for j=1:npts 
> > >>         
> > >>            ... 
> > >>           
> > >>           # Option (1): 7.193767019 seconds (1648850568 bytes 
> > >>           # Fe += Ns[j] *  (f * Jac * w[j]); # 
> > >>           
> > >>            # Option (2): 17.301214583 seconds (3244458368 bytes 
> > >>           
> > >>           #    Fe += Ns[j] .*  ([f] * Jac * w[j]); # 
> > >>           
> > >>            # Option (3): 16.943314075 seconds (3232879120 
> > >>           
> > >>           fs= ([f] * Jac * w[j]);   Fe += Ns[j] .*  fs; 
> > >>         
> > >>         end 
> > >>         ... 
> > >>     
> > >>     end 
> > >> 
> > >> What do you think?   Why is the code still getting hit with a big 
> > >> performance/memory penalty? 
> > >> 
> > >> Thanks, 
> > >> 
> > >> Petr 
> > >> 
> > >> On Monday, December 8, 2014 2:03:02 PM UTC-8, Valentin Churavy wrote: 
> > >>> I would think that when f is a 1x1 matrix Julia is allocating a new 
> 1x1 
> > >>> matrix to store the result. If it is a scalar that allocation can be 
> > >>> skipped. When this part of the code is now in a hot loop it might 
> happen 
> > >>> that you allocate millions of very small short-lived objects and 
> that 
> > >>> taxes 
> > >>> the GC quite a lot. 
>
>

Reply via email to