On Tuesday, February 11, 2014 9:18:16 AM UTC-5, Jutho wrote:
>
> So to make a fair comparison to that c implementation, I have to compare 
> the Julia speed (10-15 times BLAS speed) with the C speed (1.3 times BLAS 
> speed) in the first regime, and the Julia speed (100 times BLAS speed) with 
> the C speed (4 to 5 times BLAS speed) in the second regime. Any idea on 
> where the big difference between Julia and C is coming from?
>

I would run your own C benchmark rather than trusting the one on that web 
page. For example, it's not clear which BLAS implementation they are using 
there, and this makes a huge difference. Also, that benchmark was run on a 
fairly old machine, and the gap between optimized BLAS performance and 
naive 3-loop performance has only increased over time. It may not be 
particularly meaningful to compare (your Julia)/(your BLAS) to (their 
C)/(their BLAS).

Also, for small sizes, you may want to replace e.g.

t1=@elapsed mygemm!(1.,A,B,0.,C)


with something like

t1=(@elapsed for i=1:100; mygemm!(1.,A,B,0.,C); end)/100


and similarly for the BLAS benchmark, so that each measurement runs long 
enough for the timer to resolve it accurately.
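
To make the averaging idea concrete, here is a minimal, self-contained sketch in current Julia syntax. Everything in it is an assumption for illustration: the mygemm! below is a hypothetical naive 3-loop stand-in (not the original poster's code), avgtime and compare are helper names invented here, and BLAS.gemm! from the LinearAlgebra standard library is used as the reference. The work is wrapped in functions so that untyped global variables do not distort the timings of small matrices.

```julia
using LinearAlgebra  # provides BLAS.gemm!

# Hypothetical stand-in for the poster's mygemm!: naive loops computing
# C = alpha*A*B + beta*C in place.
function mygemm!(alpha, A, B, beta, C)
    m, n = size(C)
    k = size(A, 2)
    for j in 1:n, i in 1:m
        s = zero(eltype(C))
        for l in 1:k
            s += A[i, l] * B[l, j]
        end
        C[i, j] = alpha * s + beta * C[i, j]
    end
    return C
end

# Average the elapsed time of f() over nrep calls. One warm-up call is made
# first so that JIT compilation is not included in the measurement.
function avgtime(f, nrep)
    f()
    t = @elapsed for i in 1:nrep
        f()
    end
    return t / nrep
end

# Time both versions on n-by-n matrices; returns (naive time, BLAS time)
# in seconds per call.
function compare(n, nrep)
    A = rand(n, n); B = rand(n, n)
    C = zeros(n, n); D = zeros(n, n)
    t1 = avgtime(() -> mygemm!(1.0, A, B, 0.0, C), nrep)
    t2 = avgtime(() -> BLAS.gemm!('N', 'N', 1.0, A, B, 0.0, D), nrep)
    return t1, t2
end

t1, t2 = compare(10, 100)
println("naive: ", t1, " s   BLAS: ", t2, " s   ratio: ", t1 / t2)
```

The repetition count of 100 is taken from the suggestion above; for larger matrices a smaller count is enough, since a single call already takes long enough to time reliably.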
