I can't reproduce this at the REPL or if I put your loop in a function. (I can 
replicate your result---in my case a factor of 3---if I run your scripts.) But 
since you're also measuring JIT-compiling time, I'm not sure how seriously to 
take that (and it's kinda irrelevant anyway, since you never, ever do anything 
performance sensitive outside a function).

Check out the performance tips page:
http://docs.julialang.org/en/stable/manual/performance-tips/

--Tim

On Monday, January 18, 2016 10:17:26 AM Franco Venturi wrote:
> I wrote the Julia scripts below that just test the performance of the
> matrix multiplication in four cases (A' stands for A transpose, and B'
> stands for B transpose):
> 
>    1. A x B
>    2. A' x B
>    3. A x B'
>    4. A' x B'
> 
> I ran these scripts multiplying two 1,000 x 1,000 matrices 20 times, i.e.:
> 
> 
>    1. axb.jl -n 1000 -k 1000 -m 1000 -t 20
>    2. atxb.jl -n 1000 -k 1000 -m 1000 -t 20
>    3. axbt.jl -n 1000 -k 1000 -m 1000 -t 20
>    4. atxbt.jl -n 1000 -k 1000 -m 1000 -t 20
> 
> Cases 1, 2, 3 run in about the same time with a performance of about 70
> GFLOPS on my desktop, however case 4 is about 10 times slower.
> I run Linux Fedora 23, julia version 0.4.3, and openblas version 0.2.15.
> I also compiled Julia directly from the latest version on GitHub, used
> Intel MKL, and the results are similar (the absolute times are different,
> but the A' x B' case is still about 10 times slower than the other three
> cases).
> 
> Since Julia uses openblas/MKL function 'dgemm' (or 'sgemm' in the single
> precision case), I also wrote the same examples in C directly invoking
> openblas/MKL 'cblas_dgemm' function and there's no difference in these four
> cases using the native calls (the arguments for the function 'cblas_dgemm'
> have transpose flags for both A and B).
> 
> I am not sure why Julia is so much slower multiplying A' x B', so I thought
> I would bring it to your attention.
> 
> Regards,
> Franco Venturi
> 
> #!/bin/julia -q
> 
> using ArgParse
> 
> s = ArgParseSettings()
> 
> @add_arg_table s begin
>     "-m"
>     "-k"
>     "-n"
>     "-t"
> end
> 
> parsed_args = parse_args(s)
> m = parse(Int, parsed_args["m"])
> k = parse(Int, parsed_args["k"])
> n = parse(Int, parsed_args["n"])
> t = parse(Int, parsed_args["t"])
> 
> a = 2 * rand(m, k) - 1
> b = 2 * rand(k, n) - 1
> 
> tstart = time()
> for i = 1:t
>     c = a * b
> end
> tend = time()
> duration = (tend - tstart) / t
> mflops = (2 * m * n * k) / duration * 1.0e-6
> @printf("%dx%dx%d\t%f s\t%f MFLOPS\n", m, n, k, duration, mflops)
> 
> 
> #!/bin/julia -q
> 
> using ArgParse
> 
> s = ArgParseSettings()
> 
> @add_arg_table s begin
>     "-m"
>     "-k"
>     "-n"
>     "-t"
> end
> 
> parsed_args = parse_args(s)
> m = parse(Int, parsed_args["m"])
> k = parse(Int, parsed_args["k"])
> n = parse(Int, parsed_args["n"])
> t = parse(Int, parsed_args["t"])
> 
> a = (2 * rand(m, k) - 1)'
> b = 2 * rand(k, n) - 1
> 
> tstart = time()
> for i = 1:t
>     c = a' * b
> end
> tend = time()
> duration = (tend - tstart) / t
> mflops = (2 * m * n * k) / duration * 1.0e-6
> @printf("%dx%dx%d\t%f s\t%f MFLOPS\n", m, n, k, duration, mflops)
> 
> 
> 
> #!/bin/julia -q
> 
> using ArgParse
> 
> s = ArgParseSettings()
> 
> @add_arg_table s begin
>     "-m"
>     "-k"
>     "-n"
>     "-t"
> end
> 
> parsed_args = parse_args(s)
> m = parse(Int, parsed_args["m"])
> k = parse(Int, parsed_args["k"])
> n = parse(Int, parsed_args["n"])
> t = parse(Int, parsed_args["t"])
> 
> a = 2 * rand(m, k) - 1
> b = (2 * rand(k, n) - 1)'
> 
> tstart = time()
> for i = 1:t
>     c = a * b'
> end
> tend = time()
> duration = (tend - tstart) / t
> mflops = (2 * m * n * k) / duration * 1.0e-6
> @printf("%dx%dx%d\t%f s\t%f MFLOPS\n", m, n, k, duration, mflops)
> 
> 
> 
> #!/bin/julia -q
> 
> using ArgParse
> 
> s = ArgParseSettings()
> 
> @add_arg_table s begin
>     "-m"
>     "-k"
>     "-n"
>     "-t"
> end
> 
> parsed_args = parse_args(s)
> m = parse(Int, parsed_args["m"])
> k = parse(Int, parsed_args["k"])
> n = parse(Int, parsed_args["n"])
> t = parse(Int, parsed_args["t"])
> 
> a = (2 * rand(m, k) - 1)'
> b = (2 * rand(k, n) - 1)'
> 
> tstart = time()
> for i = 1:t
>     c = a' * b'
> end
> tend = time()
> duration = (tend - tstart) / t
> mflops = (2 * m * n * k) / duration * 1.0e-6
> @printf("%dx%dx%d\t%f s\t%f MFLOPS\n", m, n, k, duration, mflops)

Reply via email to