There do appear to be a lot of non-const globals in that code. I'm not sure whether they are used in the performance-critical sections of the benchmarks, but at least some are present.
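To illustrate why non-const globals matter, here is a minimal sketch. All names in it (`scalar`, `SCALAR`, `scale_global!`, `scale_arg!`) are hypothetical and not taken from streamp.jl; the assumption is only that a kernel reads a global inside its loop. Because a non-const global can be rebound to a value of another type at any time, Julia generally cannot specialize code that reads it, whereas a `const` global or a function argument has a type the compiler knows.

```julia
# Hypothetical example; none of these names come from streamp.jl.

scalar = 3.0          # non-const global: may be rebound to any type later
const SCALAR = 3.0    # const global: its type is fixed and known to the compiler

# Pattern to avoid: the loop reads a non-const global on every iteration.
function scale_global!(b, c)
    for i in eachindex(b, c)
        b[i] = scalar * c[i]
    end
    return b
end

# Preferred pattern: the scalar is passed in as an argument,
# so the function is compiled for its concrete type.
function scale_arg!(b, c, s)
    for i in eachindex(b, c)
        b[i] = s * c[i]
    end
    return b
end

c  = ones(1_000)
b1 = similar(c)
b2 = similar(c)

scale_global!(b1, c)        # same result, but typically much slower per element
scale_arg!(b2, c, SCALAR)   # same result, type-stable inner loop
```

Both functions compute the same thing; wrapping each call in `@time` (after a first warm-up call to exclude compilation) should show whether the global access is what costs the time. If the benchmark kernels follow the first pattern, declaring the globals `const` or passing them as arguments would be the first thing to try.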
On Sat, Nov 1, 2014 at 9:38 AM, Tim Holy <[email protected]> wrote:

> Your code is long enough that I, for one, don't have time to dig into it
> myself. But as a guideline, Julia should not be massively slower than C,
> particularly on what seem (upon casual inspection) like very straightforward
> benchmarks.
>
> Have you read the "Performance tips" section of the manual and used the tools
> there to investigate it yourself?
>
> http://docs.julialang.org/en/latest/manual/performance-tips/
>
> --Tim
>
> On Friday, October 31, 2014 11:16:44 AM Kapil Agarwal wrote:
> > Hi
> >
> > This is my first experiment with Julia and I wanted to share some results.
> > I have ported the STREAM benchmark (http://www.cs.virginia.edu/stream/) to
> > Julia. The code is available on github
> > (https://github.com/kapiliitr/JuliaBenchmarks/blob/master/streamp.jl).
> >
> > I am getting the following performance results in Julia -
> >
> > Array size = 5000000 (elements), Offset = 0 (elements)
> > Memory per array = 38.14697265625 MiB (= 0.03725290298461914 GiB)
> > Total memory required = 114.44091796875 MiB (= 0.11175870895385742 GiB)
> > Function    Best Rate MB/s  Avg time   Min time   Max time
> > Copy:                 43.0  1.885108   1.861376   1.908840
> > Scale:                37.1  2.166505   2.155083   2.177926
> > Add:                  48.2  2.532873   2.487158   2.578587
> > Triad:                43.1  2.787225   2.784426   2.790023
> >
> > I am getting the following performance results in C -
> >
> > Array size = 5000000 (elements), Offset = 0 (elements)
> > Memory per array = 38.1 MiB (= 0.0 GiB).
> > Total memory required = 114.4 MiB (= 0.1 GiB).
> > Each kernel will be executed 3 times.
> > Function    Best Rate MB/s  Avg time   Min time   Max time
> > Copy:               8553.3  0.009360   0.009353   0.009366
> > Scale:              8248.4  0.009712   0.009699   0.009726
> > Add:                9490.6  0.012987   0.012644   0.013329
> > Triad:              9032.0  0.013540   0.013286   0.013793
> >
> > Following are the results with 4 processors in Julia-
> >
> > Function    Best Rate MB/s  Avg time   Min time   Max time
> > Copy:              11122.2  0.007308   0.007193   0.007423
> > Scale:               465.5  0.217924   0.171840   0.264008
> > Add:               12481.8  0.009678   0.009614   0.009742
> > Triad:               471.3  0.267199   0.254624   0.279775
> >
> > Following are the results with 4 omp threads in C-
> >
> > Function    Best Rate MB/s  Avg time   Min time   Max time
> > Copy:              11077.0  0.007228   0.007222   0.007233
> > Scale:             10552.7  0.007587   0.007581   0.007594
> > Add:               11986.9  0.010023   0.010011   0.010036
> > Triad:             12173.0  0.009865   0.009858   0.009872
> >
> > As it can be seen that with one thread/process, performance of Julia is
> > much less than C for all the functions. However, for multi-process runs,
> > Julia performs similar to C for Copy and Add functions but it's performance
> > hits for Scale and Triad functions.
> >
> > What could be the reason behind this ? Could this be a problem in my
> > implementation or is this just the way Julia is implemented ?
> >
> > Thanks
> >
> > --
> > Kapil
