I need those variables to be globals; otherwise I would have to pass them from one function to another all the time.
I tried passing the variables as function arguments, which did lead to an increase in performance. Is passing them as arguments each time the only alternative to declaring them as globals? That is very cumbersome.

For multiple processes, I am in fact using DArrays. I am unaware of how IPC takes place in Julia; is there an efficient way to do it?

Regards,
Kapil Agarwal

On Sat, Nov 1, 2014 at 11:42 AM, Stefan Karpinski wrote:

> There do look to be a lot of non-const globals in that code. Not sure if
> they are used in performance critical sections of the benchmarks, but there
> are some.
>
> On Sat, Nov 1, 2014 at 9:38 AM, Tim Holy wrote:
>
>> Your code is long enough that I, for one, don't have time to dig into it
>> myself. But as a guideline, Julia should not be massively slower than C,
>> particularly on what seem (upon casual inspection) like very
>> straightforward benchmarks.
>>
>> Have you read the "Performance tips" section of the manual and used the
>> tools there to investigate it yourself?
>>
>> http://docs.julialang.org/en/latest/manual/performance-tips/
>>
>> --Tim
>>
>> On Friday, October 31, 2014 11:16:44 AM Kapil Agarwal wrote:
>> > Hi
>> >
>> > This is my first experiment with Julia and I wanted to share some
>> > results. I have ported the STREAM benchmark
>> > (http://www.cs.virginia.edu/stream/) to Julia. The code is available
>> > on github
>> > (https://github.com/kapiliitr/JuliaBenchmarks/blob/master/streamp.jl).
>> >
>> > I am getting the following performance results in Julia -
>> >
>> > Array size = 5000000 (elements), Offset = 0 (elements)
>> > Memory per array = 38.14697265625 MiB (= 0.03725290298461914 GiB)
>> > Total memory required = 114.44091796875 MiB (= 0.11175870895385742 GiB)
>> > Function    Best Rate MB/s  Avg time  Min time  Max time
>> > Copy:                 43.0  1.885108  1.861376  1.908840
>> > Scale:                37.1  2.166505  2.155083  2.177926
>> > Add:                  48.2  2.532873  2.487158  2.578587
>> > Triad:                43.1  2.787225  2.784426  2.790023
>> >
>> > I am getting the following performance results in C -
>> >
>> > Array size = 5000000 (elements), Offset = 0 (elements)
>> > Memory per array = 38.1 MiB (= 0.0 GiB).
>> > Total memory required = 114.4 MiB (= 0.1 GiB).
>> > Each kernel will be executed 3 times.
>> > Function    Best Rate MB/s  Avg time  Min time  Max time
>> > Copy:               8553.3  0.009360  0.009353  0.009366
>> > Scale:              8248.4  0.009712  0.009699  0.009726
>> > Add:                9490.6  0.012987  0.012644  0.013329
>> > Triad:              9032.0  0.013540  0.013286  0.013793
>> >
>> > Following are the results with 4 processors in Julia -
>> >
>> > Function    Best Rate MB/s  Avg time  Min time  Max time
>> > Copy:              11122.2  0.007308  0.007193  0.007423
>> > Scale:               465.5  0.217924  0.171840  0.264008
>> > Add:               12481.8  0.009678  0.009614  0.009742
>> > Triad:               471.3  0.267199  0.254624  0.279775
>> >
>> > Following are the results with 4 omp threads in C -
>> >
>> > Function    Best Rate MB/s  Avg time  Min time  Max time
>> > Copy:              11077.0  0.007228  0.007222  0.007233
>> > Scale:             10552.7  0.007587  0.007581  0.007594
>> > Add:               11986.9  0.010023  0.010011  0.010036
>> > Triad:             12173.0  0.009865  0.009858  0.009872
>> >
>> > As can be seen, with one thread/process the performance of Julia is
>> > much lower than that of C for all the functions. For multi-process
>> > runs, however, Julia performs similarly to C for the Copy and Add
>> > functions, but its performance takes a hit for the Scale and Triad
>> > functions.
>> >
>> > What could be the reason behind this? Could this be a problem in my
>> > implementation or is this just the way Julia is implemented?
>> >
>> > Thanks
>> >
>> > --
>> > Kapil
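
On the question of whether passing every variable as an argument is the only alternative to non-const globals, here is a minimal sketch of two common options. It is not taken from streamp.jl: the names `SCALAR`, `scale_const!`, `StreamState` and `triad!` are all hypothetical, and the syntax assumes a current Julia (1.x) release rather than the version in use when this thread was written, where composite types were declared with a different keyword. The first option marks a global as `const`; the second bundles the arrays and parameters into one object so that only a single argument has to be passed around.

```julia
# Hypothetical names throughout; none of this comes from streamp.jl.

# Option 1: a const global. Its type cannot change, so functions that
# read it can be compiled to specialized code.
const SCALAR = 3.0

function scale_const!(b, c)
    for i in eachindex(b, c)
        b[i] = SCALAR * c[i]
    end
    return b
end

# Option 2: bundle the working arrays and parameters into one object and
# pass that single argument instead of many separate ones.
struct StreamState
    a::Vector{Float64}
    b::Vector{Float64}
    c::Vector{Float64}
    scalar::Float64
end

# Convenience constructor (an assumption for this sketch): n elements per
# array, with a zeroed, b filled with 2.0 and c filled with 0.5.
StreamState(n::Integer, scalar::Float64) =
    StreamState(zeros(n), fill(2.0, n), fill(0.5, n), scalar)

# Triad kernel: a = b + scalar * c, reading everything from the state.
function triad!(s::StreamState)
    a, b, c, q = s.a, s.b, s.c, s.scalar
    @inbounds for i in eachindex(a, b, c)
        a[i] = b[i] + q * c[i]
    end
    return s
end

s = StreamState(10, 3.0)
triad!(s)
```

With the state object, each kernel takes one argument and the field types are concrete, so this may recover most of the benefit seen from passing arguments without the long argument lists. Whether it also helps the multi-process Scale and Triad numbers is not something this sketch shows.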
