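There do seem to be a lot of non-const globals in that code. I'm not sure
whether they are used in performance-critical sections of the benchmarks, but
there are some. That matters because Julia cannot infer the type of a
non-const global, so code that reads one cannot be compiled to specialized
machine code.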

On Sat, Nov 1, 2014 at 9:38 AM, Tim Holy <[email protected]> wrote:

> Your code is long enough that I, for one, don't have time to dig into it
> myself. But as a guideline, Julia should not be massively slower than C,
> particularly on what seem (upon casual inspection) like very straightforward
> benchmarks.
>
> Have you read the "Performance tips" section of the manual and used the tools
> there to investigate it yourself?
>
> http://docs.julialang.org/en/latest/manual/performance-tips/
>
> --Tim
>
> On Friday, October 31, 2014 11:16:44 AM Kapil Agarwal wrote:
> > Hi
> >
> > This is my first experiment with Julia and I wanted to share some results.
> > I have ported the STREAM benchmark (http://www.cs.virginia.edu/stream/) to
> > Julia. The code is available on github
> > (https://github.com/kapiliitr/JuliaBenchmarks/blob/master/streamp.jl).
> >
> > I am getting the following performance results in Julia with one process:
> >
> > Array size = 5000000 (elements), Offset = 0 (elements)
> > Memory per array = 38.14697265625 MiB (= 0.03725290298461914 GiB)
> > Total memory required = 114.44091796875 MiB (= 0.11175870895385742 GiB)
> > Function    Best Rate MB/s  Avg time     Min time     Max time
> > Copy:              43.0     1.885108     1.861376     1.908840
> > Scale:             37.1     2.166505     2.155083     2.177926
> > Add:               48.2     2.532873     2.487158     2.578587
> > Triad:             43.1     2.787225     2.784426     2.790023
> >
> > I am getting the following performance results in C with one thread:
> >
> > Array size = 5000000 (elements), Offset = 0 (elements)
> > Memory per array = 38.1 MiB (= 0.0 GiB).
> > Total memory required = 114.4 MiB (= 0.1 GiB).
> > Each kernel will be executed 3 times.
> > Function    Best Rate MB/s  Avg time     Min time     Max time
> > Copy:            8553.3     0.009360     0.009353     0.009366
> > Scale:           8248.4     0.009712     0.009699     0.009726
> > Add:             9490.6     0.012987     0.012644     0.013329
> > Triad:           9032.0     0.013540     0.013286     0.013793
> >
> >
> > Following are the results with 4 processes in Julia:
> >
> > Function    Best Rate MB/s  Avg time     Min time     Max time
> > Copy:           11122.2     0.007308     0.007193     0.007423
> > Scale:            465.5     0.217924     0.171840     0.264008
> > Add:            12481.8     0.009678     0.009614     0.009742
> > Triad:            471.3     0.267199     0.254624     0.279775
> >
> >
> > Following are the results with 4 OpenMP threads in C:
> >
> > Function    Best Rate MB/s  Avg time     Min time     Max time
> > Copy:           11077.0     0.007228     0.007222     0.007233
> > Scale:          10552.7     0.007587     0.007581     0.007594
> > Add:            11986.9     0.010023     0.010011     0.010036
> > Triad:          12173.0     0.009865     0.009858     0.009872
> >
> > As can be seen, with one thread/process Julia is much slower than C for all
> > four functions (roughly 37-48 MB/s versus 8,200-9,500 MB/s). However, in
> > the multi-process runs Julia performs similarly to C for the Copy and Add
> > functions, but its performance drops sharply for Scale and Triad (about
> > 470 MB/s versus 10,500-12,200 MB/s in C).
> >
> > What could be the reason behind this? Could it be a problem in my
> > implementation, or is this just the way Julia is implemented?
> >
> > Thanks
> >
> > --
> > Kapil
>
>
