Any thoughts on why I see such a large slowdown with qthreads on HPL and LULESH?
HPL run time (seconds):
#nodes           2       4       8      16      32
qthreads     14.65    20.4   27.93   42.00  108.79
fifo         13.67   18.26    17.7    19.3   30.48
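For reference, the per-scale slowdown implied by the HPL table (qthreads time divided by fifo time) can be computed with a few lines of Python. This is only a sketch over the numbers above; it assumes nothing beyond them.

```python
# HPL run times (seconds) copied from the table above.
nodes = [2, 4, 8, 16, 32]
qthreads = [14.65, 20.4, 27.93, 42.00, 108.79]
fifo = [13.67, 18.26, 17.7, 19.3, 30.48]

# Slowdown = qthreads time / fifo time; a value > 1 means qthreads is slower.
slowdown = {n: q / f for n, q, f in zip(nodes, qthreads, fifo)}

for n, s in slowdown.items():
    print(f"{n:>2} nodes: qthreads takes {s:.2f}x as long as fifo")
```

With these numbers the ratio is about 1.07x at 2 nodes and grows to about 3.57x at 32 nodes.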
thanks

On Tue, Aug 15, 2017 at 10:39 PM, Hui Zhang <[email protected]>
wrote:

> Well, good to know :)
>
> No difference between the two builds (I made clean builds in two separate
> copies of Chapel 1.15, the only difference being CHPL_TASKS).
>
> On Tue, Aug 15, 2017 at 6:35 PM, Greg Titus <[email protected]> wrote:
>
>> Hello Hui --
>>
>> I think you already are risking correctness for performance.  :-)
>>
>> One of the side effects of throwing '--fast' at compile time is to
>> disable guard pages by default at execution time.  Plus, while fifo and
>> qthreads tasking don’t behave identically with respect to guard pages, they
>> do both support them and just a difference in guard page setting wouldn’t
>> cause a huge difference in performance between the two unless the app or
>> benchmark did a lot of task creation.  So, I don’t think that’s it.
>>
>> A further question, then: were debug and optimize settings the same for
>> both your runtime builds (tasks=fifo and =qthreads), and for the
>> third-party package builds for qthreads and hwloc?
>>
>> greg
>>
>>
>> > On Aug 15, 2017, at 3:27 PM, Hui Zhang <[email protected]> wrote:
>> >
>> > Thanks, Aji,
>> >
>> > I've verified that qthreads was configured with "--enable-guard-pages" by
>> > default when I built Chapel. But is that necessary for Chapel? Any thoughts
>> > from the Chapel team? I don't want to risk correctness for performance.
>> > Thanks
>> >
>> > On Tue, Aug 15, 2017 at 4:50 PM, Aji, Ashwin <[email protected]> wrote:
>> > My 2 cents, from measuring qthreads performance in Chapel before: if you
>> > configured qthreads with “--enable-guard-pages”, performance will be much
>> > slower than without guard pages. It may be worthwhile to check how you
>> > configured qthreads.
>> >
>> >
>> >
>> > Regards,
>> >
>> > Ashwin
>> >
>> >
>> >
>> > From: Hui Zhang [mailto:[email protected]]
>> > Sent: Tuesday, August 15, 2017 11:50 AM
>> > To: Greg Titus <[email protected]>
>> > Cc: Chapel Sourceforge Developers List <[email protected]>
>> > Subject: Re: [Chapel-developers] qthreads performance
>> >
>> >
>> >
>> > Hi, Greg
>> >
>> >
>> >
>> > On Tue, Aug 15, 2017 at 1:35 PM, Greg Titus <[email protected]> wrote:
>> >
>> > Hello Hui --
>> >
>> > Generally CHPL_TASKS=qthreads outperforms CHPL_TASKS=fifo at all but
>> > the smallest scales. We would need to know a lot more to come to any
>> > worthwhile conclusions. What is the output of `printchplenv --anonymize`
>> > for your configurations (I assume they differ only in terms of the
>> > CHPL_TASKS setting)?
>> >
>> > CHPL_TARGET_PLATFORM: linux64
>> > CHPL_TARGET_COMPILER: gnu
>> > CHPL_TARGET_ARCH: native *
>> > CHPL_LOCALE_MODEL: flat
>> > CHPL_COMM: gasnet *
>> >   CHPL_COMM_SUBSTRATE: ibv *
>> >   CHPL_GASNET_SEGMENT: large
>> > CHPL_TASKS: qthreads
>> > CHPL_LAUNCHER: gasnetrun_ibv *
>> > CHPL_TIMERS: generic
>> > CHPL_UNWIND: none
>> > CHPL_MEM: jemalloc
>> > CHPL_MAKE: gmake
>> > CHPL_ATOMICS: intrinsics
>> >   CHPL_NETWORK_ATOMICS: none
>> > CHPL_GMP: gmp
>> > CHPL_HWLOC: hwloc
>> > CHPL_REGEXP: re2
>> > CHPL_WIDE_POINTERS: struct
>> > CHPL_AUX_FILESYS: none
>> >
>> > Yes, the only difference is CHPL_TASKS.
>> >
>> >
>> >
>> > Are you using any compilation options other than ‘--fast’? What
>> > execution options are you using?
>> >
>> > For hpl: --n=500 --printArray=false --printStats=true --useRandomSeed=false -nl *
>> >
>> > For lulesh: --filename=lmeshes/sedov15oct.lmesh -nl *
>> >
>> > For isx: --mode=weakISO --n=5592400 --numTrials=10 -nl *
>> >
>> > Are you setting any execution-time environment variables (CHPL_RT_*),
>> > and if so, to what values?
>> >
>> > No.
>> >
>> >
>> >
>> > And finally, what is the target architecture (number of nodes, number
>> > of CPU cores per node, etc.)?
>> >
>> > I use 2/4/8/16/32 nodes, each with 20 physical cores.
>> >
>> >
>> >
>> >
>> > thanks,
>> > greg
>> >
>> >
>> >
>> > > On Aug 15, 2017, at 9:59 AM, Hui Zhang <[email protected]> wrote:
>> > >
>> > > Hello,
>> > >
>> > > I did some performance comparisons between qthreads and fifo with 3
>> > > benchmarks: lulesh, hpl, and isx. I expected qthreads to outperform fifo in
>> > > all cases, but the results turned out to be surprising.
>> > > For lulesh and hpl, in all tests (#nodes from 2 to 32), qthreads is
>> > > much slower (taking 1.5-10x as long as fifo). For isx, qthreads
>> > > beats fifo with a speedup of 1.5-2x.
>> > >
>> > > All benchmarks were compiled with --fast using Chapel 1.15. Is what
>> > > I'm getting here reasonable? Are there any previous performance
>> > > comparisons between fifo and qthreads on these benchmarks?
>> > >
>> > > Thanks
>> > >
>> > > --
>> > > Best regards
>> > >
>> > >
>> > > Hui Zhang
>> >
>> > > ------------------------------------------------------------------------------
>> > > Check out the vibrant tech community on one of the world's most
>> > > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> > > _______________________________________________
>> > > Chapel-developers mailing list
>> > > [email protected]
>> > > https://lists.sourceforge.net/lists/listinfo/chapel-developers
>> >
>> >
>> >
>> >
>> >
>> >
>> > --
>> >
>> > Best regards
>> >
>> >
>> > Hui Zhang
>> >
>> >
>> >
>> >
>> > --
>> > Best regards
>> >
>> >
>> > Hui Zhang
>>
>>
>
>
> --
> Best regards
>
>
> Hui Zhang
>



-- 
Best regards


Hui Zhang