Hello Hui -- Sorry, I’d been busy with some other things and then was away last week. I’ll try to reproduce this and see what I can figure out this week.
greg

> On Aug 21, 2017, at 10:05 AM, Hui Zhang <[email protected]> wrote:
>
> Any thoughts on why I got so much slow down using qthreads on HPL & LULESH?
> HPL (seconds):
> #nodes     2      4      8      16     32
> qthreads   14.65  20.4   27.93  42.00  108.79
> fifo       13.67  18.26  17.7   19.3   30.48
>
> thanks
>
> On Tue, Aug 15, 2017 at 10:39 PM, Hui Zhang <[email protected]> wrote:
> Well, good to know :)
>
> No difference for two builds (I made clean builds in two separate
> copy of Chapel 1.15 with the only difference being the CHPL_TASKS)
>
> On Tue, Aug 15, 2017 at 6:35 PM, Greg Titus <[email protected]> wrote:
> Hello Hui --
>
> I think you already are risking correctness for performance. :-)
>
> One of the side effects of throwing '--fast' at compile time is to disable
> guard pages by default at execution time. Plus, while fifo and qthreads
> tasking don’t behave identically with respect to guard pages, they do both
> support them and just a difference in guard page setting wouldn’t cause a
> huge difference in performance between the two unless the app or benchmark
> did a lot of task creation. So, I don’t think that’s it.
>
> A further question, then: were debug and optimize settings the same for both
> your runtime builds (tasks=fifo and =qthreads), and for the third-party
> package builds for qthreads and hwloc?
>
> greg
>
> >
> > On Aug 15, 2017, at 3:27 PM, Hui Zhang <[email protected]> wrote:
> >
> > Thanks, Aji,
> >
> > I've verified that it was configured with "--enable-guard-pages" by default
> > when I built Chapel. But is that necessary for Chapel? Any thoughts from
> > Chapel? I don't want to take the risk of correctness for performance.
> > Thanks
> >
> > On Tue, Aug 15, 2017 at 4:50 PM, Aji, Ashwin <[email protected]> wrote:
> > These are my 2 cents on measuring qthreads performance before in Chapel. If
> > you configured qthreads with “--enable-guard-pages”, then the performance
> > will be much slower than without enabling guard pages.
> > It may be worthwhile to see how you have configured qthreads.
> >
> > Regards,
> >
> > Ashwin
> >
> > From: Hui Zhang [mailto:[email protected]]
> > Sent: Tuesday, August 15, 2017 11:50 AM
> > To: Greg Titus <[email protected]>
> > Cc: Chapel Sourceforge Developers List
> > <[email protected]>
> > Subject: Re: [Chapel-developers] qthreads performance
> >
> > Hi, Greg
> >
> > On Tue, Aug 15, 2017 at 1:35 PM, Greg Titus <[email protected]> wrote:
> >
> > Hello Hui --
> >
> > Generally CHPL_TASKS=qthreads outperforms CHPL_TASKS=fifo at all but the
> > smallest scales. We would need to know a lot more to come to any
> > worthwhile conclusions. What is the output of `printchplenv --anonymize`
> > for your configurations (I assume they differ only in terms of the
> > CHPL_TASKS setting)?
> >
> > CHPL_TARGET_PLATFORM: linux64
> > CHPL_TARGET_COMPILER: gnu
> > CHPL_TARGET_ARCH: native *
> > CHPL_LOCALE_MODEL: flat
> > CHPL_COMM: gasnet *
> > CHPL_COMM_SUBSTRATE: ibv *
> > CHPL_GASNET_SEGMENT: large
> > CHPL_TASKS: qthreads
> > CHPL_LAUNCHER: gasnetrun_ibv *
> > CHPL_TIMERS: generic
> > CHPL_UNWIND: none
> > CHPL_MEM: jemalloc
> > CHPL_MAKE: gmake
> > CHPL_ATOMICS: intrinsics
> > CHPL_NETWORK_ATOMICS: none
> > CHPL_GMP: gmp
> > CHPL_HWLOC: hwloc
> > CHPL_REGEXP: re2
> > CHPL_WIDE_POINTERS: struct
> > CHPL_AUX_FILESYS: none
> >
> > Yes, the only difference is CHPL_TASKS.
> >
> > Are you using any compilation options other than ‘--fast’? What execution
> > options are you using?
> >
> > For hpl: --n=500 --printArray=false --printStacts=true
> > --useRandomSeed=false -nl *
> >
> > For lulesh: --filename=lmeshes/sedov15oct.lmesh -nl *
> >
> > For isx: --nide-weakISO --n=5592400 --numTrials=10 -nl *
> >
> > Are you setting any execution-time environment variables (CHPL_RT_*) and if
> > so, to what values?
> >
> > NO
> >
> > And finally, what is the target architecture (number of nodes, number of
> > CPU cores per node, etc.)?
> >
> > I use 2/4/8/16/32 nodes, each has 20 physical cores
> >
> > thanks,
> > greg
> >
> > > On Aug 15, 2017, at 9:59 AM, Hui Zhang <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > I did some performance comparison between qthreads and fifo with 3
> > > benchmarks: lulesh, hpl, and isx. I expected qthreads to outperform fifo
> > > in all cases, but the result turns out to be surprising.
> > > For lulesh and hpl, in all tests (#nodes from 2 to 32), qthreads is much
> > > slower (took 1.5~10x longer than that of fifo). For isx, qthreads beats
> > > fifo with speedup of 1.5~2x.
> > >
> > > All benchmarks compiled with --fast and I'm using 1.15. So is what I'm
> > > getting here reasonable? Any previous performance comparison between fifo
> > > and qthreads on those benchmarks?
> > >
> > > Thanks
> > >
> > > --
> > > Best regards
> > >
> > > Hui Zhang
> > >
> > > ------------------------------------------------------------------------------
> > > Check out the vibrant tech community on one of the world's most
> > > engaging tech sites, Slashdot.org!
> > > http://sdm.link/slashdot
> > > _______________________________________________
> > > Chapel-developers mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/chapel-developers
> >
> > --
> > Best regards
> >
> > Hui Zhang
> >
> > --
> > Best regards
> >
> > Hui Zhang
>
> --
> Best regards
>
> Hui Zhang
>
> --
> Best regards
>
> Hui Zhang

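As a reading aid for the HPL timings quoted in this thread, here is a minimal Python sketch (not something posted by anyone in the thread) that computes the qthreads/fifo time ratio at each node count. The timing values are copied from Hui's table; the names `slowdown` and `ratios` are made up for illustration.

```python
# HPL wall-clock times in seconds, copied from the table quoted in this thread.
nodes = [2, 4, 8, 16, 32]
qthreads = [14.65, 20.4, 27.93, 42.00, 108.79]
fifo = [13.67, 18.26, 17.7, 19.3, 30.48]


def slowdown(slow, fast):
    """Return the element-wise ratio slow[i] / fast[i] (hypothetical helper)."""
    return [s / f for s, f in zip(slow, fast)]


# Map each node count to its qthreads/fifo ratio; values above 1.0 mean
# the qthreads run took longer than the fifo run at that node count.
ratios = dict(zip(nodes, slowdown(qthreads, fifo)))

for n, r in ratios.items():
    print(f"{n:>2} nodes: qthreads/fifo = {r:.2f}x")
```

The ratio rises with every step in node count, which is the trend the question is about. The sketch only restates the reported numbers and says nothing about the cause.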