Hello, Elliot and Greg,

I'm having trouble running the isx benchmark on more than 12 nodes. When I
use 16 nodes, it times out. It's not the same number-of-threads problem I had
with lulesh: I tried setting CHPL_RT_NUM_THREADS_PER_LOCALE to values from 2
to 255, but it still timed out. Is there any way to run isx on more nodes?

Thanks

On Tue, Aug 1, 2017 at 2:37 PM, Elliot Ronaghan <erona...@cray.com> wrote:

> [re-adding Greg]
>
> I'm not sure, that seems odd. That said, fifo is our portable tasking
> layer, not our performance-oriented one, so I don't think we'll have time
> to investigate.
>
> Elliot
>
> -----Original Message-----
> From: Hui Zhang <wayne.huizh...@gmail.com>
> Date: Tuesday, August 1, 2017 at 12:40 PM
> To: Elliot Ronaghan <erona...@cray.com>
> Subject: Re: [Chapel-developers] Too many simultaneous local client threads
>
> However, when I set it to 200 on 32 nodes, the run takes 70 seconds
> instead of the 28 seconds I got with 4. Why is that?
>
> On Aug 1, 2017 12:29 PM, "Elliot Ronaghan" <erona...@cray.com> wrote:
>
> Under fifo, ideally you don't want to set CHPL_RT_NUM_THREADS_PER_LOCALE.
> You want fifo to be able to create as many pthreads as it wants. If you have
> to set it, set it as high as you can; otherwise you're limiting the amount
> of parallelism on a node. It effectively limits the number of pthreads that
> are created.
>
>
> -----Original Message-----
> From: Hui Zhang <wayne.huizh...@gmail.com>
> Date: Tuesday, August 1, 2017 at 12:24 PM
> To: Elliot Ronaghan <erona...@cray.com>
> Cc: Greg Titus <g...@cray.com>
> Subject: Re: [Chapel-developers] Too many simultaneous local client threads
>
> Hello, Elliot
>
>
> After setting CHPL_RT_NUM_THREADS_PER_LOCALE, it runs now. I also tuned
> this parameter manually and found that it makes a huge difference. Here is
> the best performance of lulesh (compiled with --fast) on different numbers
> of nodes, along with the value of this parameter that produced it ("set 4"
> means CHPL_RT_NUM_THREADS_PER_LOCALE=4 in this test case):
>
>
> #nodes   setting   best perf (s)
> 1        set 4      2.19
> 2        set 12    17.7454
> 4        set 10    17.7665
> 8        set 4     19.2104
> 16       set 4     23.1676
> 32       set 4     28.0629
>
>
>
> Therefore, I'm wondering:
> 1. Is there a guideline for what value to set this parameter to for a given
> number of nodes to get the best performance? Different compilation flags
> also lead to different "best" values of CHPL_RT_NUM_THREADS_PER_LOCALE.
> 2. If CHPL_RT_NUM_THREADS_PER_LOCALE=4 and #nodes=10, does that mean at most
> 4 cores are used on each node and the maximum parallelism is only 40, even
> though we have access to 20 cores on each node?
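Question 2 can be checked with a back-of-the-envelope model. This sketch assumes, as a simplification, that under CHPL_TASKS=fifo each running task occupies its own pthread, so CHPL_RT_NUM_THREADS_PER_LOCALE bounds how many tasks execute at once on a node; it ignores any helper threads the runtime or GASNet may use. The function names are hypothetical and are not part of Chapel.

```python
def max_concurrent_tasks(nodes: int, threads_per_locale: int, cores_per_node: int) -> int:
    """Upper bound on tasks running simultaneously across the whole job.

    Simplified model (an assumption, not Chapel's runtime code): each running
    task needs its own pthread, and a node cannot run more threads at once
    than it has cores.
    """
    per_node = min(threads_per_locale, cores_per_node)
    return nodes * per_node


def core_utilization(threads_per_locale: int, cores_per_node: int) -> float:
    """Fraction of each node's cores that can be busy at the same time."""
    return min(threads_per_locale, cores_per_node) / cores_per_node


if __name__ == "__main__":
    # The case from question 2: 10 nodes, 4 threads per locale, 20 cores per node.
    print(max_concurrent_tasks(10, 4, 20))   # 40
    print(core_utilization(4, 20))           # 0.2
    # Raising the limit to the core count allows up to 200 concurrent tasks.
    print(max_concurrent_tasks(10, 20, 20))  # 200
```

Under this model, a setting of 4 lets at most 4 of each node's 20 cores run tasks at once, which is consistent with Elliot's point that a low setting limits the parallelism available on a node.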
>
>
> [to answer the 'has to be fifo' question] I use PAPI to sample the
> execution, which needs explicit initialization on each thread. It seems
> like qthreads creates threads implicitly and currently cannot be plugged
> into PAPI directly, but I have not looked into that much yet...
>
>
> Thanks
>
>
> On Fri, Jul 28, 2017 at 12:10 PM, Elliot Ronaghan
> <erona...@cray.com> wrote:
>
> [re-adding Greg]
>
> Ok, I don't see anything that sticks out. Maybe try setting
> CHPL_RT_NUM_THREADS_PER_LOCALE to 200 or so?
>
> Also, can you remind me why you're using fifo? Qthreads is far better
> tested and more robust at this point.
>
> Elliot
>
> -----Original Message-----
> From: Hui Zhang <wayne.huizh...@gmail.com>
> Date: Friday, July 28, 2017 at 11:57 AM
> To: Elliot Ronaghan <erona...@cray.com>
> Subject: Re: [Chapel-developers] Too many simultaneous local client threads
>
> Hello, Elliot
>
> CHPL_TASKS=fifo
> CHPL_COMM=gasnet
> CHPL_PAPI_SUPPORT=1
> CHPL_COMM_SUBSTRATE=ibv
> CHPL_HOST_PLATFORM=linux64
> CHPL_TARGET_ARCH=native
> CHPL_LLVM=llvm
> CHPL_LLVM_SRC=/lustre/hzhang86/bin/chapel-15/third-party/llvm/build/linux64-gnu
> CHPL_HOME=/lustre/hzhang86/bin/chapel-15
> CHPL_LAUNCHER=gasnetrun_ibv
> CHPL_LLVM_PASS_LIB=/lustre/hzhang86/bin/chapel-15/third-party/llvm/build/linux64-gnu/Release+Asserts/lib
>
> GASNET_ROUTE_OUTPUT=0
> GASNET_PHYSMEM_MAX=1G
> GASNET_SSH_OPTIONS=-o LogLevel=Error
> GASNET_IBV_SPAWNER=ssh
>
>
>
>
> On Fri, Jul 28, 2017 at 11:22 AM, Elliot Ronaghan
> <erona...@cray.com> wrote:
>
> [Adding Greg]
>
> I'm not really seeing anything that would explain this. It looks like the
> pthread limiting code in gasnet/fifo is working correctly to me. I was also
> able to run lulesh and hpl with -nl1024 without any issues, so I'm not
> having any luck reproducing locally.
>
> Can you send the results of `printenv | grep CHPL_` and
> `printenv | grep GASNET_`?
>
> Elliot
>
> -----Original Message-----
> From: Hui Zhang <wayne.huizh...@gmail.com>
> Date: Wednesday, July 26, 2017 at 10:51 AM
> To: Elliot Ronaghan <erona...@cray.com>
> Subject: Fwd: [Chapel-developers] Too many simultaneous local client threads
>
> Hello, Elliot
>
>
> Thanks for your reply. Here's the email thread with my question. I have
> also tested with lulesh, which crashed at 16 nodes for the same reason.
>
> Enjoy the vacation.
>
> ---------- Forwarded message ----------
> From: Hui Zhang <wayne.huizh...@gmail.com>
> Date: Tue, Jul 25, 2017 at 12:09 PM
> Subject: Re: [Chapel-developers] Too many simultaneous local client threads
> To: Greg Titus <g...@cray.com>
> Cc: Michael Ferguson <mfergu...@cray.com>, Chapel Sourceforge Developers
> List <chapel-developers@lists.sourceforge.net>
>
>
> Thanks, Greg.
>
>
> It looks like I fell into this pitfall because of my own setting. I had
> set CHPL_RT_NUM_THREADS_PER_LOCALE to the number of physical cores on each
> node, but that caused the program to hang until it was terminated when its
> time limit expired.
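One plausible mechanism for a hang like this (not confirmed anywhere in this thread) is that a capped set of threads running tasks in FIFO order can deadlock when every thread is held by a task that is blocked waiting on a task still sitting in the queue. The sketch below is a hypothetical model, not Chapel's fifo implementation: task A waits for an event that only task B sets, and A is queued first. `run_two_tasks` is an invented name, and A gives up after a timeout so the demo always finishes instead of hanging.

```python
import threading
from collections import deque


def run_two_tasks(cap: int, timeout: float = 1.0) -> list:
    """Run task A, then task B (FIFO order), on `cap` worker threads."""
    ready = threading.Event()
    results = []
    results_lock = threading.Lock()

    def task_a():
        # Blocks its worker thread until task B signals or the timeout expires.
        ok = ready.wait(timeout)
        with results_lock:
            results.append("a-ok" if ok else "a-stuck")

    def task_b():
        ready.set()
        with results_lock:
            results.append("b-ok")

    queue = deque([task_a, task_b])  # A is dequeued first
    queue_lock = threading.Lock()

    def worker():
        # Each worker keeps pulling the oldest queued task until none remain.
        while True:
            with queue_lock:
                if not queue:
                    return
                task = queue.popleft()
            task()

    workers = [threading.Thread(target=worker) for _ in range(cap)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(run_two_tasks(cap=1))          # ['a-stuck', 'b-ok']: B only runs after A gives up
    print(sorted(run_two_tasks(cap=2)))  # ['a-ok', 'b-ok']
```

In a runtime where A has no timeout, the single-thread case would wait forever, which would look like a hang; a higher thread limit gives the queued task a thread to run on.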
>
>
> On Tue, Jul 25, 2017 at 11:59 AM, Greg Titus
> <g...@cray.com> wrote:
>
> Hi Michael, Hui --
>
> There's a little bit of interface (Elliot is familiar with it) through
> which comm layers can tell tasking layers to create no more than a certain
> number of pthreads. The gasnet comm layer uses this to limit the tasking
> layer to no more than 256 pthreads, and tasks=fifo is supposed to pay
> attention to this. It sounds like this little bit of interface has gotten
> broken at some point. (Because here we have GASNet barking that the fifo
> tasking layer is trying to create more.) Perhaps with help from Elliot you
> can look into what's gone wrong there -- user code shouldn't be able to
> drive the runtime into producing this error message.
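The interface Greg describes can be modeled as a tasking layer that caps its thread count at the smaller of the user's request and the limit reported by the comm layer. The sketch below is a hypothetical illustration, not the Chapel runtime's code; `run_fifo`, `requested_threads`, and `comm_limit` are invented names, and 256 appears only because it is the GASNet limit quoted in this thread. It runs queued tasks in FIFO order and returns the peak number of tasks that ran at once, which can never exceed the cap.

```python
import threading
import time
from collections import deque


def run_fifo(tasks, requested_threads: int, comm_limit: int) -> int:
    """Run callables in FIFO order on at most min(requested_threads, comm_limit)
    worker threads and return the peak number running simultaneously."""
    cap = min(requested_threads, comm_limit)  # the comm layer's limit always wins
    if cap < 1:
        raise ValueError("need at least one thread")
    queue = deque(tasks)
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def worker():
        while True:
            with lock:
                if not queue:
                    return
                task = queue.popleft()  # oldest queued task first
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            try:
                task()
            finally:
                with lock:
                    state["running"] -= 1

    # Never start more threads than the cap, however many tasks are queued.
    workers = [threading.Thread(target=worker) for _ in range(min(cap, len(queue)))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return state["peak"]


if __name__ == "__main__":
    # Ask for 300 threads; the comm-layer limit of 256 wins.
    peak = run_fifo([lambda: time.sleep(0.01) for _ in range(1000)],
                    requested_threads=300, comm_limit=256)
    print(peak <= 256)  # True
```

If the tasking layer skipped the `min` with the comm limit, nothing in this model would stop it from starting more than 256 threads, which corresponds to the situation the GASNet error message reports.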
>
> I can't help much beyond this, sorry, I'm traveling and the Nashville
> airport wifi is horrid.
>
> greg
>
> ________________________________________
> From: Michael Ferguson <mfergu...@cray.com>
> Sent: Tuesday, July 25, 2017 9:36 AM
> To: Hui Zhang; Chapel Sourceforge Developers List
> Subject: Re: [Chapel-developers] Too many simultaneous local client threads
>
> Hi -
>
> >Hello,
> >
> >
> >
> >I'm running the hpl benchmark with the latest Chapel release (1.15) on
> >multiple locales. It runs OK on 2, 4, 8, and 16 nodes, but on 32 nodes it
> >reports the error:
> >*** FATAL ERROR: GASNet Extended API: Too many simultaneous local client
> >threads (limit=256). To raise this limit, configure GASNet using
> >--with-max-pthreads-per-node=N.
> >
> >
> >Is it reasonable to have this error when running on 32 nodes?
>
> I'm not familiar with that error but it sounds like something to do
> with the number of threads you are running per locale. How many
> cores do the compute nodes have? Is it possible that all of the
> tasks are accidentally running on 1 node?
>
> Have you tried running the Chapel program with the -v option
> to observe the job launcher? Does hello6-taskpar-dist.chpl print
> out different machine names for each locale when you run it?
>
> >I read
> >http://chapel.cray.com/docs/1.15/usingchapel/tasks.html#id2. In order to
> >run on 32 nodes, should I set CHPL_RT_NUM_THREADS_PER_LOCALE and rebuild
> >Chapel? Or is there another way? Thanks
> >
> >
> >Here's my env:
> >
> >CHPL_TARGET_PLATFORM: linux64
> >CHPL_TARGET_COMPILER: gnu
> >CHPL_TARGET_ARCH: native *
> >CHPL_LOCALE_MODEL: flat
> >CHPL_COMM: gasnet *
> >  CHPL_COMM_SUBSTRATE: ibv *
> >  CHPL_GASNET_SEGMENT: large
> >CHPL_TASKS: fifo *
> >CHPL_LAUNCHER: gasnetrun_ibv *
> >CHPL_TIMERS: generic
> >CHPL_UNWIND: none
> >CHPL_MEM: jemalloc
> >CHPL_MAKE: gmake
> >CHPL_ATOMICS: intrinsics
> >  CHPL_NETWORK_ATOMICS: none
> >CHPL_GMP: gmp
> >CHPL_HWLOC: none
> >CHPL_REGEXP: re2
> >CHPL_WIDE_POINTERS: struct
> >CHPL_AUX_FILESYS: none
> >
> >
> >
>
>
> >Besides, I'm a little confused about the "local" statement. According to
> >http://chapel.cray.com/docs/1.15/technotes/local.html?highlight=local, is
> >it just used to assert that the statements in the block are
> >communication-free? Or is it used as a performance optimization?
>
> local blocks are a performance optimization. They do assert that
> the operations are local but might not do so with --fast.
>
> -michael
>
> >
> >
> >
> >Thanks
> >
> >
> >--
> >Best regards
> >
> >
> >Hui Zhang
> >
> >
>
>
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org!
> http://sdm.link/slashdot <http://sdm.link/slashdot>
> _______________________________________________
> Chapel-developers mailing list
> Chapel-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/chapel-developers


-- 
Best regards


Hui Zhang