Hi -

On 9/16/16, 3:55 PM, "Brad Chamberlain" <br...@cray.com> wrote:

>
>Is there anything that could/should be done for the release to ease
>future user pain?  (in terms of either code or documentation changes?)

Yes, absolutely. I'm looking at updating multilocale.rst and launcher.rst.
I think these documents need to be clearer about not just what variables
are set but where they fit in. (E.g., you don't use sbatch with
slurm-gasnetrun_ibv).

-michael

>On Fri, 16 Sep 2016, Michael Ferguson wrote:
>
>> Hi -
>>
>> (For the archives). I was able to help Hui and got 3 different ways of
>> launching Chapel programs working on that Infiniband cluster:
>>
>> 1) export CHPL_LAUNCHER=slurm-gasnetrun_ibv
>>   export CHPL_LAUNCHER_WALLTIME=00:15:00
>>
>>   export SLURM_PARTITION=debug
>>   make
>>   chpl program.chpl
>>   ./a.out -nl 3
>>
>> 2) export CHPL_LAUNCHER=gasnetrun_ibv
>>   export GASNET_IBV_SPAWNER=S
>>   make
>>   chpl program.chpl
>>   salloc -N number-of-locales
>>     # in the salloc shell:
>>     export GASNET_SSH_SERVERS=`scontrol show hostnames`
>>     ./a.out -nl 3
>>
>> 3) export CHPL_LAUNCHER=gasnetrun_ibv
>>   export GASNET_IBV_SPAWNER=S
>>   make
>>   chpl program.chpl
>>   sbatch job.sh
>>
>>   where job.sh is an sbatch script that, among other things, sets
>>   GASNET_SSH_SERVERS from `scontrol show hostnames`. The job.sh
>>   file contains:
>>
>> #!/bin/bash
>> #SBATCH -t 0:10:0
>> #SBATCH --nodes=3
>> #SBATCH --exclusive
>> #SBATCH --partition=debug
>> #SBATCH --output=/path-to-job-output
>>
>> export GASNET_SSH_SERVERS=`scontrol show hostnames`
>> export GASNET_IBV_SPAWNER=ssh
>> export GASNET_PHYSMEM_MAX=1G # Limit GASNet's IBV conduit probing
>>
>> export GASNET_SSH_OPTIONS="-o LogLevel=Error" # disable login banner in the output
>>
>> cd some-directory
>>
>> ./a.out -nl 3
>>
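
The recipes above build GASNET_SSH_SERVERS from `scontrol show hostnames`, which prints one host name per line. Below is a minimal sketch of flattening that output into a single line. It assumes the ssh spawner wants a space-separated list, which is worth checking against the GASNet notes for the installed version. `join_hosts` is a hypothetical helper name and is not part of Chapel, GASNet, or Slurm.

```shell
#!/bin/bash
# join_hosts: hypothetical helper (not provided by Chapel, GASNet or Slurm).
# Reads host names on stdin, one per line, and prints them on a single
# line separated by single spaces. The ( ... ) body runs in a subshell,
# so the variables used here do not leak into the caller.
join_hosts() (
  out=""
  # "|| [ -n "$host" ]" keeps a final line that lacks a trailing newline
  while IFS= read -r host || [ -n "$host" ]; do
    [ -n "$host" ] || continue          # skip blank lines
    out="${out:+$out }$host"            # add a space only between entries
  done
  printf '%s\n' "$out"
)

# Possible use inside an salloc shell or an sbatch script:
#   export GASNET_SSH_SERVERS="$(scontrol show hostnames | join_hosts)"
```

Quoting the command substitution keeps the whole list in one variable regardless of which shell runs the script.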
>>
>>
>> Note:
>>
>> * GASNET_CSPAWN_CMD does not work with GASNet's ibv launcher.
>>
>> * It appears to be necessary to run GASNet's ibv launcher
>>   (simply running the _real executables in sbatch or srun
>>    isn't sufficient).
>>
>> * Setting GASNET_PHYSMEM_MAX and possibly GASNET_PHYSMEM_NOPROBE
>>   is important for job launches to take a reasonable amount of time.
>>
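
As a rough summary of where the CHPL_LAUNCHER setting fits into the three recipes in this message, here is a minimal sketch. `launch_hint` is a hypothetical helper written only for illustration. It covers just the two launcher values tried in this thread and says nothing about other launchers.

```shell
#!/bin/bash
# launch_hint: hypothetical helper summarising the three recipes above.
# Argument: the value of CHPL_LAUNCHER used when building the program.
launch_hint() {
  case "$1" in
    slurm-gasnetrun_ibv)
      echo "run ./a.out directly, without salloc or sbatch (recipe 1)" ;;
    gasnetrun_ibv)
      echo "run ./a.out inside salloc or an sbatch script with GASNET_SSH_SERVERS set (recipes 2 and 3)" ;;
    *)
      echo "not covered by the recipes in this thread" ;;
  esac
}

# Possible use:
#   launch_hint "${CHPL_LAUNCHER:-unset}"
```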
>> Cheers,
>>
>> -michael
>>
>>
>>
>>
>>
>> On 9/10/16, 12:17 AM, "Hui Zhang" <wayne.huizh...@gmail.com> wrote:
>>
>>> Hello, Greg
>>>
>>>
>>> I tried two ways:
>>> 1. use batch script
>>> CHPL_COMM=gasnet
>>> CHPL_LAUNCHER=slurm-gasnetrun_ibv
>>> CHPL_COMM_SUBSTRATE=ibv
>>> GASNET_ROUTE_OUTPUT=0
>>> GASNET_VERBOSEENV=1
>>> GASNET_SSH_OPTIONS="-o LogLevel=Error" #disable login banner
>>>
>>>
>>>
>>> GASNET_SPAWNFN=C
>>> GASNET_CSPAWN_CMD='srun -N%N %C'
>>>
>>>
>>> cmd:
>>> $CHPL_HOME/test/release/examples/hello6-taskpar-dist_real -nl 4
>>> --tasksPerLocale=6 -v
>>>
>>>
>>> 2. use interactive:
>>> same env vars, except I didn't set GASNET_SPAWNFN and used srun explicitly:
>>>
>>>
>>> salloc -N 4 -t 00:15:00 -p debug
>>> srun $CHPL_HOME/test/release/examples/hello6-taskpar-dist_real -nl 4
>>> --tasksPerLocale=6 -v
>>>
>>>
>>> Both give me the same error:
>>>
>>>
>>> *** FATAL ERROR: Requested spawner "(not set)" is unknown or not
>>> supported in this build
>>> WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before
>>> gasneti_backtrace_init
>>> *** FATAL ERROR: Requested spawner "(not set)" is unknown or not
>>> supported in this build
>>> WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before
>>> gasneti_backtrace_init
>>> *** FATAL ERROR: Requested spawner "(not set)" is unknown or not
>>> supported in this build
>>> WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before
>>> gasneti_backtrace_init
>>> *** FATAL ERROR: Requested spawner "(not set)" is unknown or not
>>> supported in this build
>>> WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before
>>> gasneti_backtrace_init
>>> srun: error: compute-b28-47: task 0: Aborted (core dumped)
>>> srun: error: compute-b28-49: task 2: Aborted (core dumped)
>>> srun: error: compute-b28-48: task 1: Aborted (core dumped)
>>> srun: error: compute-b28-50: task 3: Aborted (core dumped)
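
Both failing commands above start the `_real` binary directly, and the note earlier in this thread says that is not sufficient with GASNet's ibv launcher. Below is a minimal sketch of a guard against that. It assumes the launcher wrapper sits next to the `_real` binary and shares its name minus the suffix, as with hello6-taskpar-dist in this thread. `launcher_for` is a hypothetical helper name.

```shell
#!/bin/bash
# launcher_for: hypothetical helper. Given a program path, strip a trailing
# "_real" so that the launcher wrapper is started instead of the raw binary.
# Paths without the suffix are printed unchanged.
launcher_for() {
  case "$1" in
    *_real) printf '%s\n' "${1%_real}" ;;
    *)      printf '%s\n' "$1" ;;
  esac
}

# Possible use:
#   "$(launcher_for ./hello6-taskpar-dist_real)" -nl 4
```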
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>> On Fri, Sep 9, 2016 at 7:20 PM, Greg Titus
>>> <g...@cray.com> wrote:
>>>
>>> Hello Hui --
>>>
>>> I've somewhat lost track of your environment settings.  What do you have
>>> CHPL_LAUNCHER and CHPL_COMM_SUBSTRATE set to now, and also what are the
>>> settings of all of your GASNet-specific env vars, such as GASNET_SPAWNFN
>>> and the like?
>>>
>>> thanks,
>>> greg
>>>
>>>
>>>
>>> On Fri, 9 Sep 2016, Hui Zhang wrote:
>>>
>>>
>>> Hello, team
>>> Following up on the previous issue, I found that it was caused by
>>> libibverbs.so.1 missing on the machine. After adding it, I hit exactly
>>> the same error as the one in an old thread on the mailing list:
>>> https://sourceforge.net/p/chapel/mailman/message/34769706/
>>>
>>> ** FATAL ERROR: Requested spawner "(not set)" is unknown or not supported
>>> in this build
>>> WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before
>>> gasneti_backtrace_init
>>>
>>> srun: error: node01: task 0: Aborted
>>> srun: error: node03: task 2: Aborted
>>> srun: error: node02: task 1: Aborted
>>>
>>> But I don't see a solution provided there, so is there a known fix for
>>> this problem?
>>>
>>> Thanks
>>>
>>>
>>> On Wed, Sep 7, 2016 at 11:22 PM, Hui Zhang <wayne.huizh...@gmail.com>
>>> wrote:
>>>      Update:
>>> I tried Chapel 1.11 and master; both give me the same result
>>> (no output at all). Executing with -v prints a single line:
>>> expect .chpl-expect-# (some number that varies from run to run)
>>>
>>>
>>> On Wed, Sep 7, 2016 at 2:30 PM, Hui Zhang <wayne.huizh...@gmail.com>
>>> wrote:
>>>      Hello, team
>>>
>>> I had success running Chapel multi-locale on an InfiniBand
>>> cluster with the default GASNet setting. Here's my script to
>>> use GASNet with Slurm:
>>>
>>> export GASNET_SSH_OPTIONS="-o LogLevel=Error" # disable login banner in the output
>>> export GASNET_SPAWNFN=C
>>> export GASNET_CSPAWN_CMD='srun -N%N %C'
>>>
>>> ./hello6-taskpar-dist -nl 4     (using _real won't work, any
>>> idea why?)
>>>
>>>
>>> It works, but the output suggests using ibv-conduit instead of
>>> udp-conduit for better performance, so I did:
>>> 1) export CHPL_COMM=gasnet
>>>      export CHPL_LAUNCHER=slurm-gasnetrun_ibv
>>>      export CHPL_COMM_SUBSTRATE=ibv
>>> 2) cd $CHPL_HOME && make
>>> It reports the same error as
>>> https://sourceforge.net/p/chapel/mailman/chapel-developers/thread/VI1PR06MB1181608c2323f3f4c6d95cf0d2...@vi1pr06mb1181.eurprd06.prod.outlook.com/
>>> and it builds with the patch provided by Michael.
>>>
>>> However, when I recompiled hello6, then used the same script
>>> to execute it, the job completed normally but it did not
>>> output anything. If I use -v in the command, it only printed
>>> out:
>>> expect .chpl-expect-12045
>>>
>>> Am I missing something ?
>>> Thanks
>>>
>>> --
>>> Best regards
>>>
>>>
>>> Hui Zhang
>>>
>>>
>>>
>>
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Chapel-developers mailing list
>> Chapel-developers@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/chapel-developers
