Hi, 

________________________________________
>>From: Michael Ferguson <[email protected]>
>>Sent: 15 January 2016 18:01
>>To: Panagiotopoulou, Konstantina; [email protected]
>>Subject: Re: [Chapel-developers] Building Chapel on AMD Infiniband cluster 
>>with slurm
>>
>>Hi Konstantina -
>>
>>Have you tried running your program with -v? Sometimes it prints
>>out some diagnostics that point to the problem.

-v prints nothing useful (almost nothing at all).

>>Are you able to run programs with CHPL_COMM_SUBSTRATE=udp and
>>using SSH spawning?
>>
>>Are you able to run programs locally?

Yes, I can run them locally and with udp.

>>The error message you included makes me think that perhaps something
>>is going wrong with environment variable forwarding. Perhaps there is
>>a way to configure SLURM to do that, or to explicitly forward the
>>GASNET_IBV_SPAWNER variable.
>>
>>
>>It's also possible that something went wrong with the build. I usually
>>need to build the Chapel runtime on a machine with access to InfiniBand,
>>which sometimes means a compute node (and not a head node).
>>
>>I wouldn't expect GASNET_IBV_SPAWNER=ssh to work unless you can
>>SSH to the compute nodes without a password.

I did a clean build and got this:
salloc: Relinquishing job allocation 771
Spawner is set to MPI, but MPI support was not compiled in
usage: gasnetrun -n <n> [options] [--] prog [program args]


So then I set GASNET_IBV_SPAWNER=ssh and tried with srun. The output is:
--
$ srun -N 2 ./hello -nl 2
salloc: Granted job allocation 775
salloc: Granted job allocation 776
salloc: Pending job allocation 777
salloc: job 777 queued and waiting for resources
salloc: Pending job allocation 778
salloc: job 778 queued and waiting for resources
salloc: Pending job allocation 779
salloc: job 779 queued and waiting for resources
Cleaning up orphaned processes...
*** FATAL ERROR: One or more processes died before setup was completed
WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before 
gasneti_backtrace_init
salloc: Relinquishing job allocation 776
Cleaning up orphaned processes...
--

This probably has nothing to do with resource unavailability (it is Friday night and I
am the only user on the cluster); it more likely has something to do with ssh-ing
(or rather, not being able to ssh).
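
One way to test the ssh hypothesis is to try a non-interactive login to each compute node. The following is only a sketch: the function name `check_nodes` and the `SSH_CMD` override are hypothetical helpers introduced here, while `BatchMode` and `ConnectTimeout` are standard OpenSSH client options. `BatchMode=yes` makes ssh fail instead of prompting for a password, which is the condition an ssh-based spawner would need to satisfy.

```shell
#!/bin/sh
# Sketch: report which nodes accept passwordless (non-interactive) SSH.
# SSH_CMD is a hypothetical override so the loop can be dry-run without ssh;
# by default it is ssh in BatchMode, which fails rather than asking for a password.
check_nodes() {
    failed=0
    for node in "$@"; do
        # Run the trivial command "true" remotely; stdin is detached so ssh
        # cannot swallow input meant for the calling script.
        if ${SSH_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5} "$node" true </dev/null >/dev/null 2>&1; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            failed=1
        fi
    done
    # Non-zero exit status if any node could not be reached.
    return "$failed"
}
```

For example, `check_nodes gpu01 gpu02 gpu03` (the node names from the srun errors quoted below) would print one line per node. If every node reports FAILED, the ssh spawner is unlikely to work regardless of how the variable is set.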

>>On some systems in the past, I've had better luck configuring the spawner
>>to launch with MPI. (although I think I prefer the SSH one on principle...)
>>
>>Have you tried running the GASNet tests, as the thread you found suggests?

>>Lastly, if InfiniBand is going to work, you should be able to run ibstat
>>and it should say State: Active. It might be worth making sure you can
>>run an InfiniBand benchmark, like ibping.

I cannot run any of the tests, and I am not sure why, but ibstat looks OK (Active).
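
The ibstat check can also be scripted so it can be repeated on each compute node. This is a minimal sketch that assumes ibstat reports the logical port state on a line of the form `State: Active`, as in the quoted advice above; the function name `ib_port_active` is made up for illustration.

```shell
#!/bin/sh
# Sketch: succeed only if ibstat-style text on stdin shows an active port.
# The pattern is anchored at the start of the line (after optional
# indentation) so that a "Physical state:" line does not match.
ib_port_active() {
    grep -Eq '^[[:space:]]*State:[[:space:]]*Active'
}
```

Usage would be along the lines of `ibstat | ib_port_active && echo "IB port is active"`.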


I think I need to ask the admin on Monday (I might talk him into switching to
torque).
I'll let you know when/if it gets solved.
Thanks Michael  :)

--Konstantina
>>-michael

>>On 1/15/16, 12:13 PM, "Panagiotopoulou, Konstantina" <[email protected]>
>>wrote:
>>
>>>Hi Michael,
>>>
>>>I've got a version from just after October's release, but it didn't have that
>>>(weird!).
>>>Anyway, I applied the patch and it does build, but it is still not working
>>>properly. I tried taskParallel.chpl from the primers and I get this:
>>>
>*** FATAL ERROR: Requested spawner "(not set)" is unknown or not supported
>in this build
>WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before
>gasneti_backtrace_init
>
>srun: error: gpu01: task 0: Aborted
>srun: error: gpu03: task 2: Aborted
>srun: error: gpu02: task 1: Aborted
>
>I found someone with the same issue (...ages ago)
>https://www.mail-archive.com/[email protected]/msg00095.html
>
>I also have GASNET_IBV_SPAWNER=ssh set, but I am not sure it makes any
>difference since I can only ssh to my login node...
>
>--Konstantina
>________________________________________
>From: Michael Ferguson <[email protected]>
>Sent: 15 January 2016 16:21
>To: Panagiotopoulou, Konstantina; [email protected]
>Subject: Re: [Chapel-developers] Building Chapel on AMD Infiniband
>cluster with slurm
>
>Hi Konstantina -
>
>Good to hear from you. I merged a PR fixing some compilation errors in August:
> https://github.com/chapel-lang/chapel/pull/2299
>
>Are you using a version of Chapel from before that change? Or have you found
>other compilation errors? Perhaps you just need to apply that patch...
>
>Cheers,
>
>-michael
>
>On 1/15/16, 10:45 AM, "Panagiotopoulou, Konstantina" <[email protected]>
>wrote:
>
>>Hi team,
>>
>>
>>I am trying to build Chapel on an InfiniBand cluster with slurm, but I
>>keep getting these errors:
>>
>>
>>launch-slurm-gasnetrun_ibv.c: In function ‘genNumLocalesOptions’:
>>launch-slurm-gasnetrun_ibv.c:121:3: error: enumeration value ‘slurmpro’
>>not handled in switch [-Werror=switch]
>>   switch (sbatch) {
>>   ^
>>launch-slurm-gasnetrun_ibv.c:121:3: error: enumeration value ‘nccs’ not
>>handled in switch [-Werror=switch]
>>launch-slurm-gasnetrun_ibv.c:121:3: error: enumeration value ‘uma’ not
>>handled in switch [-Werror=switch]
>>launch-slurm-gasnetrun_ibv.c:121:3: error: enumeration value ‘unknown’
>>not handled in switch [-Werror=switch]
>>launch-slurm-gasnetrun_ibv.c:111:9: error: unused variable ‘queue’
>>[-Werror=unused-variable]
>>   char* queue = getenv("CHPL_LAUNCHER_QUEUE");
>>         ^
>>launch-slurm-gasnetrun_ibv.c: In function ‘chpl_launch_create_command’:
>>launch-slurm-gasnetrun_ibv.c:215:23: error: too many arguments for format
>>[-Werror=format-extra-args]
>>   fprintf(expectFile, "--ntasks-per-node=1 ",numLocales);
>>
>>
>>For my experiments I need TASKS=fifo and LOCALE_MODEL=flat.
>>Am I missing something??
>>
>>
>>Thanks,
>>Konstantina
>>
>>
>>
>>
>>
>>Here is my chplenv:
>>
>>
>>$ util/printchplenv
>>CHPL_HOST_PLATFORM: linux64 *
>>CHPL_HOST_COMPILER: gnu
>>CHPL_TARGET_PLATFORM: linux64
>>CHPL_TARGET_COMPILER: gnu *
>>CHPL_TARGET_ARCH: k8 *   (same errors when set to "native")
>>CHPL_LOCALE_MODEL: flat
>>CHPL_COMM: gasnet *
>>  CHPL_COMM_SUBSTRATE: ibv *
>>  CHPL_GASNET_SEGMENT: large
>>CHPL_TASKS: fifo *
>>CHPL_LAUNCHER: slurm-gasnetrun_ibv *
>>CHPL_TIMERS: generic
>>CHPL_MEM: dlmalloc
>>CHPL_MAKE: gmake
>>CHPL_ATOMICS: intrinsics
>>  CHPL_NETWORK_ATOMICS: none
>>CHPL_GMP: gmp
>>CHPL_HWLOC: none
>>CHPL_REGEXP: re2
>>CHPL_WIDE_POINTERS: struct
>>CHPL_LLVM: none
>>CHPL_AUX_FILESYS: none
>>
>>
>>
>>I set TARGET_ARCH to k8 (though it is an opteron) because of this:
>>https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/i386-and-x86-64-Options.html
>>‘k8’, ‘opteron’, ‘athlon64’, ‘athlon-fx’: Processors based on the AMD K8 core
>>with x86-64 instruction set support, including the AMD Opteron, Athlon
>>64, and Athlon 64 FX processors. (This supersets MMX, SSE, SSE2, 3DNow!,
>>enhanced 3DNow! and 64-bit instruction set extensions.)
>>
>>
>>and the cpu spec:
>>$ lscpu
>>Architecture:          x86_64
>>CPU op-mode(s):        32-bit, 64-bit
>>Byte Order:            Little Endian
>>CPU(s):                8
>>On-line CPU(s) list:   0-7
>>Thread(s) per core:    2
>>Core(s) per socket:    4
>>Socket(s):             1
>>NUMA node(s):          2
>>Vendor ID:             AuthenticAMD
>>CPU family:            21
>>Model:                 2
>>Stepping:              0
>>CPU MHz:               1400.000
>>BogoMIPS:              5600.37
>>Virtualization:        AMD-V
>>L1d cache:             16K
>>L1i cache:             64K
>>L2 cache:              2048K
>>L3 cache:              6144K
>>NUMA node0 CPU(s):     0-3
>>NUMA node1 CPU(s):     4-7
>>
>>

_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers
