Hi,

________________________________________
>>From: Michael Ferguson <[email protected]>
>>Sent: 15 January 2016 18:01
>>To: Panagiotopoulou, Konstantina; [email protected]
>>Subject: Re: [Chapel-developers] Building Chapel on AMD Infiniband cluster
>>with slurm
>>
>>Hi Konstantina -
>>
>>Have you tried running your program with -v? Sometimes it prints
>>out some diagnostics that point to the problem.

-v prints nothing useful (almost nothing at all).

>>Are you able to run programs with CHPL_COMM_SUBSTRATE=udp and
>>using SSH spawning?
>>
>>Are you able to run programs locally?

Yes, I can run them locally and with udp.

>>The error message you included makes me think that perhaps something
>>is going wrong with environment variable forwarding. Perhaps there is
>>a way to configure SLURM to do that, or to explicitly forward the
>>GASNET_IBV_SPAWNER variable.
>>
>>It's also possible that something went wrong with the build. I usually
>>need to build the Chapel runtime on a machine with access to InfiniBand,
>>which sometimes means a compute node (and not a head node).
>>
>>I wouldn't expect GASNET_IBV_SPAWNER=ssh to work unless you can
>>SSH to the compute nodes without a password.

I did a clean build and got this:

salloc: Relinquishing job allocation 771
Spawner is set to MPI, but MPI support was not compiled in
usage: gasnetrun -n <n> [options] [--] prog [program args]

So then I set GASNET_IBV_SPAWNER=ssh and tried with srun. The output is:

--
$ srun -N 2 ./hello -nl 2
salloc: Granted job allocation 775
salloc: Granted job allocation 776
salloc: Pending job allocation 777
salloc: job 777 queued and waiting for resources
salloc: Pending job allocation 778
salloc: job 778 queued and waiting for resources
salloc: Pending job allocation 779
salloc: job 779 queued and waiting for resources
Cleaning up orphaned processes...
*** FATAL ERROR: One or more processes died before setup was completed
WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before gasneti_backtrace_init
salloc: Relinquishing job allocation 776
Cleaning up orphaned processes...
--

This probably has nothing to do with resource availability (it's Friday night and I am the only user on the cluster). More likely it has something to do with SSH, or rather with not being able to SSH to the compute nodes.

>>On some systems in the past, I've had better luck configuring the spawner
>>to launch with MPI.
>>(although I think I prefer the SSH one on principle...)
>>
>>Have you tried running the GASNet tests, as the thread you found suggests?
>>Lastly, if InfiniBand is going to work, you should be able to run ibstat
>>and it should say State: Active. It might be worth making sure you can
>>run an InfiniBand benchmark, like ibping.

I cannot run any of the tests. I am not sure why, but ibstat looks OK (Active).
I think I need to ask the admin on Monday (I might talk him into switching to Torque).
I'll let you know when/if it gets solved.

Thanks Michael :)

--Konstantina

>>-michael
>>
>>On 1/15/16, 12:13 PM, "Panagiotopoulou, Konstantina" <[email protected]>
>>wrote:
>>
>>>Hi Michael,
>>>
>>>I've got a version just after October's release but didn't have that
>>>(weird!)
>>>Anyway, I applied the patch and it does build, but it's still not working
>>>properly. I tried taskParallel.chpl from the primers and I get this:
>>>
>** FATAL ERROR: Requested spawner "(not set)" is unknown or not supported
>in this build
>WARNING: Ignoring call to gasneti_print_backtrace_ifenabled before
>gasneti_backtrace_init
>
>srun: error: gpu01: task 0: Aborted
>srun: error: gpu03: task 2: Aborted
>srun: error: gpu02: task 1: Aborted
>
>I found someone with the same issue (...ages ago)
>https://www.mail-archive.com/[email protected]/msg00095.html
>
>I also have GASNET_IBV_SPAWNER=ssh set, but I am not sure it makes any
>difference since I can only ssh to my login node...
>
>--Konstantina
>________________________________________
>From: Michael Ferguson <[email protected]>
>Sent: 15 January 2016 16:21
>To: Panagiotopoulou, Konstantina; [email protected]
>Subject: Re: [Chapel-developers] Building Chapel on AMD Infiniband
>cluster with slurm
>
>Hi Konstantina -
>
>Good to hear from you. I merged a PR fixing some compilation errors in
>August
> https://github.com/chapel-lang/chapel/pull/2299
>
>Are you using a version of Chapel from before that change? Or have you
>found
>other compilation errors?
>Perhaps you just need to apply that patch...
>
>Cheers,
>
>-michael
>
>On 1/15/16, 10:45 AM, "Panagiotopoulou, Konstantina" <[email protected]>
>wrote:
>
>>Hi team,
>>
>>I am trying to build Chapel on an Infiniband cluster with slurm but I
>>keep getting these errors:
>>
>>launch-slurm-gasnetrun_ibv.c: In function ‘genNumLocalesOptions’:
>>launch-slurm-gasnetrun_ibv.c:121:3: error: enumeration value ‘slurmpro’
>>not handled in switch [-Werror=switch]
>> switch (sbatch) {
>> ^
>>launch-slurm-gasnetrun_ibv.c:121:3: error: enumeration value ‘nccs’ not
>>handled in switch [-Werror=switch]
>>launch-slurm-gasnetrun_ibv.c:121:3: error: enumeration value ‘uma’ not
>>handled in switch [-Werror=switch]
>>launch-slurm-gasnetrun_ibv.c:121:3: error: enumeration value ‘unknown’
>>not handled in switch [-Werror=switch]
>>launch-slurm-gasnetrun_ibv.c:111:9: error: unused variable ‘queue’
>>[-Werror=unused-variable]
>> char* queue = getenv("CHPL_LAUNCHER_QUEUE");
>> ^
>>launch-slurm-gasnetrun_ibv.c: In function ‘chpl_launch_create_command’:
>>launch-slurm-gasnetrun_ibv.c:215:23: error: too many arguments for format
>>[-Werror=format-extra-args]
>> fprintf(expectFile, "--ntasks-per-node=1 ",numLocales);
>>
>>For my experiments I need TASKS=fifo and LOCALE_MODEL=flat.
>>Am I missing something??
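
The errors above are GCC warnings that -Werror turns into errors:
- -Wswitch fires when a switch on an enum neither handles every enumerator nor has a default label.
- -Wunused-variable fires when a local such as queue is assigned but never read.
- -Wformat-extra-args fires when a printf-family call passes arguments the format string never consumes.

A minimal C sketch of the usual fix for each follows. It is not the code from the patch in pull request 2299. The type sbatch_version_t, its enumerator spellings, the helpers sbatch_name and build_options, and the mapping of the locale count to sbatch's --nodes and of the queue to --partition are all hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical enum modelled on the enumerators named in the GCC errors;
   the real launcher's type and spellings may differ. */
typedef enum {
  sbatch_slurmpro,
  sbatch_nccs,
  sbatch_uma,
  sbatch_unknown
} sbatch_version_t;

/* Every enumerator has a case, so -Wswitch has nothing to report.
   A default: label would also silence it. */
static const char* sbatch_name(sbatch_version_t v) {
  switch (v) {
    case sbatch_slurmpro: return "slurmpro";
    case sbatch_nccs:     return "nccs";
    case sbatch_uma:      return "uma";
    case sbatch_unknown:  return "unknown";
  }
  return "unknown";  /* reached only for out-of-range values */
}

/* Hypothetical helper: formats per-node options into buf.
   Every argument has a matching conversion (%d, %s), unlike
   fprintf(expectFile, "--ntasks-per-node=1 ", numLocales).
   In a launcher, queue would come from getenv("CHPL_LAUNCHER_QUEUE");
   passing that value in and using it, rather than reading it into a
   local that is never used, avoids -Wunused-variable.
   The --nodes/--partition mapping is an illustrative assumption. */
static int build_options(char* buf, size_t n, int numLocales,
                         const char* queue) {
  assert(buf != NULL && n > 0);
  if (queue != NULL && strlen(queue) > 0)
    return snprintf(buf, n, "--nodes=%d --ntasks-per-node=1 --partition=%s ",
                    numLocales, queue);
  return snprintf(buf, n, "--nodes=%d --ntasks-per-node=1 ", numLocales);
}
```

In a real launcher the string would be written to expectFile with fprintf. The sketch uses snprintf into a buffer only so the result is easy to inspect.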
>>
>>Thanks,
>>Konstantina
>>
>>
>>Here is my chplenv:
>>
>>$ util/printchplenv
>>CHPL_HOST_PLATFORM: linux64 *
>>CHPL_HOST_COMPILER: gnu
>>CHPL_TARGET_PLATFORM: linux64
>>CHPL_TARGET_COMPILER: gnu *
>>CHPL_TARGET_ARCH: k8 * (same errors when set to "native")
>>CHPL_LOCALE_MODEL: flat
>>CHPL_COMM: gasnet *
>>  CHPL_COMM_SUBSTRATE: ibv *
>>  CHPL_GASNET_SEGMENT: large
>>CHPL_TASKS: fifo *
>>CHPL_LAUNCHER: slurm-gasnetrun_ibv *
>>CHPL_TIMERS: generic
>>CHPL_MEM: dlmalloc
>>CHPL_MAKE: gmake
>>CHPL_ATOMICS: intrinsics
>>  CHPL_NETWORK_ATOMICS: none
>>CHPL_GMP: gmp
>>CHPL_HWLOC: none
>>CHPL_REGEXP: re2
>>CHPL_WIDE_POINTERS: struct
>>CHPL_LLVM: none
>>CHPL_AUX_FILESYS: none
>>
>>I set TARGET_ARCH to k8 (though it is an opteron) because of this:
>>https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/i386-and-x86-64-Options.html
>>‘k8’ ‘opteron’ ‘athlon64’ ‘athlon-fx’
>>Processors based on the AMD K8 core
>>with x86-64 instruction set support, including the AMD Opteron, Athlon
>>64, and Athlon 64 FX processors. (This supersets MMX, SSE, SSE2, 3DNow!,
>>enhanced 3DNow!
>> and 64-bit instruction set extensions.)
>>
>>and the cpu spec:
>>$ lscpu
>>Architecture: x86_64
>>CPU op-mode(s): 32-bit, 64-bit
>>Byte Order: Little Endian
>>CPU(s): 8
>>On-line CPU(s) list: 0-7
>>Thread(s) per core: 2
>>Core(s) per socket: 4
>>Socket(s): 1
>>NUMA node(s): 2
>>Vendor ID: AuthenticAMD
>>CPU family: 21
>>Model: 2
>>Stepping: 0
>>CPU MHz: 1400.000
>>BogoMIPS: 5600.37
>>Virtualization: AMD-V
>>L1d cache: 16K
>>L1i cache: 64K
>>L2 cache: 2048K
>>L3 cache: 6144K
>>NUMA node0 CPU(s): 0-3
>>NUMA node1 CPU(s): 4-7
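
The "(not set)" spawner error and Michael's remark about environment variable forwarding both suggest that GASNET_IBV_SPAWNER may not be reaching the processes on the compute nodes. A tiny check run on each node can confirm what those processes actually see.

The sketch below is a hypothetical helper, not part of Chapel or GASNet. It recognises only the ssh and mpi spawners mentioned in this thread, and comparing them case-insensitively is an assumption.

```c
#define _POSIX_C_SOURCE 200809L  /* strcasecmp; also setenv/unsetenv for callers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Returns GASNET_IBV_SPAWNER as this process sees it, or "(not set)"
   when the variable is absent or empty (the wording GASNet used in the
   error quoted earlier in this thread). */
static const char* spawner_setting(void) {
  const char* s = getenv("GASNET_IBV_SPAWNER");
  return (s != NULL && s[0] != '\0') ? s : "(not set)";
}

/* Hypothetical check: recognises only "ssh" and "mpi", ignoring case.
   Other spawners a given GASNet build may accept are not listed here. */
static int spawner_is_known(const char* s) {
  assert(s != NULL);
  return strcasecmp(s, "ssh") == 0 || strcasecmp(s, "mpi") == 0;
}
```

For example, a trivial main() that prints spawner_setting() could be compiled and launched with `srun -N 2` (the program name is up to you). Suppose the nodes print "(not set)" while the variable is exported in the submitting shell. That would suggest it is not being forwarded to the job.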

_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers
