Hi,

This is an issue I noticed recently. At first I thought it was only affecting some use cases (or some runtimes), but it seems to be a broader problem. It is under investigation, but for now it appears you can eliminate it (or strongly diminish its effects) by turning off GPU-side task timing. You can do that by setting the GMX_DISABLE_GPU_TIMING environment variable.
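For example, a minimal sketch of applying the workaround together with the rank/thread layout suggested below (the tpr/checkpoint file names are taken from your reported command line; -ntmpi and -ntomp are the standard mdrun options for thread-MPI ranks and OpenMP threads):

```shell
# Workaround: disable GPU-side task timing for this run
export GMX_DISABLE_GPU_TIMING=1

# Suggested layout: 6 thread-MPI ranks x 2 OpenMP threads each,
# so no rank spans both 6-core sockets and GPU transfers can overlap
gmx mdrun -v -ntmpi 6 -ntomp 2 -s blah.tpr -deffnm blah -cpi blah.cpt
```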
Note that this is a workaround that may turn out not to be a complete solution, so please report back once you have done longer runs.

Regarding the thread count: the MPI and CUDA runtimes can spawn threads of their own; GROMACS itself certainly used 3 x 4 threads in your case. Note that you will likely get better performance with 6 ranks x 2 threads, both because this avoids ranks spanning sockets and because it allows GPU task/transfer overlap.

Cheers,
--
Szilárd

On Tue, Mar 27, 2018 at 4:09 PM, Albert Mao <albert....@gmail.com> wrote:
> Hello!
>
> I'm trying to run molecular dynamics on a fairly large system
> containing approximately 250000 atoms. The simulation runs well for
> about 100000 steps and then gets killed by the queueing engine due to
> exceeding the swap space usage limit. The compute node I'm using has
> 12 cores in two sockets, three GPUs, and 8 GB of memory. I'm using
> GROMACS 2018 and allowing mdrun to delegate the workload
> automatically, resulting in three thread-MPI ranks each with one GPU
> and four OpenMP threads. The queueing engine reports the following
> usage:
>
> TERM_SWAP: job killed after reaching LSF swap usage limit.
> Exited with exit code 131.
> Resource usage summary:
>     CPU time      : 50123.00 sec.
>     Max Memory    : 4671 MB
>     Max Swap      : 30020 MB
>     Max Processes : 5
>     Max Threads   : 35
>
> Even though it's a large system, by my rough estimate, the simulation
> should not need much more than 0.5 gigabytes of memory; 4.6 GB seems
> like too much and 30 GB is completely ridiculous.
> Indeed, running the system on a similar node without GPUs is working
> well (but slowly), consuming about 0.65 GB and 2 GB of swap.
>
> I also don't understand why 35 threads got created.
>
> Could there be a memory leak somewhere in the OpenCL code? Any
> suggestions on preventing this memory usage expansion would be greatly
> appreciated.
>
> I've included relevant output from mdrun with system and configuration
> information at the end of this message.
> I'm using OpenCL despite having Nvidia GPUs because of a sad problem
> where building with CUDA support fails due to the C compiler being
> "too new".
>
> Thanks!
> -Albert Mao
>
> GROMACS:      gmx mdrun, version 2018
> Executable:   /data/albertmaolab/software/gromacs/bin/gmx
> Data prefix:  /data/albertmaolab/software/gromacs
> Command line:
>   gmx mdrun -v -pforce 10000 -s blah.tpr -deffnm blah -cpi blah.cpt
>
> GROMACS version:    2018
> Precision:          single
> Memory model:       64 bit
> MPI library:        thread_mpi
> OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
> GPU support:        OpenCL
> SIMD instructions:  SSE4.1
> FFT library:        fftw-3.2.1
> RDTSCP usage:       disabled
> TNG support:        enabled
> Hwloc support:      hwloc-1.5.0
> Tracing support:    disabled
> Built on:           2018-02-22 07:25:43
> Built by:           ah...@eris1pm01.research.partners.org [CMAKE]
> Build OS/arch:      Linux 2.6.32-431.29.2.el6.x86_64 x86_64
> Build CPU vendor:   Intel
> Build CPU brand:    Common KVM processor
> Build CPU family:   15   Model: 6   Stepping: 1
> Build CPU features: aes apic clfsh cmov cx8 cx16 intel lahf mmx msr
>   nonstop_tsc pcid pclmuldq pdpe1gb popcnt pse sse2 sse3 sse4.1 sse4.2
>   ssse3
> C compiler:         /data/albertmaolab/software/gcc/bin/gcc GNU 7.3.0
> C compiler flags:   -msse4.1 -O3 -DNDEBUG -funroll-all-loops
>   -fexcess-precision=fast
> C++ compiler:       /data/albertmaolab/software/gcc/bin/g++ GNU 7.3.0
> C++ compiler flags: -msse4.1 -std=c++11 -O3 -DNDEBUG
>   -funroll-all-loops -fexcess-precision=fast
> OpenCL include dir: /apps/lib-osver/cuda/8.0.61/include
> OpenCL library:     /apps/lib-osver/cuda/8.0.61/lib64/libOpenCL.so
> OpenCL version:     1.2
>
> Running on 1 node with total 12 cores, 12 logical cores, 3 compatible GPUs
> Hardware detected:
>   CPU info:
>     Vendor: Intel
>     Brand:  Intel(R) Xeon(R) CPU X5675 @ 3.07GHz
>     Family: 6   Model: 44   Stepping: 2
>     Features: aes apic clfsh cmov cx8 cx16 htt intel lahf mmx msr
>       nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3
>       sse4.1 sse4.2 ssse3
>   Hardware topology: Full, with devices
>     Sockets, cores, and logical processors:
>       Socket 0: [ 0] [ 2] [ 4] [ 6] [ 8] [ 10]
>       Socket 1: [ 1] [ 3] [ 5] [ 7] [ 9] [ 11]
>     Numa nodes:
>       Node 0 (25759080448 bytes mem): 0 2 4 6 8 10
>       Node 1 (25769799680 bytes mem): 1 3 5 7 9 11
>       Latency:
>                0     1
>          0  1.00  2.00
>          1  2.00  1.00
>     Caches:
>       L1: 32768 bytes, linesize 64 bytes, assoc. 8, shared 1 ways
>       L2: 262144 bytes, linesize 64 bytes, assoc. 8, shared 1 ways
>       L3: 12582912 bytes, linesize 64 bytes, assoc. 16, shared 6 ways
>     PCI devices:
>       0000:04:00.0  Id: 8086:10c9  Class: 0x0200  Numa: -1
>       0000:04:00.1  Id: 8086:10c9  Class: 0x0200  Numa: -1
>       0000:05:00.0  Id: 15b3:6746  Class: 0x0280  Numa: -1
>       0000:06:00.0  Id: 10de:06d2  Class: 0x0302  Numa: -1
>       0000:01:03.0  Id: 1002:515e  Class: 0x0300  Numa: -1
>       0000:00:1f.2  Id: 8086:3a20  Class: 0x0101  Numa: -1
>       0000:00:1f.5  Id: 8086:3a26  Class: 0x0101  Numa: -1
>       0000:14:00.0  Id: 10de:06d2  Class: 0x0302  Numa: -1
>       0000:11:00.0  Id: 10de:06d2  Class: 0x0302  Numa: -1
>   GPU info:
>     Number of GPUs detected: 3
>     #0: name: Tesla M2070, vendor: NVIDIA Corporation, device version:
>       OpenCL 1.1 CUDA, stat: compatible
>     #1: name: Tesla M2070, vendor: NVIDIA Corporation, device version:
>       OpenCL 1.1 CUDA, stat: compatible
>     #2: name: Tesla M2070, vendor: NVIDIA Corporation, device version:
>       OpenCL 1.1 CUDA, stat: compatible
>
> (later)
>
> Using 3 MPI threads
> Using 4 OpenMP threads per tMPI thread
> On host gpu004.research.partners.org 3 GPUs auto-selected for this run.
> Mapping of GPU IDs to the 3 GPU tasks in the 3 ranks on this node:
>   PP:0,PP:1,PP:2
> Pinning threads with an auto-selected logical core stride of 1
> System total charge: 0.000
> Will do PME sum in reciprocal space for electrostatic interactions.
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a
> mail to gmx-users-requ...@gromacs.org.