On Fri, Mar 15, 2019 at 5:02 PM Tafelmeier, Stefanie
<stefanie.tafelme...@zae-bayern.de> wrote:
> Hi,
>
> about the tests:
> - ntmpi 1 -ntomp 22 -pin on; doesn't work*

OK, so this suggests that your previously successful 22-thread runs did not
turn on pinning, I assume? Can you please try:
-ntmpi 1 -ntomp 1 -pin on
-ntmpi 1 -ntomp 2 -pin on
that is, to check whether pinning works at all?

Also, please try one or both of the above (assuming they fail) with the same
binary, but in a CPU-only run, i.e.
-ntmpi 1 -ntomp 1 -pin on -nb cpu

> - ntmpi 1 -ntomp 22 -pin off; runs
> - ntmpi 1 -ntomp 23 -pin off; runs
> - ntmpi 1 -ntomp 23 -pinstride 1 -pin on; doesn't work*
> - ntmpi 1 -ntomp 23 -pinstride 2 -pin on; runs
> - ntmpi 23 -ntomp 1 -pinstride 1 -pin on; doesn't work**
> - ntmpi 23 -ntomp 1 -pinstride 2 -pin on; doesn't work**

Just to confirm, can you please rerun the **-marked cases with -ntmpi 24
(to avoid the DD error)?

> *Error as known.
>
> **The number of ranks you selected (23) contains a large prime factor 23. In
> most cases this will lead to bad performance. Choose a number with smaller
> prime factors or set the decomposition (option -dd) manually.
>
> The log file is at:
> https://it-service.zae-bayern.de/Team/index.php/s/fypKB9iZJz8yXq8

Will have a look and get back with more later.

> Many thanks again,
> Steffi
>
> -----Original Message-----
> From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On behalf of Szilárd Páll
> Sent: Friday, 15 March 2019 16:27
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
>
> Hi,
>
> Please share log files via an external service; attachments are not
> accepted on the list.
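The checks requested above boil down to a handful of mdrun invocations. The sketch below only prints them, since `gmx` being on PATH and a prepared `topol.tpr` input are assumptions about the reader's setup, not something given in the thread:

```shell
# Print (rather than run) the pinning checks requested above.
# "gmx" and "topol.tpr" are assumed; pipe the output to "sh" once both exist.
print_pin_checks() {
    for opts in \
        "-ntmpi 1 -ntomp 1 -pin on" \
        "-ntmpi 1 -ntomp 2 -pin on" \
        "-ntmpi 1 -ntomp 1 -pin on -nb cpu" \
        "-ntmpi 24 -ntomp 1 -pinstride 1 -pin on" \
        "-ntmpi 24 -ntomp 1 -pinstride 2 -pin on"
    do
        # short runs suffice: we only need to see whether mdrun starts at all
        echo "gmx mdrun -s topol.tpr -nsteps 1000 $opts"
    done
}
print_pin_checks
```

Piping the output to `sh` executes the five cases in order; any case that aborts immediately is the interesting one.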
> Also, when checking the error with the patch supplied, please run the
> following cases -- no long runs are needed, I just want to know which of
> these runs and which doesn't:
> - ntmpi 1 -ntomp 22 -pin on
> - ntmpi 1 -ntomp 22 -pin off
> - ntmpi 1 -ntomp 23 -pin off
> - ntmpi 1 -ntomp 23 -pinstride 1 -pin on
> - ntmpi 1 -ntomp 23 -pinstride 2 -pin on
> - ntmpi 23 -ntomp 1 -pinstride 1 -pin on
> - ntmpi 23 -ntomp 1 -pinstride 2 -pin on
>
> Thanks,
> --
> Szilárd
>
> On Fri, Mar 15, 2019 at 4:04 PM Tafelmeier, Stefanie
> <stefanie.tafelme...@zae-bayern.de> wrote:
>
> > Hi Szilárd,
> >
> > thanks for the quick reply.
> > About the first suggestion, I'll try and give feedback soon.
> >
> > Regarding the second, I attached the log file for the case of
> > mdrun -v -nt 25
> > which ends in the known error message.
> >
> > Again, thanks a lot for your information and help.
> >
> > Best wishes,
> > Steffi
> >
> > -----Original Message-----
> > From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> > gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On behalf of
> > Szilárd Páll
> > Sent: Friday, 15 March 2019 15:30
> > To: Discussion list for GROMACS users
> > Subject: Re: [gmx-users] WG: WG: Issue with CUDA and gromacs
> >
> > Hi Stefanie,
> >
> > Unless and until the error and performance-related concerns prove to be
> > related, let's keep those separate.
> >
> > I'd first focus on the former. To be honest, I've never encountered an
> > issue where, if you use more than a certain number of threads, the run
> > aborts with that error. To investigate further, can you please apply the
> > following patch file, which will hopefully give more context to the error:
> > https://termbin.com/uhgp
> > (e.g. you can execute the following to accomplish that:
> > curl https://termbin.com/uhgp > devicebuffer.cuh.patch && patch -p0 <
> > devicebuffer.cuh.patch)
> >
> > Regarding the performance-related questions, can you please share a full
> > log file of the runs so we can see the machine config, simulation
> > system/settings, etc.? Without that it is hard to judge what's best for
> > your case. However, if you only have a single GPU (which seems to be the
> > case based on the log excerpts) along with those two rather beefy CPUs,
> > then you will likely not get much benefit from using all cores, and it is
> > normal that you see little to no improvement from using the cores of a
> > second CPU socket.
> >
> > Cheers,
> > --
> > Szilárd
> >
> > On Thu, Mar 14, 2019 at 12:47 PM Tafelmeier, Stefanie
> > <stefanie.tafelme...@zae-bayern.de> wrote:
> >
> > > Dear all,
> > >
> > > I was not sure whether my earlier email reached you, but again many
> > > thanks for your reply, Szilárd.
> > >
> > > As written below, we are still facing a problem with the performance
> > > of our workstation.
> > > I wrote before because of the error message that keeps occurring for
> > > mdrun simulations:
> > >
> > > Assertion failed:
> > > Condition: stat == cudaSuccess
> > > Asynchronous H2D copy failed
> > >
> > > As I mentioned, all installed versions (GROMACS, CUDA, nvcc, gcc) are
> > > now the newest ones.
> > >
> > > If I run mdrun without further settings, it leads to this error
> > > message. If I choose the thread count directly, mdrun performs well,
> > > but only for -nt numbers between 1 and 22. Higher ones again lead to
> > > the error message mentioned before.
> > >
> > > In order to investigate in more detail, I tried different settings for
> > > -nt, -ntmpi and -ntomp, also combined with -npme:
> > > - The best performance in the sense of ns/day is with -nt 22
> > > (respectively -ntomp 22) alone. But then only 22 threads are involved.
> > > This is fine if I run more than one mdrun simultaneously, as I can
> > > distribute the other 66 threads. The GPU usage is then around 65%.
> > > - A similarly good performance is reached with mdrun -ntmpi 4 -ntomp
> > > 18 -npme 1 -pme gpu -nb gpu. But then 44 threads are involved. The
> > > GPU usage is then around 50%.
> > >
> > > I read the information on
> > > http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-performance.html
> > > which was very helpful, but some things are still not clear to me:
> > > Is there any other way to enhance the performance? And what is the
> > > reason that the -nt maximum is at 22 threads? Could this be connected
> > > to the sockets (see details below) of our workstation?
> > > It is also not clear to me how a number of threads (-nt) higher than
> > > 22 can lead to the error regarding the asynchronous H2D copy.
> > >
> > > Please excuse all these questions. I would appreciate it a lot if you
> > > might have a hint for this problem as well.
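The socket question above can be illustrated with a toy model of pinning arithmetic: hardware threads enumerated in topology order (socket 0 first; on this machine 22 cores x 2 hardware threads = slots 0-43 per socket), with software thread i placed on slot i*stride. This is only an illustration under those assumptions, not GROMACS's actual pinning code, and `map_pins` is a made-up helper:

```shell
# Toy model of the pinning arithmetic (an illustration, not GROMACS code):
# hw threads in topology order, socket 0 occupying slots 0-43; with
# "-pinstride S" the i-th software thread goes to slot i*S.
map_pins() { # usage: map_pins <nthreads> <stride>
    i=0
    while [ "$i" -lt "$1" ]; do
        slot=$((i * $2))
        echo "thread $i -> hw slot $slot (socket $((slot / 44)))"
        i=$((i + 1))
    done
}
map_pins 23 2   # thread 22 lands on slot 44, the first slot on socket 1
```

Under this model, 22 threads with stride 2 all stay on socket 0 and the 23rd is the first to cross onto socket 1, which lines up with runs working only up to -nt 22; whether that crossing actually triggers the H2D error is exactly what the thread is trying to establish.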
> > > Best regards,
> > > Steffi
> > >
> > > -----
> > >
> > > The workstation details are:
> > > Running on 1 node with total 44 cores, 88 logical cores, 1 compatible GPU
> > > Hardware detected:
> > >   CPU info:
> > >     Vendor: Intel
> > >     Brand:  Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz
> > >     Family: 6   Model: 85   Stepping: 4
> > >     Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl
> > >       clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr
> > >       nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp
> > >       rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > >     Number of AVX-512 FMA units: 2
> > >   Hardware topology: Basic
> > >     Sockets, cores, and logical processors:
> > >       Socket 0: [ 0 44] [ 1 45] [ 2 46] [ 3 47] [ 4 48] [ 5 49]
> > >         [ 6 50] [ 7 51] [ 8 52] [ 9 53] [10 54] [11 55] [12 56]
> > >         [13 57] [14 58] [15 59] [16 60] [17 61] [18 62] [19 63]
> > >         [20 64] [21 65]
> > >       Socket 1: [22 66] [23 67] [24 68] [25 69] [26 70] [27 71]
> > >         [28 72] [29 73] [30 74] [31 75] [32 76] [33 77] [34 78]
> > >         [35 79] [36 80] [37 81] [38 82] [39 83] [40 84] [41 85]
> > >         [42 86] [43 87]
> > >   GPU info:
> > >     Number of GPUs detected: 1
> > >     #0: NVIDIA Quadro P6000, compute cap.: 6.1, ECC: no, stat: compatible
> > >
> > > -----
> > >
> > > -----Original Message-----
> > > From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se [mailto:
> > > gromacs.org_gmx-users-boun...@maillist.sys.kth.se] On behalf of
> > > Szilárd Páll
> > > Sent: Thursday, 31 January 2019 17:15
> > > To: Discussion list for GROMACS users
> > > Subject: Re: [gmx-users] WG: Issue with CUDA and gromacs
> > >
> > > On Thu, Jan 31, 2019 at 2:14 PM Szilárd Páll <pall.szil...@gmail.com>
> > > wrote:
> > > >
> > > > On Wed, Jan 30, 2019 at 5:15 PM Tafelmeier, Stefanie
> > > > <stefanie.tafelme...@zae-bayern.de> wrote:
> > > > >
> > > > > Dear all,
> > > > >
> > > > > We are facing an issue with the CUDA toolkit.
> > > > > We tried several combinations of GROMACS versions and CUDA
> > > > > toolkits. No toolkit older than 9.2 was possible to try, as there
> > > > > are no NVIDIA drivers available for a Quadro P6000.
> > > >
> > > > Install the latest 410.xx drivers and it will work; the NVIDIA driver
> > > > download website (https://www.nvidia.com/Download/index.aspx)
> > > > recommends 410.93.
> > > >
> > > > Here's a system with a CUDA 10-compatible driver running on a system
> > > > with a P6000: https://termbin.com/ofzo
> > >
> > > Sorry, I misread that as "CUDA >=9.2 was not possible".
> > >
> > > Note that the driver is backward compatible, so you can use a new
> > > driver with older CUDA versions.
> > >
> > > Also note that the oldest driver NVIDIA claims to have P6000 support
> > > for is 390.59, which is, as far as I know, one generation older than
> > > the 396 that the CUDA 9.2 toolkit came with. This is, however, not
> > > something I'd recommend pursuing; use a new driver from the official
> > > site with any CUDA version that GROMACS supports and it should be fine.
> > >
> > > > > Gromacs   CUDA   Error message
> > > > >
> > > > > 2019      10.0   gmx mdrun:
> > > > >                    Assertion failed:
> > > > >                    Condition: stat == cudaSuccess
> > > > >                    Asynchronous H2D copy failed
> > > > >
> > > > > 2019      9.2    gmx mdrun:
> > > > >                    Assertion failed:
> > > > >                    Condition: stat == cudaSuccess
> > > > >                    Asynchronous H2D copy failed
> > > > >
> > > > > 2018.5    9.2    gmx mdrun: Fatal error:
> > > > >                    HtoD cudaMemcpyAsync failed: invalid argument
> > > >
> > > > Can we get some more details on these, please? Complete log files
> > > > would be a good start.
> > > >
> > > > > 5.1.5     9.2    Installation make: nvcc fatal : Unsupported gpu
> > > > >                    architecture 'compute_20'*
> > > > >
> > > > > 2016.2    9.2    Installation make: nvcc fatal : Unsupported gpu
> > > > >                    architecture 'compute_20'*
> > > > >
> > > > > *We also tried to set the target CUDA architectures as described
> > > > > in the installation guide
> > > > > (manual.gromacs.org/documentation/2019/install-guide/index.html).
> > > > > Unfortunately it didn't work.
> > > >
> > > > What does it mean that it didn't work? Can you share the command you
> > > > used and what exactly did not work?
> > > >
> > > > For the P6000, which is a "compute capability 6.1" device (for anyone
> > > > who needs to look it up, go here:
> > > > https://developer.nvidia.com/cuda-gpus), you should set
> > > > cmake ../ -DGMX_CUDA_TARGET_SM="61"
> > > >
> > > > --
> > > > Szilárd
> > > >
> > > > > Performing simulations on CPU only always works, yet they are of
> > > > > course slower than they could be with the GPU additionally in use.
> > > > > Issue #2762 (https://redmine.gromacs.org/issues/2762) seems
> > > > > similar to our problem.
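The -DGMX_CUDA_TARGET_SM="61" flag above slots into an otherwise ordinary GROMACS 2019 CUDA build. A sketch of the full configure step, in which the source path, CUDA location, and install prefix are placeholders to adjust:

```shell
# Sketch of a GROMACS 2019 build targeting the P6000 (compute capability 6.1).
# Source path, CUDA toolkit location, and install prefix are placeholders.
cd gromacs-2019 && mkdir -p build && cd build
cmake .. \
    -DGMX_GPU=ON \
    -DGMX_CUDA_TARGET_SM="61" \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
    -DCMAKE_INSTALL_PREFIX="$HOME/opt/gromacs-2019"
make -j"$(nproc)" && make install
```

Restricting the build to SM 6.1 avoids generating code for architectures the toolkit no longer supports (the 'compute_20' failures above came from old GROMACS versions defaulting to targets CUDA 9.2 had dropped).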
> > > > > Even though this issue is still open, we wanted to ask whether
> > > > > you can give us any information about how to solve this problem.
> > > > >
> > > > > Many thanks in advance.
> > > > > Best regards,
> > > > > Stefanie Tafelmeier
> > > > >
> > > > > Further details if necessary:
> > > > > The workstation:
> > > > > 2 x Xeon Gold 6152 @ 3.7 GHz (22 cores, 44 threads, AVX512)
> > > > > NVIDIA Quadro P6000 with 3840 CUDA cores
> > > > >
> > > > > The simulation system:
> > > > > Long-chain alkanes (previously used with GROMACS 5.1.5 and
> > > > > CUDA 7.5; worked perfectly)
> > > > >
> > > > > ZAE Bayern
> > > > > Stefanie Tafelmeier
> > > > > Division Energy Storage
> > > > > Thermal Energy Storage
> > > > > Walther-Meißner-Str. 6
> > > > > 85748 Garching
> > > > >
> > > > > Tel.: +49 89 329442-75
> > > > > Fax: +49 89 329442-12
> > > > > stefanie.tafelme...@zae-bayern.de
> > > > > http://www.zae-bayern.de
> > > > >
> > > > > ZAE Bayern - Bayerisches Zentrum für Angewandte Energieforschung e.V.
> > > > > Board: Prof. Dr. Hartmut Spliethoff (Chairman),
> > > > > Prof. Dr. Vladimir Dyakonov
> > > > > Registered Office: Würzburg
> > > > > Register Court: Amtsgericht Würzburg
> > > > > Register Number: VR 1386
> > > > >
> > > > > Any declarations of intent, such as quotations, orders,
> > > > > applications and contracts, are legally binding for ZAE Bayern
> > > > > only if expressed in a written and duly signed form. This e-mail
> > > > > is intended solely for use by the recipient(s) named above. Any
> > > > > unauthorised disclosure, use or dissemination, whether in whole or
> > > > > in part, is prohibited. If you have received this e-mail in error,
> > > > > please notify the sender immediately and delete this e-mail.
> > > > >
> > > > > --
> > > > > Gromacs Users mailing list
> > > > >
> > > > > * Please search the archive at
> > > > > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> > > > > posting!
> > > > >
> > > > > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > > > >
> > > > > * For (un)subscribe requests visit
> > > > > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users
> > > > > or send a mail to gmx-users-requ...@gromacs.org.