Was this "illegal mem access" with namd12 resolved?

ISSUE

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu]
On behalf of Norman Geist
Sent: Friday, 10 March 2017 10:16

Somehow I have a lot of problems with the NAMD-2.12 version. All CUDA jobs
will:
1. Immediately fail for SMP single process runs when having more than 1
thread via ++ppn:
FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
src/CudaTileListKernel.cu, function sortTileLists
on Pe 4 (gpu5 device 1): an illegal memory access was encountered
------------- Processor 4 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
src/CudaTileListKernel.cu, function sortTileLists
on Pe 4 (gpu5 device 1): an illegal memory access was encountered
This happens for my own compiled versions (CUDA-7.5) as well as for the
precompiled multicore version (CUDA-6.5).

*From:* Ajasja Ljubetič (ajasja.ljubetic_at_gmail.com)
*Date:* Fri Mar 10 2017 - 05:14:10 CST
Are you sure your graphics card is OK?  Have you tried any of the available
memory checkers?

*From:* Norman Geist (norman.geist_at_uni-greifswald.de)
*Date:* Fri Mar 10 2017 - 05:41:10 CST
Yes, since it works with gromacs, cp2k and namd versions < 2.12. Maybe I
should also mention that I’m using amber FF and files.

..............................
Actually, as far as I understand, "illegal mem access" is a software
problem, not a hardware one.
What could I do? Perhaps run something other than NAMD on the GPUs, maybe a
game, to see whether the cards themselves behave?
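
One way to back up the software-rather-than-hardware reading is to check whether every failing GPU reports the same kernel location. Below is a minimal Python sketch, assuming the log lines look like the ones quoted in this thread (wrapped by the mailer and possibly prefixed with "> "). The names `normalize` and `summarize_cuda_errors` are made up for illustration and are not part of NAMD.

```python
import re
from collections import Counter

# Pattern for the NAMD CUDA failure lines quoted in this thread.
# ", line N" and " pci ..." are optional because the 2017 log lacks them.
ERROR_RE = re.compile(
    r"FATAL ERROR: CUDA error (?P<call>\S+) in file (?P<file>\S+), "
    r"function (?P<function>\w+)(?:, line (?P<line>\d+))? "
    r"on Pe (?P<pe>\d+) \((?P<host>\S+) device (?P<device>\d+)"
    r"(?: pci (?P<pci>[^)]+))?\): (?P<message>.+?) was encountered"
)


def normalize(log_text):
    """Strip e-mail quote markers and rejoin lines wrapped by the mailer."""
    unquoted = re.sub(r"^[>\s]+", "", log_text, flags=re.MULTILINE)
    return " ".join(unquoted.split())


def summarize_cuda_errors(log_text):
    """Count failures per (file, function, line, device) site."""
    sites = Counter()
    for m in ERROR_RE.finditer(normalize(log_text)):
        sites[(m["file"], m["function"], m["line"], int(m["device"]))] += 1
    return sites


# Excerpt copied from the error output posted in this thread.
sample = """\
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
> src/CudaTileListKernel.cu, function sortTileLists, line 1577
>  on Pe 2 (gig64 device 0 pci 0:2:0): an illegal memory access was
> encountered
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
> src/CudaTileListKernel.cu, function sortTileLists, line 1577
>  on Pe 4 (gig64 device 1 pci 0:3:0): an illegal memory access was
> encountered
"""

for site, count in sorted(summarize_cuda_errors(sample).items()):
    print(site, count)
```

If both cards fail at the same file, function and line, as they do in the excerpt, a single faulty board looks less likely than a problem in the build or driver stack, although this alone does not prove it.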

Thanks for advice
francesco


---------- Forwarded message ---------
From: Francesco Pietra <chiendar...@gmail.com>
Date: Mon, Jan 17, 2022 at 3:50 PM
Subject: Re: namd-l: Fwd: nvidia issue with namd12 Debian 11
To: Vermaas, Josh <verma...@msu.edu>
Cc: nam...@ks.uiuc.edu <nam...@ks.uiuc.edu>, debian-users <
debian-user@lists.debian.org>


Hi Josh, no big system:
Info) Analyzing structure ...
Info)    Atoms: 107292
Info)    Bonds: 77829
Info)    Angles: 61441  Dihedrals: 46455  Impropers: 1604  Cross-terms: 158
Info)    Bondtypes: 0  Angletypes: 0  Dihedraltypes: 0  Impropertypes: 0
Info)    Residues: 31152
Info)    Waters: 30102
Info)    Segments: 128
Info)    Fragments: 30587   Protein: 9   Nucleic: 25

Following your hint, I tried MD with a very small system:

Info) Analyzing structure ...
Info)    Atoms: 1448
Info)    Bonds: 1187
Info)    Angles: 1618  Dihedrals: 699  Impropers: 0  Cross-terms: 0
Info)    Bondtypes: 0  Angletypes: 0  Dihedraltypes: 0  Impropertypes: 0
Info)    Residues: 261
Info)    Waters: 0
Info)    Segments: 33
Info)    Fragments: 261   Protein: 0   Nucleic: 0

Exactly the same error messages that I reported for the bigger system. So,
it is not a problem of insufficient mem on the GTX.
My very feeble guess is that there is a mismatch between the linux kernel
and the nvidia driver, but both were selected by Debian, so other people
should have met the issue. I am not sure that Debian 11 could work
correctly with a downgraded linux kernel/nvidia driver pair. Perhaps
it would be easier to downgrade to Debian 10, which worked correctly on my
raid1 box.
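
To check the memory question directly rather than inferring it from system size, one could read GPU memory use while NAMD starts up. A minimal Python sketch follows, assuming `nvidia-smi` is on the PATH and accepts the `--query-gpu` and `--format=csv,noheader,nounits` options. The helper names are hypothetical, and the example row values are illustrative, not measured on this box.

```python
import subprocess

# nvidia-smi query; with "nounits" the memory fields are plain MiB numbers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn nvidia-smi CSV rows into dicts; memory values are in MiB."""
    gpus = []
    for row in csv_text.strip().splitlines():
        index, name, used, total = [field.strip() for field in row.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "used_mib": int(used),
            "total_mib": int(total),
        })
    return gpus


def query_gpu_memory():
    """Run nvidia-smi once and parse its output (needs a working driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(out.stdout)


# Illustrative rows only, NOT measured on the machine discussed here.
example = "0, GeForce GTX 680, 310, 2000\n1, GeForce GTX 680, 12, 2000\n"

for gpu in parse_gpu_memory(example):
    free = gpu["total_mib"] - gpu["used_mib"]
    print(f"GPU {gpu['index']} ({gpu['name']}): {free} MiB free")
```

Calling `query_gpu_memory()` in a loop while the minimization starts would show whether usage ever approaches the card's total before the crash. If it stays far below, that agrees with the small-system test above.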

thanks
francesco

Incidentally, I said namd12, while it is 14.

On Mon, Jan 17, 2022 at 1:24 PM Vermaas, Josh <verma...@msu.edu> wrote:

> How big is your system? The error being tossed back is that you are out of
> memory. The GTX 680 only has 2GB of memory, and so depending on your system
> size you may run yourself out of memory.
>
>
>
> -Josh
>
>
>
> *From: *<owner-nam...@ks.uiuc.edu> on behalf of Francesco Pietra <
> chiendar...@gmail.com>
> *Reply-To: *"nam...@ks.uiuc.edu" <nam...@ks.uiuc.edu>, Francesco Pietra <
> chiendar...@gmail.com>
> *Date: *Monday, January 17, 2022 at 4:40 AM
> *To: *NAMD <nam...@ks.uiuc.edu>, debian-users <
> debian-user@lists.debian.org>
> *Subject: *namd-l: Fwd: nvidia issue with namd12 Debian 11
>
>
>
> I forgot to add that the commands 'nvidia-detect' and 'nvidia-smi' both
> detect the two GTX 680 cards as activated and report that they are
> supported by all driver versions, including those for Tesla 450.
>
> Actually, legacy nvidia drivers are only required for very old nvidia
> graphic cards, from 400 downwards.
>
>
>
> I also add that the box is at CUDA 11.2
>
>
>
> ---------- Forwarded message ---------
> From: *Francesco Pietra* <chiendar...@gmail.com>
> Date: Mon, Jan 17, 2022 at 4:15 AM
> Subject: nvidia issue with namd12 Debian 11
> To: NAMD <nam...@ks.uiuc.edu>, debian-users <debian-user@lists.debian.org>
>
>
>
> With a Debian 11 box with two GTX 680 I am unable to get them working. The
> problem occurred on upgrading from Debian 10 to 11 and from namd 11 to
> 12 (/NAMD_Git-2021-11-27_Linux-x86_64-multicore-CUDA)
>
>
>
> nvidia-driver 460.91.03-1
>
> linux-image-amd64 5.10.84-1
>
> linux kernel 5.10.0-10-amd64
>
>
>
> Error when trying a minimization:
>
>
>
> TCL: Minimizing for 3000 steps
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
> src/CudaTileListKernel.cu, function sortTileLists, line 1577
>  on Pe 2 (gig64 device 0 pci 0:2:0): an illegal memory access was
> encountered
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
> src/CudaTileListKernel.cu, function sortTileLists, line 1577
>  on Pe 2 (gig64 device 0 pci 0:2:0): an illegal memory access was
> encountered
> [Partition 0][Node 0] End of program
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
> src/CudaTileListKernel.cu, function sortTileLists, line 1577
>  on Pe 4 (gig64 device 1 pci 0:3:0): an illegal memory access was
> encountered
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
> src/CudaTileListKernel.cu, function sortTileLists, line 1577
>  on Pe 4 (gig64 device 1 pci 0:3:0): an illegal memory access was
> encountered
>
>
>
> I have also reconfigured the xserver, to no avail.
>
>
>
> I have noticed issues about namd12/nvidia on the web, apparently
> unresolved.
>
>
>
> Thanks for advice
>
> francesco pietra
>
>
>
>
>
