Glad to hear you fixed it. I guess that means you can use the normal "non-development" driver to run NAMD? That is good news.
On Thu, Sep 27, 2012 at 1:39 PM, Francesco Pietra <chiendar...@gmail.com> wrote:
> SOLVED. Although not revealed by the tests "dpkg -l | grep nvidia" and
> "modinfo nvidia", there was a mismatch between the runtime and the
> driver. When this became clear, a new "apt-get upgrade" installed a
> mixture of versions 302 and 304, creating a mess. I had to correct this
> manually by installing the specific version 304 for all packages with
> "apt-get install <package>=<version>".
>
> In the face of so much time lost on trivial problems - and posted to the
> NAMD list as non-existent NAMD problems (I apologize for that) - I now
> think it would be better (at least for people using the OS for
> scientific purposes) to install the driver the "nvidia way" rather than
> the "Debian way", so that it stays fixed when upgrading Debian. As I am
> presently short of time, I have decided not to upgrade Debian again
> until I have enough free time to change the "way".
>
> Thanks
> francesco Pietra
>
> On Thu, Sep 27, 2012 at 3:54 PM, Francesco Pietra <chiendar...@gmail.com> wrote:
> > There are for me two ways of getting CUDA to work: (a) install the
> > driver according to nvidia (as is probably implied in what you
> > suggested); (b) rely on Debian amd64, which furnishes a precompiled
> > nvidia driver. I adopted (b) because upgrading is automatic and Debian
> > is notoriously highly reliable.
> >
> > I did not take note of the CUDA driver I had just before the "fatal"
> > upgrade, but it had been months since I last upgraded. The version
> > noted on my amd64 notebook is 295.53; probably I upgraded from that
> > version.
> >
> > Now, on amd64, version 304.48.1 is available, while on my system
> > version 302.17-3 is installed, along with the basic
> > nvidia-kernel-dkms, as I posted initially. All under cuda-toolkit
> > version 4 (although this is not used in the "Debian way" of my
> > installation).
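The manual fix described above, pinning every nvidia package to a single version with "apt-get install <package>=<version>", can be scripted rather than typed per package. A minimal sketch, assuming the Debian wheezy package names that appear in the dpkg listing later in this thread (the list is illustrative, not exhaustive):

```shell
# Build (but do not run) one apt-get command that pins every listed
# nvidia package to the same version. Review the echoed command, then
# run it as root. The package list below is illustrative for Debian
# wheezy; adjust it to match your own "dpkg -l | grep nvidia" output.
ver="302.17-3"
pkgs="nvidia-glx nvidia-kernel-dkms nvidia-smi libnvidia-ml1 xserver-xorg-video-nvidia"

cmd="apt-get install"
for p in $pkgs; do
    cmd="$cmd $p=$ver"
done
echo "$cmd"
```

Echoing instead of executing avoids touching the system until the pinned versions have been checked against "apt-cache policy <package>".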
> >
> > The output of
> >
> > dpkg -l | grep nvidia
> >
> > modinfo nvidia
> >
> > which I posted initially, indicates, in my experience, that everything
> > is working correctly. On this basis, I suspected that 302.17-3 is too
> > advanced for current NAMD builds, although everything is under toolkit
> > 4 (or the equivalent).
> >
> > I could try to install the 295 driver in place of 302, but probably
> > someone knows better than me what could be expected. Moving forward is
> > easy; going back, with any OS, is a matter for experts.
> >
> > I am not sure that all I said is correct. I am a biochemist, not a
> > software expert.
> >
> > Thanks for your kind attention.
> >
> > francesco pietra
> >
> > On Thu, Sep 27, 2012 at 2:58 PM, Aron Broom <brooms...@gmail.com> wrote:
> >> So one potential problem here: is 302.17 a development driver, or just
> >> the one Linux installs itself from the proprietary drivers? It looks
> >> to me like the absolutely newest development driver is version 295.41.
> >> I'm not confident that you'd be able to run NAMD without the
> >> development driver installed. The installation is manual, and it
> >> should overwrite whatever driver you have there. I recommend a trip to
> >> the CUDA development zone webpage.
> >>
> >> ~Aron
> >>
> >> On Thu, Sep 27, 2012 at 3:52 AM, Francesco Pietra <chiendar...@gmail.com> wrote:
> >>>
> >>> Hello:
> >>> I have tried NAMD_CVS-2012-09-26_Linux-x86_64-multicore-CUDA with
> >>> nvidia version 302.17:
> >>>
> >>> Running command: namd2 heat-01.conf +p6 +idlepoll
> >>>
> >>> Charm++: standalone mode (not using charmrun)
> >>> Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
> >>> CharmLB> Load balancer assumes all CPUs are same.
> >>> Charm++> Running on 1 unique compute nodes (12-way SMP).
> >>> Charm++> cpu topology info is gathered in 0.001 seconds.
> >>> Info: NAMD CVS-2012-09-26 for Linux-x86_64-multicore-CUDA
> >>> Info:
> >>> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
> >>> Info: for updates, documentation, and support information.
> >>> Info:
> >>> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
> >>> Info: in all publications reporting results obtained with NAMD.
> >>> Info:
> >>> Info: Based on Charm++/Converse 60400 for multicore-linux64-iccstatic
> >>> Info: Built Wed Sep 26 02:25:08 CDT 2012 by jim on lisboa.ks.uiuc.edu
> >>> Info: 1 NAMD CVS-2012-09-26 Linux-x86_64-multicore-CUDA 6 gig64 francesco
> >>> Info: Running on 6 processors, 1 nodes, 1 physical nodes.
> >>> Info: CPU topology information available.
> >>> Info: Charm++/Converse parallel runtime startup completed at 0.085423 s
> >>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (gig64):
> >>> initialization error
> >>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 1 (gig64):
> >>> initialization error
> >>> ------------- Processor 3 Exiting: Called CmiAbort ------------
> >>> Reason: FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (gig64):
> >>> initialization error
> >>>
> >>> Program finished.
> >>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 4 (gig64):
> >>> initialization error
> >>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 2 (gig64):
> >>> initialization error
> >>>
> >>>
> >>> As I received (nearly) no comments on these failures, I can only
> >>> imagine that either (i) my question - disregarding obvious issues -
> >>> was too silly to merit attention, or (ii) it is well known that nvidia
> >>> version 302.17 is incompatible with current NAMD builds for GNU/Linux.
> >>>
> >>> At any rate, given the metapackage framework, it is probably
> >>> impossible within Debian GNU/Linux wheezy to go back to a previous
> >>> version of nvidia. On the other hand, the stable version of the OS
> >>> furnishes a much too old version of nvidia.
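A cudaGetDeviceCount "initialization error" of the kind shown in the log above usually originates outside NAMD: missing /dev/nvidia* device nodes, wrong permissions on them, or a kernel-module/userspace mismatch. A minimal, read-only sketch that checks the device nodes (the node paths assume a standard two-GPU Linux layout; the script is harmless to run on machines without NVIDIA hardware):

```shell
# Count the NVIDIA device nodes a CUDA program needs before it can
# even enumerate GPUs. On a healthy two-GPU box this reports 0 missing.
missing=0
for node in /dev/nvidiactl /dev/nvidia0 /dev/nvidia1; do
    if [ -e "$node" ]; then
        ls -l "$node"          # check ownership/permissions by eye
    else
        echo "missing: $node"
        missing=$((missing + 1))
    fi
done
echo "missing nodes: $missing"
```

If nodes are missing, loading the module (modprobe nvidia) or running nvidia-smi once as root normally recreates them.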
> >>> Therefore, my question is:
> >>>
> >>> Is there any chance to compile NAMD against the installed nvidia
> >>> version 302.17?
> >>>
> >>> Thanks for any advice. Without access to namd-cuda I am currently
> >>> unable to answer a question raised by the reviewers of a manuscript
> >>> (the CPU cluster was shut down long ago, as it became too expensive
> >>> for our budget).
> >>>
> >>> francesco pietra
> >>>
> >>> On Wed, Sep 26, 2012 at 4:08 PM, Francesco Pietra <chiendar...@gmail.com> wrote:
> >>> > I forgot to mention that I am at the final version 2.9 of NAMD.
> >>> > f.
> >>> >
> >>> > On Wed, Sep 26, 2012 at 4:05 PM, Aron Broom <brooms...@gmail.com> wrote:
> >>> >> I'm not certain, but I think the driver version needs to match the
> >>> >> CUDA toolkit version that NAMD uses, and I think the library file
> >>> >> NAMD comes with is toolkit 4.0 or something of that sort.
> >>> >>
> >>> >> ~Aron
> >>> >>
> >>> >> On Wed, Sep 26, 2012 at 9:58 AM, Francesco Pietra <chiendar...@gmail.com> wrote:
> >>> >>>
> >>> >>> Hi:
> >>> >>> Following updating/upgrading of Debian GNU/Linux amd64 wheezy,
> >>> >>> minimizations no longer run on the GTX-680:
> >>> >>>
> >>> >>> CUDA error in cudaGetDeviceCount on Pe 3, Pe 4, Pe 6:
> >>> >>> initialization error.
> >>> >>>
> >>> >>> The two GTX cards are regularly activated with
> >>> >>> nvidia-smi -L
> >>> >>> nvidia-smi -pm 1
> >>> >>>
> >>> >>> The X server and nvidia packages are the same version:
> >>> >>>
> >>> >>> francesco@gig64:~$ dpkg -l | grep nvidia
> >>> >>> ii glx-alternative-nvidia 0.2.2 amd64 allows the selection of NVIDIA as GLX provider
> >>> >>> ii libgl1-nvidia-alternatives 302.17-3 amd64 transition libGL.so* diversions to glx-alternative-nvidia
> >>> >>> ii libgl1-nvidia-glx:amd64 302.17-3 amd64 NVIDIA binary OpenGL libraries
> >>> >>> ii libglx-nvidia-alternatives 302.17-3 amd64 transition libgl.so diversions to glx-alternative-nvidia
> >>> >>> ii libnvidia-ml1:amd64 302.17-3 amd64 NVIDIA management library (NVML) runtime library
> >>> >>> ii nvidia-alternative 302.17-3 amd64 allows the selection of NVIDIA as GLX provider
> >>> >>> ii nvidia-glx 302.17-3 amd64 NVIDIA metapackage
> >>> >>> ii nvidia-installer-cleanup 20120630+3 amd64 Cleanup after driver installation with the nvidia-installer
> >>> >>> ii nvidia-kernel-common 20120630+3 amd64 NVIDIA binary kernel module support files
> >>> >>> ii nvidia-kernel-dkms 302.17-3 amd64 NVIDIA binary kernel module DKMS source
> >>> >>> ii nvidia-smi 302.17-3 amd64 NVIDIA System Management Interface
> >>> >>> ii nvidia-support 20120630+3 amd64 NVIDIA binary graphics driver support files
> >>> >>> ii nvidia-vdpau-driver:amd64 302.17-3 amd64 NVIDIA vdpau driver
> >>> >>> ii nvidia-xconfig 302.17-2 amd64 X configuration tool for non-free NVIDIA drivers
> >>> >>> ii xserver-xorg-video-nvidia 302.17-3 amd64 NVIDIA binary Xorg driver
> >>> >>> francesco@gig64:~$
> >>> >>>
> >>> >>> root@gig64:/home/francesco# modinfo nvidia
> >>> >>> filename: /lib/modules/3.2.0-2-amd64/updates/dkms/nvidia.ko
> >>> >>> alias: char-major-195-*
> >>> >>> version: 302.17
> >>> >>> supported: external
> >>> >>> license: NVIDIA
> >>> >>> alias: pci:v000010DEd00000E00sv*sd*bc04sc80i00*
> >>> >>> alias: pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
> >>> >>> alias: pci:v000010DEd*sv*sd*bc03sc02i00*
> >>> >>> alias: pci:v000010DEd*sv*sd*bc03sc00i00*
> >>> >>> depends: i2c-core
> >>> >>> vermagic: 3.2.0-2-amd64 SMP mod_unload modversions
> >>> >>> parm: NVreg_EnableVia4x:int
> >>> >>> parm: NVreg_EnableALiAGP:int
> >>> >>> parm: NVreg_ReqAGPRate:int
> >>> >>> parm: NVreg_EnableAGPSBA:int
> >>> >>> parm: NVreg_EnableAGPFW:int
> >>> >>> parm: NVreg_Mobile:int
> >>> >>> parm: NVreg_ResmanDebugLevel:int
> >>> >>> parm: NVreg_RmLogonRC:int
> >>> >>> parm: NVreg_ModifyDeviceFiles:int
> >>> >>> parm: NVreg_DeviceFileUID:int
> >>> >>> parm: NVreg_DeviceFileGID:int
> >>> >>> parm: NVreg_DeviceFileMode:int
> >>> >>> parm: NVreg_RemapLimit:int
> >>> >>> parm: NVreg_UpdateMemoryTypes:int
> >>> >>> parm: NVreg_InitializeSystemMemoryAllocations:int
> >>> >>> parm: NVreg_UseVBios:int
> >>> >>> parm: NVreg_RMEdgeIntrCheck:int
> >>> >>> parm: NVreg_UsePageAttributeTable:int
> >>> >>> parm: NVreg_EnableMSI:int
> >>> >>> parm: NVreg_MapRegistersEarly:int
> >>> >>> parm: NVreg_RegisterForACPIEvents:int
> >>> >>> parm: NVreg_RegistryDwords:charp
> >>> >>> parm: NVreg_RmMsg:charp
> >>> >>> parm: NVreg_NvAGP:int
> >>> >>> root@gig64:/home/francesco#
> >>> >>>
> >>> >>> I have also tried with recently used MD files; same problem:
> >>> >>> francesco@gig64:~/tmp$ charmrun namd2 heat-01.conf +p6 +idlepoll 2>&1 | tee heat-01.log
> >>> >>> Running command: namd2 heat-01.conf +p6 +idlepoll
> >>> >>>
> >>> >>> Charm++: standalone mode (not using charmrun)
> >>> >>> Converse/Charm++ Commit ID: v6.4.0-beta1-0-g5776d21
> >>> >>> CharmLB> Load balancer assumes all CPUs are same.
> >>> >>> Charm++> Running on 1 unique compute nodes (12-way SMP).
> >>> >>> Charm++> cpu topology info is gathered in 0.001 seconds.
> >>> >>> Info: NAMD CVS-2012-06-20 for Linux-x86_64-multicore-CUDA
> >>> >>> Info:
> >>> >>> Info: Please visit http://www.ks.uiuc.edu/Research/namd/
> >>> >>> Info: for updates, documentation, and support information.
> >>> >>> Info:
> >>> >>> Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
> >>> >>> Info: in all publications reporting results obtained with NAMD.
> >>> >>> Info:
> >>> >>> Info: Based on Charm++/Converse 60400 for multicore-linux64-iccstatic
> >>> >>> Info: Built Wed Jun 20 02:24:32 CDT 2012 by jim on lisboa.ks.uiuc.edu
> >>> >>> Info: 1 NAMD CVS-2012-06-20 Linux-x86_64-multicore-CUDA 6 gig64 francesco
> >>> >>> Info: Running on 6 processors, 1 nodes, 1 physical nodes.
> >>> >>> Info: CPU topology information available.
> >>> >>> Info: Charm++/Converse parallel runtime startup completed at 0.00989199 s
> >>> >>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 5 (gig64):
> >>> >>> initialization error
> >>> >>> ------------- Processor 5 Exiting: Called CmiAbort ------------
> >>> >>> Reason: FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 5 (gig64):
> >>> >>> initialization error
> >>> >>>
> >>> >>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 1 (gig64):
> >>> >>> initialization error
> >>> >>> Program finished.
> >>> >>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 3 (gig64):
> >>> >>> initialization error
> >>> >>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 2 (gig64):
> >>> >>> initialization error
> >>> >>> FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 4 (gig64):
> >>> >>> initialization error
> >>> >>> francesco@gig64:~/tmp$
> >>> >>>
> >>> >>> This is a shared-memory machine.
> >>> >>> Does version 302.17 work for you?
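The dpkg and modinfo listings earlier in this message can silently disagree after a partial upgrade, which is exactly the runtime/driver mismatch that later turned out to be the cause. A small sketch that compares the two automatically; the fallback values (taken from this thread) are used when the commands fail or are unavailable, so the script is safe to try anywhere:

```shell
# Compare the kernel module's driver version against the userspace
# driver package's version. Debian appends a packaging revision
# ("302.17-3"), so strip everything after the first "-" before comparing.
base_version() {
    echo "$1" | cut -d- -f1
}

# Fallbacks ("302.17", "302.17-3") are sample values from this thread.
kmod=$(modinfo -F version nvidia 2>/dev/null || echo "302.17")
pkg=$(dpkg-query -W -f='${Version}' nvidia-glx 2>/dev/null || echo "302.17-3")

if [ "$(base_version "$kmod")" = "$(base_version "$pkg")" ]; then
    echo "match: module $kmod, package $pkg"
else
    echo "MISMATCH: module $kmod vs package $pkg"
fi
```

A MISMATCH here, even when both dpkg and modinfo individually look healthy, is a strong hint that a reboot or a version pin (apt-get install <package>=<version>) is needed before blaming NAMD.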
> >>> >>>
> >>> >>> Thanks
> >>> >>> francesco pietra
> >>> >>>
> >>> >>
> >>> >> --
> >>> >> Aron Broom M.Sc
> >>> >> PhD Student
> >>> >> Department of Chemistry
> >>> >> University of Waterloo
> >>>
>

--
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo