Solved, but only by a workaround. The message "GPU has fallen off the bus" (which simply means that the GTX 470 cards are no longer seen by the system) deserves attention and a fix at the root cause. Unfortunately that cannot come from me, a chemist who only knows his own job.

chiendarret
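For anyone who wants to check their own log for the same symptom, here is a minimal sketch, assuming a syslog-style file such as the /var/log/messages excerpt quoted in this thread. Only the NVRM message text is taken from that log; the function name `find_gpu_bus_drops` and the sample list are hypothetical, chosen for illustration.

```python
import re

# Pattern for the NVRM kernel message quoted in this thread, e.g.
# "NVRM: GPU at 0000:01:00.0 has fallen off the bus."
# Assumption: the message sits on one line, as it would in the log file
# itself (not re-wrapped by a mail client).
BUS_DROP = re.compile(
    r"NVRM: GPU at (?P<pci>[0-9a-fA-F:.]+)\s+has fallen off the bus"
)


def find_gpu_bus_drops(lines):
    """Return (line_number, pci_address) for every bus-drop message."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = BUS_DROP.search(line)
        if match:
            hits.append((number, match.group("pci")))
    return hits


# Hypothetical sample built from two lines of the quoted log.
sample = [
    "Jul 6 17:26:42 gig64 kernel: [ 230.179942] NVRM: GPU at 0000:01:00.0 has fallen off the bus.",
    "Jul 6 17:28:04 gig64 kernel: [ 312.424204] CPU 4",
]
print(find_gpu_bus_drops(sample))
```

To scan a real log, pass an open file object instead of the sample list; reading /var/log/messages usually requires root.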
On Wed, Jul 6, 2011 at 11:23 PM, Francesco Pietra <chiendar...@gmail.com> wrote:
> Replying as the author:
>
> Solved by setting persistence mode for both GTX 470 cards and monitoring
> from an ssh-linked desktop. From its own terminal (no X was started) the
> GTX 470 machine appears to be hung (keyboard not sensed, terminal
> blinking at a stage long before the namd md run had started), but it is
> actually working normally (top -i: 6 processors; nvidia-smi -q -d
> TEMPERATURE reports 90, 85) and the *.log is correct. To avoid software
> problems on the GTX 470 machine, shutdown will have to be issued from
> the desktop. Clearly there is no hardware problem, at least per se,
> which is reassuring, as I was already wondering about possible
> explosions of GTX 470 capacitors. I simply switched off the useless
> monitor of the GTX 470 machine. I wonder whether there is a problem with
> the latest nvidia driver (built from amd64 wheezy) on the GTX 470 (the
> problems started after upgrading to that driver); however, I don't mind
> as long as I can work.
>
> chiendarret
>
> On Wed, Jul 6, 2011 at 5:49 PM, Francesco Pietra <chiendar...@gmail.com> wrote:
>> Although I am trying to run an application, the messages below may
>> indicate general problems with the computer. While I could run
>> namd-cuda md simulations without problems after running (as root)
>>
>> nvidia-smi -L
>>
>> the situation has gradually worsened. After a successful two-day
>> simulation, this afternoon a similar simulation does not start at all
>> and the computer no longer senses the keyboard, as if it were hung.
>> Actually, looking at /var/log/messages of the gig64 machine from an
>> ssh-linked desktop, kernel problems are indicated, as reported in part
>> below.
>>
>> ******************
>> Jul 6 17:26:42 gig64 kernel: [ 230.179942] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
>> Jul 6 17:28:04 gig64 kernel: [ 312.424094] Modules linked in: powernow_k8 mperf cpufreq_conservative cpufreq_powersave cpufreq_stats cpufreq_userspace fuse nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc ext2 it87 hwmon_vid loop firewire_sbp2 snd_hda_codec_hdmi nvidia(P) snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device evdev pcspkr k10temp snd i2c_piix4 soundcore edac_core edac_mce_amd i2c_core parport_pc snd_page_alloc parport wmi button processor thermal_sys ext3 jbd mbcache dm_mod raid1 md_mod usbhid hid sg sr_mod sd_mod cdrom crc_t10dif ata_generic ohci_hcd xhci_hcd pata_atiixp pata_jmicron ahci libahci libata ehci_hcd firewire_ohci usbcore firewire_core scsi_mod crc_itu_t floppy nls_base r8169 mii [last unloaded: scsi_wait_scan]
>> Jul 6 17:28:04 gig64 kernel: [ 312.424204] CPU 4
>> Jul 6 17:28:04 gig64 kernel: [ 312.424208] Modules linked in: powernow_k8 mperf cpufreq_conservative cpufreq_powersave cpufreq_stats cpufreq_userspace fuse nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc ext2 it87 hwmon_vid loop firewire_sbp2 snd_hda_codec_hdmi nvidia(P) snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device evdev pcspkr k10temp snd i2c_piix4 soundcore edac_core edac_mce_amd i2c_core parport_pc snd_page_alloc parport wmi button processor thermal_sys ext3 jbd mbcache dm_mod raid1 md_mod usbhid hid sg sr_mod sd_mod cdrom crc_t10dif ata_generic ohci_hcd xhci_hcd pata_atiixp pata_jmicron ahci libahci libata ehci_hcd firewire_ohci usbcore firewire_core scsi_mod crc_itu_t floppy nls_base r8169 mii [last unloaded: scsi_wait_scan]
>> Jul 6 17:28:04 gig64 kernel: [ 312.424302]
>> Jul 6 17:28:04 gig64 kernel: [ 312.424309] Pid: 2916, comm: namd2 Tainted: P O 2.6.38-2-amd64 #1 Gigabyte Technology Co., Ltd. GA-890FXA-UD5/GA-890FXA-UD5
>> Jul 6 17:28:04 gig64 kernel: [ 312.424322] RIP: 0010:[<ffffffffa07d4f74>] [<ffffffffa07d4f74>] _nv015265rm+0x252/0x260 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.424951] RSP: 0018:ffff88042075fc88 EFLAGS: 00000297
>> Jul 6 17:28:04 gig64 kernel: [ 312.424958] RAX: 00000000ffffffff RBX: ffff88042b522000 RCX: 0000000000000019
>> Jul 6 17:28:04 gig64 kernel: [ 312.424964] RDX: 00000000ffffffff RSI: 0000000000005499 RDI: ffff88042b522028
>> Jul 6 17:28:04 gig64 kernel: [ 312.424971] RBP: ffff8804217f2c88 R08: ffff8804217f2c98 R09: 0000000000000000
>> Jul 6 17:28:04 gig64 kernel: [ 312.424977] R10: 0000000000000246 R11: 0000000000000028 R12: ffffffff8100a30e
>> Jul 6 17:28:04 gig64 kernel: [ 312.424983] R13: 0000000000000000 R14: 0000000000000246 R15: 0000000000000028
>> Jul 6 17:28:04 gig64 kernel: [ 312.424991] FS: 00007f2f355a1720(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
>> Jul 6 17:28:04 gig64 kernel: [ 312.424998] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> Jul 6 17:28:04 gig64 kernel: [ 312.425004] CR2: 00007f2f33947000 CR3: 000000041ef84000 CR4: 00000000000006e0
>> Jul 6 17:28:04 gig64 kernel: [ 312.425010] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Jul 6 17:28:04 gig64 kernel: [ 312.425016] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Jul 6 17:28:04 gig64 kernel: [ 312.425024] Process namd2 (pid: 2916, threadinfo ffff88042075e000, task ffff88041e5cd7c0)
>> Jul 6 17:28:04 gig64 kernel: [ 312.425065] ffff88042b522000 0000000000000045 0000000000000000 0000000000000003
>> Jul 6 17:28:04 gig64 kernel: [ 312.425077] 0000000000000000 ffffffffa045b0dc ffff88042b522000 ffff88042b057000
>> Jul 6 17:28:04 gig64 kernel: [ 312.425088] ffff88042fb30000 ffffffffa045b2e1 0000000000000002 0000000000000003
>> Jul 6 17:28:04 gig64 kernel: [ 312.425494] [<ffffffffa045b0dc>] ? _nv002890rm+0x8b/0x9c [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.425853] [<ffffffffa045b2e1>] ? _nv005068rm+0x1f4/0x20a [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.426368] [<ffffffffa069a5d7>] ? _nv010159rm+0x97b/0x9a9 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.426881] [<ffffffffa069303a>] ? _nv010153rm+0x314/0x3eb [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.427194] [<ffffffffa03dc728>] ? _nv002567rm+0x8c6/0x9a8 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.427511] [<ffffffffa03ebe35>] ? _nv002030rm+0xac/0xf0 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.427827] [<ffffffffa03ebe00>] ? _nv002030rm+0x77/0xf0 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa0a20265>] ? _nv002424rm+0x5b5/0x751 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa0a1a8b1>] ? rm_ioctl+0x30/0x10a [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa0a390e9>] ? nv_kern_ioctl+0x31a/0x381 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa0a3918d>] ? nv_kern_unlocked_ioctl+0x1c/0x20 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffff81104b0b>] ? do_vfs_ioctl+0x467/0x4b4
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffff810d60fd>] ? do_brk+0x2ca/0x326
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffff81104ba3>] ? sys_ioctl+0x4b/0x70
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffff81009952>] ? system_call_fastpath+0x16/0x1b
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] Call Trace:
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa045b0dc>] ? _nv002890rm+0x8b/0x9c [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa045b2e1>] ? _nv005068rm+0x1f4/0x20a [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa069a5d7>] ? _nv010159rm+0x97b/0x9a9 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa069303a>] ? _nv010153rm+0x314/0x3eb [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa03dc728>] ? _nv002567rm+0x8c6/0x9a8 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa03ebe35>] ? _nv002030rm+0xac/0xf0 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa03ebe00>] ? _nv002030rm+0x77/0xf0 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa0a20265>] ? _nv002424rm+0x5b5/0x751 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa0a1a8b1>] ? rm_ioctl+0x30/0x10a [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa0a390e9>] ? nv_kern_ioctl+0x31a/0x381 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffffa0a3918d>] ? nv_kern_unlocked_ioctl+0x1c/0x20 [nvidia]
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffff81104b0b>] ? do_vfs_ioctl+0x467/0x4b4
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffff810d60fd>] ? do_brk+0x2ca/0x326
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffff81104ba3>] ? sys_ioctl+0x4b/0x70
>> Jul 6 17:28:04 gig64 kernel: [ 312.428001] [<ffffffff81009952>] ? system_call_fastpath+0x16/0x1b
>> root@gig64:/var/log#
>> Message from syslogd@gig64 at Jul 6 17:29:28 ...
>> kernel:[ 396.424694] Stack:
>>
>> Message from syslogd@gig64 at Jul 6 17:29:28 ...
>> kernel:[ 396.424762] Call Trace:
>>
>> Message from syslogd@gig64 at Jul 6 17:29:28 ...
>> kernel:[ 396.428001] Code: be a0 e8 85 65 60 00 b8 00 00 00 00 e8 9d 65 60 00 85 c0 74 10 e8 84 28 63 00 0f 1f 00 eb 06 89 77 68 89 4f 6c 48 83 c5 10 5b c3 <41> 54 53 48 83 ec 08 48 83 ed 08 41 89 f4 39 77 68 73 17 39 77
>> ...................
>> ...................
>>
>> Such messages from syslogd@gig64 continue at a slow pace.
>> *********************
>>
>> I should also mention that gnome 2-30-1 is now problematic: on using
>> the file browser, the computer immediately hangs (or appears to hang,
>> as said before). The same occurs on repeated use of emacs.
>>
>> Thanks for advice
>>
>> francesco pietra