Adam DeConinck <[email protected]> wrote:
> I've seen similar messages on CentOS when the Nouveau drivers are
> loaded and a Tesla K20 is installed. You should make sure that nouveau
> is blacklisted so the kernel won't load it.
>
> Note that it hasn't always been enough for me to have nouveau listed
> in /etc/modprobe.d/blacklist; sometimes I've had to actually put
> "rdblacklist=nouveau" on the kernel line.

Loading of the nouveau driver is suppressed via /etc/modprobe.d, and lsmod
doesn't show the nouveau module, so I hope that rdblacklist as a kernel
parameter is not necessary.
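The three checks discussed here (a blacklist entry under /etc/modprobe.d, nouveau being absent from the loaded-module list, and "rdblacklist=nouveau" on the kernel line) could be sketched in Python roughly as follows. This is only a minimal illustration: the function names are hypothetical, and it assumes the usual Linux text formats, i.e. one module per line with the module name first (as in /proc/modules, which lsmod reads) and space-separated parameters on the kernel command line (as in /proc/cmdline).

```python
def nouveau_loaded(proc_modules_text):
    """True if a module named exactly 'nouveau' is listed.

    Expects text in the /proc/modules format (module name is the first field).
    """
    return any(
        line.split()[0] == "nouveau"
        for line in proc_modules_text.splitlines()
        if line.strip()
    )


def nouveau_blacklisted(conf_text):
    """True if the config text has an uncommented 'blacklist nouveau' line."""
    for line in conf_text.splitlines():
        words = line.split("#", 1)[0].split()  # drop trailing comments
        if words[:2] == ["blacklist", "nouveau"]:
            return True
    return False


def rdblacklist_on_cmdline(cmdline_text):
    """True if the kernel command line contains rdblacklist=nouveau."""
    for token in cmdline_text.split():
        key, _, value = token.partition("=")
        if key == "rdblacklist" and "nouveau" in value.split(","):
            return True
    return False
```

For example, nouveau_loaded(open("/proc/modules").read()) answers the same question as looking for nouveau in the lsmod output, and nouveau_blacklisted() can be applied to the contents of each file under /etc/modprobe.d.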
The first group of kernel messages about BARs appears BEFORE I start the
NVIDIA driver installation, so I think that my question about them does not
depend on the driver installation and, in particular, not on nouveau.

Mikhail

>
> Disclaimer: I work at NVIDIA, but I haven't touched OpenSUSE in forever.
>
> Cheers,
> Adam
>
> On Tue, Jul 16, 2013 at 10:29 AM, Mikhail Kuzminsky <[email protected]> wrote:
> > I want to test NVIDIA GPU (PNY Tesla K20c) w/our own application for future
> > using in our cluster. But I found problems w/NVIDIA driver (v.319.32)
> > installation (OpenSUSE 12.3, kernel 3.7.10-1.1).
> >
> > 1st of all, before start of driver installation I've strange for me
> > messages about BAR registers:
> > -----------------------from /var/log/messages------
> > 2013-07-04T01:43:43.666022+04:00 c6ws4 kernel: [ 0.421559] pci 0000:00:01.0: BAR 15: can't assign mem pref (size 0x18000000)
> > 2013-07-04T01:43:43.666024+04:00 c6ws4 kernel: [ 0.421563] pci 0000:00:01.0: BAR 14: assigned [mem 0xe1000000-0xe1ffffff]
> > 2013-07-04T01:43:43.666025+04:00 c6ws4 kernel: [ 0.421566] pci 0000:00:16.1: BAR 0: assigned [mem 0xe0001000-0xe000100f 64bit]
> > 2013-07-04T01:43:43.666026+04:00 c6ws4 kernel: [ 0.421576] pci 0000:01:00.0: BAR 1: can't assign mem pref (size 0x10000000)
> > 2013-07-04T01:43:43.666027+04:00 c6ws4 kernel: [ 0.421579] pci 0000:01:00.0: BAR 3: can't assign mem pref (size 0x2000000)
> > 2013-07-04T01:43:43.666027+04:00 c6ws4 kernel: [ 0.421581] pci 0000:01:00.0: BAR 0: assigned [mem 0xe1000000-0xe1ffffff]
> > 2013-07-04T01:43:43.666028+04:00 c6ws4 kernel: [ 0.421584] pci 0000:01:00.0: BAR 6: can't assign mem pref (size 0x80000)
> > 2013-07-04T01:43:43.666029+04:00 c6ws4 kernel: [ 0.421586] pci 0000:00:01.0: PCI bridge to [bus 01]
> > -----------------------------------------------------------------------------------------------
> >
> > May be it's hardware/BIOS (Supermicro X9SCA-F, last BIOS v.2.0b) error
> > symptoms
> > ? I tried both BIOS modes - "above 4G Decoding" enabled and
> > disabled.
> >
> > It looks for me that NVIDIA driver uses BAR 1 (see below). Although it was
> > also some unclear for me messages in nvidia-installer.log, installer shows
> > that kernel interface of nvidia.ko was compiled, but then
> > nvidia-installer.log contains
> >
> > --------------------------from nvidia-installer.log ----------------------------------
> > -> Kernel module load error: No such device
> > -> Kernel messages:
> > ...[ 25.286079] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
> > [ 1379.760532] nvidia: module license 'NVIDIA' taints kernel.
> > [ 1379.760536] Disabling lock debugging due to kernel taint
> > [ 1379.765158] nvidia 0000:01:00.0: enabling device (0140 -> 0142)
> > [ 1379.765165] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
> > [ 1379.765165] NVRM: BAR1 is 0M @ 0x0 (PCI:0000:01:00.0)
> > [ 1379.765166] NVRM: The system BIOS may have misconfigured your GPU.
> > [ 1379.765169] nvidia: probe of 0000:01:00.0 failed with error -1
> > [ 1379.765177] NVRM: The NVIDIA probe routine failed for 1 device(s).
> > [ 1379.765178] NVRM: None of the NVIDIA graphics adapters were initialized!
> > ---------------------------------------------------------------------------------------------
> >
> > I add also lspci -v extraction :
> >
> > 01:00.0 3D controller: NVIDIA Corporation GK107 [Tesla K20c] (rev a1)
> >         Subsystem: NVIDIA Corporation Device 0982
> >         Flags: fast devsel, IRQ 11
> >         Memory at e1000000 (32-bit, non-prefetchable) [disabled] [size=16M]
> >         Memory at <unassigned> (64-bit, prefetchable) [disabled]
> >         Memory at <unassigned> (64-bit, prefetchable) [disabled]
> >
> > Does this kernel messages above means that I have hardware/BIOS problems or
> > it may be some NVIDIA driver problems ?
> >
> > Mikhail Kuzminsky
> > Computer Assistance to Chemical Research Center
> > Zelinsky Institute of Organic Chemistry
> > Moscow
> >
> > _______________________________________________
> > Beowulf mailing list, [email protected] sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf

Mikhail Kuzminsky
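To see at a glance which BARs the kernel could not place, the "can't assign" lines quoted from /var/log/messages can be pulled out and their sizes converted to MiB. The following is a minimal Python sketch, not a general log parser: the regular expression and the function name are assumptions made for this example, and it only relies on the message layout shown in the excerpt above (timestamps and hostname trimmed in the sample).

```python
import re

# Matches kernel lines such as:
#   pci 0000:01:00.0: BAR 1: can't assign mem pref (size 0x10000000)
BAR_FAIL = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): can't assign (?P<kind>.+?) \(size (?P<size>0x[0-9a-f]+)\)"
)


def failed_bars(log_text):
    """Return (device, BAR number, kind, size in bytes) for each failed BAR."""
    # Collapse all whitespace first so that entries wrapped across
    # lines (as in the quoted mail) still match.
    flat = " ".join(log_text.split())
    return [
        (m["dev"], int(m["bar"]), m["kind"], int(m["size"], 16))
        for m in BAR_FAIL.finditer(flat)
    ]


# Kernel messages taken from the excerpt in this thread.
SAMPLE = """
[ 0.421559] pci 0000:00:01.0: BAR 15: can't assign mem pref (size 0x18000000)
[ 0.421563] pci 0000:00:01.0: BAR 14: assigned [mem 0xe1000000-0xe1ffffff]
[ 0.421576] pci 0000:01:00.0: BAR 1: can't assign mem pref (size 0x10000000)
[ 0.421579] pci 0000:01:00.0: BAR 3: can't assign mem pref (size 0x2000000)
[ 0.421581] pci 0000:01:00.0: BAR 0: assigned [mem 0xe1000000-0xe1ffffff]
[ 0.421584] pci 0000:01:00.0: BAR 6: can't assign mem pref (size 0x80000)
"""

for dev, bar, kind, size in failed_bars(SAMPLE):
    print(f"{dev} BAR {bar}: {size / 2**20:g} MiB ({kind}) not assigned")
```

On the sample, the sizes come out as 384 MiB for BAR 15 of the bridge 0000:00:01.0, and 256 MiB, 32 MiB and 0.5 MiB for BARs 1, 3 and 6 of the GPU at 0000:01:00.0. The unassigned BAR 1 is the same region that the NVRM message later reports as "0M @ 0x0".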
