On 02/04/2021 22.06, PICCA Frederic-Emmanuel wrote:
None of the graphics cards on NUMA node 0 are available on my computer.
What's the NUMA layout of the machine? What CPUs do you have in there?
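One way to answer the NUMA question for the GPUs is to read the per-device `numa_node` attribute that Linux exposes in sysfs. The following is a minimal sketch, not a tested diagnostic for this machine: the helper name `gpu_numa_nodes` is made up for illustration, and it assumes the standard `/sys/bus/pci/devices/<address>/vendor` and `numa_node` files are present. It lists every PCI function with NVIDIA's vendor ID (`0x10de`), which may include the audio function of a GeForce card. A value of -1 means the kernel has no NUMA information for that device.

```python
from pathlib import Path

NVIDIA_VENDOR = "0x10de"  # PCI vendor ID as printed in sysfs


def gpu_numa_nodes(sysfs_root="/sys/bus/pci/devices"):
    """Map PCI address -> NUMA node for every NVIDIA PCI function.

    Hypothetical helper: sysfs_root is a parameter only so the function
    can be pointed at a different directory for testing.
    """
    root = Path(sysfs_root)
    result = {}
    if not root.is_dir():
        return result  # not Linux, or sysfs not mounted
    for dev in sorted(root.iterdir()):
        vendor_file = dev / "vendor"
        numa_file = dev / "numa_node"
        if not (vendor_file.is_file() and numa_file.is_file()):
            continue
        if vendor_file.read_text().strip() != NVIDIA_VENDOR:
            continue
        result[dev.name] = int(numa_file.read_text().strip())
    return result


if __name__ == "__main__":
    for address, node in gpu_numa_nodes().items():
        print(f"{address}  NUMA node {node}")
```

Comparing this list with the addresses that fail to probe in the kernel log would show whether the failures really line up with one NUMA node.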
We are using this computer for data processing at a scientific
facility.
During boot we get these error messages:
Mar 29 13:08:12 re-grades-01 kernel: [ 6.726771] NVRM: request_mem_region failed for 0M @ 0x0. This can
Mar 29 13:08:12 re-grades-01 kernel: [ 6.726771] NVRM: occur when a driver such as rivatv is loaded and claims
Mar 29 13:08:12 re-grades-01 kernel: [ 6.726771] NVRM: ownership of the device's registers.
Mar 29 13:08:12 re-grades-01 kernel: [ 6.726792] nvidia: probe of 0000:43:00.0 failed with error -1
...
I would like to help debug this issue, but I do not know where to start.
Thanks for considering.
[ 22.870] (--) PCI: (67@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x260d0000000/268435456, 0x261e0000000/33554432
[ 22.870] (--) PCI: (68@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x26330000000/268435456, 0x26440000000/33554432
[ 22.870] (--) PCI: (69@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x26590000000/268435456, 0x266a0000000/33554432
[ 22.870] (--) PCI: (70@0:0:0) 10de:2204:1043:87d5 rev 161, Mem @ 0x267f0000000/268435456, 0x26800000000/33554432, I/O @ 0x00006000/128
[ 22.870] (--) PCI:*(101@0:0:0) 1a03:2000:1458:1000 rev 65, Mem @ 0xd2000000/33554432, 0xd4000000/131072, I/O @ 0x00007000/128, BIOS @ 0x????????/131072
[ 22.870] (--) PCI: (131@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x96000000/16777216, 0x66160000000/268435456, 0x66190000000/33554432
[ 22.871] (--) PCI: (132@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x94000000/16777216, 0x66020000000/268435456, 0x66050000000/33554432
[ 22.871] (--) PCI: (133@0:0:0) 10de:2204:1043:87d5 rev 161, Mem @ 0x9a000000/16777216, 0x661c0000000/268435456, 0x661d0000000/33554432, I/O @ 0x0000d000/128, BIOS @ 0x????????/524288
[ 22.871] (--) PCI: (134@0:0:0) 10de:2204:1043:87d5 rev 161, Mem @ 0x98000000/16777216, 0x661a0000000/268435456, 0x661b0000000/33554432, I/O @ 0x0000c000/128, BIOS @ 0x????????/524288
[ 22.871] (--) PCI: (135@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x92000000/16777216, 0x65ee0000000/268435456, 0x65f10000000/33554432
Great. Half of the bus IDs are in hex (kernel log), half in decimal (Xorg log).
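To line the two logs up, the decimal Xorg IDs can be converted to the hex form the kernel prints. This is only a sketch: the function name `xorg_to_kernel` is made up for illustration, and it assumes the Xorg field order is `bus@domain:device:function`, which matches the failing device here (decimal 67 is hex 43, as in `0000:43:00.0`).

```python
import re

# Xorg log: "(bus@domain:device:function)", all fields in decimal (assumed order).
# Kernel log: "domain:bus:device.function", all fields in hex.
XORG_PCI = re.compile(r"\((\d+)@(\d+):(\d+):(\d+)\)")


def xorg_to_kernel(xorg_id: str) -> str:
    """Convert an Xorg-style PCI location to the kernel's hex notation."""
    match = XORG_PCI.search(xorg_id)
    if match is None:
        raise ValueError(f"not an Xorg PCI location: {xorg_id!r}")
    bus, domain, device, function = (int(field) for field in match.groups())
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


if __name__ == "__main__":
    # Bus numbers taken from the Xorg log excerpt above.
    for bus in (67, 68, 69, 70, 101, 131, 132, 133, 134, 135):
        print(bus, "->", xorg_to_kernel(f"({bus}@0:0:0)"))
```

With this mapping, the Xorg entry `(67@0:0:0)` corresponds to the device whose probe fails in the kernel log.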
How many cards do you have in there? And how many PCI devices are there
per card?
Have you tried different drivers (tesla-450, tesla-460)?
Have you tried with only one kind of card (Tesla or GeForce) in there?
Have you tried with only a single card in there?
Andreas