On 02/04/2021 22.06, PICCA Frederic-Emmanuel wrote:
None of the graphics cards on NUMA node 0 are available on my computer.

What's the NUMA layout of the machine? What CPUs do you have in there?

We use this computer for data processing at a scientific facility.

During boot we get these error messages:

Mar 29 13:08:12 re-grades-01 kernel: [    6.726771] NVRM: request_mem_region failed for 0M @ 0x0. This can
Mar 29 13:08:12 re-grades-01 kernel: [    6.726771] NVRM: occur when a driver such as rivatv is loaded and claims
Mar 29 13:08:12 re-grades-01 kernel: [    6.726771] NVRM: ownership of the device's registers.
Mar 29 13:08:12 re-grades-01 kernel: [    6.726792] nvidia: probe of 0000:43:00.0 failed with error -1
...
I would like to help debug this issue, but I do not know where to start.

Thanks for considering.


[    22.870] (--) PCI: (67@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x260d0000000/268435456, 0x261e0000000/33554432
[    22.870] (--) PCI: (68@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x26330000000/268435456, 0x26440000000/33554432
[    22.870] (--) PCI: (69@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x26590000000/268435456, 0x266a0000000/33554432
[    22.870] (--) PCI: (70@0:0:0) 10de:2204:1043:87d5 rev 161, Mem @ 0x267f0000000/268435456, 0x26800000000/33554432, I/O @ 0x00006000/128
[    22.870] (--) PCI:*(101@0:0:0) 1a03:2000:1458:1000 rev 65, Mem @ 0xd2000000/33554432, 0xd4000000/131072, I/O @ 0x00007000/128, BIOS @ 0x????????/131072
[    22.870] (--) PCI: (131@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x96000000/16777216, 0x66160000000/268435456, 0x66190000000/33554432
[    22.871] (--) PCI: (132@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x94000000/16777216, 0x66020000000/268435456, 0x66050000000/33554432
[    22.871] (--) PCI: (133@0:0:0) 10de:2204:1043:87d5 rev 161, Mem @ 0x9a000000/16777216, 0x661c0000000/268435456, 0x661d0000000/33554432, I/O @ 0x0000d000/128, BIOS @ 0x????????/524288
[    22.871] (--) PCI: (134@0:0:0) 10de:2204:1043:87d5 rev 161, Mem @ 0x98000000/16777216, 0x661a0000000/268435456, 0x661b0000000/33554432, I/O @ 0x0000c000/128, BIOS @ 0x????????/524288
[    22.871] (--) PCI: (135@0:0:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0x92000000/16777216, 0x65ee0000000/268435456, 0x65f10000000/33554432

Great. Half of the bus ids are in hex, half in decimal.
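
To line the two logs up, the decimal Xorg ids can be converted to the hex form the kernel prints. The sketch below assumes the Xorg entries read as bus@domain:device:function in decimal and that the kernel uses domain:bus:device.function in hex; the helper name xorg_to_kernel is made up for illustration.

```python
def xorg_to_kernel(xorg_id: str) -> str:
    """Convert a decimal Xorg id 'bus@domain:device:function'
    to the kernel's hex form 'dddd:bb:dd.f'.

    Assumption: the field order in the Xorg log is bus@domain:device:function.
    """
    bus, rest = xorg_id.split("@")
    domain, device, function = (int(part) for part in rest.split(":"))
    return f"{domain:04x}:{int(bus):02x}:{device:02x}.{function:x}"


if __name__ == "__main__":
    # Bus numbers taken from the Xorg log quoted above.
    for bus in (67, 68, 69, 70, 101, 131, 132, 133, 134, 135):
        xorg_id = f"{bus}@0:0:0"
        print(f"{xorg_id:>10} -> {xorg_to_kernel(xorg_id)}")
```

Under that assumption, 67@0:0:0 maps to 0000:43:00.0, the device whose probe fails in the kernel log.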

How many cards do you have in there? And how many pci devices are there per card?
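
As a rough first answer, the Xorg entries quoted above can be tallied by vendor:device id. This is only a sketch: it sees just the display devices Xorg lists, so it says nothing about additional PCI functions per card (lspci would be needed for that), and the names LOG, ENTRY and count_devices are made up for illustration.

```python
import re
from collections import Counter

# Leading part of each "(--) PCI:" entry from the Xorg log quoted above.
LOG = """\
(--) PCI: (67@0:0:0) 10de:1eb8:10de:12a2 rev 161
(--) PCI: (68@0:0:0) 10de:1eb8:10de:12a2 rev 161
(--) PCI: (69@0:0:0) 10de:1eb8:10de:12a2 rev 161
(--) PCI: (70@0:0:0) 10de:2204:1043:87d5 rev 161
(--) PCI:*(101@0:0:0) 1a03:2000:1458:1000 rev 65
(--) PCI: (131@0:0:0) 10de:1eb8:10de:12a2 rev 161
(--) PCI: (132@0:0:0) 10de:1eb8:10de:12a2 rev 161
(--) PCI: (133@0:0:0) 10de:2204:1043:87d5 rev 161
(--) PCI: (134@0:0:0) 10de:2204:1043:87d5 rev 161
(--) PCI: (135@0:0:0) 10de:1eb8:10de:12a2 rev 161
"""

# Captures the vendor:device pair that follows the bracketed bus id.
ENTRY = re.compile(r"PCI:[* ]\(\d+@\d+:\d+:\d+\) ([0-9a-f]{4}:[0-9a-f]{4}):")


def count_devices(log: str) -> Counter:
    """Count log entries per vendor:device id."""
    return Counter(match.group(1) for match in ENTRY.finditer(log))


if __name__ == "__main__":
    for ident, count in sorted(count_devices(LOG).items()):
        print(ident, count)
```

Vendor id 10de is NVIDIA, so this counts nine NVIDIA devices of two different models in the log.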

Have you tried different drivers? (tesla-450, tesla-460)
Have you tried with only one kind of card (Tesla or GeForce) in there?
Have you tried with only a single card in there?

Andreas
