On Tue, 3 Nov 2020 at 18:25, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>
> On Tue, Nov 3, 2020 at 1:08 PM Dave Stevenson
> <dave.steven...@raspberrypi.com> wrote:
> >
> > Hi Ilia
> > Thanks again for the reply.
> >
> > On Wed, 28 Oct 2020 at 14:59, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> > >
> > > On Wed, Oct 28, 2020 at 10:20 AM Dave Stevenson
> > > <dave.steven...@raspberrypi.com> wrote:
> > > >
> > > > Hi Ilia
> > > >
> > > > Thanks for taking the time to reply.
> > > >
> > > > On Wed, 28 Oct 2020 at 14:10, Ilia Mirkin <imir...@alum.mit.edu> wrote:
> > > > >
> > > > > The most common issue on arm is that the pci memory window is too
> > > > > narrow to allocate all the BARs. Can you see if there are messages in
> > > > > the kernel to that effect?
> > > >
> > > > All the BAR allocations seem to succeed except for the IO one.
> > > > AIUI I/O is deprecated, but is it still used on these cards?
> > >
> > > I must admit I was ignorant of the fact that the IO ports were treated
> > > as a BAR, but it makes a lot of sense.
> > >
> > > One thing does stand out as odd:
> > >
> > > > [ 1.060851] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
> > > > [ 1.060892] brcm-pcie fd500000.pcie: No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
> > > > [ 1.060975] brcm-pcie fd500000.pcie: MEM 0x0600000000..0x063fffffff -> 0x00c0000000
> > > > [ 1.061061] brcm-pcie fd500000.pcie: IB MEM 0x0000000000..0x00ffffffff -> 0x0100000000
> > > > [ 1.109943] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
> > > > [ 1.110129] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
> > > > [ 1.110159] pci_bus 0000:00: root bus resource [bus 00-ff]
> > > > [ 1.110187] pci_bus 0000:00: root bus resource [mem 0x600000000-0x63fffffff] (bus address [0xc0000000-0xffffffff])
> > > > [ 1.110286] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
> > > > [ 1.110505] pci 0000:00:00.0: PME# supported from D0 D3hot
> > > > [ 1.114095] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> > > > [ 1.114343] pci 0000:01:00.0: [10de:128b] type 00 class 0x030000
> > > > [ 1.114404] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
> > > > [ 1.114456] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x07ffffff 64bit pref]
> > > > [ 1.114510] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff 64bit pref]
> > > > [ 1.114551] pci 0000:01:00.0: reg 0x24: [io 0x0000-0x007f]
> > > > [ 1.114590] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
> > > > [ 1.114853] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
> > > > [ 1.115022] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
> > > > [ 1.115125] pci 0000:01:00.1: [10de:0e0f] type 00 class 0x040300
> > > > [ 1.115184] pci 0000:01:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> > > > [ 1.119065] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> > > > [ 1.119120] pci 0000:00:00.0: BAR 9: assigned [mem 0x600000000-0x60bffffff 64bit pref]
> > > > [ 1.119151] pci 0000:00:00.0: BAR 8: assigned [mem 0x60c000000-0x60d7fffff]
> > >
> > > This is your brcm-pcie device.
> > >
> > > > [ 1.119183] pci 0000:01:00.0: BAR 1: assigned [mem 0x600000000-0x607ffffff 64bit pref]
> > > > [ 1.119235] pci 0000:01:00.0: BAR 3: assigned [mem 0x608000000-0x609ffffff 64bit pref]
> > > > [ 1.119285] pci 0000:01:00.0: BAR 0: assigned [mem 0x60c000000-0x60cffffff]
> > >
> > > And this is the NVIDIA device. Note that these memory windows are
> > > identical (or at least overlapping). I must admit almost complete
> > > ignorance of PCIe and whether this is OK, but it seems sketchy at
> > > first glance. A quick eyeballing of my x86 system suggests that all
> > > PCIe devices get non-overlapping windows. OTOH there are messages
> > > further up about some sort of remapping going on, so perhaps it's OK?
> > > But two things on the same bus still shouldn't have the same addresses
> > > allocated, based on my (limited) understanding.
> >
> > I've raised this with colleagues and it seems that this is normal.
> > The PCI bridge reports the window through which devices can be mapped,
> > and all devices have to fit within that. Pass as to whether that is a
> > quirk of ARM or this particular bridge.
> >
> > I do note that on my x86 systems device 0000:00:00.0 is reported by
> > lspci as a "Host bridge" instead of a "PCI bridge".
> > On an Ubuntu VM I've got running, I do get
> > [ 0.487249] PCI host bridge to bus 0000:00
> > [ 0.487252] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
> > [ 0.487254] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
> > [ 0.487256] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
> > [ 0.487258] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xfdffffff window]
> > [ 0.487260] pci_bus 0000:00: root bus resource [bus 00-ff]
> > and all device allocations are from within those ranges, so I'm not
> > convinced it's that different.
> >
> > > In case it's an option, could you "unplug" the NIC (not just not load
> > > its driver, but make it not appear at all on the PCI bus)?
> >
> > NIC? The network interface is totally separate. Or is this another
> > reuse of a TLA?
> >
> > Unplugging the GPU means the PCI bus reports as being down and I get
> > no output at all from lspci.
>
> Oh duh. I thought brcm-pcie was a broadcom NIC. Apparently it's the
> whole bus - can't unplug that! Also explains the "conflict" which
> makes a lot more sense if you (correctly) understand that other
> "device" is the bus itself. Apologies for the misinterpretation :(
> [And in hindsight, RPi runs on a Broadcom SoC, so ... I should have
> remembered that. In my mind they just make network stuff, will try to
> get that updated.]

Phew, I thought I was going crazy :-)

> > > > >> I've tried it so far with a GT710 board [1] and ARM64. It's blowing up
> > > > >> in the memset of nvkm_instobj_new whilst initialising the BAR
> > > > >> subdevice [2], having gone through the "No such luck" path in
> > > > >> nvkm_mmu_ptc_get [3].
> > > > >>
> > > > >> Taking the naive approach of simply removing the memset, I get through
> > > > >> initialising all the subdevices, but again die in a location I
> > > > >> currently haven't pinpointed. The last logging messages are:
> > >
> > > That's not a winning strategy, I'm afraid. You need to figure out why
> > > the memset is blowing up. The simplest explanation is "it's trying to
> > > write to an I/O resource but that resource wasn't allocated", hence my
> > > question about BARs. But something's not mapped, or mapped in the
> > > wrong way, or whatever. If you can't write to it at that point in
> > > time, you won't be able to write to it later either. I would focus on
> > > that.
> >
> > I did say it was the naive approach :-)
> > I was trying to gauge how much effort was going to be needed to get
> > this going. Was it going to blow up in 1, 10, or 100 places? It feels
> > like it is only a couple of things that are wrong, so there is hope.
> >
> > Slightly annoyingly something more urgent has come up and I need to
> > shelve my experimentation for now, but thanks for the pointers. At
> > least I have some idea of where to start looking when time allows.
>
> When/if you do get back to it, you might consider posting a more
> complete log without getting rid of the memset, perhaps the nature of
> the blow-up will make the underlying problem more apparent, or make
> further investigation paths apparent.

Will do, thanks.

Dave
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
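
On the "overlapping windows" question discussed above, here is a minimal sketch of the check in Python. It is only a sanity check, not anything from the kernel or from nouveau. It takes the address ranges straight from the quoted dmesg output and assumes that BAR 8 and BAR 9 on 0000:00:00.0 are the bridge's forwarding windows rather than ordinary device BARs (that is my reading of the explanation given in the thread). All names in the script (BRIDGE_WINDOWS, DEVICE_BARS, containing_window, cpu_to_bus) are made up for illustration.

```python
# Inclusive CPU physical address ranges, copied from the dmesg output
# quoted in this thread. All names below are illustrative only.

# Assumption: BAR 8 / BAR 9 on the bridge are its forwarding windows.
BRIDGE_WINDOWS = {
    "0000:00:00.0 BAR 9 (64bit pref)": (0x600000000, 0x60bffffff),
    "0000:00:00.0 BAR 8": (0x60c000000, 0x60d7fffff),
}

DEVICE_BARS = {
    "0000:01:00.0 BAR 1": (0x600000000, 0x607ffffff),
    "0000:01:00.0 BAR 3": (0x608000000, 0x609ffffff),
    "0000:01:00.0 BAR 0": (0x60c000000, 0x60cffffff),
}

# Root bus resource from the log: CPU 0x600000000-0x63fffffff is
# presented on the PCI bus as 0xc0000000-0xffffffff.
ROOT_CPU_START, ROOT_CPU_END = 0x600000000, 0x63fffffff
ROOT_BUS_START = 0xC0000000


def contains(outer, inner):
    """True if the inclusive range `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(a, b):
    """True if the inclusive ranges `a` and `b` share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


def containing_window(bar):
    """Return the name of the bridge window that contains `bar`, or None."""
    for name, window in BRIDGE_WINDOWS.items():
        if contains(window, bar):
            return name
    return None


def cpu_to_bus(addr):
    """Translate a CPU physical address to a PCI bus address."""
    if not ROOT_CPU_START <= addr <= ROOT_CPU_END:
        raise ValueError(f"{addr:#x} is outside the root bus window")
    return addr - ROOT_CPU_START + ROOT_BUS_START


if __name__ == "__main__":
    for name, bar in DEVICE_BARS.items():
        print(f"{name}: inside {containing_window(bar)}, "
              f"bus address starts at {cpu_to_bus(bar[0]):#x}")
```

Under that assumption, each GPU BAR sits inside exactly one bridge window and the GPU BARs do not overlap each other, which is consistent with the "conflict" being the bridge describing its window rather than two devices sharing addresses.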