Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

John Baldwin Wed, 14 Feb 2024 10:18:10 -0800

On 2/14/24 9:57 AM, Mark Millard wrote:

On Feb 14, 2024, at 08:08, John Baldwin <j...@freebsd.org> wrote:

On 2/12/24 5:57 PM, Mark Millard wrote:

On Feb 12, 2024, at 16:36, Mark Millard <mark...@yahoo.com> wrote:

On Feb 12, 2024, at 16:10, Mark Millard <mark...@yahoo.com> wrote:

On Feb 12, 2024, at 12:00, Mark Millard <mark...@yahoo.com> wrote:

[Gack: I was looking at the wrong vintage of source code, predating
your changes: wrong system used.]


On Feb 12, 2024, at 10:41, Mark Millard <mark...@yahoo.com> wrote:

On Feb 12, 2024, at 09:32, John Baldwin <j...@freebsd.org> wrote:

On 2/9/24 8:13 PM, Mark Millard wrote:

Summary:
pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0:  PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000
. . .
rman_manage_region: <pcib1 memory window> request: start 0x600000000, end 0x6000fffff
panic: Failed to add resource to rman


Hmmm, I suspect this is due to the way that bus_translate_resource works, which is
fundamentally broken.  It rewrites the start address of a resource in situ instead
of keeping downstream resources separate from upstream resources.  For example,
I don't see how you could ever release a resource in this design without completely
screwing up your rman.  That is, I expect that trying to detach a PCI device behind a
translating bridge that uses the current approach would corrupt the allocated
resource ranges in an rman, and would have done so long before my changes.

That said, that doesn't really explain the panic.  Hmm, the panic might be because,
for PCI bridge windows, the driver now passes RF_ACTIVE, and the bus_translate_resource
hack only kicks in in the activate_resource method of pci_host_generic.c.

Detail:
. . .
pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2
pcib0: parsing FDT for ECAM0:
pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000


This indicates this is a translating bus.

pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0
rman_manage_region: <pcib1 bus numbers> request: start 0x1, end 0x1
pcib0: rman_reserve_resource: start=0xc0000000, end=0xc00fffff, count=0x100000
rman_reserve_resource_bound: <PCIe Memory> request: [0xc0000000, 0xc00fffff], length 0x100000, flags 102, device pcib1
rman_reserve_resource_bound: trying 0xffffffff <0xc0000000,0xfffff>
considering [0xc0000000, 0xffffffff]
truncated region: [0xc0000000, 0xc00fffff]; size 0x100000 (requested 0x100000)
candidate region: [0xc0000000, 0xc00fffff], size 0x100000
allocating from the beginning
rman_manage_region: <pcib1 memory window> request: start 0x600000000, end 0x6000fffff


What you later typed does not match:

0x600000000
0x6000fffff

You later typed:

0x60000000
0x600fffffff

This seems to have led to some confusion from using the
wrong figure(s).

We are trying to reserve the CPU addresses in the rman because
bus_translate_resource rewrote the start address in the resource after it was
allocated.

That said, I can't see why rman_manage_region would actually fail.  At this point
the rman is empty (this is the first call to rman_manage_region for "pcib1 memory
window"), so the only checks that could fail are the checks against rm_start and
rm_end.  For the memory window, rm_start is always 0 and rm_end is always
0xffffffff, so both the old (0xc00000000 - 0xc00fffff) and new (0x60000000 -
0x600fffffff) ranges are within those bounds.


No:

0xffffffff

vs. (actual):

0x600000000
0x6000fffff


Ok, then this explains the failure if the "raw" addresses are above 4G.  I have
access to an eMAG I'm currently using to test fixes to pci_host_generic.c to
avoid corrupting struct resource objects.  I'll post the diff once I've got
something verified to work.

It looks to me like in sys/dev/pci/pci_pci.c the:
static void
pcib_probe_windows(struct pcib_softc *sc)
{
. . .
         pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0xffffffff);
. . .
is just inappropriately restrictive about where in the system
address space a PCIe memory window can validly be mapped on the high end.
That, in turn, leads to the rejection on the RPi4B now that
the range use is checked.


No, the physical register in PCI-PCI bridges is only 32 bits.  Only the
prefetchable BAR supports 64-bit addresses.


Just for my edification . . .

As I understand it, SYS_RES_MEMORY for the BCM2711
means the 35-bit address space in the BCM2711,
not a PCIe device-internal address range that
corresponds to it. Am I wrong about that?

If I'm wrong, what does identify the 35-bit
address space in the BCM2711?

If I'm correct, then the 0..0xffffffff
seems to be from the wrong address space up
front. Or, maybe, the SYS_RES_MEMORY and the
0xffffffff arguments are not related as I
expected and the 0xffffffff is not a
SYS_RES_MEMORY value?


We use SYS_RES_MEMORY for both address spaces.  SYS_RES_MEMORY is more of
an address space "type" and doesn't necessarily name a single, unique
address space.  The way to think about these address spaces is as instances
of 'struct rman'.  There's a global 'struct rman' in the arm64 nexus
driver that represents the CPU physical memory address space.  The
pci_host_generic driver contains its own 'struct rman' instances that
represent the SYS_RES_MEMORY (for memory PCI BARs) and SYS_RES_IOPORT
(for I/O port PCI BARs) address spaces.

Put another way, SYS_RES_MEMORY names an I/O memory address space
relative to a device's given position in the tree.  For a given device
node in the tree, SYS_RES_MEMORY is unique, but what it maps onto is
defined by a parent bus device.

--
John Baldwin
