On Feb 12, 2024, at 09:32, John Baldwin <j...@freebsd.org> wrote: > On 2/9/24 8:13 PM, Mark Millard wrote: >> Summary: >> pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f >> irq 80,81 on simplebus2 >> pcib0: parsing FDT for ECAM0: >> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000 >> . . . >> rman_manage_region: <pcib1 memory window> request: start 0x600000000, end >> 0x6000fffff >> panic: Failed to add resource to rman > > Hmmm, I suspect this is due to the way that bus_translate_resource works > which is > fundamentally broken. It rewrites the start address of a resource in-situ > instead > of keeping downstream resources separate from the upstream resources. For > example, > I don't see how you could ever release a resource in this design without > completely > screwing up your rman. That is, I expect trying to detach a PCI device > behind a > translating bridge that uses the current approach should corrupt the allocated > resource ranges in an rman long before my changes. > > That said, that doesn't really explain the panic. Hmm, the panic might be > because > for PCI bridge windows the driver now passes RF_ACTIVE and the > bus_translate_resource > hack only kicks in the activate_resource method of pci_host_generic.c. > >> Detail: >> . . . >> pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f >> irq 80,81 on simplebus2 >> pcib0: parsing FDT for ECAM0: >> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000 > > This indicates this is a translating bus. > >> pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0 >> rman_manage_region: <pcib1 bus numbers> request: start 0x1, end 0x1 >> pcib0: rman_reserve_resource: start=0xc0000000, end=0xc00fffff, >> count=0x100000 >> rman_reserve_resource_bound: <PCIe Memory> request: [0xc0000000, >> 0xc00fffff], length 0x100000, flags 102, device pcib1 >> rman_reserve_resource_bound: trying 0xffffffff <0xc0000000,0xfffff> >> considering [0xc0000000, 0xffffffff] >> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000 (requested >> 0x100000) >> candidate region: [0xc0000000, 0xc00fffff], size 0x100000 >> allocating from the beginning >> rman_manage_region: <pcib1 memory window> request: start 0x600000000, end >> 0x6000fffff > > The fact that we are trying to reserve the CPU addresses in the rman is > because > bus_translate_resource rewrote the start address in the resource after it was > allocated. > > That said, I can't see why rman_manage_region would actually fail. At this > point the > rman is empty (this is the first call to rman_manage_region for "pcib1 memory > window"), > so only the check that should be failing are the checks against rm_start and > rm_end. For the memory window, rm_start is always 0, and rm_end is always > 0xffffffff, so both the old (0xc00000000 - 0xc00fffff) and new (0x60000000 - > 0x600fffffff) > ranges are within those bounds. I would instead expect to see some other > issue later > on where we fail to allocate a resource for a child BAR, but I wouldn't expect > rman_manage_region to fail. > > Logging the return value from rman_manage_region would be the first step I > think > to see which error value it is returning.
Looking at the code in sys/kern/subr_rman.c for rman_manage_region I see the (mostly) return rv related logic only has ENONMEM (explicit return) and EBUSY as non-0 possibilities (no other returns). All the rv references and all the returns are shown below: int rv = 0; . . r = int_alloc_resource(M_NOWAIT); if (r == NULL) return ENOMEM; . . /* Check for any overlap with the current region. */ if (r->r_start <= s->r_end && r->r_end >= s->r_start) { rv = EBUSY; goto out; } /* Check for any overlap with the next region. */ t = TAILQ_NEXT(s, r_link); if (t && r->r_start <= t->r_end && r->r_end >= t->r_start) { rv = EBUSY; goto out; } . . out: mtx_unlock(rm->rm_mtx); return rv; int_alloc_resource failure would be failure (NULL) from the: struct resource_i *r; r = malloc(sizeof *r, M_RMAN, malloc_flag | M_ZERO); (associated with the M_NOWAIT argument). The malloc failure would likely go in a very different direction. Side note: looks like the EBUSY cases leak what r references. > Probably I should fix pci_host_generic.c to handle translation properly > however. > I can work on a patch for that. === Mark Millard marklmi at yahoo.com