Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

Mark Millard Mon, 12 Feb 2024 10:43:08 -0800

On Feb 12, 2024, at 09:32, John Baldwin <j...@freebsd.org> wrote:

> On 2/9/24 8:13 PM, Mark Millard wrote:
>> Summary:
>> pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f 
>> irq 80,81 on simplebus2
>> pcib0: parsing FDT for ECAM0:
>> pcib0:  PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000
>> . . .
>> rman_manage_region: <pcib1 memory window> request: start 0x600000000, end 
>> 0x6000fffff
>> panic: Failed to add resource to rman
> 
> Hmmm, I suspect this is due to the way that bus_translate_resource works 
> which is
> fundamentally broken.  It rewrites the start address of a resource in-situ 
> instead
> of keeping downstream resources separate from the upstream resources.   For 
> example,
> I don't see how you could ever release a resource in this design without 
> completely
> screwing up your rman.  That is, I expect trying to detach a PCI device 
> behind a
> translating bridge that uses the current approach should corrupt the allocated
> resource ranges in an rman long before my changes.
> 
> That said, that doesn't really explain the panic.  Hmm, the panic might be 
> because
> for PCI bridge windows the driver now passes RF_ACTIVE and the 
> bus_translate_resource
> hack only kicks in the activate_resource method of pci_host_generic.c.
> 
>> Detail:
>> . . .
>> pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f 
>> irq 80,81 on simplebus2
>> pcib0: parsing FDT for ECAM0:
>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000
> 
> This indicates this is a translating bus.
> 
>> pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0
>> rman_manage_region: <pcib1 bus numbers> request: start 0x1, end 0x1
>> pcib0: rman_reserve_resource: start=0xc0000000, end=0xc00fffff, 
>> count=0x100000
>> rman_reserve_resource_bound: <PCIe Memory> request: [0xc0000000, 
>> 0xc00fffff], length 0x100000, flags 102, device pcib1
>> rman_reserve_resource_bound: trying 0xffffffff <0xc0000000,0xfffff>
>> considering [0xc0000000, 0xffffffff]
>> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000 (requested 
>> 0x100000)
>> candidate region: [0xc0000000, 0xc00fffff], size 0x100000
>> allocating from the beginning
>> rman_manage_region: <pcib1 memory window> request: start 0x600000000, end 
>> 0x6000fffff
> 
> The fact that we are trying to reserve the CPU addresses in the rman is 
> because
> bus_translate_resource rewrote the start address in the resource after it was 
> allocated.
> 
> That said, I can't see why rman_manage_region would actually fail.  At this 
> point the
> rman is empty (this is the first call to rman_manage_region for "pcib1 memory 
> window"),
> so only the check that should be failing are the checks against rm_start and
> rm_end.  For the memory window, rm_start is always 0, and rm_end is always
> 0xffffffff, so both the old (0xc00000000 - 0xc00fffff) and new (0x60000000 - 
> 0x600fffffff)
> ranges are within those bounds.  I would instead expect to see some other 
> issue later
> on where we fail to allocate a resource for a child BAR, but I wouldn't expect
> rman_manage_region to fail.
> 
> Logging the return value from rman_manage_region would be the first step I 
> think
> to see which error value it is returning.


Looking at the code in sys/kern/subr_rman.c for rman_manage_region I see
the (mostly) return rv related logic only has ENONMEM (explicit return) and
EBUSY as non-0 possibilities (no other returns). All the rv references and
all the returns are shown below:

        int rv = 0;
 . .
        r = int_alloc_resource(M_NOWAIT);
        if (r == NULL)
                return ENOMEM;
 . .
                /* Check for any overlap with the current region. */
                if (r->r_start <= s->r_end && r->r_end >= s->r_start) {
                        rv = EBUSY;
                        goto out;
                }

                /* Check for any overlap with the next region. */
                t = TAILQ_NEXT(s, r_link);
                if (t && r->r_start <= t->r_end && r->r_end >= t->r_start) {
                        rv = EBUSY;
                        goto out;
                }
 . .
out:
        mtx_unlock(rm->rm_mtx);
        return rv;

int_alloc_resource failure would be failure (NULL) from the:

        struct resource_i *r;
 
        r = malloc(sizeof *r, M_RMAN, malloc_flag | M_ZERO);

(associated with the M_NOWAIT argument).

The malloc failure would likely go in a very different direction.



Side note: looks like the EBUSY cases leak what r references.


> Probably I should fix pci_host_generic.c to handle translation properly 
> however.
> I can work on a patch for that.



===
Mark Millard
marklmi at yahoo.com

Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic

Reply via email to