On Tue, 16 Sep 2025, Lucas De Marchi wrote:

> On Mon, Sep 15, 2025 at 08:24:06PM +0300, Ilpo Järvinen wrote:
> > On Mon, 15 Sep 2025, Lucas De Marchi wrote:
> > 
> > > On Mon, Sep 15, 2025 at 12:13:47PM +0300, Ilpo Järvinen wrote:
> > > > pci.c has been used as a catch-all for everything that doesn't fit
> > > > elsewhere within the PCI core, and thus the Resizable BAR code has
> > > > been placed there as well. Move the Resizable BAR related code to a
> > > > newly introduced rebar.c to reduce the size of pci.c. After the move,
> > > > there are no pci_rebar_*() calls left in pci.c, indicating this is
> > > > indeed a well-defined subset of the PCI core.
> > > >
> > > > Endpoint drivers perform Resizable BAR related operations that could
> > > > well be performed by the PCI core to simplify driver-side code. This
> > > > series adds a few new API functions to that effect and converts the
> > > > drivers to use the new APIs (in separate patches).
> > > >
> > > > While at it, also convert the BAR sizes bitmask to u64, as the PCIe
> > > > spec already specifies more sizes than fit into a u32, to make the
> > > > API typing more future-proof. The extra sizes beyond 128TB are not
> > > > added at this point.
> > > >
> > > > These are based on pci/main plus a simple "adapter" patch to add the
> > > > include for xe_vram_types.h that was added by a commit in drm-tip.
> > > > Hopefully that is enough to avoid the context conflict with the
> > > > BAR_SIZE_SHIFT removal so that the xe CI tests can be run for this
> > > > series.
> > > >
> > > > There are two minor conflicts with the work in pci/resource, but I'm
> > > > hesitant to base this on top of it as this is otherwise entirely
> > > > independent (and doing so would likely prevent the GPU CI tests as
> > > > well). If we end up having to pull the bridge window select changes,
> > > > there should be no reason why this has to become collateral damage
> > > > (so my suggestion, if this is good to go in this cycle, is to take
> > > > this into a branch separate from pci/resource and deal with those
> > > > small conflicts while merging into pci/next).
> > > >
> > > > I've tested sysfs resize, i915, and xe BAR resizing functionality. In
> > > > the case of xe, I did a small hack patch as its resize is broken as
> > > > is anyway, because BAR0 pins the bridge window so resizing BAR2
> > > > fails. My hack caused other problems further down the road (likely
> > > > because BAR0 is in use by the driver, so releasing it broke
> > > > assumptions the xe driver makes), but the BAR resize itself was working which was all I was
> > > 
> > > is the hack you mention here to release all BARs before attempting the
> > > resize?
> > 
> > Yes, the patch added releasing BAR0 prior to the resize. The existing xe
> > code in _resize_bar() already releases BAR2.
> > 
> > During resize, if the first loop in pbus_reassign_bridge_resources()
> > (called from pci_resize_resource()) finds that the bridge window closest
> > to the endpoint still has a child, it results in an empty saved list
> > because all upstream bridge windows will then have a child as well.
> > 
> > The empty saved list is checked after the loop, and
> > pbus_reassign_bridge_resources() returns -ENOENT without even trying to
> > assign the resources. The error is returned even if the actual bridge
> > window size is large enough to fit the resized resource.
> > 
> > The logic in pci_resize_resource() and pbus_reassign_bridge_resources()
> > needs some other improvements besides that problem, but I likely won't
> > have time to look at that until completing the fitting algorithm changes.
> > I'd actually want to add pci_release_and_resize_resource(), which would
> > take care of releasing all the resources of the device (obviously the
> > driver must have its hands off all those BARs when it calls that
> > function). With the current pci_resize_resource() API, handling the
> > restore of the BARs in case of failure is not as robust as I'd like to
> > make it.
> > 
> > > > interested to know. I'm not planning to pursue fixing the pinning
> > > > problem within the xe driver because the core changes to consider
> > > > the maximum size of the resizable BARs should take care of the main
> > > > problem by different means.
> > > 
> > > I'd actually like to pursue that myself as that could be propagated to
> > > stable, since we do have some resize errors in xe with BMG that I wasn't
> > > understanding. It's likely due to xe_mmio_probe_early() taking hold of
> > > BAR0 and not expecting it to be moved. We could either remap if we have
> > > to resize or just move the resize logic early on.
> > 
> > Great. If you have any questions when it comes to the PCI core side code,
> > please let me know.
> 
> I moved the resize to happen before anything else in xe. However, when
> testing, I noticed a scenario failing without involving the driver.
> With and without this series I still have the same pass/failure
> scenarios:
> 
> Tests executed with a BMG. Just after boot, BAR2 is 16GB.
> 
> 1) If I resize it via sysfs to 8GB and then load the driver, it resizes
> it back. Resize from sysfs works too. No change in behavior.

It's expected that resizing to a smaller size and then back to the 
original works through sysfs, because the upstream window pins won't 
prevent reacquiring the same or less space.

But the way resize is called from the current xe code, sizing even to a 
smaller size fails because BAR0 pins the closest upstream window, 
resulting in -ENOENT as explained above. I don't see fixing this on the 
core side as a priority, because I plan to rework the resizing code anyway 
and resizing to a smaller size doesn't seem like an overly useful use case.
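
For reference, the relevant part of the release loop in 
pbus_reassign_bridge_resources() looks roughly like this (simplified from 
memory, locking and error handling omitted, so not verbatim kernel code):

  next = bridge;
  do {
          bridge = next;
          for (i = PCI_BRIDGE_RESOURCES; i < PCI_BRIDGE_RESOURCE_END; i++) {
                  struct resource *res = &bridge->resource[i];

                  if ((res->flags ^ type) & PCI_RES_TYPE_MASK)
                          continue;

                  /*
                   * A window that still has a child (here, the switch
                   * Upstream Port's BAR0) is skipped, so it is neither
                   * released nor recorded on the saved list...
                   */
                  if (res->child)
                          continue;

                  add_to_list(&saved, bridge, res, 0, 0);
                  pci_release_resource(bridge, i);
          }
          next = bridge->bus ? bridge->bus->self : NULL;
  } while (next);

  /*
   * ...and since every window further upstream then has a child as well,
   * nothing ever lands on the saved list and we bail out before
   * attempting any reassignment:
   */
  if (list_empty(&saved))
          return -ENOENT;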

> 2) If I do "remove the bridge via sysfs and rescan the bus"[1], it fails to
> resize (either automatically, on rescan, via sysfs, or loading the xe
> driver). It just stays at 256M.

This is because the larger resource sizes are only calculated on the 
actual resize call, which occurs after the bridge windows were already 
sized to the smaller size on rescan. At that point, the critical bridge 
windows are already pinned in place and thus cannot be relocated to the 
free area that I assume exists somewhere within 4000000000-7fffffffff.
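
To illustrate the ordering (again a simplified sketch from memory, names 
approximate): only pci_resize_resource() grows the resource to the 
requested size, and by then the rescan has already sized and assigned the 
windows around the current 256M BAR:

  /* In pci_resize_resource(), roughly: */
  ret = pci_rebar_set_size(dev, resno, size);
  if (ret)
          return ret;

  /* Only here does the resource grow to the requested size (1MB << size) */
  res->end = res->start + pci_rebar_size_to_bytes(size) - 1;

  /*
   * ...and only now is an attempt made to enlarge the upstream bridge
   * windows, which after a rescan are already pinned in place.
   */
  ret = pbus_reassign_bridge_resources(dev->bus->self, res->flags);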

> The only thing that brings it back is a reboot. /proc/iomem shows this:
> 
> 4000000000-7fffffffff : PCI Bus 0000:00
>   4000000000-44007fffff : PCI Bus 0000:01
>     4000000000-4017ffffff : PCI Bus 0000:02
>       4000000000-400fffffff : PCI Bus 0000:03    <<<< BMG
>         4000000000-400fffffff : 0000:03:00.0

>       4010000000-40100fffff : PCI Bus 0000:04

This pins 0000:01:00.0's window in place, and it also prevents enlarging 
the siblings.

It would be possible, though, to release it and still use sysfs to perform 
the resize on 0000:03:00.0, as removing 0000:04:00.0 doesn't require 
removing 0000:03:00.0. But...

>     4018000000-40187fffff : 0000:01:00.0

...This resource pins 0000:00:01.0's window in place. AFAIK, it cannot be 
released other than by removing 0000:01:00.0, which results in removing 
0000:03:00.0 as well, thus making it impossible to perform the BAR resize 
for 0000:03:00.0 through sysfs anymore. Catch-22.

Could you test whether the attached quirk patch helps? Maybe it could be 
considered as an interim solution until the bridge sizing logic becomes 
aware of Resizable BARs. Using a quirk for this feels hacky to me, but 
then it's hard to point out any real downsides with that approach (other 
than having to quirk it).

You'll still need to manually release 0000:04:00.0 (echo 1 to its sysfs 
remove file, as in [1]), but BAR0 on the switch should be gone thanks to 
the quirk. When both of the window pins are gone, I think the resize 
through sysfs should work.

> And dmesg shows this for the rescan:
> 
> [ 1673.189737] pci 0000:01:00.0: [8086:e2ff] type 01 class 0x060400 PCIe
> Switch Upstream Port
> [ 1673.189794] pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x007fffff 64bit pref]
> [ 1673.189808] pci 0000:01:00.0: PCI bridge to [bus 00]
> [ 1673.189824] pci 0000:01:00.0:   bridge window [io  0x0000-0x0fff]
> [ 1673.189834] pci 0000:01:00.0:   bridge window [mem 0x00000000-0x000fffff]
> [ 1673.189856] pci 0000:01:00.0:   bridge window [mem 0x00000000-0x000fffff
> 64bit pref]
> [ 1673.189878] pci 0000:01:00.0: Max Payload Size set to 256 (was 128, max
> 256)
> [ 1673.190164] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
> [ 1673.193531] pci 0000:01:00.0: Adding to iommu group 16
> [ 1673.196997] pcieport 0000:00:01.0: ASPM: current common clock configuration
> is inconsistent, reconfiguring
> [ 1673.197061] pci 0000:01:00.0: bridge configuration invalid ([bus 00-00]),
> reconfiguring
> [ 1673.197421] pci 0000:02:01.0: [8086:e2f0] type 01 class 0x060400 PCIe
> Switch Downstream Port
> [ 1673.197452] pci 0000:02:01.0: PCI bridge to [bus 00]
> [ 1673.197463] pci 0000:02:01.0:   bridge window [io  0x0000-0x0fff]
> [ 1673.197468] pci 0000:02:01.0:   bridge window [mem 0x00000000-0x000fffff]
> [ 1673.197482] pci 0000:02:01.0:   bridge window [mem 0x00000000-0x000fffff
> 64bit pref]
> [ 1673.197497] pci 0000:02:01.0: Max Payload Size set to 256 (was 128, max
> 256)
> [ 1673.197503] pci 0000:02:01.0: enabling Extended Tags
> [ 1673.197660] pci 0000:02:01.0: PME# supported from D0 D3hot D3cold
> [ 1673.198411] pci 0000:02:01.0: Adding to iommu group 17
> [ 1673.200258] pci 0000:02:02.0: [8086:e2f1] type 01 class 0x060400 PCIe
> Switch Downstream Port
> [ 1673.200289] pci 0000:02:02.0: PCI bridge to [bus 00]
> [ 1673.200299] pci 0000:02:02.0:   bridge window [io  0x0000-0x0fff]
> [ 1673.200304] pci 0000:02:02.0:   bridge window [mem 0x00000000-0x000fffff]
> [ 1673.200317] pci 0000:02:02.0:   bridge window [mem 0x00000000-0x000fffff
> 64bit pref]
> [ 1673.200333] pci 0000:02:02.0: Max Payload Size set to 256 (was 128, max
> 256)
> [ 1673.200337] pci 0000:02:02.0: enabling Extended Tags
> [ 1673.200470] pci 0000:02:02.0: PME# supported from D0 D3hot D3cold
> [ 1673.201059] pci 0000:02:02.0: Adding to iommu group 18
> [ 1673.202761] pci 0000:01:00.0: PCI bridge to [bus 02-04]
> [ 1673.202774] pci 0000:02:01.0: bridge configuration invalid ([bus 00-00]),
> reconfiguring
> [ 1673.202782] pci 0000:02:02.0: bridge configuration invalid ([bus 00-00]),
> reconfiguring
> [ 1673.203024] pci 0000:03:00.0: [8086:e221] type 00 class 0x030000 PCIe
> Endpoint
> [ 1673.203060] pci 0000:03:00.0: BAR 0 [mem 0x00000000-0x00ffffff 64bit]
> [ 1673.203064] pci 0000:03:00.0: BAR 2 [mem 0x00000000-0x0fffffff 64bit pref]
> [ 1673.203069] pci 0000:03:00.0: ROM [mem 0x00000000-0x001fffff pref]
> [ 1673.203077] pci 0000:03:00.0: Max Payload Size set to 256 (was 128, max
> 256)
> [ 1673.203209] pci 0000:03:00.0: PME# supported from D0 D3hot
> [ 1673.203770] pci 0000:03:00.0: Adding to iommu group 19
> [ 1673.205451] pci 0000:03:00.0: vgaarb: setting as boot VGA device
> [ 1673.205454] pci 0000:03:00.0: vgaarb: bridge control possible
> [ 1673.205455] pci 0000:03:00.0: vgaarb: VGA device added:
> decodes=io+mem,owns=none,locks=none
> [ 1673.205534] pci 0000:02:01.0: PCI bridge to [bus 03-04]
> [ 1673.205543] pci_bus 0000:03: busn_res: [bus 03-04] end is updated to 03
> [ 1673.205787] pci 0000:04:00.0: [8086:e2f7] type 00 class 0x040300 PCIe
> Endpoint
> [ 1673.205848] pci 0000:04:00.0: BAR 0 [mem 0x00000000-0x00003fff 64bit]
> [ 1673.205867] pci 0000:04:00.0: Max Payload Size set to 256 (was 128, max
> 256)
> [ 1673.205872] pci 0000:04:00.0: enabling Extended Tags
> [ 1673.206012] pci 0000:04:00.0: PME# supported from D3hot D3cold
> [ 1673.206528] pci 0000:04:00.0: Adding to iommu group 20
> [ 1673.208271] pci 0000:02:02.0: PCI bridge to [bus 04]
> [ 1673.208284] pci_bus 0000:04: busn_res: [bus 04] end is updated to 04
> [ 1673.208291] pci_bus 0000:02: busn_res: [bus 02-04] end is updated to 04
> [ 1673.232003] pcieport 0000:00:01.0: Assigned bridge window [mem
> 0x83000000-0x840fffff] to [bus 01-04] cannot fit 0x2000000 required for
> 0000:02:01.0 bridging to [bus 03]
> [ 1673.232009] pci 0000:02:01.0: bridge window [mem 0x00000000-0x000fffff] to
> [bus 03] requires relaxed alignment rules
> [ 1673.232016] pci 0000:02:01.0: bridge window [mem 0x01000000-0x01ffffff] to
> [bus 03] add_size 200000 add_align 1000000
> [ 1673.232020] pcieport 0000:00:01.0: Assigned bridge window [mem
> 0x83000000-0x840fffff] to [bus 01-04] cannot fit 0x1800000 required for
> 0000:01:00.0 bridging to [bus 02-04]
> [ 1673.232025] pci 0000:01:00.0: bridge window [mem 0x00000000-0x000fffff] to
> [bus 02-04] requires relaxed alignment rules
> [ 1673.232027] pcieport 0000:00:01.0: Assigned bridge window [mem
> 0x83000000-0x840fffff] to [bus 01-04] cannot fit 0x2000000 required for
> 0000:01:00.0 bridging to [bus 02-04]
> [ 1673.232031] pci 0000:01:00.0: bridge window [mem 0x00000000-0x000fffff] to
> [bus 02-04] requires relaxed alignment rules
> [ 1673.232036] pci 0000:01:00.0: bridge window [mem 0x01000000-0x020fffff] to
> [bus 02-04] add_size 200000 add_align 1000000
> [ 1673.232077] pci 0000:01:00.0: bridge window [mem 0x4000000000-0x4017ffffff
> 64bit pref]: assigned
> [ 1673.232080] pci 0000:01:00.0: bridge window [mem size 0x01300000]: can't
> assign; no space
> [ 1673.232082] pci 0000:01:00.0: bridge window [mem size 0x01300000]: failed
> to assign
> [ 1673.232090] pci 0000:01:00.0: BAR 0 [mem 0x4018000000-0x40187fffff 64bit
> pref]: assigned
> [ 1673.232103] pci 0000:01:00.0: bridge window [io  0x8000-0x9fff]: assigned
> [ 1673.232129] pci 0000:01:00.0: bridge window [mem 0x83000000-0x840fffff]:
> assigned
> [ 1673.232131] pci 0000:01:00.0: bridge window [mem 0x83000000-0x840fffff]:
> failed to expand by 0x200000
> [ 1673.232136] pci 0000:01:00.0: bridge window [mem 0x83000000-0x840fffff]:
> failed to add optional 200000
> [ 1673.232192] pci 0000:02:01.0: bridge window [mem 0x4000000000-0x400fffffff
> 64bit pref]: assigned
> [ 1673.232196] pci 0000:02:01.0: bridge window [mem 0x83000000-0x83ffffff]:
> assigned
> [ 1673.232200] pci 0000:02:02.0: bridge window [mem 0x84000000-0x840fffff]:
> assigned
> [ 1673.232202] pci 0000:02:02.0: bridge window [mem 0x4010000000-0x40100fffff
> 64bit pref]: assigned
> [ 1673.232204] pci 0000:02:01.0: bridge window [io  0x8000-0x8fff]: assigned
> [ 1673.232206] pci 0000:02:02.0: bridge window [io  0x9000-0x9fff]: assigned
> [ 1673.232241] pci 0000:03:00.0: BAR 2 [mem 0x4000000000-0x400fffffff 64bit
> pref]: assigned
> [ 1673.232250] pci 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]:
> assigned
> [ 1673.232259] pci 0000:03:00.0: ROM [mem size 0x00200000 pref]: can't assign;
> no space
> [ 1673.232261] pci 0000:03:00.0: ROM [mem size 0x00200000 pref]: failed to
> assign
> [ 1673.232272] pci 0000:03:00.0: BAR 2 [mem 0x4000000000-0x400fffffff 64bit
> pref]: assigned
> [ 1673.232280] pci 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]:
> assigned
> [ 1673.232289] pci 0000:03:00.0: ROM [mem size 0x00200000 pref]: can't assign;
> no space
> [ 1673.232291] pci 0000:03:00.0: ROM [mem size 0x00200000 pref]: failed to
> assign
> [ 1673.232302] pci 0000:02:01.0: PCI bridge to [bus 03]
> [ 1673.232304] pci 0000:02:01.0:   bridge window [io  0x8000-0x8fff]
> [ 1673.232309] pci 0000:02:01.0:   bridge window [mem 0x83000000-0x83ffffff]
> [ 1673.232313] pci 0000:02:01.0:   bridge window [mem
> 0x4000000000-0x400fffffff 64bit pref]
> [ 1673.232321] pci 0000:04:00.0: BAR 0 [mem 0x84000000-0x84003fff 64bit]:
> assigned
> [ 1673.232336] pci 0000:02:02.0: PCI bridge to [bus 04]
> [ 1673.232339] pci 0000:02:02.0:   bridge window [io  0x9000-0x9fff]
> [ 1673.232345] pci 0000:02:02.0:   bridge window [mem 0x84000000-0x840fffff]
> [ 1673.232349] pci 0000:02:02.0:   bridge window [mem
> 0x4010000000-0x40100fffff 64bit pref]
> [ 1673.232356] pci 0000:01:00.0: PCI bridge to [bus 02-04]
> [ 1673.232359] pci 0000:01:00.0:   bridge window [io  0x8000-0x9fff]
> [ 1673.232363] pci 0000:01:00.0:   bridge window [mem 0x83000000-0x840fffff]
> [ 1673.232366] pci 0000:01:00.0:   bridge window [mem
> 0x4000000000-0x4017ffffff 64bit pref]
> [ 1673.232471] pcieport 0000:01:00.0: enabling device (0000 -> 0003)
> [ 1673.233508] pcieport 0000:02:01.0: enabling device (0000 -> 0003)
> [ 1673.233692] pcieport 0000:02:02.0: enabling device (0000 -> 0003)
> 
> # echo 9 > /sys/bus/pci/devices/0000\:03\:00.0/resource2_resize
> -bash: echo: write error: No space left on device
> 
> 
> [1] # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove
>     # echo 0 > /sys/bus/pci/drivers_autoprobe
>     # echo 1 > /sys/bus/pci/rescan
> 
> 
> I can share the xe patch so you can check if it at least fixes it in your
> test scenario.

Ah, one thing I didn't remember to mention is that in my case the BAR is 
already at its maximum size, so to test that the resize still works, I 
made the target size smaller, not larger. (I understand this might not be 
very helpful in your case, but I was only interested in confirming that 
the resize code still works after this series.)

-- 
 i.
From 948a49f01df54b3435861138a0eae85bb2c3f1f3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvi...@linux.intel.com>
Date: Wed, 17 Sep 2025 15:24:53 +0300
Subject: [PATCH 1/1] PCI: Release BAR0 of an integrated bridge to allow GPU
 BAR resize
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Resizing a BAR to a larger size has to release the upstream bridge windows
in order to make the bridge windows larger as well (and to potentially
relocate them into a larger free block within the iomem space). Some GPUs
have an integrated PCI switch that has a BAR0. The resource allocation
assigns space for that BAR0 as it does for any resource.

An extra resource on a bridge will pin its upstream bridge window in
place, which prevents BAR resizing for anything beneath that bridge.

Nothing in the pcieport driver provided by the PCI core, which is
typically the driver bound to these bridges, requires that BAR0. Because
of that, releasing the extra BAR does not seem to have notable downsides,
but it comes with a clear upside.

Therefore, release BAR0 of such switches using a quirk and clear its
flags to prevent any new invocation of the resource assignment
algorithm from assigning the resource again.

Due to other siblings within the PCI hierarchy of all the devices
integrated into the GPU, some other devices may still have to be
manually removed before the resize is free of any bridge window pins.
Such siblings can be released through sysfs to unpin windows while
leaving access to the GPU's sysfs entries required for initiating the
resize operation, whereas removing the topmost bridge this quirk
targets would result in removing the GPU device as well, so no manual
workaround for this problem exists.

Reported-by: Lucas De Marchi <lucas.demar...@intel.com>
Link: https://lore.kernel.org/linux-pci/fl6tx5ztvttg7txmz2ps7oyd745wg3lwcp3h7esmvnyg26n44y@owo2ojiu2mov/
Signed-off-by: Ilpo Järvinen <ilpo.jarvi...@linux.intel.com>
---

This feels quite hacky to me and I'm working towards a better solution,
which is to consider the Resizable BAR maximum size in the resource
fitting algorithm. But then, I don't expect the better solution to be
something we want to push into stable due to its extremely invasive
dependencies. So maybe consider this an interim/legacy solution to the
resizing problem and remove it once the algorithmic approach works (or,
more precisely, retain it only in the old kernel versions).
---
 drivers/pci/quirks.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index d97335a40193..98a4f0a1285b 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6338,3 +6338,23 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
 #endif
+
+/*
+ * PCI switches integrated into some GPUs have a BAR0 that prevents
+ * resizing the BARs of the GPU device because that bridge BAR0 pins the
+ * bridge window it's under in place. Nothing in pcieport requires BAR0.
+ *
+ * Release and disable BAR0 permanently by clearing its flags to prevent
+ * anything from assigning it again.
+ */
+static void pci_release_bar0(struct pci_dev *pdev)
+{
+	struct resource *res = pci_resource_n(pdev, 0);
+
+	if (!res->parent)
+		return;
+
+	pci_release_resource(pdev, 0);
+	res->flags = 0;
+}
+DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0xe2ff, pci_release_bar0);

base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
-- 
2.39.5
