On Thu, Oct 9, 2025 at 1:14 AM Yao, Jia <[email protected]> wrote:
>
> Good questions. @Pingfan Liu @Ville Syrjälä
>
> Driver-wise, no other access except for pxp stuff, for that after disabling
> PCI_COMMAND_MEMORY, any access will cause MMIO failure, they will not be
> able to be hidden.
> And the invalid access is not from pxp, otherwise just doing
> intel_pxp_fini in i915_driver_shutdown will fix the issue, it might come
> from firmware or other that i915 driver can't see.
>
Thank you for the clear explanation.

> Current solution is defensive, not harmful just like turning on write
> protection on a floppy disk when not using it.
>

I have a couple of questions about this patch:

First, at what point in the shutdown sequence should the
PCI_COMMAND_MEMORY bit be cleared in i915_driver_shutdown()? Does your
patch clear it too late?

Second, in the second kernel (after kexec), is there a proactive way to
force the i915 driver into a clean state so it can recover from invalid
memory accesses that occurred in the first kernel? I think that would be
a better choice.

I realize I may be nitpicking a bit, since these details might not be
publicly documented. So I think your patch is reasonable.

Thanks,

Pingfan

> Thanks,
> Jia
>
> -----Original Message-----
> From: Ville Syrjälä <[email protected]>
> Sent: Wednesday, October 8, 2025 9:16 AM
> To: Yao, Jia <[email protected]>
> Cc: [email protected]; Zuo, Alex <[email protected]>; Lin,
> Shuicheng <[email protected]>; Askar Safin <[email protected]>;
> Pingfan Liu <[email protected]>; Chris Wilson <[email protected]>
> Subject: Re: [PATCH v2] drm/i915: Setting/clearing the memory access bit when
> en/disabling i915
>
> On Wed, Oct 08, 2025 at 04:06:39PM +0000, Yao, Jia wrote:
> > The actual bug is showing in
> > https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14598
> > if CONFIG_INTEL_IOMMU_DEFAULT_ON=y , that IOMMU prevent the invalid
> > access, but if CONFIG_INTEL_IOMMU_DEFAULT_ON=n, the invalid access will
> > directly cause system crash after kexec reboot.
>
> I was asking you whether that invalid access was caused by that pxp stuff or
> not?
>
> If yes, then just fix it.
>
> If not, then I guess someone needs to keep on debugging.
>
> >
> > -----Original Message-----
> > From: Ville Syrjälä <[email protected]>
> > Sent: Wednesday, October 8, 2025 5:22 AM
> > To: Yao, Jia <[email protected]>
> > Cc: [email protected]; Zuo, Alex <[email protected]>;
> > Lin, Shuicheng <[email protected]>; Askar Safin
> > <[email protected]>; Pingfan Liu <[email protected]>; Chris Wilson
> > <[email protected]>
> > Subject: Re: [PATCH v2] drm/i915: Setting/clearing the memory access
> > bit when en/disabling i915
> >
> > On Tue, Oct 07, 2025 at 09:40:45PM +0000, Yao, Jia wrote:
> > > You mean intel_pxp_fini(i915) ?
> > > This is because mei_me_shutdown is called after
> > > i915_driver_shutdown in pci_device_shutdown sequence. If we don't
> > > close pxp in advance, it will cause
> > >
> > > [ 295.584775] i915 0000:00:02.0: [drm] *ERROR* gt: MMIO unreliable
> > > (forcewake register returns 0xFFFFFFFF)!
> >
> > So that is the actual bug you're trying to fix? Please just submit the pxp
> > fix on its own.
> >
> > >
> > > Since we disabled PCI_COMMAND_MEMORY in i915_driver_shutdown
> > >
> > > Thanks,
> > > Jia
> > >
> > > -----Original Message-----
> > > From: Ville Syrjälä <[email protected]>
> > > Sent: Tuesday, October 7, 2025 2:25 PM
> > > To: Yao, Jia <[email protected]>
> > > Cc: [email protected]; Zuo, Alex <[email protected]>;
> > > Lin, Shuicheng <[email protected]>; Askar Safin
> > > <[email protected]>; Pingfan Liu <[email protected]>; Chris Wilson
> > > <[email protected]>
> > > Subject: Re: [PATCH v2] drm/i915: Setting/clearing the memory access
> > > bit when en/disabling i915
> > >
> > > On Tue, Oct 07, 2025 at 08:25:14PM +0000, Jia Yao wrote:
> > > > Make i915's PCI device management more robust by always
> > > > setting/clearing the memory access bit when enabling/disabling the
> > > > device, and by consolidating this logic into helper functions.
> > > >
> > > > It fixes kexec reboot issue by disabling memory access before
> > > > shutting down the device, which can block unsafe and unwanted access
> > > > from DMA.
> > > >
> > > > v2:
> > > > - follow brace style
> > > >
> > > > Link:
> > > > https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14598
> > > > Cc: Alex Zuo <[email protected]>
> > > > Cc: Shuicheng Lin <[email protected]>
> > > > Cc: Askar Safin <[email protected]>
> > > > Cc: Pingfan Liu <[email protected]>
> > > > Suggested-by: Chris Wilson <[email protected]>
> > > > Signed-off-by: Jia Yao <[email protected]>
> > > > ---
> > > > drivers/gpu/drm/i915/i915_driver.c | 35
> > > > +++++++++++++++++++++++++++---
> > > > 1 file changed, 32 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/i915_driver.c
> > > > b/drivers/gpu/drm/i915/i915_driver.c
> > > > index b46cb54ef5dc..766f85726b67 100644
> > > > --- a/drivers/gpu/drm/i915/i915_driver.c
> > > > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > > > @@ -118,6 +118,33 @@
> > > >
> > > > static const struct drm_driver i915_drm_driver;
> > > >
> > > > +static int i915_enable_device(struct pci_dev *pdev) {
> > > > + u32 cmd;
> > > > + int ret;
> > > > +
> > > > + ret = pci_enable_device(pdev);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + pci_read_config_dword(pdev, PCI_COMMAND, &cmd);
> > > > + if (!(cmd & PCI_COMMAND_MEMORY))
> > > > + pci_write_config_dword(pdev, PCI_COMMAND, cmd |
> > > > +PCI_COMMAND_MEMORY);
> > > > +
> > > > + return 0;
> > > > +}
> > >
> > > NAK. If the pci code is broken then fix the problem there.
> > > Do not add ugly hacks into random drivers.
> > >
> > > > +
> > > > +static void i915_disable_device(struct pci_dev *pdev) {
> > > > + u32 cmd;
> > > > +
> > > > + pci_read_config_dword(pdev, PCI_COMMAND, &cmd);
> > > > + if (cmd & PCI_COMMAND_MEMORY)
> > > > + pci_write_config_dword(pdev, PCI_COMMAND, cmd &
> > > > +~PCI_COMMAND_MEMORY);
> > > > +
> > > > + pci_disable_device(pdev);
> > > > +}
> > > > +
> > > > static int i915_workqueues_init(struct drm_i915_private *dev_priv) {
> > > > /*
> > > > @@ -788,7 +815,7 @@ int i915_driver_probe(struct pci_dev *pdev, const
> > > > struct pci_device_id *ent)
> > > > struct intel_display *display;
> > > > int ret;
> > > >
> > > > - ret = pci_enable_device(pdev);
> > > > + ret = i915_enable_device(pdev);
> > > > if (ret) {
> > > > pr_err("Failed to enable graphics device: %pe\n",
> > > > ERR_PTR(ret));
> > > > return ret;
> > > > @@ -796,7 +823,7 @@ int i915_driver_probe(struct pci_dev *pdev,
> > > > const struct pci_device_id *ent)
> > > >
> > > > i915 = i915_driver_create(pdev, ent);
> > > > if (IS_ERR(i915)) {
> > > > - pci_disable_device(pdev);
> > > > + i915_disable_device(pdev);
> > > > return PTR_ERR(i915);
> > > > }
> > > >
> > > > @@ -885,7 +912,7 @@ int i915_driver_probe(struct pci_dev *pdev, const
> > > > struct pci_device_id *ent)
> > > > enable_rpm_wakeref_asserts(&i915->runtime_pm);
> > > > i915_driver_late_release(i915);
> > > > out_pci_disable:
> > > > - pci_disable_device(pdev);
> > > > + i915_disable_device(pdev);
> > > > i915_probe_error(i915, "Device initialization failed (%d)\n", ret);
> > > > return ret;
> > > > }
> > > > @@ -1003,6 +1030,7 @@ void i915_driver_shutdown(struct
> > > > drm_i915_private *i915)
> > > >
> > > > intel_dmc_suspend(display);
> > > >
> > > > + intel_pxp_fini(i915);
> > >
> > > What is that doing in this patch?
> > >
> > > > i915_gem_suspend(i915);
> > > >
> > > > /*
> > > > @@ -1020,6 +1048,7 @@ void i915_driver_shutdown(struct drm_i915_private
> > > > *i915)
> > > > enable_rpm_wakeref_asserts(&i915->runtime_pm);
> > > >
> > > > intel_runtime_pm_driver_last_release(&i915->runtime_pm);
> > > > + i915_disable_device(to_pci_dev(i915->drm.dev));
> > > > }
> > > >
> > > > static bool suspend_to_idle(struct drm_i915_private *dev_priv)
> > > > --
> > > > 2.34.1
> > >
> > > --
> > > Ville Syrjälä
> > > Intel
> >
> > --
> > Ville Syrjälä
> > Intel
>
> --
> Ville Syrjälä
> Intel
>
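
For reference, here is a minimal userspace sketch of the read-modify-write that the quoted helper performs on the command register. It is not the patch and not kernel code: struct fake_cfg, cfg_read16(), cfg_write16() and clear_memory_decode() are hypothetical stand-ins for a device's config space and the config accessors. The offsets and bit values are the standard PCI ones. One assumption to flag: the sketch uses a 16-bit access because the command register is 16 bits wide, with the status register in the upper half of the same dword.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PCI_COMMAND        0x04 /* 16-bit command register */
#define PCI_STATUS         0x06 /* 16-bit status register, same dword */
#define PCI_COMMAND_MEMORY 0x2  /* memory space decode enable */
#define PCI_COMMAND_MASTER 0x4  /* bus master enable */

/* Hypothetical stand-in for a device's 256-byte config space. */
struct fake_cfg {
	uint8_t bytes[256];
};

static uint16_t cfg_read16(const struct fake_cfg *cfg, unsigned int off)
{
	uint16_t v;

	assert(off + sizeof(v) <= sizeof(cfg->bytes));
	memcpy(&v, &cfg->bytes[off], sizeof(v));
	return v;
}

static void cfg_write16(struct fake_cfg *cfg, unsigned int off, uint16_t v)
{
	assert(off + sizeof(v) <= sizeof(cfg->bytes));
	memcpy(&cfg->bytes[off], &v, sizeof(v));
}

/*
 * Clear memory decode only if it is currently set.
 * Returns 1 if the bit was cleared, 0 if it was already clear.
 * Other command bits and the status register are left untouched.
 */
static int clear_memory_decode(struct fake_cfg *cfg)
{
	uint16_t cmd = cfg_read16(cfg, PCI_COMMAND);

	if (!(cmd & PCI_COMMAND_MEMORY))
		return 0;

	cfg_write16(cfg, PCI_COMMAND, cmd & (uint16_t)~PCI_COMMAND_MEMORY);
	return 1;
}
```

In the kernel the corresponding 16-bit accessors are pci_read_config_word() and pci_write_config_word(). The sketch says nothing about where in the shutdown path such a call belongs, which is the open question above.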
