On Thu, Oct 9, 2025 at 1:14 AM Yao, Jia <[email protected]> wrote:
>
> Good questions. @Pingfan Liu @Ville Syrjälä
>
> Driver-wise, there is no other access except the PXP stuff, and after
> PCI_COMMAND_MEMORY is disabled, any such access causes an MMIO failure, so
> it cannot stay hidden.
> The invalid access is also not from PXP; otherwise, just calling
> intel_pxp_fini() in i915_driver_shutdown() would fix the issue. It might
> come from firmware or something else that the i915 driver can't see.
>

Thank you for the clear explanation.

> The current solution is defensive and harmless, like turning on write
> protection on a floppy disk when you're not using it.
>

I have a couple of questions about this patch:
First, at what point (in the shutdown sequence) should the
PCI_COMMAND_MEMORY bit be cleared in i915_driver_shutdown()? Does your
patch clear it too late?
Second, in the second kernel (after kexec), is there a proactive way
to force the i915 driver into a clean state so it can recover from
invalid memory accesses that occurred in the first kernel? I think
that would be a better choice.

I realize I may be nitpicking a bit, since these details might not be
publicly documented. So I think your patch is reasonable.


Thanks,

Pingfan


> Thanks,
> Jia
>
> -----Original Message-----
> From: Ville Syrjälä <[email protected]>
> Sent: Wednesday, October 8, 2025 9:16 AM
> To: Yao, Jia <[email protected]>
> Cc: [email protected]; Zuo, Alex <[email protected]>; Lin, 
> Shuicheng <[email protected]>; Askar Safin <[email protected]>; 
> Pingfan Liu <[email protected]>; Chris Wilson <[email protected]>
> Subject: Re: [PATCH v2] drm/i915: Setting/clearing the memory access bit when 
> en/disabling i915
>
> On Wed, Oct 08, 2025 at 04:06:39PM +0000, Yao, Jia wrote:
> > The actual bug is described in
> > https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14598
> > With CONFIG_INTEL_IOMMU_DEFAULT_ON=y, the IOMMU blocks the invalid
> > access, but with CONFIG_INTEL_IOMMU_DEFAULT_ON=n, the invalid access
> > directly causes a system crash after a kexec reboot.
>
> I was asking you whether that invalid access was caused by that pxp stuff or 
> not?
>
> If yes, then just fix it.
>
> If not, then I guess someone needs to keep on debugging.
>
> >
> > -----Original Message-----
> > From: Ville Syrjälä <[email protected]>
> > Sent: Wednesday, October 8, 2025 5:22 AM
> > To: Yao, Jia <[email protected]>
> > Cc: [email protected]; Zuo, Alex <[email protected]>;
> > Lin, Shuicheng <[email protected]>; Askar Safin
> > <[email protected]>; Pingfan Liu <[email protected]>; Chris Wilson
> > <[email protected]>
> > Subject: Re: [PATCH v2] drm/i915: Setting/clearing the memory access
> > bit when en/disabling i915
> >
> > On Tue, Oct 07, 2025 at 09:40:45PM +0000, Yao, Jia wrote:
> > > You mean intel_pxp_fini(i915)?
> > > That is needed because mei_me_shutdown() is called after
> > > i915_driver_shutdown() in the pci_device_shutdown() sequence. If we
> > > don't close PXP in advance, it will cause
> > >
> > > [  295.584775] i915 0000:00:02.0: [drm] *ERROR* gt: MMIO unreliable (forcewake register returns 0xFFFFFFFF)!
> >
> > So that is the actual bug you're trying to fix? Please just submit the pxp 
> > fix on its own.
> >
> > >
> > > That is because this patch disables PCI_COMMAND_MEMORY in i915_driver_shutdown().
> > >
> > > Thanks,
> > > Jia
> > >
> > > -----Original Message-----
> > > From: Ville Syrjälä <[email protected]>
> > > Sent: Tuesday, October 7, 2025 2:25 PM
> > > To: Yao, Jia <[email protected]>
> > > Cc: [email protected]; Zuo, Alex <[email protected]>;
> > > Lin, Shuicheng <[email protected]>; Askar Safin
> > > <[email protected]>; Pingfan Liu <[email protected]>; Chris Wilson
> > > <[email protected]>
> > > Subject: Re: [PATCH v2] drm/i915: Setting/clearing the memory access
> > > bit when en/disabling i915
> > >
> > > On Tue, Oct 07, 2025 at 08:25:14PM +0000, Jia Yao wrote:
> > > > Make i915's PCI device management more robust by always
> > > > setting/clearing the memory access bit when enabling/disabling the
> > > > device, and by consolidating this logic into helper functions.
> > > >
> > > > It fixes the kexec reboot issue by disabling memory access before
> > > > shutting down the device, which can block unsafe and unwanted
> > > > accesses from DMA.
> > > >
> > > > v2:
> > > >   - follow brace style
> > > >
> > > > Link:
> > > > https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14598
> > > > Cc: Alex Zuo <[email protected]>
> > > > Cc: Shuicheng Lin <[email protected]>
> > > > Cc: Askar Safin <[email protected]>
> > > > Cc: Pingfan Liu <[email protected]>
> > > > Suggested-by: Chris Wilson <[email protected]>
> > > > Signed-off-by: Jia Yao <[email protected]>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_driver.c | 35 +++++++++++++++++++++++++++---
> > > >  1 file changed, 32 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> > > > index b46cb54ef5dc..766f85726b67 100644
> > > > --- a/drivers/gpu/drm/i915/i915_driver.c
> > > > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > > > @@ -118,6 +118,33 @@
> > > >
> > > >  static const struct drm_driver i915_drm_driver;
> > > >
> > > > +static int i915_enable_device(struct pci_dev *pdev)
> > > > +{
> > > > + u32 cmd;
> > > > + int ret;
> > > > +
> > > > + ret = pci_enable_device(pdev);
> > > > + if (ret)
> > > > +         return ret;
> > > > +
> > > > + pci_read_config_dword(pdev, PCI_COMMAND, &cmd);
> > > > + if (!(cmd & PCI_COMMAND_MEMORY))
> > > > +         pci_write_config_dword(pdev, PCI_COMMAND, cmd | PCI_COMMAND_MEMORY);
> > > > +
> > > > + return 0;
> > > > +}
> > >
> > > NAK. If the pci code is broken then fix the problem there.
> > > Do not add ugly hacks into random drivers.
> > >
> > > > +
> > > > +static void i915_disable_device(struct pci_dev *pdev)
> > > > +{
> > > > + u32 cmd;
> > > > +
> > > > + pci_read_config_dword(pdev, PCI_COMMAND, &cmd);
> > > > + if (cmd & PCI_COMMAND_MEMORY)
> > > > +         pci_write_config_dword(pdev, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
> > > > +
> > > > + pci_disable_device(pdev);
> > > > +}
> > > > +
> > > >  static int i915_workqueues_init(struct drm_i915_private *dev_priv)
> > > >  {
> > > >   /*
> > > > @@ -788,7 +815,7 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> > > >   struct intel_display *display;
> > > >   int ret;
> > > >
> > > > - ret = pci_enable_device(pdev);
> > > > + ret = i915_enable_device(pdev);
> > > >   if (ret) {
> > > >           pr_err("Failed to enable graphics device: %pe\n", ERR_PTR(ret));
> > > >           return ret;
> > > > @@ -796,7 +823,7 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> > > >
> > > >   i915 = i915_driver_create(pdev, ent);
> > > >   if (IS_ERR(i915)) {
> > > > -         pci_disable_device(pdev);
> > > > +         i915_disable_device(pdev);
> > > >           return PTR_ERR(i915);
> > > >   }
> > > >
> > > > @@ -885,7 +912,7 @@ int i915_driver_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> > > >   enable_rpm_wakeref_asserts(&i915->runtime_pm);
> > > >   i915_driver_late_release(i915);
> > > >  out_pci_disable:
> > > > - pci_disable_device(pdev);
> > > > + i915_disable_device(pdev);
> > > >   i915_probe_error(i915, "Device initialization failed (%d)\n", ret);
> > > >   return ret;
> > > >  }
> > > > @@ -1003,6 +1030,7 @@ void i915_driver_shutdown(struct drm_i915_private *i915)
> > > >
> > > >   intel_dmc_suspend(display);
> > > >
> > > > + intel_pxp_fini(i915);
> > >
> > > What is that doing in this patch?
> > >
> > > >   i915_gem_suspend(i915);
> > > >
> > > >   /*
> > > > @@ -1020,6 +1048,7 @@ void i915_driver_shutdown(struct drm_i915_private *i915)
> > > >   enable_rpm_wakeref_asserts(&i915->runtime_pm);
> > > >
> > > >   intel_runtime_pm_driver_last_release(&i915->runtime_pm);
> > > > + i915_disable_device(to_pci_dev(i915->drm.dev));
> > > >  }
> > > >
> > > >  static bool suspend_to_idle(struct drm_i915_private *dev_priv)
> > > > --
> > > > 2.34.1
> > >
> > > --
> > > Ville Syrjälä
> > > Intel
> >
> > --
> > Ville Syrjälä
> > Intel
>
> --
> Ville Syrjälä
> Intel
>
