On Mon, Jan 31, 2022 at 11:17:01PM +1100, Jonathan Gray wrote:
> On Mon, Jan 31, 2022 at 12:54:53AM -0700, Thomas Frohwein wrote:
> > On Sat, 29 Jan 2022 12:15:10 -0300
> > Martin Pieuchot <[email protected]> wrote:
> > 
> > > On 28/01/22(Fri) 23:03, Thomas Frohwein wrote:
> > > > On Sat, 29 Jan 2022 15:19:20 +1100
> > > > Jonathan Gray <[email protected]> wrote:
> > > >   
> > > > > does this diff to revert uvm_fault.c rev 1.124 change anything?  
> > > > 
> > > > Unfortunately no. Same pmap error as in the original bug report occurs
> > > > with a kernel with this diff.  
> > > 
> > > Could you submit a new bug report?  Could you manage to include ps and the
> > > trace of all the CPUs when the pmap corruption occurs?
> > 
> > See below
> > 
> > > 
> > > Do you have some steps to reproduce the corruption?  Which program is
> > > currently running?  Is it multi-threaded?  What is the simplest scenario
> > > to trigger the corruption?
> > 
> > It's during boot of the MP kernel. The only scenario I can provide is
> > booting this machine with an MP kernel from January 18 or newer. If I
> > boot SP kernel, or build an MP kernel with jsg@'s diff that adds
> > `pool_debug = 2`, the panic does _not_ occur.
> > 
> > Here is some new (hand-typed from a picture) output when I boot a freshly
> > downloaded snapshot MP kernel from January 30th (note this is an 8 core/16
> > hyperthreads CPU; I have _not_ enabled hyperthreading). I attached dmesg
> > from booting bsd.sp, too.
> > 
> > ... (boot, see dmesg in original bugs@ submission)
> > wsdisplay0: screen 1-5 added (std, vt100 emulation)
> > iwm0: hw rev 0x200, fw ver 36.ca7b901d.0, address [...]
> > va 7f7fffffb000 ppa ffffffffff000
> > panic: pmap_get_ptp: unmanaged user PTP
> > Stopped at     db_enter+0x10: popq   %rbp
> >     TID     PID     UID     PRFLAGS PFLAGS CPU COMMAND
> > * 28644   1       0           0      0   2K swapper
> > db_enter() at db_enter+0x10
> > panic(ffffffff81f3dd1f) at panic+0xbf
> > pmap_get_ptp(fffffd888e52ee58,7f7fffffb000) at pmap_get_ptp+0x303
> > pmap_enter(fffffd888e52ee58,7f7fffffb000,13d151000,3,22) at pmap_enter+0x188
> > uvm_fault_lower(ffff8000156852a0,ffff8000156852d8,ffff800015685220,0) at uvm_fault_lower+0x63d
> > uvm_fault(fffffd888e52fdd0,7f7fffffb000,0,2) at uvm_fault+0x1b3
> > kpageflttrap(ffff800015685420,7f7fffffbff5) at kpageflttrap+0x12c
> > kerntrap(ffff800015685420) at kerntrap+0x91
> > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> > copyout() at copyout+0x53
> > end trace frame: 0x0, count: 5
> 
> does this diff to provide stolen memory data help?

I have committed this minus the printf change.
I would still be interested to hear if it helps.

> 
> Index: sys/dev/pci/drm/i915/i915_drv.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_drv.c,v
> retrieving revision 1.135
> diff -u -p -r1.135 i915_drv.c
> --- sys/dev/pci/drm/i915/i915_drv.c   19 Jan 2022 02:20:06 -0000      1.135
> +++ sys/dev/pci/drm/i915/i915_drv.c   31 Jan 2022 11:20:04 -0000
> @@ -2350,6 +2350,7 @@ inteldrm_match(struct device *parent, vo
>  }
>  
>  int drm_gem_init(struct drm_device *);
> +void intel_init_stolen_res(struct inteldrm_softc *);
>  
>  void
>  inteldrm_attach(struct device *parent, struct device *self, void *aux)
> @@ -2469,6 +2470,7 @@ inteldrm_attach(struct device *parent, s
>               return;
>       }
>       dev->pdev->irq = -1;
> +     intel_init_stolen_res(dev_priv);
>  
>       config_mountroot(self, inteldrm_attachhook);
>  }
> Index: sys/dev/pci/drm/i915/intel_stolen.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/drm/i915/intel_stolen.c,v
> retrieving revision 1.2
> diff -u -p -r1.2 intel_stolen.c
> --- sys/dev/pci/drm/i915/intel_stolen.c       14 Jan 2022 06:53:11 -0000      1.2
> +++ sys/dev/pci/drm/i915/intel_stolen.c       31 Jan 2022 11:25:37 -0000
> @@ -163,7 +163,7 @@ intel_init_stolen_res(struct inteldrm_so
>  
>       if (GRAPHICS_VER(dev_priv) >= 3 && GRAPHICS_VER(dev_priv) < 11)
>               stolen_base  = gen3_stolen_base(dev_priv);
> -     else if (GRAPHICS_VER(dev_priv) == 11)
> +     else if (GRAPHICS_VER(dev_priv) == 11 || GRAPHICS_VER(dev_priv) == 12)
>               stolen_base = gen11_stolen_base(dev_priv);
>  
>       if (IS_I830(dev_priv) || IS_I845G(dev_priv))
> @@ -177,7 +177,7 @@ intel_init_stolen_res(struct inteldrm_so
>               stolen_size = gen6_stolen_size(dev_priv);
>       else if (GRAPHICS_VER(dev_priv) == 8)
>               stolen_size = gen8_stolen_size(dev_priv);
> -     else if (GRAPHICS_VER(dev_priv) >= 9 && GRAPHICS_VER(dev_priv) < 12)
> +     else if (GRAPHICS_VER(dev_priv) >= 9 && GRAPHICS_VER(dev_priv) <= 12)
>               stolen_size = gen9_stolen_size(dev_priv);
>  
>       if (stolen_base == 0 || stolen_size == 0)
> Index: sys/dev/pci/drm/i915/gt/intel_ggtt.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/drm/i915/gt/intel_ggtt.c,v
> retrieving revision 1.4
> diff -u -p -r1.4 intel_ggtt.c
> --- sys/dev/pci/drm/i915/gt/intel_ggtt.c      26 Jan 2022 01:46:12 -0000      1.4
> +++ sys/dev/pci/drm/i915/gt/intel_ggtt.c      31 Jan 2022 11:33:05 -0000
> @@ -1320,10 +1320,10 @@ static int ggtt_probe_hw(struct i915_ggt
>       }
>  
>       /* GMADR is the PCI mmio aperture into the global GTT. */
> -     drm_dbg(&i915->drm, "GGTT size = %lluM\n", ggtt->vm.total >> 20);
> -     drm_dbg(&i915->drm, "GMADR size = %lluM\n",
> +     drm_warn(&i915->drm, "GGTT size = %lluM\n", ggtt->vm.total >> 20);
> +     drm_warn(&i915->drm, "GMADR size = %lluM\n",
>               (u64)ggtt->mappable_end >> 20);
> -     drm_dbg(&i915->drm, "DSM size = %lluM\n",
> +     drm_warn(&i915->drm, "DSM size = %lluM\n",
>               (u64)resource_size(&intel_graphics_stolen_res) >> 20);
>  
>       return 0;
> 
> 
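
For anyone reading along, here is a minimal standalone sketch of the two
version checks the intel_stolen.c hunks above change. It is not driver code:
uses_gen11_base() and uses_gen9_size() are hypothetical helpers that take the
graphics version as a plain int, and they model only the branches visible in
the diff (the I830/I845G and older size branches are left out).

```c
#include <stdbool.h>

/*
 * Hypothetical model of the "else if" that selects gen11_stolen_base().
 * Versions 3..10 are handled earlier by the gen3_stolen_base() branch,
 * so they never reach this check.
 * Before the diff: ver == 11 only.  After: ver == 11 or ver == 12.
 */
static bool
uses_gen11_base(int ver, bool patched)
{
	if (ver >= 3 && ver < 11)
		return false;	/* taken by the gen3_stolen_base() branch */
	return ver == 11 || (patched && ver == 12);
}

/*
 * Hypothetical model of the "else if" that selects gen9_stolen_size().
 * Before the diff: 9 <= ver < 12.  After: 9 <= ver <= 12.
 */
static bool
uses_gen9_size(int ver, bool patched)
{
	if (ver < 9)
		return false;	/* older branches, not modelled here */
	return patched ? ver <= 12 : ver < 12;
}
```

With patched set to false, version 12 matches neither check, which is the
case the diff is meant to cover; with patched set to true it matches both,
and versions 9 to 11 behave the same either way.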
