On 28/07/2022 15:03, Tvrtko Ursulin wrote:

On 28/07/2022 09:01, Patchwork wrote:

[snip]

        Possible regressions

  * igt@gem_mmap_offset@clear:
      o shard-iclb: PASS
<https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb6/igt@gem_mmap_off...@clear.html>
        -> INCOMPLETE
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106589v6/shard-iclb1/igt@gem_mmap_off...@clear.html>

What was supposed to be a simple patch... a storm of errors like:

yeah, them's the breaks sometimes ....


 DMAR: ERROR: DMA PTE for vPFN 0x3d00000 already set (to 2fd7ff003 not 2fd7ff003)
  ------------[ cut here ]------------
 WARNING: CPU: 6 PID: 1254 at drivers/iommu/intel/iommu.c:2278 __domain_mapping.cold.93+0x32/0x39
 Modules linked in: vgem drm_shmem_helper snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_cod>
 CPU: 6 PID: 1254 Comm: gem_mmap_offset Not tainted 5.19.0-rc8-Patchwork_106589v6-g0e9c43d76a14+ #>
 Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS >
  RIP: 0010:__domain_mapping.cold.93+0x32/0x39
 Code: fe 48 c7 c7 28 32 37 82 4c 89 5c 24 08 e8 e4 61 fd ff 8b 05 bf 8e c9 00 4c 8b 5c 24 08 85 c>
  RSP: 0000:ffffc9000037f9c0 EFLAGS: 00010202
  RAX: 0000000000000004 RBX: ffff8881117b4000 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: ffffffff82320b25 RDI: 00000000ffffffff
  RBP: 0000000000000001 R08: 0000000000000000 R09: c0000000ffff7fff
  R10: 0000000000000001 R11: 00000000002fd7ff R12: 00000002fd7ff003
  R13: 0000000000076c01 R14: ffff8881039ee800 R15: 0000000003d00000
 FS:  00007f2863c1d700(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f2692c53000 CR3: 000000011c440006 CR4: 0000000000770ee0
  PKRU: 55555554
  Call Trace:
   <TASK>
   intel_iommu_map_pages+0xb7/0xe0
   __iommu_map+0xe0/0x310
   __iommu_map_sg+0xa2/0x140
   iommu_dma_map_sg+0x2ef/0x4e0
   __dma_map_sg_attrs+0x64/0x70
   dma_map_sg_attrs+0x5/0x20
   i915_gem_gtt_prepare_pages+0x56/0x70 [i915]
   shmem_get_pages+0xe3/0x360 [i915]
   ____i915_gem_object_get_pages+0x32/0x100 [i915]
   __i915_gem_object_get_pages+0x8d/0xa0 [i915]
   vm_fault_gtt+0x3d0/0x940 [i915]
   ? ptlock_alloc+0x15/0x40
   ? rt_mutex_debug_task_free+0x91/0xa0
   __do_fault+0x30/0x180
   do_fault+0x1c4/0x4c0
   __handle_mm_fault+0x615/0xbe0
   handle_mm_fault+0x75/0x1c0
   do_user_addr_fault+0x1e7/0x670
   exc_page_fault+0x62/0x230
   asm_exc_page_fault+0x22/0x30

No idea. Maybe try the CI kernel config on your Tigerlake?

I have an idea of what could be happening:

The warning is due to a PTE already existing. We can see from the message that it is being set to the value it already holds ("to 2fd7ff003 not 2fd7ff003"), which indicates that the same page has been mapped to the same IOVA before.

This map/shrink loop will keep retrying the mapping of the same sg table, shrinking after each failure in the hope of freeing up IOVA space (sketched below):

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/i915/i915_gem_gtt.c?h=v5.19-rc8#n32
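
For reference, that loop looks roughly like the following in v5.19 (paraphrased and trimmed for brevity, so treat it as a sketch rather than a verbatim quote of the linked code):

  int i915_gem_gtt_prepare_pages(struct drm_i915_gem_object *obj,
                                 struct sg_table *pages)
  {
          do {
                  /* Try to DMA-map the whole sg table in one go. */
                  if (dma_map_sg_attrs(obj->base.dev->dev,
                                       pages->sgl, pages->nents,
                                       DMA_BIDIRECTIONAL,
                                       DMA_ATTR_SKIP_CPU_SYNC |
                                       DMA_ATTR_NO_KERNEL_MAPPING |
                                       DMA_ATTR_NO_WARN))
                          return 0;

                  /*
                   * On failure, purge other objects to free up remapping
                   * space, then retry the *same* sg table from the top.
                   */
          } while (i915_gem_shrink(NULL, to_i915(obj->base.dev),
                                   obj->base.size >> PAGE_SHIFT, NULL,
                                   I915_SHRINK_BOUND | I915_SHRINK_UNBOUND));

          return -ENOSPC;
  }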

If we now look at the intel iommu driver's mapping function:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/intel/iommu.c?h=v5.19-rc8#n2248
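
The relevant loop-breaking return in there, again paraphrased rather than quoted exactly:

  /* inside __domain_mapping()'s  while (nr_pages > 0)  loop: */
  if (!pte) {
          pte = pfn_to_dma_pte(domain, iov_pfn, &largepage_lvl);
          if (!pte)
                  return -ENOMEM; /* bail mid-loop; PTEs written so far stay set */
  }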

If that loop-breaking -ENOMEM return is hit (presumably because we ran out of PTE space, though I have not delved deeper), the error propagates back up the stack and dma_map_sg_attrs() eventually returns 0, indicating failure. This triggers a shrink and a retry.

The problem is that the iommu driver does not undo its partial mapping on error. So the next time around it will map the same page to the same address, producing the same PTE encoding, which would give exactly the warning observed.
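
For completeness, the check in that same loop which emits the message we are seeing (paraphrased from the same function):

  tmp = cmpxchg64_local(&pte->val, 0ULL, pteval);
  if (tmp) {
          /* The PTE was already set; on the retry, "already set" is our
           * own previous, never-unwound attempt, hence old == new value. */
          pr_crit("ERROR: DMA PTE for vPFN 0x%lx already set (to %llx not %llx)\n",
                  iov_pfn, tmp, (unsigned long long)pteval);
          WARN_ON(1);
  }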

I would need to find some time to reproduce and debug this to confirm, but it looks like we might be exposing an iommu driver issue by changing our mapping patterns now that the segment sizes are different.
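
In the meantime, here is a toy user-space model of the theory (entirely illustrative; none of the names below are real i915 or iommu code) showing how "fail mid-loop without unwinding, then retry the same sg" ends up reporting "already set (to X not X)":

  /* toy_remap.c - toy model of the theory above; it only mimics
   * "partial mapping is not undone on error".
   * Build and run: cc -o toy_remap toy_remap.c && ./toy_remap
   */
  #include <stdio.h>
  #include <stdint.h>

  #define NR_PAGES 6                      /* pages in our pretend sg table */

  static uint64_t ptes[NR_PAGES];         /* 0 means "not mapped" */

  /* Map all pages at the same iova range; fail at 'fail_at' (simulating
   * running out of PTE space) without unwinding what was already written. */
  static int toy_domain_mapping(int fail_at)
  {
          for (int i = 0; i < NR_PAGES; i++) {
                  uint64_t pteval = 0x2fd7ff003ull + ((uint64_t)i << 12);

                  if (i == fail_at)
                          return -1;      /* like the -ENOMEM mid-loop return */

                  if (ptes[i]) {          /* like the cmpxchg64_local() check */
                          printf("ERROR: DMA PTE for vPFN %d already set (to %llx not %llx)\n",
                                 i, (unsigned long long)ptes[i],
                                 (unsigned long long)pteval);
                          return -1;
                  }
                  ptes[i] = pteval;
          }
          return 0;
  }

  int main(void)
  {
          /* First attempt bails after 3 pages, leaving ptes[0..2] populated. */
          if (toy_domain_mapping(3))
                  printf("first attempt failed, shrinking and retrying...\n");

          /* Retry maps the same pages to the same iova: slot 0 is already
           * set to the identical value, so this prints
           * "already set (to 2fd7ff003 not 2fd7ff003)". */
          toy_domain_mapping(-1);
          return 0;
  }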

I'll see if I can get some time allotted to debug it further, but for now I don't have the bandwidth, so this may need to go on hold until I or someone else can find time to look into it.


Regards,

Tvrtko
