On 28/07/2022 15:03, Tvrtko Ursulin wrote:

On 28/07/2022 09:01, Patchwork wrote:

[snip]

        Possible regressions

  * igt@gem_mmap_offset@clear:
      o shard-iclb: PASS
<https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb6/igt@gem_mmap_off...@clear.html>
        -> INCOMPLETE
<https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106589v6/shard-iclb1/igt@gem_mmap_off...@clear.html>

What was supposed to be a simple patch... a storm of errors like:

yeah, them's the breaks sometimes ....


 DMAR: ERROR: DMA PTE for vPFN 0x3d00000 already set (to 2fd7ff003 not 2fd7ff003)
  ------------[ cut here ]------------
 WARNING: CPU: 6 PID: 1254 at drivers/iommu/intel/iommu.c:2278 __domain_mapping.cold.93+0x32/0x39
 Modules linked in: vgem drm_shmem_helper snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_cod>
 CPU: 6 PID: 1254 Comm: gem_mmap_offset Not tainted 5.19.0-rc8-Patchwork_106589v6-g0e9c43d76a14+ #>
 Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS >
  RIP: 0010:__domain_mapping.cold.93+0x32/0x39
 Code: fe 48 c7 c7 28 32 37 82 4c 89 5c 24 08 e8 e4 61 fd ff 8b 05 bf 8e c9 00 4c 8b 5c 24 08 85 c>
  RSP: 0000:ffffc9000037f9c0 EFLAGS: 00010202
  RAX: 0000000000000004 RBX: ffff8881117b4000 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: ffffffff82320b25 RDI: 00000000ffffffff
  RBP: 0000000000000001 R08: 0000000000000000 R09: c0000000ffff7fff
  R10: 0000000000000001 R11: 00000000002fd7ff R12: 00000002fd7ff003
  R13: 0000000000076c01 R14: ffff8881039ee800 R15: 0000000003d00000
 FS:  00007f2863c1d700(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f2692c53000 CR3: 000000011c440006 CR4: 0000000000770ee0
  PKRU: 55555554
  Call Trace:
   <TASK>
   intel_iommu_map_pages+0xb7/0xe0
   __iommu_map+0xe0/0x310
   __iommu_map_sg+0xa2/0x140
   iommu_dma_map_sg+0x2ef/0x4e0
   __dma_map_sg_attrs+0x64/0x70
   dma_map_sg_attrs+0x5/0x20
   i915_gem_gtt_prepare_pages+0x56/0x70 [i915]
   shmem_get_pages+0xe3/0x360 [i915]
   ____i915_gem_object_get_pages+0x32/0x100 [i915]
   __i915_gem_object_get_pages+0x8d/0xa0 [i915]
   vm_fault_gtt+0x3d0/0x940 [i915]
   ? ptlock_alloc+0x15/0x40
   ? rt_mutex_debug_task_free+0x91/0xa0
   __do_fault+0x30/0x180
   do_fault+0x1c4/0x4c0
   __handle_mm_fault+0x615/0xbe0
   handle_mm_fault+0x75/0x1c0
   do_user_addr_fault+0x1e7/0x670
   exc_page_fault+0x62/0x230
   asm_exc_page_fault+0x22/0x30

No idea. Maybe try the CI kernel config on your Tigerlake?

I have an idea of what could be happening:

The warning is due to a PTE already existing. We can see from the message that it is being set to the value it already holds ("to 2fd7ff003 not 2fd7ff003"), which indicates that the same page has been mapped to the same IOVA before.

This map/shrink loop will keep retrying the mapping of the same sg table, shrinking after each failure in the hope of freeing up IOVA space (sketched below):

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/i915/i915_gem_gtt.c?h=v5.19-rc8#n32
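
For reference, that loop looks roughly like the following in v5.19 (paraphrased and trimmed for brevity, so treat it as a sketch rather than a verbatim quote of the linked code):

  int i915_gem_gtt_prepare_pages(struct drm_i915_gem_object *obj,
                                 struct sg_table *pages)
  {
          do {
                  /* Try to DMA-map the whole sg table in one go. */
                  if (dma_map_sg_attrs(obj->base.dev->dev,
                                       pages->sgl, pages->nents,
                                       DMA_BIDIRECTIONAL,
                                       DMA_ATTR_SKIP_CPU_SYNC |
                                       DMA_ATTR_NO_KERNEL_MAPPING |
                                       DMA_ATTR_NO_WARN))
                          return 0;

                  /*
                   * On failure, purge other objects to free up remapping
                   * space, then retry the *same* sg table from the top.
                   */
          } while (i915_gem_shrink(NULL, to_i915(obj->base.dev),
                                   obj->base.size >> PAGE_SHIFT, NULL,
                                   I915_SHRINK_BOUND | I915_SHRINK_UNBOUND));

          return -ENOSPC;
  }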

If we now look at the intel iommu driver's mapping function:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/intel/iommu.c?h=v5.19-rc8#n2248
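
The relevant loop-breaking return in there, again paraphrased rather than quoted exactly:

  /* inside __domain_mapping()'s  while (nr_pages > 0)  loop: */
  if (!pte) {
          pte = pfn_to_dma_pte(domain, iov_pfn, &largepage_lvl);
          if (!pte)
                  return -ENOMEM; /* bail mid-loop; PTEs written so far stay set */
  }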

If that loop-breaking -ENOMEM return is hit (presumably because we ran out of PTE space, though I have not delved deeper), the error propagates back up the stack and dma_map_sg_attrs() eventually returns 0, indicating failure. This triggers a shrink and a retry.

The problem is that the iommu driver does not undo its partial mapping on error. So the next time around it will map the same page to the same address, producing the same PTE encoding, which would give exactly the warning observed.
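
For completeness, the check in that same loop which emits the message we are seeing (paraphrased from the same function):

  tmp = cmpxchg64_local(&pte->val, 0ULL, pteval);
  if (tmp) {
          /* The PTE was already set; on the retry, "already set" is our
           * own previous, never-unwound attempt, hence old == new value. */
          pr_crit("ERROR: DMA PTE for vPFN 0x%lx already set (to %llx not %llx)\n",
                  iov_pfn, tmp, (unsigned long long)pteval);
          WARN_ON(1);
  }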

I would need to find some time to reproduce and debug this to confirm, but it looks like we might be exposing an iommu driver issue by changing our mapping patterns now that the segment sizes are different.
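
In the meantime, here is a toy user-space model of the theory (entirely illustrative; none of the names below are real i915 or iommu code) showing how "fail mid-loop without unwinding, then retry the same sg" ends up reporting "already set (to X not X)":

  /* toy_remap.c - toy model of the theory above; it only mimics
   * "partial mapping is not undone on error".
   * Build and run: cc -o toy_remap toy_remap.c && ./toy_remap
   */
  #include <stdio.h>
  #include <stdint.h>

  #define NR_PAGES 6                      /* pages in our pretend sg table */

  static uint64_t ptes[NR_PAGES];         /* 0 means "not mapped" */

  /* Map all pages at the same iova range; fail at 'fail_at' (simulating
   * running out of PTE space) without unwinding what was already written. */
  static int toy_domain_mapping(int fail_at)
  {
          for (int i = 0; i < NR_PAGES; i++) {
                  uint64_t pteval = 0x2fd7ff003ull + ((uint64_t)i << 12);

                  if (i == fail_at)
                          return -1;      /* like the -ENOMEM mid-loop return */

                  if (ptes[i]) {          /* like the cmpxchg64_local() check */
                          printf("ERROR: DMA PTE for vPFN %d already set (to %llx not %llx)\n",
                                 i, (unsigned long long)ptes[i],
                                 (unsigned long long)pteval);
                          return -1;
                  }
                  ptes[i] = pteval;
          }
          return 0;
  }

  int main(void)
  {
          /* First attempt bails after 3 pages, leaving ptes[0..2] populated. */
          if (toy_domain_mapping(3))
                  printf("first attempt failed, shrinking and retrying...\n");

          /* Retry maps the same pages to the same iova: slot 0 is already
           * set to the identical value, so this prints
           * "already set (to 2fd7ff003 not 2fd7ff003)". */
          toy_domain_mapping(-1);
          return 0;
  }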

I'll see if I can get some time allotted to debug it further, but for now I don't have the bandwidth, so this may need to go on hold until I or someone else can find time to look into it.


Regards,

Tvrtko
