Dear Robin:
Thank you for your explanation. Now I understand that this could be a
NIC driver's fault, but how can I confirm it? Do I have to debug the
driver myself?
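
Just to make sure I understand the failure mode you describe below: is it
something like the following? (This is only a made-up sketch of a leaky RX
path, not code from any real driver.)

```
#include <linux/dma-mapping.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical RX refill: map an skb's data buffer for device DMA. */
static int rx_refill(struct device *dev, struct sk_buff *skb, size_t len,
		     dma_addr_t *dma)
{
	/* With intel-iommu active, this allocates an IOVA (iommu_iova slab). */
	*dma = dma_map_single(dev, skb->data, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, *dma))
		return -ENOMEM;
	return 0;
}

/* Hypothetical RX completion: hand the filled skb up the stack. */
static void rx_complete(struct napi_struct *napi, struct device *dev,
			struct sk_buff *skb, size_t len, dma_addr_t dma)
{
	/*
	 * BUG: the matching dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE)
	 * is missing here. With the IOMMU disabled this is effectively a
	 * no-op on x86 with a 64-bit-capable device, so the leak stays
	 * invisible; with VT-d enabled every received buffer would leak one
	 * iommu_iova entry.
	 */
	napi_gro_receive(napi, skb);
}
```

If that's the pattern, I guess the next step is to audit the NIC driver's
dma_map_*/dma_unmap_* pairs.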
Robin Murphy <[email protected]> wrote on Fri, Apr 24, 2020 at 8:15 PM:
> On 2020-04-24 1:06 pm, Bin wrote:
> > I'm not familiar with the mmu stuff, so what you mean by "some driver
> > leaking DMA mappings", is it possible that some other kernel module like
> > KVM or NIC driver leads to the leaking problem instead of the iommu
> module
> > itself?
>
> Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
> should, since I'd expect a lot of people to have noticed that. It's far
> more likely that some driver is failing to call dma_unmap_* when it's
> finished with a buffer - with the IOMMU disabled that would be a no-op
> on x86 with a modern 64-bit-capable device, so such a latent bug could
> have been easily overlooked.
>
> Robin.
>
> > Bin <[email protected]> wrote on Fri, Apr 24, 2020 at 20:00:
> >
> >> Well, that's the problem! I'm assuming the iommu kernel module is
> leaking
> >> memory. But I don't know why and how.
> >>
> >> Do you have any idea about it? Or any further information is needed?
> >>
> >> Robin Murphy <[email protected]> wrote on Fri, Apr 24, 2020 at 19:20:
> >>
> >>> On 2020-04-24 1:40 am, Bin wrote:
> >>>> Hello? anyone there?
> >>>>
> >>>> Bin <[email protected]> wrote on Thu, Apr 23, 2020 at 5:14 PM:
> >>>>
> >>>>> Forget to mention, I've already disabled the slab merge, so this is
> >>> what
> >>>>> it is.
> >>>>>
> >>>>> Bin <[email protected]> wrote on Thu, Apr 23, 2020 at 5:11 PM:
> >>>>>
> >>>>>> Hey, guys:
> >>>>>>
> >>>>>> I'm running a batch of CoreOS boxes, the lsb_release is:
> >>>>>>
> >>>>>> ```
> >>>>>> # cat /etc/lsb-release
> >>>>>> DISTRIB_ID="Container Linux by CoreOS"
> >>>>>> DISTRIB_RELEASE=2303.3.0
> >>>>>> DISTRIB_CODENAME="Rhyolite"
> >>>>>> DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
> >>>>>> ```
> >>>>>>
> >>>>>> ```
> >>>>>> # uname -a
> >>>>>> Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019
> >>>>>> x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel GNU/Linux
> >>>>>> ```
> >>>>>> Recently, I found my VMs constantly being killed due to OOM, and after
> >>>>>> digging into the problem, I finally realized that the kernel is leaking
> >>>>>> memory.
> >>>>>>
> >>>>>> Here's my slabinfo:
> >>>>>>
> >>>>>> Active / Total Objects (% used) : 83818306 / 84191607 (99.6%)
> >>>>>> Active / Total Slabs (% used) : 1336293 / 1336293 (100.0%)
> >>>>>> Active / Total Caches (% used) : 152 / 217 (70.0%)
> >>>>>> Active / Total Size (% used) : 5828768.08K / 5996848.72K (97.2%)
> >>>>>> Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K
> >>>>>>
> >>>>>> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
> >>>>>>
> >>>>>> 80253888 80253888 100% 0.06K 1253967 64 5015868K iommu_iova
> >>>
> >>> Do you really have a peak demand of ~80 million simultaneous DMA
> >>> buffers, or is some driver leaking DMA mappings?
> >>>
> >>> Robin.
> >>>
> >>>>>> 489472 489123 99% 0.03K 3824 128 15296K kmalloc-32
> >>>>>>
> >>>>>> 297444 271112 91% 0.19K 7082 42 56656K dentry
> >>>>>>
> >>>>>> 254400 252784 99% 0.06K 3975 64 15900K anon_vma_chain
> >>>>>>
> >>>>>> 222528 39255 17% 0.50K 6954 32 111264K kmalloc-512
> >>>>>>
> >>>>>> 202482 201814 99% 0.19K 4821 42 38568K vm_area_struct
> >>>>>>
> >>>>>> 200192 200192 100% 0.01K 391 512 1564K kmalloc-8
> >>>>>>
> >>>>>> 170528 169359 99% 0.25K 5329 32 42632K filp
> >>>>>>
> >>>>>> 158144 153508 97% 0.06K 2471 64 9884K kmalloc-64
> >>>>>>
> >>>>>> 149914 149365 99% 0.09K 3259 46 13036K anon_vma
> >>>>>>
> >>>>>> 146640 143123 97% 0.10K 3760 39 15040K buffer_head
> >>>>>>
> >>>>>> 130368 32791 25% 0.09K 3104 42 12416K kmalloc-96
> >>>>>>
> >>>>>> 129752 129752 100% 0.07K 2317 56 9268K Acpi-Operand
> >>>>>>
> >>>>>> 105468 105106 99% 0.04K 1034 102 4136K selinux_inode_security
> >>>>>> 73080 73080 100% 0.13K 2436 30 9744K kernfs_node_cache
> >>>>>>
> >>>>>> 72360 70261 97% 0.59K 1340 54 42880K inode_cache
> >>>>>>
> >>>>>> 71040 71040 100% 0.12K 2220 32 8880K eventpoll_epi
> >>>>>>
> >>>>>> 68096 59262 87% 0.02K 266 256 1064K kmalloc-16
> >>>>>>
> >>>>>> 53652 53652 100% 0.04K 526 102 2104K pde_opener
> >>>>>>
> >>>>>> 50496 31654 62% 2.00K 3156 16 100992K kmalloc-2048
> >>>>>>
> >>>>>> 46242 46242 100% 0.19K 1101 42 8808K cred_jar
> >>>>>>
> >>>>>> 44496 43013 96% 0.66K 927 48 29664K proc_inode_cache
> >>>>>>
> >>>>>> 44352 44352 100% 0.06K 693 64 2772K task_delay_info
> >>>>>>
> >>>>>> 43516 43471 99% 0.69K 946 46 30272K sock_inode_cache
> >>>>>>
> >>>>>> 37856 27626 72% 1.00K 1183 32 37856K kmalloc-1024
> >>>>>>
> >>>>>> 36736 36736 100% 0.07K 656 56 2624K eventpoll_pwq
> >>>>>>
> >>>>>> 34076 31282 91% 0.57K 1217 28 19472K radix_tree_node
> >>>>>>
> >>>>>> 33660 30528 90% 1.05K 1122 30 35904K ext4_inode_cache
> >>>>>>
> >>>>>> 32760 30959 94% 0.19K 780 42 6240K kmalloc-192
> >>>>>>
> >>>>>> 32028 32028 100% 0.04K 314 102 1256K ext4_extent_status
> >>>>>>
> >>>>>> 30048 30048 100% 0.25K 939 32 7512K skbuff_head_cache
> >>>>>>
> >>>>>> 28736 28736 100% 0.06K 449 64 1796K fs_cache
> >>>>>>
> >>>>>> 24702 24702 100% 0.69K 537 46 17184K files_cache
> >>>>>>
> >>>>>> 23808 23808 100% 0.66K 496 48 15872K ovl_inode
> >>>>>>
> >>>>>> 23104 22945 99% 0.12K 722 32 2888K kmalloc-128
> >>>>>>
> >>>>>> 22724 21307 93% 0.69K 494 46 15808K shmem_inode_cache
> >>>>>>
> >>>>>> 21472 21472 100% 0.12K 671 32 2684K seq_file
> >>>>>>
> >>>>>> 19904 19904 100% 1.00K 622 32 19904K UNIX
> >>>>>>
> >>>>>> 17340 17340 100% 1.06K 578 30 18496K mm_struct
> >>>>>>
> >>>>>> 15980 15980 100% 0.02K 94 170 376K avtab_node
> >>>>>>
> >>>>>> 14070 14070 100% 1.06K 469 30 15008K signal_cache
> >>>>>>
> >>>>>> 13248 13248 100% 0.12K 414 32 1656K pid
> >>>>>>
> >>>>>> 12128 11777 97% 0.25K 379 32 3032K kmalloc-256
> >>>>>>
> >>>>>> 11008 11008 100% 0.02K 43 256 172K selinux_file_security
> >>>>>> 10812 10812 100% 0.04K 106 102 424K Acpi-Namespace
> >>>>>>
> >>>>>> This information shows that 'iommu_iova' is the top memory consumer.
> >>>>>> In order to optimize the network performance of OpenStack virtual
> >>>>>> machines, I enabled the VT-d feature in the BIOS and the SR-IOV feature
> >>>>>> of the Intel 82599 10G NIC. I'm assuming this is the root cause of this
> >>>>>> issue.
> >>>>>>
> >>>>>> Is there anything I can do to fix it?
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> iommu mailing list
> >>>> [email protected]
> >>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> >>>>
> >>>
> >>
> >
>
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu