Dear John: Thank you for your reply. The case you mentioned is a typical performance regression issue; even in the worst case, the kernel has no need to OOM-kill a random process. But in my observations, the iommu_iova slab can consume up to 40G of memory, and the kernel has to kill my VM process to free memory (64G of memory installed). So I don't think it's relevant.
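In case it helps anyone check the numbers on their own box, here is a minimal sketch of how the memory held by a single slab cache can be computed from /proc/slabinfo-style text. The helper name `slab_cache_kib` and the sample line are illustrative only: the sample is reconstructed from the slabtop row quoted below rather than copied from a real /proc/slabinfo dump, and a 4 KiB page size is assumed.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as is usual on x86_64


def slab_cache_kib(slabinfo_text, cache_name, page_size=PAGE_SIZE):
    """Return the memory held by one slab cache in KiB, or None if absent.

    Expects the slabinfo 2.1 layout:
      name active_objs num_objs objsize objperslab pagesperslab
        : tunables ... : slabdata active_slabs num_slabs sharedavail
    """
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != cache_name:
            continue  # skips the header lines and every other cache
        pages_per_slab = int(fields[5])
        # The second value after "slabdata" is num_slabs.
        num_slabs = int(fields[fields.index("slabdata") + 2])
        return num_slabs * pages_per_slab * page_size // 1024
    return None


# Hypothetical sample, rebuilt from the slabtop row for iommu_iova.
SAMPLE_SLABINFO = """slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
iommu_iova 80253888 80253888 64 64 1 : tunables 0 0 0 : slabdata 1253967 1253967 0
"""

print(slab_cache_kib(SAMPLE_SLABINFO, "iommu_iova"), "KiB")
```

On the sample this gives 5015868 KiB, which matches the CACHE SIZE column that slabtop reported. To watch for growth on a live system, the same function could be fed the contents of /proc/slabinfo at intervals; reading that file normally requires root.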
On Sat, Apr 25, 2020 at 1:50 AM, John Garry <john.ga...@huawei.com> wrote:

> On 24/04/2020 17:30, Robin Murphy wrote:
> > On 2020-04-24 2:20 pm, Bin wrote:
> >> Dear Robin:
> >> Thank you for your explanation. Now, I understand that this could be
> >> NIC driver's fault, but how could I confirm it? Do I have to debug the
> >> driver myself?
> >
> > I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through
> > memory about an order of magnitude faster than the IOVAs alone, but it
> > should shed some light on whether DMA API usage looks suspicious, and
> > dumping the mappings should help track down the responsible driver(s).
> > Although the debugfs code doesn't show the stacktrace of where each
> > mapping was made, I guess it would be fairly simple to tweak that for a
> > quick way to narrow down where to start looking in an offending driver.
> >
> > Robin.
>
> Just mentioning this in case it's relevant - we found long term aging
> throughput test causes RB tree to grow very large (and would I assume
> eat lots of memory):
>
> https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/
>
> John
>
> >> On Fri, Apr 24, 2020 at 8:15 PM, Robin Murphy <robin.mur...@arm.com> wrote:
> >>
> >>> On 2020-04-24 1:06 pm, Bin wrote:
> >>>> I'm not familiar with the mmu stuff, so what you mean by "some driver
> >>>> leaking DMA mappings", is it possible that some other kernel module like
> >>>> KVM or NIC driver leads to the leaking problem instead of the iommu
> >>>> module itself?
> >>>
> >>> Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
> >>> should, since I'd expect a lot of people to have noticed that. It's far
> >>> more likely that some driver is failing to call dma_unmap_* when it's
> >>> finished with a buffer - with the IOMMU disabled that would be a no-op
> >>> on x86 with a modern 64-bit-capable device, so such a latent bug could
> >>> have been easily overlooked.
> >>>
> >>> Robin.
> >>>
> >>>> On Fri, Apr 24, 2020 at 20:00, Bin <anole1...@gmail.com> wrote:
> >>>>
> >>>>> Well, that's the problem! I'm assuming the iommu kernel module is
> >>>>> leaking memory. But I don't know why and how.
> >>>>>
> >>>>> Do you have any idea about it? Or any further information is needed?
> >>>>>
> >>>>> On Fri, Apr 24, 2020 at 19:20, Robin Murphy <robin.mur...@arm.com> wrote:
> >>>>>
> >>>>>> On 2020-04-24 1:40 am, Bin wrote:
> >>>>>>> Hello? anyone there?
> >>>>>>>
> >>>>>>> On Thu, Apr 23, 2020 at 5:14 PM, Bin <anole1...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Forgot to mention, I've already disabled the slab merge, so this is
> >>>>>>>> what it is.
> >>>>>>>>
> >>>>>>>> On Thu, Apr 23, 2020 at 5:11 PM, Bin <anole1...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hey, guys:
> >>>>>>>>>
> >>>>>>>>> I'm running a batch of CoreOS boxes, the lsb_release is:
> >>>>>>>>>
> >>>>>>>>> ```
> >>>>>>>>> # cat /etc/lsb-release
> >>>>>>>>> DISTRIB_ID="Container Linux by CoreOS"
> >>>>>>>>> DISTRIB_RELEASE=2303.3.0
> >>>>>>>>> DISTRIB_CODENAME="Rhyolite"
> >>>>>>>>> DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
> >>>>>>>>> ```
> >>>>>>>>>
> >>>>>>>>> ```
> >>>>>>>>> # uname -a
> >>>>>>>>> Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019 x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel GNU/Linux
> >>>>>>>>> ```
> >>>>>>>>> Recently, I found my VMs constantly being killed due to OOM, and after
> >>>>>>>>> digging into the problem, I finally realized that the kernel is
> >>>>>>>>> leaking memory.
> >>>>>>>>>
> >>>>>>>>> Here's my slabinfo:
> >>>>>>>>>
> >>>>>>>>>  Active / Total Objects (% used)    : 83818306 / 84191607 (99.6%)
> >>>>>>>>>  Active / Total Slabs (% used)      : 1336293 / 1336293 (100.0%)
> >>>>>>>>>  Active / Total Caches (% used)     : 152 / 217 (70.0%)
> >>>>>>>>>  Active / Total Size (% used)       : 5828768.08K / 5996848.72K (97.2%)
> >>>>>>>>>  Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K
> >>>>>>>>>
> >>>>>>>>>     OBJS   ACTIVE  USE OBJ SIZE   SLABS OBJ/SLAB CACHE SIZE NAME
> >>>>>>>>> 80253888 80253888 100%    0.06K 1253967       64   5015868K iommu_iova
> >>>>>>
> >>>>>> Do you really have a peak demand of ~80 million simultaneous DMA
> >>>>>> buffers, or is some driver leaking DMA mappings?
> >>>>>>
> >>>>>> Robin.
> >>>>>>
> >>>>>>>>>   489472   489123  99%    0.03K    3824      128     15296K kmalloc-32
> >>>>>>>>>   297444   271112  91%    0.19K    7082       42     56656K dentry
> >>>>>>>>>   254400   252784  99%    0.06K    3975       64     15900K anon_vma_chain
> >>>>>>>>>   222528    39255  17%    0.50K    6954       32    111264K kmalloc-512
> >>>>>>>>>   202482   201814  99%    0.19K    4821       42     38568K vm_area_struct
> >>>>>>>>>   200192   200192 100%    0.01K     391      512      1564K kmalloc-8
> >>>>>>>>>   170528   169359  99%    0.25K    5329       32     42632K filp
> >>>>>>>>>   158144   153508  97%    0.06K    2471       64      9884K kmalloc-64
> >>>>>>>>>   149914   149365  99%    0.09K    3259       46     13036K anon_vma
> >>>>>>>>>   146640   143123  97%    0.10K    3760       39     15040K buffer_head
> >>>>>>>>>   130368    32791  25%    0.09K    3104       42     12416K kmalloc-96
> >>>>>>>>>   129752   129752 100%    0.07K    2317       56      9268K Acpi-Operand
> >>>>>>>>>   105468   105106  99%    0.04K    1034      102      4136K selinux_inode_security
> >>>>>>>>>    73080    73080 100%    0.13K    2436       30      9744K kernfs_node_cache
> >>>>>>>>>    72360    70261  97%    0.59K    1340       54     42880K inode_cache
> >>>>>>>>>    71040    71040 100%    0.12K    2220       32      8880K eventpoll_epi
> >>>>>>>>>    68096    59262  87%    0.02K     266      256      1064K kmalloc-16
> >>>>>>>>>    53652    53652 100%    0.04K     526      102      2104K pde_opener
> >>>>>>>>>    50496    31654  62%    2.00K    3156       16    100992K kmalloc-2048
> >>>>>>>>>    46242    46242 100%    0.19K    1101       42      8808K cred_jar
> >>>>>>>>>    44496    43013  96%    0.66K     927       48     29664K proc_inode_cache
> >>>>>>>>>    44352    44352 100%    0.06K     693       64      2772K task_delay_info
> >>>>>>>>>    43516    43471  99%    0.69K     946       46     30272K sock_inode_cache
> >>>>>>>>>    37856    27626  72%    1.00K    1183       32     37856K kmalloc-1024
> >>>>>>>>>    36736    36736 100%    0.07K     656       56      2624K eventpoll_pwq
> >>>>>>>>>    34076    31282  91%    0.57K    1217       28     19472K radix_tree_node
> >>>>>>>>>    33660    30528  90%    1.05K    1122       30     35904K ext4_inode_cache
> >>>>>>>>>    32760    30959  94%    0.19K     780       42      6240K kmalloc-192
> >>>>>>>>>    32028    32028 100%    0.04K     314      102      1256K ext4_extent_status
> >>>>>>>>>    30048    30048 100%    0.25K     939       32      7512K skbuff_head_cache
> >>>>>>>>>    28736    28736 100%    0.06K     449       64      1796K fs_cache
> >>>>>>>>>    24702    24702 100%    0.69K     537       46     17184K files_cache
> >>>>>>>>>    23808    23808 100%    0.66K     496       48     15872K ovl_inode
> >>>>>>>>>    23104    22945  99%    0.12K     722       32      2888K kmalloc-128
> >>>>>>>>>    22724    21307  93%    0.69K     494       46     15808K shmem_inode_cache
> >>>>>>>>>    21472    21472 100%    0.12K     671       32      2684K seq_file
> >>>>>>>>>    19904    19904 100%    1.00K     622       32     19904K UNIX
> >>>>>>>>>    17340    17340 100%    1.06K     578       30     18496K mm_struct
> >>>>>>>>>    15980    15980 100%    0.02K      94      170       376K avtab_node
> >>>>>>>>>    14070    14070 100%    1.06K     469       30     15008K signal_cache
> >>>>>>>>>    13248    13248 100%    0.12K     414       32      1656K pid
> >>>>>>>>>    12128    11777  97%    0.25K     379       32      3032K kmalloc-256
> >>>>>>>>>    11008    11008 100%    0.02K      43      256       172K selinux_file_security
> >>>>>>>>>    10812    10812 100%    0.04K     106      102       424K Acpi-Namespace
> >>>>>>>>>
> >>>>>>>>> This information shows that 'iommu_iova' is the top memory consumer.
> >>>>>>>>> In order to optimize the network performance of OpenStack virtual
> >>>>>>>>> machines, I enabled the VT-d feature in the BIOS and the SR-IOV
> >>>>>>>>> feature of the Intel 82599 10G NIC. I'm assuming this is the root
> >>>>>>>>> cause of this issue.
> >>>>>>>>>
> >>>>>>>>> Is there anything I can do to fix it?
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> iommu mailing list
> >>>>>>> iommu@lists.linux-foundation.org
> >>>>>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu