Hi Salil:

Thank you for looking into this. Here are my answers:

1. I don't really understand what you mean. What's the difference
between a DMA buffer and a DMA mapping?
Is it something like a memory block pool versus a single memory block?
2. Yes, TSO is enabled all the time, but it doesn't seem to help.
3. The CPU usage is pretty normal. What's the point of this question?
Is it relevant to the leak?

FYI:
I noticed something interesting: only a small fraction of the running
hosts have this issue, even though they all have the same kernel,
configuration, and hardware. I don't know whether this really means
something.
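
To compare the affected and unaffected hosts, something like the following
could be run on each of them. It is only a minimal sketch: the function
names parse_slabinfo and cache_kib are made up for illustration, it assumes
the usual /proc/slabinfo 2.x column order (name, active_objs, num_objs,
objsize, ...), and it ignores per-slab overhead.

```python
def parse_slabinfo(text):
    """Return {cache_name: (active_objs, num_objs, objsize_bytes)}.

    Assumes the /proc/slabinfo 2.x layout, where the first four
    whitespace-separated fields are name, active_objs, num_objs, objsize.
    """
    caches = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner, and the column-header comment.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        active, total, objsize = (int(f) for f in fields[1:4])
        caches[fields[0]] = (active, total, objsize)
    return caches


def cache_kib(caches, name):
    """Approximate memory held by one cache's objects, in KiB."""
    _, total, objsize = caches[name]
    return total * objsize // 1024
```

On a host this would be fed with the contents of /proc/slabinfo (reading it
usually requires root). With the numbers from my original report, and
assuming the 0.06K object size shown by slabtop is really 64 bytes,
80253888 objects * 64 bytes comes to 5015868K, which matches the CACHE SIZE
column for iommu_iova.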


Salil Mehta <salil.me...@huawei.com> wrote on Tue, Apr 28, 2020 at 5:17 PM:

> Hi Bin,
>
> Few questions:
>
> 1. If there is a leak of IOVA due to dma_unmap_* not being called
> somewhere then at certain point the throughput will drastically fall and
> will almost become equal to zero. This should be due to unavailability of
> the mapping anymore. But in your case VM is getting killed so this could
> be actual DMA buffer leak not DMA mapping leak. I doubt VM will get killed
> due to exhaustion of the DMA mappings in the IOMMU Layer for a transient
> reason or even due to mapping/unmapping leak.
>
> 2. Could you check if you have TSO offload enabled on Intel 82599? It will
> help in reducing the number of mappings and will take off IOVA mapping
> pressure from the IOMMU/VT-d? Though I am not sure it will help in
> reducing the amount of memory required for the buffers.
>
> 3. Also, have you checked the cpu-usage while your experiment is going on?
>
> Thanks
> Salil.
>
> > -----Original Message-----
> > From: iommu [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of Robin Murphy
> > Sent: Friday, April 24, 2020 5:31 PM
> > To: Bin <anole1...@gmail.com>
> > Cc: iommu@lists.linux-foundation.org
> > Subject: Re: iommu_iova slab eats too much memory
> >
> > On 2020-04-24 2:20 pm, Bin wrote:
> > > Dear Robin:
> > >      Thank you for your explanation. Now, I understand that this could be
> > > NIC driver's fault, but how could I confirm it? Do I have to debug the
> > > driver myself?
> >
> > I'd start with CONFIG_DMA_API_DEBUG - of course it will chew through
> > memory about an order of magnitude faster than the IOVAs alone, but it
> > should shed some light on whether DMA API usage looks suspicious, and
> > dumping the mappings should help track down the responsible driver(s).
> > Although the debugfs code doesn't show the stacktrace of where each
> > mapping was made, I guess it would be fairly simple to tweak that for a
> > quick way to narrow down where to start looking in an offending driver.
> >
> > Robin.
> >
> > > Robin Murphy <robin.mur...@arm.com> wrote on Fri, Apr 24, 2020 at 8:15 PM:
> > >
> > >> On 2020-04-24 1:06 pm, Bin wrote:
> > >>> I'm not familiar with the MMU stuff. When you say "some driver
> > >>> leaking DMA mappings", do you mean that some other kernel module,
> > >>> like KVM or the NIC driver, could be causing the leak instead of the
> > >>> iommu module itself?
> > >>
> > >> Yes - I doubt that intel-iommu itself is failing to free IOVAs when it
> > >> should, since I'd expect a lot of people to have noticed that. It's far
> > >> more likely that some driver is failing to call dma_unmap_* when it's
> > >> finished with a buffer - with the IOMMU disabled that would be a no-op
> > >> on x86 with a modern 64-bit-capable device, so such a latent bug could
> > >> have been easily overlooked.
> > >>
> > >> Robin.
> > >>
> > >>> Bin <anole1...@gmail.com> wrote on Fri, Apr 24, 2020 at 20:00:
> > >>>
> > >>>> Well, that's the problem! I'm assuming the iommu kernel module is
> > >>>> leaking memory, but I don't know why or how.
> > >>>>
> > >>>> Do you have any idea about it? Or is any further information needed?
> > >>>>
> > >>>> Robin Murphy <robin.mur...@arm.com> wrote on Fri, Apr 24, 2020 at 19:20:
> > >>>>
> > >>>>> On 2020-04-24 1:40 am, Bin wrote:
> > >>>>>> Hello? anyone there?
> > >>>>>>
> > >>>>>> Bin <anole1...@gmail.com> wrote on Thu, Apr 23, 2020 at 5:14 PM:
> > >>>>>>
> > >>>>>>> I forgot to mention that I've already disabled slab merging, so
> > >>>>>>> this is what it is.
> > >>>>>>>
> > >>>>>>> Bin <anole1...@gmail.com> wrote on Thu, Apr 23, 2020 at 5:11 PM:
> > >>>>>>>
> > >>>>>>>> Hey, guys:
> > >>>>>>>>
> > >>>>>>>> I'm running a batch of CoreOS boxes, the lsb_release is:
> > >>>>>>>>
> > >>>>>>>> ```
> > >>>>>>>> # cat /etc/lsb-release
> > >>>>>>>> DISTRIB_ID="Container Linux by CoreOS"
> > >>>>>>>> DISTRIB_RELEASE=2303.3.0
> > >>>>>>>> DISTRIB_CODENAME="Rhyolite"
> > >>>>>>>> DISTRIB_DESCRIPTION="Container Linux by CoreOS 2303.3.0 (Rhyolite)"
> > >>>>>>>> ```
> > >>>>>>>>
> > >>>>>>>> ```
> > >>>>>>>> # uname -a
> > >>>>>>>> Linux cloud-worker-25 4.19.86-coreos #1 SMP Mon Dec 2 20:13:38 -00 2019 x86_64 Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz GenuineIntel GNU/Linux
> > >>>>>>>> ```
> > >>>>>>>> Recently, I found my VMs constantly being killed due to OOM, and
> > >>>>>>>> after digging into the problem, I finally realized that the
> > >>>>>>>> kernel is leaking memory.
> > >>>>>>>>
> > >>>>>>>> Here's my slabinfo:
> > >>>>>>>>
> > >>>>>>>>     Active / Total Objects (% used)    : 83818306 / 84191607 (99.6%)
> > >>>>>>>>     Active / Total Slabs (% used)      : 1336293 / 1336293 (100.0%)
> > >>>>>>>>     Active / Total Caches (% used)     : 152 / 217 (70.0%)
> > >>>>>>>>     Active / Total Size (% used)       : 5828768.08K / 5996848.72K (97.2%)
> > >>>>>>>>     Minimum / Average / Maximum Object : 0.01K / 0.07K / 23.25K
> > >>>>>>>>
> > >>>>>>>>     OBJS   ACTIVE  USE OBJ SIZE   SLABS OBJ/SLAB CACHE SIZE NAME
> > >>>>>>>> 80253888 80253888 100%    0.06K 1253967       64   5015868K iommu_iova
> > >>>>>
> > >>>>> Do you really have a peak demand of ~80 million simultaneous DMA
> > >>>>> buffers, or is some driver leaking DMA mappings?
> > >>>>>
> > >>>>> Robin.
> > >>>>>
> > >>>>>>>>   489472   489123  99%    0.03K    3824      128     15296K kmalloc-32
> > >>>>>>>>   297444   271112  91%    0.19K    7082       42     56656K dentry
> > >>>>>>>>   254400   252784  99%    0.06K    3975       64     15900K anon_vma_chain
> > >>>>>>>>   222528    39255  17%    0.50K    6954       32    111264K kmalloc-512
> > >>>>>>>>   202482   201814  99%    0.19K    4821       42     38568K vm_area_struct
> > >>>>>>>>   200192   200192 100%    0.01K     391      512      1564K kmalloc-8
> > >>>>>>>>   170528   169359  99%    0.25K    5329       32     42632K filp
> > >>>>>>>>   158144   153508  97%    0.06K    2471       64      9884K kmalloc-64
> > >>>>>>>>   149914   149365  99%    0.09K    3259       46     13036K anon_vma
> > >>>>>>>>   146640   143123  97%    0.10K    3760       39     15040K buffer_head
> > >>>>>>>>   130368    32791  25%    0.09K    3104       42     12416K kmalloc-96
> > >>>>>>>>   129752   129752 100%    0.07K    2317       56      9268K Acpi-Operand
> > >>>>>>>>   105468   105106  99%    0.04K    1034      102      4136K selinux_inode_security
> > >>>>>>>>    73080    73080 100%    0.13K    2436       30      9744K kernfs_node_cache
> > >>>>>>>>    72360    70261  97%    0.59K    1340       54     42880K inode_cache
> > >>>>>>>>    71040    71040 100%    0.12K    2220       32      8880K eventpoll_epi
> > >>>>>>>>    68096    59262  87%    0.02K     266      256      1064K kmalloc-16
> > >>>>>>>>    53652    53652 100%    0.04K     526      102      2104K pde_opener
> > >>>>>>>>    50496    31654  62%    2.00K    3156       16    100992K kmalloc-2048
> > >>>>>>>>    46242    46242 100%    0.19K    1101       42      8808K cred_jar
> > >>>>>>>>    44496    43013  96%    0.66K     927       48     29664K proc_inode_cache
> > >>>>>>>>    44352    44352 100%    0.06K     693       64      2772K task_delay_info
> > >>>>>>>>    43516    43471  99%    0.69K     946       46     30272K sock_inode_cache
> > >>>>>>>>    37856    27626  72%    1.00K    1183       32     37856K kmalloc-1024
> > >>>>>>>>    36736    36736 100%    0.07K     656       56      2624K eventpoll_pwq
> > >>>>>>>>    34076    31282  91%    0.57K    1217       28     19472K radix_tree_node
> > >>>>>>>>    33660    30528  90%    1.05K    1122       30     35904K ext4_inode_cache
> > >>>>>>>>    32760    30959  94%    0.19K     780       42      6240K kmalloc-192
> > >>>>>>>>    32028    32028 100%    0.04K     314      102      1256K ext4_extent_status
> > >>>>>>>>    30048    30048 100%    0.25K     939       32      7512K skbuff_head_cache
> > >>>>>>>>    28736    28736 100%    0.06K     449       64      1796K fs_cache
> > >>>>>>>>    24702    24702 100%    0.69K     537       46     17184K files_cache
> > >>>>>>>>    23808    23808 100%    0.66K     496       48     15872K ovl_inode
> > >>>>>>>>    23104    22945  99%    0.12K     722       32      2888K kmalloc-128
> > >>>>>>>>    22724    21307  93%    0.69K     494       46     15808K shmem_inode_cache
> > >>>>>>>>    21472    21472 100%    0.12K     671       32      2684K seq_file
> > >>>>>>>>    19904    19904 100%    1.00K     622       32     19904K UNIX
> > >>>>>>>>    17340    17340 100%    1.06K     578       30     18496K mm_struct
> > >>>>>>>>    15980    15980 100%    0.02K      94      170       376K avtab_node
> > >>>>>>>>    14070    14070 100%    1.06K     469       30     15008K signal_cache
> > >>>>>>>>    13248    13248 100%    0.12K     414       32      1656K pid
> > >>>>>>>>    12128    11777  97%    0.25K     379       32      3032K kmalloc-256
> > >>>>>>>>    11008    11008 100%    0.02K      43      256       172K selinux_file_security
> > >>>>>>>>    10812    10812 100%    0.04K     106      102       424K Acpi-Namespace
> > >>>>>>>>
> > >>>>>>>> This information shows that 'iommu_iova' is the top memory
> > >>>>>>>> consumer. To optimize the network performance of OpenStack
> > >>>>>>>> virtual machines, I enabled the VT-d feature in the BIOS and the
> > >>>>>>>> SR-IOV feature of the Intel 82599 10G NIC. I'm assuming this is
> > >>>>>>>> the root cause of the issue.
> > >>>>>>>>
> > >>>>>>>> Is there anything I can do to fix it?
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> _______________________________________________
> > >>>>>> iommu mailing list
> > >>>>>> iommu@lists.linux-foundation.org
> > >>>>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
>
