On Thu, Mar 21, 2024 at 01:37:36AM +0000, Liu, Yuan1 wrote:
> > -----Original Message-----
> > From: Peter Xu <pet...@redhat.com>
> > Sent: Thursday, March 21, 2024 4:32 AM
> > To: Liu, Yuan1 <yuan1....@intel.com>
> > Cc: Daniel P. Berrangé <berra...@redhat.com>; faro...@suse.de; qemu-
> > de...@nongnu.org; hao.xi...@bytedance.com; bryan.zh...@bytedance.com; Zou,
> > Nanhai <nanhai....@intel.com>
> > Subject: Re: [PATCH v5 5/7] migration/multifd: implement initialization of
> > qpl compression
> > 
> > On Wed, Mar 20, 2024 at 04:23:01PM +0000, Liu, Yuan1 wrote:
> > > Let me explain here: during an IAA decompression operation, the
> > > decompressed data can be output directly to the virtual address of
> > > guest memory by the IAA hardware, which avoids the CPU copying the
> > > decompressed data into guest memory.
> > 
> > I see.
> > 
> > > Without -mem-prealloc, none of the guest memory is populated, so the
> > > IAA hardware has to trigger an I/O page fault first and then output the
> > > decompressed data to the guest memory region.  Besides that, CPU page
> > > faults also trigger IOTLB flush operations when the IAA devices use SVM.
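(What -mem-prealloc buys here is that every guest page is populated and
writable before migration starts.  A minimal standalone sketch of the same
effect, assuming MADV_POPULATE_WRITE from Linux 5.14+:)

    #define _GNU_SOURCE
    #include <stddef.h>
    #include <sys/mman.h>

    static void *alloc_populated(size_t len)
    {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            return NULL;
        }
        /* Write-populate all pages up front so that neither a CPU store nor
         * a device DMA write takes a fault (or the CoW path) later on. */
        if (madvise(p, len, MADV_POPULATE_WRITE) != 0) {
            munmap(p, len);
            return NULL;
        }
        return p;
    }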
> > 
> > Oh so the IAA hardware can already use the CPU pgtables?  Nice..
> > 
> > Why is an IOTLB flush needed?  AFAIU we're only installing new pages;
> > the request can come either from a CPU access or a DMA.  In either case
> > there should be no tearing down of an old page.  Isn't an IOTLB flush
> > only needed when a tear-down happens?
> 
> As far as I know, the IAA hardware uses SVM to share the CPU's page table
> for address translation (IOMMU scalable mode directly accesses the CPU
> page table).  Therefore, when the CPU page table changes, an invalidation
> operation must be triggered on the device side to update the IOMMU and
> the device's caches.
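(In kernel terms, this propagation happens through mmu notifiers.  A
simplified sketch of the 6.2-era hook -- the function and ops names here
are illustrative, not the exact driver code:)

    #include <linux/mmu_notifier.h>

    static void svm_invalidate_range(struct mmu_notifier *mn,
                                     struct mm_struct *mm,
                                     unsigned long start, unsigned long end)
    {
            /* In drivers/iommu/intel/svm.c this boils down to
             * qi_flush_piotlb() plus qi_flush_dev_iotlb_pasid() for the
             * PASID bound to the IAA device -- the flushes at the bottom
             * of the trace below. */
    }

    static const struct mmu_notifier_ops svm_mn_ops = {
            .invalidate_range = svm_invalidate_range,
    };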
> 
> My current kernel version is mainline 6.2.  The issue I see is as follows:
> -- handle_mm_fault
>  |
>   -- wp_page_copy

This is the CoW path.  Not usual at all..

I assume this issue should only be present on the destination side.  The
guest pages should then be the target of such DMAs, which means these
should be write faults, and as we see here they are; otherwise it wouldn't
trigger a CoW.

However it's not clear to me why a pre-installed zero page existed; it
means someone read the guest pages first.

It might be interesting to know _why_ someone reads the guest pages, even
if we know they're all zeros.  If we can avoid such reads, the pages stay
holes rather than prefaulted zero-page mappings, so no invalidations are
needed, and I expect that should fix the IOTLB storm issue.
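(The suspected sequence, as a userspace sketch.  With an SVM-bound device
attached, the second fault below is the one that would fire the notifier;
standalone it merely demonstrates the read-then-write CoW:)

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096;
        volatile char *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        volatile char *b = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        char c = a[0];  /* read fault: maps the shared zero page read-only */
        a[0] = 1;       /* write fault: wp_page_copy -> mmu notifier flush */

        b[0] = 1;       /* write into a hole: fresh page, no CoW, no flush */

        printf("%d\n", c);
        return 0;
    }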

It would still be good to fix this first so that qpl isn't special in this
regard; the hope is that the migration submodule shouldn't rely on any
pre-configuration of guest memory (-mem-prealloc) to work properly.

>     |
>     -- mmu_notifier_invalidate_range
>       |
>         -- intel_invalidate_range
>         |
>         -- qi_flush_piotlb
>         -- qi_flush_dev_iotlb_pasid

-- 
Peter Xu

