On Fri, May 15, 2026 at 11:33:21PM +0000, Michael Kelley wrote:
> From: Michael Kelley <[email protected]> Sent: Friday, May 15, 2026 11:00 
> AM
> > 
> > From: Yu Zhang <[email protected]> Sent: Friday, May 15, 2026 
> > 9:24 AM
> > >
> > > On Thu, May 14, 2026 at 06:14:22PM +0000, Michael Kelley wrote:
> > > > From: Yu Zhang <[email protected]> Sent: Monday, May 11, 
> > > > 2026 9:24 AM
> > > > >
> > 
> > [....]
> > 
> > > > > +     unsigned long nr_pages = end_pfn - start_pfn;
> > > > > +     u16 count = 0;
> > > > > +
> > > > > +     while (nr_pages > 0) {
> > > > > +             unsigned long flush_pages;
> > > > > +             int order;
> > > > > +             unsigned long pfn_align;
> > > > > +             unsigned long size_align;
> > > > > +
> > > > > +             if (count >= HV_IOMMU_MAX_FLUSH_VA_COUNT) {
> > > > > +                     count = HV_IOMMU_FLUSH_VA_OVERFLOW;
> > > > > +                     break;
> > > > > +             }
> > > > > +
> > > > > +             if (start_pfn)
> > > > > +                     pfn_align = __ffs(start_pfn);
> > > >
> > > > I don't understand why __ffs() is correct here. I would expect
> > > > __fls() so it is consistent with the calculation of size_align. But I
> > > > can only surmise how the hypercall works since there's no
> > > > documentation, so maybe my understanding of the hypercall is
> > > > wrong.   If __ffs really is correct, a comment explaining why
> > > > would help. :-)
> > > >
> > >
> > > The use of __ffs() is intentional. Each flush entry invalidates a
> > > naturally aligned 2^N page block, and the hypervisor requires the
> > > page_number to be aligned to 2^page_mask_shift.
> > >
> > > Here __ffs() and __fls() serve different purposes:
> > > - __ffs(start_pfn) is about the alignment constraint, e.g.,  how
> > > large a block can this address support?
> > > - __fls(nr_pages) is about  the size constraint, e.g., how large
> > > a block can the remaining range hold?
> > >
> > > Taking min() of both ensures each entry is both properly aligned
> > > and within bounds.
> > >
> > > Thanks for raising this - it definitely deserves a comment. I had to
> > > stare at it for a while myself to remember why. :)
> > 
> > Hmmm. Something about this still nags at me. I'll run some
> > experiments to either convince myself that you are right, or to
> > come up with a counterexample.
> 
> I have resolved what was nagging at me. My understanding of how
> _ffs() and __fls() work was wrong. :-(  Your code is correct.
> 
> > 
> > A related thought occurred to me. If each flush entry that is passed
> > to Hyper-V describes a naturally aligned 2^N page block, I don't
> > think the HV_IOMMU_MAX_FLUSH_VA_COUNT can ever
> > be reached. The number of entries is limited by the number of
> > bits in a PFN and the pages count, both of which are 64. And with
> > 52 bit physical addressing and 4KiB pages, the actual size of
> > a PFN and pages count is even smaller than 64.
> > HV_IOMMU_MAX_FLUSH_VA_COUNT is the number of 8 byte
> > union hv_iommu_flush_va entries that fit in a 4KiB page, so
> > it's ~500.
> > 
> > My statement applies to a single flush range. If multiple flush
> > ranges were strung together in a single hypercall, a larger count
> > could be reached, but hv_flush_device_domain_list() only does
> > a single range. So I don't think the overflow case in
> > hv_flush_device_domain_list() can ever happen. But let me
> > do my experiments, and I will also look at this aspect to confirm
> > if it's right.
> 
> My experiments also convince me that the overflow case can't
> happen as long as the hypercall is being populated with entries
> derived from a single range.

Gosh. You're right. I think I've been overthinking this :( 

So we can simplify this to just do the list flush and fall back
to HVCALL_FLUSH_DEVICE_DOMAIN only if the list flush fails.

B.R.
Yu

Reply via email to