On Mon, Nov 05, 2018 at 02:51:41PM -0700, Alex Williamson wrote:
> On Mon,  5 Nov 2018 11:55:51 -0500
> Daniel Jordan <daniel.m.jor...@oracle.com> wrote:
> > +static int vfio_pin_map_dma_chunk(unsigned long start_vaddr,
> > +                             unsigned long end_vaddr,
> > +                             struct vfio_pin_args *args)
> >  {
> > -   dma_addr_t iova = dma->iova;
> > -   unsigned long vaddr = dma->vaddr;
> > -   size_t size = map_size;
> > +   struct vfio_dma *dma = args->dma;
> > +   dma_addr_t iova = dma->iova + (start_vaddr - dma->vaddr);
> > +   unsigned long unmapped_size = end_vaddr - start_vaddr;
> > +   unsigned long pfn, mapped_size = 0;
> >     long npage;
> > -   unsigned long pfn, limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> >     int ret = 0;
> >  
> > -   while (size) {
> > +   while (unmapped_size) {
> >             /* Pin a contiguous chunk of memory */
> > -           npage = vfio_pin_pages_remote(dma, vaddr + dma->size,
> > -                                         size >> PAGE_SHIFT, &pfn, limit);
> > +           npage = vfio_pin_pages_remote(dma, start_vaddr + mapped_size,
> > +                                         unmapped_size >> PAGE_SHIFT,
> > +                                         &pfn, args->limit, args->mm);
> >             if (npage <= 0) {
> >                     WARN_ON(!npage);
> >                     ret = (int)npage;
> > @@ -1052,22 +1067,50 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
> >             }
> >  
> >             /* Map it! */
> > -           ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage,
> > -                                dma->prot);
> > +           ret = vfio_iommu_map(args->iommu, iova + mapped_size, pfn,
> > +                                npage, dma->prot);
> >             if (ret) {
> > -                   vfio_unpin_pages_remote(dma, iova + dma->size, pfn,
> > +                   vfio_unpin_pages_remote(dma, iova + mapped_size, pfn,
> >                                             npage, true);
> >                     break;
> >             }
> >  
> > -           size -= npage << PAGE_SHIFT;
> > -           dma->size += npage << PAGE_SHIFT;
> > +           unmapped_size -= npage << PAGE_SHIFT;
> > +           mapped_size   += npage << PAGE_SHIFT;
> >     }
> >  
> > +   return (ret == 0) ? KTASK_RETURN_SUCCESS : ret;
> 
> Overall I'm a big fan of this, but I think there's an undo problem
> here.  Per 03/13, kc_undo_func is only called for successfully
> completed chunks and each kc_thread_func should handle cleanup of any
> intermediate work before failure.  That's not done here afaict.  Should
> we be calling the vfio_pin_map_dma_undo() manually on the completed
> range before returning error?

Yes, we should be, thanks very much for catching this.

At least I documented what I didn't do?  :)
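Here is a minimal sketch of the fix Alex describes. It is a self-contained C model, not the real vfio/ktask code: mapped[], fail_page, map_run(), unmap_range() and map_chunk() are hypothetical stand-ins for the IOMMU state, vfio_pin_pages_remote()/vfio_iommu_map(), vfio_unmap_unpin() and the chunk thread function. On failure, the chunk function rolls back only the pages it mapped itself, because (per 03/13) the undo callback runs only for chunks that completed. In the patch this would presumably mean calling vfio_pin_map_dma_undo(start_vaddr, start_vaddr + mapped_size, args) before returning the error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16

/* Hypothetical model of IOMMU state: true if the page is pinned and mapped. */
static bool mapped[NPAGES];
/* Hypothetical fault injection: page index where mapping fails (NPAGES = never). */
static size_t fail_page = NPAGES;

/* Hypothetical stand-in for pin + vfio_iommu_map(): map pages [pg, pg + n).
 * If the run contains fail_page, nothing is mapped and -1 is returned,
 * mirroring the patch, which already unpins the failed run itself. */
static int map_run(size_t pg, size_t n)
{
    if (fail_page >= pg && fail_page < pg + n)
        return -1;
    for (size_t i = pg; i < pg + n; i++)
        mapped[i] = true;
    return 0;
}

/* Hypothetical stand-in for vfio_unmap_unpin(): undo pages [pg, end). */
static void unmap_range(size_t pg, size_t end)
{
    for (size_t i = pg; i < end; i++)
        mapped[i] = false;
}

/* Chunk worker for pages [start, end), mapping up to `run` pages per step.
 * On failure it undoes the part of this chunk it already mapped, since the
 * framework's undo callback is only invoked for fully completed chunks. */
static int map_chunk(size_t start, size_t end, size_t run)
{
    size_t done = 0;

    assert(start <= end && end <= NPAGES && run > 0);
    while (start + done < end) {
        size_t n = end - (start + done);

        if (n > run)
            n = run;
        if (map_run(start + done, n) != 0) {
            unmap_range(start, start + done);  /* roll back partial progress */
            return -1;
        }
        done += n;
    }
    return 0;
}
```

For example, with fail_page = 10, map_chunk(8, 16, 2) maps pages 8 and 9, fails on the run starting at page 10, unmaps pages 8 and 9, and returns -1, so no part of that chunk is left mapped for the caller to track.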

> 
> > +}
> > +
> > +static void vfio_pin_map_dma_undo(unsigned long start_vaddr,
> > +                             unsigned long end_vaddr,
> > +                             struct vfio_pin_args *args)
> > +{
> > +   struct vfio_dma *dma = args->dma;
> > +   dma_addr_t iova = dma->iova + (start_vaddr - dma->vaddr);
> > +   dma_addr_t end  = dma->iova + (end_vaddr   - dma->vaddr);
> > +
> > +   vfio_unmap_unpin(args->iommu, args->dma, iova, end, true);
> > +}
> > +
> > +static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
> > +                       size_t map_size)
> > +{
> > +   unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> > +   int ret = 0;
> > +   struct vfio_pin_args args = { iommu, dma, limit, current->mm };
> > +   /* Stay on PMD boundary in case THP is being used. */
> > +   DEFINE_KTASK_CTL(ctl, vfio_pin_map_dma_chunk, &args, PMD_SIZE);
> 
> PMD_SIZE chunks almost seem too convenient.  I wonder a) is that really
> enough work per thread, and b) is this really successfully influencing
> THP?  Thanks,

Yes, you're right on both counts.  I'd been using PUD_SIZE for a while in
testing and meant to switch back to KTASK_MEM_CHUNK (128M), but used PMD_SIZE
by mistake.  PUD_SIZE chunks made thread finishing times too spread out in
some cases, so 128M seems like a reasonable compromise.
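The following is a back-of-the-envelope C sketch of the work-per-thread trade-off. It assumes x86-64 with 4 KiB base pages (PMD_SIZE = 2 MiB, PUD_SIZE = 1 GiB), a chunk sequence that simply starts at the beginning of the range, and chunks spread evenly across threads. nr_chunks(), chunks_per_thread() and the *_SZ macros are hypothetical names for this illustration, not ktask's API.

```c
#include <assert.h>

/* Assumed x86-64 sizes with 4 KiB base pages (illustration only). */
#define PAGE_SZ    (4ULL << 10)    /* base page */
#define PMD_SZ     (2ULL << 20)    /* PMD_SIZE: one 2 MiB huge page */
#define CHUNK_128M (128ULL << 20)  /* KTASK_MEM_CHUNK, 128M per the reply */
#define PUD_SZ     (1ULL << 30)    /* PUD_SIZE: 1 GiB */

/* Number of chunks needed to cover len bytes; a trailing partial chunk
 * counts as a whole one. */
static unsigned long long nr_chunks(unsigned long long len,
                                    unsigned long long chunk)
{
    assert(chunk != 0);
    return (len + chunk - 1) / chunk;
}

/* Most chunks any single thread handles if nchunks are spread evenly
 * over nthreads workers (an assumption, not ktask's actual policy). */
static unsigned long long chunks_per_thread(unsigned long long nchunks,
                                            unsigned long long nthreads)
{
    assert(nthreads != 0);
    return (nchunks + nthreads - 1) / nthreads;
}
```

For a 16 GiB mapping this gives 8192 PMD_SIZE chunks of 512 pages each, 128 chunks of 32768 pages at 128M, and only 16 PUD_SIZE chunks. With, say, 8 threads, PUD_SIZE leaves each thread just 2 chunks, so a single slow 1 GiB chunk is a large share of that thread's work. 128M still gives 16 chunks per thread while keeping each chunk far larger than a 2 MiB PMD.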

Thanks for the thorough and quick review.

Daniel
