On Mon, 2018-07-02 at 18:47 +1000, Alexey Kardashevskiy wrote:
> On Fri, 29 Jun 2018 17:34:36 +1000
> Russell Currey <rus...@russell.cc> wrote:
> 
> > DMA pseudo-bypass is a new set of DMA operations that solve some
> > issues for devices that want to address more than 32 bits but can't
> > address the 59 bits required to enable direct DMA.
> > 
> > The previous implementation for POWER8/PHB3 worked around this by
> > configuring a bypass from the default 32-bit address space into
> > 64-bit address space.  This approach does not work for POWER9/PHB4
> > because regions of memory are discontiguous and many devices will be
> > unable to address memory beyond the first node.
> > 
> > Instead, implement a new set of DMA operations that allocate TCEs as
> > DMA mappings are requested so that all memory is addressable even
> > when a one-to-one mapping between real addresses and DMA addresses
> > isn't possible.  These TCEs are the maximum size available on the
> > platform, which is 256M on PHB3 and 1G on PHB4.
> > 
> > Devices can now map any region of memory up to the maximum amount
> > they can address according to the DMA mask set, in chunks of the
> > largest available TCE size.
> > 
> > This implementation replaces the need for the existing PHB3 solution
> > and should be compatible with future PHB versions.
> > 
> > It is, however, rather naive.  There is no unmapping, and as a result
> > devices can eventually run out of space if they address their entire
> > DMA mask worth of TCEs.  An implementation with unmap() will come in
> > future (and requires a much more complex implementation), but this is
> > a good start due to the drastic performance improvement.
> 
> 
> Why doesn't dma_iommu_ops work in this case? I keep asking and yet
> there's no comment in the commit log or mails...

Yes, I should cover this in the commit message.

So the primary reason that the IOMMU doesn't work for this case is the
TCE allocation: the IOMMU doesn't have a refcount and will allocate a
separate TCE (1GB in this case, on P9) for each map, so the available
TCEs are used up and mapping quickly fails.
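A toy model of the difference, in Python rather than kernel code. It
assumes a window of only 4 TCEs of 1GB each (NUM_SLOTS is a hypothetical
number picked for illustration) and assumes that pseudo-bypass reuses the
TCE already covering a 1GB region of real memory instead of allocating a
new one per map. NaiveAllocator and SharedRegionAllocator are hypothetical
names, not anything in the kernel.

```python
# Hypothetical model, not kernel code: a DMA window holding NUM_SLOTS TCEs,
# each covering 1GB (the largest TCE size on P9/PHB4).
TCE_SHIFT = 30
NUM_SLOTS = 4


class NaiveAllocator:
    """Every map() consumes a fresh TCE: no sharing, no refcount."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.used = 0

    def map(self, phys):
        if self.used >= self.num_slots:
            return False  # window exhausted
        self.used += 1
        return True


class SharedRegionAllocator:
    """One TCE per 1GB region of real memory, reused by later mappings in
    the same region and never unmapped."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.region_to_slot = {}

    def map(self, phys):
        region = phys >> TCE_SHIFT
        if region in self.region_to_slot:
            return True  # already covered by an existing TCE
        if len(self.region_to_slot) >= self.num_slots:
            return False  # window exhausted
        self.region_to_slot[region] = len(self.region_to_slot)
        return True


naive = NaiveAllocator(NUM_SLOTS)
shared = SharedRegionAllocator(NUM_SLOTS)
# Ten small 4K buffers, all inside the first 1GB of real memory.
buffers = [i * 4096 for i in range(10)]
naive_ok = sum(naive.map(p) for p in buffers)
shared_ok = sum(shared.map(p) for p in buffers)
print(naive_ok, shared_ok, len(shared.region_to_slot))  # 4 10 1
```

Ten small buffers exhaust the naive window after four maps, while the
shared scheme covers all ten with a single 1GB TCE. Both still run out
once distinct 1GB regions exceed the window, which is the "no unmapping"
limitation from the commit message.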

This isn't intended to be a replacement for the IOMMU; it's a
roundabout way of achieving what the direct ops do (as NVIDIA devices
on P8 can today).
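The comparison with the direct ops can be sketched the same way. This is
a hypothetical Python model, not the actual implementation: it assumes
1GB TCEs, that the window holds 2^(mask bits - 30) of them, that slots
are handed out in order on first use, and that the offset within each
1GB page is preserved. PseudoBypassWindow and the example real address
are made up for illustration.

```python
# Hypothetical model, not the kernel implementation. Assumes 1GB TCEs
# (P9/PHB4) and that each 1GB region of real memory gets one TCE on first use.
TCE_SHIFT = 30
TCE_SIZE = 1 << TCE_SHIFT


class PseudoBypassWindow:
    def __init__(self, dma_mask_bits):
        # How many 1GB TCEs fit below the device's DMA mask.
        self.max_slots = 1 << (dma_mask_bits - TCE_SHIFT)
        self.dma_limit = 1 << dma_mask_bits
        self.region_to_slot = {}

    def map(self, phys):
        region = phys >> TCE_SHIFT
        slot = self.region_to_slot.get(region)
        if slot is None:
            # No unmap: once every slot is taken, new regions cannot be mapped.
            if len(self.region_to_slot) >= self.max_slots:
                raise MemoryError("DMA window exhausted")
            slot = len(self.region_to_slot)
            self.region_to_slot[region] = slot
        # Keep the offset within the 1GB page; only the page number changes.
        return (slot << TCE_SHIFT) | (phys & (TCE_SIZE - 1))


# A device with a 48-bit mask gets 2**18 = 262144 slots of 1GB (256TB).
win = PseudoBypassWindow(dma_mask_bits=48)
# A hypothetical real address above 2**48, out of reach of a 48-bit
# device under a one-to-one (direct) mapping.
phys = (1 << 50) + 0x1234
dma = win.map(phys)
print(hex(dma), dma < win.dma_limit)  # 0x1234 True
```

Like the direct ops, every real address ends up with a DMA address the
device can reach. The difference is that the translation goes through
TCEs allocated on demand rather than a fixed offset, which is why the
device doesn't need all 59 bits.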
