On 11/06/2016 07:45 PM, Rob Clark wrote:
On Fri, Nov 4, 2016 at 6:44 PM, Jordan Crouse <jcro...@codeaurora.org> wrote:
For reasons that are not entirely understood, using dma_map_sg()
for nocache/write combine buffers doesn't always successfully flush
the cache after the memory is zeroed somewhere deep in the bowels
of the shmem code.  My working theory is that the cache flush on
the swiotlb bounce buffer address isn't always flushing what
we need.

Instead of using dma_map_sg() directly, kmap and flush each page
at allocate time.  We could use invalidate + clean or just invalidate
if we wanted to, but on ARM64 using a flush is safer and not much
slower for what we are trying to do.

Hopefully someday I'll more clearly understand the relationship between
shmem, kmap, vmap and the swiotlb bounce buffer and we can be smarter
about when and how we invalidate the caches.

Like I mentioned on irc, we defn don't want to ever hit bounce
buffers.  I think the problem here is dma-mapping assumes we only
support 32b physical addresses, which is not true (at least as long as
we have iommu)..

Archit hit a similar problem on the display side of things.

Yeah, the shmem allocated pages sometimes ended up with 33 bit addresses
on db820c. The msm driver sets the dma mask to a default of 32 bits.
The dma mapping api gets unhappy whenever we get sg chunks with 33 bit
addresses, and tries to use swiotlb for them. We eventually end up
overflowing the swiotlb.

Setting the mask to 33 bits worked as a temporary hack.

I think the proper solution is to do something like this somewhere:

   dma_set_mask_and_coherent(drm->dev, DMA_BIT_MASK(size))

where size is 32 or 64 (48?) depending on device..
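
To make the mask arithmetic above concrete, here is a minimal userspace
sketch, not kernel code. DMA_BIT_MASK() is copied from the kernel's
definition; addr_fits_mask() is a hypothetical helper invented for this
illustration, only loosely modeled on the kind of range check the
dma-mapping layer makes before deciding whether to bounce a buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition the kernel uses for DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* A 32 bit mask covers exactly the first 4 GiB of address space. */
static_assert(DMA_BIT_MASK(32) == 0xffffffffULL, "32 bit mask is 4 GiB - 1");

/*
 * Hypothetical helper (not a kernel API): returns true when the whole
 * range [addr, addr + size) is addressable under the given mask.
 * Written to avoid overflow when addr + size would wrap.
 */
static bool addr_fits_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
	return size != 0 && addr <= mask && size - 1 <= mask - addr;
}
```

With these definitions, a 4 KiB page at physical address 0x100000000
(the first address that needs 33 bits) fails the check against
DMA_BIT_MASK(32) and passes against DMA_BIT_MASK(33), which matches the
behaviour described above: the 32 bit default pushes such pages to
swiotlb, and widening the mask avoids that.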

Note that this value should be perhaps something that the iommu driver
knows, since really it is about the iommu page tables.

Maybe it would even make sense for the iommu driver to set this when
we attach?  Although since the GEM code is allocating/mapping for
multiple subdev's that might not work out (ie. if we only attached
subdev's and not the parent drm dev which is used by the GEM code for

By multiple subdevs, you mean MDP and GPU devs? As far as their dma
masks are concerned, it should be the same for both, right? It's only
the iommu driver's returned iova that we need to care about. Weren't we
thinking of managing that by having separate drm_mm's for MDP and GPU?

(ofc it would be better if dma-mapping was structured more like
helpers which could be bypassed in the few special cases where the
abstraction doesn't work, rather than forcing us to do pretend
dma_map/unmap() for cache operations.. but that is a topic for a
different rant)

Yeah, that would be nice.



Signed-off-by: Jordan Crouse <jcro...@codeaurora.org>
 drivers/gpu/drm/msm/msm_gem.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 85f3047..29f5a30 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -79,6 +79,7 @@ static struct page **get_pages(struct drm_gem_object *obj)
                struct drm_device *dev = obj->dev;
                struct page **p;
                int npages = obj->size >> PAGE_SHIFT;
+               int i;

                if (use_pages(obj))
                        p = drm_gem_get_pages(obj);
@@ -91,6 +92,13 @@ static struct page **get_pages(struct drm_gem_object *obj)
                        return p;

+               for (i = 0; i < npages; i++) {
+                       void *addr = kmap_atomic(p[i]);
+                       __dma_flush_range(addr, addr + PAGE_SIZE);
+                       kunmap_atomic(addr);
+               }
                msm_obj->sgt = drm_prime_pages_to_sg(p, npages);
                if (IS_ERR(msm_obj->sgt)) {
                        dev_err(dev->dev, "failed to allocate sgt\n");
@@ -98,13 +106,6 @@ static struct page **get_pages(struct drm_gem_object *obj)

                msm_obj->pages = p;
-               /* For non-cached buffers, ensure the new pages are clean
-                * because display controller, GPU, etc. are not coherent:
-                */
-               if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
-                       dma_map_sg(dev->dev, msm_obj->sgt->sgl,
-                                       msm_obj->sgt->nents, DMA_BIDIRECTIONAL);

        return msm_obj->pages;
@@ -115,12 +116,6 @@ static void put_pages(struct drm_gem_object *obj)
        struct msm_gem_object *msm_obj = to_msm_bo(obj);

        if (msm_obj->pages) {
-               /* For non-cached buffers, ensure the new pages are clean
-                * because display controller, GPU, etc. are not coherent:
-                */
-               if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
-                       dma_unmap_sg(obj->dev->dev, msm_obj->sgt->sgl,
-                                       msm_obj->sgt->nents, DMA_BIDIRECTIONAL);


Freedreno mailing list

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project