On Mon, Nov 7, 2016 at 3:35 AM, Archit Taneja <arch...@codeaurora.org> wrote:
>
>
> On 11/06/2016 07:45 PM, Rob Clark wrote:
>>
>> On Fri, Nov 4, 2016 at 6:44 PM, Jordan Crouse <jcro...@codeaurora.org>
>> wrote:
>>>
>>> For reasons that are not entirely understood using dma_map_sg()
>>> for nocache/write combine buffers doesn't always successfully flush
>>> the cache after the memory is zeroed somewhere deep in the bowels
>>> of the shmem code.  My working theory is that the cache flush on
>>> the swiotlb bounce buffer address work isn't always flushing what
>>> we need.
>>>
>>> Instead of using dma_map_sg() directly kmap and flush each page
>>> at allocate time.  We could use invalidate + clean or just invalidate
>>> if we wanted to but on ARM64 using a flush is safer and not much
>>> slower for what we are trying to do.
>>>
>>> Hopefully someday I'll more clearly understand the relationship between
>>> shmem  kmap, vmap and the swiotlb bounce buffer and we can be smarter
>>> about when and how we invalidate the caches.
>>
>>
>> Like I mentioned on irc, we defn don't want to ever hit bounce
>> buffers.  I think the problem here is dma-mapping assumes we only
>> support 32b physical addresses, which is not true (at least as long as
>> we have iommu)..
>>
>> Archit hit a similar problem on the display side of things.
>
>
> Yeah, the shmem allocated pages ended up being 33 bit addresses some times
> on db820c. The msm driver sets the dma mask to a default of 32 bits.
> The dma mapping api gets unhappy whenever we get sg chunks with 33 bit
> addresses, and tries to use switolb for them. We eventually end up
> overflowing the swiotlb.
>
> Setting the mask to 33 bits worked as a temporary hack.

actually I think setting the mask is the correct thing to do, but
probably should use dma_set_ fxn..  and I suspect that the PA for the
iommu is larger than 33 bits.  It is probably equal to the number of
address lines that are wired up.  Not quite sure what that is, but I
don't think there are any devices where the iommu cannot map some
physical pages.

>>
>> I think the proper solution is to do something like this somewhere:
>>
>>    dma_set_mask_and_coherent(drm->dev, DMA_BIT_MASK(size))
>>
>> where size is 32 or 64 (48?) depending on device..
>>
>> Note that this value should be perhaps something that the iommu driver
>> knows, since really it is about the iommu page tables.
>>
>> Maybe it would even make sense for the iommu driver to set this when
>> we attach?  Although since the GEM code is allocating/mapping for
>> multiple subdev's that might not work out (ie. if we only attached
>> subdev's and not the parent drm dev which is used by the GEM code for
>> dma_map/unmap()).
>
>
> By multiple subdevs, you mean MDP and GPU devs? As far as their dma
> masks are concerned, it should be the same for both, right? It's only
> the iommu driver's returned iova that we need to care about. Weren't we
> thinking of managing that by having separate drm_mm's for MDP and GPU?

correct, it isn't about the mdp vs gpu iova, but about the pa that is
mapped in the iommu.

And yeah, multiple address spaces is one of the things I have for 4.10:

https://github.com/freedreno/kernel-msm/commit/40d73b331e45fa4bb255ae26012378d396ee46a5

BR,
-R

>
>>
>> (ofc it would be better if dma-mapping was structured more like
>> helpers which could be bypassed in the few special cases where the
>> abstraction doesn't work, rather than forcing us to do pretend
>> dma_map/unmap() for cache operations.. but that is a topic for a
>> different rant)
>
>
> Yeah, that would be nice.
>
> Archit
>
>
>>
>> BR,
>> -R
>>
>>> Signed-off-by: Jordan Crouse <jcro...@codeaurora.org>
>>> ---
>>>  drivers/gpu/drm/msm/msm_gem.c | 21 ++++++++-------------
>>>  1 file changed, 8 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/msm_gem.c
>>> b/drivers/gpu/drm/msm/msm_gem.c
>>> index 85f3047..29f5a30 100644
>>> --- a/drivers/gpu/drm/msm/msm_gem.c
>>> +++ b/drivers/gpu/drm/msm/msm_gem.c
>>> @@ -79,6 +79,7 @@ static struct page **get_pages(struct drm_gem_object
>>> *obj)
>>>                 struct drm_device *dev = obj->dev;
>>>                 struct page **p;
>>>                 int npages = obj->size >> PAGE_SHIFT;
>>> +               int i;
>>>
>>>                 if (use_pages(obj))
>>>                         p = drm_gem_get_pages(obj);
>>> @@ -91,6 +92,13 @@ static struct page **get_pages(struct drm_gem_object
>>> *obj)
>>>                         return p;
>>>                 }
>>>
>>> +               for (i = 0; i < npages; i++) {
>>> +                       void *addr = kmap_atomic(p[i]);
>>> +
>>> +                       __dma_flush_range(addr, addr + PAGE_SIZE);
>>> +                       kunmap_atomic(addr);
>>> +               }
>>> +
>>>                 msm_obj->sgt = drm_prime_pages_to_sg(p, npages);
>>>                 if (IS_ERR(msm_obj->sgt)) {
>>>                         dev_err(dev->dev, "failed to allocate sgt\n");
>>> @@ -98,13 +106,6 @@ static struct page **get_pages(struct drm_gem_object
>>> *obj)
>>>                 }
>>>
>>>                 msm_obj->pages = p;
>>> -
>>> -               /* For non-cached buffers, ensure the new pages are clean
>>> -                * because display controller, GPU, etc. are not
>>> coherent:
>>> -                */
>>> -               if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
>>> -                       dma_map_sg(dev->dev, msm_obj->sgt->sgl,
>>> -                                       msm_obj->sgt->nents,
>>> DMA_BIDIRECTIONAL);
>>>         }
>>>
>>>         return msm_obj->pages;
>>> @@ -115,12 +116,6 @@ static void put_pages(struct drm_gem_object *obj)
>>>         struct msm_gem_object *msm_obj = to_msm_bo(obj);
>>>
>>>         if (msm_obj->pages) {
>>> -               /* For non-cached buffers, ensure the new pages are clean
>>> -                * because display controller, GPU, etc. are not
>>> coherent:
>>> -                */
>>> -               if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
>>> -                       dma_unmap_sg(obj->dev->dev, msm_obj->sgt->sgl,
>>> -                                       msm_obj->sgt->nents,
>>> DMA_BIDIRECTIONAL);
>>>                 sg_free_table(msm_obj->sgt);
>>>                 kfree(msm_obj->sgt);
>>>
>>> --
>>> 1.9.1
>>>
>>> _______________________________________________
>>> Freedreno mailing list
>>> Freedreno@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/freedreno
>
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

Reply via email to