On Wed, Jan 24, 2018 at 4:45 AM, Russell King - ARM Linux <li...@armlinux.org.uk> wrote:

> On Tue, Jan 23, 2018 at 06:56:03PM -0800, Gurchetan Singh wrote:
> > The dma_cache_maint_page function is important for cache maintenance on
> > ARM32 (this was determined via testing).
> >
> > Since we desire direct control of the caches in drm_cache.c, let's make
> > a copy of the function, rename it and use it.
> >
> > v2: Don't use DMA API, call functions directly (Daniel)
> >
> > Signed-off-by: Gurchetan Singh <gurchetansi...@chromium.org>
> > ---
> >  drivers/gpu/drm/drm_cache.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 61 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> > index 89cdd32fe1f3..5124582451c6 100644
> > --- a/drivers/gpu/drm/drm_cache.c
> > +++ b/drivers/gpu/drm/drm_cache.c
> > @@ -69,6 +69,55 @@ static void drm_cache_flush_clflush(struct page *pages[],
> >  }
> >  #endif
> >
> > +#if defined(CONFIG_ARM)
> > +static void drm_cache_maint_page(struct page *page, unsigned long offset,
> > +                              size_t size, enum dma_data_direction dir,
> > +                              void (*op)(const void *, size_t, int))
> > +{
> > +     unsigned long pfn;
> > +     size_t left = size;
> > +
> > +     pfn = page_to_pfn(page) + offset / PAGE_SIZE;
> > +     offset %= PAGE_SIZE;
> > +
> > +     /*
> > +      * A single sg entry may refer to multiple physically contiguous
> > +      * pages.  But we still need to process highmem pages individually.
> > +      * If highmem is not configured then the bulk of this loop gets
> > +      * optimized out.
> > +      */
> > +     do {
> > +             size_t len = left;
> > +             void *vaddr;
> > +
> > +             page = pfn_to_page(pfn);
> > +
> > +             if (PageHighMem(page)) {
> > +                     if (len + offset > PAGE_SIZE)
> > +                             len = PAGE_SIZE - offset;
> > +
> > +                     if (cache_is_vipt_nonaliasing()) {
> > +                             vaddr = kmap_atomic(page);
> > +                             op(vaddr + offset, len, dir);
> > +                             kunmap_atomic(vaddr);
> > +                     } else {
> > +                             vaddr = kmap_high_get(page);
> > +                             if (vaddr) {
> > +                                     op(vaddr + offset, len, dir);
> > +                                     kunmap_high(page);
> > +                             }
> > +                     }
> > +             } else {
> > +                     vaddr = page_address(page) + offset;
> > +                     op(vaddr, len, dir);
> > +             }
> > +             offset = 0;
> > +             pfn++;
> > +             left -= len;
> > +     } while (left);
> > +}
> > +#endif
> > +
> >  /**
> >   * drm_flush_pages - Flush dcache lines of a set of pages.
> >   * @pages: List of pages to be flushed.
> > @@ -104,6 +153,12 @@ drm_flush_pages(struct page *pages[], unsigned long num_pages)
> >                                  (unsigned long)page_virtual + PAGE_SIZE);
> >               kunmap_atomic(page_virtual);
> >       }
> > +#elif defined(CONFIG_ARM)
> > +     unsigned long i;
> > +
> > +     for (i = 0; i < num_pages; i++)
> > +             drm_cache_maint_page(pages[i], 0, PAGE_SIZE, DMA_TO_DEVICE,
> > +                                  dmac_map_area);
> >  #else
> >       pr_err("Architecture has no drm_cache.c support\n");
> >       WARN_ON_ONCE(1);
> > @@ -135,6 +190,12 @@ drm_flush_sg(struct sg_table *st)
> >
> >       if (wbinvd_on_all_cpus())
> >               pr_err("Timed out waiting for cache flush\n");
> > +#elif defined(CONFIG_ARM)
> > +     struct sg_page_iter sg_iter;
> > +
> > +     for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
> > +             drm_cache_maint_page(sg_page_iter_page(&sg_iter), 0, PAGE_SIZE,
> > +                                  DMA_TO_DEVICE, dmac_map_area);
> >  #else
> >       pr_err("Architecture has no drm_cache.c support\n");
> >       WARN_ON_ONCE(1);
>
> NAK.  With this, you're "going under the covers" of the architecture and
> using architecture private functions (dmac_map_area/dmac_unmap_area) in
> a driver.
>
> The hint is that dmac_map_area() and dmac_unmap_area() are not declared
> in any asm/*.h header file, but in arch/arm/mm/dma.h, and they carry this
> warning above their definition:
>
> /*
>  * These are private to the dma-mapping API.  Do not use directly.
>  * Their sole purpose is to ensure that data held in the cache
>  * is visible to DMA, or data written by DMA to system memory is
>  * visible to the CPU.
>  */
>

My preference is also to use the dma-mapping API (version 1 of this
patchset), for two reasons:

1) the warnings above these definitions
2) this approach duplicates architecture code

However, I've received feedback that using the dma-mapping API is not
desirable for DRM.  Given your disapproval of this approach, I think the
dma-mapping API is the way to go.
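
To make that concrete, the dma-mapping variant has roughly the following
shape (a sketch only, not the exact v1 code; it assumes the caller can
supply the struct device that owns the mapping):

#include <linux/dma-mapping.h>

/*
 * Sketch: flush one page for device access through the streaming
 * DMA API.  dma_map_page() performs the cache maintenance on ARM;
 * the page is unmapped immediately since only the flush is wanted.
 */
static void drm_flush_page_dma(struct device *dev, struct page *page)
{
	dma_addr_t addr;

	addr = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, addr))
		return;

	dma_unmap_page(dev, addr, PAGE_SIZE, DMA_TO_DEVICE);
}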


> So no, this is not an acceptable approach.
>
> Secondly, in light of spectre and meltdown, do we _really_ want to
> export cache flushing to userspace in any case - these attacks rely
> on being able to flush specific cache lines from the caches in order
> to do the timing attacks (while leaving others in place.)

> Currently, 32-bit ARM does not export such flushing capabilities to
> userspace, which makes it very difficult (I'm not going to say
> impossible) to get any working proof-of-concept program that even
> illustrates the timing attack.  Exposing this functionality changes
> that game, and means that we're much more open to these exploits.
> (Some may say that you can flush cache lines by reading a large
> enough buffer - I'm aware, I've tried that, the results are too
> unreliable even for a simple attempt which doesn't involve crossing
> privilege boundaries.)
>

Will using the DMA API (dma_sync_single_for_device /
dma_sync_sg_for_device) mitigate your Meltdown / Spectre concerns in any
way?
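
Concretely, I have something like the sketch below in mind for the
drm_flush_sg() path (hypothetical; it assumes the table was already mapped
with dma_map_sg() against the same struct device, which drm_flush_sg()
does not currently take):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/*
 * Sketch: hand cache maintenance back to the architecture via the
 * streaming DMA API.  Per DMA-API.txt, the sync must be called with
 * the same nents that were passed to dma_map_sg(), i.e. orig_nents.
 */
static void drm_flush_sg_dma(struct device *dev, struct sg_table *st)
{
	dma_sync_sg_for_device(dev, st->sgl, st->orig_nents, DMA_TO_DEVICE);
}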


> Do you really need cacheable GPU buffers, or will write combining
> buffers (as we use elsewhere such as etnaviv) suffice?  Please provide
> some _real_ _world_ performance measurements that demonstrate that
> there is a real need for this functionality.


My desire is for the vgem driver to work correctly on ARM, which requires
cache flushing.  The mappings vgem itself creates are write-combined.  The
issue is that pages retrieved on the ARM architecture usually have to be
flushed before they can be used (see rockchip_gem_get_pages /
tegra_bo_get_pages).  This patchset attempts to do that flushing in an
architecture-independent manner (since vgem is intended to work on both
ARM and x86).
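
To illustrate the pattern I mean, a simplified sketch follows (this is not
the actual rockchip/tegra code; the helper name is made up and error
unwinding is omitted for brevity):

#include <drm/drmP.h>
#include <linux/dma-mapping.h>

/*
 * Sketch of the get-pages-then-flush pattern: shmem-backed pages can
 * come back with dirty cache lines on ARM, and the dma_map_sg() call
 * is what makes their contents visible to the device.
 */
static struct sg_table *hypothetical_gem_get_pages(struct drm_gem_object *obj,
						   struct device *dev)
{
	struct page **pages;
	struct sg_table *sgt;

	pages = drm_gem_get_pages(obj);
	if (IS_ERR(pages))
		return ERR_CAST(pages);

	sgt = drm_prime_pages_to_sg(pages, obj->size >> PAGE_SHIFT);
	if (IS_ERR(sgt))
		return sgt;

	if (!dma_map_sg(dev, sgt->sgl, sgt->orig_nents, DMA_TO_DEVICE))
		return ERR_PTR(-ENOMEM);

	return sgt;
}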

There is some interest in cacheable DRM buffers (though, again, this
patchset is not about that).  Renderscript accesses are very slow on ARM,
so we keep shadow buffers to improve performance (see crrev.com/602736).
Jeffy has done some tests with memcpys in our camera stack that show
improvements (with caching --> 4 to 7 ms, without caching --> 20 to 90 ms).
However, I do agree that Spectre complicates things.


> --
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
> According to speedtest.net: 8.21Mbps down 510kbps up
>