On Thu, 2017-03-16 at 13:20 +0000, Chris Wilson wrote:
> If the user requires patching of their batch or auxiliary buffers, we
> currently make the alterations on the cpu. If they are active on the GPU
> at the time, we wait under the struct_mutex for them to finish executing
> before we rewrite the contents. This happens if shared relocation trees
> are used between different contexts with separate address space (and the
> buffers then have different addresses in each), the 3D state will need
> to be adjusted between execution on each context. However, we don't need
> to use the CPU to do the relocation patching, as we could queue commands
> to the GPU to perform it and use fences to serialise the operation with
> the current activity and future - so the operation on the GPU appears
> just as atomic as performing it immediately. Performing the relocation
> rewrites on the GPU is not free, in terms of pure throughput, the number
> of relocations/s is about halved - but more importantly so is the time
> under the struct_mutex.
> 
> Signed-off-by: Chris Wilson <[email protected]>

<SNIP>

>       /* Must be a variable in the struct to allow GCC to unroll. */
> +     cache->gen = INTEL_GEN(i915);
>       cache->has_llc = HAS_LLC(i915);
> -     cache->has_fence = INTEL_GEN(i915) < 4;
> -     cache->use_64bit_reloc = HAS_64BIT_RELOC(i915);
> +     cache->has_fence = cache->gen < 4;
> +     cache->use_64bit_reloc = cache->gen >= 8;

Keep using HAS_64BIT_RELOC(i915)...

> +static u32 *reloc_gpu(struct i915_execbuffer *eb,
> +                   struct i915_vma *vma,
> +                   unsigned int len)
> +{
> +     struct reloc_cache *cache = &eb->reloc_cache;
> +     u32 *cmd;
> +
> +     if (cache->rq_size > PAGE_SIZE/sizeof(u32) - (len + 1))

There's no chance of overflow here, so I'd write the condition the
natural way around: rq_size + len + 1 > PAGE_SIZE/sizeof(u32).

> +             reloc_gpu_flush(cache);
> +
> +     if (!cache->rq) {
> +             struct drm_i915_gem_object *obj;
> +             struct drm_i915_gem_request *rq;
> +             struct i915_vma *batch;
> +             int err;
> +
> +             GEM_BUG_ON(vma->obj->base.write_domain & I915_GEM_DOMAIN_CPU);

Use ==.

I'll just close my eyes for the remainder of this function and expect v2
to have a proper teardown :P

Also vma / obj pair naming varies from what they usually are, so I'd
consider renaming one of them to lessen confusion.

> @@ -1012,6 +1148,67 @@ relocate_entry(struct i915_vma *vma,
>       bool wide = eb->reloc_cache.use_64bit_reloc;
>       void *vaddr;
>  
> +     if (!eb->reloc_cache.vaddr &&
> +         (DBG_FORCE_RELOC == FORCE_GPU_RELOC ||
> +          !reservation_object_test_signaled_rcu(obj->resv, true))) {
> +             const unsigned int gen = eb->reloc_cache.gen;
> +             unsigned int len;
> +             u32 *batch;
> +             u64 addr;
> +
> +             if (wide)
> +                     len = offset & 7 ? 8 : 5;
> +             else if (gen >= 4)
> +                     len = 4;
> +             else if (gen >= 3)
> +                     len = 3;
> +             else /* On gen2 MI_STORE_DWORD_IMM uses a physical address */
> +                     goto repeat;
> +
> +             batch = reloc_gpu(eb, vma, len);
> +             if (IS_ERR(batch))
> +                     goto repeat;
> +
> +             addr = gen8_canonical_addr(vma->node.start + offset);
> +             if (wide) {
> +                     if (offset & 7) {
> +                             *batch++ = MI_STORE_DWORD_IMM_GEN4;

This indent level calls for a new function, e.g. __relocate_entry_gpu().

Couldn't we share some of this STORE IMM code we have around? I don't
want to crawl the specs every time :(

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx