gpu_memcpy: Add a lighter-weight memcpy path

Jason Ekstrand Tue, 02 May 2017 16:59:14 -0700

On Thu, Apr 27, 2017 at 11:32 AM, Nanley Chery <nanleych...@gmail.com> wrote:


> We're now performing a GPU memcpy in more places to copy small amounts
> of data. Add a path to thrash less state.
>
> Signed-off-by: Nanley Chery <nanley.g.ch...@intel.com>
> ---
>  src/intel/vulkan/genX_gpu_memcpy.c | 38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
>
> diff --git a/src/intel/vulkan/genX_gpu_memcpy.c b/src/intel/vulkan/genX_gpu_memcpy.c
> index 3cbc7235cf..f15c2a5f72 100644
> --- a/src/intel/vulkan/genX_gpu_memcpy.c
> +++ b/src/intel/vulkan/genX_gpu_memcpy.c
> @@ -28,6 +28,8 @@
>
>  #include "common/gen_l3_config.h"
>
> +#define MI_PREDICATE_SRC0 0x2400
> +
>  /**
>   * This file implements some lightweight memcpy/memset operations on the GPU
>   * using a vertex buffer and streamout.
> @@ -63,6 +65,42 @@ genX(cmd_buffer_gpu_memcpy)(struct anv_cmd_buffer *cmd_buffer,
>     assert(dst_offset + size <= dst->size);
>     assert(src_offset + size <= src->size);
>
> +   /* This memcpy expects DWord aligned memory. */
> +   assert(size % 4 == 0);
> +   assert(dst_offset % 4 == 0);
> +   assert(src_offset % 4 == 0);
> +
> +   /* Use a simpler memcpy operation when copying 16 bytes or less of data.
> +    * This is the size of a surface state's clear value on SKL+.
> +    */
>

I think I would rather just have a separate function.  Why?  Because these
two methods have very different characteristics in terms of what state they
trash (quite a bit vs. none) and how they perform.  I'd rather we be
explicit about which method we use.  Feel free to rename
cmd_buffer_gpu_memcpy to cmd_buffer_streamout_copy and then you can name
the other cmd_buffer_mem_mem_copy or similar.
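
To make the suggested split concrete, here is a minimal host-side sketch of two explicitly named copy paths. It only models the behavior discussed above and does not use the driver's real types: the sketch_bo and sketch_cmd_buffer structures, the pipeline_state_dirty flag, and the commands_emitted counter are all hypothetical stand-ins, and the function names merely echo the ones proposed in this reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; these are NOT the real
 * anv_bo or anv_cmd_buffer structures. */
struct sketch_bo {
   uint8_t *map;
   uint32_t size;
};

struct sketch_cmd_buffer {
   bool pipeline_state_dirty;  /* set when a copy clobbers pipeline state */
   unsigned commands_emitted;  /* number of modeled GPU commands */
};

/* Models the streamout-based path: one heavyweight operation that
 * clobbers pipeline state, so the caller must re-emit it afterwards. */
static void
sketch_streamout_copy(struct sketch_cmd_buffer *cmd,
                      struct sketch_bo *dst, uint32_t dst_offset,
                      const struct sketch_bo *src, uint32_t src_offset,
                      uint32_t size)
{
   assert(dst_offset + size <= dst->size);
   assert(src_offset + size <= src->size);

   memcpy(dst->map + dst_offset, src->map + src_offset, size);
   cmd->pipeline_state_dirty = true;
   cmd->commands_emitted += 1;
}

/* Models the command-streamer path: one modeled command per DWord and no
 * pipeline state touched. (In the patch, the pre-gen8 variant would be a
 * register load/store pair per DWord rather than a single command.) */
static void
sketch_mem_mem_copy(struct sketch_cmd_buffer *cmd,
                    struct sketch_bo *dst, uint32_t dst_offset,
                    const struct sketch_bo *src, uint32_t src_offset,
                    uint32_t size)
{
   assert(dst_offset + size <= dst->size);
   assert(src_offset + size <= src->size);

   /* This path expects DWord-aligned offsets and size. */
   assert(size % 4 == 0);
   assert(dst_offset % 4 == 0);
   assert(src_offset % 4 == 0);

   for (uint32_t i = 0; i < size; i += 4) {
      memcpy(dst->map + dst_offset + i, src->map + src_offset + i, 4);
      cmd->commands_emitted += 1;
   }
}
```

With two entry points, a caller copying something small like a 16-byte clear value picks the mem-mem variant deliberately and knows no state needs re-emitting, rather than relying on a hidden size check inside a single function.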


> +   if (size <= 16) {
> +      for (uint32_t i = 0; i < size; i += 4) {
> +         const struct anv_address src_addr =
> +            (struct anv_address) { src, src_offset + i};
> +         const struct anv_address dst_addr =
> +            (struct anv_address) { dst, dst_offset + i};
> +#if GEN_GEN >= 8
> +         anv_batch_emit(&cmd_buffer->batch, GENX(MI_COPY_MEM_MEM), cp) {
> +            cp.DestinationMemoryAddress = dst_addr;
> +            cp.SourceMemoryAddress = src_addr;
> +         }
> +#else
> +         /* IVB does not have a general purpose register for command streamer
> +          * commands. Therefore, we use an alternate temporary register.
> +          */
> +         anv_batch_emit(&cmd_buffer->batch, GENX(MI_LOAD_REGISTER_MEM), load) {
> +            load.RegisterAddress = MI_PREDICATE_SRC0;
> +            load.MemoryAddress = src_addr;
> +         }
> +         anv_batch_emit(&cmd_buffer->batch, GENX(MI_STORE_REGISTER_MEM), store) {
> +            store.RegisterAddress = MI_PREDICATE_SRC0;
> +            store.MemoryAddress = dst_addr;
> +         }
> +#endif
> +      }
> +      return;
> +   }
> +
>     /* The maximum copy block size is 4 32-bit components at a time. */
>     unsigned bs = 16;
>     bs = gcd_pow2_u64(bs, src_offset);
> --
> 2.12.2
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Re: [Mesa-dev] [PATCH 12/22] anv/gpu_memcpy: Add a lighter-weight memcpy path
