Re: [Mesa-dev] [PATCH v3 5/6] radeonsi: use slot indexes for bindless handles

2017-08-11 Thread Marek Olšák
On Fri, Aug 11, 2017 at 6:53 PM, Marek Olšák  wrote:
> On Tue, Aug 8, 2017 at 6:57 PM, Samuel Pitoiset
>  wrote:
>> Using VRAM addresses as bindless handles is not a good idea because
>> we have to use LLVMBuildIntToPtr() and the LLVM CSE pass can't optimize
>> because it has no information about the pointer.
>>
>> Instead, use slot indexes like the existing descriptors. Note
>> that we use fixed 16-dword slots for both samplers and images.
>> This doesn't really matter because no real apps use image handles.
>>
>> This improves performance with DOW3 by +7%.
>>
>> v3: - fix si_emit_global_shader_pointers() for merged GFX9 shaders
>>     - always re-upload the array of descriptors at creation time
>> v2: - inline si_release_bindless_descriptors()
>
> I meant that you inline the function manually. Anyway:

I see patch 6 where the function is no longer a one-liner. There is no
need to inline.

>
> Reviewed-by: Marek Olšák 
>
> Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 5/6] radeonsi: use slot indexes for bindless handles

2017-08-11 Thread Marek Olšák
On Tue, Aug 8, 2017 at 6:57 PM, Samuel Pitoiset
 wrote:
> Using VRAM addresses as bindless handles is not a good idea because
> we have to use LLVMBuildIntToPtr() and the LLVM CSE pass can't optimize
> because it has no information about the pointer.
>
> Instead, use slot indexes like the existing descriptors. Note
> that we use fixed 16-dword slots for both samplers and images.
> This doesn't really matter because no real apps use image handles.
>
> This improves performance with DOW3 by +7%.
>
> v3: - fix si_emit_global_shader_pointers() for merged GFX9 shaders
>     - always re-upload the array of descriptors at creation time
> v2: - inline si_release_bindless_descriptors()

I meant that you inline the function manually. Anyway:

Reviewed-by: Marek Olšák 

Marek


[Mesa-dev] [PATCH v3 5/6] radeonsi: use slot indexes for bindless handles

2017-08-08 Thread Samuel Pitoiset
Using VRAM addresses as bindless handles is not a good idea because
we have to use LLVMBuildIntToPtr() and the LLVM CSE pass can't optimize
because it has no information about the pointer.

Instead, use slot indexes like the existing descriptors. Note
that we use fixed 16-dword slots for both samplers and images.
This doesn't really matter because no real apps use image handles.

This improves performance with DOW3 by +7%.

v3: - fix si_emit_global_shader_pointers() for merged GFX9 shaders
    - always re-upload the array of descriptors at creation time
v2: - inline si_release_bindless_descriptors()
    - fix overwriting sampler and image slots
    - use fixed 16-dword slots for images

Signed-off-by: Samuel Pitoiset 
Reviewed-by: Marek Olšák  (v2)
---
 src/gallium/drivers/radeonsi/si_descriptors.c     | 350 ++
 src/gallium/drivers/radeonsi/si_pipe.c            |  12 -
 src/gallium/drivers/radeonsi/si_pipe.h            |  23 +-
 src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c |  35 ++-
 4 files changed, 193 insertions(+), 227 deletions(-)
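
Note: the slot arithmetic described in the commit message can be
sketched as below. This is only an illustration, not code from the
patch. The helper and macro names are hypothetical; the only values
taken from the diff are the fixed 16-dword slot size and the +4 dword
offset that is passed to si_set_buf_desc_address(). The byte-offset
helper simply assumes 4-byte dwords.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; only the values 16 and 4 come from the patch. */
#define SLOT_DWORDS     16 /* fixed slot size for samplers and images */
#define BUF_ADDR_DWORD   4 /* dword in a slot handed to si_set_buf_desc_address() */

/* Index of the first dword of a slot in the descriptor list. */
static inline unsigned slot_to_dword(unsigned desc_slot)
{
    return desc_slot * SLOT_DWORDS;
}

/* Byte offset of a slot from the start of the descriptor list
 * (assumption: one dword is sizeof(uint32_t) == 4 bytes). */
static inline uint64_t slot_to_byte_offset(unsigned desc_slot)
{
    return (uint64_t)slot_to_dword(desc_slot) * sizeof(uint32_t);
}

/* Pointer to the dwords that get patched when a buffer is rebound,
 * mirroring "descs->list + desc_slot * 16 + 4" from the diff. */
static inline uint32_t *slot_buf_desc(uint32_t *list, unsigned desc_slot)
{
    assert(list != NULL);
    return list + slot_to_dword(desc_slot) + BUF_ADDR_DWORD;
}
```

With these helpers, slot 3 starts at dword 48 (byte offset 192), and a
shader only needs the small integer 3 rather than a 64-bit address to
find it.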

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c
index 799a53eefb..2e8f1320a1 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -1875,16 +1875,20 @@ static void si_rebind_buffer(struct pipe_context *ctx, struct pipe_resource *buf
 
/* Bindless texture handles */
if (rbuffer->texture_handle_allocated) {
+   struct si_descriptors *descs = &sctx->bindless_descriptors;
+
util_dynarray_foreach(&sctx->resident_tex_handles,
  struct si_texture_handle *, tex_handle) {
struct pipe_sampler_view *view = (*tex_handle)->view;
-   struct si_bindless_descriptor *desc = (*tex_handle)->desc;
+   unsigned desc_slot = (*tex_handle)->desc_slot;
 
if (view->texture == buf) {
si_set_buf_desc_address(rbuffer,
view->u.buf.offset,
-   &desc->desc_list[4]);
-   desc->dirty = true;
+   descs->list +
+   desc_slot * 16 + 4);
+
+   (*tex_handle)->desc_dirty = true;
sctx->bindless_descriptors_dirty = true;
 
radeon_add_to_buffer_list_check_mem(
@@ -1897,10 +1901,12 @@ static void si_rebind_buffer(struct pipe_context *ctx, struct pipe_resource *buf
 
/* Bindless image handles */
if (rbuffer->image_handle_allocated) {
+   struct si_descriptors *descs = &sctx->bindless_descriptors;
+
util_dynarray_foreach(&sctx->resident_img_handles,
  struct si_image_handle *, img_handle) {
struct pipe_image_view *view = &(*img_handle)->view;
-   struct si_bindless_descriptor *desc = (*img_handle)->desc;
+   unsigned desc_slot = (*img_handle)->desc_slot;
 
if (view->resource == buf) {
if (view->access & PIPE_IMAGE_ACCESS_WRITE)
@@ -1908,8 +1914,10 @@ static void si_rebind_buffer(struct pipe_context *ctx, struct pipe_resource *buf
 
si_set_buf_desc_address(rbuffer,
view->u.buf.offset,
-   &desc->desc_list[4]);
-   desc->dirty = true;
+   descs->list +
+   desc_slot * 16 + 4);
+
+   (*img_handle)->desc_dirty = true;
sctx->bindless_descriptors_dirty = true;
 
radeon_add_to_buffer_list_check_mem(
@@ -1941,11 +1949,19 @@ static void si_invalidate_buffer(struct pipe_context *ctx, struct pipe_resource
 }
 
 static void si_upload_bindless_descriptor(struct si_context *sctx,
- struct si_bindless_descriptor *desc)
+ unsigned desc_slot,
+ unsigned num_dwords)
 {
+   struct si_descriptors *desc = &sctx->bindless_descriptors;
struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
-   uint64_t va = desc->buffer->gpu_address + desc->offset;
-   unsigned num_dwords = sizeof(desc->desc_list) / 4;
+   unsigned desc_slot_offset = desc_slot * 16;
+   uint32_t *data;
+   uint64_t va;
+
+   data = desc->list + desc_slot_offset;
+
+   va = desc->buffer->gpu_address +