[Mesa-dev] radv: use rtld for shader uploads

2019-07-01 Thread Nicolai Hähnle-Montoro
Hey folks,

The merge request at
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1220 moves
radv to rtld for shader uploads, both for parity with radeonsi and
to address issues that arose from changes in LLVM behavior.
Please review!

Cheers,
Nicolai
-- 
Learn how the world really is,
but never forget how it should be.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] amd/rtld: update the ELF representation of LDS symbols

2019-06-16 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The initial prototype used a processor-specific symbol type, but
feedback suggests that a processor-specific section index is preferred,
with the alignment encoded in st_value analogously to SHN_COMMON
symbols.

This patch keeps both variants around for now to reduce problems
with LLVM compatibility as we switch branches around.

This also cleans up the error reporting in this function.
---
 src/amd/common/ac_rtld.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/src/amd/common/ac_rtld.c b/src/amd/common/ac_rtld.c
index dc9cc04705b..6379e55120d 100644
--- a/src/amd/common/ac_rtld.c
+++ b/src/amd/common/ac_rtld.c
@@ -32,21 +32,25 @@
 
 #include "ac_binary.h"
 #include "ac_gpu_info.h"
 #include "util/u_dynarray.h"
 #include "util/u_math.h"
 
 // Old distributions may not have this enum constant
 #define MY_EM_AMDGPU 224
 
 #ifndef STT_AMDGPU_LDS
-#define STT_AMDGPU_LDS 13
+#define STT_AMDGPU_LDS 13 // this is deprecated -- remove
+#endif
+
+#ifndef SHN_AMDGPU_LDS
+#define SHN_AMDGPU_LDS 0xff00
 #endif
 
 #ifndef R_AMDGPU_NONE
 #define R_AMDGPU_NONE 0
 #define R_AMDGPU_ABS32_LO 1
 #define R_AMDGPU_ABS32_HI 2
 #define R_AMDGPU_ABS64 3
 #define R_AMDGPU_REL32 4
 #define R_AMDGPU_REL64 5
 #define R_AMDGPU_ABS32 6
@@ -169,47 +173,60 @@ static bool layout_symbols(struct ac_rtld_symbol 
*symbols, unsigned num_symbols,
  * Read LDS symbols from the given \p section of the ELF of \p part and append
  * them to the LDS symbols list.
  *
  * Shared LDS symbols are filtered out.
  */
 static bool read_private_lds_symbols(struct ac_rtld_binary *binary,
 unsigned part_idx,
 Elf_Scn *section,
 uint32_t *lds_end_align)
 {
-#define report_elf_if(cond) \
+#define report_if(cond) \
do { \
if ((cond)) { \
report_errorf(#cond); \
return false; \
} \
} while (false)
+#define report_elf_if(cond) \
+   do { \
+   if ((cond)) { \
+   report_elf_errorf(#cond); \
+   return false; \
+   } \
+   } while (false)
 
struct ac_rtld_part *part = &binary->parts[part_idx];
Elf64_Shdr *shdr = elf64_getshdr(section);
uint32_t strtabidx = shdr->sh_link;
Elf_Data *symbols_data = elf_getdata(section, NULL);
report_elf_if(!symbols_data);
 
const Elf64_Sym *symbol = symbols_data->d_buf;
size_t num_symbols = symbols_data->d_size / sizeof(Elf64_Sym);
 
for (size_t j = 0; j < num_symbols; ++j, ++symbol) {
-   if (ELF64_ST_TYPE(symbol->st_info) != STT_AMDGPU_LDS)
+   struct ac_rtld_symbol s = {};
+
+   if (ELF64_ST_TYPE(symbol->st_info) == STT_AMDGPU_LDS) {
+   /* old-style LDS symbols from initial prototype -- remove eventually */
+   s.align = MIN2(1u << (symbol->st_other >> 3), 1u << 16);
+   } else if (symbol->st_shndx == SHN_AMDGPU_LDS) {
+   s.align = MIN2(symbol->st_value, 1u << 16);
+   report_if(!util_is_power_of_two_nonzero(s.align));
+   } else
continue;
 
-   report_elf_if(symbol->st_size > 1u << 29);
+   report_if(symbol->st_size > 1u << 29);
 
-   struct ac_rtld_symbol s = {};
s.name = elf_strptr(part->elf, strtabidx, symbol->st_name);
s.size = symbol->st_size;
-   s.align = MIN2(1u << (symbol->st_other >> 3), 1u << 16);
s.part_idx = part_idx;
 
if (!strcmp(s.name, "__lds_end")) {
report_elf_if(s.size != 0);
*lds_end_align = MAX2(*lds_end_align, s.align);
continue;
}
 
const struct ac_rtld_symbol *shared =
find_symbol(&binary->lds_symbols, s.name, part_idx);
@@ -217,20 +234,21 @@ static bool read_private_lds_symbols(struct 
ac_rtld_binary *binary,
report_elf_if(s.align > shared->align);
report_elf_if(s.size > shared->size);
continue;
}
 
util_dynarray_append(&binary->lds_symbols, struct ac_rtld_symbol, s);
}
 
return true;
 
+#undef report_if
 #undef report_elf_if
 }
 
 /**
  * Open a binary consisting of one or more shader parts.
  *
  * \param binary the uninitialized struct
  * \param i binary opening parameters
  */
 bool ac_rtld_open(struct ac_rtld_binary *binary,
-- 
2.20.1
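
As a rough illustration of the two encodings this patch accepts, the sketch below decodes the alignment of an LDS symbol from either representation. It is not the Mesa code: `struct lds_sym` is a hypothetical stand-in for the `Elf64_Sym` fields involved, and `lds_symbol_align` is an invented helper. Only the two constants and the 64 KiB clamp are taken from the diff above. Where the patch reports an error for a non-power-of-two alignment, this sketch simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants as defined in the patch. */
#define STT_AMDGPU_LDS 13     /* deprecated processor-specific symbol type */
#define SHN_AMDGPU_LDS 0xff00 /* processor-specific section index */

/* Hypothetical stand-in for the Elf64_Sym fields used here. */
struct lds_sym {
   uint8_t st_info;   /* low 4 bits hold the symbol type */
   uint8_t st_other;
   uint16_t st_shndx;
   uint64_t st_value;
};

static bool is_pow2_nonzero(uint32_t v)
{
   return v != 0 && (v & (v - 1)) == 0;
}

/* Returns true and writes *align if sym is an LDS symbol in either encoding.
 * Returns false for non-LDS symbols and for invalid alignments. */
static bool lds_symbol_align(const struct lds_sym *sym, uint32_t *align)
{
   const uint32_t max_align = 1u << 16;

   assert(sym && align);

   if ((sym->st_info & 0xf) == STT_AMDGPU_LDS) {
      /* Old style: log2 of the alignment sits in the upper bits of st_other. */
      uint32_t a = 1u << (sym->st_other >> 3);
      *align = a < max_align ? a : max_align;
      return true;
   }

   if (sym->st_shndx == SHN_AMDGPU_LDS) {
      /* New style: st_value holds the alignment, as for SHN_COMMON symbols. */
      uint32_t a = sym->st_value < max_align ? (uint32_t)sym->st_value : max_align;
      if (!is_pow2_nonzero(a))
         return false;
      *align = a;
      return true;
   }

   return false;
}
```

A caller would skip symbols for which the helper returns false and record the alignment for the rest.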


Re: [Mesa-dev] amd/common: derive register headers and ac_debug from a JSON database

2019-05-13 Thread Nicolai Hähnle
By the way: Yes, I'm aware that other "database" formats are in use in 
Mesa either directly or indirectly, but for various reasons it wasn't 
easily possible to reuse the corresponding code.


On 13.05.19 23:26, Nicolai Hähnle wrote:

Hi all,

this series moves to a description of registers in a JSON file as the
single source of truth for register descriptions. Both register headers
and the tables used for decoding command buffers for debugging are
derived from this JSON description at build time.

This should make ac_debug less fragile down the line, and allows us to
be more explicit about which chips have which registers / fields / enum
values.

The JSON description also has a notion of address spaces. This is
already used to distinguish packet3 payloads from registers, and could
be used in the future to distinguish more cleanly between registers and
resource descriptor fields as well.

Since some of the patches are too large for the mailing list, the series
is here as a merge request:
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/880

Please review!

Thanks,
Nicolai
--
Learn how the world really is,
but never forget how it should be.

[Mesa-dev] amd/common: derive register headers and ac_debug from a JSON database

2019-05-13 Thread Nicolai Hähnle

Hi all,

this series moves to a description of registers in a JSON file as the 
single source of truth for register descriptions. Both register headers 
and the tables used for decoding command buffers for debugging are 
derived from this JSON description at build time.


This should make ac_debug less fragile down the line, and allows us to 
be more explicit about which chips have which registers / fields / enum 
values.


The JSON description also has a notion of address spaces. This is 
already used to distinguish packet3 payloads from registers, and could 
be used in the future to distinguish more cleanly between registers and 
resource descriptor fields as well.


Since some of the patches are too large for the mailing list, the series 
is here as a merge request: 
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/880


Please review!

Thanks,
Nicolai
--
Learn how the world really is,
but never forget how it should be.

[Mesa-dev] [PATCH 4/4] radeonsi: add radeonsi_debug_disassembly option

2019-05-13 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This dumps disassembly to the pipe_debug_callback together with shader
stats.

Can be used together with shader-db to get full disassembly of all shaders
in the database.
---
 src/gallium/drivers/radeonsi/si_debug_options.h |  1 +
 src/gallium/drivers/radeonsi/si_shader.c        | 15 +++++++++------
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_debug_options.h 
b/src/gallium/drivers/radeonsi/si_debug_options.h
index db642366ca6..ef8435804fb 100644
--- a/src/gallium/drivers/radeonsi/si_debug_options.h
+++ b/src/gallium/drivers/radeonsi/si_debug_options.h
@@ -1,8 +1,9 @@
 OPT_BOOL(clear_db_cache_before_clear, false, "Clear DB cache before fast depth 
clear")
 OPT_BOOL(enable_nir, false, "Enable NIR")
 OPT_BOOL(aux_debug, false, "Generate ddebug_dumps for the auxiliary context")
 OPT_BOOL(sync_compile, false, "Always compile synchronously (will cause 
stalls)")
 OPT_BOOL(dump_shader_binary, false, "Dump shader binary as part of 
ddebug_dumps")
+OPT_BOOL(debug_disassembly, false, "Report shader disassembly as part of 
driver debug messages (for shader db)")
 OPT_BOOL(vs_fetch_always_opencode, false, "Always open code vertex fetches 
(less efficient, purely for testing)")
 
 #undef OPT_BOOL
diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 2186938fec9..4c321dc60dc 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -5154,27 +5154,22 @@ static void si_shader_dump_disassembly(struct si_screen 
*screen,
.elf_ptrs = >elf_buffer,
.elf_sizes = >elf_size }))
return;
 
const char *disasm;
uint64_t nbytes;
 
if (!ac_rtld_get_section_by_name(_binary, ".AMDGPU.disasm", 
, ))
goto out;
 
-   fprintf(file, "Shader %s disassembly:\n", name);
-   if (nbytes > INT_MAX) {
-   fprintf(file, "too long\n");
+   if (nbytes > INT_MAX)
goto out;
-   }
-
-   fprintf(file, "%*s", (int)nbytes, disasm);
 
if (debug && debug->debug_message) {
/* Very long debug messages are cut off, so send the
 * disassembly one line at a time. This causes more
 * overhead, but on the plus side it simplifies
 * parsing of resulting logs.
 */
pipe_debug_message(debug, SHADER_INFO,
   "Shader Disassembly Begin");
 
@@ -5190,20 +5185,25 @@ static void si_shader_dump_disassembly(struct si_screen 
*screen,
   "%.*s", count, disasm + 
line);
}
 
line += count + 1;
}
 
pipe_debug_message(debug, SHADER_INFO,
   "Shader Disassembly End");
}
 
+   if (file) {
+   fprintf(file, "Shader %s disassembly:\n", name);
+   fprintf(file, "%*s", (int)nbytes, disasm);
+   }
+
 out:
ac_rtld_close(_binary);
 }
 
 static void si_calculate_max_simd_waves(struct si_shader *shader)
 {
struct si_screen *sscreen = shader->selector->screen;
struct ac_shader_config *conf = >config;
unsigned num_inputs = shader->selector->info.num_inputs;
unsigned lds_increment = sscreen->info.chip_class >= CIK ? 512 : 256;
@@ -5255,20 +5255,23 @@ static void si_calculate_max_simd_waves(struct 
si_shader *shader)
 
shader->max_simd_waves = max_simd_waves;
 }
 
 void si_shader_dump_stats_for_shader_db(struct si_screen *screen,
struct si_shader *shader,
struct pipe_debug_callback *debug)
 {
const struct ac_shader_config *conf = &shader->config;
 
+   if (screen->options.debug_disassembly)
+   si_shader_dump_disassembly(screen, &shader->binary, debug, "main", NULL);
+
pipe_debug_message(debug, SHADER_INFO,
   "Shader Stats: SGPRS: %d VGPRS: %d Code Size: %d "
   "LDS: %d Scratch: %d Max Waves: %d Spilled SGPRs: %d 
"
   "Spilled VGPRs: %d PrivMem VGPRs: %d",
   conf->num_sgprs, conf->num_vgprs,
   si_get_shader_binary_size(screen, shader),
   conf->lds_size, conf->scratch_bytes_per_wave,
   shader->max_simd_waves, conf->spilled_sgprs,
   conf->spilled_vgprs, shader->private_mem_vgprs);
 }
-- 
2.20.1


[Mesa-dev] [PATCH 3/4] radeonsi: fix line splitting in si_shader_dump_assembly

2019-05-13 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Compute the count since the start of the current line instead of the
count since the start of the disassembly.
---
 src/gallium/drivers/radeonsi/si_shader.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 835eedd89e6..2186938fec9 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -5176,21 +5176,21 @@ static void si_shader_dump_disassembly(struct si_screen 
*screen,
 * parsing of resulting logs.
 */
pipe_debug_message(debug, SHADER_INFO,
   "Shader Disassembly Begin");
 
uint64_t line = 0;
while (line < nbytes) {
int count = nbytes - line;
const char *nl = memchr(disasm + line, '\n', nbytes - 
line);
if (nl)
-   count = nl - disasm;
+   count = nl - (disasm + line);
 
if (count) {
pipe_debug_message(debug, SHADER_INFO,
   "%.*s", count, disasm + 
line);
}
 
line += count + 1;
}
 
pipe_debug_message(debug, SHADER_INFO,
-- 
2.20.1
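
The fix above can be illustrated outside of Mesa with a minimal sketch of the same loop: walk a buffer that is not NUL-terminated, find each newline with `memchr`, and compute the line length relative to the start of the current line rather than the start of the buffer. The callback type `line_fn`, the helper `for_each_line`, and the `record_line` collector are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one line (not NUL-terminated). */
typedef void (*line_fn)(const char *line, int len, void *user);

/* Example collector: remembers the last line length and a running total. */
struct line_stats {
   int last_len;
   int total_len;
};

static void record_line(const char *line, int len, void *user)
{
   struct line_stats *s = user;
   (void)line;
   s->last_len = len;
   s->total_len += len;
}

/* Calls emit once per non-empty line of buf[0..nbytes) and returns the
 * number of lines emitted. */
static unsigned for_each_line(const char *buf, size_t nbytes,
                              line_fn emit, void *user)
{
   unsigned emitted = 0;
   size_t line = 0; /* offset of the start of the current line */

   assert(buf != NULL || nbytes == 0);

   while (line < nbytes) {
      size_t count = nbytes - line;
      const char *nl = memchr(buf + line, '\n', nbytes - line);
      if (nl)
         count = (size_t)(nl - (buf + line)); /* relative to the line start */

      if (count) {
         emit(buf + line, (int)count, user);
         emitted++;
      }

      line += count + 1; /* step over the newline */
   }

   return emitted;
}
```

With the pre-fix computation (`nl - buf`), every line after the first would be reported with a length that also includes all preceding lines.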


[Mesa-dev] [PATCH 1/4] amd/common: use ARRAY_SIZE for the LLVM command line options

2019-05-13 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This makes it easier to change the option list while debugging.
---
 src/amd/common/ac_llvm_util.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/amd/common/ac_llvm_util.c b/src/amd/common/ac_llvm_util.c
index 69446863b95..16c1b25d8c2 100644
--- a/src/amd/common/ac_llvm_util.c
+++ b/src/amd/common/ac_llvm_util.c
@@ -52,22 +52,22 @@ static void ac_init_llvm_target()
/* Workaround for bug in llvm 4.0 that causes image intrinsics
 * to disappear.
 * https://reviews.llvm.org/D26348
 *
 * "mesa" is the prefix for error messages.
 *
 * -global-isel-abort=2 is a no-op unless global isel has been enabled.
 * This option tells the backend to fall-back to SelectionDAG and print
 * a diagnostic message if global isel fails.
 */
-   const char *argv[3] = { "mesa", "-simplifycfg-sink-common=false", 
"-global-isel-abort=2" };
-   LLVMParseCommandLineOptions(3, argv, NULL);
+   const char *argv[] = { "mesa", "-simplifycfg-sink-common=false", 
"-global-isel-abort=2" };
+   LLVMParseCommandLineOptions(ARRAY_SIZE(argv), argv, NULL);
 }
 
 static once_flag ac_init_llvm_target_once_flag = ONCE_FLAG_INIT;
 
 void ac_init_llvm_once(void)
 {
call_once(&ac_init_llvm_target_once_flag, ac_init_llvm_target);
 }
 
 static LLVMTargetRef ac_get_llvm_target(const char *triple)
-- 
2.20.1
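
For readers unfamiliar with the idiom, here is a minimal standalone sketch of why an unsized array plus an element-count macro is convenient: adding or removing an option changes the count automatically. The macro is defined locally so the example is self-contained (Mesa has its own `ARRAY_SIZE`), and `llvm_args`, `llvm_arg_count` and `llvm_arg` are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Local definition so the sketch builds on its own. */
#ifndef ARRAY_SIZE
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#endif

/* No explicit length: the compiler derives it from the initializer. */
static const char *const llvm_args[] = {
   "mesa",
   "-simplifycfg-sink-common=false",
   "-global-isel-abort=2",
};

static size_t llvm_arg_count(void)
{
   return ARRAY_SIZE(llvm_args);
}

static const char *llvm_arg(size_t i)
{
   assert(i < ARRAY_SIZE(llvm_args)); /* the bound follows the array */
   return llvm_args[i];
}
```

Note that the macro only works on true arrays; applied to a pointer it silently yields the wrong value.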


[Mesa-dev] [PATCH 2/4] radeonsi: cleanup some #includes

2019-05-13 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/radeonsi/si_texture.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_texture.c 
b/src/gallium/drivers/radeonsi/si_texture.c
index 59d50376438..b31a2f6428a 100644
--- a/src/gallium/drivers/radeonsi/si_texture.c
+++ b/src/gallium/drivers/radeonsi/si_texture.c
@@ -16,22 +16,22 @@
  *
  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
  * THE AUTHOR(S) AND/OR THEIR SUPPLIERS BE LIABLE FOR ANY CLAIM,
  * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
  * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
  * USE OR OTHER DEALINGS IN THE SOFTWARE.
  */
 
-#include "radeonsi/si_pipe.h"
-#include "radeonsi/si_query.h"
+#include "si_pipe.h"
+#include "si_query.h"
 #include "util/u_format.h"
 #include "util/u_log.h"
 #include "util/u_memory.h"
 #include "util/u_pack_color.h"
 #include "util/u_resource.h"
 #include "util/u_surface.h"
 #include "util/u_transfer.h"
 #include "util/os_time.h"
 #include 
 #include 
-- 
2.20.1


[Mesa-dev] [PATCH v2 3/3] u_dynarray: turn util_dynarray_{grow, resize} into element-oriented macros

2019-05-13 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The main motivation for this change is API ergonomics: most operations
on dynarrays are really on elements, not on bytes, so it's weird to have
grow and resize as the odd operations out.

The secondary motivation is memory safety. Users of the old byte-oriented
functions would often multiply a number of elements by the element size,
which could overflow, and checking for overflow is tedious.

With this change, we only need to implement the overflow checks once.
The checks are cheap: since eltsize is a compile-time constant and the
functions should be inlined, they only add a single comparison and an
unlikely branch.

v2:
- ensure operations are no-op when allocation fails
- in util_dynarray_clone, call resize_bytes with a compile-time constant 
element size
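
The overflow check described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the u_dynarray implementation: `struct dynarray`, `dynarray_grow_elts` and the `dynarray_grow` macro are hypothetical, and the capacity policy is deliberately naive. The point is that the multiplication check lives in one place and the call site passes a type and an element count.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical minimal dynamic array; sizes are in bytes. */
struct dynarray {
   void *data;
   unsigned size;
   unsigned capacity;
};

/* Grows by nelts (> 0) elements of eltsize bytes each. Returns a pointer to
 * the new space, or NULL with the array untouched on overflow or OOM. */
static void *dynarray_grow_elts(struct dynarray *buf, unsigned eltsize,
                                unsigned nelts)
{
   assert(eltsize > 0 && nelts > 0);

   if (nelts > UINT_MAX / eltsize)
      return NULL; /* nelts * eltsize would overflow */

   unsigned add = nelts * eltsize;
   if (add > UINT_MAX - buf->size)
      return NULL; /* size + add would overflow */

   unsigned newsize = buf->size + add;
   if (newsize > buf->capacity) {
      void *data = realloc(buf->data, newsize);
      if (!data)
         return NULL;
      buf->data = data;
      buf->capacity = newsize;
   }

   void *p = (char *)buf->data + buf->size;
   buf->size = newsize;
   return p;
}

/* Element-oriented front end: eltsize is a compile-time constant. */
#define dynarray_grow(buf, type, nelts) \
   ((type *)dynarray_grow_elts((buf), sizeof(type), (nelts)))
```

Because `sizeof(type)` is constant at each call site, an inlining compiler can reduce the first check to a single comparison against a constant.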
---
 .../drivers/nouveau/nv30/nvfx_fragprog.c  |  2 +-
 src/gallium/drivers/nouveau/nv50/nv50_state.c |  5 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c |  5 +-
 .../compiler/brw_nir_analyze_ubo_ranges.c |  2 +-
 src/mesa/drivers/dri/i965/brw_bufmgr.c|  4 +-
 src/util/u_dynarray.h | 46 +--
 6 files changed, 40 insertions(+), 24 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv30/nvfx_fragprog.c 
b/src/gallium/drivers/nouveau/nv30/nvfx_fragprog.c
index 86e3599325e..2bcb62b97d8 100644
--- a/src/gallium/drivers/nouveau/nv30/nvfx_fragprog.c
+++ b/src/gallium/drivers/nouveau/nv30/nvfx_fragprog.c
@@ -66,21 +66,21 @@ release_temps(struct nvfx_fpc *fpc)
fpc->r_temps &= ~fpc->r_temps_discard;
fpc->r_temps_discard = 0ULL;
 }
 
 static inline struct nvfx_reg
 nvfx_fp_imm(struct nvfx_fpc *fpc, float a, float b, float c, float d)
 {
float v[4] = {a, b, c, d};
int idx = fpc->imm_data.size >> 4;
 
-   memcpy(util_dynarray_grow(&fpc->imm_data, sizeof(float) * 4), v, 4 * sizeof(float));
+   memcpy(util_dynarray_grow(&fpc->imm_data, float, 4), v, 4 * sizeof(float));
return nvfx_reg(NVFXSR_IMM, idx);
 }
 
 static void
 grow_insns(struct nvfx_fpc *fpc, int size)
 {
struct nv30_fragprog *fp = fpc->fp;
 
fp->insn_len += size;
fp->insn = realloc(fp->insn, sizeof(uint32_t) * fp->insn_len);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state.c 
b/src/gallium/drivers/nouveau/nv50/nv50_state.c
index 55167a27c09..228feced5d1 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_state.c
@@ -1256,24 +1256,23 @@ nv50_set_global_bindings(struct pipe_context *pipe,
  struct pipe_resource **resources,
  uint32_t **handles)
 {
struct nv50_context *nv50 = nv50_context(pipe);
struct pipe_resource **ptr;
unsigned i;
const unsigned end = start + nr;
 
if (nv50->global_residents.size <= (end * sizeof(struct pipe_resource *))) {
   const unsigned old_size = nv50->global_residents.size;
-  const unsigned req_size = end * sizeof(struct pipe_resource *);
-  util_dynarray_resize(&nv50->global_residents, req_size);
+  util_dynarray_resize(&nv50->global_residents, struct pipe_resource *, end);
   memset((uint8_t *)nv50->global_residents.data + old_size, 0,
- req_size - old_size);
+ nv50->global_residents.size - old_size);
}
 
if (resources) {
   ptr = util_dynarray_element(
  >global_residents, struct pipe_resource *, start);
   for (i = 0; i < nr; ++i) {
  pipe_resource_reference([i], resources[i]);
  nv50_set_global_handle(handles[i], resources[i]);
   }
} else {
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
index 12e21862ee0..2ab51c8529e 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
@@ -1363,24 +1363,23 @@ nvc0_set_global_bindings(struct pipe_context *pipe,
  struct pipe_resource **resources,
  uint32_t **handles)
 {
struct nvc0_context *nvc0 = nvc0_context(pipe);
struct pipe_resource **ptr;
unsigned i;
const unsigned end = start + nr;
 
if (nvc0->global_residents.size <= (end * sizeof(struct pipe_resource *))) {
   const unsigned old_size = nvc0->global_residents.size;
-  const unsigned req_size = end * sizeof(struct pipe_resource *);
-  util_dynarray_resize(&nvc0->global_residents, req_size);
+  util_dynarray_resize(&nvc0->global_residents, struct pipe_resource *, end);
   memset((uint8_t *)nvc0->global_residents.data + old_size, 0,
- req_size - old_size);
+ nvc0->global_residents.size - old_size);
}
 
if (resources) {
   ptr = util_dynarray_element(
  >global_residents, struct pipe_resource *, start);
   for (i = 0; i < nr; ++i) {
  pipe_resource_reference([i], resources[i]);
  nvc0_set_global_handle(handles[i], resourc

[Mesa-dev] [PATCH v2 2/3] u_dynarray: return 0 on realloc failure and ensure no-op

2019-05-13 Thread Nicolai Hähnle
From: Nicolai Hähnle 

We're not very good at handling out-of-memory conditions in general, but
this change at least gives the caller the option of handling it gracefully
and without memory leaks.

This happens to fix an error in out-of-memory handling in i965, which has
the following code in brw_bufmgr.c:

  node = util_dynarray_grow(vma_list, sizeof(struct vma_bucket_node));
  if (unlikely(!node))
 return 0ull;

Previously, an allocation failure in util_dynarray_grow would not actually
return NULL if the dynarray was already non-empty.

v2:
- make util_dynarray_ensure_cap a no-op on failure, add MUST_CHECK attribute
- simplify the new capacity calculation: aside from avoiding a useless loop
  when newcap is very large, this also avoids an infinite loop when newcap
  is larger than 1 << 31
---
 src/util/u_dynarray.h | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/util/u_dynarray.h b/src/util/u_dynarray.h
index b30fd7b1154..769c3820546 100644
--- a/src/util/u_dynarray.h
+++ b/src/util/u_dynarray.h
@@ -70,35 +70,37 @@ util_dynarray_fini(struct util_dynarray *buf)
 }
 
 static inline void
 util_dynarray_clear(struct util_dynarray *buf)
 {
buf->size = 0;
 }
 
 #define DYN_ARRAY_INITIAL_SIZE 64
 
-static inline void *
+MUST_CHECK static inline void *
 util_dynarray_ensure_cap(struct util_dynarray *buf, unsigned newcap)
 {
if (newcap > buf->capacity) {
-  if (buf->capacity == 0)
- buf->capacity = DYN_ARRAY_INITIAL_SIZE;
-
-  while (newcap > buf->capacity)
- buf->capacity *= 2;
+  unsigned capacity = MAX3(DYN_ARRAY_INITIAL_SIZE, buf->capacity * 2, 
newcap);
+  void *data;
 
   if (buf->mem_ctx) {
- buf->data = reralloc_size(buf->mem_ctx, buf->data, buf->capacity);
+ data = reralloc_size(buf->mem_ctx, buf->data, capacity);
   } else {
- buf->data = realloc(buf->data, buf->capacity);
+ data = realloc(buf->data, capacity);
   }
+  if (!data)
+ return 0;
+
+  buf->data = data;
+  buf->capacity = capacity;
}
 
return (void *)((char *)buf->data + buf->size);
 }
 
 static inline void *
 util_dynarray_grow_cap(struct util_dynarray *buf, int diff)
 {
return util_dynarray_ensure_cap(buf, buf->size + diff);
 }
-- 
2.20.1
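
The two ideas in this patch, a loop-free capacity calculation and leaving the array untouched when reallocation fails, can be sketched in isolation. The names `struct bytebuf`, `next_capacity` and `bytebuf_ensure_cap` are hypothetical, only the plain `realloc` path is shown, and the sketch returns a success flag where the real function returns a pointer.

```c
#include <assert.h>
#include <stdlib.h>

#define INITIAL_CAP 64u

/* Hypothetical minimal byte buffer. */
struct bytebuf {
   void *data;
   unsigned size;
   unsigned capacity;
};

/* New capacity: at least the initial size, at least double the old capacity,
 * and at least what was requested. There is no loop, so a very large request
 * cannot spin forever. */
static unsigned next_capacity(unsigned oldcap, unsigned newcap)
{
   unsigned cap = oldcap * 2 > INITIAL_CAP ? oldcap * 2 : INITIAL_CAP;
   return cap > newcap ? cap : newcap;
}

/* Returns 1 on success. On failure returns 0 and leaves buf unchanged, so
 * the caller still owns a valid (smaller) buffer and nothing leaks. */
static int bytebuf_ensure_cap(struct bytebuf *buf, unsigned newcap)
{
   assert(buf != NULL);

   if (newcap > buf->capacity) {
      unsigned cap = next_capacity(buf->capacity, newcap);
      void *data = realloc(buf->data, cap);
      if (!data)
         return 0; /* buf->data and buf->capacity are still valid */

      buf->data = data;
      buf->capacity = cap;
   }

   return 1;
}
```

Assigning the `realloc` result to a temporary first is what prevents the old pointer from being overwritten with NULL on failure.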


[Mesa-dev] [PATCH v2 0/3] u_dynarray: minor API cleanups

2019-05-13 Thread Nicolai Hähnle
Hi all,

after reflecting on the comments a bit more, here's a v2 of the series
in which resize and grow are no-ops on reallocation failure, but get
MUST_CHECK attributes so that the compiler warns if the return value
isn't used.

It seems like almost all callers are well-behaved, except for a few
call-sites in nouveau. I'll leave it up to nouveau folks to decide
how to handle it, since there's no real regression: reallocation failure
wasn't handled properly before, and it still isn't handled properly.

Please review!

Thanks,
Nicolai
--
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c| 12 +++
 src/gallium/drivers/freedreno/a3xx/fd3_gmem.c|  4 +--
 src/gallium/drivers/freedreno/a4xx/fd4_gmem.c|  2 +-
 src/gallium/drivers/freedreno/a5xx/fd5_gmem.c|  2 +-
 src/gallium/drivers/freedreno/a6xx/fd6_gmem.c|  4 +--
 src/gallium/drivers/nouveau/nv30/nvfx_fragprog.c |  2 +-
 src/gallium/drivers/nouveau/nv50/nv50_state.c|  5 ++-
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c|  5 ++-
 src/intel/compiler/brw_nir_analyze_ubo_ranges.c  |  2 +-
 src/mesa/drivers/dri/i965/brw_bufmgr.c   |  4 +--
 src/util/u_dynarray.h| 64 
-
 11 files changed, 62 insertions(+), 44 deletions(-)



[Mesa-dev] [PATCH v2 1/3] freedreno: use util_dynarray_clear instead of util_dynarray_resize(_, 0)

2019-05-13 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This is more expressive and simplifies a subsequent change.

v2:
- fix one more call-site after rebase
---
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 12 ++++++------
 src/gallium/drivers/freedreno/a3xx/fd3_gmem.c |  4 ++--
 src/gallium/drivers/freedreno/a4xx/fd4_gmem.c |  2 +-
 src/gallium/drivers/freedreno/a5xx/fd5_gmem.c |  2 +-
 src/gallium/drivers/freedreno/a6xx/fd6_gmem.c |  4 ++--
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
index 0c7ea844fa4..0edc5e940c1 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
@@ -397,21 +397,21 @@ static void
 patch_draws(struct fd_batch *batch, enum pc_di_vis_cull_mode vismode)
 {
unsigned i;
 
if (!is_a20x(batch->ctx->screen)) {
/* identical to a3xx */
for (i = 0; i < fd_patch_num_elements(&batch->draw_patches); i++) {
struct fd_cs_patch *patch = fd_patch_element(&batch->draw_patches, i);
*patch->cs = patch->val | DRAW(0, 0, 0, vismode, 0);
}
-   util_dynarray_resize(&batch->draw_patches, 0);
+   util_dynarray_clear(&batch->draw_patches);
return;
}
 
if (vismode == USE_VISIBILITY)
return;
 
for (i = 0; i < batch->draw_patches.size / sizeof(uint32_t*); i++) {
uint32_t *ptr = *util_dynarray_element(>draw_patches, 
uint32_t*, i);
unsigned cnt = ptr[0] >> 16 & 0xfff; /* 5 with idx buffer, 3 
without */
 
@@ -465,22 +465,22 @@ fd2_emit_sysmem_prep(struct fd_batch *batch)
OUT_RING(ring, A2XX_PA_SC_SCREEN_SCISSOR_TL_WINDOW_OFFSET_DISABLE);
OUT_RING(ring, A2XX_PA_SC_SCREEN_SCISSOR_BR_X(pfb->width) |
A2XX_PA_SC_SCREEN_SCISSOR_BR_Y(pfb->height));
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_PA_SC_WINDOW_OFFSET));
OUT_RING(ring, A2XX_PA_SC_WINDOW_OFFSET_X(0) |
A2XX_PA_SC_WINDOW_OFFSET_Y(0));
 
patch_draws(batch, IGNORE_VISIBILITY);
-   util_dynarray_resize(>draw_patches, 0);
-   util_dynarray_resize(>shader_patches, 0);
+   util_dynarray_clear(>draw_patches);
+   util_dynarray_clear(>shader_patches);
 }
 
 /* before first tile */
 static void
 fd2_emit_tile_init(struct fd_batch *batch)
 {
struct fd_context *ctx = batch->ctx;
struct fd_ringbuffer *ring = batch->gmem;
struct pipe_framebuffer_state *pfb = >framebuffer;
struct fd_gmem_stateobj *gmem = >gmem;
@@ -544,21 +544,21 @@ fd2_emit_tile_init(struct fd_batch *batch)
continue;
}
 
patch->cs[0] = A2XX_PA_SC_SCREEN_SCISSOR_BR_X(32) |
A2XX_PA_SC_SCREEN_SCISSOR_BR_Y(lines);
patch->cs[4] = A2XX_RB_COLOR_INFO_BASE(color_base) |
A2XX_RB_COLOR_INFO_FORMAT(COLORX_8_8_8_8);
patch->cs[5] = A2XX_RB_DEPTH_INFO_DEPTH_BASE(depth_base) |
A2XX_RB_DEPTH_INFO_DEPTH_FORMAT(1);
}
-   util_dynarray_resize(>gmem_patches, 0);
+   util_dynarray_clear(>gmem_patches);
 
/* set to zero, for some reason hardware doesn't like certain values */
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_VGT_CURRENT_BIN_ID_MIN));
OUT_RING(ring, 0);
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_VGT_CURRENT_BIN_ID_MAX));
OUT_RING(ring, 0);
 
@@ -649,22 +649,22 @@ fd2_emit_tile_init(struct fd_batch *batch)
 
ctx->emit_ib(ring, batch->binning);
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
OUT_RING(ring, 0x0002);
} else {
patch_draws(batch, IGNORE_VISIBILITY);
}
 
-   util_dynarray_resize(>draw_patches, 0);
-   util_dynarray_resize(>shader_patches, 0);
+   util_dynarray_clear(>draw_patches);
+   util_dynarray_clear(>shader_patches);
 }
 
 /* before mem2gmem */
 static void
 fd2_emit_tile_prep(struct fd_batch *batch, struct fd_tile *tile)
 {
struct fd_ringbuffer *ring = batch->gmem;
struct pipe_framebuffer_state *pfb = >framebuffer;
enum pipe_format format = pipe_surface_format(pfb->cbufs[0]);
 
diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_gmem.c 
b/src/gallium/drivers/freedreno/a3xx/fd3_gmem.c
index 7de0a92cdc1..e4455b3fa63 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_gmem.c
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_gmem.c
@@ -704,32 +704,32 @@ fd3_emit_tile_mem2gmem(struct fd_batch *batch, struct 
fd_tile *tile)
 }
 
 static vo

[Mesa-dev] [PATCH 1/6] radeonsi: inline si_shader_binary_read_config into its only caller

2019-05-04 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Since it can only be used for reading the config of an individual,
non-combined shader, it is not very reusable anyway.
---
 src/gallium/drivers/radeonsi/si_shader.c | 21 +++++++--------------
 src/gallium/drivers/radeonsi/si_shader.h |  2 --
 2 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 757624c52f7..528c34aecba 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -5302,33 +5302,20 @@ void si_shader_dump(struct si_screen *sscreen, const 
struct si_shader *shader,
if (shader->epilog)
si_shader_dump_disassembly(>epilog->binary,
   debug, "epilog", file);
fprintf(file, "\n");
}
 
si_shader_dump_stats(sscreen, shader, processor, file,
 check_debug_option);
 }
 
-bool si_shader_binary_read_config(struct si_shader_binary *binary,
- struct ac_shader_config *conf)
-{
-   struct ac_rtld_binary rtld;
-   if (!ac_rtld_open(&rtld, 1, &binary->elf_buffer, &binary->elf_size))
-   return false;
-
-   bool ok = ac_rtld_read_config(&rtld, conf);
-
-   ac_rtld_close(&rtld);
-   return ok;
-}
-
 static int si_compile_llvm(struct si_screen *sscreen,
   struct si_shader_binary *binary,
   struct ac_shader_config *conf,
   struct ac_llvm_compiler *compiler,
   LLVMModuleRef mod,
   struct pipe_debug_callback *debug,
   unsigned processor,
   const char *name,
   bool less_optimized)
 {
@@ -5350,21 +5337,27 @@ static int si_compile_llvm(struct si_screen *sscreen,
LLVMDisposeMessage(ir);
}
 
if (!si_replace_shader(count, binary)) {
unsigned r = si_llvm_compile(mod, binary, compiler, debug,
 less_optimized);
if (r)
return r;
}
 
-   if (!si_shader_binary_read_config(binary, conf))
+   struct ac_rtld_binary rtld;
+   if (!ac_rtld_open(&rtld, 1, &binary->elf_buffer, &binary->elf_size))
+   return -1;
+
+   bool ok = ac_rtld_read_config(&rtld, conf);
+   ac_rtld_close(&rtld);
+   if (!ok)
return -1;
 
/* Enable 64-bit and 16-bit denormals, because there is no performance
 * cost.
 *
 * If denormals are enabled, all floating-point output modifiers are
 * ignored.
 *
 * Don't enable denormals for 32-bit floats, because:
 * - Floating-point output modifiers would be ignored by the hw.
diff --git a/src/gallium/drivers/radeonsi/si_shader.h 
b/src/gallium/drivers/radeonsi/si_shader.h
index 302de427c04..ef9f5c379d3 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -685,22 +685,20 @@ unsigned si_shader_io_get_unique_index(unsigned 
semantic_name, unsigned index,
 bool si_shader_binary_upload(struct si_screen *sscreen, struct si_shader 
*shader,
 uint64_t scratch_va);
 void si_shader_dump(struct si_screen *sscreen, const struct si_shader *shader,
struct pipe_debug_callback *debug, unsigned processor,
FILE *f, bool check_debug_option);
 void si_shader_dump_stats_for_shader_db(const struct si_shader *shader,
struct pipe_debug_callback *debug);
 void si_multiwave_lds_size_workaround(struct si_screen *sscreen,
  unsigned *lds_size);
 const char *si_get_shader_name(const struct si_shader *shader, unsigned 
processor);
-bool si_shader_binary_read_config(struct si_shader_binary *binary,
- struct ac_shader_config *conf);
 void si_shader_binary_clean(struct si_shader_binary *binary);
 
 /* si_shader_nir.c */
 void si_nir_scan_shader(const struct nir_shader *nir,
struct tgsi_shader_info *info);
 void si_nir_scan_tess_ctrl(const struct nir_shader *nir,
   struct tgsi_tessctrl_info *out);
 void si_lower_nir(struct si_shader_selector *sel);
 
 /* Inline helpers. */
-- 
2.20.1


[Mesa-dev] [PATCH 0/6] amd,radeonsi: link explicit LDS symbols

2019-05-04 Thread Nicolai Hähnle
This series builds on my recent series that added a runtime linker, and now
adds support for layout and relocation of explicit LDS symbols.

Currently, all our uses of LDS have a single LDS base pointer, which
is defined either by an inttoptr cast from 0 or as a single global LDS
symbol. This is fine for our current use cases, but it gets tedious
when we want to do more with LDS, such as keeping multiple logically
separate variables in LDS.

(LS/HS shaders are already affected by this issue, because they use LDS
for two conceptually separate things: vertex shader outputs to be read
by the TCS, and TCS outputs in case they are read back for cross-thread
communication. Ironically, since we don't know the LS/HS LDS data sizes
until draw time, this series won't help there.)

This series works in tandem with related changes in LLVM, see the changes
leading up to and including https://reviews.llvm.org/D61494:
- globals in the LDS address space are written out by LLVM as specially
  marked symbols in the ELF object
- the Mesa rtld combines those symbols with driver-specified "shared"
  LDS symbols, where the "shared" means shared between multiple shader
  parts
- rtld calculates a layout for the objects in LDS: shared symbols
  first, followed by private, per-shader-part symbols that can alias,
  followed by the special __lds_end symbol marking the end of LDS
  memory
- rtld resolves any relocations
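
The layout order described in the list above can be sketched in C roughly
as follows. This is only an illustration of the ordering (shared symbols
first, then private per-part symbols that alias each other, then
__lds_end). The names lds_sym, place_symbols and layout_lds are
hypothetical and are not the ac_rtld API, and unlike the real code the
sketch neither sorts by alignment nor checks for overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified LDS symbol; not the real struct ac_rtld_symbol. */
struct lds_sym {
    const char *name;
    uint32_t size;   /* in bytes */
    uint32_t align;  /* must be a power of two */
    uint32_t offset; /* filled in by the layout */
};

/* Round x up to a multiple of align (a power of two). */
static uint32_t align_up(uint32_t x, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Place symbols back to back starting at base; return the end offset. */
static uint32_t place_symbols(struct lds_sym *syms, size_t n, uint32_t base)
{
    uint32_t end = base;
    for (size_t i = 0; i < n; ++i) {
        end = align_up(end, syms[i].align);
        syms[i].offset = end;
        end += syms[i].size;
    }
    return end;
}

/*
 * Shared symbols come first. The private symbols of every part start at
 * the same base, so they alias across parts. The returned value is where
 * __lds_end would go: after the largest part, rounded up to lds_end_align.
 */
static uint32_t layout_lds(struct lds_sym *shared, size_t num_shared,
                           struct lds_sym **priv, const size_t *num_priv,
                           size_t num_parts, uint32_t lds_end_align)
{
    uint32_t shared_end = place_symbols(shared, num_shared, 0);
    uint32_t max_end = shared_end;

    for (size_t p = 0; p < num_parts; ++p) {
        uint32_t end = place_symbols(priv[p], num_priv[p], shared_end);
        if (end > max_end)
            max_end = end;
    }
    return align_up(max_end, lds_end_align);
}
```

With one 100-byte shared symbol and two parts, both parts' private symbols
are placed starting from offset 100, and __lds_end lands after whichever
part needs more space.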

For a smooth upgrade with Mesa master and LLVM trunk, the plan to upstream
these changes is:

1. Land at least the first two patches of this series, which add rtld
   support for the new LDS symbols.
2. Land the LLVM changes for generating the symbols in the ELF
3. Land the remainder of this series (this should mostly be possible
   earlier, actually).

Please review!

Thanks,
Nicolai
--
 src/amd/common/ac_rtld.c | 210 +++--
 src/amd/common/ac_rtld.h |  39 ++-
 src/gallium/drivers/radeonsi/si_compute.c|   9 +-
 src/gallium/drivers/radeonsi/si_debug.c  |  22 +-
 src/gallium/drivers/radeonsi/si_shader.c | 210 +++--
 src/gallium/drivers/radeonsi/si_shader.h |  26 +-
 src/gallium/drivers/radeonsi/si_state_draw.c |   5 +
 .../drivers/radeonsi/si_state_shaders.c  |  31 +--
 8 files changed, 431 insertions(+), 121 deletions(-)



[Mesa-dev] [PATCH 2/6] amd/rtld: layout and relocate LDS symbols

2019-05-04 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Upcoming changes to LLVM will emit LDS objects as symbols in the ELF
symbol table, with relocations that will be resolved with this change.

Callers will also be able to define LDS symbols that are shared between
shader parts. This will be used by radeonsi for the ESGS ring in gfx9+
merged shaders.
---
 src/amd/common/ac_rtld.c  | 210 --
 src/amd/common/ac_rtld.h  |  39 +++-
 src/gallium/drivers/radeonsi/si_compute.c |   9 +-
 src/gallium/drivers/radeonsi/si_debug.c   |  22 +-
 src/gallium/drivers/radeonsi/si_shader.c  |  61 +++--
 src/gallium/drivers/radeonsi/si_shader.h  |   5 +-
 .../drivers/radeonsi/si_state_shaders.c   |   2 +-
 7 files changed, 296 insertions(+), 52 deletions(-)

diff --git a/src/amd/common/ac_rtld.c b/src/amd/common/ac_rtld.c
index 4e0468d2062..3df7b3ba51f 100644
--- a/src/amd/common/ac_rtld.c
+++ b/src/amd/common/ac_rtld.c
@@ -24,25 +24,31 @@
 #include "ac_rtld.h"
 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 
 #include "ac_binary.h"
+#include "ac_gpu_info.h"
+#include "util/u_dynarray.h"
 #include "util/u_math.h"
 
 // Old distributions may not have this enum constant
 #define MY_EM_AMDGPU 224
 
+#ifndef STT_AMDGPU_LDS
+#define STT_AMDGPU_LDS 13
+#endif
+
 #ifndef R_AMDGPU_NONE
 #define R_AMDGPU_NONE 0
 #define R_AMDGPU_ABS32_LO 1
 #define R_AMDGPU_ABS32_HI 2
 #define R_AMDGPU_ABS64 3
 #define R_AMDGPU_REL32 4
 #define R_AMDGPU_REL64 5
 #define R_AMDGPU_ABS32 6
 #define R_AMDGPU_GOTPCREL 7
 #define R_AMDGPU_GOTPCREL32_LO 8
@@ -97,41 +103,155 @@ static void report_elf_errorf(const char *fmt, ...) 
PRINTFLIKE(1, 2);
 static void report_elf_errorf(const char *fmt, ...)
 {
va_list va;
va_start(va, fmt);
report_erroraf(fmt, va);
va_end(va);
 
fprintf(stderr, "ELF error: %s\n", elf_errmsg(elf_errno()));
 }
 
+/**
+ * Find a symbol in a dynarray of struct ac_rtld_symbol by \p name and shader
+ * \p part_idx.
+ */
+static const struct ac_rtld_symbol *find_symbol(const struct util_dynarray 
*symbols,
+   const char *name, unsigned 
part_idx)
+{
+   util_dynarray_foreach(symbols, struct ac_rtld_symbol, symbol) {
+   if ((symbol->part_idx == ~0u || symbol->part_idx == part_idx) &&
+   !strcmp(name, symbol->name))
+   return symbol;
+   }
+   return 0;
+}
+
+static int compare_symbol_by_align(const void *lhsp, const void *rhsp)
+{
+   const struct ac_rtld_symbol *lhs = lhsp;
+   const struct ac_rtld_symbol *rhs = rhsp;
+   if (rhs->align > lhs->align)
+   return -1;
+   if (rhs->align < lhs->align)
+   return 1;
+   return 0;
+}
+
+/**
+ * Sort the given symbol list by decreasing alignment and assign offsets.
+ */
+static bool layout_symbols(struct ac_rtld_symbol *symbols, unsigned 
num_symbols,
+  uint64_t *ptotal_size)
+{
+   qsort(symbols, num_symbols, sizeof(*symbols), compare_symbol_by_align);
+
+   uint64_t total_size = *ptotal_size;
+
+   for (unsigned i = 0; i < num_symbols; ++i) {
+   struct ac_rtld_symbol *s = &symbols[i];
+   assert(util_is_power_of_two_nonzero(s->align));
+
+   total_size = align64(total_size, s->align);
+   s->offset = total_size;
+
+   if (total_size + s->size < total_size) {
+   report_errorf("%s: size overflow", __FUNCTION__);
+   return false;
+   }
+
+   total_size += s->size;
+   }
+
+   *ptotal_size = total_size;
+   return true;
+}
+
+/**
+ * Read LDS symbols from the given \p section of the ELF of \p part and append
+ * them to the LDS symbols list.
+ *
+ * Shared LDS symbols are filtered out.
+ */
+static bool read_private_lds_symbols(struct ac_rtld_binary *binary,
+unsigned part_idx,
+Elf_Scn *section,
+uint32_t *lds_end_align)
+{
+#define report_elf_if(cond) \
+   do { \
+   if ((cond)) { \
+   report_errorf(#cond); \
+   return false; \
+   } \
+   } while (false)
+
+   struct ac_rtld_part *part = &binary->parts[part_idx];
+   Elf64_Shdr *shdr = elf64_getshdr(section);
+   uint32_t strtabidx = shdr->sh_link;
+   Elf_Data *symbols_data = elf_getdata(section, NULL);
+   report_elf_if(!symbols_data);
+
+   const Elf64_Sym *symbol = symbols_data->d_buf;
+   size_t num_symbols = symbols_data->d_size / sizeof(Elf64_Sym);
+
+   for (size_t j = 0; j < num_symbols; ++j, ++symbol) {
+   if (ELF64_ST_TYPE(symbol->st_info) != STT_AMDG

[Mesa-dev] [PATCH 5/6] radeonsi: use an explicit symbol for the LSHS LDS memory

2019-05-04 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/radeonsi/si_shader.c | 17 +++--
 src/gallium/drivers/radeonsi/si_state_draw.c |  5 +
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index d127b525963..0cf4d01a36f 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4842,22 +4842,35 @@ static void create_function(struct si_shader_context 
*ctx)
 
for (i = 0; i < fninfo.num_sgpr_params; ++i)
shader->info.num_input_sgprs += 
ac_get_type_size(fninfo.types[i]) / 4;
 
for (; i < fninfo.num_params; ++i)
shader->info.num_input_vgprs += 
ac_get_type_size(fninfo.types[i]) / 4;
 
assert(shader->info.num_input_vgprs >= num_prolog_vgprs);
shader->info.num_input_vgprs -= num_prolog_vgprs;
 
-   if (shader->key.as_ls || ctx->type == PIPE_SHADER_TESS_CTRL)
-   ac_declare_lds_as_pointer(&ctx->ac);
+   if (shader->key.as_ls || ctx->type == PIPE_SHADER_TESS_CTRL) {
+   if (USE_LDS_SYMBOLS && HAVE_LLVM >= 0x0900) {
+   /* The LSHS size is not known until draw time, so we 
append it
+* at the end of whatever LDS use there may be in the 
rest of
+* the shader (currently none, unless LLVM decides to 
do its
+* own LDS-based lowering).
+*/
+   ctx->ac.lds = LLVMAddGlobalInAddressSpace(
+   ctx->ac.module, LLVMArrayType(ctx->i32, 0),
+   "__lds_end", AC_ADDR_SPACE_LDS);
+   LLVMSetAlignment(ctx->ac.lds, 256);
+   } else {
+   ac_declare_lds_as_pointer(&ctx->ac);
+   }
+   }
 }
 
 /**
  * Load ESGS and GSVS ring buffer resource descriptors and save the variables
  * for later use.
  */
 static void preload_ring_buffers(struct si_shader_context *ctx)
 {
LLVMBuilderRef builder = ctx->ac.builder;
 
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index 8e01e1b35e1..011aaf18ab1 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -244,20 +244,25 @@ static void si_emit_derived_tess_state(struct si_context 
*sctx,
} else {
assert(lds_size <= 32768);
lds_size = align(lds_size, 256) / 256;
}
 
/* Set SI_SGPR_VS_STATE_BITS. */
sctx->current_vs_state &= C_VS_STATE_LS_OUT_PATCH_SIZE &
  C_VS_STATE_LS_OUT_VERTEX_SIZE;
sctx->current_vs_state |= tcs_in_layout;
 
+   /* We should be able to support in-shader LDS use with LLVM >= 9
+* by just adding the lds_sizes together, but it has never
+* been tested. */
+   assert(ls_current->config.lds_size == 0);
+
if (sctx->chip_class >= GFX9) {
unsigned hs_rsrc2 = ls_current->config.rsrc2 |
S_00B42C_LDS_SIZE(lds_size);
 
radeon_set_sh_reg(cs, R_00B42C_SPI_SHADER_PGM_RSRC2_HS, 
hs_rsrc2);
 
/* Set userdata SGPRs for merged LS-HS. */
radeon_set_sh_reg_seq(cs,
  R_00B430_SPI_SHADER_USER_DATA_LS_0 +
  GFX9_SGPR_TCS_OFFCHIP_LAYOUT * 4, 3);
-- 
2.20.1


[Mesa-dev] [PATCH 4/6] radeonsi: rename lds_{load, store} to lshs_lds_{load, store}

2019-05-04 Thread Nicolai Hähnle
From: Nicolai Hähnle 

These functions are now only used in LS/HS shaders (both separate and
merged).
---
 src/gallium/drivers/radeonsi/si_shader.c | 33 
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index f95a96f2458..d127b525963 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -999,68 +999,68 @@ static LLVMValueRef buffer_load(struct 
lp_build_tgsi_context *bld_base,
value = ac_build_buffer_load(&ctx->ac, buffer, 1, NULL, base, offset,
  swizzle * 4, 1, 0, can_speculate, false);
 
value2 = ac_build_buffer_load(&ctx->ac, buffer, 1, NULL, base, offset,
   swizzle * 4 + 4, 1, 0, can_speculate, false);
 
return si_llvm_emit_fetch_64bit(bld_base, type, value, value2);
 }
 
 /**
- * Load from LDS.
+ * Load from LSHS LDS storage.
  *
  * \param type output value type
  * \param swizzle  offset (typically 0..3); it can be ~0, which loads a 
vec4
  * \param dw_addr  address in dwords
  */
-static LLVMValueRef lds_load(struct lp_build_tgsi_context *bld_base,
+static LLVMValueRef lshs_lds_load(struct lp_build_tgsi_context *bld_base,
 LLVMTypeRef type, unsigned swizzle,
 LLVMValueRef dw_addr)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
LLVMValueRef value;
 
if (swizzle == ~0) {
LLVMValueRef values[TGSI_NUM_CHANNELS];
 
for (unsigned chan = 0; chan < TGSI_NUM_CHANNELS; chan++)
-   values[chan] = lds_load(bld_base, type, chan, dw_addr);
+   values[chan] = lshs_lds_load(bld_base, type, chan, 
dw_addr);
 
return ac_build_gather_values(&ctx->ac, values,
  TGSI_NUM_CHANNELS);
}
 
/* Split 64-bit loads. */
if (llvm_type_is_64bit(ctx, type)) {
LLVMValueRef lo, hi;
 
-   lo = lds_load(bld_base, ctx->i32, swizzle, dw_addr);
-   hi = lds_load(bld_base, ctx->i32, swizzle + 1, dw_addr);
+   lo = lshs_lds_load(bld_base, ctx->i32, swizzle, dw_addr);
+   hi = lshs_lds_load(bld_base, ctx->i32, swizzle + 1, dw_addr);
return si_llvm_emit_fetch_64bit(bld_base, type, lo, hi);
}
 
dw_addr = LLVMBuildAdd(ctx->ac.builder, dw_addr,
   LLVMConstInt(ctx->i32, swizzle, 0), "");
 
value = ac_lds_load(&ctx->ac, dw_addr);
 
return LLVMBuildBitCast(ctx->ac.builder, value, type, "");
 }
 
 /**
- * Store to LDS.
+ * Store to LSHS LDS storage.
  *
  * \param swizzle  offset (typically 0..3)
  * \param dw_addr  address in dwords
  * \param valuevalue to store
  */
-static void lds_store(struct si_shader_context *ctx,
+static void lshs_lds_store(struct si_shader_context *ctx,
  unsigned dw_offset_imm, LLVMValueRef dw_addr,
  LLVMValueRef value)
 {
dw_addr = LLVMBuildAdd(ctx->ac.builder, dw_addr,
   LLVMConstInt(ctx->i32, dw_offset_imm, 0), "");
 
ac_lds_store(&ctx->ac, dw_addr, value);
 }
 
 enum si_tess_ring {
@@ -1110,21 +1110,21 @@ static LLVMValueRef fetch_input_tcs(
const struct tgsi_full_src_register *reg,
enum tgsi_opcode_type type, unsigned swizzle_in)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
LLVMValueRef dw_addr, stride;
unsigned swizzle = swizzle_in & 0x;
stride = get_tcs_in_vertex_dw_stride(ctx);
dw_addr = get_tcs_in_current_patch_offset(ctx);
dw_addr = get_dw_address(ctx, NULL, reg, stride, dw_addr);
 
-   return lds_load(bld_base, tgsi2llvmtype(bld_base, type), swizzle, 
dw_addr);
+   return lshs_lds_load(bld_base, tgsi2llvmtype(bld_base, type), swizzle, 
dw_addr);
 }
 
 static LLVMValueRef si_nir_load_tcs_varyings(struct ac_shader_abi *abi,
 LLVMTypeRef type,
 LLVMValueRef vertex_index,
 LLVMValueRef param_index,
 unsigned const_index,
 unsigned location,
 unsigned driver_location,
 unsigned component,
@@ -1177,21 +1177,21 @@ static LLVMValueRef si_nir_load_tcs_varyings(struct 
ac_shader_abi *abi,
  names, indices,
  is_patch);
 
LLVMValueRef value[4];
for 

[Mesa-dev] [PATCH 3/6] radeonsi/gfx9: declare LDS ESGS ring as an explicit symbol on LLVM >= 9

2019-05-04 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This will make it easier to use LDS for other purposes in geometry
shaders in the future.

The lifetime of the esgs_ring variable is as follows:
- declared as [0 x i32] while compiling shader parts or monolithic shaders
- just before uploading, gfx9_get_gs_info computes (among other things)
  the final ESGS ring size (this depends on both the ES and the GS shader)
- during upload, the "esgs_ring" symbol is given to ac_rtld as a shared
  LDS symbol, which will lead to correctly laying out the LDS including
  other LDS objects that may be defined in the future
- si_shader_gs uses shader->config.lds_size as the LDS size

This change depends on the LLVM changes for emitting LDS symbols into
the ELF file.
---
 src/gallium/drivers/radeonsi/si_shader.c  | 82 +++
 src/gallium/drivers/radeonsi/si_shader.h  | 19 +
 .../drivers/radeonsi/si_state_shaders.c   | 29 ++-
 3 files changed, 94 insertions(+), 36 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 6968038d4d0..f95a96f2458 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1527,23 +1527,36 @@ LLVMValueRef si_llvm_load_input_gs(struct ac_shader_abi 
*abi,
break;
case 2:
vtx_offset = si_unpack_param(ctx, 
ctx->param_gs_vtx45_offset,
  index % 2 ? 16 : 0, 16);
break;
default:
assert(0);
return NULL;
}
 
+   unsigned offset = param * 4 + swizzle;
vtx_offset = LLVMBuildAdd(ctx->ac.builder, vtx_offset,
- LLVMConstInt(ctx->i32, param * 4, 0), 
"");
-   return lds_load(bld_base, type, swizzle, vtx_offset);
+ LLVMConstInt(ctx->i32, offset, 
false), "");
+
+   LLVMValueRef ptr = ac_build_gep0(&ctx->ac, ctx->esgs_ring, vtx_offset);
+   LLVMValueRef value = LLVMBuildLoad(ctx->ac.builder, ptr, "");
+   if (llvm_type_is_64bit(ctx, type)) {
+   ptr = LLVMBuildGEP(ctx->ac.builder, ptr,
+  &ctx->ac.i32_1, 1, "");
+   LLVMValueRef values[2] = {
+   value,
+   LLVMBuildLoad(ctx->ac.builder, ptr, "")
+   };
+   value = ac_build_gather_values(&ctx->ac, values, 2);
+   }
+   return LLVMBuildBitCast(ctx->ac.builder, value, type, "");
}
 
/* GFX6: input load from the ESGS ring in memory. */
if (swizzle == ~0) {
LLVMValueRef values[TGSI_NUM_CHANNELS];
unsigned chan;
for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) {
values[chan] = si_llvm_load_input_gs(abi, input_index, 
vtx_offset_param,
 type, chan);
}
@@ -3424,21 +3437,23 @@ static void si_llvm_emit_es_epilogue(struct 
ac_shader_abi *abi,
 
for (chan = 0; chan < 4; chan++) {
if (!(info->output_usagemask[i] & (1 << chan)))
continue;
 
LLVMValueRef out_val = LLVMBuildLoad(ctx->ac.builder, 
addrs[4 * i + chan], "");
out_val = ac_to_integer(&ctx->ac, out_val);
 
/* GFX9 has the ESGS ring in LDS. */
if (ctx->screen->info.chip_class >= GFX9) {
-   lds_store(ctx, param * 4 + chan, lds_base, 
out_val);
+   LLVMValueRef idx = LLVMConstInt(ctx->i32, param 
* 4 + chan, false);
+   idx = LLVMBuildAdd(ctx->ac.builder, lds_base, 
idx, "");
+   ac_build_indexed_store(&ctx->ac, ctx->esgs_ring, idx, out_val);
continue;
}
 
ac_build_buffer_store_dword(&ctx->ac,
ctx->esgs_ring,
out_val, 1, NULL, soffset,
(4 * param + chan) * 4,
1, 1, true, true);
}
}
@@ -4828,47 +4843,62 @@ static void create_function(struct si_shader_context 
*ctx)
 
for (i = 0; i < fninfo.num_sgpr_params; ++i)
shader->info.num_input_sgprs += 
ac_get_type_size(fninfo.types[i]) / 4;
 
for (; i < fnin

[Mesa-dev] [PATCH 6/6] radeonsi: raise the alignment of LDS memory for compute shaders

2019-05-04 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This implies that the memory will always be at address 0, which allows
LLVM to generate slightly better code.
---
 src/gallium/drivers/radeonsi/si_shader.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 0cf4d01a36f..91f4c177bd0 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -2201,21 +2201,21 @@ void si_declare_compute_memory(struct si_shader_context 
*ctx)
 
LLVMTypeRef i8p = LLVMPointerType(ctx->i8, AC_ADDR_SPACE_LDS);
LLVMValueRef var;
 
assert(!ctx->ac.lds);
 
var = LLVMAddGlobalInAddressSpace(ctx->ac.module,
  LLVMArrayType(ctx->i8, lds_size),
  "compute_lds",
  AC_ADDR_SPACE_LDS);
-   LLVMSetAlignment(var, 4);
+   LLVMSetAlignment(var, 64 * 1024);
 
ctx->ac.lds = LLVMBuildBitCast(ctx->ac.builder, var, i8p, "");
 }
 
 void si_tgsi_declare_compute_memory(struct si_shader_context *ctx,
const struct tgsi_full_declaration *decl)
 {
assert(decl->Declaration.MemType == TGSI_MEMORY_TYPE_SHARED);
assert(decl->Range.First == decl->Range.Last);
 
-- 
2.20.1


[Mesa-dev] [PATCH 2/3] u_dynarray: return 0 on realloc failure

2019-05-04 Thread Nicolai Hähnle
From: Nicolai Hähnle 

We're not very good at handling out-of-memory conditions in general, but
this change at least gives the caller the option of handling it.

This happens to fix an error in out-of-memory handling in i965, which has
the following code in brw_bufmgr.c:

  node = util_dynarray_grow(vma_list, sizeof(struct vma_bucket_node));
  if (unlikely(!node))
 return 0ull;

Previously, when the dynarray was already non-empty, a failed allocation in
util_dynarray_grow did not actually return NULL: the returned pointer is
buf->data plus buf->size, which is not NULL when the size is non-zero.
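
A minimal sketch of the failure mode, assuming a simplified array type.
The names toy_array, toy_ensure_cap and failing_realloc are hypothetical
and only stand in for the real u_dynarray code. Unlike the patch below,
the sketch also keeps the old buffer when the allocation fails; that is
an extra choice made here, not something this series does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct util_dynarray. */
struct toy_array {
    void *data;
    size_t size;     /* bytes in use */
    size_t capacity; /* bytes allocated */
};

/* Allocator that always fails, used to simulate out-of-memory. */
static void *failing_realloc(void *ptr, size_t size)
{
    (void)ptr;
    (void)size;
    return NULL;
}

/*
 * Make room for newcap bytes and return a pointer to the end of the used
 * storage, or NULL if the allocation failed.
 */
static void *toy_ensure_cap(struct toy_array *buf, size_t newcap,
                            void *(*realloc_fn)(void *, size_t))
{
    assert(buf->size <= buf->capacity);

    if (newcap > buf->capacity) {
        size_t cap = buf->capacity ? buf->capacity : 64;
        while (newcap > cap)
            cap *= 2;

        void *data = realloc_fn(buf->data, cap);
        /* Without this check, the return statement below would add a
         * non-zero size to a NULL base and hand back a pointer that
         * does not compare equal to NULL. */
        if (!data)
            return NULL;

        buf->data = data;
        buf->capacity = cap;
    }

    return (char *)buf->data + buf->size;
}
```

A caller can then test the result against NULL, as the brw_bufmgr.c
snippet quoted above already does.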
---
 src/util/u_dynarray.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/util/u_dynarray.h b/src/util/u_dynarray.h
index b30fd7b1154..f6a81609dbe 100644
--- a/src/util/u_dynarray.h
+++ b/src/util/u_dynarray.h
@@ -85,20 +85,22 @@ util_dynarray_ensure_cap(struct util_dynarray *buf, 
unsigned newcap)
  buf->capacity = DYN_ARRAY_INITIAL_SIZE;
 
   while (newcap > buf->capacity)
  buf->capacity *= 2;
 
   if (buf->mem_ctx) {
  buf->data = reralloc_size(buf->mem_ctx, buf->data, buf->capacity);
   } else {
  buf->data = realloc(buf->data, buf->capacity);
   }
+  if (!buf->data)
+ return 0;
}
 
return (void *)((char *)buf->data + buf->size);
 }
 
 static inline void *
 util_dynarray_grow_cap(struct util_dynarray *buf, int diff)
 {
return util_dynarray_ensure_cap(buf, buf->size + diff);
 }
-- 
2.20.1


[Mesa-dev] [PATCH 3/3] u_dynarray: turn util_dynarray_{grow, resize} into element-oriented macros

2019-05-04 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The main motivation for this change is API ergonomics: most operations
on dynarrays are really on elements, not on bytes, so it's weird to have
grow and resize as the odd operations out.

The secondary motivation is memory safety. Users of the old byte-oriented
functions would often multiply a number of elements by the element size,
which could overflow, and checking for overflow is tedious.

With this change, we only need to implement the overflow checks once.
The checks are cheap: since eltsize is a compile-time constant and the
functions should be inlined, they only add a single comparison and an
unlikely branch.
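
As a rough sketch of the idea (not the actual u_dynarray.h code), an
element-oriented macro can forward the element size to one helper that
does the overflow checks in a single place. The names toy_array,
toy_grow_bytes and TOY_GROW are hypothetical, and this version spends two
comparisons on the checks instead of the single one mentioned above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct util_dynarray; sizes are in bytes. */
struct toy_array {
    void *data;
    unsigned size;
    unsigned capacity;
};

/* Grow by count elements of eltsize bytes; return the new space or NULL. */
static void *toy_grow_bytes(struct toy_array *buf, unsigned eltsize,
                            unsigned count)
{
    assert(eltsize != 0);

    /* count * eltsize must not overflow ... */
    if (count > UINT_MAX / eltsize)
        return NULL;
    unsigned bytes = count * eltsize;

    /* ... and neither must the new total size. */
    if (bytes > UINT_MAX - buf->size)
        return NULL;
    unsigned newsize = buf->size + bytes;

    if (newsize > buf->capacity) {
        unsigned cap = buf->capacity ? buf->capacity : 64;
        while (cap < newsize) {
            if (cap > UINT_MAX / 2) {
                cap = newsize; /* doubling would overflow */
                break;
            }
            cap *= 2;
        }
        void *data = realloc(buf->data, cap);
        if (!data)
            return NULL;
        buf->data = data;
        buf->capacity = cap;
    }

    void *p = (char *)buf->data + buf->size;
    buf->size = newsize;
    return p;
}

/* Element-oriented front end: callers name the type, not a byte count. */
#define TOY_GROW(buf, type, count) \
    ((type *)toy_grow_bytes((buf), sizeof(type), (count)))
```

A call such as TOY_GROW(&arr, float, 4) then reads like the converted
call sites in the diff below, and a huge element count yields NULL
instead of a wrapped byte count.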
---
 .../drivers/nouveau/nv30/nvfx_fragprog.c  |  2 +-
 src/gallium/drivers/nouveau/nv50/nv50_state.c |  5 +--
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c |  5 +--
 .../compiler/brw_nir_analyze_ubo_ranges.c |  2 +-
 src/mesa/drivers/dri/i965/brw_bufmgr.c|  4 +-
 src/util/u_dynarray.h | 38 +--
 6 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv30/nvfx_fragprog.c 
b/src/gallium/drivers/nouveau/nv30/nvfx_fragprog.c
index 86e3599325e..2bcb62b97d8 100644
--- a/src/gallium/drivers/nouveau/nv30/nvfx_fragprog.c
+++ b/src/gallium/drivers/nouveau/nv30/nvfx_fragprog.c
@@ -66,21 +66,21 @@ release_temps(struct nvfx_fpc *fpc)
fpc->r_temps &= ~fpc->r_temps_discard;
fpc->r_temps_discard = 0ULL;
 }
 
 static inline struct nvfx_reg
 nvfx_fp_imm(struct nvfx_fpc *fpc, float a, float b, float c, float d)
 {
float v[4] = {a, b, c, d};
int idx = fpc->imm_data.size >> 4;
 
-   memcpy(util_dynarray_grow(&fpc->imm_data, sizeof(float) * 4), v, 4 * sizeof(float));
+   memcpy(util_dynarray_grow(&fpc->imm_data, float, 4), v, 4 * sizeof(float));
return nvfx_reg(NVFXSR_IMM, idx);
 }
 
 static void
 grow_insns(struct nvfx_fpc *fpc, int size)
 {
struct nv30_fragprog *fp = fpc->fp;
 
fp->insn_len += size;
fp->insn = realloc(fp->insn, sizeof(uint32_t) * fp->insn_len);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state.c 
b/src/gallium/drivers/nouveau/nv50/nv50_state.c
index 55167a27c09..228feced5d1 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_state.c
@@ -1256,24 +1256,23 @@ nv50_set_global_bindings(struct pipe_context *pipe,
  struct pipe_resource **resources,
  uint32_t **handles)
 {
struct nv50_context *nv50 = nv50_context(pipe);
struct pipe_resource **ptr;
unsigned i;
const unsigned end = start + nr;
 
if (nv50->global_residents.size <= (end * sizeof(struct pipe_resource *))) {
   const unsigned old_size = nv50->global_residents.size;
-  const unsigned req_size = end * sizeof(struct pipe_resource *);
-  util_dynarray_resize(>global_residents, req_size);
+  util_dynarray_resize(>global_residents, struct pipe_resource *, 
end);
   memset((uint8_t *)nv50->global_residents.data + old_size, 0,
- req_size - old_size);
+ nv50->global_residents.size - old_size);
}
 
if (resources) {
   ptr = util_dynarray_element(
  &nv50->global_residents, struct pipe_resource *, start);
   for (i = 0; i < nr; ++i) {
  pipe_resource_reference(&ptr[i], resources[i]);
  nv50_set_global_handle(handles[i], resources[i]);
   }
} else {
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
index 12e21862ee0..2ab51c8529e 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
@@ -1363,24 +1363,23 @@ nvc0_set_global_bindings(struct pipe_context *pipe,
  struct pipe_resource **resources,
  uint32_t **handles)
 {
struct nvc0_context *nvc0 = nvc0_context(pipe);
struct pipe_resource **ptr;
unsigned i;
const unsigned end = start + nr;
 
if (nvc0->global_residents.size <= (end * sizeof(struct pipe_resource *))) {
   const unsigned old_size = nvc0->global_residents.size;
-  const unsigned req_size = end * sizeof(struct pipe_resource *);
-  util_dynarray_resize(>global_residents, req_size);
+  util_dynarray_resize(>global_residents, struct pipe_resource *, 
end);
   memset((uint8_t *)nvc0->global_residents.data + old_size, 0,
- req_size - old_size);
+ nvc0->global_residents.size - old_size);
}
 
if (resources) {
   ptr = util_dynarray_element(
  &nvc0->global_residents, struct pipe_resource *, start);
   for (i = 0; i < nr; ++i) {
  pipe_resource_reference(&ptr[i], resources[i]);
  nvc0_set_global_handle(handles[i], resources[i]);
   }
} else {
diff --git a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c 
b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c

[Mesa-dev] [PATCH 1/3] freedreno: use util_dynarray_clear instead of util_dynarray_resize(_, 0)

2019-05-04 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This is more expressive and simplifies a subsequent change.
---
 src/gallium/drivers/freedreno/a2xx/fd2_gmem.c | 12 ++--
 src/gallium/drivers/freedreno/a3xx/fd3_gmem.c |  4 ++--
 src/gallium/drivers/freedreno/a4xx/fd4_gmem.c |  2 +-
 src/gallium/drivers/freedreno/a5xx/fd5_gmem.c |  2 +-
 src/gallium/drivers/freedreno/a6xx/fd6_gmem.c |  2 +-
 5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
index 0c7ea844fa4..0edc5e940c1 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_gmem.c
@@ -397,21 +397,21 @@ static void
 patch_draws(struct fd_batch *batch, enum pc_di_vis_cull_mode vismode)
 {
unsigned i;
 
if (!is_a20x(batch->ctx->screen)) {
/* identical to a3xx */
for (i = 0; i < fd_patch_num_elements(>draw_patches); 
i++) {
struct fd_cs_patch *patch = 
fd_patch_element(>draw_patches, i);
*patch->cs = patch->val | DRAW(0, 0, 0, vismode, 0);
}
-   util_dynarray_resize(&batch->draw_patches, 0);
+   util_dynarray_clear(&batch->draw_patches);
return;
}
 
if (vismode == USE_VISIBILITY)
return;
 
for (i = 0; i < batch->draw_patches.size / sizeof(uint32_t*); i++) {
uint32_t *ptr = *util_dynarray_element(>draw_patches, 
uint32_t*, i);
unsigned cnt = ptr[0] >> 16 & 0xfff; /* 5 with idx buffer, 3 
without */
 
@@ -465,22 +465,22 @@ fd2_emit_sysmem_prep(struct fd_batch *batch)
OUT_RING(ring, A2XX_PA_SC_SCREEN_SCISSOR_TL_WINDOW_OFFSET_DISABLE);
OUT_RING(ring, A2XX_PA_SC_SCREEN_SCISSOR_BR_X(pfb->width) |
A2XX_PA_SC_SCREEN_SCISSOR_BR_Y(pfb->height));
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_PA_SC_WINDOW_OFFSET));
OUT_RING(ring, A2XX_PA_SC_WINDOW_OFFSET_X(0) |
A2XX_PA_SC_WINDOW_OFFSET_Y(0));
 
patch_draws(batch, IGNORE_VISIBILITY);
-   util_dynarray_resize(&batch->draw_patches, 0);
-   util_dynarray_resize(&batch->shader_patches, 0);
+   util_dynarray_clear(&batch->draw_patches);
+   util_dynarray_clear(&batch->shader_patches);
 }
 
 /* before first tile */
 static void
 fd2_emit_tile_init(struct fd_batch *batch)
 {
struct fd_context *ctx = batch->ctx;
struct fd_ringbuffer *ring = batch->gmem;
struct pipe_framebuffer_state *pfb = >framebuffer;
struct fd_gmem_stateobj *gmem = >gmem;
@@ -544,21 +544,21 @@ fd2_emit_tile_init(struct fd_batch *batch)
continue;
}
 
patch->cs[0] = A2XX_PA_SC_SCREEN_SCISSOR_BR_X(32) |
A2XX_PA_SC_SCREEN_SCISSOR_BR_Y(lines);
patch->cs[4] = A2XX_RB_COLOR_INFO_BASE(color_base) |
A2XX_RB_COLOR_INFO_FORMAT(COLORX_8_8_8_8);
patch->cs[5] = A2XX_RB_DEPTH_INFO_DEPTH_BASE(depth_base) |
A2XX_RB_DEPTH_INFO_DEPTH_FORMAT(1);
}
-   util_dynarray_resize(&batch->gmem_patches, 0);
+   util_dynarray_clear(&batch->gmem_patches);
 
/* set to zero, for some reason hardware doesn't like certain values */
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_VGT_CURRENT_BIN_ID_MIN));
OUT_RING(ring, 0);
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_VGT_CURRENT_BIN_ID_MAX));
OUT_RING(ring, 0);
 
@@ -649,22 +649,22 @@ fd2_emit_tile_init(struct fd_batch *batch)
 
ctx->emit_ib(ring, batch->binning);
 
OUT_PKT3(ring, CP_SET_CONSTANT, 2);
OUT_RING(ring, CP_REG(REG_A2XX_VGT_VERTEX_REUSE_BLOCK_CNTL));
OUT_RING(ring, 0x0002);
} else {
patch_draws(batch, IGNORE_VISIBILITY);
}
 
-   util_dynarray_resize(&batch->draw_patches, 0);
-   util_dynarray_resize(&batch->shader_patches, 0);
+   util_dynarray_clear(&batch->draw_patches);
+   util_dynarray_clear(&batch->shader_patches);
 }
 
 /* before mem2gmem */
 static void
 fd2_emit_tile_prep(struct fd_batch *batch, struct fd_tile *tile)
 {
struct fd_ringbuffer *ring = batch->gmem;
struct pipe_framebuffer_state *pfb = >framebuffer;
enum pipe_format format = pipe_surface_format(pfb->cbufs[0]);
 
diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_gmem.c 
b/src/gallium/drivers/freedreno/a3xx/fd3_gmem.c
index 7de0a92cdc1..e4455b3fa63 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_gmem.c
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_gmem.c
@@ -704,32 +704,32 @@ fd3_emit_tile_mem2gmem(struct fd_batch *batch, struct 
fd_tile *tile)
 }
 
 static void
 patch_draws(struct fd_batch *batch, en

[Mesa-dev] [PATCH 0/3] u_dynarray: minor API cleanups

2019-05-04 Thread Nicolai Hähnle
Just some small changes that should make util_dynarray more convenient
and safer to use.

Please review!

Thanks,
Nicolai
--
 .../drivers/freedreno/a2xx/fd2_gmem.c| 12 +++---
 .../drivers/freedreno/a3xx/fd3_gmem.c|  4 +-
 .../drivers/freedreno/a4xx/fd4_gmem.c|  2 +-
 .../drivers/freedreno/a5xx/fd5_gmem.c|  2 +-
 .../drivers/freedreno/a6xx/fd6_gmem.c|  2 +-
 .../drivers/nouveau/nv30/nvfx_fragprog.c |  2 +-
 .../drivers/nouveau/nv50/nv50_state.c|  5 +--
 .../drivers/nouveau/nvc0/nvc0_state.c|  5 +--
 .../compiler/brw_nir_analyze_ubo_ranges.c|  2 +-
 src/mesa/drivers/dri/i965/brw_bufmgr.c   |  4 +-
 src/util/u_dynarray.h| 40 +-
 11 files changed, 48 insertions(+), 32 deletions(-)



[Mesa-dev] [PATCH 09/10] radeonsi: don't declare pointers to static strings

2019-05-03 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The compiler should be able to optimize them away, but still. There's
no point in declaring those as pointers, and if the compiler *doesn't*
optimize them away, they add unnecessary load-time relocations.
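
The difference between the two declarations can be seen in a small
standalone example. The variable and helper names here are made up for
illustration; only the string contents are taken from the patch.

```c
#include <assert.h>
#include <string.h>

/* A pointer object that lives in a data section and points at a separate
 * string literal. In position-independent code its initial value
 * typically needs a load-time relocation. */
static const char *name_ptr = "SCRATCH_RSRC_DWORD0";

/* Just the characters themselves; no separate pointer object exists. */
static const char name_arr[] = "SCRATCH_RSRC_DWORD0";

/* Size of the pointer object, independent of the string length. */
static size_t ptr_object_size(void)
{
    return sizeof(name_ptr);
}

/* Size of the array: the string length plus the terminating NUL. */
static size_t arr_object_size(void)
{
    return sizeof(name_arr);
}

/* Both forms can be passed to string functions in the same way. */
static int same_contents(void)
{
    return strcmp(name_ptr, name_arr) == 0;
}
```

Code that only reads the strings works unchanged with either form, which
is why the patch below only has to touch the two declarations.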
---
 src/gallium/drivers/radeonsi/si_shader.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 71c85eb79a5..c457ca12b9a 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -30,24 +30,24 @@
 
 #include "ac_exp_param.h"
 #include "ac_shader_util.h"
 #include "ac_llvm_util.h"
 #include "si_shader_internal.h"
 #include "si_pipe.h"
 #include "sid.h"
 
 #include "compiler/nir/nir.h"
 
-static const char *scratch_rsrc_dword0_symbol =
+static const char scratch_rsrc_dword0_symbol[] =
"SCRATCH_RSRC_DWORD0";
 
-static const char *scratch_rsrc_dword1_symbol =
+static const char scratch_rsrc_dword1_symbol[] =
"SCRATCH_RSRC_DWORD1";
 
 struct si_shader_output_values
 {
LLVMValueRef values[4];
unsigned semantic_name;
unsigned semantic_index;
ubyte vertex_stream[4];
 };
 
-- 
2.20.1


[Mesa-dev] [PATCH 02/10] amd/common: clarify ac_shader_binary::lds_size

2019-05-03 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/amd/common/ac_binary.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/common/ac_binary.h b/src/amd/common/ac_binary.h
index febc4da7fed..8f594a9ce75 100644
--- a/src/amd/common/ac_binary.h
+++ b/src/amd/common/ac_binary.h
@@ -68,21 +68,21 @@ struct ac_shader_binary {
/** Disassembled shader in a string. */
char *disasm_string;
char *llvm_ir_string;
 };
 
 struct ac_shader_config {
unsigned num_sgprs;
unsigned num_vgprs;
unsigned spilled_sgprs;
unsigned spilled_vgprs;
-   unsigned lds_size;
+   unsigned lds_size; /* in HW allocation units; i.e 256 bytes on SI, 512 
bytes on CI+ */
unsigned spi_ps_input_ena;
unsigned spi_ps_input_addr;
unsigned float_mode;
unsigned scratch_bytes_per_wave;
 };
 
 /*
  * Parse the elf binary stored in \p elf_data and create a
  * ac_shader_binary object.
  */
-- 
2.20.1


[Mesa-dev] [PATCH 01/10] amd/common: extract ac_parse_shader_binary_config

2019-05-03 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
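For illustration only, not part of the commit: a minimal sketch of walking data laid out as the loop in ac_parse_shader_binary_config reads it, that is, as consecutive pairs of little-endian 32-bit words (register, value). The helpers read_le32 and find_config_value are hypothetical; unlike the patch, the sketch reads byte by byte, so it does not depend on buffer alignment, and it ignores a trailing partial pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value one byte at a time. */
static uint32_t read_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: find the first (reg, value) pair whose register
 * matches and store its value in *out. Returns false if there is none. */
static bool find_config_value(const unsigned char *data, size_t nbytes,
			      uint32_t reg, uint32_t *out)
{
	assert(out && (data || nbytes == 0));

	for (size_t i = 0; i + 8 <= nbytes; i += 8) {
		if (read_le32(data + i) == reg) {
			*out = read_le32(data + i + 4);
			return true;
		}
	}
	return false;
}
```
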
 src/amd/common/ac_binary.c | 77 +-
 src/amd/common/ac_binary.h |  4 ++
 2 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/src/amd/common/ac_binary.c b/src/amd/common/ac_binary.c
index fabeb15a204..44251886b5f 100644
--- a/src/amd/common/ac_binary.c
+++ b/src/amd/common/ac_binary.c
@@ -199,57 +199,30 @@ const unsigned char *ac_shader_binary_config_start(
unsigned i;
for (i = 0; i < binary->global_symbol_count; ++i) {
if (binary->global_symbol_offsets[i] == symbol_offset) {
unsigned offset = i * binary->config_size_per_symbol;
return binary->config + offset;
}
}
return binary->config;
 }
 
-
-static const char *scratch_rsrc_dword0_symbol =
-   "SCRATCH_RSRC_DWORD0";
-
-static const char *scratch_rsrc_dword1_symbol =
-   "SCRATCH_RSRC_DWORD1";
-
-void ac_shader_binary_read_config(struct ac_shader_binary *binary,
- struct ac_shader_config *conf,
- unsigned symbol_offset,
- bool supports_spill)
+/* Parse configuration data in .AMDGPU.config section format. */
+void ac_parse_shader_binary_config(const char *data, size_t nbytes,
+  bool really_needs_scratch,
+  struct ac_shader_config *conf)
 {
-   unsigned i;
-   const unsigned char *config =
-   ac_shader_binary_config_start(binary, symbol_offset);
-   bool really_needs_scratch = false;
uint32_t wavesize = 0;
-   /* LLVM adds SGPR spills to the scratch size.
-* Find out if we really need the scratch buffer.
-*/
-   if (supports_spill) {
-   really_needs_scratch = true;
-   } else {
-   for (i = 0; i < binary->reloc_count; i++) {
-   const struct ac_shader_reloc *reloc = &binary->relocs[i];
 
-   if (!strcmp(scratch_rsrc_dword0_symbol, reloc->name) ||
-   !strcmp(scratch_rsrc_dword1_symbol, reloc->name)) {
-   really_needs_scratch = true;
-   break;
-   }
-   }
-   }
-
-   for (i = 0; i < binary->config_size_per_symbol; i+= 8) {
-   unsigned reg = util_le32_to_cpu(*(uint32_t*)(config + i));
-   unsigned value = util_le32_to_cpu(*(uint32_t*)(config + i + 4));
+   for (size_t i = 0; i < nbytes; i += 8) {
+   unsigned reg = util_le32_to_cpu(*(uint32_t*)(data + i));
+   unsigned value = util_le32_to_cpu(*(uint32_t*)(data + i + 4));
switch (reg) {
case R_00B028_SPI_SHADER_PGM_RSRC1_PS:
case R_00B128_SPI_SHADER_PGM_RSRC1_VS:
case R_00B228_SPI_SHADER_PGM_RSRC1_GS:
case R_00B848_COMPUTE_PGM_RSRC1:
case R_00B428_SPI_SHADER_PGM_RSRC1_HS:
conf->num_sgprs = MAX2(conf->num_sgprs, 
(G_00B028_SGPRS(value) + 1) * 8);
conf->num_vgprs = MAX2(conf->num_vgprs, 
(G_00B028_VGPRS(value) + 1) * 4);
conf->float_mode =  G_00B028_FLOAT_MODE(value);
break;
@@ -292,20 +265,56 @@ void ac_shader_binary_read_config(struct ac_shader_binary 
*binary,
if (!conf->spi_ps_input_addr)
conf->spi_ps_input_addr = conf->spi_ps_input_ena;
}
 
if (really_needs_scratch) {
/* sgprs spills aren't spilling */
conf->scratch_bytes_per_wave = G_00B860_WAVESIZE(wavesize) * 
256 * 4;
}
 }
 
+static const char *scratch_rsrc_dword0_symbol =
+   "SCRATCH_RSRC_DWORD0";
+
+static const char *scratch_rsrc_dword1_symbol =
+   "SCRATCH_RSRC_DWORD1";
+
+void ac_shader_binary_read_config(struct ac_shader_binary *binary,
+ struct ac_shader_config *conf,
+ unsigned symbol_offset,
+ bool supports_spill)
+{
+   unsigned i;
+   const char *config =
+   (const char *)ac_shader_binary_config_start(binary, 
symbol_offset);
+   bool really_needs_scratch = false;
+   /* LLVM adds SGPR spills to the scratch size.
+* Find out if we really need the scratch buffer.
+*/
+   if (supports_spill) {
+   really_needs_scratch = true;
+   } else {
+   for (i = 0; i < binary->reloc_count; i++) {
+   const struct ac_shader_reloc *reloc = &binary->relocs[i];
+
+   if (!strcmp(scratch_rsrc_dword0_symbol, reloc->name) ||
+   !strcmp(scratch_rsrc_dword1_s

[Mesa-dev] [PATCH 06/10] radeonsi: return bool from si_shader_binary_upload

2019-05-03 Thread Nicolai Hähnle
From: Nicolai Hähnle 

We didn't really use error codes anyway.
---
 src/gallium/drivers/radeonsi/si_compute.c |  6 +++---
 src/gallium/drivers/radeonsi/si_shader.c  | 21 +--
 src/gallium/drivers/radeonsi/si_shader.h  |  2 +-
 .../drivers/radeonsi/si_state_shaders.c   |  6 ++
 4 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index 2acd96545aa..2899ee146d4 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -137,21 +137,21 @@ static void si_create_compute_state_async(void *job, int thread_index)
mtx_lock(&sscreen->shader_cache_mutex);
 
if (ir_binary &&
si_shader_cache_load_shader(sscreen, ir_binary, shader)) {
mtx_unlock(&sscreen->shader_cache_mutex);
 
si_shader_dump_stats_for_shader_db(shader, debug);
si_shader_dump(sscreen, shader, debug, PIPE_SHADER_COMPUTE,
   stderr, true);
 
-   if (si_shader_binary_upload(sscreen, shader))
+   if (!si_shader_binary_upload(sscreen, shader))
program->shader.compilation_failed = true;
} else {
mtx_unlock(&sscreen->shader_cache_mutex);
 
if (!si_shader_create(sscreen, compiler, &program->shader, debug)) {
program->shader.compilation_failed = true;
 
if (program->ir_type == PIPE_SHADER_IR_TGSI)
FREE(program->ir.tgsi);
program->shader.selector = NULL;
@@ -246,21 +246,21 @@ static void *si_create_compute_state(
program->shader.binary.reloc_count);
FREE(program);
return NULL;
}
} else {
ac_shader_binary_read_config(&program->shader.binary,
 &program->shader.config, 0, false);
}
si_shader_dump(sctx->screen, &program->shader, &sctx->debug,
   PIPE_SHADER_COMPUTE, stderr, true);
-   if (si_shader_binary_upload(sctx->screen, &program->shader) < 0) {
+   if (!si_shader_binary_upload(sctx->screen, &program->shader)) {
fprintf(stderr, "LLVM failed to upload shader\n");
FREE(program);
return NULL;
}
}
 
return program;
 }
 
 static void si_bind_compute_state(struct pipe_context *ctx, void *state)
@@ -388,21 +388,21 @@ static bool si_setup_compute_scratch_buffer(struct 
si_context *sctx,
 
if (!sctx->compute_scratch_buffer)
return false;
}
 
if (sctx->compute_scratch_buffer != shader->scratch_bo && 
scratch_needed) {
uint64_t scratch_va = sctx->compute_scratch_buffer->gpu_address;
 
si_shader_apply_scratch_relocs(shader, scratch_va);
 
-   if (si_shader_binary_upload(sctx->screen, shader))
+   if (!si_shader_binary_upload(sctx->screen, shader))
return false;
 
si_resource_reference(&shader->scratch_bo,
sctx->compute_scratch_buffer);
}
 
return true;
 }
 
 static bool si_switch_compute_shader(struct si_context *sctx,
diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 4d08ab88f4a..71c85eb79a5 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -5005,21 +5005,21 @@ static unsigned si_get_shader_binary_size(const struct 
si_shader *shader)
size += shader->prolog->binary.code_size;
if (shader->previous_stage)
size += shader->previous_stage->binary.code_size;
if (shader->prolog2)
size += shader->prolog2->binary.code_size;
if (shader->epilog)
size += shader->epilog->binary.code_size;
return size + DEBUGGER_NUM_MARKERS * 4;
 }
 
-int si_shader_binary_upload(struct si_screen *sscreen, struct si_shader 
*shader)
+bool si_shader_binary_upload(struct si_screen *sscreen, struct si_shader 
*shader)
 {
const struct ac_shader_binary *prolog =
shader->prolog ? &shader->prolog->binary : NULL;
const struct ac_shader_binary *previous_stage =
shader->previous_stage ? &shader->previous_stage->binary : NULL;
const struct ac_shader_binary *prolog2 =
shader->prolog2 ? &shader->prolog2->binary : NULL;
const struct ac_shader_binary *epilog =
shader->epilog ? &shader->epilog->binary : 

[Mesa-dev] [PATCH 03/10] amd/common: add a more powerful runtime linker

2019-05-03 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Using an explicit linker instead of just concatenating .text
sections will allow us to start using .rodata sections and
explicit descriptions of data on LDS that is shared between
stages.
---
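For illustration only, not part of the commit: a minimal sketch of the layout step an explicit linker performs, placing sections one after another while honouring each section's alignment. The struct sketch_section and both functions are hypothetical and much simpler than the ac_rtld structures in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified section record. */
struct sketch_section {
	uint64_t size;
	uint64_t align;  /* power of two, at least 1 */
	uint64_t offset; /* filled in by layout_sections() */
};

/* Round value up to the next multiple of a power-of-two alignment. */
static uint64_t align_up(uint64_t value, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (value + align - 1) & ~(align - 1);
}

/* Assign an offset to every section and return the total size. */
static uint64_t layout_sections(struct sketch_section *sections, size_t count)
{
	uint64_t cursor = 0;

	for (size_t i = 0; i < count; i++) {
		cursor = align_up(cursor, sections[i].align);
		sections[i].offset = cursor;
		cursor += sections[i].size;
	}
	return cursor;
}
```

Once every section has an offset, relocations between sections can be computed from those offsets plus the base address of the upload.
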
 src/amd/Makefile.sources   |   2 +
 src/amd/common/ac_binary.h |   2 +
 src/amd/common/ac_rtld.c   | 556 +
 src/amd/common/ac_rtld.h   |  87 ++
 src/amd/common/meson.build |   2 +
 5 files changed, 649 insertions(+)
 create mode 100644 src/amd/common/ac_rtld.c
 create mode 100644 src/amd/common/ac_rtld.h

diff --git a/src/amd/Makefile.sources b/src/amd/Makefile.sources
index 58e0008ee62..122fa306eb1 100644
--- a/src/amd/Makefile.sources
+++ b/src/amd/Makefile.sources
@@ -35,20 +35,22 @@ ADDRLIB_FILES = \
 
 AMD_COMPILER_FILES = \
common/ac_binary.c \
common/ac_binary.h \
common/ac_exp_param.h \
common/ac_llvm_build.c \
common/ac_llvm_build.h \
common/ac_llvm_helper.cpp \
common/ac_llvm_util.c \
common/ac_llvm_util.h \
+   common/ac_rtld.c \
+   common/ac_rtld.h \
common/ac_shader_abi.h \
common/ac_shader_util.c \
common/ac_shader_util.h
 
 
 AMD_NIR_FILES = \
common/ac_nir_to_llvm.c \
common/ac_nir_to_llvm.h
 
 AMD_COMMON_FILES = \
diff --git a/src/amd/common/ac_binary.h b/src/amd/common/ac_binary.h
index 8f594a9ce75..b91ecb4317b 100644
--- a/src/amd/common/ac_binary.h
+++ b/src/amd/common/ac_binary.h
@@ -73,20 +73,22 @@ struct ac_shader_binary {
 struct ac_shader_config {
unsigned num_sgprs;
unsigned num_vgprs;
unsigned spilled_sgprs;
unsigned spilled_vgprs;
unsigned lds_size; /* in HW allocation units; i.e 256 bytes on SI, 512 
bytes on CI+ */
unsigned spi_ps_input_ena;
unsigned spi_ps_input_addr;
unsigned float_mode;
unsigned scratch_bytes_per_wave;
+   unsigned rsrc1;
+   unsigned rsrc2;
 };
 
 /*
  * Parse the elf binary stored in \p elf_data and create a
  * ac_shader_binary object.
  */
 bool ac_elf_read(const char *elf_data, unsigned elf_size,
 struct ac_shader_binary *binary);
 
 /**
diff --git a/src/amd/common/ac_rtld.c b/src/amd/common/ac_rtld.c
new file mode 100644
index 000..a79447904f3
--- /dev/null
+++ b/src/amd/common/ac_rtld.c
@@ -0,0 +1,556 @@
+/*
+ * Copyright 2014-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
THE
+ * SOFTWARE.
+ */
+
+#include "ac_rtld.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ac_binary.h"
+#include "util/u_math.h"
+
+// Old distributions may not have this enum constant
+#define MY_EM_AMDGPU 224
+
+#ifndef R_AMDGPU_NONE
+#define R_AMDGPU_NONE 0
+#define R_AMDGPU_ABS32_LO 1
+#define R_AMDGPU_ABS32_HI 2
+#define R_AMDGPU_ABS64 3
+#define R_AMDGPU_REL32 4
+#define R_AMDGPU_REL64 5
+#define R_AMDGPU_ABS32 6
+#define R_AMDGPU_GOTPCREL 7
+#define R_AMDGPU_GOTPCREL32_LO 8
+#define R_AMDGPU_GOTPCREL32_HI 9
+#define R_AMDGPU_REL32_LO 10
+#define R_AMDGPU_REL32_HI 11
+#define R_AMDGPU_RELATIVE64 13
+#endif
+
+/* For the UMR disassembler. */
+#define DEBUGGER_END_OF_CODE_MARKER 0xbf9f0000 /* invalid instruction */
+#define DEBUGGER_NUM_MARKERS   5
+
+struct ac_rtld_section {
+   bool is_rx : 1;
+   bool is_pasted_text : 1;
+   uint64_t offset;
+   const char *name;
+};
+
+struct ac_rtld_part {
+   Elf *elf;
+   struct ac_rtld_section *sections;
+   unsigned num_sections;
+};
+
+static void report_erroraf(const char *fmt, va_list va)
+{
+   char *msg;
+   int ret = vasprintf(&msg, fmt, va);
+   if (ret < 0)
+   msg = "(vasprintf failed)";
+
+   fprintf(stderr, "ac_rtld error: %s\n", msg);
+
+   if (ret >= 0)
+   free(msg);
+}
+
+static void report

[Mesa-dev] [PATCH 08/10] amd/common: add ac_compile_module_to_elf

2019-05-03 Thread Nicolai Hähnle
From: Nicolai Hähnle 

A new variant of ac_compile_module_to_binary that allows us to
keep the entire ELF around.
---
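For illustration only, not part of the commit: a minimal C sketch of the buffer strategy that raw_memory_ostream uses below, namely a malloc()ed buffer that grows to at least 1024 bytes, at least the required size, and otherwise by a factor of 4/3, plus an overwrite of bytes that were already written. The struct grow_buffer and both functions are hypothetical; unlike the patch, they report failure instead of calling abort().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; zero-initialize before first use. */
struct grow_buffer {
	char *data;
	size_t written;
	size_t capacity;
};

/* Append size bytes, growing the allocation when needed. */
static bool grow_buffer_append(struct grow_buffer *b, const void *src, size_t size)
{
	assert(b);
	if (size == 0)
		return true;
	if (size > SIZE_MAX - b->written)
		return false; /* written + size would overflow */

	size_t needed = b->written + size;
	if (needed > b->capacity) {
		size_t grown = b->capacity / 3 * 4;
		size_t cap = needed > grown ? needed : grown;
		if (cap < 1024)
			cap = 1024;
		char *p = realloc(b->data, cap);
		if (!p)
			return false; /* the old buffer stays valid */
		b->data = p;
		b->capacity = cap;
	}
	memcpy(b->data + b->written, src, size);
	b->written += size;
	return true;
}

/* Overwrite bytes inside the already written range (pwrite-style). */
static bool grow_buffer_overwrite(struct grow_buffer *b, size_t offset,
				  const void *src, size_t size)
{
	assert(b);
	if (offset > b->written || size > b->written - offset)
		return false;
	if (size)
		memcpy(b->data + offset, src, size);
	return true;
}
```

Because the memory comes from realloc(), ownership can be handed to C code that later releases it with free().
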
 src/amd/common/ac_llvm_helper.cpp | 88 ---
 src/amd/common/ac_llvm_util.h |  2 +
 2 files changed, 83 insertions(+), 7 deletions(-)

diff --git a/src/amd/common/ac_llvm_helper.cpp 
b/src/amd/common/ac_llvm_helper.cpp
index dcfb8008546..834c5d7f94e 100644
--- a/src/amd/common/ac_llvm_helper.cpp
+++ b/src/amd/common/ac_llvm_helper.cpp
@@ -22,23 +22,27 @@
  * of the Software.
  *
  */
 
 /* based on Marek's patch to lp_bld_misc.cpp */
 
 // Workaround http://llvm.org/PR23628
 #pragma push_macro("DEBUG")
 #undef DEBUG
 
+#include 
+
 #include "ac_binary.h"
 #include "ac_llvm_util.h"
 
+#include "util/macros.h"
+
 #include 
 #include 
 #include 
 #include 
 #include 
 
 #include 
 
 void ac_add_attr_dereferenceable(LLVMValueRef val, uint64_t bytes)
 {
@@ -102,28 +106,90 @@ ac_create_target_library_info(const char *triple)
 {
return reinterpret_cast<LLVMTargetLibraryInfoRef>(new llvm::TargetLibraryInfoImpl(llvm::Triple(triple)));
 }
 
 void
 ac_dispose_target_library_info(LLVMTargetLibraryInfoRef library_info)
 {
delete reinterpret_cast<llvm::TargetLibraryInfoImpl *>(library_info);
 }
 
+/* Implementation of raw_pwrite_stream that works on malloc()ed memory for
+ * better compatibility with C code. */
+struct raw_memory_ostream : public llvm::raw_pwrite_stream {
+   char *buffer;
+   size_t written;
+   size_t bufsize;
+
+   raw_memory_ostream()
+   {
+   buffer = NULL;
+   written = 0;
+   bufsize = 0;
+   SetUnbuffered();
+   }
+
+   ~raw_memory_ostream()
+   {
+   free(buffer);
+   }
+
+   void clear()
+   {
+   written = 0;
+   }
+
+   void take(char *&out_buffer, size_t &out_size)
+   {
+   out_buffer = buffer;
+   out_size = written;
+   buffer = NULL;
+   written = 0;
+   bufsize = 0;
+   }
+
+   void flush() = delete;
+
+   void write_impl(const char *ptr, size_t size) override
+   {
+   if (unlikely(written + size < written))
+   abort();
+   if (written + size > bufsize) {
+   bufsize = MAX3(1024, written + size, bufsize / 3 * 4);
+   buffer = (char *)realloc(buffer, bufsize);
+   if (!buffer) {
+   fprintf(stderr, "amd: out of memory allocating ELF buffer\n");
+   abort();
+   }
+   }
+   memcpy(buffer + written, ptr, size);
+   written += size;
+   }
+
+   void pwrite_impl(const char *ptr, size_t size, uint64_t offset) override
+   {
+   assert(offset == (size_t)offset &&
+  offset + size >= offset && offset + size <= written);
+   memcpy(buffer + offset, ptr, size);
+   }
+
+   uint64_t current_pos() const override
+   {
+   return written;
+   }
+};
+
 /* The LLVM compiler is represented as a pass manager containing passes for
  * optimizations, instruction selection, and code generation.
  */
 struct ac_compiler_passes {
-   ac_compiler_passes(): ostream(code_string) {}
-
-   llvm::SmallString<0> code_string;  /* ELF shader binary */
-   llvm::raw_svector_ostream ostream; /* stream for appending data to the 
binary */
+   raw_memory_ostream ostream; /* ELF shader binary stream */
llvm::legacy::PassManager passmgr; /* list of passes */
 };
 
 struct ac_compiler_passes *ac_create_llvm_passes(LLVMTargetMachineRef tm)
 {
struct ac_compiler_passes *p = new ac_compiler_passes();
if (!p)
return NULL;
 
llvm::TargetMachine *TM = reinterpret_cast<llvm::TargetMachine*>(tm);
@@ -142,28 +208,36 @@ void ac_destroy_llvm_passes(struct ac_compiler_passes *p)
 {
delete p;
 }
 
 /* This returns false on failure. */
 bool ac_compile_module_to_binary(struct ac_compiler_passes *p, LLVMModuleRef 
module,
 struct ac_shader_binary *binary)
 {
p->passmgr.run(*llvm::unwrap(module));
 
-   llvm::StringRef data = p->ostream.str();
-   bool success = ac_elf_read(data.data(), data.size(), binary);
-   p->code_string = ""; /* release the ELF shader binary */
+   bool success = ac_elf_read(p->ostream.buffer, p->ostream.written, 
binary);
+   p->ostream.clear();
 
if (!success)
fprintf(stderr, "amd: cannot read an ELF shader binary\n");
return success;
 }
 
+/* This returns false on failure. */
+bool ac_compile_module_to_elf(struct ac_compiler_passes *p, LLVMModuleRef 
module,
+ char **pelf_buffer, size_t *pelf_size)
+{
+  

[Mesa-dev] [PATCH 00/10] amd,radeonsi: add a real runtime linker

2019-05-03 Thread Nicolai Hähnle
these patches change the way we load shaders, initially for radeonsi
but ideally radv would adopt the same approach.

Basically, instead of hard-coding that we have a single .text section
in the ELF generated by LLVM, we align ourselves more with the ELF
standard and actually look at all the sections in the file(s), lay
them out in memory, and resolve relocations between them.

There is still hard-coding of ".text" sections for the purpose of
gfx9+ merged shaders.

The immediate consequence is that we will be able to emit .rodata
in LLVM and emit absolute or relative relocations that will be
resolved when shaders are uploaded to the GPU.
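As a rough sketch of what resolving such relocations involves (illustration only, not code from the series): each relocation combines the symbol address S, the addend A and the address P of the patched location. The formulas below follow my understanding of the AMDGPU relocation definitions documented by LLVM and should be treated as an assumption; the function resolve_reloc and the SK_* names are hypothetical, with numeric values taken from the R_AMDGPU_* defines in patch 03.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; values match the R_AMDGPU_* defines in the series. */
enum {
	SK_ABS32_LO = 1,
	SK_ABS32_HI = 2,
	SK_ABS64 = 3,
	SK_REL32 = 4,
	SK_REL64 = 5,
};

/* Compute the value to store for one relocation. 32-bit types return the
 * value in the low 32 bits of *out. Returns false for unhandled types. */
static bool resolve_reloc(unsigned type, uint64_t S, int64_t A, uint64_t P,
			  uint64_t *out)
{
	assert(out);
	const uint64_t abs = S + (uint64_t)A; /* absolute target address */
	const uint64_t rel = abs - P;         /* target relative to the patch site */

	switch (type) {
	case SK_ABS32_LO: *out = abs & 0xffffffffu; return true;
	case SK_ABS32_HI: *out = abs >> 32;         return true;
	case SK_ABS64:    *out = abs;               return true;
	case SK_REL32:    *out = rel & 0xffffffffu; return true;
	case SK_REL64:    *out = rel;               return true;
	default:          return false;
	}
}
```

S and P only become known once the sections have been laid out and the GPU address of the upload is fixed, which is why this happens at upload time.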

As a next step, I want us to explicitly record LDS symbols in the ELF
symbol table and have ac_rtld lay out and resolve those symbols at
load time. This will allow us to use LDS both for communication
between shader parts and for temporary variables used within each
part.

Please review!

Thanks,
Nicolai




[Mesa-dev] [PATCH 07/10] radeonsi: dump shader binary buffer contents

2019-05-03 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Help identify bugs related to corruption of shaders in memory,
or errors in shader upload / rtld.
---
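For illustration only, not part of the commit: a minimal, driver-independent sketch of the dump loop added below, printing a buffer as 32-bit words in the same " offset: value" format. The helper dump_dwords is hypothetical; it copies each word with memcpy() to avoid unaligned or aliasing reads and skips a trailing partial word.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print size bytes of mapped memory as dwords.
 * Returns the number of dwords printed. */
static unsigned dump_dwords(FILE *f, const void *mapped, unsigned size)
{
	unsigned count = 0;

	assert(f && (mapped || size == 0));
	for (unsigned i = 0; i + 4 <= size; i += 4) {
		uint32_t dword;

		memcpy(&dword, (const char *)mapped + i, sizeof(dword));
		fprintf(f, " %4x: %08x\n", i, (unsigned)dword);
		count++;
	}
	return count;
}
```
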
 src/gallium/drivers/radeonsi/si_debug.c| 18 ++
 .../drivers/radeonsi/si_debug_options.h|  1 +
 2 files changed, 19 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_debug.c 
b/src/gallium/drivers/radeonsi/si_debug.c
index 9a4494a98fe..c40dcd0b5d6 100644
--- a/src/gallium/drivers/radeonsi/si_debug.c
+++ b/src/gallium/drivers/radeonsi/si_debug.c
@@ -98,20 +98,38 @@ void si_destroy_saved_cs(struct si_saved_cs *scs)
 }
 
 static void si_dump_shader(struct si_screen *sscreen,
   enum pipe_shader_type processor,
   const struct si_shader *shader, FILE *f)
 {
if (shader->shader_log)
fwrite(shader->shader_log, shader->shader_log_size, 1, f);
else
si_shader_dump(sscreen, shader, NULL, processor, f, false);
+
+   if (shader->bo && sscreen->options.dump_shader_binary) {
+   unsigned size = shader->bo->b.b.width0;
+   fprintf(f, "BO: VA=%"PRIx64" Size=%u\n", 
shader->bo->gpu_address, size);
+
+   const char *mapped = sscreen->ws->buffer_map(shader->bo->buf, 
NULL,
+  
PIPE_TRANSFER_UNSYNCHRONIZED |
+  PIPE_TRANSFER_READ |
+  
RADEON_TRANSFER_TEMPORARY);
+
+   for (unsigned i = 0; i < size; i += 4) {
+   fprintf(f, " %4x: %08x\n", i, *(uint32_t*)(mapped + i));
+   }
+
+   sscreen->ws->buffer_unmap(shader->bo->buf);
+
+   fprintf(f, "\n");
+   }
 }
 
 struct si_log_chunk_shader {
/* The shader destroy code assumes a current context for unlinking of
 * PM4 packets etc.
 *
 * While we should be able to destroy shaders without a context, doing
 * so would happen only very rarely and be therefore likely to fail
 * just when you're trying to debug something. Let's just remember the
 * current context in the chunk.
diff --git a/src/gallium/drivers/radeonsi/si_debug_options.h 
b/src/gallium/drivers/radeonsi/si_debug_options.h
index 0bde7910fc6..db642366ca6 100644
--- a/src/gallium/drivers/radeonsi/si_debug_options.h
+++ b/src/gallium/drivers/radeonsi/si_debug_options.h
@@ -1,7 +1,8 @@
 OPT_BOOL(clear_db_cache_before_clear, false, "Clear DB cache before fast depth 
clear")
 OPT_BOOL(enable_nir, false, "Enable NIR")
 OPT_BOOL(aux_debug, false, "Generate ddebug_dumps for the auxiliary context")
 OPT_BOOL(sync_compile, false, "Always compile synchronously (will cause 
stalls)")
+OPT_BOOL(dump_shader_binary, false, "Dump shader binary as part of 
ddebug_dumps")
 OPT_BOOL(vs_fetch_always_opencode, false, "Always open code vertex fetches 
(less efficient, purely for testing)")
 
 #undef OPT_BOOL
-- 
2.20.1


[Mesa-dev] [PATCH 10/10] radeonsi: use the new run-time linker for shaders

2019-05-03 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
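For illustration only, not part of the commit: a minimal sketch of the bounds check that si_compute_get_code_object performs below before it returns a pointer into the .text section. The helper section_record_at is hypothetical. It is written so that the comparison cannot wrap around for very large offsets, which is a defensive choice of this sketch rather than something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return a pointer to record_size bytes located at
 * offset inside a section, or NULL if the record does not fit. */
static const void *section_record_at(const char *section, size_t section_size,
				     uint64_t offset, size_t record_size)
{
	assert(section || section_size == 0);

	if (record_size > section_size)
		return NULL;
	/* Equivalent to offset + record_size > section_size, without overflow. */
	if (offset > section_size - record_size)
		return NULL;
	return section + offset;
}
```
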
 src/gallium/drivers/radeonsi/si_compute.c |  63 ++--
 src/gallium/drivers/radeonsi/si_debug.c   |  74 +++--
 src/gallium/drivers/radeonsi/si_pipe.c|   2 +-
 src/gallium/drivers/radeonsi/si_pipe.h|   2 +-
 src/gallium/drivers/radeonsi/si_shader.c  | 291 +-
 src/gallium/drivers/radeonsi/si_shader.h  |  19 +-
 .../drivers/radeonsi/si_shader_internal.h |   3 +-
 .../drivers/radeonsi/si_shader_tgsi_setup.c   |  14 +-
 .../drivers/radeonsi/si_state_shaders.c   |  39 +--
 9 files changed, 270 insertions(+), 237 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index 2899ee146d4..e4ef138db33 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -21,20 +21,21 @@
  * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
  * USE OR OTHER DEALINGS IN THE SOFTWARE.
  *
  */
 
 #include "tgsi/tgsi_parse.h"
 #include "util/u_async_debug.h"
 #include "util/u_memory.h"
 #include "util/u_upload_mgr.h"
 
+#include "ac_rtld.h"
 #include "amd_kernel_code_t.h"
 #include "si_build_pm4.h"
 #include "si_compute.h"
 
 #define COMPUTE_DBG(sscreen, fmt, args...) \
do { \
if ((sscreen->debug_flags & DBG(COMPUTE))) fprintf(stderr, fmt, 
##args); \
} while (0);
 
 struct dispatch_packet {
@@ -54,22 +55,40 @@ struct dispatch_packet {
uint64_t reserved2;
 };
 
 static const amd_kernel_code_t *si_compute_get_code_object(
const struct si_compute *program,
uint64_t symbol_offset)
 {
if (!program->use_code_object_v2) {
return NULL;
}
-   return (const amd_kernel_code_t*)
-   (program->shader.binary.code + symbol_offset);
+
+   struct ac_rtld_binary rtld;
+   if (!ac_rtld_open(&rtld, 1, &program->shader.binary.elf_buffer,
+ &program->shader.binary.elf_size))
+   return NULL;
+
+   const amd_kernel_code_t *result = NULL;
+   const char *text;
+   size_t size;
+   if (!ac_rtld_get_section_by_name(&rtld, ".text", &text, &size))
+   goto out;
+
+   if (symbol_offset + sizeof(amd_kernel_code_t) > size)
+   goto out;
+
+   result = (const amd_kernel_code_t*)(text + symbol_offset);
+
+out:
+   ac_rtld_close(&rtld);
+   return result;
 }
 
 static void code_object_to_config(const amd_kernel_code_t *code_object,
  struct ac_shader_config *out_config) {
 
uint32_t rsrc1 = code_object->compute_pgm_resource_registers;
uint32_t rsrc2 = code_object->compute_pgm_resource_registers >> 32;
out_config->num_sgprs = code_object->wavefront_sgpr_count;
out_config->num_vgprs = code_object->workitem_vgpr_count;
out_config->float_mode = G_00B028_FLOAT_MODE(rsrc1);
@@ -137,21 +156,21 @@ static void si_create_compute_state_async(void *job, int thread_index)
mtx_lock(&sscreen->shader_cache_mutex);
 
if (ir_binary &&
si_shader_cache_load_shader(sscreen, ir_binary, shader)) {
mtx_unlock(&sscreen->shader_cache_mutex);
 
si_shader_dump_stats_for_shader_db(shader, debug);
si_shader_dump(sscreen, shader, debug, PIPE_SHADER_COMPUTE,
   stderr, true);
 
-   if (!si_shader_binary_upload(sscreen, shader))
+   if (!si_shader_binary_upload(sscreen, shader, 0))
program->shader.compilation_failed = true;
} else {
mtx_unlock(&sscreen->shader_cache_mutex);
 
if (!si_shader_create(sscreen, compiler, &program->shader, debug)) {
program->shader.compilation_failed = true;
 
if (program->ir_type == PIPE_SHADER_IR_TGSI)
FREE(program->ir.tgsi);
program->shader.selector = NULL;
@@ -229,39 +248,37 @@ static void *si_create_compute_state(
si_schedule_initial_compile(sctx, PIPE_SHADER_COMPUTE,
&program->ready,
&program->compiler_ctx_state,
program, 
si_create_compute_state_async);
} else {
const struct pipe_llvm_program_header *header;
const char *code;
header = cso->prog;
code = cso->prog + sizeof(struct pipe_llvm_program_header);
 
-   ac_elf_read(code, header->num_bytes, &program->shader.binary);
-   if (program->use_code_object_v2) {
-   const amd_kernel_code_t *code_object =
-   si_compute_get_code_object(program, 0);
-  

[Mesa-dev] [PATCH 04/10] radeonsi: use ac_shader_config

2019-05-03 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
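For illustration only, not part of the commit: a minimal sketch of how code_object_to_config below derives rsrc1 and rsrc2 from the packed 64-bit compute_pgm_resource_registers field, with the low half becoming rsrc1 and the high half rsrc2. The struct rsrc_pair and the function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two 32-bit register values. */
struct rsrc_pair {
	uint32_t rsrc1;
	uint32_t rsrc2;
};

/* Split the packed 64-bit field: low 32 bits are rsrc1, high 32 bits rsrc2. */
static struct rsrc_pair split_pgm_resource_registers(uint64_t packed)
{
	struct rsrc_pair r;

	r.rsrc1 = (uint32_t)packed;
	r.rsrc2 = (uint32_t)(packed >> 32);

	/* Sanity check: recombining the halves gives back the input. */
	assert((((uint64_t)r.rsrc2 << 32) | r.rsrc1) == packed);
	return r;
}
```
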
 src/amd/common/ac_binary.c|   2 +
 src/gallium/drivers/radeonsi/si_compute.c |  14 +--
 src/gallium/drivers/radeonsi/si_shader.c  | 112 +++---
 src/gallium/drivers/radeonsi/si_shader.h  |  25 +
 4 files changed, 27 insertions(+), 126 deletions(-)

diff --git a/src/amd/common/ac_binary.c b/src/amd/common/ac_binary.c
index 44251886b5f..d0ca55e0e0d 100644
--- a/src/amd/common/ac_binary.c
+++ b/src/amd/common/ac_binary.c
@@ -218,26 +218,28 @@ void ac_parse_shader_binary_config(const char *data, 
size_t nbytes,
unsigned value = util_le32_to_cpu(*(uint32_t*)(data + i + 4));
switch (reg) {
case R_00B028_SPI_SHADER_PGM_RSRC1_PS:
case R_00B128_SPI_SHADER_PGM_RSRC1_VS:
case R_00B228_SPI_SHADER_PGM_RSRC1_GS:
case R_00B848_COMPUTE_PGM_RSRC1:
case R_00B428_SPI_SHADER_PGM_RSRC1_HS:
conf->num_sgprs = MAX2(conf->num_sgprs, 
(G_00B028_SGPRS(value) + 1) * 8);
conf->num_vgprs = MAX2(conf->num_vgprs, 
(G_00B028_VGPRS(value) + 1) * 4);
conf->float_mode =  G_00B028_FLOAT_MODE(value);
+   conf->rsrc1 = value;
break;
case R_00B02C_SPI_SHADER_PGM_RSRC2_PS:
conf->lds_size = MAX2(conf->lds_size, 
G_00B02C_EXTRA_LDS_SIZE(value));
break;
case R_00B84C_COMPUTE_PGM_RSRC2:
conf->lds_size = MAX2(conf->lds_size, 
G_00B84C_LDS_SIZE(value));
+   conf->rsrc2 = value;
break;
case R_0286CC_SPI_PS_INPUT_ENA:
conf->spi_ps_input_ena = value;
break;
case R_0286D0_SPI_PS_INPUT_ADDR:
conf->spi_ps_input_addr = value;
break;
case R_0286E8_SPI_TMPRING_SIZE:
case R_00B860_COMPUTE_TMPRING_SIZE:
/* WAVESIZE is in units of 256 dwords. */
diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index 541d7e6f118..02d7bac406a 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -59,21 +59,21 @@ static const amd_kernel_code_t *si_compute_get_code_object(
uint64_t symbol_offset)
 {
if (!program->use_code_object_v2) {
return NULL;
}
return (const amd_kernel_code_t*)
(program->shader.binary.code + symbol_offset);
 }
 
 static void code_object_to_config(const amd_kernel_code_t *code_object,
- struct si_shader_config *out_config) {
+ struct ac_shader_config *out_config) {
 
uint32_t rsrc1 = code_object->compute_pgm_resource_registers;
uint32_t rsrc2 = code_object->compute_pgm_resource_registers >> 32;
out_config->num_sgprs = code_object->wavefront_sgpr_count;
out_config->num_vgprs = code_object->workitem_vgpr_count;
out_config->float_mode = G_00B028_FLOAT_MODE(rsrc1);
out_config->rsrc1 = rsrc1;
out_config->lds_size = MAX2(out_config->lds_size, 
G_00B84C_LDS_SIZE(rsrc2));
out_config->rsrc2 = rsrc2;
out_config->scratch_bytes_per_wave =
@@ -241,22 +241,22 @@ static void *si_create_compute_state(
const amd_kernel_code_t *code_object =
si_compute_get_code_object(program, 0);
code_object_to_config(code_object, &program->shader.config);
if (program->shader.binary.reloc_count != 0) {
fprintf(stderr, "Error: %d unsupported 
relocations\n",
program->shader.binary.reloc_count);
FREE(program);
return NULL;
}
} else {
-   si_shader_binary_read_config(&program->shader.binary,
-   &program->shader.config, 0);
+   ac_shader_binary_read_config(&program->shader.binary,
+   &program->shader.config, 0, false);
}
si_shader_dump(sctx->screen, &program->shader, &sctx->debug,
   PIPE_SHADER_COMPUTE, stderr, true);
if (si_shader_binary_upload(sctx->screen, &program->shader) < 0) {
fprintf(stderr, "LLVM failed to upload shader\n");
FREE(program);
return NULL;
}
}
 
@@ -362,21 +362,21 @@ static void si_initialize_compute(struct si_context *sctx)
 

[Mesa-dev] [PATCH 05/10] radeonsi: let si_shader_create return a boolean

2019-05-03 Thread Nicolai Hähnle
From: Nicolai Hähnle 

We didn't really use error codes anyway.
---
 src/gallium/drivers/radeonsi/si_compute.c  |  2 +-
 src/gallium/drivers/radeonsi/si_shader.c   | 18 +-
 src/gallium/drivers/radeonsi/si_shader.h   |  2 +-
 .../drivers/radeonsi/si_state_shaders.c|  8 +++-
 4 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index 02d7bac406a..2acd96545aa 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -142,21 +142,21 @@ static void si_create_compute_state_async(void *job, int 
thread_index)
 
si_shader_dump_stats_for_shader_db(shader, debug);
si_shader_dump(sscreen, shader, debug, PIPE_SHADER_COMPUTE,
   stderr, true);
 
if (si_shader_binary_upload(sscreen, shader))
program->shader.compilation_failed = true;
} else {
mtx_unlock(&sscreen->shader_cache_mutex);
 
-   if (si_shader_create(sscreen, compiler, &program->shader, debug)) {
+   if (!si_shader_create(sscreen, compiler, &program->shader, debug)) {
program->shader.compilation_failed = true;
 
if (program->ir_type == PIPE_SHADER_IR_TGSI)
FREE(program->ir.tgsi);
program->shader.selector = NULL;
return;
}
 
bool scratch_enabled = shader->config.scratch_bytes_per_wave > 
0;
unsigned user_sgprs = SI_NUM_RESOURCE_SGPRS +
diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index da43447013d..4d08ab88f4a 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -7769,94 +7769,94 @@ static void si_fix_resource_usage(struct si_screen 
*sscreen,
 
shader->config.num_sgprs = MAX2(shader->config.num_sgprs, min_sgprs);
 
if (shader->selector->type == PIPE_SHADER_COMPUTE &&
si_get_max_workgroup_size(shader) > 64) {
si_multiwave_lds_size_workaround(sscreen,
 &shader->config.lds_size);
}
 }
 
-int si_shader_create(struct si_screen *sscreen, struct ac_llvm_compiler 
*compiler,
+bool si_shader_create(struct si_screen *sscreen, struct ac_llvm_compiler 
*compiler,
 struct si_shader *shader,
 struct pipe_debug_callback *debug)
 {
struct si_shader_selector *sel = shader->selector;
struct si_shader *mainp = *si_get_main_shader_part(sel, &shader->key);
int r;
 
/* LS, ES, VS are compiled on demand if the main part hasn't been
 * compiled for that stage.
 *
 * Vertex shaders are compiled on demand when a vertex fetch
 * workaround must be applied.
 */
if (shader->is_monolithic) {
/* Monolithic shader (compiled as a whole, has many variants,
 * may take a long time to compile).
 */
r = si_compile_tgsi_shader(sscreen, compiler, shader, debug);
if (r)
-   return r;
+   return false;
} else {
/* The shader consists of several parts:
 *
 * - the middle part is the user shader, it has 1 variant only
 *   and it was compiled during the creation of the shader
 *   selector
 * - the prolog part is inserted at the beginning
 * - the epilog part is inserted at the end
 *
 * The prolog and epilog have many (but simple) variants.
 *
 * Starting with gfx9, geometry and tessellation control
 * shaders also contain the prolog and user shader parts of
 * the previous shader stage.
 */
 
if (!mainp)
-   return -1;
+   return false;
 
/* Copy the compiled TGSI shader data over. */
shader->is_binary_shared = true;
shader->binary = mainp->binary;
shader->config = mainp->config;
shader->info.num_input_sgprs = mainp->info.num_input_sgprs;
shader->info.num_input_vgprs = mainp->info.num_input_vgprs;
shader->info.face_vgpr_index = mainp->info.face_vgpr_index;
shader->info.ancillary_vgpr_index = 
mainp->info.ancillary_vgpr_index;
memcpy(shader->info.vs_output_param_offset,
   mainp->info.vs_output_param_offset,
   sizeof(ma

Re: [Mesa-dev] [PATCH 0/3] radeonsi: handle unaligned vertex buffers in hardware

2019-05-03 Thread Nicolai Hähnle

On 30.04.19 21:20, Marek Olšák wrote:

Why can we not use tbuffer loads?


tbuffer_load_format has the exact same limitations as 
buffer_load_format. They both use the same hardware path, the only 
difference is that tbuffer_load_format gets the format information from 
the instruction, while buffer_load_format gets it from the resource 
descriptor.


Therefore, in all cases where we *can* use tbuffer_load_format, we may 
as well use buffer_load_format (because we can just initialize the 
descriptor for that vertex input / vertex element correctly).


The benefit that tbuffer_load_format could potentially give us in the 
future is that when multiple vertex elements reference the same vertex 
buffer, we could put a single buffer descriptor into the descriptor 
table (or into USER_SGPRs) instead of having one buffer descriptor for 
every element.


Cheers,
Nicolai




Marek

On Thu, Apr 25, 2019 at 7:18 AM Nicolai Hähnle <nhaeh...@gmail.com> wrote:


Hi all,

the following patches contain code to implement all vertex fetches
using plain, non-format loads plus explicit shader arithmetic for
format conversion.

This allows us to remove the software workaround for unaligned vertex
buffers on SI, because we can just load individual bytes on the GPU.
CI+ will still use short/dword loads even in the unaligned case.

The format conversion code was tested by running with
radeonsi_vs_fetch_always_opencode=true on both Verde and Vega.

Please review!

Thanks,
Nicolai
--
  src/amd/common/ac_llvm_build.c               | 313 +
  src/amd/common/ac_llvm_build.h               |  30 ++
  .../drivers/radeonsi/si_debug_options.h      |   1 +
  src/gallium/drivers/radeonsi/si_get.c        |   2 +-
  src/gallium/drivers/radeonsi/si_pipe.h       |   1 +
  src/gallium/drivers/radeonsi/si_shader.c     | 249 +
  src/gallium/drivers/radeonsi/si_shader.h     |  46 +--
  src/gallium/drivers/radeonsi/si_state.c      | 233 +++-
  src/gallium/drivers/radeonsi/si_state.h      |  19 +
  .../drivers/radeonsi/si_state_shaders.c      |  37 +-
  10 files changed, 645 insertions(+), 286 deletions(-)






[Mesa-dev] [PATCH 0/3] radeonsi: handle unaligned vertex buffers in hardware

2019-04-25 Thread Nicolai Hähnle
Hi all,

the following patches contain code to implement all vertex fetches
using plain, non-format loads plus explicit shader arithmetic for
format conversion.

This allows us to remove the software workaround for unaligned vertex
buffers on SI, because we can just load individual bytes on the GPU.
CI+ will still use short/dword loads even in the unaligned case.
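
To make "plain loads plus explicit shader arithmetic" a bit more concrete, here is a rough CPU-side sketch in C. It is only an illustration: the helper names are made up for this example and do not appear in the series, and the real code emits LLVM IR rather than running on the CPU. The sketch assembles a dword from individual bytes, so the source address needs no alignment, and then converts four UNORM8 channels to float by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian dword one byte at a time,
 * so the address needs no particular alignment. */
static uint32_t load_dword_bytewise(const uint8_t *base, size_t offset)
{
   return (uint32_t)base[offset] |
          ((uint32_t)base[offset + 1] << 8) |
          ((uint32_t)base[offset + 2] << 16) |
          ((uint32_t)base[offset + 3] << 24);
}

/* Hypothetical helper: open-coded fetch of one R8G8B8A8_UNORM element.
 * Each 8-bit channel is extracted with shift/mask and scaled to [0, 1]. */
static void fetch_rgba8_unorm(const uint8_t *base, size_t offset, float out[4])
{
   assert(base != NULL && out != NULL);

   uint32_t packed = load_dword_bytewise(base, offset);

   for (int chan = 0; chan < 4; chan++)
      out[chan] = (float)((packed >> (8 * chan)) & 0xff) / 255.0f;
}
```

Calling fetch_rgba8_unorm(buf, 1, out) on a buffer works even though offset 1 is not dword-aligned, which is the property the series relies on for SI.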

The format conversion code was tested by running with
radeonsi_vs_fetch_always_opencode=true on both Verde and Vega.

Please review!

Thanks,
Nicolai
--
 src/amd/common/ac_llvm_build.c   | 313 +
 src/amd/common/ac_llvm_build.h   |  30 ++
 .../drivers/radeonsi/si_debug_options.h  |   1 +
 src/gallium/drivers/radeonsi/si_get.c|   2 +-
 src/gallium/drivers/radeonsi/si_pipe.h   |   1 +
 src/gallium/drivers/radeonsi/si_shader.c | 249 +
 src/gallium/drivers/radeonsi/si_shader.h |  46 +--
 src/gallium/drivers/radeonsi/si_state.c  | 233 +++-
 src/gallium/drivers/radeonsi/si_state.h  |  19 +
 .../drivers/radeonsi/si_state_shaders.c  |  37 +-
 10 files changed, 645 insertions(+), 286 deletions(-)



[Mesa-dev] [PATCH 2/3] radeonsi: store sctx->vertex_elements in a local in si_shader_selector_key_vs

2019-04-25 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Purely as a shorthand in the remainder of the function.
---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index f57e7730905..583d7c9d3ca 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1372,33 +1372,32 @@ static unsigned si_get_alpha_test_func(struct 
si_context *sctx)
 
 static void si_shader_selector_key_vs(struct si_context *sctx,
  struct si_shader_selector *vs,
  struct si_shader_key *key,
  struct si_vs_prolog_bits *prolog_key)
 {
if (!sctx->vertex_elements ||
vs->info.properties[TGSI_PROPERTY_VS_BLIT_SGPRS])
return;
 
-   prolog_key->instance_divisor_is_one =
-   sctx->vertex_elements->instance_divisor_is_one;
-   prolog_key->instance_divisor_is_fetched =
-   sctx->vertex_elements->instance_divisor_is_fetched;
+   struct si_vertex_elements *elts = sctx->vertex_elements;
+
+   prolog_key->instance_divisor_is_one = elts->instance_divisor_is_one;
+   prolog_key->instance_divisor_is_fetched = 
elts->instance_divisor_is_fetched;
 
/* Prefer a monolithic shader to allow scheduling divisions around
 * VBO loads. */
if (prolog_key->instance_divisor_is_fetched)
key->opt.prefer_mono = 1;
 
-   unsigned count = MIN2(vs->info.num_inputs,
- sctx->vertex_elements->count);
-   memcpy(key->mono.vs_fix_fetch, sctx->vertex_elements->fix_fetch, count);
+   unsigned count = MIN2(vs->info.num_inputs, elts->count);
+   memcpy(key->mono.vs_fix_fetch, elts->fix_fetch, count);
 }
 
 static void si_shader_selector_key_hw_vs(struct si_context *sctx,
 struct si_shader_selector *vs,
 struct si_shader_key *key)
 {
struct si_shader_selector *ps = sctx->ps_shader.cso;
 
key->opt.clip_disable =
sctx->queued.named.rasterizer->clip_plane_enable == 0 &&
-- 
2.20.1


[Mesa-dev] [PATCH 3/3] radeonsi: overhaul the vertex fetch fixup mechanism

2019-04-25 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The overall goal is to support unaligned loads from vertex buffers
natively on SI.

In the unaligned case, we fall back to the general case implementation in
ac_build_opencoded_load_format. Since this function is fully general,
we will also use it going forward for cases requiring fully manual format
conversions of dwords anyway.

This requires a different encoding of the fix_fetch array, which will now
contain the entire format information if a fixup is required.

Having to check the alignment of vertex buffers is awkward. To keep the
impact on the fast path minimal, the si_context will keep track of which
vertex buffers are (not) at least dword-aligned, while the
si_vertex_elements will note which vertex buffers have some (at most dword)
alignment requirement. Vertex buffers should be dword-aligned most of the
time, which allows a fast early-out in almost all cases.
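
The early-out described above can be sketched roughly as follows. This is a simplified stand-alone illustration, not the code from the patch: the struct and function names are hypothetical, and only the vertex_buffer_unaligned bitmask corresponds to a field that the diff actually adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_VERTEX_BUFFERS 16 /* assumed; matches the uint16_t mask width */

/* Hypothetical stand-in for the relevant part of si_context. */
struct ctx_sketch {
   uint16_t vertex_buffer_unaligned; /* bit i set: buffer i is not dword-aligned */
};

/* Hypothetical stand-in for the relevant part of si_vertex_elements. */
struct elts_sketch {
   uint16_t vb_needs_alignment; /* bit i set: an element reading buffer i needs alignment */
};

/* On bind, record whether the offset or stride breaks dword alignment. */
static void bind_vertex_buffer(struct ctx_sketch *ctx, unsigned slot,
                               uint32_t offset, uint32_t stride)
{
   assert(slot < NUM_VERTEX_BUFFERS);

   if ((offset | stride) & 3)
      ctx->vertex_buffer_unaligned |= (uint16_t)(1u << slot);
   else
      ctx->vertex_buffer_unaligned &= (uint16_t)~(1u << slot);
}

/* Fast path: a single AND decides whether any per-element work is needed. */
static bool needs_alignment_fixup(const struct ctx_sketch *ctx,
                                  const struct elts_sketch *elts)
{
   return (ctx->vertex_buffer_unaligned & elts->vb_needs_alignment) != 0;
}
```

With both masks precomputed at bind time, the draw path only pays for one AND and one branch in the common, fully aligned case.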

Add the radeonsi_vs_fetch_always_opencode configuration variable for
testing purposes. Note that it can only be used reliably on LLVM >= 9,
because support for byte and short loads is required.
---
 .../drivers/radeonsi/si_debug_options.h   |   1 +
 src/gallium/drivers/radeonsi/si_get.c |   2 +-
 src/gallium/drivers/radeonsi/si_pipe.h|   1 +
 src/gallium/drivers/radeonsi/si_shader.c  | 249 ++
 src/gallium/drivers/radeonsi/si_shader.h  |  46 ++--
 src/gallium/drivers/radeonsi/si_state.c   | 233 +---
 src/gallium/drivers/radeonsi/si_state.h   |  19 ++
 .../drivers/radeonsi/si_state_shaders.c   |  26 +-
 8 files changed, 297 insertions(+), 280 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_debug_options.h 
b/src/gallium/drivers/radeonsi/si_debug_options.h
index 019256ca1d1..0bde7910fc6 100644
--- a/src/gallium/drivers/radeonsi/si_debug_options.h
+++ b/src/gallium/drivers/radeonsi/si_debug_options.h
@@ -1,6 +1,7 @@
 OPT_BOOL(clear_db_cache_before_clear, false, "Clear DB cache before fast depth 
clear")
 OPT_BOOL(enable_nir, false, "Enable NIR")
 OPT_BOOL(aux_debug, false, "Generate ddebug_dumps for the auxiliary context")
 OPT_BOOL(sync_compile, false, "Always compile synchronously (will cause 
stalls)")
+OPT_BOOL(vs_fetch_always_opencode, false, "Always open code vertex fetches 
(less efficient, purely for testing)")
 
 #undef OPT_BOOL
diff --git a/src/gallium/drivers/radeonsi/si_get.c 
b/src/gallium/drivers/radeonsi/si_get.c
index 4e23d283ab7..ff825c5e30a 100644
--- a/src/gallium/drivers/radeonsi/si_get.c
+++ b/src/gallium/drivers/radeonsi/si_get.c
@@ -190,21 +190,21 @@ static int si_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
/* Optimal number for good TexSubImage performance on 
Polaris10. */
return 64 * 1024 * 1024;
 
case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
case PIPE_CAP_MAX_SHADER_BUFFER_SIZE:
return MIN2(sscreen->info.max_alloc_size, INT_MAX);
 
case PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY:
case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
case PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY:
-   return !sscreen->info.has_unaligned_shader_loads;
+   return HAVE_LLVM < 0x0900 && 
!sscreen->info.has_unaligned_shader_loads;
 
case PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE:
return sscreen->info.has_sparse_vm_mappings ?
RADEON_SPARSE_PAGE_SIZE : 0;
 
case PIPE_CAP_PACKED_UNIFORMS:
if (sscreen->options.enable_nir)
return 1;
return 0;
 
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 7fc0319973b..1d241436a6d 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -938,20 +938,21 @@ struct si_context {
union pipe_color_union  *border_color_map; /* in VRAM (slow 
access), little endian */
unsignedborder_color_count;
unsignednum_vs_blit_sgprs;
uint32_t
vs_blit_sh_data[SI_VS_BLIT_SGPRS_POS_TEXCOORD];
uint32_tcs_user_data[4];
 
/* Vertex and index buffers. */
boolvertex_buffers_dirty;
boolvertex_buffer_pointer_dirty;
struct pipe_vertex_buffer   vertex_buffer[SI_NUM_VERTEX_BUFFERS];
+   uint16_tvertex_buffer_unaligned; /* bitmask of 
not dword-aligned buffers */
 
/* MSAA config state. */
int ps_iter_samples;
boolps_uses_fbfetch;
boolsmoothing_enabled;
 
/* DB render state. */
unsignedps_db

[Mesa-dev] [PATCH 1/3] amd/common: add ac_build_opencoded_fetch_format

2019-04-25 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Implement software emulation of buffer_load_format for all types required
by vertex buffer fetches.
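
For readers who want a reference to compare against, below is a CPU-side sketch in C of the small-float conversion that such an emulation needs for R11G11B10_FLOAT. It is an assumption-laden illustration, not the patch's code: the function name is hypothetical, it runs on the CPU instead of building LLVM IR, and it normalizes denormals with a simple loop where the patch uses a count-leading-zeros intrinsic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU reference: convert an unsigned small float (e.g. 5 exponent
 * bits and 6 or 5 mantissa bits for the 11- and 10-bit formats) to f32. */
static float ufN_to_float_ref(uint32_t src, unsigned exp_bits, unsigned mant_bits)
{
   assert(exp_bits >= 2 && exp_bits <= 7 && mant_bits >= 1 && mant_bits <= 23);

   const uint32_t mant_mask = (1u << mant_bits) - 1;
   const uint32_t exp_max = (1u << exp_bits) - 1;
   const uint32_t bias = (1u << (exp_bits - 1)) - 1;
   const unsigned mant_shift = 23 - mant_bits;

   uint32_t mantissa = src & mant_mask;
   uint32_t exponent = (src >> mant_bits) & exp_max;
   uint32_t bits;

   if (exponent == exp_max) {
      /* Inf or NaN: all-ones f32 exponent, mantissa kept. */
      bits = (0xffu << 23) | (mantissa << mant_shift);
   } else if (exponent != 0) {
      /* Normal number: shift the mantissa and rebias the exponent. */
      bits = ((exponent + 127 - bias) << 23) | (mantissa << mant_shift);
   } else if (mantissa != 0) {
      /* Denormal: shift until the leading 1 becomes the implicit bit. */
      unsigned shift = 0;
      while (!(mantissa & (1u << mant_bits))) {
         mantissa <<= 1;
         shift++;
      }
      bits = ((127 - bias + 1 - shift) << 23) |
             ((mantissa & mant_mask) << mant_shift);
   } else {
      bits = 0;
   }

   float result;
   memcpy(&result, &bits, sizeof(result));
   return result;
}
```

The three non-zero branches correspond to the naninf, normal and denormal values that the IR version computes unconditionally and then picks between with selects.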
---
 src/amd/common/ac_llvm_build.c | 313 +
 src/amd/common/ac_llvm_build.h |  30 
 2 files changed, 343 insertions(+)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 4fdf73c99ba..197c58a8e45 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -1667,20 +1667,333 @@ ac_build_tbuffer_load_byte(struct ac_llvm_context *ctx,
 
res = ac_build_raw_tbuffer_load(ctx, rsrc, voffset, soffset,
immoffset, 1, dfmt, nfmt, glc, 
false,
false);
 
res = LLVMBuildTrunc(ctx->builder, res, ctx->i8, "");
}
 
return res;
 }
+
+/**
+ * Convert an 11- or 10-bit unsigned floating point number to an f32.
+ *
+ * The input exponent is expected to be biased analogous to IEEE-754, i.e. by
+ * 2^(exp_bits-1) - 1 (as defined in OpenGL and other graphics APIs).
+ */
+static LLVMValueRef
+ac_ufN_to_float(struct ac_llvm_context *ctx, LLVMValueRef src, unsigned 
exp_bits, unsigned mant_bits)
+{
+   assert(LLVMTypeOf(src) == ctx->i32);
+
+   LLVMValueRef tmp;
+   LLVMValueRef mantissa;
+   mantissa = LLVMBuildAnd(ctx->builder, src, LLVMConstInt(ctx->i32, (1 << 
mant_bits) - 1, false), "");
+
+   /* Converting normal numbers is just a shift + correcting the exponent 
bias */
+   unsigned normal_shift = 23 - mant_bits;
+   unsigned bias_shift = 127 - ((1 << (exp_bits - 1)) - 1);
+   LLVMValueRef shifted, normal;
+
+   shifted = LLVMBuildShl(ctx->builder, src, LLVMConstInt(ctx->i32, 
normal_shift, false), "");
+   normal = LLVMBuildAdd(ctx->builder, shifted, LLVMConstInt(ctx->i32, 
bias_shift << 23, false), "");
+
+   /* Converting nan/inf numbers is the same, but with a different 
exponent update */
+   LLVMValueRef naninf;
+   naninf = LLVMBuildOr(ctx->builder, normal, LLVMConstInt(ctx->i32, 0xff 
<< 23, false), "");
+
+   /* Converting denormals is the complex case: determine the leading 
zeros of the
+* mantissa to obtain the correct shift for the mantissa and exponent 
correction.
+*/
+   LLVMValueRef denormal;
+   LLVMValueRef params[2] = {
+   mantissa,
+   ctx->i1true, /* result can be undef when arg is 0 */
+   };
+   LLVMValueRef ctlz = ac_build_intrinsic(ctx, "llvm.ctlz.i32", ctx->i32,
+ params, 2, AC_FUNC_ATTR_READNONE);
+
+   /* Shift such that the leading 1 ends up as the LSB of the exponent 
field. */
+   tmp = LLVMBuildSub(ctx->builder, ctlz, LLVMConstInt(ctx->i32, 8, 
false), "");
+   denormal = LLVMBuildShl(ctx->builder, mantissa, tmp, "");
+
+   unsigned denormal_exp = bias_shift + (32 - mant_bits) - 1;
+   tmp = LLVMBuildSub(ctx->builder, LLVMConstInt(ctx->i32, denormal_exp, 
false), ctlz, "");
+   tmp = LLVMBuildShl(ctx->builder, tmp, LLVMConstInt(ctx->i32, 23, 
false), "");
+   denormal = LLVMBuildAdd(ctx->builder, denormal, tmp, "");
+
+   /* Select the final result. */
+   LLVMValueRef result;
+
+   tmp = LLVMBuildICmp(ctx->builder, LLVMIntUGE, src,
+   LLVMConstInt(ctx->i32, ((1 << exp_bits) - 1) << 
mant_bits, false), "");
+   result = LLVMBuildSelect(ctx->builder, tmp, naninf, normal, "");
+
+   tmp = LLVMBuildICmp(ctx->builder, LLVMIntUGE, src,
+   LLVMConstInt(ctx->i32, 1 << mant_bits, false), "");
+   result = LLVMBuildSelect(ctx->builder, tmp, result, denormal, "");
+
+   tmp = LLVMBuildICmp(ctx->builder, LLVMIntNE, src, ctx->i32_0, "");
+   result = LLVMBuildSelect(ctx->builder, tmp, result, ctx->i32_0, "");
+
+   return ac_to_float(ctx, result);
+}
+
+/**
+ * Generate a fully general open coded buffer format fetch with all required
+ * fixups suitable for vertex fetch, using non-format buffer loads.
+ *
+ * Some combinations of argument values have special interpretations:
+ * - size = 8 bytes, format = fixed indicates PIPE_FORMAT_R11G11B10_FLOAT
+ * - size = 8 bytes, format != {float,fixed} indicates a 2_10_10_10 data format
+ *
+ * \param log_size log(size of channel in bytes)
+ * \param num_channels number of channels (1 to 4)
+ * \param format AC_FETCH_FORMAT_xxx value
+ * \param reverse whether XYZ channels are reversed
+ * \param known_aligned whether the source is known to be aligned to hardware's
+ *  effective element size for loa

Re: [Mesa-dev] [PATCH 1/8] radeonsi: add si_debug_options for convenient adding/removing of options

2019-04-25 Thread Nicolai Hähnle

On 25.04.19 04:45, Marek Olšák wrote:
[snip]

-       bool                            clear_db_cache_before_clear;
         bool                            has_msaa_sample_loc_bug;
         bool                            has_ls_vgpr_init_bug;
         bool                            has_dcc_constant_encode;
         bool                            dpbb_allowed;
         bool                            dfsm_allowed;
         bool                            llvm_has_working_vgpr_indexing;

+       struct {
+#define OPT_BOOL(name, dflt, description) uint8_t name:1;


Why not bool instead of uint8_t?


Sure.



+#include "si_debug_options.inc"


Why not use the .h file extension?


The original intention was to distinguish it from header files where 
including is supposed to be idempotent, i.e. including the file a second time 
makes no difference. Anyway, I'm changing it.
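
To illustrate why such a file is not idempotent, here is a minimal single-file sketch of the pattern in C. It is only a sketch under a few assumptions: the option list is kept in a macro instead of a separate file so that the example is self-contained, and the struct and function names are hypothetical. Every expansion of the list generates different code depending on how OPT_BOOL is defined at that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for the contents of the options file; in the patch the list
 * lives in its own file and is pulled in with #include. */
#define DEBUG_OPTIONS(OPT_BOOL) \
   OPT_BOOL(clear_db_cache_before_clear, false, "Clear DB cache before fast depth clear") \
   OPT_BOOL(enable_nir, false, "Enable NIR")

/* First expansion: one bitfield member per option. */
struct options_sketch {
#define DECLARE_FIELD(name, dflt, description) bool name:1;
   DEBUG_OPTIONS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

/* Second expansion: assign the defaults. */
static void options_init(struct options_sketch *opts)
{
   assert(opts != NULL);
#define SET_DEFAULT(name, dflt, description) opts->name = (dflt);
   DEBUG_OPTIONS(SET_DEFAULT)
#undef SET_DEFAULT
}

/* Third expansion: set an option by its string name. */
static bool options_set(struct options_sketch *opts, const char *key, bool value)
{
#define MATCH(name, dflt, description) \
   if (!strcmp(key, #name)) { opts->name = value; return true; }
   DEBUG_OPTIONS(MATCH)
#undef MATCH
   return false;
}
```

Adding or removing an option then only touches the list itself, which is the convenience the patch is after.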


Cheers,
Nicolai




Other than those, this is:

Reviewed-by: Marek Olšák <marek.ol...@amd.com>

Marek




[Mesa-dev] [PATCH 7/8] radeonsi: add radeonsi_aux_debug option for aux context debug dumps

2019-04-24 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Enabling this option will create ddebug-style dumps for the aux context,
except that instead of intercepting the pipe_context layer
we just dump the IB contents on flush.
---
 src/gallium/drivers/radeonsi/si_debug.c | 17 +
 .../drivers/radeonsi/si_debug_options.inc   |  1 +
 src/gallium/drivers/radeonsi/si_pipe.c  | 14 +-
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_debug.c 
b/src/gallium/drivers/radeonsi/si_debug.c
index 07de96057dc..9a4494a98fe 100644
--- a/src/gallium/drivers/radeonsi/si_debug.c
+++ b/src/gallium/drivers/radeonsi/si_debug.c
@@ -475,20 +475,37 @@ void si_auto_log_cs(void *data, struct u_log_context *log)
struct si_context *ctx = (struct si_context *)data;
si_log_cs(ctx, log, false);
 }
 
 void si_log_hw_flush(struct si_context *sctx)
 {
if (!sctx->log)
return;
 
si_log_cs(sctx, sctx->log, true);
+
+   if (&sctx->b == sctx->screen->aux_context) {
+   /* The aux context isn't captured by the ddebug wrapper,
+* so we dump it on a flush-by-flush basis here.
+*/
+   FILE *f = dd_get_debug_file(false);
+   if (!f) {
+   fprintf(stderr, "radeonsi: error opening aux context 
dump file.\n");
+   } else {
+   dd_write_header(f, &sctx->screen->b, 0);
+
+   fprintf(f, "Aux context dump:\n\n");
+   u_log_new_page_print(sctx->log, f);
+
+   fclose(f);
+   }
+   }
 }
 
 static const char *priority_to_string(enum radeon_bo_priority priority)
 {
 #define ITEM(x) [RADEON_PRIO_##x] = #x
static const char *table[64] = {
ITEM(FENCE),
ITEM(TRACE),
ITEM(SO_FILLED_SIZE),
ITEM(QUERY),
diff --git a/src/gallium/drivers/radeonsi/si_debug_options.inc 
b/src/gallium/drivers/radeonsi/si_debug_options.inc
index 165dba8baf5..f4c3e19ed95 100644
--- a/src/gallium/drivers/radeonsi/si_debug_options.inc
+++ b/src/gallium/drivers/radeonsi/si_debug_options.inc
@@ -1,4 +1,5 @@
 OPT_BOOL(clear_db_cache_before_clear, false, "Clear DB cache before fast depth 
clear")
 OPT_BOOL(enable_nir, false, "Enable NIR")
+OPT_BOOL(aux_debug, false, "Generate ddebug_dumps for the auxiliary context")
 
 #undef OPT_BOOL
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 9f8bd2039ee..10566a9b8d5 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -718,20 +718,26 @@ static void si_destroy_screen(struct pipe_screen* pscreen)
}
}
mtx_destroy(&sscreen->shader_parts_mutex);
si_destroy_shader_cache(sscreen);
 
si_destroy_perfcounters(sscreen);
si_gpu_load_kill_thread(sscreen);
 
mtx_destroy(&sscreen->gpu_load_mutex);
mtx_destroy(&sscreen->aux_context_lock);
+   struct u_log_context *aux_log = ((struct si_context 
*)sscreen->aux_context)->log;
+   if (aux_log) {
+   sscreen->aux_context->set_log_context(sscreen->aux_context, 
NULL);
+   u_log_context_destroy(aux_log);
+   FREE(aux_log);
+   }
sscreen->aux_context->destroy(sscreen->aux_context);
 
slab_destroy_parent(&sscreen->pool_transfers);
 
disk_cache_destroy(sscreen->disk_shader_cache);
sscreen->ws->destroy(sscreen->ws);
FREE(sscreen);
 }
 
 static void si_init_gs_info(struct si_screen *sscreen)
@@ -1176,21 +1182,27 @@ struct pipe_screen *radeonsi_screen_create(struct 
radeon_winsys *ws,
sscreen->eqaa_force_color_samples = f;
}
}
 
for (i = 0; i < num_comp_hi_threads; i++)
si_init_compiler(sscreen, &sscreen->compiler[i]);
for (i = 0; i < num_comp_lo_threads; i++)
si_init_compiler(sscreen, &sscreen->compiler_lowp[i]);
 
/* Create the auxiliary context. This must be done last. */
-   sscreen->aux_context = si_create_context(&sscreen->b, 0);
+   sscreen->aux_context = si_create_context(
+   &sscreen->b, sscreen->options.aux_debug ? PIPE_CONTEXT_DEBUG : 0);
+   if (sscreen->options.aux_debug) {
+   struct u_log_context *log = CALLOC_STRUCT(u_log_context);
+   u_log_context_init(log);
+   sscreen->aux_context->set_log_context(sscreen->aux_context, 
log);
+   }
 
if (sscreen->debug_flags & DBG(TEST_DMA))
si_test_dma(sscreen);
 
if (sscreen->debug_flags & DBG(TEST_DMA_PERF)) {
si_test_dma_perf(sscreen);
}
 
if (sscreen->debug_flags & (DBG(TEST_VMFAULT_CP) |

[Mesa-dev] [PATCH 6/8] ddebug: expose some helper functions as non-inline

2019-04-24 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/auxiliary/driver_ddebug/dd_draw.c | 63 +-
 src/gallium/auxiliary/driver_ddebug/dd_util.h | 66 +++
 2 files changed, 70 insertions(+), 59 deletions(-)

diff --git a/src/gallium/auxiliary/driver_ddebug/dd_draw.c 
b/src/gallium/auxiliary/driver_ddebug/dd_draw.c
index 98e7a6bb99f..eef44a7c348 100644
--- a/src/gallium/auxiliary/driver_ddebug/dd_draw.c
+++ b/src/gallium/auxiliary/driver_ddebug/dd_draw.c
@@ -33,22 +33,83 @@
 #include "util/u_helpers.h"
 #include "util/u_inlines.h"
 #include "util/u_memory.h"
 #include "util/u_process.h"
 #include "tgsi/tgsi_parse.h"
 #include "tgsi/tgsi_scan.h"
 #include "util/os_time.h"
 #include 
 #include "pipe/p_config.h"
 
+void
+dd_get_debug_filename_and_mkdir(char *buf, size_t buflen, bool verbose)
+{
+   static unsigned index;
+   char proc_name[128], dir[256];
 
-static void
+   if (!os_get_process_name(proc_name, sizeof(proc_name))) {
+  fprintf(stderr, "dd: can't get the process name\n");
+  strcpy(proc_name, "unknown");
+   }
+
+   util_snprintf(dir, sizeof(dir), "%s/"DD_DIR, debug_get_option("HOME", "."));
+
+   if (mkdir(dir, 0774) && errno != EEXIST)
+  fprintf(stderr, "dd: can't create a directory (%i)\n", errno);
+
+   util_snprintf(buf, buflen, "%s/%s_%u_%08u", dir, proc_name, getpid(),
+ p_atomic_inc_return(&index) - 1);
+
+   if (verbose)
+  fprintf(stderr, "dd: dumping to file %s\n", buf);
+}
+
+FILE *
+dd_get_debug_file(bool verbose)
+{
+   char name[512];
+   FILE *f;
+
+   dd_get_debug_filename_and_mkdir(name, sizeof(name), verbose);
+   f = fopen(name, "w");
+   if (!f) {
+  fprintf(stderr, "dd: can't open file %s\n", name);
+  return NULL;
+   }
+
+   return f;
+}
+
+void
+dd_parse_apitrace_marker(const char *string, int len, unsigned *call_number)
+{
+   unsigned num;
+   char *s;
+
+   if (len <= 0)
+  return;
+
+   /* Make it zero-terminated. */
+   s = alloca(len + 1);
+   memcpy(s, string, len);
+   s[len] = 0;
+
+   /* Parse the number. */
+   errno = 0;
+   num = strtol(s, NULL, 10);
+   if (errno)
+  return;
+
+   *call_number = num;
+}
+
+void
 dd_write_header(FILE *f, struct pipe_screen *screen, unsigned 
apitrace_call_number)
 {
char cmd_line[4096];
if (os_get_command_line(cmd_line, sizeof(cmd_line)))
   fprintf(f, "Command: %s\n", cmd_line);
fprintf(f, "Driver vendor: %s\n", screen->get_vendor(screen));
fprintf(f, "Device vendor: %s\n", screen->get_device_vendor(screen));
fprintf(f, "Device name: %s\n\n", screen->get_name(screen));
 
if (apitrace_call_number)
diff --git a/src/gallium/auxiliary/driver_ddebug/dd_util.h 
b/src/gallium/auxiliary/driver_ddebug/dd_util.h
index 20aca94cc67..d3a1a36af62 100644
--- a/src/gallium/auxiliary/driver_ddebug/dd_util.h
+++ b/src/gallium/auxiliary/driver_ddebug/dd_util.h
@@ -44,73 +44,23 @@
 #elif defined(PIPE_OS_WINDOWS)
 #include 
 #include 
 #define mkdir(dir, mode) _mkdir(dir)
 #endif
 
 
 /* name of the directory in home */
 #define DD_DIR "ddebug_dumps"
 
-static inline void
-dd_get_debug_filename_and_mkdir(char *buf, size_t buflen, bool verbose)
-{
-   static unsigned index;
-   char proc_name[128], dir[256];
+void
+dd_get_debug_filename_and_mkdir(char *buf, size_t buflen, bool verbose);
 
-   if (!os_get_process_name(proc_name, sizeof(proc_name))) {
-  fprintf(stderr, "dd: can't get the process name\n");
-  strcpy(proc_name, "unknown");
-   }
+FILE *
+dd_get_debug_file(bool verbose);
 
-   util_snprintf(dir, sizeof(dir), "%s/"DD_DIR, debug_get_option("HOME", "."));
+void
+dd_parse_apitrace_marker(const char *string, int len, unsigned *call_number);
 
-   if (mkdir(dir, 0774) && errno != EEXIST)
-  fprintf(stderr, "dd: can't create a directory (%i)\n", errno);
-
-   util_snprintf(buf, buflen, "%s/%s_%u_%08u", dir, proc_name, getpid(),
- p_atomic_inc_return(&index) - 1);
-
-   if (verbose)
-  fprintf(stderr, "dd: dumping to file %s\n", buf);
-}
-
-static inline FILE *
-dd_get_debug_file(bool verbose)
-{
-   char name[512];
-   FILE *f;
-
-   dd_get_debug_filename_and_mkdir(name, sizeof(name), verbose);
-   f = fopen(name, "w");
-   if (!f) {
-  fprintf(stderr, "dd: can't open file %s\n", name);
-  return NULL;
-   }
-
-   return f;
-}
-
-static inline void
-dd_parse_apitrace_marker(const char *string, int len, unsigned *call_number)
-{
-   unsigned num;
-   char *s;
-
-   if (len <= 0)
-  return;
-
-   /* Make it zero-terminated. */
-   s = alloca(len + 1);
-   memcpy(s, string, len);
-   s[len] = 0;
-
-   /* Parse the number. */
-   errno = 0;
-   num = strtol(s, NULL, 10);
-   if (errno)
-  return;
-
-   *call_number = num;
-}
+void
+dd_write_header(FILE *f, struct pipe_screen *screen, unsigned 
apitrace_call_number);
 
 #endif /* DD_UTIL_H */
-- 
2.20.1


[Mesa-dev] [PATCH 4/8] ddebug: log calls to pipe->flush

2019-04-24 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This can be useful when internal draws lead to a hang.
---
 src/gallium/auxiliary/driver_ddebug/dd_draw.c | 75 ++-
 src/gallium/auxiliary/driver_ddebug/dd_pipe.h |  6 ++
 2 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/src/gallium/auxiliary/driver_ddebug/dd_draw.c 
b/src/gallium/auxiliary/driver_ddebug/dd_draw.c
index 4eb0dd096f4..bda1891c49b 100644
--- a/src/gallium/auxiliary/driver_ddebug/dd_draw.c
+++ b/src/gallium/auxiliary/driver_ddebug/dd_draw.c
@@ -275,20 +275,27 @@ dd_dump_shader(struct dd_draw_state *dstate, enum 
pipe_shader_type sh, FILE *f)
for (i = 0; i < PIPE_MAX_SHADER_BUFFERS; i++)
   if (dstate->shader_buffers[sh][i].buffer) {
      DUMP_I(shader_buffer, &dstate->shader_buffers[sh][i], i);
      if (dstate->shader_buffers[sh][i].buffer)
         DUMP_M(resource, &dstate->shader_buffers[sh][i], buffer);
   }
 
fprintf(f, COLOR_SHADER "end shader: %s" COLOR_RESET "\n\n", 
shader_str[sh]);
 }
 
+static void
+dd_dump_flush(struct dd_draw_state *dstate, struct call_flush *info, FILE *f)
+{
+   fprintf(f, "%s:\n", __func__+8);
+   DUMP_M(hex, info, flags);
+}
+
 static void
 dd_dump_draw_vbo(struct dd_draw_state *dstate, struct pipe_draw_info *info, 
FILE *f)
 {
int sh, i;
 
DUMP(draw_info, info);
if (info->count_from_stream_output)
   DUMP_M(stream_output_target, info,
  count_from_stream_output);
if (info->indirect) {
@@ -550,20 +557,23 @@ dd_dump_driver_state(struct dd_context *dctx, FILE *f, 
unsigned flags)
 "***\n");
   fprintf(f, "Driver-specific state:\n\n");
   dctx->pipe->dump_debug_state(dctx->pipe, f, flags);
}
 }
 
 static void
 dd_dump_call(FILE *f, struct dd_draw_state *state, struct dd_call *call)
 {
switch (call->type) {
+   case CALL_FLUSH:
+  dd_dump_flush(state, &call->info.flush, f);
+  break;
case CALL_DRAW_VBO:
   dd_dump_draw_vbo(state, &call->info.draw_vbo.draw, f);
   break;
case CALL_LAUNCH_GRID:
   dd_dump_launch_grid(state, &call->info.launch_grid, f);
   break;
case CALL_RESOURCE_COPY_REGION:
   dd_dump_resource_copy_region(state,
&call->info.resource_copy_region, f);
   break;
@@ -621,20 +631,22 @@ dd_kill_process(void)
fprintf(stderr, "dd: Aborting the process...\n");
fflush(stdout);
fflush(stderr);
exit(1);
 }
 
 static void
 dd_unreference_copy_of_call(struct dd_call *dst)
 {
switch (dst->type) {
+   case CALL_FLUSH:
+  break;
case CALL_DRAW_VBO:
   
   pipe_so_target_reference(&dst->info.draw_vbo.draw.count_from_stream_output, NULL);
   pipe_resource_reference(&dst->info.draw_vbo.indirect.buffer, NULL);
   pipe_resource_reference(&dst->info.draw_vbo.indirect.indirect_draw_count, NULL);
   if (dst->info.draw_vbo.draw.index_size &&
   !dst->info.draw_vbo.draw.has_user_indices)
  pipe_resource_reference(&dst->info.draw_vbo.draw.index.resource, NULL);
   else
  dst->info.draw_vbo.draw.index.user = NULL;
   break;
@@ -1086,27 +1098,37 @@ dd_create_record(struct dd_context *dctx)
util_queue_fence_init(&record->driver_finished);
util_queue_fence_reset(&record->driver_finished);
 
dd_init_copy_of_draw_state(&record->draw_state);
dd_copy_draw_state(&record->draw_state.base, &dctx->draw_state);
 
return record;
 }
 
 static void
-dd_context_flush(struct pipe_context *_pipe,
- struct pipe_fence_handle **fence, unsigned flags)
+dd_add_record(struct dd_context *dctx, struct dd_draw_record *record)
 {
-   struct dd_context *dctx = dd_context(_pipe);
-   struct pipe_context *pipe = dctx->pipe;
+   mtx_lock(&dctx->mutex);
+   if (unlikely(dctx->num_records > 1)) {
+      dctx->api_stalled = true;
+      /* Since this is only a heuristic to prevent the API thread from getting
+       * too far ahead, we don't need a loop here. */
+      cnd_wait(&dctx->cond, &dctx->mutex);
+      dctx->api_stalled = false;
+   }
 
-   pipe->flush(pipe, fence, flags);
+   if (list_empty(&dctx->records))
+      cnd_signal(&dctx->cond);
+
+   list_addtail(&record->list, &dctx->records);
+   dctx->num_records++;
+   mtx_unlock(&dctx->mutex);
 }
 
 static void
 dd_before_draw(struct dd_context *dctx, struct dd_draw_record *record)
 {
struct dd_screen *dscreen = dd_screen(dctx->base.screen);
struct pipe_context *pipe = dctx->pipe;
struct pipe_screen *screen = dscreen->screen;
 
record->time_before = os_time_get_nano();
@@ -1118,35 +1140,21 @@ dd_before_draw(struct dd_context *dctx, struct 
dd_draw_record *record)
   } else {
  pipe->flush(pipe, >prev_bottom_of_pipe,
  PIPE_FLUSH_DEFERRED | PIPE_FLUSH_BOTTOM_OF_PIPE);
  pipe->flush(pipe, >top_of_pipe,
  PIPE_FLUSH_DEFERRED | 

[Mesa-dev] [PATCH 5/8] ddebug: dump driver state into a separate file

2019-04-24 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Due to asynchronous execution, it's not clear which of the draws the state
may refer to.

This also works around an issue encountered with radeonsi where dumping
the driver state itself caused a hang.
---
 src/gallium/auxiliary/driver_ddebug/dd_draw.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/gallium/auxiliary/driver_ddebug/dd_draw.c 
b/src/gallium/auxiliary/driver_ddebug/dd_draw.c
index bda1891c49b..98e7a6bb99f 100644
--- a/src/gallium/auxiliary/driver_ddebug/dd_draw.c
+++ b/src/gallium/auxiliary/driver_ddebug/dd_draw.c
@@ -981,36 +981,43 @@ dd_report_hang(struct dd_context *dctx)
 
   FILE *f = fopen(name, "w");
   if (!f) {
  fprintf(stderr, "fopen failed\n");
   } else {
  fprintf(stderr, "%s\n", name);
 
  dd_write_header(f, dscreen->screen, 
record->draw_state.base.apitrace_call_number);
  dd_write_record(f, record);
 
- if (!encountered_hang) {
-dd_dump_driver_state(dctx, f, PIPE_DUMP_DEVICE_STATUS_REGISTERS);
-dd_dump_dmesg(f);
- }
-
  fclose(f);
   }
 
   if (top_not_reached)
  stop_output = true;
   encountered_hang = true;
}
 
if (num_later)
   fprintf(stderr, "... and %u additional draws.\n", num_later);
 
+   char name[512];
+   dd_get_debug_filename_and_mkdir(name, sizeof(name), false);
+   FILE *f = fopen(name, "w");
+   if (!f) {
+  fprintf(stderr, "fopen failed\n");
+   } else {
+  dd_write_header(f, dscreen->screen, 0);
+  dd_dump_driver_state(dctx, f, PIPE_DUMP_DEVICE_STATUS_REGISTERS);
+  dd_dump_dmesg(f);
+  fclose(f);
+   }
+
fprintf(stderr, "\nDone.\n");
dd_kill_process();
 }
 
 int
 dd_thread_main(void *input)
 {
struct dd_context *dctx = (struct dd_context *)input;
struct dd_screen *dscreen = dd_screen(dctx->base.screen);
struct pipe_screen *screen = dscreen->screen;
-- 
2.20.1


[Mesa-dev] [PATCH 3/8] ddebug: set thread name

2019-04-24 Thread Nicolai Hähnle
From: Nicolai Hähnle 

For better debuggability.
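
The name is built as "<process>:ddbg", with the process name truncated so that the result fits the 16-byte buffer the patch uses (thread names on Linux are typically limited to 16 bytes including the terminating NUL). A stand-alone sketch of that truncation, using plain snprintf and a hypothetical helper name rather than the Mesa utilities from the diff:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define THREAD_NAME_MAX 16 /* buffer size used by the patch, NUL included */

/* Hypothetical helper: build "<process>:ddbg", truncating the process name
 * so that the suffix always survives. */
static void make_thread_name(char out[THREAD_NAME_MAX], const char *process_name)
{
   static const char suffix[] = ":ddbg";

   assert(out != NULL && process_name != NULL);

   /* sizeof(suffix) counts the NUL, so this leaves room for 10 name bytes. */
   size_t room = THREAD_NAME_MAX - sizeof(suffix);
   size_t len = strlen(process_name);

   snprintf(out, THREAD_NAME_MAX, "%.*s%s",
            (int)(len < room ? len : room), process_name, suffix);
}
```

The "%.*s" precision does the truncation, so no intermediate copy of the process name is needed.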
---
 src/gallium/auxiliary/driver_ddebug/dd_draw.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/gallium/auxiliary/driver_ddebug/dd_draw.c 
b/src/gallium/auxiliary/driver_ddebug/dd_draw.c
index f5b94356119..4eb0dd096f4 100644
--- a/src/gallium/auxiliary/driver_ddebug/dd_draw.c
+++ b/src/gallium/auxiliary/driver_ddebug/dd_draw.c
@@ -26,20 +26,21 @@
  **/
 
 #include "dd_pipe.h"
 
 #include "util/u_dump.h"
 #include "util/u_format.h"
 #include "util/u_framebuffer.h"
 #include "util/u_helpers.h"
 #include "util/u_inlines.h"
 #include "util/u_memory.h"
+#include "util/u_process.h"
 #include "tgsi/tgsi_parse.h"
 #include "tgsi/tgsi_scan.h"
 #include "util/os_time.h"
 #include 
 #include "pipe/p_config.h"
 
 
 static void
 dd_write_header(FILE *f, struct pipe_screen *screen, unsigned 
apitrace_call_number)
 {
@@ -995,20 +996,29 @@ dd_report_hang(struct dd_context *dctx)
dd_kill_process();
 }
 
 int
 dd_thread_main(void *input)
 {
struct dd_context *dctx = (struct dd_context *)input;
struct dd_screen *dscreen = dd_screen(dctx->base.screen);
struct pipe_screen *screen = dscreen->screen;
 
+   const char *process_name = util_get_process_name();
+   if (process_name) {
+  char threadname[16];
+  util_snprintf(threadname, sizeof(threadname), "%.*s:ddbg",
+(int)MIN2(strlen(process_name), sizeof(threadname) - 6),
+process_name);
+  u_thread_setname(threadname);
+   }
+
mtx_lock(&dctx->mutex);
 
for (;;) {
   struct list_head records;
   list_replace(&dctx->records, &records);
   list_inithead(&dctx->records);
   dctx->num_records = 0;
 
   if (dctx->api_stalled)
  cnd_signal(&dctx->cond);
-- 
2.20.1


[Mesa-dev] [PATCH 0/8] ddebug, radeonsi: misc changes to help debugging

2019-04-24 Thread Nicolai Hähnle
Hi folks,

this is a collection of assorted patches that should help with driver
debugging:

- add driconf-style debug options in a convenient way
- some minor ddebug cleanups
- allow dumping aux context command streams
- allow force-syncing of compile threads

Please review!

Thanks,
Nicolai
--
 .../auxiliary/driver_ddebug/dd_draw.c| 165 ++---
 .../auxiliary/driver_ddebug/dd_pipe.h|   6 +
 .../auxiliary/driver_ddebug/dd_util.h|  66 +--
 src/gallium/auxiliary/util/u_log.c   |   4 +
 .../drivers/radeonsi/driinfo_radeonsi.h  |  12 +-
 src/gallium/drivers/radeonsi/si_clear.c  |   2 +-
 src/gallium/drivers/radeonsi/si_debug.c  |  17 ++
 .../drivers/radeonsi/si_debug_options.inc|   6 +
 src/gallium/drivers/radeonsi/si_get.c|   6 +-
 src/gallium/drivers/radeonsi/si_pipe.c   |  36 +++-
 src/gallium/drivers/radeonsi/si_pipe.h   |   7 +-
 .../drivers/radeonsi/si_state_shaders.c  |  13 +-
 src/util/merge_driinfo.py|  58 +-
 src/util/xmlpool/t_options.h |   9 -
 14 files changed, 288 insertions(+), 119 deletions(-)



[Mesa-dev] [PATCH 8/8] radeonsi: add radeonsi_sync_compile option

2019-04-24 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Force the driver thread to sync immediately with a compiler thread (but
compilation still happens in a separate thread).

This can be useful to simplify debugging compiler issues.
---
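To illustrate the intended behaviour, here is a minimal sketch using plain
pthreads. The job still runs on its own thread, but the caller blocks on the
fence right after scheduling when the option is set. struct fence,
struct compile_job and schedule_compile are hypothetical stand-ins and not
the util_queue API used in the patch.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for util_queue_fence: signalled once the job is done. */
struct fence {
   pthread_mutex_t mutex;
   pthread_cond_t cond;
   bool signalled;
};

struct compile_job {
   struct fence ready;
   pthread_t thread;
   int input;
   int result; /* written by the compiler thread */
};

static void fence_init(struct fence *f)
{
   pthread_mutex_init(&f->mutex, NULL);
   pthread_cond_init(&f->cond, NULL);
   f->signalled = false;
}

static void fence_signal(struct fence *f)
{
   pthread_mutex_lock(&f->mutex);
   f->signalled = true;
   pthread_cond_broadcast(&f->cond);
   pthread_mutex_unlock(&f->mutex);
}

static void fence_wait(struct fence *f)
{
   pthread_mutex_lock(&f->mutex);
   while (!f->signalled)
      pthread_cond_wait(&f->cond, &f->mutex);
   pthread_mutex_unlock(&f->mutex);
}

static void *compile_thread(void *data)
{
   struct compile_job *job = data;
   job->result = job->input * 2; /* placeholder for the real compilation */
   fence_signal(&job->ready);
   return NULL;
}

/* Starts the job on a separate thread. With sync_compile set, the caller
 * does not return before the job has signalled its fence. */
static bool schedule_compile(struct compile_job *job, bool sync_compile)
{
   fence_init(&job->ready);
   if (pthread_create(&job->thread, NULL, compile_thread, job) != 0)
      return false;
   if (sync_compile)
      fence_wait(&job->ready);
   return true;
}

/* Waits for the job (a no-op if it already finished) and reaps the thread. */
static void finish_compile(struct compile_job *job)
{
   fence_wait(&job->ready);
   pthread_join(job->thread, NULL);
}
```

With sync_compile set, job->result is valid as soon as schedule_compile
returns, which is what makes compiler issues easier to correlate with the
API call that triggered them.
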
 src/gallium/drivers/radeonsi/si_debug_options.inc |  1 +
 src/gallium/drivers/radeonsi/si_state_shaders.c   | 13 ++---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_debug_options.inc 
b/src/gallium/drivers/radeonsi/si_debug_options.inc
index f4c3e19ed95..019256ca1d1 100644
--- a/src/gallium/drivers/radeonsi/si_debug_options.inc
+++ b/src/gallium/drivers/radeonsi/si_debug_options.inc
@@ -1,5 +1,6 @@
 OPT_BOOL(clear_db_cache_before_clear, false, "Clear DB cache before fast depth 
clear")
 OPT_BOOL(enable_nir, false, "Enable NIR")
 OPT_BOOL(aux_debug, false, "Generate ddebug_dumps for the auxiliary context")
+OPT_BOOL(sync_compile, false, "Always compile synchronously (will cause 
stalls)")
 
 #undef OPT_BOOL
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 5bdfd4f6ac1..f57e7730905 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1945,20 +1945,24 @@ current_not_ready:
sel->first_variant = shader;
sel->last_variant = shader;
} else {
sel->last_variant->next_variant = shader;
sel->last_variant = shader;
}
 
/* Use the default (unoptimized) shader for now. */
memset(&key->opt, 0, sizeof(key->opt));
mtx_unlock(&sel->mutex);
+
+   if (sscreen->options.sync_compile)
+   util_queue_fence_wait(&shader->ready);
+
goto again;
}
 
/* Reset the fence before adding to the variant list. */
util_queue_fence_reset(&shader->ready);
 
if (!sel->last_variant) {
sel->first_variant = shader;
sel->last_variant = shader;
} else {
@@ -2157,38 +2161,41 @@ static void si_init_shader_selector_async(void *job, 
int thread_index)
 }
 
 void si_schedule_initial_compile(struct si_context *sctx, unsigned processor,
 struct util_queue_fence *ready_fence,
 struct si_compiler_ctx_state 
*compiler_ctx_state,
 void *job, util_queue_execute_func execute)
 {
util_queue_fence_init(ready_fence);
 
struct util_async_debug_callback async_debug;
-   bool wait =
+   bool debug =
(sctx->debug.debug_message && !sctx->debug.async) ||
sctx->is_debug ||
si_can_dump_shader(sctx->screen, processor);
 
-   if (wait) {
+   if (debug) {
u_async_debug_init(&async_debug);
compiler_ctx_state->debug = async_debug.base;
}
 
util_queue_add_job(&sctx->screen->shader_compiler_queue, job,
   ready_fence, execute, NULL);
 
-   if (wait) {
+   if (debug) {
util_queue_fence_wait(ready_fence);
u_async_debug_drain(&async_debug, &sctx->debug);
u_async_debug_cleanup(&async_debug);
}
+
+   if (sctx->screen->options.sync_compile)
+   util_queue_fence_wait(ready_fence);
 }
 
 /* Return descriptor slot usage masks from the given shader info. */
 void si_get_active_slot_masks(const struct tgsi_shader_info *info,
  uint32_t *const_and_shader_buffers,
  uint64_t *samplers_and_images)
 {
unsigned start, num_shaderbufs, num_constbufs, num_images, num_samplers;
 
num_shaderbufs = util_last_bit(info->shader_buffers_declared);
-- 
2.20.1


[Mesa-dev] [PATCH 2/8] util/u_log: flush auto loggers before starting a new page

2019-04-24 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Without this, command stream dumps of radeonsi may misleadingly end up
in a later page.
---
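The problem can be reproduced with a toy model of a paged log that has one
deferred ("auto") logger. Everything below (struct mini_log and the
mini_log_* and take_pending functions) is a hypothetical miniature written
for this note and not the u_log implementation.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PAGE_CAPACITY 256

/* Hypothetical miniature of a paged log with one deferred logger. */
struct mini_log {
   char page[PAGE_CAPACITY];           /* text of the current page */
   const char *(*auto_logger)(void *); /* produces pending text on demand */
   void *auto_logger_data;
};

static void mini_log_append(struct mini_log *log, const char *text)
{
   strncat(log->page, text, sizeof(log->page) - strlen(log->page) - 1);
}

/* Pull pending output from the auto logger into the current page. */
static void mini_log_flush(struct mini_log *log)
{
   if (log->auto_logger) {
      const char *pending = log->auto_logger(log->auto_logger_data);
      if (pending)
         mini_log_append(log, pending);
   }
}

/* Copy the finished page to out and start an empty one. Flushing first
 * keeps deferred output on the page it belongs to. */
static void mini_log_new_page(struct mini_log *log, char *out, size_t out_size)
{
   mini_log_flush(log);
   snprintf(out, out_size, "%s", log->page);
   log->page[0] = '\0';
}

/* Example auto logger: hands out the string stored in *data exactly once. */
static const char *take_pending(void *data)
{
   const char **slot = data;
   const char *text = *slot;
   *slot = NULL;
   return text;
}
```

Dropping the mini_log_flush call from mini_log_new_page makes the pending
text show up only on whichever later page happens to be flushed, which is
the misleading behaviour described above.
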
 src/gallium/auxiliary/util/u_log.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/auxiliary/util/u_log.c 
b/src/gallium/auxiliary/util/u_log.c
index 90fd24ca394..095421edd06 100644
--- a/src/gallium/auxiliary/util/u_log.c
+++ b/src/gallium/auxiliary/util/u_log.c
@@ -180,35 +180,39 @@ u_log_chunk(struct u_log_context *ctx, const struct 
u_log_chunk_type *type,
 out_of_memory:
fprintf(stderr, "Gallium: u_log: out of memory\n");
 }
 
 /**
  * Convenience helper that starts a new page and prints the previous one.
  */
 void
 u_log_new_page_print(struct u_log_context *ctx, FILE *stream)
 {
+   u_log_flush(ctx);
+
if (ctx->cur) {
   u_log_page_print(ctx->cur, stream);
   u_log_page_destroy(ctx->cur);
   ctx->cur = NULL;
}
 }
 
 /**
  * Return the current page from the logging context and start a new one.
  *
  * The caller is responsible for destroying the returned page.
  */
 struct u_log_page *
 u_log_new_page(struct u_log_context *ctx)
 {
+   u_log_flush(ctx);
+
struct u_log_page *page = ctx->cur;
ctx->cur = NULL;
return page;
 }
 
 /**
  * Free all data associated with \p page.
  */
 void
 u_log_page_destroy(struct u_log_page *page)
-- 
2.20.1


[Mesa-dev] [PATCH 1/8] radeonsi: add si_debug_options for convenient adding/removing of options

2019-04-24 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Move the definition of radeonsi_clear_db_cache_before_clear there,
as well as radeonsi_enable_nir.

This removes the AMD_DEBUG=nir option.

We currently still have two places for options: the driconf machinery
and AMD_DEBUG/R600_DEBUG. If we are to have a single place for options,
then the driconf machinery should be preferred since it's more flexible.

The only downside of the driconf machinery was that adding new options
was quite inconvenient. With this change, a simple boolean option can
be added with a single line of code, same as for AMD_DEBUG.

One technical limitation of this particular implementation is that while
almost all driconf features are available, the translation machinery doesn't
pick up the description strings for options added in si_debug_options. In
practice, translations haven't been provided anyway, and this is intended
for developer options, so I'm not too worried. It could always be added
later if anybody really cares.
---
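For readers unfamiliar with the pattern: this is the usual X-macro trick,
where a single list of OPT_BOOL(...) lines is expanded several times with
different definitions of the macro. The sketch below is self-contained, so
it uses a list macro (DEBUG_OPTIONS) instead of re-including a .inc file,
and struct debug_options and the debug_options_* helpers are hypothetical
names chosen for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for si_debug_options.inc: one line per option. */
#define DEBUG_OPTIONS(OPT_BOOL) \
   OPT_BOOL(clear_db_cache_before_clear, false, "Clear DB cache before fast depth clear") \
   OPT_BOOL(enable_nir, false, "Enable NIR")

/* Expansion 1: one bool field per option. */
struct debug_options {
#define DECLARE_FIELD(name, dflt, description) bool name;
   DEBUG_OPTIONS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

/* Expansion 2: assign every option its default value. */
static void debug_options_init(struct debug_options *opts)
{
#define SET_DEFAULT(name, dflt, description) opts->name = dflt;
   DEBUG_OPTIONS(SET_DEFAULT)
#undef SET_DEFAULT
}

/* Expansion 3: set an option by its "radeonsi_"-prefixed name.
 * Returns false if no option with that name exists. */
static bool debug_options_set(struct debug_options *opts, const char *key,
                              bool value)
{
#define MATCH(name, dflt, description) \
   if (strcmp(key, "radeonsi_" #name) == 0) { \
      opts->name = value; \
      return true; \
   }
   DEBUG_OPTIONS(MATCH)
#undef MATCH
   return false;
}
```

Adding an option then means adding one OPT_BOOL line to the list; the struct
field, the default and the name lookup all follow from it.
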
 .../drivers/radeonsi/driinfo_radeonsi.h   | 12 +++-
 src/gallium/drivers/radeonsi/si_clear.c   |  2 +-
 .../drivers/radeonsi/si_debug_options.inc |  4 ++
 src/gallium/drivers/radeonsi/si_get.c |  6 +-
 src/gallium/drivers/radeonsi/si_pipe.c| 22 ---
 src/gallium/drivers/radeonsi/si_pipe.h|  7 ++-
 src/util/merge_driinfo.py | 58 +--
 src/util/xmlpool/t_options.h  |  9 ---
 8 files changed, 89 insertions(+), 31 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/driinfo_radeonsi.h 
b/src/gallium/drivers/radeonsi/driinfo_radeonsi.h
index edf8edba035..d92883b9c38 100644
--- a/src/gallium/drivers/radeonsi/driinfo_radeonsi.h
+++ b/src/gallium/drivers/radeonsi/driinfo_radeonsi.h
@@ -4,13 +4,21 @@ DRI_CONF_SECTION_QUALITY
 DRI_CONF_SECTION_END
 
 DRI_CONF_SECTION_PERFORMANCE
 DRI_CONF_RADEONSI_ENABLE_SISCHED("false")
 DRI_CONF_RADEONSI_ASSUME_NO_Z_FIGHTS("false")
 DRI_CONF_RADEONSI_COMMUTATIVE_BLEND_ADD("false")
 DRI_CONF_RADEONSI_ZERO_ALL_VRAM_ALLOCS("false")
 DRI_CONF_SECTION_END
 
 DRI_CONF_SECTION_DEBUG
-   DRI_CONF_RADEONSI_CLEAR_DB_CACHE_BEFORE_CLEAR("false")
-   DRI_CONF_RADEONSI_ENABLE_NIR("false")
+
+//= BEGIN VERBATIM
+#define OPT_BOOL(name, dflt, description) \
+   DRI_CONF_OPT_BEGIN_B(radeonsi_##name, #dflt) \
+   DRI_CONF_DESC(en, description) \
+   DRI_CONF_OPT_END
+
+#include "radeonsi/si_debug_options.inc"
+//= END VERBATIM
+
 DRI_CONF_SECTION_END
diff --git a/src/gallium/drivers/radeonsi/si_clear.c 
b/src/gallium/drivers/radeonsi/si_clear.c
index e1805f2a1c9..a4ebd5cf2a5 100644
--- a/src/gallium/drivers/radeonsi/si_clear.c
+++ b/src/gallium/drivers/radeonsi/si_clear.c
@@ -631,21 +631,21 @@ static void si_clear(struct pipe_context *ctx, unsigned 
buffers,
 * a coincidence and the root cause is elsewhere.
 *
 * The corruption can be fixed by putting the DB flush before
 * or after the depth clear. (surprisingly)
 *
 * https://bugs.freedesktop.org/show_bug.cgi?id=102955 
(apitrace)
 *
 * This hack decreases back-to-back ClearDepth performance.
 */
if ((sctx->db_depth_clear || sctx->db_stencil_clear) &&
-   sctx->screen->clear_db_cache_before_clear)
+   sctx->screen->options.clear_db_cache_before_clear)
sctx->flags |= SI_CONTEXT_FLUSH_AND_INV_DB;
}
 
si_blitter_begin(sctx, SI_CLEAR);
util_blitter_clear(sctx->blitter, fb->width, fb->height,
   util_framebuffer_get_num_layers(fb),
   buffers, color, depth, stencil);
si_blitter_end(sctx);
 
if (sctx->db_depth_clear) {
diff --git a/src/gallium/drivers/radeonsi/si_debug_options.inc 
b/src/gallium/drivers/radeonsi/si_debug_options.inc
new file mode 100644
index 000..165dba8baf5
--- /dev/null
+++ b/src/gallium/drivers/radeonsi/si_debug_options.inc
@@ -0,0 +1,4 @@
+OPT_BOOL(clear_db_cache_before_clear, false, "Clear DB cache before fast depth 
clear")
+OPT_BOOL(enable_nir, false, "Enable NIR")
+
+#undef OPT_BOOL
diff --git a/src/gallium/drivers/radeonsi/si_get.c 
b/src/gallium/drivers/radeonsi/si_get.c
index 67fbc50998b..fda71da16e6 100644
--- a/src/gallium/drivers/radeonsi/si_get.c
+++ b/src/gallium/drivers/radeonsi/si_get.c
@@ -203,21 +203,21 @@ static int si_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY:
case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
case PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY:
return !sscreen->info.has_unaligned_shader_loads;
 
case PIPE_CAP_SPARSE_BUFFE

Re: [Mesa-dev] [PATCH 1/2] radeonsi: always use compute rings for clover on CI and newer

2019-02-12 Thread Nicolai Hähnle

On 11.02.19 21:27, Marek Olšák wrote:

From: Marek Olšák 

Initialize all non-compute context functions to NULL.
---
  src/gallium/drivers/radeonsi/si_blit.c| 14 ++-
  src/gallium/drivers/radeonsi/si_clear.c   |  7 +-
  src/gallium/drivers/radeonsi/si_compute.c | 15 +--
  src/gallium/drivers/radeonsi/si_descriptors.c | 10 +-
  src/gallium/drivers/radeonsi/si_gfx_cs.c  | 29 +++---
  src/gallium/drivers/radeonsi/si_pipe.c| 95 +++
  src/gallium/drivers/radeonsi/si_pipe.h|  3 +-
  src/gallium/drivers/radeonsi/si_state.c   |  3 +-
  src/gallium/drivers/radeonsi/si_state.h   |  1 +
  src/gallium/drivers/radeonsi/si_state_draw.c  | 25 +++--
  src/gallium/drivers/radeonsi/si_texture.c |  3 +
  11 files changed, 130 insertions(+), 75 deletions(-)


[snip]

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 20767c806d2..98c4fabc741 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -381,61 +381,56 @@ static struct pipe_context *si_create_context(struct 
pipe_screen *screen,
  {
struct si_context *sctx = CALLOC_STRUCT(si_context);
struct si_screen* sscreen = (struct si_screen *)screen;
struct radeon_winsys *ws = sscreen->ws;
int shader, i;
bool stop_exec_on_failure = (flags & 
PIPE_CONTEXT_LOSE_CONTEXT_ON_RESET) != 0;
  
  	if (!sctx)

return NULL;
  
+	sctx->has_graphics = sscreen->info.chip_class >= CIK &&

+!(flags & PIPE_CONTEXT_COMPUTE_ONLY);


The logic seems backwards here for SI.

Cheers,
Nicolai




+
if (flags & PIPE_CONTEXT_DEBUG)
sscreen->record_llvm_ir = true; /* racy but not critical */
  
  	sctx->b.screen = screen; /* this must be set first */

sctx->b.priv = NULL;
sctx->b.destroy = si_destroy_context;
-   sctx->b.emit_string_marker = si_emit_string_marker;
-   sctx->b.set_debug_callback = si_set_debug_callback;
-   sctx->b.set_log_context = si_set_log_context;
-   sctx->b.set_context_param = si_set_context_param;
sctx->screen = sscreen; /* Easy accessing of screen/winsys. */
sctx->is_debug = (flags & PIPE_CONTEXT_DEBUG) != 0;
  
  	slab_create_child(>pool_transfers, >pool_transfers);

slab_create_child(>pool_transfers_unsync, 
>pool_transfers);
  
  	sctx->ws = sscreen->ws;

sctx->family = sscreen->info.family;
sctx->chip_class = sscreen->info.chip_class;
  
  	if (sscreen->info.has_gpu_reset_counter_query) {

sctx->gpu_reset_counter =
sctx->ws->query_value(sctx->ws, 
RADEON_GPU_RESET_COUNTER);
}
  
-	sctx->b.get_device_reset_status = si_get_reset_status;

-   sctx->b.set_device_reset_callback = si_set_device_reset_callback;
-
-   si_init_context_texture_functions(sctx);
-   si_init_query_functions(sctx);
  
  	if (sctx->chip_class == CIK ||

sctx->chip_class == VI ||
sctx->chip_class == GFX9) {
sctx->eop_bug_scratch = si_resource(
pipe_buffer_create(>b, 0, PIPE_USAGE_DEFAULT,
   16 * 
sscreen->info.num_render_backends));
if (!sctx->eop_bug_scratch)
goto fail;
}
  
+	/* Initialize context allocators. */

sctx->allocator_zeroed_memory =
u_suballocator_create(>b, 128 * 1024,
  0, PIPE_USAGE_DEFAULT,
  SI_RESOURCE_FLAG_UNMAPPABLE |
  SI_RESOURCE_FLAG_CLEAR, false);
if (!sctx->allocator_zeroed_memory)
goto fail;
  
  	sctx->b.stream_uploader = u_upload_create(>b, 1024 * 1024,

0, PIPE_USAGE_STREAM,
@@ -459,38 +454,22 @@ static struct pipe_context *si_create_context(struct 
pipe_screen *screen,
sctx->ctx = sctx->ws->ctx_create(sctx->ws);
if (!sctx->ctx)
goto fail;
  
  	if (sscreen->info.num_sdma_rings && !(sscreen->debug_flags & DBG(NO_ASYNC_DMA))) {

sctx->dma_cs = sctx->ws->cs_create(sctx->ctx, RING_DMA,
   (void*)si_flush_dma_cs,
   sctx, stop_exec_on_failure);
}
  
-	si_init_buffer_functions(sctx);

-   si_init_clear_functions(sctx);
-   si_init_blit_functions(sctx);
-   si_init_compute_functions(sctx);
-   si_init_compute_blit_functions(sctx);
-   si_init_debug_functions(sctx);
-   si_init_msaa_functions(sctx);
-   si_init_streamout_functions(sctx);
-
-   if (sscreen->info.has_hw_decode) {
-   sctx->b.create_video_codec = si_uvd_create_decoder;
-   sctx->b.create_video_buffer = si_video_buffer_create;
-   } else {
-   

Re: [Mesa-dev] [PATCH 2/2] radeonsi: use MEM instead of MEM_GRBM in COPY_DATA.DST_SEL

2019-02-12 Thread Nicolai Hähnle

Both patches:

Reviewed-by: Nicolai Hähnle 

On 11.02.19 21:26, Marek Olšák wrote:

From: Marek Olšák 

---
  src/gallium/drivers/radeonsi/si_perfcounter.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_perfcounter.c 
b/src/gallium/drivers/radeonsi/si_perfcounter.c
index d55394f2cba..4ce71f9500d 100644
--- a/src/gallium/drivers/radeonsi/si_perfcounter.c
+++ b/src/gallium/drivers/radeonsi/si_perfcounter.c
@@ -669,21 +669,21 @@ static void si_pc_emit_select(struct si_context *sctx,
  static void si_pc_emit_start(struct si_context *sctx,
 struct si_resource *buffer, uint64_t va)
  {
struct radeon_cmdbuf *cs = sctx->gfx_cs;
  
  	radeon_add_to_buffer_list(sctx, sctx->gfx_cs, buffer,

  RADEON_USAGE_WRITE, RADEON_PRIO_QUERY);
  
  	radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0));

radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_IMM) |
-   COPY_DATA_DST_SEL(COPY_DATA_DST_MEM_GRBM));
+   COPY_DATA_DST_SEL(COPY_DATA_DST_MEM));
radeon_emit(cs, 1); /* immediate */
radeon_emit(cs, 0); /* unused */
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
  
  	radeon_set_uconfig_reg(cs, R_036020_CP_PERFMON_CNTL,

   
S_036020_PERFMON_STATE(V_036020_DISABLE_AND_RESET));
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
radeon_emit(cs, EVENT_TYPE(V_028A90_PERFCOUNTER_START) | 
EVENT_INDEX(0));
radeon_set_uconfig_reg(cs, R_036020_CP_PERFMON_CNTL,
@@ -725,34 +725,34 @@ static void si_pc_emit_read(struct si_context *sctx,
if (!(regs->layout & SI_PC_FAKE)) {
if (regs->layout & SI_PC_REG_REVERSE)
reg_delta = -reg_delta;
  
  		for (idx = 0; idx < count; ++idx) {

if (regs->counters)
reg = regs->counters[idx];
  
  			radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0));

radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_PERF) |
-   
COPY_DATA_DST_SEL(COPY_DATA_DST_MEM_GRBM) |
+   COPY_DATA_DST_SEL(COPY_DATA_DST_MEM) |
COPY_DATA_COUNT_SEL); /* 64 bits */
radeon_emit(cs, reg >> 2);
radeon_emit(cs, 0); /* unused */
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
va += sizeof(uint64_t);
reg += reg_delta;
}
} else {
for (idx = 0; idx < count; ++idx) {
radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0));
radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_IMM) |
-   
COPY_DATA_DST_SEL(COPY_DATA_DST_MEM_GRBM) |
+   COPY_DATA_DST_SEL(COPY_DATA_DST_MEM) |
COPY_DATA_COUNT_SEL);
radeon_emit(cs, 0); /* immediate */
radeon_emit(cs, 0);
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
va += sizeof(uint64_t);
}
}
  }
  




Re: [Mesa-dev] [PATCH 4/4] radeonsi: use SDMA for uploading data through const_uploader

2019-02-11 Thread Nicolai Hähnle

On 07.02.19 02:22, Marek Olšák wrote:

+   bool use_sdma_upload = sscreen->info.has_dedicated_vram && sctx->dma_cs && 
debug_get_bool_option("SDMA", true);


Could you please namespace the environment variable, e.g. RADEONSI_SDMA?

Apart from that, series is

Reviewed-by: Nicolai Hähnle 
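
To make the namespacing suggestion concrete, something along these lines is
what I have in mind. This is only a sketch: get_bool_env is a hypothetical
helper loosely modelled on debug_get_bool_option (its parsing rules here are
an assumption, not the gallium implementation), and RADEONSI_SDMA is the
name suggested above.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parses a boolean environment variable and falls
 * back to dflt when the variable is unset or not recognized. */
static bool get_bool_env(const char *name, bool dflt)
{
   const char *value = getenv(name);

   if (!value)
      return dflt;
   if (!strcmp(value, "0") || !strcmp(value, "false") || !strcmp(value, "no"))
      return false;
   if (!strcmp(value, "1") || !strcmp(value, "true") || !strcmp(value, "yes"))
      return true;
   return dflt;
}

/* The driver prefix keeps the variable from clashing with other components
 * that might also want a switch called "SDMA". */
static bool use_sdma_upload_from_env(void)
{
   return get_bool_env("RADEONSI_SDMA", true);
}
```
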



+   sctx->b.const_uploader = u_upload_create(&sctx->b, 256 * 1024,
+0, PIPE_USAGE_DEFAULT,
+SI_RESOURCE_FLAG_32BIT |
+(use_sdma_upload ?
+ 
SI_RESOURCE_FLAG_UPLOAD_FLUSH_EXPLICIT_VIA_SDMA :
+ 
(sscreen->cpdma_prefetch_writes_memory ?
+  0 : 
SI_RESOURCE_FLAG_READ_ONLY)));
+   if (!sctx->b.const_uploader)
+   goto fail;
+
+   if (use_sdma_upload)
+   u_upload_enable_flush_explicit(sctx->b.const_uploader);
+
si_init_buffer_functions(sctx);
si_init_clear_functions(sctx);
si_init_blit_functions(sctx);
si_init_compute_functions(sctx);
si_init_compute_blit_functions(sctx);
si_init_debug_functions(sctx);
si_init_msaa_functions(sctx);
si_init_streamout_functions(sctx);
  
  	if (sscreen->info.has_hw_decode) {

diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index b01d5744752..b208bdeb848 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -103,20 +103,22 @@
  #define SI_MAX_VARIABLE_THREADS_PER_BLOCK 1024
  
  #define SI_RESOURCE_FLAG_TRANSFER	(PIPE_RESOURCE_FLAG_DRV_PRIV << 0)

  #define SI_RESOURCE_FLAG_FLUSHED_DEPTH(PIPE_RESOURCE_FLAG_DRV_PRIV << 
1)
  #define SI_RESOURCE_FLAG_FORCE_MSAA_TILING (PIPE_RESOURCE_FLAG_DRV_PRIV << 2)
  #define SI_RESOURCE_FLAG_DISABLE_DCC  (PIPE_RESOURCE_FLAG_DRV_PRIV << 3)
  #define SI_RESOURCE_FLAG_UNMAPPABLE   (PIPE_RESOURCE_FLAG_DRV_PRIV << 4)
  #define SI_RESOURCE_FLAG_READ_ONLY(PIPE_RESOURCE_FLAG_DRV_PRIV << 5)
  #define SI_RESOURCE_FLAG_32BIT(PIPE_RESOURCE_FLAG_DRV_PRIV << 
6)
  #define SI_RESOURCE_FLAG_CLEAR(PIPE_RESOURCE_FLAG_DRV_PRIV << 
7)
+/* For const_uploader, upload data via GTT and copy to VRAM on context flush 
via SDMA. */
+#define SI_RESOURCE_FLAG_UPLOAD_FLUSH_EXPLICIT_VIA_SDMA  
(PIPE_RESOURCE_FLAG_DRV_PRIV << 8)
  
  enum si_clear_code

  {
DCC_CLEAR_COLOR_   = 0x,
DCC_CLEAR_COLOR_0001   = 0x40404040,
DCC_CLEAR_COLOR_1110   = 0x80808080,
DCC_CLEAR_COLOR_   = 0xC0C0C0C0,
DCC_CLEAR_COLOR_REG= 0x20202020,
DCC_UNCOMPRESSED   = 0x,
  };
@@ -769,20 +771,28 @@ struct si_saved_cs {
struct si_context   *ctx;
struct radeon_saved_cs  gfx;
struct si_resource  *trace_buf;
unsignedtrace_id;
  
  	unsigned		gfx_last_dw;

boolflushed;
int64_t time_flush;
  };
  
+struct si_sdma_upload {

+   struct si_resource  *dst;
+   struct si_resource  *src;
+   unsignedsrc_offset;
+   unsigneddst_offset;
+   unsignedsize;
+};
+
  struct si_context {
struct pipe_context b; /* base class */
  
  	enum radeon_family		family;

enum chip_class chip_class;
  
  	struct radeon_winsys		*ws;

struct radeon_winsys_ctx*ctx;
struct radeon_cmdbuf*gfx_cs;
struct radeon_cmdbuf*dma_cs;
@@ -1074,20 +1084,26 @@ struct si_context {
int num_perfect_occlusion_queries;
struct list_headactive_queries;
unsignednum_cs_dw_queries_suspend;
  
  	/* Render condition. */

struct pipe_query   *render_cond;
unsignedrender_cond_mode;
boolrender_cond_invert;
boolrender_cond_force_off; /* for u_blitter 
*/
  
+	/* For uploading data via GTT and copy to VRAM on context flush via SDMA. */

+   boolsdma_uploads_in_progress;
+   struct si_sdma_upload   *sdma_uploads;
+   unsignednum_sdma_uploads;
+   unsignedmax_sdma_uploads;
+
/* Statistics gathering for the DCC enablement heuristic. It can't be
 * in si_texture because si_texture can be shared by multiple
 * contexts. This is for back buffers only. We shouldn't get too many
 * of those.
 *
 * X11 DRI3 rotates among a fin

Re: [Mesa-dev] [PATCH 6/6] winsys/amdgpu: cs_check_space sets the minimum IB size for future IBs

2019-02-11 Thread Nicolai Hähnle

For the series:

Reviewed-by: Nicolai Hähnle 

On 06.02.19 22:20, Marek Olšák wrote:

From: Marek Olšák 

---
  src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 18 --
  src/gallium/winsys/amdgpu/drm/amdgpu_cs.h |  7 +++
  2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index b3dedef3d73..dd5193c003d 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -675,21 +675,21 @@ static bool amdgpu_ib_new_buffer(struct amdgpu_winsys 
*ws, struct amdgpu_ib *ib,
  * size, aligned to a power of two (and multiplied by 4 to reduce internal
  * fragmentation if chaining is not available). Limit to 512k dwords, which
  * is the largest power of two that fits into the size field of the
  * INDIRECT_BUFFER packet.
  */
 if (amdgpu_cs_has_chaining(amdgpu_cs_from_ib(ib)))
buffer_size = 4 *util_next_power_of_two(ib->max_ib_size);
 else
buffer_size = 4 *util_next_power_of_two(4 * ib->max_ib_size);
  
-   const unsigned min_size = 8 * 1024 * 4;

+   const unsigned min_size = MAX2(ib->max_check_space_size, 8 * 1024 * 4);
 const unsigned max_size = 512 * 1024 * 4;
  
 buffer_size = MIN2(buffer_size, max_size);

 buffer_size = MAX2(buffer_size, min_size); /* min_size is more important */
  
 pb = ws->base.buffer_create(&ws->base, buffer_size,

 ws->info.gart_page_size,
 RADEON_DOMAIN_GTT,
 RADEON_FLAG_NO_INTERPROCESS_SHARING |
 (ring_type == RING_GFX ||
@@ -742,20 +742,25 @@ static bool amdgpu_get_new_ib(struct radeon_winsys *ws, 
struct amdgpu_cs *cs,
  
 switch (ib_type) {

 case IB_MAIN:
ib = &cs->main;
ib_size = 4 * 1024 * 4;
break;
 default:
unreachable("unhandled IB type");
 }
  
+   /* Always allocate at least the size of the biggest cs_check_space call,

+* because precisely the last call might have requested this size.
+*/
+   ib_size = MAX2(ib_size, ib->max_check_space_size);
+
 if (!amdgpu_cs_has_chaining(cs)) {
ib_size = MAX2(ib_size,
   4 * MIN2(util_next_power_of_two(ib->max_ib_size),
amdgpu_ib_max_submit_dwords(ib_type)));
 }
  
 ib->max_ib_size = ib->max_ib_size - ib->max_ib_size / 32;
  
 ib->base.prev_dw = 0;

 ib->base.num_prev = 0;
@@ -776,20 +781,21 @@ static bool amdgpu_get_new_ib(struct radeon_winsys *ws, 
struct amdgpu_cs *cs,
 ib->ptr_ib_size = >ib_bytes;
 ib->ptr_ib_size_inside_ib = false;
  
 amdgpu_cs_add_buffer(&cs->main.base, ib->big_ib_buffer,

  RADEON_USAGE_READ, 0, RADEON_PRIO_IB1);
  
 ib->base.current.buf = (uint32_t*)(ib->ib_mapped + ib->used_ib_space);
  
 ib_size = ib->big_ib_buffer->size - ib->used_ib_space;

 ib->base.current.max_dw = ib_size / 4 - 
amdgpu_cs_epilog_dws(cs->ring_type);
+   assert(ib->base.current.max_dw >= ib->max_check_space_size / 4);
 return true;
  }
  
  static void amdgpu_set_ib_size(struct amdgpu_ib *ib)

  {
 if (ib->ptr_ib_size_inside_ib) {
*ib->ptr_ib_size = ib->base.current.cdw |
   S_3F2_CHAIN(1) | S_3F2_VALID(1);
 } else {
*ib->ptr_ib_size = ib->base.current.cdw;
@@ -971,25 +977,32 @@ amdgpu_cs_create(struct radeon_winsys_ctx *rwctx,
  static bool amdgpu_cs_validate(struct radeon_cmdbuf *rcs)
  {
 return true;
  }
  
  static bool amdgpu_cs_check_space(struct radeon_cmdbuf *rcs, unsigned dw)

  {
 struct amdgpu_ib *ib = amdgpu_ib(rcs);
 struct amdgpu_cs *cs = amdgpu_cs_from_ib(ib);
 unsigned requested_size = rcs->prev_dw + rcs->current.cdw + dw;
+   unsigned cs_epilog_dw = amdgpu_cs_epilog_dws(cs->ring_type);
+   unsigned need_byte_size = (dw + cs_epilog_dw) * 4;
 uint64_t va;
 uint32_t *new_ptr_ib_size;
  
 assert(rcs->current.cdw <= rcs->current.max_dw);
  
+   /* 125% of the size for IB epilog. */

+   unsigned safe_byte_size = need_byte_size + need_byte_size / 4;
+   ib->max_check_space_size = MAX2(ib->max_check_space_size,
+   safe_byte_size);
+
 if (requested_size > amdgpu_ib_max_submit_dwords(ib->ib_type))
return false;
  
 ib->max_ib_size = MAX2(ib->max_ib_size, requested_size);
  
 if (rcs->current.max_dw - rcs->current.cdw >= dw)

return true;
  
 if (!amdgpu_cs_has_chaining(cs))

return false;
@@ -1038,21 +1051,22 @@ static bool amdgpu_cs_check_space(struct radeon_cmdbuf 
*rcs, unsigned dw)
 /* Hook up the new chunk */
 rcs->prev[rcs->num_prev].buf = rcs->current.buf;
 rcs->prev[rcs->num_prev]

Re: [Mesa-dev] [PATCH 2/2] radeonsi: fix EXPLICIT_FLUSH for flush offsets > 0

2019-02-11 Thread Nicolai Hähnle

Both patches:

Reviewed-by: Nicolai Hähnle 

On 06.02.19 22:12, Marek Olšák wrote:

From: Marek Olšák 

Cc: 18.3 19.0 
---
  src/gallium/drivers/radeonsi/si_buffer.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_buffer.c 
b/src/gallium/drivers/radeonsi/si_buffer.c
index bac561de2cb..c01118ce96a 100644
--- a/src/gallium/drivers/radeonsi/si_buffer.c
+++ b/src/gallium/drivers/radeonsi/si_buffer.c
@@ -518,24 +518,27 @@ static void *si_buffer_transfer_map(struct pipe_context 
*ctx,
  }
  
  static void si_buffer_do_flush_region(struct pipe_context *ctx,

  struct pipe_transfer *transfer,
  const struct pipe_box *box)
  {
struct si_transfer *stransfer = (struct si_transfer*)transfer;
struct si_resource *buf = si_resource(transfer->resource);
  
  	if (stransfer->staging) {

+   unsigned src_offset = stransfer->offset +
+ transfer->box.x % SI_MAP_BUFFER_ALIGNMENT 
+
+ (box->x - transfer->box.x);
+
/* Copy the staging buffer into the original one. */
si_copy_buffer((struct si_context*)ctx, transfer->resource,
-  &stransfer->staging->b.b, box->x,
-  stransfer->offset + box->x % 
SI_MAP_BUFFER_ALIGNMENT,
+  &stransfer->staging->b.b, box->x, src_offset,
   box->width);
}
  
  	util_range_add(>valid_buffer_range, box->x,

   box->x + box->width);
  }
  
  static void si_buffer_flush_region(struct pipe_context *ctx,

   struct pipe_transfer *transfer,
   const struct pipe_box *rel_box)





Re: [Mesa-dev] [PATCH 5/5] radeonsi: use local ws variable in si_need_dma_space

2019-02-05 Thread Nicolai Hähnle

For the series:

Reviewed-by: Nicolai Hähnle 

On 31.01.19 19:56, Marek Olšák wrote:

From: Marek Olšák 

---
  src/gallium/drivers/radeonsi/si_dma_cs.c | 19 ++-
  1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_dma_cs.c 
b/src/gallium/drivers/radeonsi/si_dma_cs.c
index 33177a9e4ad..2aafc1f09a0 100644
--- a/src/gallium/drivers/radeonsi/si_dma_cs.c
+++ b/src/gallium/drivers/radeonsi/si_dma_cs.c
@@ -119,71 +119,72 @@ void si_sdma_clear_buffer(struct si_context *sctx, struct 
pipe_resource *dst,
radeon_emit(cs, clear_value);
radeon_emit(cs, sctx->chip_class >= GFX9 ? csize - 1 : csize);
offset += csize;
size -= csize;
}
  }
  
  void si_need_dma_space(struct si_context *ctx, unsigned num_dw,

   struct si_resource *dst, struct si_resource *src)
  {
+   struct radeon_winsys *ws = ctx->ws;
uint64_t vram = ctx->dma_cs->used_vram;
uint64_t gtt = ctx->dma_cs->used_gart;
  
  	if (dst) {

vram += dst->vram_usage;
gtt += dst->gart_usage;
}
if (src) {
vram += src->vram_usage;
gtt += src->gart_usage;
}
  
  	/* Flush the GFX IB if DMA depends on it. */

if (radeon_emitted(ctx->gfx_cs, ctx->initial_gfx_cs_size) &&
((dst &&
- ctx->ws->cs_is_buffer_referenced(ctx->gfx_cs, dst->buf,
-RADEON_USAGE_READWRITE)) ||
+ ws->cs_is_buffer_referenced(ctx->gfx_cs, dst->buf,
+ RADEON_USAGE_READWRITE)) ||
 (src &&
- ctx->ws->cs_is_buffer_referenced(ctx->gfx_cs, src->buf,
-RADEON_USAGE_WRITE
+ ws->cs_is_buffer_referenced(ctx->gfx_cs, src->buf,
+ RADEON_USAGE_WRITE
si_flush_gfx_cs(ctx, RADEON_FLUSH_ASYNC_START_NEXT_GFX_IB_NOW, 
NULL);
  
  	/* Flush if there's not enough space, or if the memory usage per IB

 * is too large.
 *
 * IBs using too little memory are limited by the IB submission 
overhead.
 * IBs using too much memory are limited by the kernel/TTM overhead.
 * Too long IBs create CPU-GPU pipeline bubbles and add latency.
 *
 * This heuristic makes sure that DMA requests are executed
 * very soon after the call is made and lowers memory usage.
 * It improves texture upload performance by keeping the DMA
 * engine busy while uploads are being submitted.
 */
num_dw++; /* for emit_wait_idle below */
-   if (!ctx->ws->cs_check_space(ctx->dma_cs, num_dw) ||
+   if (!ws->cs_check_space(ctx->dma_cs, num_dw) ||
ctx->dma_cs->used_vram + ctx->dma_cs->used_gart > 64 * 1024 * 1024 
||
!radeon_cs_memory_below_limit(ctx->screen, ctx->dma_cs, vram, gtt)) 
{
si_flush_dma_cs(ctx, PIPE_FLUSH_ASYNC, NULL);
assert((num_dw + ctx->dma_cs->current.cdw) <= 
ctx->dma_cs->current.max_dw);
}
  
  	/* Wait for idle if either buffer has been used in the IB before to

 * prevent read-after-write hazards.
 */
if ((dst &&
-ctx->ws->cs_is_buffer_referenced(ctx->dma_cs, dst->buf,
-   RADEON_USAGE_READWRITE)) ||
+ws->cs_is_buffer_referenced(ctx->dma_cs, dst->buf,
+RADEON_USAGE_READWRITE)) ||
(src &&
-ctx->ws->cs_is_buffer_referenced(ctx->dma_cs, src->buf,
-   RADEON_USAGE_WRITE)))
+ws->cs_is_buffer_referenced(ctx->dma_cs, src->buf,
+RADEON_USAGE_WRITE)))
si_dma_emit_wait_idle(ctx);
  
  	if (dst) {

radeon_add_to_buffer_list(ctx, ctx->dma_cs, dst,
  RADEON_USAGE_WRITE, 0);
}
if (src) {
radeon_add_to_buffer_list(ctx, ctx->dma_cs, src,
  RADEON_USAGE_READ, 0);
}
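
The flush heuristic described in the comment above can be sketched in isolation. The following is only a simplified illustration, not the driver code: `struct dma_cs_state`, `dma_needs_flush` and the single combined `memory_limit` are hypothetical stand-ins for the real command-stream state, `cs_check_space` and `radeon_cs_memory_below_limit`. Only the 64 MiB threshold is taken from the quoted patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the DMA command stream state. */
struct dma_cs_state {
	uint64_t used_vram; /* bytes of VRAM referenced by the current IB */
	uint64_t used_gart; /* bytes of GTT referenced by the current IB */
	unsigned cdw;       /* dwords already written to the IB */
	unsigned max_dw;    /* capacity of the IB in dwords */
};

/* Threshold from the quoted patch: flush once the IB references > 64 MiB. */
#define DMA_IB_MEMORY_THRESHOLD (64ull * 1024 * 1024)

/* Returns true if the IB should be flushed before emitting num_dw more
 * dwords that reference extra_vram/extra_gart bytes of new buffers.
 * memory_limit is an assumed single budget; the real driver checks VRAM
 * and GTT against the winsys limits separately. */
static bool dma_needs_flush(const struct dma_cs_state *cs, unsigned num_dw,
			    uint64_t extra_vram, uint64_t extra_gart,
			    uint64_t memory_limit)
{
	/* Not enough room left in the IB. */
	if (cs->cdw + num_dw > cs->max_dw)
		return true;

	/* The IB already references too much memory. */
	if (cs->used_vram + cs->used_gart > DMA_IB_MEMORY_THRESHOLD)
		return true;

	/* Adding the new buffers would exceed the overall budget. */
	if (cs->used_vram + extra_vram + cs->used_gart + extra_gart > memory_limit)
		return true;

	return false;
}
```

Keeping the decision in one predicate makes the three independent reasons for a flush (space, per-IB memory, total memory) easy to see and to test separately.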





[Mesa-dev] [PATCH] amd/surface: provide firstMipIdInTail for metadata surface calculations

2019-02-05 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This field was added in a recent addrlib update, and while there
currently seems to be no issue with skipping it, we will have to
set it correctly in the future.
---
 src/amd/common/ac_surface.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/amd/common/ac_surface.c b/src/amd/common/ac_surface.c
index 91004e032a3..27e63c318e6 100644
--- a/src/amd/common/ac_surface.c
+++ b/src/amd/common/ac_surface.c
@@ -1135,20 +1135,21 @@ static int gfx9_compute_miptree(ADDR_HANDLE addrlib,
hout.size = sizeof(ADDR2_COMPUTE_HTILE_INFO_OUTPUT);
 
hin.hTileFlags.pipeAligned = !in->flags.metaPipeUnaligned;
hin.hTileFlags.rbAligned = !in->flags.metaRbUnaligned;
hin.depthFlags = in->flags;
hin.swizzleMode = in->swizzleMode;
hin.unalignedWidth = in->width;
hin.unalignedHeight = in->height;
hin.numSlices = in->numSlices;
hin.numMipLevels = in->numMipLevels;
+   hin.firstMipIdInTail = out.firstMipIdInTail;
 
ret = Addr2ComputeHtileInfo(addrlib, &hin, &hout);
if (ret != ADDR_OK)
return ret;
 
surf->u.gfx9.htile.rb_aligned = hin.hTileFlags.rbAligned;
surf->u.gfx9.htile.pipe_aligned = hin.hTileFlags.pipeAligned;
surf->htile_size = hout.htileBytes;
surf->htile_slice_size = hout.sliceSize;
surf->htile_alignment = hout.baseAlign;
@@ -1201,20 +1202,21 @@ static int gfx9_compute_miptree(ADDR_HANDLE addrlib,
din.colorFlags = in->flags;
din.resourceType = in->resourceType;
din.swizzleMode = in->swizzleMode;
din.bpp = in->bpp;
din.unalignedWidth = in->width;
din.unalignedHeight = in->height;
din.numSlices = in->numSlices;
din.numFrags = in->numFrags;
din.numMipLevels = in->numMipLevels;
din.dataSurfaceSize = out.surfSize;
+   din.firstMipIdInTail = out.firstMipIdInTail;
 
ret = Addr2ComputeDccInfo(addrlib, &din, &dout);
if (ret != ADDR_OK)
return ret;
 
surf->u.gfx9.dcc.rb_aligned = din.dccKeyFlags.rbAligned;
surf->u.gfx9.dcc.pipe_aligned = din.dccKeyFlags.pipeAligned;
surf->u.gfx9.dcc_pitch_max = dout.pitch - 1;
surf->dcc_size = dout.dccRamSize;
surf->dcc_alignment = dout.dccRamBaseAlign;
-- 
2.19.1



[Mesa-dev] [PATCH] meson: link LLVM 'native' component when LLVM is available

2019-02-04 Thread Nicolai Hähnle
From: Nicolai Hähnle 

It is required for the draw module, and makes a difference when
linking statically or against LLVM built with BUILD_SHARED_LIBS=ON.
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index bfff862c3c8..e955bdedcc6 100644
--- a/meson.build
+++ b/meson.build
@@ -1164,21 +1164,21 @@ dep_libdrm = dependency(
   'libdrm', version : '>=' + _drm_ver,
   required : with_dri2 or with_dri3
 )
 if dep_libdrm.found()
   pre_args += '-DHAVE_LIBDRM'
   if with_dri_platform == 'drm' and with_dri
 with_gallium_drisw_kms = true
   endif
 endif
 
-llvm_modules = ['bitwriter', 'engine', 'mcdisassembler', 'mcjit']
+llvm_modules = ['bitwriter', 'engine', 'mcdisassembler', 'mcjit', 'native']
 llvm_optional_modules = []
 if with_amd_vk or with_gallium_radeonsi or with_gallium_r600
   llvm_modules += ['amdgpu', 'native', 'bitreader', 'ipo']
   if with_gallium_r600
 llvm_modules += 'asmparser'
   endif
 endif
 if with_gallium_opencl
   llvm_modules += [
 'all-targets', 'linker', 'coverage', 'instrumentation', 'ipo', 'irreader',
-- 
2.19.1



Re: [Mesa-dev] [PATCH 23/25] radeonsi: factor si_query_buffer logic out of si_query_hw

2019-02-04 Thread Nicolai Hähnle

On 01.02.19 05:25, Timothy Arceri wrote:

On 26/1/19 11:56 am, Marek Olšák wrote:

Timothy, can you please test the attached fix?


I'm having trouble compiling 32-bit Mesa on my machine at the moment, so I
haven't been able to test Batman. But this commit also causes No Man's Sky
to lock up my machine, and the attached patch does not fix it.


Is there a trace or something else to easily reproduce it?

Cheers,
Nicolai







Thanks,
Marek

On Wed, Jan 2, 2019 at 10:58 PM Timothy Arceri <tarc...@itsqueeze.com> wrote:


    This commit seems to cause bad stuttering in the Batman Arkham City
    benchmark.

    On 7/12/18 1:00 am, Nicolai Hähnle wrote:
 > From: Nicolai Hähnle <nicolai.haeh...@amd.com>
 >
 > This is a move towards using composition instead of inheritance for
 > different query types.
 >
 > This change weakens out-of-memory error reporting somewhat, though this
 > should be acceptable since we didn't consistently report such errors in
 > the first place.
 > ---
 >   src/gallium/drivers/radeonsi/si_perfcounter.c |   8 +-
 >   src/gallium/drivers/radeonsi/si_query.c       | 177
    +-
 >   src/gallium/drivers/radeonsi/si_query.h       |  17 +-
 >   src/gallium/drivers/radeonsi/si_texture.c     |   7 +-
 >   4 files changed, 99 insertions(+), 110 deletions(-)
 >
 > diff --git a/src/gallium/drivers/radeonsi/si_perfcounter.c
    b/src/gallium/drivers/radeonsi/si_perfcounter.c
 > index 0b3d8f89273..f0d10c054c4 100644
 > --- a/src/gallium/drivers/radeonsi/si_perfcounter.c
 > +++ b/src/gallium/drivers/radeonsi/si_perfcounter.c
 > @@ -761,23 +761,22 @@ static void si_pc_query_destroy(struct
    si_screen *sscreen,
 >               struct si_query_group *group = query->groups;
 >               query->groups = group->next;
 >               FREE(group);
 >       }
 >
 >       FREE(query->counters);
 >
 >       si_query_hw_destroy(sscreen, rquery);
 >   }
 >
 > -static bool si_pc_query_prepare_buffer(struct si_screen *screen,
 > -                                    struct si_query_hw *hwquery,
 > -                                    struct r600_resource *buffer)
 > +static bool si_pc_query_prepare_buffer(struct si_context *ctx,
 > +                                    struct si_query_buffer *qbuf)
 >   {
 >       /* no-op */
 >       return true;
 >   }
 >
 >   static void si_pc_query_emit_start(struct si_context *sctx,
 >                                  struct si_query_hw *hwquery,
 >                                  struct r600_resource *buffer,
    uint64_t va)
 >   {
 >       struct si_query_pc *query = (struct si_query_pc *)hwquery;
 > @@ -1055,23 +1054,20 @@ struct pipe_query
    *si_create_batch_query(struct pipe_context *ctx,
 >               counter->base = group->result_base + j;
 >               counter->stride = group->num_counters;
 >
 >               counter->qwords = 1;
 >               if ((block->b->b->flags & SI_PC_BLOCK_SE) &&
    group->se < 0)
 >                       counter->qwords = screen->info.max_se;
 >               if (group->instance < 0)
 >                       counter->qwords *= block->num_instances;
 >       }
 >
 > -     if (!si_query_hw_init(screen, &query->b))
 > -             goto error;
 > -
 >       return (struct pipe_query *)query;
 >
 >   error:
 >       si_pc_query_destroy(screen, &query->b.b);
 >       return NULL;
 >   }
 >
 >   static bool si_init_block_names(struct si_screen *screen,
 >                               struct si_pc_block *block)
 >   {
 > diff --git a/src/gallium/drivers/radeonsi/si_query.c
    b/src/gallium/drivers/radeonsi/si_query.c
 > index 479a1bbf2c4..5b0fba0ed92 100644
 > --- a/src/gallium/drivers/radeonsi/si_query.c
 > +++ b/src/gallium/drivers/radeonsi/si_query.c
 > @@ -514,86 +514,129 @@ static struct pipe_query
    *si_query_sw_create(unsigned query_type)
 >       query = CALLOC_STRUCT(si_query_sw);
 >       if (!query)
 >               return NULL;
 >
 >       query->b.type = query_type;
 >       query->b.ops = _query_ops;
 >
 >       return (struct pipe_query *)query;
 >   }
 >
 > -void si_query_hw_destroy(struct si_screen *sscreen,
 > -                      struct si_query *rquery)
 > +void si_query_buffer_destroy(struct si_screen *sscreen, struct
    si_query_buffer *buffer)
 >   {
 > -     str

Re: [Mesa-dev] [PATCH 23/25] radeonsi: factor si_query_buffer logic out of si_query_hw

2019-02-04 Thread Nicolai Hähnle

Patch looks good to me, for what it's worth.

si_query_buffer_alloc could be restructured to be slightly cleaner by 
unifying the two calls to prepare_buffer, but it's not a huge deal.


Cheers,
Nicolai

On 26.01.19 01:56, Marek Olšák wrote:

Timothy, can you please test the attached fix?

Thanks,
Marek

On Wed, Jan 2, 2019 at 10:58 PM Timothy Arceri <tarc...@itsqueeze.com> wrote:


This commit seems to cause bad stuttering in the Batman Arkham City
benchmark.

On 7/12/18 1:00 am, Nicolai Hähnle wrote:
 > From: Nicolai Hähnle <nicolai.haeh...@amd.com>
 >
 > This is a move towards using composition instead of inheritance for
 > different query types.
 >
 > This change weakens out-of-memory error reporting somewhat, though this
 > should be acceptable since we didn't consistently report such errors in
 > the first place.
 > ---
 >   src/gallium/drivers/radeonsi/si_perfcounter.c |   8 +-
 >   src/gallium/drivers/radeonsi/si_query.c       | 177
+-
 >   src/gallium/drivers/radeonsi/si_query.h       |  17 +-
 >   src/gallium/drivers/radeonsi/si_texture.c     |   7 +-
 >   4 files changed, 99 insertions(+), 110 deletions(-)
 >
 > diff --git a/src/gallium/drivers/radeonsi/si_perfcounter.c
b/src/gallium/drivers/radeonsi/si_perfcounter.c
 > index 0b3d8f89273..f0d10c054c4 100644
 > --- a/src/gallium/drivers/radeonsi/si_perfcounter.c
 > +++ b/src/gallium/drivers/radeonsi/si_perfcounter.c
 > @@ -761,23 +761,22 @@ static void si_pc_query_destroy(struct
si_screen *sscreen,
 >               struct si_query_group *group = query->groups;
 >               query->groups = group->next;
 >               FREE(group);
 >       }
 >
 >       FREE(query->counters);
 >
 >       si_query_hw_destroy(sscreen, rquery);
 >   }
 >
 > -static bool si_pc_query_prepare_buffer(struct si_screen *screen,
 > -                                    struct si_query_hw *hwquery,
 > -                                    struct r600_resource *buffer)
 > +static bool si_pc_query_prepare_buffer(struct si_context *ctx,
 > +                                    struct si_query_buffer *qbuf)
 >   {
 >       /* no-op */
 >       return true;
 >   }
 >
 >   static void si_pc_query_emit_start(struct si_context *sctx,
 >                                  struct si_query_hw *hwquery,
 >                                  struct r600_resource *buffer,
uint64_t va)
 >   {
 >       struct si_query_pc *query = (struct si_query_pc *)hwquery;
 > @@ -1055,23 +1054,20 @@ struct pipe_query
*si_create_batch_query(struct pipe_context *ctx,
 >               counter->base = group->result_base + j;
 >               counter->stride = group->num_counters;
 >
 >               counter->qwords = 1;
 >               if ((block->b->b->flags & SI_PC_BLOCK_SE) &&
group->se < 0)
 >                       counter->qwords = screen->info.max_se;
 >               if (group->instance < 0)
 >                       counter->qwords *= block->num_instances;
 >       }
 >
 > -     if (!si_query_hw_init(screen, &query->b))
 > -             goto error;
 > -
 >       return (struct pipe_query *)query;
 >
 >   error:
 >       si_pc_query_destroy(screen, &query->b.b);
 >       return NULL;
 >   }
 >
 >   static bool si_init_block_names(struct si_screen *screen,
 >                               struct si_pc_block *block)
 >   {
 > diff --git a/src/gallium/drivers/radeonsi/si_query.c
b/src/gallium/drivers/radeonsi/si_query.c
 > index 479a1bbf2c4..5b0fba0ed92 100644
 > --- a/src/gallium/drivers/radeonsi/si_query.c
 > +++ b/src/gallium/drivers/radeonsi/si_query.c
 > @@ -514,86 +514,129 @@ static struct pipe_query
*si_query_sw_create(unsigned query_type)
 >       query = CALLOC_STRUCT(si_query_sw);
 >       if (!query)
 >               return NULL;
 >
 >       query->b.type = query_type;
 >       query->b.ops = _query_ops;
 >
 >       return (struct pipe_query *)query;
 >   }
 >
 > -void si_query_hw_destroy(struct si_screen *sscreen,
 > -                      struct si_query *rquery)
 > +void si_query_buffer_destroy(struct si_screen *sscreen, struct
si_query_buffer *buffer)
 >   {
 > -     struct si_query_hw *query = (struct si_query_hw *)rquery;
 > -     struct si_query_buf

[Mesa-dev] [PATCH 2/2] amd/common/vi+: enable SMEM loads with GLC=1

2019-01-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Only on LLVM 8.0+, which supports the new intrinsic.
---
 src/amd/common/ac_llvm_build.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 4d7f15901e3..6aa96ee86d4 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -1223,36 +1223,40 @@ ac_build_buffer_load(struct ac_llvm_context *ctx,
 unsigned slc,
 bool can_speculate,
 bool allow_smem)
 {
LLVMValueRef offset = LLVMConstInt(ctx->i32, inst_offset, 0);
if (voffset)
offset = LLVMBuildAdd(ctx->builder, offset, voffset, "");
if (soffset)
offset = LLVMBuildAdd(ctx->builder, offset, soffset, "");
 
-   /* TODO: VI and later generations can use SMEM with GLC=1.*/
-   if (allow_smem && !glc && !slc) {
+   if (allow_smem && !slc &&
+   (!glc || (HAVE_LLVM >= 0x0800 && ctx->chip_class >= VI))) {
assert(vindex == NULL);
 
LLVMValueRef result[8];
 
for (int i = 0; i < num_channels; i++) {
if (i) {
offset = LLVMBuildAdd(ctx->builder, offset,
  LLVMConstInt(ctx->i32, 4, 0), "");
}
const char *intrname =
HAVE_LLVM >= 0x0800 ? "llvm.amdgcn.s.buffer.load.f32"
: "llvm.SI.load.const";
unsigned num_args = HAVE_LLVM >= 0x0800 ? 3 : 2;
-   LLVMValueRef args[3] = {rsrc, offset, ctx->i32_0};
+   LLVMValueRef args[3] = {
+   rsrc,
+   offset,
+   glc ? ctx->i32_1 : ctx->i32_0,
+   };
result[i] = ac_build_intrinsic(ctx, intrname,
   ctx->f32, args, num_args,
   AC_FUNC_ATTR_READNONE |
   (HAVE_LLVM < 0x0800 ? AC_FUNC_ATTR_LEGACY : 0));
}
if (num_channels == 1)
return result[0];
 
if (num_channels == 3)
result[num_channels++] = LLVMGetUndef(ctx->f32);
-- 
2.19.1
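
The new condition for taking the SMEM path can be read as a small predicate. The sketch below restates it outside the driver. The enum `chip_class_sketch` and the function `can_use_smem` are hypothetical names for illustration; the LLVM version is passed as a runtime parameter here, whereas the patch uses the compile-time `HAVE_LLVM` macro.

```c
#include <stdbool.h>

/* Hypothetical ordering of hardware generations, oldest first. */
enum chip_class_sketch { CHIP_SI, CHIP_CIK, CHIP_VI, CHIP_GFX9 };

/* Mirrors the condition in the patch:
 *   allow_smem && !slc &&
 *   (!glc || (HAVE_LLVM >= 0x0800 && chip_class >= VI))
 */
static bool can_use_smem(bool allow_smem, bool glc, bool slc,
			 unsigned llvm_version, enum chip_class_sketch chip)
{
	/* SLC loads never take the scalar path. */
	if (!allow_smem || slc)
		return false;

	/* Plain loads were already eligible before this patch. */
	if (!glc)
		return true;

	/* GLC=1 needs both the new intrinsic (LLVM 8.0+) and VI or later. */
	return llvm_version >= 0x0800 && chip >= CHIP_VI;
}
```

With this shape, a GLC load on VI with LLVM 7 still falls back to the vector memory path, which matches the "Only on LLVM 8.0+" note in the commit message.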



[Mesa-dev] [PATCH 1/2] amd/common: use llvm.amdgcn.s.buffer.load for LLVM 8.0

2019-01-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

llvm.SI.load.const is deprecated.
---
 src/amd/common/ac_llvm_build.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 76047148a6a..4d7f15901e3 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -1234,25 +1234,29 @@ ac_build_buffer_load(struct ac_llvm_context *ctx,
if (allow_smem && !glc && !slc) {
assert(vindex == NULL);
 
LLVMValueRef result[8];
 
for (int i = 0; i < num_channels; i++) {
if (i) {
offset = LLVMBuildAdd(ctx->builder, offset,
  LLVMConstInt(ctx->i32, 4, 0), "");
}
-   LLVMValueRef args[2] = {rsrc, offset};
-   result[i] = ac_build_intrinsic(ctx, "llvm.SI.load.const.v4i32",
-  ctx->f32, args, 2,
+   const char *intrname =
+   HAVE_LLVM >= 0x0800 ? "llvm.amdgcn.s.buffer.load.f32"
+   : "llvm.SI.load.const";
+   unsigned num_args = HAVE_LLVM >= 0x0800 ? 3 : 2;
+   LLVMValueRef args[3] = {rsrc, offset, ctx->i32_0};
+   result[i] = ac_build_intrinsic(ctx, intrname,
+  ctx->f32, args, num_args,
   AC_FUNC_ATTR_READNONE |
-  AC_FUNC_ATTR_LEGACY);
+  (HAVE_LLVM < 0x0800 ? AC_FUNC_ATTR_LEGACY : 0));
}
if (num_channels == 1)
return result[0];
 
if (num_channels == 3)
result[num_channels++] = LLVMGetUndef(ctx->f32);
return ac_build_gather_values(ctx, result, num_channels);
}
 
return ac_build_buffer_load_common(ctx, rsrc, vindex, offset,
-- 
2.19.1
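
The version-dependent choice made in this patch (intrinsic name, argument count, and whether the legacy attribute is set) can be collected into one small table-like helper. This is only a sketch: `struct smem_load_intrinsic` and `select_smem_load` are hypothetical names, the intrinsic strings are copied from the added lines of the patch as shown above, and the version is a runtime parameter rather than the `HAVE_LLVM` macro.

```c
#include <stdbool.h>

/* Hypothetical description of how the scalar buffer load is emitted. */
struct smem_load_intrinsic {
	const char *name;   /* intrinsic name passed to the builder */
	unsigned num_args;  /* 3 with the extra cache-policy operand, else 2 */
	bool legacy_attr;   /* whether AC_FUNC_ATTR_LEGACY would be set */
};

static struct smem_load_intrinsic select_smem_load(unsigned llvm_version)
{
	struct smem_load_intrinsic sel;

	if (llvm_version >= 0x0800) {
		/* New intrinsic: rsrc, offset, and a third i32 operand. */
		sel.name = "llvm.amdgcn.s.buffer.load.f32";
		sel.num_args = 3;
		sel.legacy_attr = false;
	} else {
		/* Deprecated intrinsic: rsrc and offset only. */
		sel.name = "llvm.SI.load.const";
		sel.num_args = 2;
		sel.legacy_attr = true;
	}
	return sel;
}
```

The patch keeps a single three-element `args` array and only varies `num_args`, so the third operand is simply ignored on older LLVM.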



Re: [Mesa-dev] [PATCH] winsys/amdgpu: Pull in LLVM CFLAGS

2018-12-19 Thread Nicolai Hähnle

On 19.12.18 16:05, Michel Dänzer wrote:

From: Michel Dänzer 

Fixes build failure if the LLVM headers aren't in a standard include
directory.


Huh, interesting that I didn't run into this. Anyway:

Reviewed-by: Nicolai Hähnle 





Fixes: ec22dd34c88f "radeonsi: move SI_FORCE_FAMILY functionality to winsys"
Signed-off-by: Michel Dänzer 
---
  src/gallium/winsys/amdgpu/drm/Makefile.am | 1 +
  src/gallium/winsys/amdgpu/drm/meson.build | 2 +-
  2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/winsys/amdgpu/drm/Makefile.am b/src/gallium/winsys/amdgpu/drm/Makefile.am
index e35fa2cd0a2..1c2ec010fc6 100644
--- a/src/gallium/winsys/amdgpu/drm/Makefile.am
+++ b/src/gallium/winsys/amdgpu/drm/Makefile.am
@@ -4,6 +4,7 @@ include $(top_srcdir)/src/gallium/Automake.inc
  AM_CFLAGS = \
$(GALLIUM_WINSYS_CFLAGS) \
$(AMDGPU_CFLAGS) \
+   $(LLVM_CFLAGS) \
-I$(top_srcdir)/src/amd/
  
  AM_CXXFLAGS = $(AM_CFLAGS)

diff --git a/src/gallium/winsys/amdgpu/drm/meson.build b/src/gallium/winsys/amdgpu/drm/meson.build
index 8b6f69b2bdd..d3282ef412d 100644
--- a/src/gallium/winsys/amdgpu/drm/meson.build
+++ b/src/gallium/winsys/amdgpu/drm/meson.build
@@ -31,5 +31,5 @@ libamdgpuwinsys = static_library(
c_args : [c_vis_args],
cpp_args : [cpp_vis_args],
link_with : libamdgpu_addrlib,
-  dependencies : dep_libdrm_amdgpu,
+  dependencies : [dep_llvm, dep_libdrm_amdgpu],
  )





Re: [Mesa-dev] last call for autotools

2018-12-18 Thread Nicolai Hähnle

On 17.12.18 23:46, Dylan Baker wrote:

Quoting Marek Olšák (2018-12-17 12:25:29)

On Mon, Dec 17, 2018 at 1:18 PM Eric Anholt  wrote:

 Eero Tamminen  writes:

 > Hi,
 >
 > On 17.12.2018 8.08, Marek Olšák wrote:
 > [...]
 >> I think one of the serious usability issues is that environment
 >> variables such as CFLAGS, CXXFLAGS, LDFLAGS, and PKG_CONFIG_PATH are not
 >> saved by meson for future reconfigures.
 >
 > I don't know what Meson is supposed to do, but to me that would be
 > a bug in a build tool.
 >
 > Re-configure is supposed to adapt SW to the changes in the build
 > environment, and environment variables are part of that (along with
 > command line options and SW installed to the system).  Build
 > configure tool deciding to "remember" some of those things instead
 > of checking the new situation, seems like a great opportunity for
 > confusion.

 A user-triggered reconfigure, sure.  Recapture env vars then.  But "git
 pull; ninja -C build" losing track of the configuration state is broken.
 We don't have to specify all of your meson -Doption=state configuration
 on every build, why should you need to specify your PKG_CONFIG_PATH
 configure options on every build?


Thanks, Eric.

Yes, meson behaves such that users have to set all environment variables for
every "ninja" command that might reconfigure.

I see 2 solutions:
1) meson needs to remember the relevant env vars
2) meson should FAIL to configure if any of the env vars are set (if it wants
to ignore them)

Marek


Meson does remember the *_FLAGS variables. Those are translated on configure
into meson's internal ${lang}_args and ${lang}_link_args. It does look like
those aren't remembered when --wipe is called though, I filed a bug for that:
https://github.com/mesonbuild/meson/issues/4650


I ran into this same problem and noticed that Meson is already able to 
*warn* about such changes.


It should either ignore the changes, or better yet, fail.

(Or even better: ignore environment variables entirely; IMO sourcing the 
environment implicitly in a build system with an explicit configure is 
just a broken design that was unfortunately inherited from plain make 
without really considering the UI implications.)


Cheers,
Nicolai




Dylan




[Mesa-dev] [PATCH] amd/surface: fix setting of ADDR2_SURFACE_FLAGS::color

2018-12-18 Thread Nicolai Hähnle
From: Nicolai Hähnle 

In the gfx9 addrlib, this bit has been clarified as meaning that
the surface can be used as a color buffer (render target).

Setting this for compressed surfaces triggers a workaround that
is only required for surfaces that can be render targets, and ends
up breaking the 16-byte-per-block case.

Fixes dEQP-VK.pipeline.image.suballocation.sampling_type.combined.view_type.3d.format.etc2_r8g8b8a8_srgb_block.count_1.size.11x11x11
and others.

Note that there are other related bits which we don't set as intended
by the interface, notably the 'unordered' bit, which is meant to
indicate use as a shader image. It may be worth cleaning that up at some
point after proper testing.

Reported-by: Samuel Pitoiset 
Fixes: 776b9113656 ("amd/addrlib: update Mesa's copy of addrlib")
---
 src/amd/common/ac_surface.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/amd/common/ac_surface.c b/src/amd/common/ac_surface.c
index d8d927ee1c5..d647bd523f9 100644
--- a/src/amd/common/ac_surface.c
+++ b/src/amd/common/ac_surface.c
@@ -1405,25 +1405,24 @@ static int gfx9_compute_surface(ADDR_HANDLE addrlib,
case 16:
assert(!(surf->flags & RADEON_SURF_Z_OR_SBUFFER));
AddrSurfInfoIn.format = ADDR_FMT_32_32_32_32;
break;
default:
assert(0);
}
AddrSurfInfoIn.bpp = surf->bpe * 8;
}
 
-   AddrSurfInfoIn.flags.color = !(surf->flags & RADEON_SURF_Z_OR_SBUFFER);
+   AddrSurfInfoIn.flags.color = !compressed && !(surf->flags & RADEON_SURF_Z_OR_SBUFFER);
AddrSurfInfoIn.flags.depth = (surf->flags & RADEON_SURF_ZBUFFER) != 0;
AddrSurfInfoIn.flags.display = get_display_flag(config, surf);
-   /* flags.texture currently refers to TC-compatible HTILE */
-   AddrSurfInfoIn.flags.texture = AddrSurfInfoIn.flags.color ||
+   AddrSurfInfoIn.flags.texture = AddrSurfInfoIn.flags.color || compressed ||
   surf->flags & RADEON_SURF_TC_COMPATIBLE_HTILE;
AddrSurfInfoIn.flags.opt4space = 1;
 
AddrSurfInfoIn.numMipLevels = config->info.levels;
AddrSurfInfoIn.numSamples = MAX2(1, config->info.samples);
AddrSurfInfoIn.numFrags = AddrSurfInfoIn.numSamples;
 
if (!(surf->flags & RADEON_SURF_Z_OR_SBUFFER))
AddrSurfInfoIn.numFrags = MAX2(1, config->info.storage_samples);
 
-- 
2.19.1
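
The flag logic after this change can be illustrated on its own. The sketch below is an assumption-laden model, not the addrlib interface: the `SURF_*` bit values, `struct surf_flags_sketch` and `compute_surf_flags` are hypothetical, and only the three boolean expressions follow the patch.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for the RADEON_SURF_* flags. */
#define SURF_ZBUFFER             (1u << 0)
#define SURF_SBUFFER             (1u << 1)
#define SURF_Z_OR_SBUFFER        (SURF_ZBUFFER | SURF_SBUFFER)
#define SURF_TC_COMPATIBLE_HTILE (1u << 2)

/* Hypothetical subset of the address library surface flags. */
struct surf_flags_sketch {
	bool color;   /* usable as a color buffer (render target) */
	bool depth;   /* usable as a depth buffer */
	bool texture; /* sampled as a texture */
};

static struct surf_flags_sketch compute_surf_flags(unsigned flags, bool compressed)
{
	struct surf_flags_sketch f;

	/* Block-compressed surfaces can never be render targets. */
	f.color = !compressed && !(flags & SURF_Z_OR_SBUFFER);
	f.depth = (flags & SURF_ZBUFFER) != 0;
	/* Compressed surfaces are still textures even though color is clear. */
	f.texture = f.color || compressed ||
		    (flags & SURF_TC_COMPATIBLE_HTILE) != 0;
	return f;
}
```

Because `texture` previously depended on `color`, clearing `color` for compressed surfaces without the extra `compressed` term would have dropped the texture flag as well.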



Re: [Mesa-dev] [PATCH v2] docs: Document GitLab merge request process (email alternative)

2018-12-06 Thread Nicolai Hähnle

On 06.12.18 00:32, Jordan Justen wrote:

This documents a process for using GitLab Merge Requests as a second
way to submit code changes for Mesa. Only one of the two methods is
allowed for each patch series.

We will *not* require all patches to be emailed. Some code changes may
be reviewed and merged without any discussion on the mesa-dev email
list.

v2:
  * No longer require email. Allow submitter to choose email or a
GitLab merge request.
  * Various feedback from Brian, Daniel, Dylan, Eric, Erik, Jason,
Matt, Michel and Rob.

Signed-off-by: Jordan Justen 
---
  docs/submittingpatches.html | 76 ++---
  1 file changed, 71 insertions(+), 5 deletions(-)

diff --git a/docs/submittingpatches.html b/docs/submittingpatches.html
index 92d954a2d09..21175988d0b 100644
--- a/docs/submittingpatches.html
+++ b/docs/submittingpatches.html
@@ -21,7 +21,7 @@
  Basic guidelines
  Patch formatting
  Testing Patches
-Mailing Patches
+Submitting Patches
  Reviewing Patches
  Nominating a commit for a stable branch
  Criteria for accepting patches to the stable 
branch
@@ -42,8 +42,10 @@ components.
  git bisect.)
  Patches should be properly formatted.
  Patches should be sufficiently tested before 
submitting.
-Patches should be submitted to mesa-dev
-for review using git send-email.
+Patches should be submitted
+to mesa-dev or with
+a merge request
+for review.
  
  
  
@@ -180,10 +182,19 @@ run.

  
  
  
-Mailing Patches

+Submitting Patches
  
  

-Patches should be sent to the mesa-dev mailing list for review:
+Patches may be submitted to the Mesa project by
+email or with a
+GitLab merge request. To prevent
+duplicate code review, only use one method to submit your changes.
+
+
+Mailing Patches
+
+
+Patches may be sent to the mesa-dev mailing list for review:
  https://lists.freedesktop.org/mailman/listinfo/mesa-dev;>
  mesa-dev@lists.freedesktop.org.
  When submitting a patch make sure to use
@@ -217,8 +228,63 @@ disabled before sending your patches. (Note that you may 
need to contact
  your email administrator for this.)
  
  
+GitLab Merge Requests

+
+
+  https://gitlab.freedesktop.org/mesa/mesa;>GitLab Merge
+  Requests (MR) can also be used to submit patches for Mesa.
+
+
+
+  If the MR may have interest for most of the Mesa community, you can
+  send an email to the mesa-dev email list including a link to the MR.
+  Don't send the patch to mesa-dev, just the MR link.
+
+
+  Add labels to your MR to help reviewers find it. For example:
+  
+Mesa changes affecting all drivers: mesa
+Hardware vendor specific code: amd, intel, nvidia, ...
+Driver specific code: anvil, freedreno, i965, iris, radeonsi,
+  radv, vc4, ...
+Other tag examples: gallium, util
+  
+
+
+  If you revise your patches based on code review and push an update
+  to your branch, you should maintain a clean history
+  in your patches. There should not be "fixup" patches in the history.
+  The series should be buildable and functional after every commit
+  whenever you push the branch.
+
+
+  It is your responsibility to keep the MR alive and making progress,
+  as there are no guarantees that a Mesa dev will independently take
+  interest in it.
+
+
+  Some other notes:
+  
+Make changes and update your branch based on feedback
+Old, stale MR may be closed, but you can reopen it if you
+  still want to pursue the changes
+You should periodically check to see if your MR needs to be
+  rebased
+Make sure your MR is closed if your patches get pushed outside
+  of GitLab
+  
+
+
  Reviewing Patches
  
+

+  To participate in code review, you should monitor the
+  https://lists.freedesktop.org/mailman/listinfo/mesa-dev;>
+  mesa-dev email list and the GitLab
+  Mesa https://gitlab.freedesktop.org/mesa/mesa/merge_requests;>Merge
+  Requests page.


This link is broken.

What's the best way to get a feel for how the review process would work 
in practice?


Cheers,
Nicolai




+
+
  
  When you've reviewed a patch on the mailing list, please be unambiguous
  about your review.  That is, state either





[Mesa-dev] [PATCH 22/25] radeonsi: move query suspend logic into the top-level si_query struct

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/radeonsi/si_perfcounter.c | 13 ++--
 src/gallium/drivers/radeonsi/si_query.c   | 75 ++-
 src/gallium/drivers/radeonsi/si_query.h   | 18 +++--
 3 files changed, 62 insertions(+), 44 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_perfcounter.c b/src/gallium/drivers/radeonsi/si_perfcounter.c
index 69e149c76b6..0b3d8f89273 100644
--- a/src/gallium/drivers/radeonsi/si_perfcounter.c
+++ b/src/gallium/drivers/radeonsi/si_perfcounter.c
@@ -861,21 +861,24 @@ static void si_pc_query_add_result(struct si_screen *screen,
uint32_t value = results[counter->base + j * counter->stride];
result->batch[i].u64 += value;
}
}
 }
 
 static struct si_query_ops batch_query_ops = {
.destroy = si_pc_query_destroy,
.begin = si_query_hw_begin,
.end = si_query_hw_end,
-   .get_result = si_query_hw_get_result
+   .get_result = si_query_hw_get_result,
+
+   .suspend = si_query_hw_suspend,
+   .resume = si_query_hw_resume,
 };
 
 static struct si_query_hw_ops batch_query_hw_ops = {
.prepare_buffer = si_pc_query_prepare_buffer,
.emit_start = si_pc_query_emit_start,
.emit_stop = si_pc_query_emit_stop,
.clear_result = si_pc_query_clear_result,
.add_result = si_pc_query_add_result,
 };
 
@@ -994,41 +997,41 @@ struct pipe_query *si_create_batch_query(struct 
pipe_context *ctx,
fprintf(stderr,
"perfcounter group %s: too many selected\n",
block->b->b->name);
goto error;
}
group->selectors[group->num_counters] = sub_index;
++group->num_counters;
}
 
/* Compute result bases and CS size per group */
-   query->b.num_cs_dw_end = pc->num_stop_cs_dwords;
-   query->b.num_cs_dw_end += pc->num_instance_cs_dwords;
+   query->b.b.num_cs_dw_suspend = pc->num_stop_cs_dwords;
+   query->b.b.num_cs_dw_suspend += pc->num_instance_cs_dwords;
 
i = 0;
for (group = query->groups; group; group = group->next) {
struct si_pc_block *block = group->block;
unsigned read_dw;
unsigned instances = 1;
 
if ((block->b->b->flags & SI_PC_BLOCK_SE) && group->se < 0)
instances = screen->info.max_se;
if (group->instance < 0)
instances *= block->num_instances;
 
group->result_base = i;
query->b.result_size += sizeof(uint64_t) * instances * group->num_counters;
i += instances * group->num_counters;
 
read_dw = 6 * group->num_counters;
-   query->b.num_cs_dw_end += instances * read_dw;
-   query->b.num_cs_dw_end += instances * pc->num_instance_cs_dwords;
+   query->b.b.num_cs_dw_suspend += instances * read_dw;
+   query->b.b.num_cs_dw_suspend += instances * pc->num_instance_cs_dwords;
}
 
if (query->shaders) {
if (query->shaders == SI_PC_SHADERS_WINDOWING)
query->shaders = 0x;
}
 
/* Map user-supplied query array to result indices */
query->counters = CALLOC(num_queries, sizeof(*query->counters));
for (i = 0; i < num_queries; ++i) {
diff --git a/src/gallium/drivers/radeonsi/si_query.c b/src/gallium/drivers/radeonsi/si_query.c
index aed3e1e80c1..479a1bbf2c4 100644
--- a/src/gallium/drivers/radeonsi/si_query.c
+++ b/src/gallium/drivers/radeonsi/si_query.c
@@ -27,20 +27,22 @@
 #include "si_pipe.h"
 #include "si_query.h"
 #include "util/u_memory.h"
 #include "util/u_upload_mgr.h"
 #include "util/os_time.h"
 #include "util/u_suballoc.h"
 #include "amd/common/sid.h"
 
 #define SI_MAX_STREAMS 4
 
+static struct si_query_ops query_hw_ops;
+
 struct si_hw_query_params {
unsigned start_offset;
unsigned end_offset;
unsigned fence_offset;
unsigned pair_stride;
unsigned pair_count;
 };
 
 /* Queries without buffer handling or suspend/resume. */
 struct si_query_sw {
@@ -600,28 +602,20 @@ static bool si_query_hw_prepare_buffer(struct si_screen 
*sscreen,
 }
 
 static void si_query_hw_get_result_resource(struct si_context *sctx,
struct si_query *rquery,
bool wait,
enum pipe_query_value_type 
result_type,
int index,
struct pipe_resou

[Mesa-dev] [PATCH 24/25] radeonsi: split perfcounter queries from si_query_hw

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Remove a level of indirection to make the code more explicit -- should
make it easier to follow what's going on.
---
 src/gallium/drivers/radeonsi/si_perfcounter.c | 143 --
 1 file changed, 93 insertions(+), 50 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_perfcounter.c b/src/gallium/drivers/radeonsi/si_perfcounter.c
index f0d10c054c4..65197c0daa4 100644
--- a/src/gallium/drivers/radeonsi/si_perfcounter.c
+++ b/src/gallium/drivers/radeonsi/si_perfcounter.c
@@ -139,21 +139,25 @@ struct si_query_group {
unsigned selectors[SI_QUERY_MAX_COUNTERS];
 };
 
 struct si_query_counter {
unsigned base;
unsigned qwords;
unsigned stride; /* in uint64s */
 };
 
 struct si_query_pc {
-   struct si_query_hw b;
+   struct si_query b;
+   struct si_query_buffer buffer;
+
+   /* Size of the results in memory, in bytes. */
+   unsigned result_size;
 
unsigned shaders;
unsigned num_counters;
struct si_query_counter *counters;
struct si_query_group *groups;
 };
 
 
 static struct si_pc_block_base cik_CB = {
.name = "CB",
@@ -758,70 +762,72 @@ static void si_pc_query_destroy(struct si_screen *sscreen,
struct si_query_pc *query = (struct si_query_pc *)rquery;
 
while (query->groups) {
struct si_query_group *group = query->groups;
query->groups = group->next;
FREE(group);
}
 
FREE(query->counters);
 
-   si_query_hw_destroy(sscreen, rquery);
-}
-
-static bool si_pc_query_prepare_buffer(struct si_context *ctx,
-  struct si_query_buffer *qbuf)
-{
-   /* no-op */
-   return true;
+   si_query_buffer_destroy(sscreen, &query->buffer);
+   FREE(query);
 }
 
-static void si_pc_query_emit_start(struct si_context *sctx,
+static void si_pc_query_resume(struct si_context *sctx, struct si_query 
*rquery)
+/*
   struct si_query_hw *hwquery,
-  struct r600_resource *buffer, uint64_t va)
+  struct r600_resource *buffer, uint64_t va)*/
 {
-   struct si_query_pc *query = (struct si_query_pc *)hwquery;
-   struct si_query_group *group;
+   struct si_query_pc *query = (struct si_query_pc *)rquery;
int current_se = -1;
int current_instance = -1;
 
+   if (!si_query_buffer_alloc(sctx, &query->buffer, NULL, 
query->result_size))
+   return;
+   si_need_gfx_cs_space(sctx);
+
if (query->shaders)
si_pc_emit_shaders(sctx, query->shaders);
 
-   for (group = query->groups; group; group = group->next) {
+   for (struct si_query_group *group = query->groups; group; group = 
group->next) {
struct si_pc_block *block = group->block;
 
if (group->se != current_se || group->instance != 
current_instance) {
current_se = group->se;
current_instance = group->instance;
si_pc_emit_instance(sctx, group->se, group->instance);
}
 
si_pc_emit_select(sctx, block, group->num_counters, 
group->selectors);
}
 
if (current_se != -1 || current_instance != -1)
si_pc_emit_instance(sctx, -1, -1);
 
-   si_pc_emit_start(sctx, buffer, va);
+   uint64_t va = query->buffer.buf->gpu_address + 
query->buffer.results_end;
+   si_pc_emit_start(sctx, query->buffer.buf, va);
 }
 
-static void si_pc_query_emit_stop(struct si_context *sctx,
- struct si_query_hw *hwquery,
- struct r600_resource *buffer, uint64_t va)
+static void si_pc_query_suspend(struct si_context *sctx, struct si_query 
*rquery)
 {
-   struct si_query_pc *query = (struct si_query_pc *)hwquery;
-   struct si_query_group *group;
+   struct si_query_pc *query = (struct si_query_pc *)rquery;
 
-   si_pc_emit_stop(sctx, buffer, va);
+   if (!query->buffer.buf)
+   return;
 
-   for (group = query->groups; group; group = group->next) {
+   uint64_t va = query->buffer.buf->gpu_address + 
query->buffer.results_end;
+   query->buffer.results_end += query->result_size;
+
+   si_pc_emit_stop(sctx, query->buffer.buf, va);
+
+   for (struct si_query_group *group = query->groups; group; group = 
group->next) {
struct si_pc_block *block = group->block;
unsigned se = group->se >= 0 ? group->se : 0;
unsigned se_end = se + 1;
 
if ((block->b->b->flags & SI_PC_BLOCK_SE) && (group->se < 0))
se_end = sctx->screen->info.max_se;
 
do {
   

[Mesa-dev] [PATCH 20/25] radeonsi: track constant buffer bind history in si_pipe_set_constant_buffer

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Other callers of si_set_constant_buffer don't need it.
---
 src/gallium/drivers/radeonsi/si_descriptors.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 69a8c3d..81f21f2cfc1 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -1230,22 +1230,20 @@ static void si_set_constant_buffer(struct si_context 
*sctx,
   input->buffer_size, 
&buffer_offset);
if (!buffer) {
/* Just unbind on failure. */
si_set_constant_buffer(sctx, buffers, 
descriptors_idx, slot, NULL);
return;
}
va = r600_resource(buffer)->gpu_address + buffer_offset;
} else {
pipe_resource_reference(&buffer, input->buffer);
va = r600_resource(buffer)->gpu_address + 
input->buffer_offset;
-   /* Only track usage for non-user buffers. */
-   r600_resource(buffer)->bind_history |= 
PIPE_BIND_CONSTANT_BUFFER;
}
 
/* Set the descriptor. */
uint32_t *desc = descs->list + slot*4;
desc[0] = va;
desc[1] = S_008F04_BASE_ADDRESS_HI(va >> 32) |
  S_008F04_STRIDE(0);
desc[2] = input->buffer_size;
desc[3] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) |
  S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) |
@@ -1277,20 +1275,23 @@ static void si_pipe_set_constant_buffer(struct 
pipe_context *ctx,
 
if (shader >= SI_NUM_SHADERS)
return;
 
if (slot == 0 && input && input->buffer &&
!(r600_resource(input->buffer)->flags & RADEON_FLAG_32BIT)) {
assert(!"constant buffer 0 must have a 32-bit VM address, use 
const_uploader");
return;
}
 
+   if (input && input->buffer)
+   r600_resource(input->buffer)->bind_history |= 
PIPE_BIND_CONSTANT_BUFFER;
+
slot = si_get_constbuf_slot(slot);
si_set_constant_buffer(sctx, &sctx->const_and_shader_buffers[shader],
   
si_const_and_shader_buffer_descriptors_idx(shader),
   slot, input);
 }
 
 void si_get_pipe_constant_buffer(struct si_context *sctx, uint shader,
 uint slot, struct pipe_constant_buffer *cbuf)
 {
cbuf->user_buffer = NULL;
-- 
2.19.1



[Mesa-dev] [PATCH 21/25] radeonsi: move remaining perfcounter code into si_perfcounter.c

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/radeon/r600_perfcounter.c | 639 
 src/gallium/drivers/radeonsi/Makefile.sources |   1 -
 src/gallium/drivers/radeonsi/meson.build  |   1 -
 src/gallium/drivers/radeonsi/si_perfcounter.c | 688 --
 src/gallium/drivers/radeonsi/si_pipe.c|   2 +-
 src/gallium/drivers/radeonsi/si_pipe.h|   4 +-
 src/gallium/drivers/radeonsi/si_query.h   |  74 +-
 7 files changed, 643 insertions(+), 766 deletions(-)
 delete mode 100644 src/gallium/drivers/radeon/r600_perfcounter.c

diff --git a/src/gallium/drivers/radeon/r600_perfcounter.c 
b/src/gallium/drivers/radeon/r600_perfcounter.c
deleted file mode 100644
index 57c3246898a..000
--- a/src/gallium/drivers/radeon/r600_perfcounter.c
+++ /dev/null
@@ -1,639 +0,0 @@
-/*
- * Copyright 2015 Advanced Micro Devices, Inc.
- * All Rights Reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
THE
- * SOFTWARE.
- */
-
-#include "util/u_memory.h"
-#include "radeonsi/si_query.h"
-#include "radeonsi/si_pipe.h"
-#include "amd/common/sid.h"
-
-/* Max counters per HW block */
-#define SI_QUERY_MAX_COUNTERS 16
-
-static struct si_perfcounter_block *
-lookup_counter(struct si_perfcounters *pc, unsigned index,
-  unsigned *base_gid, unsigned *sub_index)
-{
-   struct si_perfcounter_block *block = pc->blocks;
-   unsigned bid;
-
-   *base_gid = 0;
-   for (bid = 0; bid < pc->num_blocks; ++bid, ++block) {
-   unsigned total = block->num_groups * block->num_selectors;
-
-   if (index < total) {
-   *sub_index = index;
-   return block;
-   }
-
-   index -= total;
-   *base_gid += block->num_groups;
-   }
-
-   return NULL;
-}
-
-static struct si_perfcounter_block *
-lookup_group(struct si_perfcounters *pc, unsigned *index)
-{
-   unsigned bid;
-   struct si_perfcounter_block *block = pc->blocks;
-
-   for (bid = 0; bid < pc->num_blocks; ++bid, ++block) {
-   if (*index < block->num_groups)
-   return block;
-   *index -= block->num_groups;
-   }
-
-   return NULL;
-}
-
-struct si_pc_group {
-   struct si_pc_group *next;
-   struct si_perfcounter_block *block;
-   unsigned sub_gid; /* only used during init */
-   unsigned result_base; /* only used during init */
-   int se;
-   int instance;
-   unsigned num_counters;
-   unsigned selectors[SI_QUERY_MAX_COUNTERS];
-};
-
-struct si_pc_counter {
-   unsigned base;
-   unsigned qwords;
-   unsigned stride; /* in uint64s */
-};
-
-#define SI_PC_SHADERS_WINDOWING (1 << 31)
-
-struct si_query_pc {
-   struct si_query_hw b;
-
-   unsigned shaders;
-   unsigned num_counters;
-   struct si_pc_counter *counters;
-   struct si_pc_group *groups;
-};
-
-static void si_pc_query_destroy(struct si_screen *sscreen,
-   struct si_query *rquery)
-{
-   struct si_query_pc *query = (struct si_query_pc *)rquery;
-
-   while (query->groups) {
-   struct si_pc_group *group = query->groups;
-   query->groups = group->next;
-   FREE(group);
-   }
-
-   FREE(query->counters);
-
-   si_query_hw_destroy(sscreen, rquery);
-}
-
-static bool si_pc_query_prepare_buffer(struct si_screen *screen,
-  struct si_query_hw *hwquery,
-  struct r600_resource *buffer)
-{
-   /* no-op */
-   return true;
-}
-
-static void si_pc_query_emit_start(struct si_context *sctx,
-  struct si_query_hw *hwquery,
-  struct r600_resource *bu

[Mesa-dev] [PATCH 17/25] radeonsi: avoid using hard-coded SI_NUM_RW_BUFFERS

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/radeonsi/si_debug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_debug.c 
b/src/gallium/drivers/radeonsi/si_debug.c
index 22019741d80..fe2970a0ea3 100644
--- a/src/gallium/drivers/radeonsi/si_debug.c
+++ b/src/gallium/drivers/radeonsi/si_debug.c
@@ -1064,21 +1064,22 @@ void si_log_draw_state(struct si_context *sctx, struct 
u_log_context *log)
si_dump_framebuffer(sctx, log);
 
si_dump_gfx_shader(sctx, &sctx->vs_shader, log);
si_dump_gfx_shader(sctx, tcs_shader, log);
si_dump_gfx_shader(sctx, &sctx->tes_shader, log);
si_dump_gfx_shader(sctx, &sctx->gs_shader, log);
si_dump_gfx_shader(sctx, &sctx->ps_shader, log);
 
si_dump_descriptor_list(sctx->screen,
&sctx->descriptors[SI_DESCS_RW_BUFFERS],
-   "", "RW buffers", 4, SI_NUM_RW_BUFFERS,
+   "", "RW buffers", 4,
+   
sctx->descriptors[SI_DESCS_RW_BUFFERS].num_active_slots,
si_identity, log);
si_dump_gfx_descriptors(sctx, &sctx->vs_shader, log);
si_dump_gfx_descriptors(sctx, tcs_shader, log);
si_dump_gfx_descriptors(sctx, &sctx->tes_shader, log);
si_dump_gfx_descriptors(sctx, &sctx->gs_shader, log);
si_dump_gfx_descriptors(sctx, &sctx->ps_shader, log);
 }
 
 void si_log_compute_state(struct si_context *sctx, struct u_log_context *log)
 {
-- 
2.19.1



[Mesa-dev] [PATCH 23/25] radeonsi: factor si_query_buffer logic out of si_query_hw

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This is a move towards using composition instead of inheritance for
different query types.

This change weakens out-of-memory error reporting somewhat, though this
should be acceptable since we didn't consistently report such errors in
the first place.
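
As an illustration of the direction described above, the following sketch contrasts embedding a full hardware query (the inheritance-like layout) with embedding only a buffer helper (composition). All sketch_* names are hypothetical and heavily simplified; they are not the real radeonsi structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical base object shared by all query kinds. */
struct sketch_query {
    unsigned type;
};

/* Hypothetical buffer helper: knows nothing about query types. */
struct sketch_query_buffer {
    unsigned results_end;  /* bytes of results written so far */
    unsigned capacity;     /* total bytes available */
};

/* Reserve room for one result; returns 1 on success, 0 if full. */
static int sketch_buffer_reserve(struct sketch_query_buffer *buf, unsigned size)
{
    assert(buf != NULL);
    if (buf->results_end + size > buf->capacity)
        return 0;
    buf->results_end += size;
    return 1;
}

/* Inheritance-like layout: the perfcounter query drags along every
 * field of the hardware query, whether it needs it or not. */
struct sketch_query_hw {
    struct sketch_query b;
    struct sketch_query_buffer buffer;
    unsigned hw_only_state;
};
struct sketch_pc_inherit {
    struct sketch_query_hw b;
    unsigned num_counters;
};

/* Composition: the perfcounter query embeds just the pieces it uses. */
struct sketch_pc_compose {
    struct sketch_query b;
    struct sketch_query_buffer buffer;
    unsigned result_size;
};

/* The composed query calls the buffer helper directly. */
static int sketch_pc_record(struct sketch_pc_compose *q)
{
    return sketch_buffer_reserve(&q->buffer, q->result_size);
}
```

With the composed layout, the helper can be shared by any query type that owns a buffer without that type having to look like a hardware query.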
---
 src/gallium/drivers/radeonsi/si_perfcounter.c |   8 +-
 src/gallium/drivers/radeonsi/si_query.c   | 177 +-
 src/gallium/drivers/radeonsi/si_query.h   |  17 +-
 src/gallium/drivers/radeonsi/si_texture.c |   7 +-
 4 files changed, 99 insertions(+), 110 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_perfcounter.c 
b/src/gallium/drivers/radeonsi/si_perfcounter.c
index 0b3d8f89273..f0d10c054c4 100644
--- a/src/gallium/drivers/radeonsi/si_perfcounter.c
+++ b/src/gallium/drivers/radeonsi/si_perfcounter.c
@@ -761,23 +761,22 @@ static void si_pc_query_destroy(struct si_screen *sscreen,
struct si_query_group *group = query->groups;
query->groups = group->next;
FREE(group);
}
 
FREE(query->counters);
 
si_query_hw_destroy(sscreen, rquery);
 }
 
-static bool si_pc_query_prepare_buffer(struct si_screen *screen,
-  struct si_query_hw *hwquery,
-  struct r600_resource *buffer)
+static bool si_pc_query_prepare_buffer(struct si_context *ctx,
+  struct si_query_buffer *qbuf)
 {
/* no-op */
return true;
 }
 
 static void si_pc_query_emit_start(struct si_context *sctx,
   struct si_query_hw *hwquery,
   struct r600_resource *buffer, uint64_t va)
 {
struct si_query_pc *query = (struct si_query_pc *)hwquery;
@@ -1055,23 +1054,20 @@ struct pipe_query *si_create_batch_query(struct 
pipe_context *ctx,
counter->base = group->result_base + j;
counter->stride = group->num_counters;
 
counter->qwords = 1;
if ((block->b->b->flags & SI_PC_BLOCK_SE) && group->se < 0)
counter->qwords = screen->info.max_se;
if (group->instance < 0)
counter->qwords *= block->num_instances;
}
 
-   if (!si_query_hw_init(screen, &query->b))
-   goto error;
-
return (struct pipe_query *)query;
 
 error:
si_pc_query_destroy(screen, &query->b.b);
return NULL;
 }
 
 static bool si_init_block_names(struct si_screen *screen,
struct si_pc_block *block)
 {
diff --git a/src/gallium/drivers/radeonsi/si_query.c 
b/src/gallium/drivers/radeonsi/si_query.c
index 479a1bbf2c4..5b0fba0ed92 100644
--- a/src/gallium/drivers/radeonsi/si_query.c
+++ b/src/gallium/drivers/radeonsi/si_query.c
@@ -514,86 +514,129 @@ static struct pipe_query *si_query_sw_create(unsigned 
query_type)
query = CALLOC_STRUCT(si_query_sw);
if (!query)
return NULL;
 
query->b.type = query_type;
query->b.ops = &sw_query_ops;
 
return (struct pipe_query *)query;
 }
 
-void si_query_hw_destroy(struct si_screen *sscreen,
-struct si_query *rquery)
+void si_query_buffer_destroy(struct si_screen *sscreen, struct si_query_buffer 
*buffer)
 {
-   struct si_query_hw *query = (struct si_query_hw *)rquery;
-   struct si_query_buffer *prev = query->buffer.previous;
+   struct si_query_buffer *prev = buffer->previous;
 
/* Release all query buffers. */
while (prev) {
struct si_query_buffer *qbuf = prev;
prev = prev->previous;
r600_resource_reference(&qbuf->buf, NULL);
FREE(qbuf);
}
 
-   r600_resource_reference(&query->buffer.buf, NULL);
-   r600_resource_reference(&query->workaround_buf, NULL);
-   FREE(rquery);
+   r600_resource_reference(&buffer->buf, NULL);
+}
+
+void si_query_buffer_reset(struct si_context *sctx, struct si_query_buffer 
*buffer)
+{
+   /* Discard all query buffers except for the oldest. */
+   while (buffer->previous) {
+   struct si_query_buffer *qbuf = buffer->previous;
+   buffer->previous = qbuf->previous;
+
+   r600_resource_reference(&buffer->buf, NULL);
+   buffer->buf = qbuf->buf; /* move ownership */
+   FREE(qbuf);
+   }
+   buffer->results_end = 0;
+
+   /* Discard even the oldest buffer if it can't be mapped without a 
stall. */
+   if (buffer->buf &&
+   (si_rings_is_buffer_referenced(sctx, buffer->buf->buf, 
RADEON_USAGE_READWRITE) ||
+!sctx->ws->buffer_wait(buffer->buf->buf, 0, 
RADEON_USAGE_READWRITE))) {
+   r600_resource_reference(&buffer->buf, NULL);
+   }
 }
 
-static struct r600_resour

[Mesa-dev] [PATCH 25/25] radeonsi: const-ify the si_query_ops

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/radeonsi/si_perfcounter.c | 2 +-
 src/gallium/drivers/radeonsi/si_query.c   | 6 +++---
 src/gallium/drivers/radeonsi/si_query.h   | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_perfcounter.c 
b/src/gallium/drivers/radeonsi/si_perfcounter.c
index 65197c0daa4..fc2c58854bc 100644
--- a/src/gallium/drivers/radeonsi/si_perfcounter.c
+++ b/src/gallium/drivers/radeonsi/si_perfcounter.c
@@ -908,21 +908,21 @@ static bool si_pc_query_get_result(struct si_context 
*sctx, struct si_query *rqu
 
while (results_base != qbuf->results_end) {
si_pc_query_add_result(query, map + results_base, 
result);
results_base += query->result_size;
}
}
 
return true;
 }
 
-static struct si_query_ops batch_query_ops = {
+static const struct si_query_ops batch_query_ops = {
.destroy = si_pc_query_destroy,
.begin = si_pc_query_begin,
.end = si_pc_query_end,
.get_result = si_pc_query_get_result,
 
.suspend = si_pc_query_suspend,
.resume = si_pc_query_resume,
 };
 
 static struct si_query_group *get_group_state(struct si_screen *screen,
diff --git a/src/gallium/drivers/radeonsi/si_query.c 
b/src/gallium/drivers/radeonsi/si_query.c
index 5b0fba0ed92..093643bf684 100644
--- a/src/gallium/drivers/radeonsi/si_query.c
+++ b/src/gallium/drivers/radeonsi/si_query.c
@@ -27,21 +27,21 @@
 #include "si_pipe.h"
 #include "si_query.h"
 #include "util/u_memory.h"
 #include "util/u_upload_mgr.h"
 #include "util/os_time.h"
 #include "util/u_suballoc.h"
 #include "amd/common/sid.h"
 
 #define SI_MAX_STREAMS 4
 
-static struct si_query_ops query_hw_ops;
+static const struct si_query_ops query_hw_ops;
 
 struct si_hw_query_params {
unsigned start_offset;
unsigned end_offset;
unsigned fence_offset;
unsigned pair_stride;
unsigned pair_count;
 };
 
 /* Queries without buffer handling or suspend/resume. */
@@ -492,21 +492,21 @@ static bool si_query_sw_get_result(struct si_context 
*sctx,
case SI_QUERY_CURRENT_GPU_SCLK:
case SI_QUERY_CURRENT_GPU_MCLK:
result->u64 *= 100;
break;
}
 
return true;
 }
 
 
-static struct si_query_ops sw_query_ops = {
+static const struct si_query_ops sw_query_ops = {
.destroy = si_query_sw_destroy,
.begin = si_query_sw_begin,
.end = si_query_sw_end,
.get_result = si_query_sw_get_result,
.get_result_resource = NULL
 };
 
 static struct pipe_query *si_query_sw_create(unsigned query_type)
 {
struct si_query_sw *query;
@@ -1336,21 +1336,21 @@ static void si_query_hw_add_result(struct si_screen 
*sscreen,
 void si_query_hw_suspend(struct si_context *sctx, struct si_query *query)
 {
si_query_hw_emit_stop(sctx, (struct si_query_hw *)query);
 }
 
 void si_query_hw_resume(struct si_context *sctx, struct si_query *query)
 {
si_query_hw_emit_start(sctx, (struct si_query_hw *)query);
 }
 
-static struct si_query_ops query_hw_ops = {
+static const struct si_query_ops query_hw_ops = {
.destroy = si_query_hw_destroy,
.begin = si_query_hw_begin,
.end = si_query_hw_end,
.get_result = si_query_hw_get_result,
.get_result_resource = si_query_hw_get_result_resource,
 
.suspend = si_query_hw_suspend,
.resume = si_query_hw_resume,
 };
 
diff --git a/src/gallium/drivers/radeonsi/si_query.h 
b/src/gallium/drivers/radeonsi/si_query.h
index 63af760a271..0bc1d56f78a 100644
--- a/src/gallium/drivers/radeonsi/si_query.h
+++ b/src/gallium/drivers/radeonsi/si_query.h
@@ -134,21 +134,21 @@ struct si_query_ops {
int index,
struct pipe_resource *resource,
unsigned offset);
 
void (*suspend)(struct si_context *, struct si_query *);
void (*resume)(struct si_context *, struct si_query *);
 };
 
 struct si_query {
struct threaded_query b;
-   struct si_query_ops *ops;
+   const struct si_query_ops *ops;
 
/* The PIPE_QUERY_xxx type of query */
unsigned type;
 
/* The number of dwords for suspend. */
unsigned num_cs_dw_suspend;
 
/* Linked list of queries that must be suspended at end of CS. */
struct list_head active_list;
 };
-- 
2.19.1



[Mesa-dev] [PATCH 19/25] radeonsi: use si_set_rw_shader_buffer for setting streamout buffers

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Reduce the number of places that encode buffer descriptors.
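
To illustrate why a single encoding site helps, here is a minimal sketch of a shared helper that fills a four-dword buffer descriptor slot. The sketch_* names are hypothetical, and the layout is reduced to the address split and size seen in the diff; the real format and selector bits in the fourth dword are deliberately left as a zero placeholder.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill one 4-dword slot. Only the address/size split is modelled;
 * desc[3] is a placeholder for the hardware format bits. */
static void sketch_encode_buffer_desc(uint32_t desc[4], uint64_t va, uint32_t size)
{
    desc[0] = (uint32_t)va;          /* low 32 bits of the address */
    desc[1] = (uint32_t)(va >> 32);  /* high bits of the address */
    desc[2] = size;                  /* buffer size in bytes */
    desc[3] = 0;                     /* placeholder, see lead-in */
}

/* Single entry point used by every caller: bind or unbind a slot. */
static void sketch_set_rw_buffer(uint32_t *list, unsigned slot,
                                 int bound, uint64_t va, uint32_t size)
{
    uint32_t *desc = list + slot * 4;

    assert(list != 0);
    if (!bound) {
        /* Unbinding clears the whole descriptor. */
        memset(desc, 0, sizeof(uint32_t) * 4);
        return;
    }
    sketch_encode_buffer_desc(desc, va, size);
}
```

Callers such as a streamout path would then pass a slot index and buffer range instead of writing the dwords themselves.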
---
 .../drivers/radeonsi/si_state_streamout.c | 61 ---
 1 file changed, 11 insertions(+), 50 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_streamout.c 
b/src/gallium/drivers/radeonsi/si_state_streamout.c
index fd7e843bc48..83ca23a8bf2 100644
--- a/src/gallium/drivers/radeonsi/si_state_streamout.c
+++ b/src/gallium/drivers/radeonsi/si_state_streamout.c
@@ -86,24 +86,22 @@ void si_streamout_buffers_dirty(struct si_context *sctx)
si_mark_atom_dirty(sctx, &sctx->atoms.s.streamout_begin);
si_set_streamout_enable(sctx, true);
 }
 
 static void si_set_streamout_targets(struct pipe_context *ctx,
 unsigned num_targets,
 struct pipe_stream_output_target **targets,
 const unsigned *offsets)
 {
struct si_context *sctx = (struct si_context *)ctx;
-   struct si_buffer_resources *buffers = &sctx->rw_buffers;
-   struct si_descriptors *descs = &sctx->descriptors[SI_DESCS_RW_BUFFERS];
unsigned old_num_targets = sctx->streamout.num_targets;
-   unsigned i, bufidx;
+   unsigned i;
 
/* We are going to unbind the buffers. Mark which caches need to be 
flushed. */
if (sctx->streamout.num_targets && sctx->streamout.begin_emitted) {
/* Since streamout uses vector writes which go through TC L2
 * and most other clients can use TC L2 as well, we don't need
 * to flush it.
 *
 * The only cases which requires flushing it is VGT DMA index
 * fetching (on <= CIK) and indirect draw data, which are rare
 * cases. Thus, flag the TC L2 dirtiness in the resource and
@@ -168,71 +166,34 @@ static void si_set_streamout_targets(struct pipe_context 
*ctx,
/* Update dirty state bits. */
if (num_targets) {
si_streamout_buffers_dirty(sctx);
} else {
si_set_atom_dirty(sctx, &sctx->atoms.s.streamout_begin, false);
si_set_streamout_enable(sctx, false);
}
 
/* Set the shader resources.*/
for (i = 0; i < num_targets; i++) {
-   bufidx = SI_VS_STREAMOUT_BUF0 + i;
-
if (targets[i]) {
-   struct pipe_resource *buffer = targets[i]->buffer;
-   uint64_t va = r600_resource(buffer)->gpu_address;
-
-   /* Set the descriptor.
-*
-* On VI, the format must be non-INVALID, otherwise
-* the buffer will be considered not bound and store
-* instructions will be no-ops.
-*/
-   uint32_t *desc = descs->list + bufidx*4;
-   desc[0] = va;
-   desc[1] = S_008F04_BASE_ADDRESS_HI(va >> 32);
-   desc[2] = 0xffffffff;
-   desc[3] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) |
- S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) |
- S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) |
- S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) |
- 
S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32);
-
-   /* Set the resource. */
-   pipe_resource_reference(&buffers->buffers[bufidx],
-   buffer);
-   radeon_add_to_gfx_buffer_list_check_mem(sctx,
-   
r600_resource(buffer),
-   
buffers->shader_usage,
-   
RADEON_PRIO_SHADER_RW_BUFFER,
-   true);
-   r600_resource(buffer)->bind_history |= 
PIPE_BIND_STREAM_OUTPUT;
-
-   buffers->enabled_mask |= 1u << bufidx;
+   struct pipe_shader_buffer sbuf;
+   sbuf.buffer = targets[i]->buffer;
+   sbuf.buffer_offset = 0;
+   sbuf.buffer_size = targets[i]->buffer_offset +
+  targets[i]->buffer_size;
+   si_set_rw_shader_buffer(sctx, SI_VS_STREAMOUT_BUF0 + i, 
&sbuf);
+   r600_resource(targets[i]->buffer)->bind_history |= 
PIPE_BIND_STREAM_OUTPUT;
} else {
-   /* Clear the descriptor and unset the resource. */
-   memset(descs->list + bufidx*4, 0,
-  sizeof(uint32_t) * 4);
-   pipe_res

[Mesa-dev] [PATCH 18/25] radeonsi: add an si_set_rw_shader_buffer convenience function

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/radeonsi/si_descriptors.c | 107 ++
 src/gallium/drivers/radeonsi/si_state.h   |   2 +
 2 files changed, 64 insertions(+), 45 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 06e95e863eb..69a8c3d 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -1262,27 +1262,20 @@ static void si_set_constant_buffer(struct si_context 
*sctx,
buffers->enabled_mask |= 1u << slot;
} else {
/* Clear the descriptor. */
memset(descs->list + slot*4, 0, sizeof(uint32_t) * 4);
buffers->enabled_mask &= ~(1u << slot);
}
 
sctx->descriptors_dirty |= 1u << descriptors_idx;
 }
 
-void si_set_rw_buffer(struct si_context *sctx,
- uint slot, const struct pipe_constant_buffer *input)
-{
-   si_set_constant_buffer(sctx, &sctx->rw_buffers,
-   SI_DESCS_RW_BUFFERS, slot, 
input);
-}
-
 static void si_pipe_set_constant_buffer(struct pipe_context *ctx,
enum pipe_shader_type shader, uint slot,
const struct pipe_constant_buffer 
*input)
 {
struct si_context *sctx = (struct si_context *)ctx;
 
if (shader >= SI_NUM_SHADERS)
return;
 
if (slot == 0 && input && input->buffer &&
@@ -1303,74 +1296,84 @@ void si_get_pipe_constant_buffer(struct si_context 
*sctx, uint shader,
cbuf->user_buffer = NULL;
si_get_buffer_from_descriptors(
&sctx->const_and_shader_buffers[shader],
si_const_and_shader_buffer_descriptors(sctx, shader),
si_get_constbuf_slot(slot),
&cbuf->buffer, &cbuf->buffer_offset, &cbuf->buffer_size);
 }
 
 /* SHADER BUFFERS */
 
+static void si_set_shader_buffer(struct si_context *sctx,
+struct si_buffer_resources *buffers,
+unsigned descriptors_idx,
+uint slot, const struct pipe_shader_buffer 
*sbuffer,
+enum radeon_bo_priority priority)
+{
+   struct si_descriptors *descs = &sctx->descriptors[descriptors_idx];
+   uint32_t *desc = descs->list + slot * 4;
+
+   if (!sbuffer || !sbuffer->buffer) {
+   pipe_resource_reference(&buffers->buffers[slot], NULL);
+   memset(desc, 0, sizeof(uint32_t) * 4);
+   buffers->enabled_mask &= ~(1u << slot);
+   sctx->descriptors_dirty |= 1u << descriptors_idx;
+   return;
+   }
+
+   struct r600_resource *buf = r600_resource(sbuffer->buffer);
+   uint64_t va = buf->gpu_address + sbuffer->buffer_offset;
+
+   desc[0] = va;
+   desc[1] = S_008F04_BASE_ADDRESS_HI(va >> 32) |
+ S_008F04_STRIDE(0);
+   desc[2] = sbuffer->buffer_size;
+   desc[3] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) |
+ S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) |
+ S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) |
+ S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) |
+ S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_FLOAT) |
+ S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32);
+
+   pipe_resource_reference(&buffers->buffers[slot], &buf->b.b);
+   radeon_add_to_gfx_buffer_list_check_mem(sctx, buf,
+   buffers->shader_usage,
+   priority, true);
+
+   buffers->enabled_mask |= 1u << slot;
+   sctx->descriptors_dirty |= 1u << descriptors_idx;
+
+   util_range_add(&buf->valid_buffer_range, sbuffer->buffer_offset,
+  sbuffer->buffer_offset + sbuffer->buffer_size);
+}
+
 static void si_set_shader_buffers(struct pipe_context *ctx,
  enum pipe_shader_type shader,
  unsigned start_slot, unsigned count,
  const struct pipe_shader_buffer *sbuffers)
 {
struct si_context *sctx = (struct si_context *)ctx;
struct si_buffer_resources *buffers = 
&sctx->const_and_shader_buffers[shader];
-   struct si_descriptors *descs = 
si_const_and_shader_buffer_descriptors(sctx, shader);
+   unsigned descriptors_idx = 
si_const_and_shader_buffer_descriptors_idx(shader);
unsigned i;
 
assert(start_slot + count <= SI_NUM_SHADER_BUFFERS);
 
for (i = 0; i < count; ++i) {
const struct pipe_shader_buffer *sbuffer = sbuffers ? 
&sbuffers[i] : NULL;
-   struct r600_resource *buf;
unsigned 

[Mesa-dev] [PATCH 16/25] radeonsi: show the fixed function TCS in debug dumps

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This is rather important for merged VS/TCS as LSHS shaders...
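
The selection rule added by this patch can be sketched in isolation as follows. The sketch_* types are hypothetical, reduced stand-ins for the driver state; only the condition itself (a tessellation evaluation shader is bound but no user tessellation control shader is) is taken from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a bound shader slot. */
struct sketch_shader_state {
    const void *cso;  /* NULL when nothing is bound */
};

/* Hypothetical, reduced stand-in for the context. */
struct sketch_context {
    struct sketch_shader_state tcs_shader;
    struct sketch_shader_state tes_shader;
    struct sketch_shader_state fixed_func_tcs_shader;
};

/* Pick the TCS that should appear in a debug dump: fall back to the
 * fixed-function one when tessellation is on but no user TCS is bound. */
static const struct sketch_shader_state *
sketch_pick_tcs(const struct sketch_context *ctx)
{
    assert(ctx != NULL);
    if (ctx->tes_shader.cso && !ctx->tcs_shader.cso)
        return &ctx->fixed_func_tcs_shader;
    return &ctx->tcs_shader;
}
```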
---
 src/gallium/drivers/radeonsi/si_debug.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_debug.c 
b/src/gallium/drivers/radeonsi/si_debug.c
index ec4bd03c9a5..22019741d80 100644
--- a/src/gallium/drivers/radeonsi/si_debug.c
+++ b/src/gallium/drivers/radeonsi/si_debug.c
@@ -1045,37 +1045,43 @@ static void si_dump_debug_state(struct pipe_context 
*ctx, FILE *f,
si_dump_debug_registers(sctx, f);
 
si_dump_annotated_shaders(sctx, f);
si_dump_command("Active waves (raw data)", "umr -O halt_waves 
-wa | column -t", f);
si_dump_command("Wave information", "umr -O halt_waves,bits 
-wa", f);
}
 }
 
 void si_log_draw_state(struct si_context *sctx, struct u_log_context *log)
 {
+   struct si_shader_ctx_state *tcs_shader;
+
if (!log)
return;
 
+   tcs_shader = &sctx->tcs_shader;
+   if (sctx->tes_shader.cso && !sctx->tcs_shader.cso)
+   tcs_shader = &sctx->fixed_func_tcs_shader;
+
si_dump_framebuffer(sctx, log);
 
si_dump_gfx_shader(sctx, &sctx->vs_shader, log);
-   si_dump_gfx_shader(sctx, &sctx->tcs_shader, log);
+   si_dump_gfx_shader(sctx, tcs_shader, log);
si_dump_gfx_shader(sctx, &sctx->tes_shader, log);
si_dump_gfx_shader(sctx, &sctx->gs_shader, log);
si_dump_gfx_shader(sctx, &sctx->ps_shader, log);
 
si_dump_descriptor_list(sctx->screen,
&sctx->descriptors[SI_DESCS_RW_BUFFERS],
"", "RW buffers", 4, SI_NUM_RW_BUFFERS,
si_identity, log);
si_dump_gfx_descriptors(sctx, &sctx->vs_shader, log);
-   si_dump_gfx_descriptors(sctx, &sctx->tcs_shader, log);
+   si_dump_gfx_descriptors(sctx, tcs_shader, log);
si_dump_gfx_descriptors(sctx, &sctx->tes_shader, log);
si_dump_gfx_descriptors(sctx, &sctx->gs_shader, log);
si_dump_gfx_descriptors(sctx, &sctx->ps_shader, log);
 }
 
 void si_log_compute_state(struct si_context *sctx, struct u_log_context *log)
 {
if (!log)
return;
 
-- 
2.19.1



[Mesa-dev] [PATCH 09/25] radeonsi: move SI_FORCE_FAMILY functionality to winsys

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This helps some debugging cases by initializing addrlib with
slightly more appropriate settings.
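
As a rough sketch of what the override does, the lookup below maps a family name to a family index and derives a chip class from thresholds, in the same order as the diff. Everything here is hypothetical: the sketch_* enums are cut down to four entries, and the name table is a placeholder for the strings that the real code obtains from ac_get_llvm_processor_name().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily reduced family and chip-class enums. */
enum sketch_family { SK_TAHITI, SK_BONAIRE, SK_TONGA, SK_VEGA10, SK_LAST };
enum sketch_class { SK_SI, SK_CIK, SK_VI, SK_GFX9 };

/* Placeholder names; the real strings come from the LLVM processor table. */
static const char *const sketch_names[SK_LAST] = {
    "tahiti", "bonaire", "tonga", "gfx900",
};

/* Look up a forced family by name. Returns 1 and fills the outputs on a
 * match, 0 if the name is NULL or unknown. */
static int sketch_force_family(const char *name,
                               enum sketch_family *family,
                               enum sketch_class *chip_class)
{
    assert(family != 0 && chip_class != 0);
    if (!name)
        return 0;

    for (int i = 0; i < SK_LAST; i++) {
        if (strcmp(name, sketch_names[i]) != 0)
            continue;

        *family = (enum sketch_family)i;
        /* Families are ordered by generation, so thresholds suffice. */
        if (i >= SK_VEGA10)
            *chip_class = SK_GFX9;
        else if (i >= SK_TONGA)
            *chip_class = SK_VI;
        else if (i >= SK_BONAIRE)
            *chip_class = SK_CIK;
        else
            *chip_class = SK_SI;
        return 1;
    }
    return 0;
}
```

A caller would typically pass in the value of the SI_FORCE_FAMILY environment variable and treat a zero return for a non-NULL name as a fatal configuration error, as the original function does.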
---
 src/gallium/drivers/radeonsi/si_pipe.c| 34 --
 src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c | 36 +++
 2 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 503d8331906..7943af4d86e 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -718,53 +718,20 @@ static void si_destroy_screen(struct pipe_screen* pscreen)
sscreen->ws->destroy(sscreen->ws);
FREE(sscreen);
 }
 
 static void si_init_gs_info(struct si_screen *sscreen)
 {
sscreen->gs_table_depth = 
ac_get_gs_table_depth(sscreen->info.chip_class,
sscreen->info.family);
 }
 
-static void si_handle_env_var_force_family(struct si_screen *sscreen)
-{
-   const char *family = debug_get_option("SI_FORCE_FAMILY", NULL);
-   unsigned i;
-
-   if (!family)
-   return;
-
-   for (i = CHIP_TAHITI; i < CHIP_LAST; i++) {
-   if (!strcmp(family, ac_get_llvm_processor_name(i))) {
-   /* Override family and chip_class. */
-   sscreen->info.family = i;
-   sscreen->info.name = "GCN-NOOP";
-
-   if (i >= CHIP_VEGA10)
-   sscreen->info.chip_class = GFX9;
-   else if (i >= CHIP_TONGA)
-   sscreen->info.chip_class = VI;
-   else if (i >= CHIP_BONAIRE)
-   sscreen->info.chip_class = CIK;
-   else
-   sscreen->info.chip_class = SI;
-
-   /* Don't submit any IBs. */
-   setenv("RADEON_NOOP", "1", 1);
-   return;
-   }
-   }
-
-   fprintf(stderr, "radeonsi: Unknown family: %s\n", family);
-   exit(1);
-}
-
 static void si_test_vmfault(struct si_screen *sscreen)
 {
struct pipe_context *ctx = sscreen->aux_context;
struct si_context *sctx = (struct si_context *)ctx;
struct pipe_resource *buf =
pipe_buffer_create_const0(&sscreen->b, 0, PIPE_USAGE_DEFAULT, 
64);
 
if (!buf) {
puts("Buffer allocation failed.");
exit(1);
@@ -871,21 +838,20 @@ struct pipe_screen *radeonsi_screen_create(struct 
radeon_winsys *ws,
 {
struct si_screen *sscreen = CALLOC_STRUCT(si_screen);
unsigned hw_threads, num_comp_hi_threads, num_comp_lo_threads, i;
 
if (!sscreen) {
return NULL;
}
 
sscreen->ws = ws;
ws->query_info(ws, &sscreen->info);
-   si_handle_env_var_force_family(sscreen);
 
if (sscreen->info.chip_class >= GFX9) {
sscreen->se_tile_repeat = 32 * sscreen->info.max_se;
} else {
ac_get_raster_config(&sscreen->info,
 &sscreen->pa_sc_raster_config,
 &sscreen->pa_sc_raster_config_1,
 &sscreen->se_tile_repeat);
}
 
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
index 6b7f484f239..79d2c1345ef 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c
@@ -31,40 +31,76 @@
 #include "amdgpu_public.h"
 
 #include "util/u_cpu_detect.h"
 #include "util/u_hash_table.h"
 #include "util/hash_table.h"
 #include "util/xmlconfig.h"
 #include 
 #include 
 #include 
 #include 
+#include "amd/common/ac_llvm_util.h"
 #include "amd/common/sid.h"
 #include "amd/common/gfx9d.h"
 
 #ifndef AMDGPU_INFO_NUM_VRAM_CPU_PAGE_FAULTS
 #define AMDGPU_INFO_NUM_VRAM_CPU_PAGE_FAULTS   0x1E
 #endif
 
 static struct util_hash_table *dev_tab = NULL;
 static simple_mtx_t dev_tab_mutex = _SIMPLE_MTX_INITIALIZER_NP;
 
 DEBUG_GET_ONCE_BOOL_OPTION(all_bos, "RADEON_ALL_BOS", false)
 
+static void handle_env_var_force_family(struct amdgpu_winsys *ws)
+{
+  const char *family = debug_get_option("SI_FORCE_FAMILY", NULL);
+  unsigned i;
+
+  if (!family)
+   return;
+
+  for (i = CHIP_TAHITI; i < CHIP_LAST; i++) {
+ if (!strcmp(family, ac_get_llvm_processor_name(i))) {
+/* Override family and chip_class. */
+ws->info.family = i;
+ws->info.name = "GCN-NOOP";
+
+if (i >= CHIP_VEGA10)
+   ws->info.chip_class = GFX9;
+else if (i >= CHIP_TONGA)
+   w

[Mesa-dev] [PATCH 08/25] ac/surface: 3D and cube surfaces are never displayable

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/amd/common/ac_surface.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/amd/common/ac_surface.c b/src/amd/common/ac_surface.c
index d8d927ee1c5..aeba5e161c9 100644
--- a/src/amd/common/ac_surface.c
+++ b/src/amd/common/ac_surface.c
@@ -1509,24 +1509,26 @@ static int gfx9_compute_surface(ADDR_HANDLE addrlib,
r = gfx9_compute_miptree(addrlib, config, surf, compressed,
 );
if (r)
return r;
}
 
surf->is_linear = surf->u.gfx9.surf.swizzle_mode == ADDR_SW_LINEAR;
 
/* Query whether the surface is displayable. */
bool displayable = false;
-   r = Addr2IsValidDisplaySwizzleMode(addrlib, surf->u.gfx9.surf.swizzle_mode,
+   if (!config->is_3d && !config->is_cube) {
+   r = Addr2IsValidDisplaySwizzleMode(addrlib, surf->u.gfx9.surf.swizzle_mode,
   surf->bpe * 8, &displayable);
-   if (r)
-   return r;
+   if (r)
+   return r;
+   }
surf->is_displayable = displayable;
 
switch (surf->u.gfx9.surf.swizzle_mode) {
/* S = standard. */
case ADDR_SW_256B_S:
case ADDR_SW_4KB_S:
case ADDR_SW_64KB_S:
case ADDR_SW_VAR_S:
case ADDR_SW_64KB_S_T:
case ADDR_SW_4KB_S_X:
-- 
2.19.1



[Mesa-dev] [PATCH 10/25] radeonsi: extract declare_vs_blit_inputs

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Prepare for some later refactoring.
---
 src/gallium/drivers/radeonsi/si_shader.c | 43 ++--
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index d455fb5db6a..1bc32f31020 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4577,20 +4577,44 @@ static void declare_vs_input_vgprs(struct 
si_shader_context *ctx,
 
if (!shader->is_gs_copy_shader) {
/* Vertex load indices. */
ctx->param_vertex_index0 = fninfo->num_params;
for (unsigned i = 0; i < shader->selector->info.num_inputs; i++)
add_arg(fninfo, ARG_VGPR, ctx->i32);
*num_prolog_vgprs += shader->selector->info.num_inputs;
}
 }
 
+static void declare_vs_blit_inputs(struct si_shader_context *ctx,
+  struct si_function_info *fninfo,
+  unsigned vs_blit_property)
+{
+   ctx->param_vs_blit_inputs = fninfo->num_params;
+   add_arg(fninfo, ARG_SGPR, ctx->i32); /* i16 x1, y1 */
+   add_arg(fninfo, ARG_SGPR, ctx->i32); /* i16 x2, y2 */
+   add_arg(fninfo, ARG_SGPR, ctx->f32); /* depth */
+
+   if (vs_blit_property == SI_VS_BLIT_SGPRS_POS_COLOR) {
+   add_arg(fninfo, ARG_SGPR, ctx->f32); /* color0 */
+   add_arg(fninfo, ARG_SGPR, ctx->f32); /* color1 */
+   add_arg(fninfo, ARG_SGPR, ctx->f32); /* color2 */
+   add_arg(fninfo, ARG_SGPR, ctx->f32); /* color3 */
+   } else if (vs_blit_property == SI_VS_BLIT_SGPRS_POS_TEXCOORD) {
+   add_arg(fninfo, ARG_SGPR, ctx->f32); /* texcoord.x1 */
+   add_arg(fninfo, ARG_SGPR, ctx->f32); /* texcoord.y1 */
+   add_arg(fninfo, ARG_SGPR, ctx->f32); /* texcoord.x2 */
+   add_arg(fninfo, ARG_SGPR, ctx->f32); /* texcoord.y2 */
+   add_arg(fninfo, ARG_SGPR, ctx->f32); /* texcoord.z */
+   add_arg(fninfo, ARG_SGPR, ctx->f32); /* texcoord.w */
+   }
+}
+
 static void declare_tes_input_vgprs(struct si_shader_context *ctx,
struct si_function_info *fninfo)
 {
ctx->param_tes_u = add_arg(fninfo, ARG_VGPR, ctx->f32);
ctx->param_tes_v = add_arg(fninfo, ARG_VGPR, ctx->f32);
ctx->param_tes_rel_patch_id = add_arg(fninfo, ARG_VGPR, ctx->i32);
add_arg_assign(fninfo, ARG_VGPR, ctx->i32, &ctx->abi.tes_patch_id);
 }
 
 enum {
@@ -4621,38 +4645,21 @@ static void create_function(struct si_shader_context 
*ctx)
type = SI_SHADER_MERGED_VERTEX_OR_TESSEVAL_GEOMETRY;
}
 
LLVMTypeRef v3i32 = LLVMVectorType(ctx->i32, 3);
 
switch (type) {
case PIPE_SHADER_VERTEX:
declare_global_desc_pointers(ctx, &fninfo);
 
if (vs_blit_property) {
-   ctx->param_vs_blit_inputs = fninfo.num_params;
-   add_arg(&fninfo, ARG_SGPR, ctx->i32); /* i16 x1, y1 */
-   add_arg(&fninfo, ARG_SGPR, ctx->i32); /* i16 x2, y2 */
-   add_arg(&fninfo, ARG_SGPR, ctx->f32); /* depth */
-
-   if (vs_blit_property == SI_VS_BLIT_SGPRS_POS_COLOR) {
-   add_arg(&fninfo, ARG_SGPR, ctx->f32); /* color0 */
-   add_arg(&fninfo, ARG_SGPR, ctx->f32); /* color1 */
-   add_arg(&fninfo, ARG_SGPR, ctx->f32); /* color2 */
-   add_arg(&fninfo, ARG_SGPR, ctx->f32); /* color3 */
-   } else if (vs_blit_property == SI_VS_BLIT_SGPRS_POS_TEXCOORD) {
-   add_arg(&fninfo, ARG_SGPR, ctx->f32); /* texcoord.x1 */
-   add_arg(&fninfo, ARG_SGPR, ctx->f32); /* texcoord.y1 */
-   add_arg(&fninfo, ARG_SGPR, ctx->f32); /* texcoord.x2 */
-   add_arg(&fninfo, ARG_SGPR, ctx->f32); /* texcoord.y2 */
-   add_arg(&fninfo, ARG_SGPR, ctx->f32); /* texcoord.z */
-   add_arg(&fninfo, ARG_SGPR, ctx->f32); /* texcoord.w */
-   }
+   declare_vs_blit_inputs(ctx, &fninfo, vs_blit_property);
 
/* VGPRs */
declare_vs_input_vgprs(ctx, &fninfo, &num_prolog_vgprs);
break;
}
 
declare_per_stage_desc_pointers(ctx, &fninfo, true);
declare_vs_specific_input_sgprs(ctx, &fninfo);
ctx->param_vertex_buffers = add_arg(&fninfo, ARG_SGPR,
ac_array_in_const32_addr_space(ctx->v4i32));
-- 
2.19.1



[Mesa-dev] [PATCH 06/25] amd/common: scan/reduce across waves of a workgroup

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Order-aware scan/reduce can trade off LDS traffic against external atomic
memory traffic in producer/consumer compute shaders.
---
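For readers following along, here is a rough CPU-side model of the two-level
structure that a workgroup-wide scan needs: scan inside each wave, scan the
per-wave totals, then add each wave's offset back. It is only a sketch under
stated assumptions, not the LLVM IR builder code from this patch. The names
WAVE_SIZE, MAX_WAVES, wave_inclusive_scan and workgroup_inclusive_scan are
hypothetical, and the wave_total array merely stands in for whatever scratch
storage (e.g. LDS) the real code would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WAVE_SIZE 64u /* assumed lanes per wave */
#define MAX_WAVES 16u /* assumed upper bound on waves per workgroup */

/* Inclusive add-scan over the lanes of one wave, in place.
 * Returns the wave's total so it can be combined across waves. */
static uint32_t wave_inclusive_scan(uint32_t *lanes, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += lanes[i];
        lanes[i] = sum;
    }
    return sum;
}

/* Inclusive add-scan across a whole workgroup of n threads, in place. */
static void workgroup_inclusive_scan(uint32_t *values, size_t n)
{
    uint32_t wave_total[MAX_WAVES]; /* stand-in for scratch storage */
    size_t num_waves = (n + WAVE_SIZE - 1) / WAVE_SIZE;
    assert(num_waves <= MAX_WAVES);

    /* Step 1: every wave scans its own lanes and records its total. */
    for (size_t w = 0; w < num_waves; w++) {
        size_t base = w * WAVE_SIZE;
        size_t count = n - base < WAVE_SIZE ? n - base : WAVE_SIZE;
        wave_total[w] = wave_inclusive_scan(values + base, count);
    }

    /* Step 2: exclusive scan over the per-wave totals. Only num_waves
     * entries matter, so this scan needs a short prefix only. */
    uint32_t carry = 0;
    for (size_t w = 0; w < num_waves; w++) {
        uint32_t total = wave_total[w];
        wave_total[w] = carry;
        carry += total;
    }

    /* Step 3: add each wave's offset back to its lanes. */
    for (size_t i = 0; i < n; i++)
        values[i] += wave_total[i / WAVE_SIZE];
}
```

Step 2 only ever looks at one value per wave, which is the kind of case where
a scan that is correct for a short prefix only is sufficient.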
 src/amd/common/ac_llvm_build.c | 195 -
 src/amd/common/ac_llvm_build.h |  36 ++
 2 files changed, 227 insertions(+), 4 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 68c8bad9e83..932f4bbdeef 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -3345,68 +3345,88 @@ ac_build_alu_op(struct ac_llvm_context *ctx, 
LLVMValueRef lhs, LLVMValueRef rhs,
_64bit ? ctx->f64 : ctx->f32,
(LLVMValueRef[]){lhs, rhs}, 2, 
AC_FUNC_ATTR_READNONE);
case nir_op_iand: return LLVMBuildAnd(ctx->builder, lhs, rhs, "");
case nir_op_ior: return LLVMBuildOr(ctx->builder, lhs, rhs, "");
case nir_op_ixor: return LLVMBuildXor(ctx->builder, lhs, rhs, "");
default:
unreachable("bad reduction intrinsic");
}
 }
 
-/* TODO: add inclusive and excluse scan functions for SI chip class.  */
+/**
+ * \param maxprefix specifies that the result only needs to be correct for a
+ * prefix of this many threads
+ *
+ * TODO: add inclusive and exclusive scan functions for SI chip class.
+ */
 static LLVMValueRef
-ac_build_scan(struct ac_llvm_context *ctx, nir_op op, LLVMValueRef src, 
LLVMValueRef identity)
+ac_build_scan(struct ac_llvm_context *ctx, nir_op op, LLVMValueRef src, 
LLVMValueRef identity,
+ unsigned maxprefix)
 {
LLVMValueRef result, tmp;
result = src;
+   if (maxprefix <= 1)
+   return result;
tmp = ac_build_dpp(ctx, identity, src, dpp_row_sr(1), 0xf, 0xf, false);
result = ac_build_alu_op(ctx, result, tmp, op);
+   if (maxprefix <= 2)
+   return result;
tmp = ac_build_dpp(ctx, identity, src, dpp_row_sr(2), 0xf, 0xf, false);
result = ac_build_alu_op(ctx, result, tmp, op);
+   if (maxprefix <= 3)
+   return result;
tmp = ac_build_dpp(ctx, identity, src, dpp_row_sr(3), 0xf, 0xf, false);
result = ac_build_alu_op(ctx, result, tmp, op);
+   if (maxprefix <= 4)
+   return result;
tmp = ac_build_dpp(ctx, identity, result, dpp_row_sr(4), 0xf, 0xe, 
false);
result = ac_build_alu_op(ctx, result, tmp, op);
+   if (maxprefix <= 8)
+   return result;
tmp = ac_build_dpp(ctx, identity, result, dpp_row_sr(8), 0xf, 0xc, 
false);
result = ac_build_alu_op(ctx, result, tmp, op);
+   if (maxprefix <= 16)
+   return result;
tmp = ac_build_dpp(ctx, identity, result, dpp_row_bcast15, 0xa, 0xf, 
false);
result = ac_build_alu_op(ctx, result, tmp, op);
+   if (maxprefix <= 32)
+   return result;
tmp = ac_build_dpp(ctx, identity, result, dpp_row_bcast31, 0xc, 0xf, 
false);
result = ac_build_alu_op(ctx, result, tmp, op);
return result;
 }
 
 LLVMValueRef
 ac_build_inclusive_scan(struct ac_llvm_context *ctx, LLVMValueRef src, nir_op 
op)
 {
ac_build_optimization_barrier(ctx, &src);
LLVMValueRef result;
LLVMValueRef identity =
get_reduction_identity(ctx, op, 
ac_get_type_size(LLVMTypeOf(src)));
result = LLVMBuildBitCast(ctx->builder, ac_build_set_inactive(ctx, src, 
identity),
  LLVMTypeOf(identity), "");
-   result = ac_build_scan(ctx, op, result, identity);
+   result = ac_build_scan(ctx, op, result, identity, 64);
 
return ac_build_wwm(ctx, result);
 }
 
 LLVMValueRef
 ac_build_exclusive_scan(struct ac_llvm_context *ctx, LLVMValueRef src, nir_op 
op)
 {
ac_build_optimization_barrier(ctx, &src);
LLVMValueRef result;
LLVMValueRef identity =
get_reduction_identity(ctx, op, 
ac_get_type_size(LLVMTypeOf(src)));
result = LLVMBuildBitCast(ctx->builder, ac_build_set_inactive(ctx, src, 
identity),
  LLVMTypeOf(identity), "");
result = ac_build_dpp(ctx, identity, result, dpp_wf_sr1, 0xf, 0xf, 
false);
-   result = ac_build_scan(ctx, op, result, identity);
+   result = ac_build_scan(ctx, op, result, identity, 64);
 
return ac_build_wwm(ctx, result);
 }
 
 LLVMValueRef
 ac_build_reduce(struct ac_llvm_context *ctx, LLVMValueRef src, nir_op op, 
unsigned cluster_size)
 {
if (cluster_size == 1) return src;
ac_build_optimization_barrier(ctx, &src);
LLVMValueRef result, swap;
@@ -3450,20 +3470,187 @@ ac_build_reduce(struct ac_llvm_context *ctx, 
LLVMValueRef src, nir_op op, unsign
result = ac_build_readlane(ctx, result, LLVMConstInt(ctx->i32, 
63, 0));
return ac_build_wwm(ctx, result);
 

[Mesa-dev] [PATCH 13/25] radeonsi: don't set RAW_WAIT for CP DMA clears

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

There is never a read-after-write hazard because the command doesn't read.
---
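The decision this patch changes can be sketched in isolation as follows. This
is a simplified stand-in rather than the driver function: the DEMO_* bit
values and the name demo_cp_dma_prepare are made up for illustration, and only
the RAW_WAIT condition from the hunk below is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the real flags are defined by radeonsi. */
#define DEMO_SKIP_SYNC_BEFORE (1u << 0) /* user flag */
#define DEMO_CP_DMA_RAW_WAIT  (1u << 1) /* packet flag */
#define DEMO_CP_DMA_CLEAR     (1u << 2) /* packet flag */

/* Request a read-after-write wait only for the first packet of a sequence,
 * only when the caller has not opted out, and only when the packet actually
 * reads something. A clear reads nothing, so it never needs the wait. */
static void demo_cp_dma_prepare(uint32_t user_flags, uint32_t *packet_flags,
                                bool *is_first)
{
    assert(packet_flags && is_first);

    if (!(user_flags & DEMO_SKIP_SYNC_BEFORE) && *is_first &&
        !(*packet_flags & DEMO_CP_DMA_CLEAR))
        *packet_flags |= DEMO_CP_DMA_RAW_WAIT;

    *is_first = false;
}
```

A copy packet still gets the wait on its first chunk; a clear packet passes
through with its flags unchanged.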
 src/gallium/drivers/radeonsi/si_cp_dma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_cp_dma.c 
b/src/gallium/drivers/radeonsi/si_cp_dma.c
index 33220d9f0fa..80673f3f5f2 100644
--- a/src/gallium/drivers/radeonsi/si_cp_dma.c
+++ b/src/gallium/drivers/radeonsi/si_cp_dma.c
@@ -182,21 +182,22 @@ static void si_cp_dma_prepare(struct si_context *sctx, 
struct pipe_resource *dst
  r600_resource(src),
  RADEON_USAGE_READ, 
RADEON_PRIO_CP_DMA);
}
 
/* Flush the caches for the first copy only.
 * Also wait for the previous CP DMA operations.
 */
if (!(user_flags & SI_CPDMA_SKIP_GFX_SYNC) && sctx->flags)
si_emit_cache_flush(sctx);
 
-   if (!(user_flags & SI_CPDMA_SKIP_SYNC_BEFORE) && *is_first)
+   if (!(user_flags & SI_CPDMA_SKIP_SYNC_BEFORE) && *is_first &&
+   !(*packet_flags & CP_DMA_CLEAR))
*packet_flags |= CP_DMA_RAW_WAIT;
 
*is_first = false;
 
/* Do the synchronization after the last dma, so that all data
 * is written to memory.
 */
if (!(user_flags & SI_CPDMA_SKIP_SYNC_AFTER) &&
byte_count == remaining_size) {
*packet_flags |= CP_DMA_SYNC;
-- 
2.19.1



[Mesa-dev] [PATCH 15/25] radeonsi: const-ify si_set_tesseval_regs

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index ad7d21e7816..0d4e1956037 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -330,24 +330,24 @@ void si_destroy_shader_cache(struct si_screen *sscreen)
 {
if (sscreen->shader_cache)
_mesa_hash_table_destroy(sscreen->shader_cache,
 si_destroy_shader_cache_entry);
mtx_destroy(&sscreen->shader_cache_mutex);
 }
 
 /* SHADER STATES */
 
 static void si_set_tesseval_regs(struct si_screen *sscreen,
-struct si_shader_selector *tes,
+const struct si_shader_selector *tes,
 struct si_pm4_state *pm4)
 {
-   struct tgsi_shader_info *info = &tes->info;
+   const struct tgsi_shader_info *info = &tes->info;
unsigned tes_prim_mode = info->properties[TGSI_PROPERTY_TES_PRIM_MODE];
unsigned tes_spacing = info->properties[TGSI_PROPERTY_TES_SPACING];
bool tes_vertex_order_cw = 
info->properties[TGSI_PROPERTY_TES_VERTEX_ORDER_CW];
bool tes_point_mode = info->properties[TGSI_PROPERTY_TES_POINT_MODE];
unsigned type, partitioning, topology, distribution_mode;
 
switch (tes_prim_mode) {
case PIPE_PRIM_LINES:
type = V_028B6C_TESS_ISOLINE;
break;
-- 
2.19.1



[Mesa-dev] [PATCH 14/25] radeonsi: rename SI_RESOURCE_FLAG_FORCE_TILING to clarify its purpose

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/radeonsi/si_blit.c| 2 +-
 src/gallium/drivers/radeonsi/si_pipe.h| 2 +-
 src/gallium/drivers/radeonsi/si_texture.c | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_blit.c 
b/src/gallium/drivers/radeonsi/si_blit.c
index 8f7aa0815b9..69b1af02db0 100644
--- a/src/gallium/drivers/radeonsi/si_blit.c
+++ b/src/gallium/drivers/radeonsi/si_blit.c
@@ -1186,21 +1186,21 @@ resolve_to_temp:
 * a temporary texture and blit.
 */
memset(&templ, 0, sizeof(templ));
templ.target = PIPE_TEXTURE_2D;
templ.format = info->src.resource->format;
templ.width0 = info->src.resource->width0;
templ.height0 = info->src.resource->height0;
templ.depth0 = 1;
templ.array_size = 1;
templ.usage = PIPE_USAGE_DEFAULT;
-   templ.flags = SI_RESOURCE_FLAG_FORCE_TILING |
+   templ.flags = SI_RESOURCE_FLAG_FORCE_MSAA_TILING |
  SI_RESOURCE_FLAG_DISABLE_DCC;
 
/* The src and dst microtile modes must be the same. */
if (src->surface.micro_tile_mode == RADEON_MICRO_MODE_DISPLAY)
templ.bind = PIPE_BIND_SCANOUT;
else
templ.bind = 0;
 
tmp = ctx->screen->resource_create(ctx->screen, &templ);
if (!tmp)
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 1d677d29e88..179671e8871 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -96,21 +96,21 @@
 #define SI_PREFETCH_PS (1 << 6)
 
 #define SI_MAX_BORDER_COLORS   4096
 #define SI_MAX_VIEWPORTS   16
 #define SIX_BITS   0x3F
 #define SI_MAP_BUFFER_ALIGNMENT 64
 #define SI_MAX_VARIABLE_THREADS_PER_BLOCK 1024
 
 #define SI_RESOURCE_FLAG_TRANSFER  (PIPE_RESOURCE_FLAG_DRV_PRIV << 0)
 #define SI_RESOURCE_FLAG_FLUSHED_DEPTH (PIPE_RESOURCE_FLAG_DRV_PRIV << 1)
-#define SI_RESOURCE_FLAG_FORCE_TILING  (PIPE_RESOURCE_FLAG_DRV_PRIV << 2)
+#define SI_RESOURCE_FLAG_FORCE_MSAA_TILING (PIPE_RESOURCE_FLAG_DRV_PRIV << 2)
 #define SI_RESOURCE_FLAG_DISABLE_DCC   (PIPE_RESOURCE_FLAG_DRV_PRIV << 3)
 #define SI_RESOURCE_FLAG_UNMAPPABLE (PIPE_RESOURCE_FLAG_DRV_PRIV << 4)
 #define SI_RESOURCE_FLAG_READ_ONLY (PIPE_RESOURCE_FLAG_DRV_PRIV << 5)
 #define SI_RESOURCE_FLAG_32BIT (PIPE_RESOURCE_FLAG_DRV_PRIV << 6)
 #define SI_RESOURCE_FLAG_SO_FILLED_SIZE (PIPE_RESOURCE_FLAG_DRV_PRIV << 7)
 
 /* Debug flags. */
 enum {
/* Shader logging options: */
DBG_VS = PIPE_SHADER_VERTEX,
diff --git a/src/gallium/drivers/radeonsi/si_texture.c 
b/src/gallium/drivers/radeonsi/si_texture.c
index 95f1e8c9693..ac1a0aa6097 100644
--- a/src/gallium/drivers/radeonsi/si_texture.c
+++ b/src/gallium/drivers/radeonsi/si_texture.c
@@ -296,21 +296,21 @@ static int si_init_surface(struct si_screen *sscreen,
   ptex->last_level == 0 &&
   !(flags & RADEON_SURF_Z_OR_SBUFFER));
 
flags |= RADEON_SURF_SCANOUT;
}
 
if (ptex->bind & PIPE_BIND_SHARED)
flags |= RADEON_SURF_SHAREABLE;
if (is_imported)
flags |= RADEON_SURF_IMPORTED | RADEON_SURF_SHAREABLE;
-   if (!(ptex->flags & SI_RESOURCE_FLAG_FORCE_TILING))
+   if (!(ptex->flags & SI_RESOURCE_FLAG_FORCE_MSAA_TILING))
flags |= RADEON_SURF_OPTIMIZE_FOR_SPACE;
 
r = sscreen->ws->surface_init(sscreen->ws, ptex, flags, bpe,
  array_mode, surface);
if (r) {
return r;
}
 
unsigned pitch = pitch_in_bytes_override / bpe;
 
@@ -1286,21 +1286,21 @@ si_texture_create_object(struct pipe_screen *screen,
}
 
return tex;
 }
 
 static enum radeon_surf_mode
 si_choose_tiling(struct si_screen *sscreen,
 const struct pipe_resource *templ, bool tc_compatible_htile)
 {
const struct util_format_description *desc = 
util_format_description(templ->format);
-   bool force_tiling = templ->flags & SI_RESOURCE_FLAG_FORCE_TILING;
+   bool force_tiling = templ->flags & SI_RESOURCE_FLAG_FORCE_MSAA_TILING;
bool is_depth_stencil = util_format_is_depth_or_stencil(templ->format) 
&&
!(templ->flags & 
SI_RESOURCE_FLAG_FLUSHED_DEPTH);
 
/* MSAA resources must be 2D tiled. */
if (templ->nr_samples > 1)
return RADEON_SURF_MODE_2D;
 
/* Transfer resources should be linear. */
if (templ->flags & SI_RESOURCE_FLAG_TRANSFER)
return RADEON_SURF_MODE_LINEAR_ALIGNED;
-- 
2.19.1



[Mesa-dev] [PATCH 02/25] amd/sid_tables: add additional python3 compatibility imports

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This happened to bite me while doing some experiments.
---
 src/amd/common/sid_tables.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/common/sid_tables.py b/src/amd/common/sid_tables.py
index 7b5e626e3e1..f12bed4b209 100644
--- a/src/amd/common/sid_tables.py
+++ b/src/amd/common/sid_tables.py
@@ -1,11 +1,11 @@
-from __future__ import print_function
+from __future__ import print_function, division, unicode_literals
 
 CopyRight = '''
 /*
  * Copyright 2015 Advanced Micro Devices, Inc.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a
  * copy of this software and associated documentation files (the "Software"),
  * to deal in the Software without restriction, including without limitation
  * on the rights to use, copy, modify, merge, publish, distribute, sub
  * license, and/or sell copies of the Software, and to permit persons to whom
-- 
2.19.1



[Mesa-dev] [PATCH 07/25] amd/common: add i1 special case to ac_build_{inclusive, exclusive}_scan

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Allow for a unified but efficient treatment of adding a bitmask over a
wave or an entire threadgroup.
---
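As a plain-C model of why the special case is cheap: for a 1-bit source, a
prefix sum over the wave is just a population count of the ballot mask below
the current lane. The sketch below assumes a 64-lane wave; popcount64,
mbcnt_model, demo_exclusive_scan_i1 and demo_inclusive_scan_i1 are
hypothetical helper names and only mimic what the ballot/mbcnt sequence in
the hunk below computes per lane.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count. */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Number of set bits in mask strictly below the given lane. */
static unsigned mbcnt_model(uint64_t mask, unsigned lane)
{
    assert(lane < 64);
    /* Shifting by 64 is undefined in C, so lane 0 is handled separately. */
    uint64_t below = lane == 0 ? 0 : mask & (~UINT64_C(0) >> (64 - lane));
    return popcount64(below);
}

/* Exclusive scan of a 1-bit value: count of set lanes before this one. */
static unsigned demo_exclusive_scan_i1(uint64_t ballot, unsigned lane)
{
    return mbcnt_model(ballot, lane);
}

/* Inclusive scan: the exclusive result plus this lane's own bit. */
static unsigned demo_inclusive_scan_i1(uint64_t ballot, unsigned lane)
{
    unsigned before = mbcnt_model(ballot, lane);
    return before + (unsigned)((ballot >> lane) & 1);
}
```

The inclusive and exclusive variants differ only by the lane's own bit, which
matches the extra add in the inclusive path of the patch.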
 src/amd/common/ac_llvm_build.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 932f4bbdeef..eb840369d07 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -3391,36 +3391,57 @@ ac_build_scan(struct ac_llvm_context *ctx, nir_op op, 
LLVMValueRef src, LLVMValu
if (maxprefix <= 32)
return result;
tmp = ac_build_dpp(ctx, identity, result, dpp_row_bcast31, 0xc, 0xf, 
false);
result = ac_build_alu_op(ctx, result, tmp, op);
return result;
 }
 
 LLVMValueRef
 ac_build_inclusive_scan(struct ac_llvm_context *ctx, LLVMValueRef src, nir_op 
op)
 {
-   ac_build_optimization_barrier(ctx, &src);
LLVMValueRef result;
+
+   if (LLVMTypeOf(src) == ctx->i1 && op == nir_op_iadd) {
+   LLVMBuilderRef builder = ctx->builder;
+   src = LLVMBuildZExt(builder, src, ctx->i32, "");
+   result = ac_build_ballot(ctx, src);
+   result = ac_build_mbcnt(ctx, result);
+   result = LLVMBuildAdd(builder, result, src, "");
+   return result;
+   }
+
+   ac_build_optimization_barrier(ctx, &src);
+
LLVMValueRef identity =
get_reduction_identity(ctx, op, 
ac_get_type_size(LLVMTypeOf(src)));
result = LLVMBuildBitCast(ctx->builder, ac_build_set_inactive(ctx, src, 
identity),
  LLVMTypeOf(identity), "");
result = ac_build_scan(ctx, op, result, identity, 64);
 
return ac_build_wwm(ctx, result);
 }
 
 LLVMValueRef
 ac_build_exclusive_scan(struct ac_llvm_context *ctx, LLVMValueRef src, nir_op 
op)
 {
-   ac_build_optimization_barrier(ctx, &src);
LLVMValueRef result;
+
+   if (LLVMTypeOf(src) == ctx->i1 && op == nir_op_iadd) {
+   LLVMBuilderRef builder = ctx->builder;
+   src = LLVMBuildZExt(builder, src, ctx->i32, "");
+   result = ac_build_ballot(ctx, src);
+   result = ac_build_mbcnt(ctx, result);
+   return result;
+   }
+
+   ac_build_optimization_barrier(ctx, &src);
+
LLVMValueRef identity =
get_reduction_identity(ctx, op, 
ac_get_type_size(LLVMTypeOf(src)));
result = LLVMBuildBitCast(ctx->builder, ac_build_set_inactive(ctx, src, 
identity),
  LLVMTypeOf(identity), "");
result = ac_build_dpp(ctx, identity, result, dpp_wf_sr1, 0xf, 0xf, 
false);
result = ac_build_scan(ctx, op, result, identity, 64);
 
return ac_build_wwm(ctx, result);
 }
 
@@ -3585,20 +3606,22 @@ ac_build_wg_wavescan(struct ac_llvm_context *ctx, 
struct ac_wg_scan *ws)
  * "Top half" of a scan that reduces per-thread values across an entire
  * workgroup.
  *
  * All lanes must be active when this code runs.
  */
 void
 ac_build_wg_scan_top(struct ac_llvm_context *ctx, struct ac_wg_scan *ws)
 {
if (ws->enable_exclusive) {
ws->extra = ac_build_exclusive_scan(ctx, ws->src, ws->op);
+   if (LLVMTypeOf(ws->src) == ctx->i1 && ws->op == nir_op_iadd)
+   ws->src = LLVMBuildZExt(ctx->builder, ws->src, 
ctx->i32, "");
ws->src = ac_build_alu_op(ctx, ws->extra, ws->src, ws->op);
} else {
ws->src = ac_build_inclusive_scan(ctx, ws->src, ws->op);
}
 
bool enable_inclusive = ws->enable_inclusive;
bool enable_exclusive = ws->enable_exclusive;
ws->enable_inclusive = false;
ws->enable_exclusive = ws->enable_exclusive || enable_inclusive;
ac_build_wg_wavescan_top(ctx, ws);
-- 
2.19.1



[Mesa-dev] [PATCH 00/25] amd/common, radeonsi: misc cleanups, refactorings, etc.

2018-12-06 Thread Nicolai Hähnle
This is a grab bag of random patches that I've been accumulating, without
any real unifying theme. The main highlights are:

- finally move the perfcounter code into the radeonsi directory
- unify some RW buffer handling
- new helpers for cross-wave scans and reductions

Please review!
Thanks,
Nicolai
--
 src/amd/common/ac_debug.c|   2 +
 src/amd/common/ac_llvm_build.c   | 247 +-
 src/amd/common/ac_llvm_build.h   |  37 +
 src/amd/common/ac_nir_to_llvm.c  |   2 +-
 src/amd/common/ac_surface.c  |   8 +-
 src/amd/common/gfx9d.h   |  12 +-
 src/amd/common/sid.h |  13 +-
 src/amd/common/sid_tables.py |   2 +-
 src/amd/vulkan/radv_image.c  |   8 +-
 src/gallium/drivers/r600/sb/sb_ir.h  |   2 +-
 .../drivers/radeon/r600_perfcounter.c| 639 ---
 .../drivers/radeonsi/Makefile.sources|   1 -
 src/gallium/drivers/radeonsi/meson.build |   1 -
 src/gallium/drivers/radeonsi/si_blit.c   |   2 +-
 src/gallium/drivers/radeonsi/si_build_pm4.h  |   8 +-
 src/gallium/drivers/radeonsi/si_cp_dma.c |   3 +-
 src/gallium/drivers/radeonsi/si_debug.c  |  13 +-
 .../drivers/radeonsi/si_descriptors.c| 112 +--
 .../drivers/radeonsi/si_perfcounter.c| 730 +++--
 src/gallium/drivers/radeonsi/si_pipe.c   |  40 +-
 src/gallium/drivers/radeonsi/si_pipe.h   |   6 +-
 src/gallium/drivers/radeonsi/si_query.c  | 254 +++---
 src/gallium/drivers/radeonsi/si_query.h  | 111 +--
 src/gallium/drivers/radeonsi/si_shader.c |  43 +-
 .../drivers/radeonsi/si_shader_tgsi_mem.c|   6 +-
 src/gallium/drivers/radeonsi/si_state.c  |  12 +-
 src/gallium/drivers/radeonsi/si_state.h  |  12 +-
 src/gallium/drivers/radeonsi/si_state_draw.c |  40 +-
 .../drivers/radeonsi/si_state_shaders.c  |   4 +-
 .../drivers/radeonsi/si_state_streamout.c|  61 +-
 src/gallium/drivers/radeonsi/si_texture.c|  11 +-
 .../winsys/amdgpu/drm/amdgpu_winsys.c|  36 +
 32 files changed, 1331 insertions(+), 1147 deletions(-)




[Mesa-dev] [PATCH 03/25] amd/common: cleanup DATA_FORMAT/NUM_FORMAT field names

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The definition wasn't actually changed in gfx9, so having the suffix
makes no sense.
---
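For context, the S_/G_/C_ triple for a register field follows a fixed pattern:
S_ places a value into the field, G_ extracts it, and C_ is the mask that
clears it. The sketch below copies the DATA_FORMAT field layout from this
patch (6 bits starting at bit 20) and writes the clear mask out as the 32-bit
complement of those bits. demo_set_data_format is a hypothetical helper added
only to show a read-modify-write; it is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* 6-bit DATA_FORMAT field at bits 20..25 of SQ_IMG_RSRC_WORD1. */
#define S_008F14_DATA_FORMAT(x) (((unsigned)(x) & 0x3F) << 20)
#define G_008F14_DATA_FORMAT(x) (((x) >> 20) & 0x3F)
#define C_008F14_DATA_FORMAT    0xFC0FFFFF /* ~(0x3F << 20) */

/* Replace the DATA_FORMAT field of a descriptor word, leaving every other
 * bit untouched. */
static uint32_t demo_set_data_format(uint32_t word1, unsigned fmt)
{
    assert(fmt <= 0x3F);
    return (word1 & C_008F14_DATA_FORMAT) | S_008F14_DATA_FORMAT(fmt);
}
```

Since the bit positions are the same on every generation the field exists on,
one unsuffixed set of macros can serve all of them.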
 src/amd/common/ac_nir_to_llvm.c   |  2 +-
 src/amd/common/gfx9d.h| 12 ++--
 src/amd/common/sid.h  | 12 ++--
 src/amd/vulkan/radv_image.c   |  8 
 src/gallium/drivers/radeonsi/si_shader_tgsi_mem.c |  6 +++---
 src/gallium/drivers/radeonsi/si_state.c   | 10 +-
 6 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index fe65dfff8f3..cbb5be4b1a2 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -1238,21 +1238,21 @@ static LLVMValueRef lower_gather4_integer(struct 
ac_llvm_context *ctx,
if (stype == GLSL_TYPE_UINT)
/* Create a NUM FORMAT - 0x2 or 0x4 - USCALED or UINT */
tmp = LLVMBuildSelect(ctx->builder, compare_cube_wa, LLVMConstInt(ctx->i32, 0x8000000, false),
  LLVMConstInt(ctx->i32, 0x10000000, false), "");
else
/* Create a NUM FORMAT - 0x3 or 0x5 - SSCALED or SINT */
tmp = LLVMBuildSelect(ctx->builder, compare_cube_wa, LLVMConstInt(ctx->i32, 0xc000000, false),
  LLVMConstInt(ctx->i32, 0x14000000, false), "");
 
/* replace the NUM FORMAT in the descriptor */
-   tmp2 = LLVMBuildAnd(ctx->builder, tmp2, LLVMConstInt(ctx->i32, 
C_008F14_NUM_FORMAT_GFX6, false), "");
+   tmp2 = LLVMBuildAnd(ctx->builder, tmp2, LLVMConstInt(ctx->i32, 
C_008F14_NUM_FORMAT, false), "");
tmp2 = LLVMBuildOr(ctx->builder, tmp2, tmp, "");
 
args->resource = LLVMBuildInsertElement(ctx->builder, 
args->resource, tmp2, ctx->i32_1, "");
 
/* don't modify the coordinates for this case */
for (unsigned c = 0; c < 2; ++c)
args->coords[c] = LLVMBuildSelect(
ctx->builder, compare_cube_wa,
orig_coords[c], args->coords[c], "");
}
diff --git a/src/amd/common/gfx9d.h b/src/amd/common/gfx9d.h
index 2e790c54699..5d3de5842a1 100644
--- a/src/amd/common/gfx9d.h
+++ b/src/amd/common/gfx9d.h
@@ -1262,23 +1262,23 @@
 #define   S_030F14_COUNT_HI(x)          (((unsigned)(x) & 0x7FFFFFFF) << 0)
 #define   G_030F14_COUNT_HI(x)          (((x) >> 0) & 0x7FFFFFFF)
 #define   C_030F14_COUNT_HI             0x80000000
 #define R_008F14_SQ_IMG_RSRC_WORD1      0x008F14
 #define   S_008F14_BASE_ADDRESS_HI(x)   (((unsigned)(x) & 0xFF) << 0)
 #define   G_008F14_BASE_ADDRESS_HI(x)   (((x) >> 0) & 0xFF)
 #define   C_008F14_BASE_ADDRESS_HI      0xFFFFFF00
 #define   S_008F14_MIN_LOD(x)           (((unsigned)(x) & 0xFFF) << 8)
 #define   G_008F14_MIN_LOD(x)           (((x) >> 8) & 0xFFF)
 #define   C_008F14_MIN_LOD              0xFFF000FF
-#define   S_008F14_DATA_FORMAT_GFX9(x)  (((unsigned)(x) & 0x3F) << 20)
-#define   G_008F14_DATA_FORMAT_GFX9(x)  (((x) >> 20) & 0x3F)
-#define   C_008F14_DATA_FORMAT_GFX9     0xFC0FFFFF
+#define   S_008F14_DATA_FORMAT(x)       (((unsigned)(x) & 0x3F) << 20)
+#define   G_008F14_DATA_FORMAT(x)       (((x) >> 20) & 0x3F)
+#define   C_008F14_DATA_FORMAT          0xFC0FFFFF
 #define V_008F14_IMG_DATA_FORMAT_INVALID 0x00
 #define V_008F14_IMG_DATA_FORMAT_8  0x01
 #define V_008F14_IMG_DATA_FORMAT_16 0x02
 #define V_008F14_IMG_DATA_FORMAT_8_8 0x03
 #define V_008F14_IMG_DATA_FORMAT_32 0x04
 #define V_008F14_IMG_DATA_FORMAT_16_16  0x05
 #define V_008F14_IMG_DATA_FORMAT_10_11_11   0x06
 #define V_008F14_IMG_DATA_FORMAT_11_11_10   0x07
 #define V_008F14_IMG_DATA_FORMAT_10_10_10_2 0x08
 #define V_008F14_IMG_DATA_FORMAT_2_10_10_10 0x09
@@ -1329,23 +1329,23 @@
 #define V_008F14_IMG_DATA_FORMA

[Mesa-dev] [PATCH 12/25] radeonsi/gfx9: use SET_UCONFIG_REG_INDEX packets when available

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/amd/common/ac_debug.c|  2 ++
 src/amd/common/sid.h |  1 +
 src/gallium/drivers/radeonsi/si_build_pm4.h  |  8 +++-
 src/gallium/drivers/radeonsi/si_state_draw.c | 12 
 4 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/amd/common/ac_debug.c b/src/amd/common/ac_debug.c
index 3b15398a2a2..e5463b66616 100644
--- a/src/amd/common/ac_debug.c
+++ b/src/amd/common/ac_debug.c
@@ -226,39 +226,41 @@ static void ac_parse_packet3(FILE *f, uint32_t header, 
struct ac_ib_parser *ib,
for (i = 0; i < ARRAY_SIZE(packet3_table); i++)
if (packet3_table[i].op == op)
break;
 
if (i < ARRAY_SIZE(packet3_table)) {
const char *name = sid_strings + packet3_table[i].name_offset;
 
if (op == PKT3_SET_CONTEXT_REG ||
op == PKT3_SET_CONFIG_REG ||
op == PKT3_SET_UCONFIG_REG ||
+   op == PKT3_SET_UCONFIG_REG_INDEX ||
op == PKT3_SET_SH_REG)
fprintf(f, COLOR_CYAN "%s%s" COLOR_CYAN ":\n",
name, predicate);
else
fprintf(f, COLOR_GREEN "%s%s" COLOR_RESET ":\n",
name, predicate);
} else
fprintf(f, COLOR_RED "PKT3_UNKNOWN 0x%x%s" COLOR_RESET ":\n",
op, predicate);
 
/* Print the contents. */
switch (op) {
case PKT3_SET_CONTEXT_REG:
ac_parse_set_reg_packet(f, count, SI_CONTEXT_REG_OFFSET, ib);
break;
case PKT3_SET_CONFIG_REG:
ac_parse_set_reg_packet(f, count, SI_CONFIG_REG_OFFSET, ib);
break;
case PKT3_SET_UCONFIG_REG:
+   case PKT3_SET_UCONFIG_REG_INDEX:
ac_parse_set_reg_packet(f, count, CIK_UCONFIG_REG_OFFSET, ib);
break;
case PKT3_SET_SH_REG:
ac_parse_set_reg_packet(f, count, SI_SH_REG_OFFSET, ib);
break;
case PKT3_ACQUIRE_MEM:
ac_dump_reg(f, ib->chip_class, R_0301F0_CP_COHER_CNTL, 
ac_ib_get(ib), ~0);
ac_dump_reg(f, ib->chip_class, R_0301F4_CP_COHER_SIZE, 
ac_ib_get(ib), ~0);
ac_dump_reg(f, ib->chip_class, R_030230_CP_COHER_SIZE_HI, 
ac_ib_get(ib), ~0);
ac_dump_reg(f, ib->chip_class, R_0301F8_CP_COHER_BASE, 
ac_ib_get(ib), ~0);
diff --git a/src/amd/common/sid.h b/src/amd/common/sid.h
index a6d0bc2fe42..94709b486d0 100644
--- a/src/amd/common/sid.h
+++ b/src/amd/common/sid.h
@@ -204,20 +204,21 @@
 /* fix CP DMA before uncommenting: */
 /*#define PKT3_EVENT_WRITE_EOS   0x48*/ /* not on GFX9 */
 #define PKT3_RELEASE_MEM               0x49 /* GFX9+ [any ring] or GFX8 [compute ring only] */
 #define PKT3_ONE_REG_WRITE             0x57 /* not on CIK */
 #define PKT3_ACQUIRE_MEM               0x58 /* new for CIK */
 #define PKT3_SET_CONFIG_REG            0x68
 #define PKT3_SET_CONTEXT_REG           0x69
 #define PKT3_SET_SH_REG                0x76
 #define PKT3_SET_SH_REG_OFFSET         0x77
 #define PKT3_SET_UCONFIG_REG           0x79 /* new for CIK */
+#define PKT3_SET_UCONFIG_REG_INDEX     0x7A /* new for GFX9, CP ucode version >= 26 */
 #define PKT3_LOAD_CONST_RAM            0x80
 #define PKT3_WRITE_CONST_RAM           0x81
 #define PKT3_DUMP_CONST_RAM            0x83
 #define PKT3_INCREMENT_CE_COUNTER      0x84
 #define PKT3_INCREMENT_DE_COUNTER      0x85
 #define PKT3_WAIT_ON_CE_COUNTER        0x86
 #define PKT3_LOAD_CONTEXT_REG  0x9F /* new for VI */
 
 #define PKT_TYPE_S(x)   (((unsigned)(x) & 0x3) << 30)
 #define PKT_TYPE_G(x)   (((x) >> 30) & 0x3)
diff --git a/src/gallium/drivers/radeonsi/si_build_pm4.h 
b/src/gallium/drivers/radeonsi/si_build_pm4.h
index 796adda0963..4e8890a5f97 100644
--- a/src/gallium/drivers/radeonsi/si_build_pm4.h
+++ b/src/gallium/drivers/radeonsi/si_build_pm4.h
@@ -93,26 +93,32 @@ static inline void radeon_set_uconfig_reg_seq(struct 
radeon_cmdbuf *cs, unsigned
radeon_emit(cs, (reg - CIK_UCONFIG_REG_OFFSET) >> 2);
 }
 
 static inline void radeon_set_uconfig_reg(struct radeon_cmdbuf *cs, unsigned 
reg, unsigned value)
 {
radeon_set_uconfig_reg_seq(cs, reg, 1);
radeon_emit(cs, value);
 }
 
 static inline void radeon_set_uconfig_reg_idx(struct radeon_cmdbuf *cs,
+ struct si_screen *screen,
  unsigned reg, unsigned idx,
  unsigned value)
 {
assert(reg >= CIK_UCONFIG_REG_OFFSET &a

[Mesa-dev] [PATCH 04/25] amd/common: whitespace fixes

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/amd/common/ac_llvm_build.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index abc18da13db..fba90205a2e 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -3374,40 +3374,38 @@ ac_build_scan(struct ac_llvm_context *ctx, nir_op op, 
LLVMValueRef src, LLVMValu
tmp = ac_build_dpp(ctx, identity, result, dpp_row_bcast31, 0xc, 0xf, 
false);
result = ac_build_alu_op(ctx, result, tmp, op);
return result;
 }
 
 LLVMValueRef
 ac_build_inclusive_scan(struct ac_llvm_context *ctx, LLVMValueRef src, nir_op 
op)
 {
ac_build_optimization_barrier(ctx, );
LLVMValueRef result;
-   LLVMValueRef identity = get_reduction_identity(ctx, op,
-   
ac_get_type_size(LLVMTypeOf(src)));
-   result = LLVMBuildBitCast(ctx->builder,
-   
ac_build_set_inactive(ctx, src, identity),
-   
LLVMTypeOf(identity), "");
+   LLVMValueRef identity =
+   get_reduction_identity(ctx, op, 
ac_get_type_size(LLVMTypeOf(src)));
+   result = LLVMBuildBitCast(ctx->builder, ac_build_set_inactive(ctx, src, 
identity),
+ LLVMTypeOf(identity), "");
result = ac_build_scan(ctx, op, result, identity);
 
return ac_build_wwm(ctx, result);
 }
 
 LLVMValueRef
 ac_build_exclusive_scan(struct ac_llvm_context *ctx, LLVMValueRef src, nir_op 
op)
 {
ac_build_optimization_barrier(ctx, );
LLVMValueRef result;
-   LLVMValueRef identity = get_reduction_identity(ctx, op,
-   
ac_get_type_size(LLVMTypeOf(src)));
-   result = LLVMBuildBitCast(ctx->builder,
-   
ac_build_set_inactive(ctx, src, identity),
-   
LLVMTypeOf(identity), "");
+   LLVMValueRef identity =
+   get_reduction_identity(ctx, op, 
ac_get_type_size(LLVMTypeOf(src)));
+   result = LLVMBuildBitCast(ctx->builder, ac_build_set_inactive(ctx, src, 
identity),
+ LLVMTypeOf(identity), "");
result = ac_build_dpp(ctx, identity, result, dpp_wf_sr1, 0xf, 0xf, 
false);
result = ac_build_scan(ctx, op, result, identity);
 
return ac_build_wwm(ctx, result);
 }
 
 LLVMValueRef
 ac_build_reduce(struct ac_llvm_context *ctx, LLVMValueRef src, nir_op op, 
unsigned cluster_size)
 {
if (cluster_size == 1) return src;
-- 
2.19.1



[Mesa-dev] [PATCH 11/25] radeonsi: add si_init_draw_functions and make some functions static

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/radeonsi/si_pipe.c   |  4 +--
 src/gallium/drivers/radeonsi/si_state.c  |  2 --
 src/gallium/drivers/radeonsi/si_state.h  | 10 +--
 src/gallium/drivers/radeonsi/si_state_draw.c | 28 +---
 4 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 7943af4d86e..fd8ff5fa202 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -494,44 +494,44 @@ static struct pipe_context *si_create_context(struct 
pipe_screen *screen,
ws->buffer_map(sctx->border_color_buffer->buf,
   NULL, PIPE_TRANSFER_WRITE);
if (!sctx->border_color_map)
goto fail;
 
si_init_all_descriptors(sctx);
si_init_fence_functions(sctx);
si_init_state_functions(sctx);
si_init_shader_functions(sctx);
si_init_viewport_functions(sctx);
-   si_init_ia_multi_vgt_param_table(sctx);
 
if (sctx->chip_class >= CIK)
cik_init_sdma_functions(sctx);
else
si_init_dma_functions(sctx);
 
if (sscreen->debug_flags & DBG(FORCE_DMA))
sctx->b.resource_copy_region = sctx->dma_copy;
 
bool dst_stream_policy = SI_COMPUTE_DST_CACHE_POLICY != L2_LRU;
sctx->cs_clear_buffer = si_create_dma_compute_shader(>b,
 SI_COMPUTE_CLEAR_DW_PER_THREAD,
 dst_stream_policy, false);
sctx->cs_copy_buffer = si_create_dma_compute_shader(>b,
 SI_COMPUTE_COPY_DW_PER_THREAD,
 dst_stream_policy, true);
 
sctx->blitter = util_blitter_create(>b);
if (sctx->blitter == NULL)
goto fail;
-   sctx->blitter->draw_rectangle = si_draw_rectangle;
sctx->blitter->skip_viewport_restore = true;
 
+   si_init_draw_functions(sctx);
+
sctx->sample_mask = 0x;
 
if (sctx->chip_class >= GFX9) {
sctx->wait_mem_scratch = r600_resource(
pipe_buffer_create(screen, 0, PIPE_USAGE_DEFAULT, 4));
if (!sctx->wait_mem_scratch)
goto fail;
 
/* Initialize the memory. */
struct radeon_cmdbuf *cs = sctx->gfx_cs;
diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 0960f379c4f..86d7b3a16f9 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -4818,22 +4818,20 @@ void si_init_state_functions(struct si_context *sctx)
sctx->b.delete_vertex_elements_state = si_delete_vertex_element;
sctx->b.set_vertex_buffers = si_set_vertex_buffers;
 
sctx->b.texture_barrier = si_texture_barrier;
sctx->b.memory_barrier = si_memory_barrier;
sctx->b.set_min_samples = si_set_min_samples;
sctx->b.set_tess_state = si_set_tess_state;
 
sctx->b.set_active_query_state = si_set_active_query_state;
 
-   sctx->b.draw_vbo = si_draw_vbo;
-
si_init_config(sctx);
 }
 
 void si_init_screen_state_functions(struct si_screen *sscreen)
 {
sscreen->b.is_format_supported = si_is_format_supported;
 }
 
 static void si_set_grbm_gfx_index(struct si_context *sctx,
  struct si_pm4_state *pm4,  unsigned value)
diff --git a/src/gallium/drivers/radeonsi/si_state.h 
b/src/gallium/drivers/radeonsi/si_state.h
index 83589e6918c..bb186f530f0 100644
--- a/src/gallium/drivers/radeonsi/si_state.h
+++ b/src/gallium/drivers/radeonsi/si_state.h
@@ -534,31 +534,23 @@ bool si_init_shader_cache(struct si_screen *sscreen);
 void si_destroy_shader_cache(struct si_screen *sscreen);
 void si_schedule_initial_compile(struct si_context *sctx, unsigned processor,
 struct util_queue_fence *ready_fence,
 struct si_compiler_ctx_state 
*compiler_ctx_state,
 void *job, util_queue_execute_func execute);
 void si_get_active_slot_masks(const struct tgsi_shader_info *info,
  uint32_t *const_and_shader_buffers,
  uint64_t *samplers_and_images);
 
 /* si_state_draw.c */
-void si_init_ia_multi_vgt_param_table(struct si_context *sctx);
 void si_emit_cache_flush(struct si_context *sctx);
-void si_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info *dinfo);
-void si_draw_rectangle(struct blitter_context *blitter,
-  void *vertex_elements_cso,
-  blitter_get_vs_func get_vs,
-  int x1, int y1, int x2, int y2,
-  

[Mesa-dev] [PATCH 05/25] amd/common: add ac_build_ifcc

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/amd/common/ac_llvm_build.c | 7 +++
 src/amd/common/ac_llvm_build.h | 1 +
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index fba90205a2e..68c8bad9e83 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -2861,48 +2861,47 @@ void ac_build_endloop(struct ac_llvm_context *ctx, int 
label_id)
 
assert(current_loop->loop_entry_block);
 
emit_default_branch(ctx->builder, current_loop->loop_entry_block);
 
LLVMPositionBuilderAtEnd(ctx->builder, current_loop->next_block);
set_basicblock_name(current_loop->next_block, "endloop", label_id);
ctx->flow_depth--;
 }
 
-static void if_cond_emit(struct ac_llvm_context *ctx, LLVMValueRef cond,
-int label_id)
+void ac_build_ifcc(struct ac_llvm_context *ctx, LLVMValueRef cond, int 
label_id)
 {
struct ac_llvm_flow *flow = push_flow(ctx);
LLVMBasicBlockRef if_block;
 
if_block = append_basic_block(ctx, "IF");
flow->next_block = append_basic_block(ctx, "ELSE");
set_basicblock_name(if_block, "if", label_id);
LLVMBuildCondBr(ctx->builder, cond, if_block, flow->next_block);
LLVMPositionBuilderAtEnd(ctx->builder, if_block);
 }
 
 void ac_build_if(struct ac_llvm_context *ctx, LLVMValueRef value,
 int label_id)
 {
LLVMValueRef cond = LLVMBuildFCmp(ctx->builder, LLVMRealUNE,
  value, ctx->f32_0, "");
-   if_cond_emit(ctx, cond, label_id);
+   ac_build_ifcc(ctx, cond, label_id);
 }
 
 void ac_build_uif(struct ac_llvm_context *ctx, LLVMValueRef value,
  int label_id)
 {
LLVMValueRef cond = LLVMBuildICmp(ctx->builder, LLVMIntNE,
  ac_to_integer(ctx, value),
  ctx->i32_0, "");
-   if_cond_emit(ctx, cond, label_id);
+   ac_build_ifcc(ctx, cond, label_id);
 }
 
 LLVMValueRef ac_build_alloca_undef(struct ac_llvm_context *ac, LLVMTypeRef 
type,
 const char *name)
 {
LLVMBuilderRef builder = ac->builder;
LLVMBasicBlockRef current_block = LLVMGetInsertBlock(builder);
LLVMValueRef function = LLVMGetBasicBlockParent(current_block);
LLVMBasicBlockRef first_block = LLVMGetEntryBasicBlock(function);
LLVMValueRef first_instr = LLVMGetFirstInstruction(first_block);
diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h
index e90c8c21ad4..cf3e3cedf65 100644
--- a/src/amd/common/ac_llvm_build.h
+++ b/src/amd/common/ac_llvm_build.h
@@ -475,20 +475,21 @@ LLVMValueRef ac_find_lsb(struct ac_llvm_context *ctx,
 
 LLVMTypeRef ac_array_in_const_addr_space(LLVMTypeRef elem_type);
 LLVMTypeRef ac_array_in_const32_addr_space(LLVMTypeRef elem_type);
 
 void ac_build_bgnloop(struct ac_llvm_context *ctx, int lable_id);
 void ac_build_break(struct ac_llvm_context *ctx);
 void ac_build_continue(struct ac_llvm_context *ctx);
 void ac_build_else(struct ac_llvm_context *ctx, int lable_id);
 void ac_build_endif(struct ac_llvm_context *ctx, int lable_id);
 void ac_build_endloop(struct ac_llvm_context *ctx, int lable_id);
+void ac_build_ifcc(struct ac_llvm_context *ctx, LLVMValueRef cond, int 
label_id);
 void ac_build_if(struct ac_llvm_context *ctx, LLVMValueRef value,
 int lable_id);
 void ac_build_uif(struct ac_llvm_context *ctx, LLVMValueRef value,
  int lable_id);
 
 LLVMValueRef ac_build_alloca(struct ac_llvm_context *ac, LLVMTypeRef type,
 const char *name);
 LLVMValueRef ac_build_alloca_undef(struct ac_llvm_context *ac, LLVMTypeRef 
type,
   const char *name);
 
-- 
2.19.1



[Mesa-dev] [PATCH 01/25] r600: remove redundant semicolon

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/drivers/r600/sb/sb_ir.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_ir.h 
b/src/gallium/drivers/r600/sb/sb_ir.h
index c7a94fcb930..ef0fbd4e68f 100644
--- a/src/gallium/drivers/r600/sb/sb_ir.h
+++ b/src/gallium/drivers/r600/sb/sb_ir.h
@@ -1005,21 +1005,21 @@ public:
virtual bool fold_dispatch(expr_handler *ex);
 
void jump(cf_node *c) { jump_target = c; jump_after_target = false; }
void jump_after(cf_node *c) { jump_target = c; jump_after_target = 
true; }
 
friend class shader;
 };
 
 class alu_node : public node {
 protected:
-   alu_node() : node(NT_OP, NST_ALU_INST) { memset(, 0, 
sizeof(bc_alu)); };
+   alu_node() : node(NT_OP, NST_ALU_INST) { memset(, 0, 
sizeof(bc_alu)); }
 public:
bc_alu bc;
 
virtual bool is_valid() { return subtype == NST_ALU_INST; }
virtual bool accept(vpass , bool enter);
virtual bool fold_dispatch(expr_handler *ex);
 
unsigned forced_bank_swizzle() {
return ((bc.op_ptr->flags & AF_INTERP) && (bc.slot_flags == 
AF_4V)) ?
VEC_210 : 0;
-- 
2.19.1



[Mesa-dev] [PATCH] meson: link LLVM 'native' component when LLVM is available

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Linking against LLVM built with BUILD_SHARED_LIBS fails otherwise,
as the component is required for the draw module.
---
 meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 1aeef95f722..0177716c476 100644
--- a/meson.build
+++ b/meson.build
@@ -1155,21 +1155,21 @@ dep_libdrm = dependency(
 if dep_libdrm.found()
   pre_args += '-DHAVE_LIBDRM'
   if with_dri_platform == 'drm' and with_dri
 with_gallium_drisw_kms = true
   endif
 endif
 
 llvm_modules = ['bitwriter', 'engine', 'mcdisassembler', 'mcjit']
 llvm_optional_modules = []
 if with_amd_vk or with_gallium_radeonsi or with_gallium_r600
-  llvm_modules += ['amdgpu', 'bitreader', 'ipo']
+  llvm_modules += ['amdgpu', 'native', 'bitreader', 'ipo']
   if with_gallium_r600
 llvm_modules += 'asmparser'
   endif
 endif
 if with_gallium_opencl
   llvm_modules += [
 'all-targets', 'linker', 'coverage', 'instrumentation', 'ipo', 'irreader',
 'lto', 'option', 'objcarcopts', 'profiledata',
   ]
   llvm_optional_modules += ['coroutines']
-- 
2.19.1



[Mesa-dev] [PATCH 2/2] ddebug: always flush when requested, even when hang detection is disabled

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

---
 src/gallium/auxiliary/driver_ddebug/dd_draw.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/auxiliary/driver_ddebug/dd_draw.c 
b/src/gallium/auxiliary/driver_ddebug/dd_draw.c
index a930299ebb7..f5b94356119 100644
--- a/src/gallium/auxiliary/driver_ddebug/dd_draw.c
+++ b/src/gallium/auxiliary/driver_ddebug/dd_draw.c
@@ -1104,20 +1104,22 @@ dd_before_draw(struct dd_context *dctx, struct 
dd_draw_record *record)
if (dscreen->timeout_ms > 0) {
   if (dscreen->flush_always && dctx->num_draw_calls >= 
dscreen->skip_count) {
  pipe->flush(pipe, >prev_bottom_of_pipe, 0);
  screen->fence_reference(screen, >top_of_pipe, 
record->prev_bottom_of_pipe);
   } else {
  pipe->flush(pipe, >prev_bottom_of_pipe,
  PIPE_FLUSH_DEFERRED | PIPE_FLUSH_BOTTOM_OF_PIPE);
  pipe->flush(pipe, >top_of_pipe,
  PIPE_FLUSH_DEFERRED | PIPE_FLUSH_TOP_OF_PIPE);
   }
+   } else if (dscreen->flush_always && dctx->num_draw_calls >= 
dscreen->skip_count) {
+  pipe->flush(pipe, NULL, 0);
}
 
mtx_lock(>mutex);
if (unlikely(dctx->num_records > 1)) {
   dctx->api_stalled = true;
   /* Since this is only a heuristic to prevent the API thread from getting
* too far ahead, we don't need a loop here. */
   cnd_wait(>cond, >mutex);
   dctx->api_stalled = false;
}
-- 
2.19.1



[Mesa-dev] [PATCH 1/2] ddebug: simplify watchdog loop and fix crash in the no-timeout case

2018-12-06 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The following race condition could occur in the no-timeout case:

  API thread               Gallium thread             Watchdog
  ----------               --------------             --------
  dd_before_draw
  u_threaded_context draw
  dd_after_draw
    add to dctx->records
    signal watchdog
                                                      dump & destroy record
                           execute draw
                           dd_after_draw_async
                             use-after-free!

Alternatively, the same scenario would assert in a debug build when
destroying the record because record->driver_finished has not signaled.

Fix this and simplify the logic at the same time by
- handing the record pointers off to the watchdog thread *before* each
  draw call and
- waiting on the driver_finished fence in the watchdog thread
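
The ordering described above can be illustrated with a small, single-threaded
model. This is only a sketch under stated assumptions, not the ddebug code:
the names toy_record, toy_watchdog, toy_before_draw, toy_after_draw_async and
toy_watchdog_step are hypothetical, and the driver_finished fence is modelled
as a plain boolean that the watchdog polls instead of blocking on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a draw record. */
struct toy_record {
   bool driver_finished; /* models the driver_finished fence */
   int result;           /* written by the driver thread after the draw */
};

/* Hypothetical watchdog state: at most one pending record in this model. */
struct toy_watchdog {
   struct toy_record *pending; /* record handed off before the draw */
   int destroyed;              /* number of records freed so far */
};

/* API thread: hand the record to the watchdog *before* the draw call. */
struct toy_record *toy_before_draw(struct toy_watchdog *wd)
{
   struct toy_record *rec = calloc(1, sizeof(*rec));
   wd->pending = rec; /* stays NULL if the allocation failed */
   return rec;
}

/* Driver thread: touch the record, then signal that the driver is done. */
void toy_after_draw_async(struct toy_record *rec)
{
   rec->result = 42;
   rec->driver_finished = true;
}

/* Watchdog: one step. Returns true only if it destroyed the record.
 * A real watchdog would block on the fence instead of returning false. */
bool toy_watchdog_step(struct toy_watchdog *wd)
{
   struct toy_record *rec = wd->pending;

   if (!rec || !rec->driver_finished)
      return false;

   free(rec);
   wd->pending = NULL;
   wd->destroyed++;
   return true;
}
```

In this model the watchdog cannot free a record while the driver thread may
still write to it, which is the property the use-after-free scenario violates.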
---
 .../auxiliary/driver_ddebug/dd_context.c  |   1 -
 src/gallium/auxiliary/driver_ddebug/dd_draw.c | 103 --
 src/gallium/auxiliary/driver_ddebug/dd_pipe.h |  21 ++--
 3 files changed, 52 insertions(+), 73 deletions(-)

diff --git a/src/gallium/auxiliary/driver_ddebug/dd_context.c 
b/src/gallium/auxiliary/driver_ddebug/dd_context.c
index a9ac6ef14ab..15efeccf879 100644
--- a/src/gallium/auxiliary/driver_ddebug/dd_context.c
+++ b/src/gallium/auxiliary/driver_ddebug/dd_context.c
@@ -589,21 +589,20 @@ static void
 dd_context_destroy(struct pipe_context *_pipe)
 {
struct dd_context *dctx = dd_context(_pipe);
struct pipe_context *pipe = dctx->pipe;
 
dd_thread_join(dctx);
mtx_destroy(>mutex);
cnd_destroy(>cond);
 
assert(list_empty(>records));
-   assert(!dctx->record_pending);
 
if (pipe->set_log_context) {
   pipe->set_log_context(pipe, NULL);
 
   if (dd_screen(dctx->base.screen)->dump_mode == DD_DUMP_ALL_CALLS) {
  FILE *f = dd_get_file_stream(dd_screen(dctx->base.screen), 0);
  if (f) {
 fprintf(f, "Remainder of driver log:\n\n");
  }
 
diff --git a/src/gallium/auxiliary/driver_ddebug/dd_draw.c 
b/src/gallium/auxiliary/driver_ddebug/dd_draw.c
index cb5db8ab83b..a930299ebb7 100644
--- a/src/gallium/auxiliary/driver_ddebug/dd_draw.c
+++ b/src/gallium/auxiliary/driver_ddebug/dd_draw.c
@@ -981,80 +981,75 @@ dd_report_hang(struct dd_context *dctx)
  }
 
  fclose(f);
   }
 
   if (top_not_reached)
  stop_output = true;
   encountered_hang = true;
}
 
-   if (num_later || dctx->record_pending) {
-  fprintf(stderr, "... and %u%s additional draws.\n", num_later,
-  dctx->record_pending ? "+1 (pending)" : "");
-   }
+   if (num_later)
+  fprintf(stderr, "... and %u additional draws.\n", num_later);
 
fprintf(stderr, "\nDone.\n");
dd_kill_process();
 }
 
 int
 dd_thread_main(void *input)
 {
struct dd_context *dctx = (struct dd_context *)input;
struct dd_screen *dscreen = dd_screen(dctx->base.screen);
struct pipe_screen *screen = dscreen->screen;
 
mtx_lock(>mutex);
 
for (;;) {
   struct list_head records;
-  struct pipe_fence_handle *fence;
-  struct pipe_fence_handle *fence2 = NULL;
-
   list_replace(>records, );
   list_inithead(>records);
   dctx->num_records = 0;
 
   if (dctx->api_stalled)
  cnd_signal(>cond);
 
-  if (!list_empty()) {
- /* Wait for the youngest draw. This means hangs can take a bit longer
-  * to detect, but it's more efficient this way. */
- struct dd_draw_record *youngest =
-LIST_ENTRY(struct dd_draw_record, records.prev, list);
- fence = youngest->bottom_of_pipe;
-  } else if (dctx->record_pending) {
- /* Wait for pending fences, in case the driver ends up hanging 
internally. */
- fence = dctx->record_pending->prev_bottom_of_pipe;
- fence2 = dctx->record_pending->top_of_pipe;
-  } else if (dctx->kill_thread) {
- break;
-  } else {
+  if (list_empty()) {
+ if (dctx->kill_thread)
+break;
+
  cnd_wait(>cond, >mutex);
  continue;
   }
+
   mtx_unlock(>mutex);
 
-  /* Fences can be NULL legitimately when timeout detection is disabled. */
-  if ((fence &&
-   !screen->fence_finish(screen, NULL, fence,
- (uint64_t)dscreen->timeout_ms * 1000*1000)) ||
-  (fence2 &&
-   !screen->fence_finish(screen, NULL, fence2,
- (uint64_t)dscreen->timeout_ms * 1000*1000))) {
- mtx_lock(>mutex);
- list_splice(, >records);
- dd_report_hang(dctx);
- /* we won't actually get here */
- mtx_unlock(>mutex);
+  /* Wait for the youngest draw. This means hangs can take a bit l

Re: [Mesa-dev] Make Jordan an Owner of the mesa project?

2018-12-04 Thread Nicolai Hähnle

+1

On 04.12.18 08:26, Marek Olšák wrote:

Ack.

On Mon, Dec 3, 2018, 7:49 PM Jason Ekstrand wrote:


Jordan has requested to be made an Owner of the mesa project.  As
much as I may be the guy who pushed to get everything set up, I
don't want to do this sort of thing on my own.  As such, I'm asking
for some ACKs.  If I can get 5 ACKs (at least 2 non-intel) from
other Owners and no NAKs, I'll click the button.

Personally, I think the answer here is absurdly obvious.  Jordan is
one of the most involved people in the community. :-D

As a side-note, does this seem like a reasonable process for adding
people as Owners?

--Jason




--
Learn how the world really is,
but never forget how it should be.


[Mesa-dev] [PATCH v2] winsys/amdgpu: explicitly declare whether buffer_map is permanent or not

2018-11-22 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Introduce a new driver-private transfer flag RADEON_TRANSFER_TEMPORARY
that specifies whether the caller will use buffer_unmap or not. The
default behavior is set to permanent maps, because that's what drivers
do for Gallium buffer maps.

This should eliminate the need for hacks in libdrm. Assertions are added
to catch when the buffer_unmap calls don't match the (temporary)
buffer_map calls.

I did my best to update r600 for consistency (r300 needs no changes
because it never calls buffer_unmap), even though the radeon winsys
ignores the new flag.

As an added bonus, this should actually improve the performance of
the normal fast path, because we no longer call into libdrm at all
after the first map, and there's one less atomic in the winsys itself
(there are now no atomics left in the UNSYNCHRONIZED fast path).
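
To make the intended contract concrete, here is a toy model of the two kinds
of map. It is a sketch under assumptions, not the winsys implementation:
TOY_TRANSFER_TEMPORARY, struct toy_bo and the toy_* functions are hypothetical
names, and the "backend" map is just a counter standing in for the call into
libdrm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag, standing in for a driver-private transfer flag. */
#define TOY_TRANSFER_TEMPORARY (1u << 0)

/* Hypothetical buffer object for illustration only. */
struct toy_bo {
   char storage[64];      /* stands in for the buffer's memory */
   void *cpu_ptr;         /* cached pointer of the permanent map */
   unsigned temp_maps;    /* outstanding temporary maps */
   unsigned backend_maps; /* how often the backend map was called */
};

/* Stand-in for the expensive call into the kernel/libdrm. */
void *toy_backend_map(struct toy_bo *bo)
{
   bo->backend_maps++;
   return bo->storage;
}

void *toy_bo_map(struct toy_bo *bo, unsigned flags)
{
   if (flags & TOY_TRANSFER_TEMPORARY) {
      /* Temporary map: the caller promises a matching unmap. */
      bo->temp_maps++;
      return toy_backend_map(bo);
   }

   /* Permanent map (the default): map once, then reuse the pointer. */
   if (!bo->cpu_ptr)
      bo->cpu_ptr = toy_backend_map(bo);
   return bo->cpu_ptr;
}

void toy_bo_unmap(struct toy_bo *bo)
{
   /* Catches an unmap that has no matching temporary map. */
   assert(bo->temp_maps != 0 && "unmap without temporary map");
   bo->temp_maps--;
}
```

In this model a second permanent map never reaches the backend, which mirrors
the fast-path claim above, and an unmap without a preceding temporary map
trips the assertion.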

Cc: Leo Liu 
v2:
- remove comment about visible VRAM (Marek)
- don't rely on amdgpu_bo_cpu_map doing an atomic write
---
 src/gallium/drivers/r600/evergreen_compute.c |  4 +-
 src/gallium/drivers/r600/r600_asm.c  |  4 +-
 src/gallium/drivers/r600/r600_shader.c   |  4 +-
 src/gallium/drivers/r600/radeon_uvd.c|  8 +-
 src/gallium/drivers/r600/radeon_vce.c|  4 +-
 src/gallium/drivers/r600/radeon_video.c  |  6 +-
 src/gallium/drivers/radeon/radeon_uvd.c  | 10 +-
 src/gallium/drivers/radeon/radeon_uvd_enc.c  |  6 +-
 src/gallium/drivers/radeon/radeon_vce.c  |  4 +-
 src/gallium/drivers/radeon/radeon_vcn_dec.c  | 18 ++--
 src/gallium/drivers/radeon/radeon_vcn_enc.c  |  4 +-
 src/gallium/drivers/radeon/radeon_video.c|  6 +-
 src/gallium/drivers/radeon/radeon_winsys.h   | 14 ++-
 src/gallium/drivers/radeonsi/si_shader.c |  3 +-
 src/gallium/include/pipe/p_defines.h |  8 +-
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c| 96 +---
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.h|  3 +-
 17 files changed, 140 insertions(+), 62 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
b/src/gallium/drivers/r600/evergreen_compute.c
index a77f58242e3..9085be4e2f3 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -431,21 +431,23 @@ static void *evergreen_create_compute_state(struct 
pipe_context *ctx,
COMPUTE_DBG(rctx->screen, "*** evergreen_create_compute_state\n");
header = cso->prog;
code = cso->prog + sizeof(struct pipe_llvm_program_header);
radeon_shader_binary_init(>binary);
r600_elf_read(code, header->num_bytes, >binary);
r600_create_shader(>bc, >binary, _kill);
 
/* Upload code + ROdata */
shader->code_bo = r600_compute_buffer_alloc_vram(rctx->screen,
shader->bc.ndw * 4);
-   p = r600_buffer_map_sync_with_rings(>b, shader->code_bo, 
PIPE_TRANSFER_WRITE);
+   p = r600_buffer_map_sync_with_rings(
+   >b, shader->code_bo,
+   PIPE_TRANSFER_WRITE | RADEON_TRANSFER_TEMPORARY);
//TODO: use util_memcpy_cpu_to_le32 ?
memcpy(p, shader->bc.bytecode, shader->bc.ndw * 4);
rctx->b.ws->buffer_unmap(shader->code_bo->buf);
 #endif
 
return shader;
 }
 
 static void evergreen_delete_compute_state(struct pipe_context *ctx, void 
*state)
 {
diff --git a/src/gallium/drivers/r600/r600_asm.c 
b/src/gallium/drivers/r600/r600_asm.c
index 7029be24f4b..4ba77c535f9 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -2765,21 +2765,23 @@ void *r600_create_vertex_fetch_shader(struct 
pipe_context *ctx,
 
u_suballocator_alloc(rctx->allocator_fetch_shader, fs_size, 256,
 >offset,
 (struct pipe_resource**)>buffer);
if (!shader->buffer) {
r600_bytecode_clear();
FREE(shader);
return NULL;
}
 
-   bytecode = r600_buffer_map_sync_with_rings(>b, shader->buffer, 
PIPE_TRANSFER_WRITE | PIPE_TRANSFER_UNSYNCHRONIZED);
+   bytecode = r600_buffer_map_sync_with_rings
+   (>b, shader->buffer,
+   PIPE_TRANSFER_WRITE | PIPE_TRANSFER_UNSYNCHRONIZED | 
RADEON_TRANSFER_TEMPORARY);
bytecode += shader->offset / 4;
 
if (R600_BIG_ENDIAN) {
for (i = 0; i < fs_size / 4; ++i) {
bytecode[i] = util_cpu_to_le32(bc.bytecode[i]);
}
} else {
memcpy(bytecode, bc.bytecode, fs_size);
}
rctx->b.ws->buffer_unmap(shader->buffer->buf);
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 408939d1105..fc826470d69 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -134,21 +134,23 @@ static int store_shader(st

Re: [Mesa-dev] [PATCH 2/2] winsys/amdgpu: explicitly declare whether buffer_map is permanent or not

2018-11-22 Thread Nicolai Hähnle

On 21.11.18 21:27, Marek Olšák wrote:
On Wed, Nov 21, 2018 at 12:57 PM Nicolai Hähnle <nhaeh...@gmail.com> wrote:


    From: Nicolai Hähnle <nicolai.haeh...@amd.com>

Introduce a new driver-private transfer flag RADEON_TRANSFER_TEMPORARY
that specifies whether the caller will use buffer_unmap or not. The
default behavior is set to permanent maps, because that's what drivers
do for Gallium buffer maps.

This should eliminate the need for hacks in libdrm. Assertions are added
to catch when the buffer_unmap calls don't match the (temporary)
buffer_map calls.

I did my best to update r600 and r300 as well for completeness (yes,
it's a no-op for r300 because it never calls buffer_unmap), even though
the radeon winsys ignores the new flag.


You didn't make any changes to r300.


Yeah, that's what I wrote :)



You can also drop all r600 changes, because the radeon winsys doesn't care.


I don't think that's a good idea, though. The interfaces of the two winsyses
differ, yes, but they are largely the same, and it makes sense to keep
them that way conceptually. Not that it matters much for the code itself.



[snip]

+enum radeon_transfer_flags {
+   /* Indicates that the caller will unmap the buffer.
+    *
+    * Not unmapping buffers is an important performance
optimization for
+    * OpenGL (avoids kernel overhead for frequently mapped
buffers). However,
+    * if you only map a buffer once and then use it indefinitely
from the GPU,
+    * it is much better to unmap it so that the kernel is free to
move it to
+    * non-visible VRAM.


The second half of the comment is misleading. The kernel will move 
buffers to invisible VRAM regardless of whether they're mapped, so CPU 
mappings have no effect on the placement. Buffers are only moved back to 
CPU-accessible memory on a CPU page fault. If a buffer is mapped and 
there is no CPU access, it will stay in invisible VRAM forever. The general 
recommendation is to keep those buffers mapped for CPU access just like 
GTT buffers.


Yeah, I'll change that.



+    */
+   RADEON_TRANSFER_TEMPORARY = (PIPE_TRANSFER_DRV_PRV << 0),
+};
+
  #define RADEON_SPARSE_PAGE_SIZE (64 * 1024)

  enum ring_type {
      RING_GFX = 0,
      RING_COMPUTE,
      RING_DMA,
      RING_UVD,
      RING_VCE,
      RING_UVD_ENC,
      RING_VCN_DEC,
@@ -287,23 +299,26 @@ struct radeon_winsys {
      struct pb_buffer *(*buffer_create)(struct radeon_winsys *ws,
                                         uint64_t size,
                                         unsigned alignment,
                                         enum radeon_bo_domain domain,
                                         enum radeon_bo_flag flags);

      /**
       * Map the entire data store of a buffer object into the
client's address
       * space.
       *
+     * Callers are expected to unmap buffers again if and only if the
+     * RADEON_TRANSFER_TEMPORARY flag is set in \p usage.
+     *
       * \param buf       A winsys buffer object to map.
       * \param cs        A command stream to flush if the buffer is
referenced by it.
-     * \param usage     A bitmask of the PIPE_TRANSFER_* flags.
+     * \param usage     A bitmask of the PIPE_TRANSFER_* and
RADEON_TRANSFER_* flags.
       * \return          The pointer at the beginning of the buffer.
       */
      void *(*buffer_map)(struct pb_buffer *buf,
                          struct radeon_cmdbuf *cs,
                          enum pipe_transfer_usage usage);

      /**
       * Unmap a buffer object from the client's address space.
       *
       * \param buf       A winsys buffer object to unmap.
diff --git a/src/gallium/drivers/radeonsi/si_shader.c
b/src/gallium/drivers/radeonsi/si_shader.c
index 19522cc97b1..d455fb5db6a 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -5286,21 +5286,22 @@ int si_shader_binary_upload(struct si_screen
*sscreen, struct si_shader *shader)
                                                 0 :
SI_RESOURCE_FLAG_READ_ONLY,
                                                PIPE_USAGE_IMMUTABLE,
                                                align(bo_size,
SI_CPDMA_ALIGNMENT),
                                                256);
         if (!shader->bo)
                 return -ENOMEM;

         /* Upload. */
         ptr = sscreen->ws->buffer_map(shader->bo->buf, NULL,
                                         PIPE_TRANSFER_READ_WRITE |
-                                       PIPE_TRANSFER_UNSYNCHRONIZED);
+                               

[Mesa-dev] [PATCH 1/2] winsys/amdgpu: add amdgpu_winsys_bo::lock

2018-11-21 Thread Nicolai Hähnle
From: Nicolai Hähnle 

We'll use it in the upcoming mapping change. Sparse buffers have always
had one.
---
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 19 +--
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.h |  4 ++--
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 10 +-
 3 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
index f49fb47b80e..9f0d4c12482 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
@@ -196,20 +196,21 @@ void amdgpu_bo_destroy(struct pb_buffer *_buf)
   ws->allocated_gtt -= align64(bo->base.size, ws->info.gart_page_size);
 
if (bo->u.real.map_count >= 1) {
   if (bo->initial_domain & RADEON_DOMAIN_VRAM)
  ws->mapped_vram -= bo->base.size;
   else if (bo->initial_domain & RADEON_DOMAIN_GTT)
  ws->mapped_gtt -= bo->base.size;
   ws->num_mapped_buffers--;
}
 
+   simple_mtx_destroy(>lock);
FREE(bo);
 }
 
 static void amdgpu_bo_destroy_or_cache(struct pb_buffer *_buf)
 {
struct amdgpu_winsys_bo *bo = amdgpu_winsys_bo(_buf);
 
assert(bo->bo); /* slab buffers have a separate vtbl */
 
if (bo->u.real.use_reusable_pool)
@@ -461,20 +462,21 @@ static struct amdgpu_winsys_bo *amdgpu_create_bo(struct 
amdgpu_winsys *ws,
AMDGPU_VM_PAGE_EXECUTABLE;
 
if (!(flags & RADEON_FLAG_READ_ONLY))
vm_flags |= AMDGPU_VM_PAGE_WRITEABLE;
 
r = amdgpu_bo_va_op_raw(ws->dev, buf_handle, 0, size, va, vm_flags,
   AMDGPU_VA_OP_MAP);
if (r)
   goto error_va_map;
 
+   simple_mtx_init(>lock, mtx_plain);
pipe_reference_init(>base.reference, 1);
bo->base.alignment = alignment;
bo->base.usage = 0;
bo->base.size = size;
bo->base.vtbl = _winsys_bo_vtbl;
bo->ws = ws;
bo->bo = buf_handle;
bo->va = va;
bo->u.real.va_handle = va_handle;
bo->initial_domain = initial_domain;
@@ -564,20 +566,21 @@ struct pb_slab *amdgpu_bo_slab_alloc(void *priv, unsigned 
heap,
if (!slab->entries)
   goto fail_buffer;
 
LIST_INITHEAD(>base.free);
 
base_id = __sync_fetch_and_add(>next_bo_unique_id, 
slab->base.num_entries);
 
for (unsigned i = 0; i < slab->base.num_entries; ++i) {
   struct amdgpu_winsys_bo *bo = >entries[i];
 
+  simple_mtx_init(>lock, mtx_plain);
   bo->base.alignment = entry_size;
   bo->base.usage = slab->buffer->base.usage;
   bo->base.size = entry_size;
   bo->base.vtbl = _winsys_bo_slab_vtbl;
   bo->ws = ws;
   bo->va = slab->buffer->va + i * entry_size;
   bo->initial_domain = domains;
   bo->unique_id = base_id + i;
   bo->u.slab.entry.slab = >base;
   bo->u.slab.entry.group_index = group_index;
@@ -592,22 +595,24 @@ fail_buffer:
amdgpu_winsys_bo_reference(>buffer, NULL);
 fail:
FREE(slab);
return NULL;
 }
 
 void amdgpu_bo_slab_free(void *priv, struct pb_slab *pslab)
 {
struct amdgpu_slab *slab = amdgpu_slab(pslab);
 
-   for (unsigned i = 0; i < slab->base.num_entries; ++i)
+   for (unsigned i = 0; i < slab->base.num_entries; ++i) {
   amdgpu_bo_remove_fences(>entries[i]);
+  simple_mtx_destroy(>entries[i].lock);
+   }
 
FREE(slab->entries);
amdgpu_winsys_bo_reference(>buffer, NULL);
FREE(slab);
 }
 
 #if DEBUG_SPARSE_COMMITS
 static void
 sparse_dump(struct amdgpu_winsys_bo *bo, const char *func)
 {
@@ -851,22 +856,22 @@ static void amdgpu_bo_sparse_destroy(struct pb_buffer 
*_buf)
}
 
while (!list_empty(&bo->u.sparse.backing)) {
   struct amdgpu_sparse_backing *dummy = NULL;
   sparse_free_backing_buffer(bo,
  container_of(bo->u.sparse.backing.next,
   dummy, list));
}
 
amdgpu_va_range_free(bo->u.sparse.va_handle);
-   simple_mtx_destroy(&bo->u.sparse.commit_lock);
FREE(bo->u.sparse.commitments);
+   simple_mtx_destroy(&bo->lock);
FREE(bo);
 }
 
 static const struct pb_vtbl amdgpu_winsys_bo_sparse_vtbl = {
amdgpu_bo_sparse_destroy
/* other functions are never called */
 };
 
 static struct pb_buffer *
 amdgpu_bo_sparse_create(struct amdgpu_winsys *ws, uint64_t size,
@@ -882,37 +887,37 @@ amdgpu_bo_sparse_create(struct amdgpu_winsys *ws, 
uint64_t size,
 * that exceed this limit. This is not really a restriction: we don't have
 * that much virtual address space anyway.
 */
if (size > (uint64_t)INT32_MAX * RADEON_SPARSE_PAGE_SIZE)
   return NULL;
 
bo = CALLOC_STRUCT(amdgpu_winsys_bo);
if (!bo)
   return NULL;
 
+   simple_mtx_init(&bo->lock, mtx_plain);
pipe_reference_init(&bo->base.reference, 1);
bo->base.alignment = RADEON_SPARSE

[Mesa-dev] [PATCH 2/2] winsys/amdgpu: explicitly declare whether buffer_map is permanent or not

2018-11-21 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Introduce a new driver-private transfer flag RADEON_TRANSFER_TEMPORARY
that specifies whether the caller will use buffer_unmap or not. The
default behavior is set to permanent maps, because that's what drivers
do for Gallium buffer maps.

This should eliminate the need for hacks in libdrm. Assertions are added
to catch when the buffer_unmap calls don't match the (temporary)
buffer_map calls.

I did my best to update r600 and r300 as well for completeness (yes,
it's a no-op for r300 because it never calls buffer_unmap), even though
the radeon winsys ignores the new flag.

As an added bonus, this should actually improve the performance of
the normal fast path, because we no longer call into libdrm at all
after the first map, and there's one less atomic in the winsys itself
(there are now no atomics left in the UNSYNCHRONIZED fast path).
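
The map/unmap contract described above can be modelled in a few lines. This is only a rough sketch under stated assumptions, not the code from this patch: `demo_bo`, `demo_bo_map`, `demo_bo_unmap` and the value of `DEMO_TRANSFER_TEMPORARY` are hypothetical names made up for illustration, `malloc` stands in for the real CPU mapping call, and locking and atomics are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag value for this sketch; not the value Mesa uses. */
#define DEMO_TRANSFER_TEMPORARY (1u << 0)

/* Hypothetical stand-in for a winsys buffer object. */
struct demo_bo {
    size_t size;
    void *cpu_ptr;     /* cached CPU mapping, NULL while unmapped */
    int map_count;     /* outstanding temporary maps */
    bool permanent;    /* set once a permanent map was requested */
    int backend_maps;  /* how often the (simulated) backend was called */
};

static void *demo_bo_map(struct demo_bo *bo, unsigned flags)
{
    if (!bo->cpu_ptr) {
        /* malloc simulates the one expensive call into the backend. */
        bo->cpu_ptr = malloc(bo->size);
        if (!bo->cpu_ptr)
            return NULL;
        bo->backend_maps++;
    }

    if (flags & DEMO_TRANSFER_TEMPORARY)
        bo->map_count++;       /* caller promises a matching unmap */
    else
        bo->permanent = true;  /* caller never unmaps; keep the pointer */

    return bo->cpu_ptr;
}

static void demo_bo_unmap(struct demo_bo *bo)
{
    /* Catch unmaps that have no matching temporary map. */
    assert(bo->map_count > 0 && "buffer_unmap without a temporary buffer_map");

    if (--bo->map_count == 0 && !bo->permanent) {
        free(bo->cpu_ptr);
        bo->cpu_ptr = NULL;
    }
}
```

In this model a caller that passes the temporary flag must call `demo_bo_unmap` exactly once per map, while a caller that omits it gets the cached pointer back on every later map without touching the backend again.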

Cc: Leo Liu 
--
Leo, it'd be nice if you could confirm that all video buffer mappings
are temporary in this sense.
---
 src/gallium/drivers/r600/evergreen_compute.c |  4 +-
 src/gallium/drivers/r600/r600_asm.c  |  4 +-
 src/gallium/drivers/r600/r600_shader.c   |  4 +-
 src/gallium/drivers/r600/radeon_uvd.c|  8 +-
 src/gallium/drivers/r600/radeon_vce.c|  4 +-
 src/gallium/drivers/r600/radeon_video.c  |  6 +-
 src/gallium/drivers/radeon/radeon_uvd.c  | 10 ++-
 src/gallium/drivers/radeon/radeon_uvd_enc.c  |  6 +-
 src/gallium/drivers/radeon/radeon_vce.c  |  4 +-
 src/gallium/drivers/radeon/radeon_vcn_dec.c  | 18 ++--
 src/gallium/drivers/radeon/radeon_vcn_enc.c  |  4 +-
 src/gallium/drivers/radeon/radeon_video.c|  6 +-
 src/gallium/drivers/radeon/radeon_winsys.h   | 17 +++-
 src/gallium/drivers/radeonsi/si_shader.c |  3 +-
 src/gallium/include/pipe/p_defines.h |  8 +-
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c| 95 +---
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.h|  3 +-
 17 files changed, 142 insertions(+), 62 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c 
b/src/gallium/drivers/r600/evergreen_compute.c
index a77f58242e3..9085be4e2f3 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -431,21 +431,23 @@ static void *evergreen_create_compute_state(struct 
pipe_context *ctx,
COMPUTE_DBG(rctx->screen, "*** evergreen_create_compute_state\n");
header = cso->prog;
code = cso->prog + sizeof(struct pipe_llvm_program_header);
radeon_shader_binary_init(&shader->binary);
r600_elf_read(code, header->num_bytes, &shader->binary);
r600_create_shader(&shader->bc, &shader->binary, &use_kill);
 
/* Upload code + ROdata */
shader->code_bo = r600_compute_buffer_alloc_vram(rctx->screen,
shader->bc.ndw * 4);
-   p = r600_buffer_map_sync_with_rings(&rctx->b, shader->code_bo, PIPE_TRANSFER_WRITE);
+   p = r600_buffer_map_sync_with_rings(
+   &rctx->b, shader->code_bo,
+   PIPE_TRANSFER_WRITE | RADEON_TRANSFER_TEMPORARY);
//TODO: use util_memcpy_cpu_to_le32 ?
memcpy(p, shader->bc.bytecode, shader->bc.ndw * 4);
rctx->b.ws->buffer_unmap(shader->code_bo->buf);
 #endif
 
return shader;
 }
 
 static void evergreen_delete_compute_state(struct pipe_context *ctx, void 
*state)
 {
diff --git a/src/gallium/drivers/r600/r600_asm.c 
b/src/gallium/drivers/r600/r600_asm.c
index 7029be24f4b..4ba77c535f9 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -2765,21 +2765,23 @@ void *r600_create_vertex_fetch_shader(struct 
pipe_context *ctx,
 
u_suballocator_alloc(rctx->allocator_fetch_shader, fs_size, 256,
 &shader->offset,
 (struct pipe_resource**)&shader->buffer);
if (!shader->buffer) {
r600_bytecode_clear(&bc);
FREE(shader);
return NULL;
}
 
-   bytecode = r600_buffer_map_sync_with_rings(&rctx->b, shader->buffer, PIPE_TRANSFER_WRITE | PIPE_TRANSFER_UNSYNCHRONIZED);
+   bytecode = r600_buffer_map_sync_with_rings
+   (&rctx->b, shader->buffer,
+   PIPE_TRANSFER_WRITE | PIPE_TRANSFER_UNSYNCHRONIZED | RADEON_TRANSFER_TEMPORARY);
bytecode += shader->offset / 4;
 
if (R600_BIG_ENDIAN) {
for (i = 0; i < fs_size / 4; ++i) {
bytecode[i] = util_cpu_to_le32(bc.bytecode[i]);
}
} else {
memcpy(bytecode, bc.bytecode, fs_size);
}
rctx->b.ws->buffer_unmap(shader->buffer->buf);
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 408939d1105..fc826470d69 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -134,21 +134,23 @@ 

[Mesa-dev] [PATCH 1.5/2] ac/surface/gfx9: let addrlib choose the preferred swizzle kind

2018-11-21 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Our choices here are simply redundant as long as sin.flags is set
correctly.
--
This is the change I was talking about.
---
 src/amd/common/ac_surface.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/src/amd/common/ac_surface.c b/src/amd/common/ac_surface.c
index edd710a968c..ad2cb585c9d 100644
--- a/src/amd/common/ac_surface.c
+++ b/src/amd/common/ac_surface.c
@@ -1057,30 +1057,20 @@ gfx9_get_preferred_swizzle_mode(ADDR_HANDLE addrlib,
sin.forbiddenBlock.var = 1; /* don't allow the variable-sized swizzle 
modes */
sin.forbiddenBlock.linear = 1; /* don't allow linear swizzle modes */
sin.bpp = in->bpp;
sin.width = in->width;
sin.height = in->height;
sin.numSlices = in->numSlices;
sin.numMipLevels = in->numMipLevels;
sin.numSamples = in->numSamples;
sin.numFrags = in->numFrags;
 
-   if (flags & RADEON_SURF_SCANOUT) {
-   sin.preferredSwSet.sw_D = 1;
-   /* Raven only allows S for displayable surfaces with < 64 bpp, 
so
-* allow it as fallback */
-   sin.preferredSwSet.sw_S = 1;
-   } else if (in->flags.depth || in->flags.stencil || is_fmask)
-   sin.preferredSwSet.sw_Z = 1;
-   else
-   sin.preferredSwSet.sw_S = 1;
-
if (is_fmask) {
sin.flags.display = 0;
sin.flags.color = 0;
sin.flags.fmask = 1;
}
 
ret = Addr2GetPreferredSurfaceSetting(addrlib, &sin, &sout);
if (ret != ADDR_OK)
return ret;
 
-- 
2.19.1

