Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-10 Thread Jason Ekstrand
On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeiro  wrote:
> vec4 port of fs_cmod_propagation.
>
> Shader-db results:
> total instructions in shared programs: 6241226 -> 6224469 (-0.27%)
> instructions in affected programs: 498213 -> 481456 (-3.36%)
> helped: 3082
> HURT: 0

Would you mind cherry-picking this back onto
4e0a8e0a50c9ac91cb7a70b92b8d9c6fcc02b7aa (the commit right before we
made NIR non-optional) and getting some GLSL IR vs. NIR vec4-only numbers
with this patch?  I'd like to know what it does to that delta as well.

Thanks!
--Jason

> ---
>
> The final outcome is really similar to fs_brw_cmod_propagation. In
> fact the only difference is that on fs we have this:
>    if (scan_inst->overwrites_reg(inst->src[0])) {
>       if (scan_inst->is_partial_write() ||
>           scan_inst->dst.reg_offset != inst->src[0].reg_offset)
>          break;
>
> And on vec4 (this commit) we have this:
>    if (inst->src[0].in_range(scan_inst->dst,
>                              scan_inst->regs_written)) {
>
>       if ((scan_inst->predicate && scan_inst->opcode != BRW_OPCODE_SEL) ||
>           scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
>           (scan_inst->dst.writemask != WRITEMASK_X &&
>            scan_inst->dst.writemask != WRITEMASK_XYZW))
>          break;
>
>       if (scan_inst->dst.writemask == WRITEMASK_XYZW &&
>           inst->src[0].swizzle != BRW_SWIZZLE_XYZW) {
>          break;
>       }
>
> So at some point I thought about refactoring it into one common
> implementation, as with opt_predicated_break. That one was possible with
> just backend_instructions, while here we would need to deal with both
> vec4_instruction and fs_inst, which could get somewhat messy, so
> I'm leaving this as it is.
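
The vec4-side checks quoted above can be read as a single predicate over the scanned instruction and the comparison's source. The following is only a sketch under stated assumptions: the structs, the SKETCH_ constants and the function name are hypothetical stand-ins, not the i965 IR, and returning true only means that none of the quoted break conditions fired.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the fields named in the mail. The values
 * follow the usual 4-bit writemask and 2-bits-per-channel swizzle
 * encoding, but they are assumptions of this sketch. */
enum { SKETCH_WRITEMASK_X = 0x1, SKETCH_WRITEMASK_XYZW = 0xf };
enum { SKETCH_SWIZZLE_XYZW = 0xe4 }; /* x=0, y=1, z=2, w=3 */

struct sketch_scan_inst {
    bool predicated;          /* scan_inst->predicate is set */
    bool is_sel;              /* scan_inst->opcode == BRW_OPCODE_SEL */
    unsigned dst_reg_offset;  /* scan_inst->dst.reg_offset */
    unsigned dst_writemask;   /* scan_inst->dst.writemask */
};

struct sketch_cmp_src {
    unsigned reg_offset;      /* inst->src[0].reg_offset */
    unsigned swizzle;         /* inst->src[0].swizzle */
};

/* Returns false where the quoted vec4 code would 'break'. */
static bool
sketch_passes_vec4_checks(const struct sketch_scan_inst *scan,
                          const struct sketch_cmp_src *src)
{
    /* A predicated write (other than SEL), a different register offset,
     * or a writemask that is neither .x nor .xyzw stops the scan. */
    if ((scan->predicated && !scan->is_sel) ||
        scan->dst_reg_offset != src->reg_offset ||
        (scan->dst_writemask != SKETCH_WRITEMASK_X &&
         scan->dst_writemask != SKETCH_WRITEMASK_XYZW))
        return false;

    /* A full .xyzw write only matches a comparison that reads the
     * channels unswizzled. */
    if (scan->dst_writemask == SKETCH_WRITEMASK_XYZW &&
        src->swizzle != SKETCH_SWIZZLE_XYZW)
        return false;

    return true;
}
```

Compared with the fs version, the extra work is all about channels: the fs pass only has to rule out partial writes, while the vec4 pass also has to make sure the written channels and the read swizzle line up.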
>
>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>  src/mesa/drivers/dri/i965/brw_vec4.cpp |   1 +
>  src/mesa/drivers/dri/i965/brw_vec4.h   |   1 +
>  .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163 +
>  4 files changed, 166 insertions(+)
>  create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>
> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
> b/src/mesa/drivers/dri/i965/Makefile.sources
> index 81ef628..c1836d6 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.sources
> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
> @@ -56,6 +56,7 @@ i965_compiler_FILES = \
> brw_util.c \
> brw_util.h \
> brw_vec4_builder.h \
> +   brw_vec4_cmod_propagation.cpp \
> brw_vec4_copy_propagation.cpp \
> brw_vec4.cpp \
> brw_vec4_cse.cpp \
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index e966b96..55e381b 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1867,6 +1867,7 @@ vec4_visitor::run()
>OPT(dead_code_eliminate);
>OPT(dead_control_flow_eliminate, this);
>OPT(opt_copy_propagation);
> +  OPT(opt_cmod_propagation);
>OPT(opt_cse);
>OPT(opt_algebraic);
>OPT(opt_register_coalesce);
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
> b/src/mesa/drivers/dri/i965/brw_vec4.h
> index 5e3500c..3c1711d 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> @@ -149,6 +149,7 @@ public:
> int var_range_start(unsigned v, unsigned n) const;
> int var_range_end(unsigned v, unsigned n) const;
> bool virtual_grf_interferes(int a, int b);
> +   bool opt_cmod_propagation();
> bool opt_copy_propagation(bool do_constant_prop = true);
> bool opt_cse_local(bblock_t *block);
> bool opt_cse();
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
> new file mode 100644
> index 0000000..7e39d2b
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
> @@ -0,0 +1,163 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE 

[Mesa-dev] [PATCH v2 18/17 (was 10/17)] i965/vs: Move use_legacy_snorm_formula into the shader key

2015-10-10 Thread Jason Ekstrand
This is really an input into the shader compiler so it kind of makes sense
in the key.  Also, given where it's placed into the key, it doesn't
actually make it any bigger.
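
A rough illustration of both points, assuming keys are compared bytewise and that the compiler packs adjacent one-bit fields into the same storage unit (true for the common GCC/Clang ABIs, but implementation-defined in general). The structs here are cut-down, hypothetical versions of the key, and nr_planes is a made-up placeholder field:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, reduced key layouts; not the real brw_vs_prog_key. */
struct sketch_key_before {
    bool clamp_vertex_color:1;
    unsigned nr_planes;               /* placeholder for later fields */
};

struct sketch_key_after {
    bool clamp_vertex_color:1;
    bool use_legacy_snorm_formula:1;  /* shares the existing byte */
    unsigned nr_planes;
};

/* Bytewise comparison, so every bit in the key selects a variant.
 * Keys must be zero-initialized so padding compares equal. */
static bool
sketch_keys_equal(const struct sketch_key_after *a,
                  const struct sketch_key_after *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

With the flag in the key, two compiles that differ only in the snorm formula get different keys, instead of relying on a separate argument being passed consistently.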

v2 (Jason Ekstrand):
   - Rebase on top of the compiler clean-ups so the effects of this patch
     can better be studied without being in the middle of a series.
---
 src/mesa/drivers/dri/i965/brw_compiler.h  | 3 ++-
 src/mesa/drivers/dri/i965/brw_vec4.cpp| 4 +---
 src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp | 9 -
 src/mesa/drivers/dri/i965/brw_vs.c| 3 ++-
 src/mesa/drivers/dri/i965/brw_vs.h| 5 +
 5 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
b/src/mesa/drivers/dri/i965/brw_compiler.h
index 4bc1caa..153e381 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -161,6 +161,8 @@ struct brw_vs_prog_key {
 
bool clamp_vertex_color:1;
 
+   bool use_legacy_snorm_formula:1;
+
/**
 * How many user clipping planes are being uploaded to the vertex shader as
 * push constants.
@@ -585,7 +587,6 @@ brw_compile_vs(const struct brw_compiler *compiler, void 
*log_data,
struct brw_vs_prog_data *prog_data,
const struct nir_shader *shader,
gl_clip_plane *clip_planes,
-   bool use_legacy_snorm_formula,
int shader_time_index,
unsigned *final_assembly_size,
char **error_str);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 8636323..5336590 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1943,7 +1943,6 @@ brw_compile_vs(const struct brw_compiler *compiler, void 
*log_data,
struct brw_vs_prog_data *prog_data,
const nir_shader *shader,
gl_clip_plane *clip_planes,
-   bool use_legacy_snorm_formula,
int shader_time_index,
unsigned *final_assembly_size,
char **error_str)
@@ -1982,8 +1981,7 @@ brw_compile_vs(const struct brw_compiler *compiler, void 
*log_data,
   prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
 
   vec4_vs_visitor v(compiler, log_data, key, prog_data,
-shader, clip_planes, mem_ctx,
-shader_time_index, use_legacy_snorm_formula);
+shader, clip_planes, mem_ctx, shader_time_index);
   if (!v.run()) {
  if (error_str)
 *error_str = ralloc_strdup(mem_ctx, v.fail_msg);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
index 485a80e..9cf04cd 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
@@ -77,7 +77,8 @@ vec4_vs_visitor::emit_prolog()
 /* ES 3.0 has different rules for converting signed normalized
  * fixed-point numbers than desktop GL.
  */
-if ((wa_flags & BRW_ATTRIB_WA_SIGN) && !use_legacy_snorm_formula) {
+if ((wa_flags & BRW_ATTRIB_WA_SIGN) &&
+!key->use_legacy_snorm_formula) {
/* According to equation 2.2 of the ES 3.0 specification,
 * signed normalization conversion is done by:
 *
@@ -304,14 +305,12 @@ vec4_vs_visitor::vec4_vs_visitor(const struct 
brw_compiler *compiler,
  const nir_shader *shader,
  gl_clip_plane *clip_planes,
  void *mem_ctx,
- int shader_time_index,
- bool use_legacy_snorm_formula)
+ int shader_time_index)
: vec4_visitor(compiler, log_data, &key->tex, &vs_prog_data->base, shader,
   mem_ctx, false /* no_spills */, shader_time_index),
  key(key),
  vs_prog_data(vs_prog_data),
- clip_planes(clip_planes),
- use_legacy_snorm_formula(use_legacy_snorm_formula)
+ clip_planes(clip_planes)
 {
 }
 
diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
b/src/mesa/drivers/dri/i965/brw_vs.c
index 9c9b83b..3b3eb8b 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -184,7 +184,6 @@ brw_codegen_vs_prog(struct brw_context *brw,
program = brw_compile_vs(brw->intelScreen->compiler, brw, mem_ctx, key,
 &prog_data, vp->program.Base.nir,
 brw_select_clip_planes(&brw->ctx),
-!_mesa_is_gles3(&brw->ctx),
 st_index, &program_size, &error_str);
if (program == NULL) {
   if (prog) {
@@ -341,6 +340,8 @@ brw_vs_populate_key(struct brw_context *brw,
   key->clamp_vertex_color = ctx->Light._ClampVertexColor;
}
 
+   

Re: [Mesa-dev] [PATCH 3/8] radeonsi: Enable DCC.

2015-10-10 Thread Axel Davy

On 10/10/2015 17:49, Marek Olšák wrote:
> On Sat, Oct 10, 2015 at 4:15 PM, Bas Nieuwenhuizen
>  wrote:
>> Hi Marek,
>>
>> The revised series is mostly done. I wanted to do more testing and to
>> try to make sure that the added cache flushes I am doing now (a
>> CACHE_FLUSH event before a fast clear and on switching framebuffers)
>> are the minimal needed.
>>
>>> Also, it looks like we don't need DCC decompression at all, right? It
>>> might be better to get rid of it and only use the 3D engine to access
>>> DCC-encoded surfaces.
>>
>> I still use it for flush_resource. I could make this redundant by
>> sharing the DCC buffer by appending the DCC buffer to the texture
>> resource similarly to how the CMASK is appended to the resource of a
>> MSAA buffer. This has the secondary benefit of not needing to
>> reference as many resources for command submission.
>
> IIRC, flush_resource is only used for shared (scanout) surfaces where
> DCC is always disabled.
>
> Marek
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

I think it's not a very good idea to rely on that.

It may be true for now, but may change in the future:
For example, perhaps some day wayland will tell egl
the app is not fullscreen and that a non-scanoutable buffer
can be used.

Axel Davy


Re: [Mesa-dev] [PATCH 3/8] radeonsi: Enable DCC.

2015-10-10 Thread Marek Olšák
On Sat, Oct 10, 2015 at 6:12 PM, Axel Davy  wrote:
> On 10/10/2015 17:49, Marek Olšák wrote:
>>
>> On Sat, Oct 10, 2015 at 4:15 PM, Bas Nieuwenhuizen
>>  wrote:
>>>
>>> Hi Marek,
>>>
>>> The revised series is mostly done. I wanted to do more testing and to
>>> try to make sure that the added cache flushes I am doing now (a
>>> CACHE_FLUSH event before a fast clear and on switching framebuffers)
>>> are the minimal needed.
>>>
>>>> Also, it looks like we don't need DCC decompression at all, right? It
>>>> might be better to get rid of it and only use the 3D engine to access
>>>> DCC-encoded surfaces.
>>>
>>> I still use it for flush_resource. I could make this redundant by
>>> sharing the DCC buffer by appending the DCC buffer to the texture
>>> resource similarly to how the CMASK is appended to the resource of a
>>> MSAA buffer. This has the secondary benefit of not needing to
>>> reference as many resources for command submission.
>>
>> IIRC, flush_resource is only used for shared (scanout) surfaces where
>> DCC is always disabled.
>>
>> Marek
>
> I think it's not a very good idea to rely on that.
>
> It may be true for now, but may change in the future:
> For example, perhaps some day wayland will tell egl
> the app is not fullscreen and that a non-scanoutable buffer
> can be used.

If that happens, we'll implement DCC sharing between processes.

Marek


Re: [Mesa-dev] [PATCH 3/6] glsl: move half<->float convertion to util

2015-10-10 Thread Rob Clark
On Sat, Oct 10, 2015 at 3:09 PM, Matt Turner  wrote:
> On Sat, Oct 10, 2015 at 11:47 AM, Rob Clark  wrote:
>> From: Rob Clark 
>>
>> Needed in NIR too, so move out of mesa/main/imports.c
>>
>> Signed-off-by: Rob Clark 
>> ---
>>  src/glsl/Makefile.am  |   1 +
>>  src/mesa/main/imports.c   | 148 --
>>  src/mesa/main/imports.h   |  38 --
>>  src/util/Makefile.sources |   2 +
>>  src/util/convert.c| 179 
>> ++
>>  src/util/convert.h|  43 +++
>>  6 files changed, 259 insertions(+), 152 deletions(-)
>>  create mode 100644 src/util/convert.c
>>  create mode 100644 src/util/convert.h
>>
>> diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am
>> index 3265391..347919b 100644
>> --- a/src/glsl/Makefile.am
>> +++ b/src/glsl/Makefile.am
>> @@ -160,6 +160,7 @@ glsl_compiler_SOURCES = \
>>  glsl_compiler_LDADD =  \
>> libglsl.la  \
>> $(top_builddir)/src/libglsl_util.la \
>> +   $(top_builddir)/src/util/libmesautil.la \
>> $(PTHREAD_LIBS)
>>
>>  glsl_test_SOURCES = \
>> diff --git a/src/mesa/main/imports.c b/src/mesa/main/imports.c
>> index 350e675..230ebbc 100644
>> --- a/src/mesa/main/imports.c
>> +++ b/src/mesa/main/imports.c
>> @@ -307,154 +307,6 @@ _mesa_bitcount_64(uint64_t n)
>>  }
>>  #endif
>>
>> -
>> -/**
>> - * Convert a 4-byte float to a 2-byte half float.
>> - *
>> - * Not all float32 values can be represented exactly as a float16 value. We
>> - * round such intermediate float32 values to the nearest float16. When the
>> - * float32 lies exactly between to float16 values, we round to the one with
>> - * an even mantissa.
>> - *
>> - * This rounding behavior has several benefits:
>> - *   - It has no sign bias.
>> - *
>> - *   - It reproduces the behavior of real hardware: opcode F32TO16 in 
>> Intel's
>> - * GPU ISA.
>> - *
>> - *   - By reproducing the behavior of the GPU (at least on Intel hardware),
>> - * compile-time evaluation of constant packHalf2x16 GLSL expressions 
>> will
>> - * result in the same value as if the expression were executed on the 
>> GPU.
>> - */
>> -GLhalfARB
>> -_mesa_float_to_half(float val)
>> -{
>> -   const fi_type fi = {val};
>> -   const int flt_m = fi.i & 0x7fffff;
>> -   const int flt_e = (fi.i >> 23) & 0xff;
>> -   const int flt_s = (fi.i >> 31) & 0x1;
>> -   int s, e, m = 0;
>> -   GLhalfARB result;
>> -
>> -   /* sign bit */
>> -   s = flt_s;
>> -
>> -   /* handle special cases */
>> -   if ((flt_e == 0) && (flt_m == 0)) {
>> -  /* zero */
>> -  /* m = 0; - already set */
>> -  e = 0;
>> -   }
>> -   else if ((flt_e == 0) && (flt_m != 0)) {
>> -  /* denorm -- denorm float maps to 0 half */
>> -  /* m = 0; - already set */
>> -  e = 0;
>> -   }
>> -   else if ((flt_e == 0xff) && (flt_m == 0)) {
>> -  /* infinity */
>> -  /* m = 0; - already set */
>> -  e = 31;
>> -   }
>> -   else if ((flt_e == 0xff) && (flt_m != 0)) {
>> -  /* NaN */
>> -  m = 1;
>> -  e = 31;
>> -   }
>> -   else {
>> -  /* regular number */
>> -  const int new_exp = flt_e - 127;
>> -  if (new_exp < -14) {
>> - /* The float32 lies in the range (0.0, min_normal16) and is rounded
>> -  * to a nearby float16 value. The result will be either zero, 
>> subnormal,
>> -  * or normal.
>> -  */
>> - e = 0;
>> - m = _mesa_lroundevenf((1 << 24) * fabsf(fi.f));
>> -  }
>> -  else if (new_exp > 15) {
>> - /* map this value to infinity */
>> - /* m = 0; - already set */
>> - e = 31;
>> -  }
>> -  else {
>> - /* The float32 lies in the range
>> -  *   [min_normal16, max_normal16 + max_step16)
>> -  * and is rounded to a nearby float16 value. The result will be
>> -  * either normal or infinite.
>> -  */
>> - e = new_exp + 15;
>> - m = _mesa_lroundevenf(flt_m / (float) (1 << 13));
>> -  }
>> -   }
>> -
>> -   assert(0 <= m && m <= 1024);
>> -   if (m == 1024) {
>> -  /* The float32 was rounded upwards into the range of the next 
>> exponent,
>> -   * so bump the exponent. This correctly handles the case where f32
>> -   * should be rounded up to float16 infinity.
>> -   */
>> -  ++e;
>> -  m = 0;
>> -   }
>> -
>> -   result = (s << 15) | (e << 10) | m;
>> -   return result;
>> -}
>> -
>> -
>> -/**
>> - * Convert a 2-byte half float to a 4-byte float.
>> - * Based on code from:
>> - * http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/008786.html
>> - */
>> -float
>> -_mesa_half_to_float(GLhalfARB val)
>> -{
>> -   /* XXX could also use a 64K-entry lookup table */
>> -   const int m = val & 0x3ff;
>> -   const int e = (val >> 10) & 0x1f;
>> -   
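
The quoted half-to-float routine is cut off in this archive. For readers who want the idea without the rest of the diff, here is an independent sketch of a binary16 to binary32 conversion. It is not the Mesa code being moved; the function name is made up, and it builds the float bit pattern directly instead of using a lookup table or libm.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen an IEEE 754 binary16 value to a float. */
static float
sketch_half_to_float(uint16_t h)
{
    const uint32_t sign = ((uint32_t)h >> 15) & 0x1;
    const uint32_t exp = ((uint32_t)h >> 10) & 0x1f;
    uint32_t man = (uint32_t)h & 0x3ff;
    uint32_t bits;
    float f;

    if (exp == 0 && man == 0) {
        /* signed zero */
        bits = sign << 31;
    } else if (exp == 0) {
        /* half subnormal: shift until the implicit 1 appears, then
         * rebias; every half subnormal is a normal float */
        int shift = -1;
        do {
            shift++;
            man <<= 1;
        } while ((man & 0x400) == 0);
        man &= 0x3ff;
        bits = (sign << 31) | ((uint32_t)(127 - 15 - shift) << 23) |
               (man << 13);
    } else if (exp == 31) {
        /* infinity or NaN: all-ones exponent, keep the mantissa bits */
        bits = (sign << 31) | 0x7f800000u | (man << 13);
    } else {
        /* normal number: rebias the exponent from 15 to 127 */
        bits = (sign << 31) | ((exp - 15 + 127) << 23) | (man << 13);
    }

    memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Widening is exact in every case, which is why only the float-to-half direction needs the rounding rules described in the comment above.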

Re: [Mesa-dev] [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space

2015-10-10 Thread Ilia Mirkin
On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset
 wrote:
> This patch looks fine except that it should be a bit more normalized. I
> mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for
> PUSH_SPACE calls, sometimes you add it sometimes not.

Meh. We need to get our error checking situation straight, but this
isn't the patch to do it in.

>
> Did you run a full piglit test this time ? :)

Nope, but I ran a full piglit before this patch. Almost took down my
box. Probably won't be running it again for this patch.

>
> See my comment below.
>
>
> On 10/10/2015 11:09 AM, Ilia Mirkin wrote:
>>
>> We still have to push everything out, might as well kick earlier and
>> flip pushbufs when we know we'll need it. This resolves some issues with
>> the new policy of making sure that we always leave a bit of room at the
>> end for fences.
>>
>> Signed-off-by: Ilia Mirkin 
>> Cc: mesa-sta...@lists.freedesktop.org
>> ---
>>   src/gallium/drivers/nouveau/nv50/nv50_shader_state.c |  9 ++---
>>   src/gallium/drivers/nouveau/nv50/nv50_transfer.c     | 16 +++-
>>   src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c     | 20 +---
>>   3 files changed, 10 insertions(+), 35 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
>> b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
>> index fdde11f..941555f 100644
>> --- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
>> +++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
>> @@ -65,14 +65,9 @@ nv50_constbufs_validate(struct nv50_context *nv50)
>>  PUSH_DATA (push, (b << 12) | (i << 8) | p | 1);
>>   }
>>   while (words) {
>> -   unsigned nr;
>> -
>> -   if (!PUSH_SPACE(push, 16))
>> -  break;
>> -   nr = PUSH_AVAIL(push);
>> -   assert(nr >= 16);
>> -   nr = MIN2(MIN2(nr - 3, words), NV04_PFIFO_MAX_PACKET_LEN);
>> +   unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
>> +   PUSH_SPACE(push, nr + 3);
>
>
> This PUSH_SPACE call doesn't seem to be needed to me because
> NV50_PUSH_EXPLICIT_SPACE_CHECKING is not set and the following BEGIN_XXX
> calls will allocate space.

I want to ensure that both of the below commands are in the same
batch. Not sure if it's necessary, but... don't want to find out. They
were in the same batch before. And this batch stuff is what was
causing the M2MF errors I was seeing earlier.
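
To make the batching argument concrete, here is a toy model of the new loop shape: size each packet from the remaining data only, then reserve room for the header words plus the payload in one go, so they cannot be split across a flush. Everything here is hypothetical (the struct, the helper names, the 2047-word packet limit and the 3-word header); it is not the nouveau pushbuf API.

```c
/* Assumed limits for this sketch only. */
#define SKETCH_MAX_PACKET_LEN 2047u
#define SKETCH_HEADER_WORDS 3u

struct sketch_push {
    unsigned cur;      /* words used in the current buffer */
    unsigned cap;      /* buffer capacity in words */
    unsigned flushes;  /* how many times the buffer was kicked */
};

/* Reserve 'need' words, kicking the buffer first if they don't fit. */
static void
sketch_push_space(struct sketch_push *p, unsigned need)
{
    if (p->cur + need > p->cap) {
        p->flushes++;
        p->cur = 0;
    }
}

/* Upload 'words' words; returns the number of packets emitted. */
static unsigned
sketch_upload(struct sketch_push *p, unsigned words)
{
    unsigned packets = 0;

    while (words) {
        /* Packet size depends on the data left, not on free space. */
        unsigned nr = words < SKETCH_MAX_PACKET_LEN ?
                      words : SKETCH_MAX_PACKET_LEN;

        /* One reservation covers header and payload together. */
        sketch_push_space(p, nr + SKETCH_HEADER_WORDS);
        p->cur += nr + SKETCH_HEADER_WORDS;

        words -= nr;
        packets++;
    }
    return packets;
}
```

The old loops sized each packet from whatever space happened to be left, which is the decision the patch subject says not to base on available pushbuf space.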

>
>
>>  BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
>>  PUSH_DATA (push, (start << 8) | b);
>>  BEGIN_NI04(push, NV50_3D(CB_DATA(0)), nr);
>> diff --git a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
>> b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
>> index be51407..9a3fd1e 100644
>> --- a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
>> +++ b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
>> @@ -187,14 +187,7 @@ nv50_sifc_linear_u8(struct nouveau_context *nv,
>>  PUSH_DATA (push, 0);
>>while (count) {
>> -  unsigned nr;
>> -
>> -  if (!PUSH_SPACE(push, 16))
>> - break;
>> -  nr = PUSH_AVAIL(push);
>> -  assert(nr >= 16);
>> -  nr = MIN2(count, nr - 1);
>> -  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
>> +  unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
>>   BEGIN_NI04(push, NV50_2D(SIFC_DATA), nr);
>> PUSH_DATAp(push, src, nr);
>> @@ -395,12 +388,9 @@ nv50_cb_push(struct nouveau_context *nv,
>>  nouveau_pushbuf_validate(push);
>>while (words) {
>> -  unsigned nr;
>> -
>> -  nr = PUSH_AVAIL(push);
>> -  nr = MIN2(nr - 7, words);
>> -  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
>> +  unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
>> +  PUSH_SPACE(push, nr + 7);
>> BEGIN_NV04(push, NV50_3D(CB_DEF_ADDRESS_HIGH), 3);
>> PUSH_DATAh(push, bo->offset + base);
>> PUSH_DATA (push, bo->offset + base);
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
>> b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
>> index aaec60a..d459dd6 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
>> @@ -188,14 +188,10 @@ nvc0_m2mf_push_linear(struct nouveau_context *nv,
>>  nouveau_pushbuf_validate(push);
>>while (count) {
>> -  unsigned nr;
>> +  unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
>> -  if (!PUSH_SPACE(push, 16))
>> +  if (!PUSH_SPACE(push, nr + 9))
>>break;
>> -  nr = PUSH_AVAIL(push);
>> -  assert(nr >= 16);
>> -  nr = MIN2(count, nr - 9);
>> -  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
>>   BEGIN_NVC0(push, NVC0_M2MF(OFFSET_OUT_HIGH), 2);
>> PUSH_DATAh(push, dst->offset + offset);
>> @@ -234,14 +230,10 @@ 

Re: [Mesa-dev] [PATCH 1/5] i965/vec4: nir_emit_if doesn't need to predicate based on all the channels

2015-10-10 Thread Jason Ekstrand
Looking at the docs a bit, it looks like we should never have been
using predicate_normal for ifs in the first place.

Reviewed-by: Jason Ekstrand 

On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeiro  wrote:
> ---
>
> I already talked about this with Jason Ekstrand and Matt Turner
> privately, but just in case somebody else jumps into the review:
>
> When using BRW_PREDICATE_NORMAL, the IF uses all the channels of
> the flag register.  But nir_if only reads from one channel, so that
> is not needed. Another hint that this is safe: the MOV that
> puts the condition in f0 calls get_nir_src with just one
> component. That always returns a source with swizzle
> BRW_SWIZZLE_XXXX, so that component is the only one used.
>
> This commit doesn't fix anything per se, but it is needed in
> order to implement vec4_cmod_propagation with a good
> overall outcome.
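
A simplified way to picture the difference, assuming a single vec4 whose four channels each have one flag bit (bit 0 for X through bit 3 for W). The function names are made up and this ignores how the hardware groups multiple vertices; it only models which channels end up enabled.

```c
/* Normal predication: each channel is gated by its own flag bit. */
static unsigned
sketch_enabled_normal(unsigned flag)
{
    return flag & 0xf;
}

/* Replicate-X predication: the X channel's flag bit gates all four
 * channels, so only that one bit has to be meaningful. */
static unsigned
sketch_enabled_replicate_x(unsigned flag)
{
    return (flag & 0x1) ? 0xf : 0x0;
}
```

Under this model the two agree whenever all four flag bits are equal, which is the case when the condition is read with an all-X swizzle. Depending only on X is what lets a later pass replace the MOV with an instruction that writes just the X flag.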
>
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index 41bd80d..e05745f 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -193,7 +193,9 @@ vec4_visitor::nir_emit_if(nir_if *if_stmt)
> vec4_instruction *inst = emit(MOV(dst_null_d(), condition));
> inst->conditional_mod = BRW_CONDITIONAL_NZ;
>
> -   emit(IF(BRW_PREDICATE_NORMAL));
> +   /* We can just predicate based on the X channel, as the condition only
> +* reads from one channel */
> +   emit(IF(BRW_PREDICATE_ALIGN16_REPLICATE_X));
>
> nir_emit_cf_list(&if_stmt->then_list);
>
> --
> 2.1.4
>


Re: [Mesa-dev] [PATCH 3/8] radeonsi: Enable DCC.

2015-10-10 Thread Marek Olšák
On Sat, Oct 10, 2015 at 5:49 PM, Marek Olšák  wrote:
> On Sat, Oct 10, 2015 at 4:15 PM, Bas Nieuwenhuizen
>  wrote:
>> Hi Marek,
>>
>> The revised series is mostly done. I wanted to do more testing and to
>> try to make sure that the added cache flushes I am doing now (a
>> CACHE_FLUSH event before a fast clear and on switching framebuffers)
>> are the minimal needed.
>>
>>> Also, it looks like we don't need DCC decompression at all, right? It
>>> might be better to get rid of it and only use the 3D engine to access
>>> DCC-encoded surfaces.
>>
>> I still use it for flush_resource. I could make this redundant by
>> sharing the DCC buffer by appending the DCC buffer to the texture
>> resource similarly to how the CMASK is appended to the resource of a
>> MSAA buffer. This has the secondary benefit of not needing to
>> reference as many resources for command submission.
>
> IIRC, flush_resource is only used for shared (scanout) surfaces where
> DCC is always disabled.

That said, we might need to keep the DCC decompression for image store
instructions, which don't support compression.

Marek


[Mesa-dev] Mesa 11.0.3

2015-10-10 Thread Emil Velikov
Mesa 11.0.3 is now available.

In the current release we have a bunch of EGL patches, mangledGL build fixes
and a healthy amount of driver bugfixes - radeonsi, nouveau, i915 and i965.
Last but not least, the KDE/Weston regression introduced with 11.0.2 has also 
been resolved.


Brian Paul (1):
  st/mesa: try PIPE_BIND_RENDER_TARGET when choosing float texture formats

Daniel Scharrer (1):
  mesa: Add abs input modifier to base for POW in ffvertex_prog

Emil Velikov (4):
  docs: add sha256 checksums for 11.0.2
  Revert "nouveau: make sure there's always room to emit a fence"
  Update version to 11.0.3
  docs: add release notes for 11.0.3

Francisco Jerez (1):
  i965/fs: Fix hang on IVB and VLV with image format mismatch.

Ian Romanick (1):
  meta: Handle array textures in scaled MSAA blits

Ilia Mirkin (6):
  nouveau: be more careful about freeing temporary transfer buffers
  nouveau: delay deleting buffer with unflushed fence
  nouveau: wait to unref the transfer's bo until it's no longer used
  nv30: pretend to have packed texture/surface formats
  nv30: always go through translate module on big-endian
  nouveau: make sure there's always room to emit a fence

Jason Ekstrand (1):
  mesa: Correctly handle GL_BGRA_EXT in ES3 format_and_type checks

Kyle Brenneman (3):
  glx: Fix build errors with --enable-mangling (v2)
  mapi: Make _glapi_get_stub work with "gl" or "mgl" prefix.
  glx: Don't hard-code the name "libGL.so.1" in driOpenDriver (v3)

Leo Liu (1):
  radeon/vce: fix vui time_scale zero error

Marek Olšák (21):
  st/mesa: fix front buffer regression after dropping st_validate_state in Blit
  radeonsi: handle index buffer alloc failures
  radeonsi: handle constant buffer alloc failures
  gallium/radeon: handle buffer_map staging buffer failures better
  gallium/radeon: handle buffer alloc failures in r600_draw_rectangle
  gallium/radeon: add a fail path for depth MSAA texture readback
  radeonsi: report alloc failure from si_shader_binary_read
  radeonsi: add malloc fail paths to si_create_shader_state
  radeonsi: skip drawing if the tess factor ring allocation fails
  radeonsi: skip drawing if GS ring allocations fail
  radeonsi: handle shader precompile failures
  radeonsi: handle fixed-func TCS shader create failure
  radeonsi: skip drawing if VS, TCS, TES, GS fail to compile or upload
  radeonsi: skip drawing if PS fails to compile or upload
  radeonsi: skip drawing if updating the scratch buffer fails
  radeonsi: don't forget to update scratch relocations for LS, HS, ES shaders
  radeonsi: handle dummy constant buffer allocation failure
  gallium/u_blitter: handle allocation failures
  radeonsi: add scratch buffer to the buffer list when it's re-allocated
  st/dri: don't use _ctx in client_wait_sync
  egl/dri2: don't require a context for ClientWaitSync (v2)

Matthew Waters (1):
  egl: rework handling EGL_CONTEXT_FLAGS

Michel Dänzer (1):
  st/dri: Use packed RGB formats

Roland Scheidegger (1):
  mesa: fix mipmap generation for immutable, compressed textures

Tom Stellard (3):
  gallium/radeon: Use call_once() when initailizing LLVM targets
  gallivm: Allow drivers and state trackers to initialize gallivm LLVM targets v2
  radeon/llvm: Initialize gallivm targets when initializing the AMDGPU target v2

Varad Gautam (1):
  egl: restore surface type before linking config to its display

Ville Syrjälä (3):
  i830: Fix collision between I830_UPLOAD_RASTER_RULES and I830_UPLOAD_TEX(0)
  i915: Fix texcoord vs. varying collision in fragment programs
  i915: Remember to call intel_prepare_render() before blitting


git tag: mesa-11.0.3

ftp://ftp.freedesktop.org/pub/mesa/11.0.3/mesa-11.0.3.tar.gz
MD5: 67be040a22025034351ca26c204db81c  mesa-11.0.3.tar.gz
SHA1: 85f5386a9914cfbf53dae58b39e26b2e41f66178  mesa-11.0.3.tar.gz
SHA256: c2210e3daecc10ed9fdcea500327652ed6effc2f47c4b9cee63fb08f560d7117  mesa-11.0.3.tar.gz
PGP: ftp://ftp.freedesktop.org/pub/mesa/11.0.3/mesa-11.0.3.tar.gz.sig

ftp://ftp.freedesktop.org/pub/mesa/11.0.3/mesa-11.0.3.tar.xz
MD5: bf9118bf0fbf360715cfe60baf7a1db5  mesa-11.0.3.tar.xz
SHA1: e66dbd0f372947eaaee12a50df41befb20164b05  mesa-11.0.3.tar.xz
SHA256: ab2992eece21adc23c398720ef8c6933cb69ea42e1b2611dc09d031e17e033d6  mesa-11.0.3.tar.xz
PGP: ftp://ftp.freedesktop.org/pub/mesa/11.0.3/mesa-11.0.3.tar.xz.sig

--
-Emil





[Mesa-dev] [Bug 92361] [BSW SKL] Regression: glx@glx-copy-sub-buffer failed

2015-10-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92361

cprigent  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[BSW] Regression:           |[BSW SKL] Regression:
                   |glx@glx-copy-sub-buffer     |glx@glx-copy-sub-buffer
                   |failed                      |failed

--- Comment #1 from cprigent  ---
Reproduced on SKL.
The following tests passed with Mesa 10.6.7:
glx@glx-copy-sub-buffer
glx@glx-copy-sub-buffer samples=2
glx@glx-copy-sub-buffer samples=4
glx@glx-copy-sub-buffer samples=6
glx@glx-copy-sub-buffer samples=8

Hardware:
Platform: SKY LAKE Y A0 
CPU : Intel(R) Core(TM) m5-6Y57 CPU @ 1.10GHz (family: 6, model: 78  stepping:
3)
MCP : SKL-Y  D1 2+2 (or ULX-D1)
QDF : QJK9 
CPU : SKL D0
Chipset PCH: Sunrise Point LP C1   
CRB : SKY LAKE Y LPDDR3 RVP3 CRB FAB2
Reworks : All Mandatories + FBS02,FBS03, F23, O-02 & O-06
Software
Linux : Ubuntu 14.04 LTS 64 bits
BIOS : SKLSE2R1.R00.X097.B02.1509020030
ME FW : 11.0.0.1173
Ksc (EC FW): 1.19

kernel 4.3.0-rc3-drm-intel-nightly+ (eb69e51) from
git://anongit.freedesktop.org/drm-intel
Mesa - 11.0.2 from http://cgit.freedesktop.org/mesa/mesa/
xf86-video-intel - 2.99.917 from
http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/
Libdrm - 2.4.65 from http://cgit.freedesktop.org/mesa/drm/
Libva - 1.6.1 from http://cgit.freedesktop.org/libva/
vaapi intel-driver - 1.6.1 from http://cgit.freedesktop.org/vaapi/intel-driver
Cairo - 1.14.2 from http://cgit.freedesktop.org/cairo
Xorg Xserver - 1.17.2 from http://cgit.freedesktop.org/xorg/xserver

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


[Mesa-dev] [Bug 92368] [BSW] Regression: glx@glx_arb_sync_control@timing -fullscreen test Fail

2015-10-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92368

--- Comment #1 from cprigent  ---
Reproduced on SKL-Y.

The following tests passed with Mesa 10.6.4:
glx@glx_arb_sync_control@timing -divisor 1
glx@glx_arb_sync_control@timing -fullscreen -msc-delta 1
glx@glx_arb_sync_control@timing -msc-delta 1
glx@glx_arb_sync_control@timing -msc-delta 2
glx@glx_arb_sync_control@timing -waitformsc -divisor 1
glx@glx_arb_sync_control@timing -waitformsc -msc-delta 2

Hardware:
Platform: SKY LAKE Y A0 
CPU : Intel(R) Core(TM) m5-6Y57 CPU @ 1.10GHz (family: 6, model: 78  stepping:
3)
MCP : SKL-Y  D1 2+2 (or ULX-D1)
QDF : QJK9 
CPU : SKL D0
Chipset PCH: Sunrise Point LP C1   
CRB : SKY LAKE Y LPDDR3 RVP3 CRB FAB2
Reworks : All Mandatories + FBS02,FBS03, F23, O-02 & O-06
Software
Linux : Ubuntu 14.04 LTS 64 bits
BIOS : SKLSE2R1.R00.X097.B02.1509020030
ME FW : 11.0.0.1173
Ksc (EC FW): 1.19

kernel 4.3.0-rc3-drm-intel-nightly+ (eb69e51) from
git://anongit.freedesktop.org/drm-intel
Mesa - 11.0.2 from http://cgit.freedesktop.org/mesa/mesa/
xf86-video-intel - 2.99.917 from
http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/
Libdrm - 2.4.65 from http://cgit.freedesktop.org/mesa/drm/
Libva - 1.6.1 from http://cgit.freedesktop.org/libva/
vaapi intel-driver - 1.6.1 from http://cgit.freedesktop.org/vaapi/intel-driver
Cairo - 1.14.2 from http://cgit.freedesktop.org/cairo
Xorg Xserver - 1.17.2 from http://cgit.freedesktop.org/xorg/xserver



[Mesa-dev] [PATCH] i965/vec4: Implement b2f and b2i using negation.

2015-10-10 Thread Matt Turner
Curro added this in commit 3ee2daf23d (before the vec4/NIR backend was
added) but it was missed in the new NIR backend. Add it there as well.

instructions in affected programs: 1857 -> 1810 (-2.53%)
helped:15
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 41bd80d..fdf767d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1237,14 +1237,8 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   break;
 
case nir_op_b2i:
-  emit(AND(dst, op[0], src_reg(1)));
-  break;
-
case nir_op_b2f:
-  op[0].type = BRW_REGISTER_TYPE_D;
-  dst.type = BRW_REGISTER_TYPE_D;
-  emit(AND(dst, op[0], src_reg(0x3f800000u)));
-  dst.type = BRW_REGISTER_TYPE_F;
+  emit(MOV(dst, negate(op[0])));
   break;
 
case nir_op_f2b:
-- 
2.4.9
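
A rough illustration of why the negation is equivalent to the two AND forms it replaces. This is only a sketch under two assumptions that the mail does not state: booleans reaching this code are 0 for false and ~0 (-1 as a signed integer) for true, and a MOV from an integer source into a float destination converts the value. The function names are made up for the example and are not Mesa API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed boolean encoding (not taken from the patch): 0 = false, ~0 = true.
constexpr int32_t kFalse = 0;
constexpr int32_t kTrue = -1;

// Old b2i: keep only the lowest bit of the boolean.
int32_t b2i_and(int32_t b) {
    assert(b == kFalse || b == kTrue);
    return b & 1;
}

// Old b2f: mask with the bit pattern of 1.0f and reinterpret as float.
float b2f_and(int32_t b) {
    assert(b == kFalse || b == kTrue);
    const uint32_t bits = static_cast<uint32_t>(b) & 0x3f800000u;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// New b2i: integer negation, -(-1) == 1 and -0 == 0.
int32_t b2i_neg(int32_t b) {
    assert(b == kFalse || b == kTrue);
    return -b;
}

// New b2f: negate as an integer, then convert to float
// (standing in for a MOV with an integer source and a float destination).
float b2f_neg(int32_t b) {
    assert(b == kFalse || b == kTrue);
    return static_cast<float>(-b);
}
```

Under those assumptions both b2i and b2f collapse into a single negated move, which is why the patch can share one case body for the two opcodes.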



Re: [Mesa-dev] [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space

2015-10-10 Thread Ilia Mirkin
On Sat, Oct 10, 2015 at 3:55 PM, Samuel Pitoiset
 wrote:
>
>
> On 10/10/2015 09:42 PM, Ilia Mirkin wrote:
>>
>> On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset
>>  wrote:
>>>
>>> This patch looks fine except that it should be a bit more normalized. I
>>> mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for
>>> PUSH_SPACE calls, sometimes you add it sometimes not.
>>
>> Meh. We need to get our error checking situation straight, but this
>> isn't the patch to do it in.
>
>
> Yeah, but this needs to be clarified.

What does?


Re: [Mesa-dev] [PATCH 3/8] radeonsi: Enable DCC.

2015-10-10 Thread Bas Nieuwenhuizen
Hi Marek,

The revised series is mostly done. I wanted to do more testing and to
try to make sure that the added cache flushes I am doing now (a
CACHE_FLUSH event before a fast clear and on switching framebuffers)
are the minimum needed.

> Also, it looks like we don't need DCC decompression at all, right? It
> might be better to get rid of it and only use the 3D engine to access
> DCC-encoded surfaces.

I still use it for flush_resource. I could make this redundant by
sharing the DCC buffer by appending the DCC buffer to the texture
resource similarly to how the CMASK is appended to the resource of a
MSAA buffer. This has the secondary benefit of not needing to
reference as many resources for command submission.

Yours sincerely,
Bas Nieuwenhuizen


[Mesa-dev] [PATCH] nir/glsl: Use shader_prog->Name for naming the NIR shader

2015-10-10 Thread Jason Ekstrand
This is the better name to use. Apparently, sh->Name is usually 0.
---
 src/glsl/nir/glsl_to_nir.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
index 6e1dd84..3284bdc 100644
--- a/src/glsl/nir/glsl_to_nir.cpp
+++ b/src/glsl/nir/glsl_to_nir.cpp
@@ -150,7 +150,7 @@ glsl_to_nir(const struct gl_shader_program *shader_prog,
   if (sh->Program->SamplersUsed & (1 << i))
  num_textures = i;
 
-   shader->info.name = ralloc_asprintf(shader, "GLSL%d", sh->Name);
+   shader->info.name = ralloc_asprintf(shader, "GLSL%d", shader_prog->Name);
if (shader_prog->Label)
   shader->info.label = ralloc_strdup(shader, shader_prog->Label);
shader->info.num_textures = num_textures;
-- 
2.5.0.400.gff86faf



Re: [Mesa-dev] [PATCH v2 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler

2015-10-10 Thread Jason Ekstrand
I reworked this patch to patch use_legacy_snorm_formula through as a
function argument rather than trying to go through the key.  This
should make landing this series independent of finding strange
meta-related gpu hangs on HSW.  I reworked the patch to move
use_legacy_snorm_formula into the key so that it applies on top of the
series.

On Sat, Oct 10, 2015 at 8:09 AM, Jason Ekstrand  wrote:
> This commit removes all dependence on GL state by getting rid of the
> brw_context parameter and the GL data structures.
>
> v2 (Jason Ekstrand):
>- Patch use_legacy_snorm_formula through as a function argument rather
>  than trying to go through the shader key.
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 70 
> +-
>  src/mesa/drivers/dri/i965/brw_vs.c | 16 +++-
>  src/mesa/drivers/dri/i965/brw_vs.h | 12 --
>  3 files changed, 49 insertions(+), 49 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 4b8390f..8e38729 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -1937,51 +1937,42 @@ extern "C" {
>   * Returns the final assembly and the program's size.
>   */
>  const unsigned *
> -brw_vs_emit(struct brw_context *brw,
> +brw_vs_emit(const struct brw_compiler *compiler, void *log_data,
>  void *mem_ctx,
>  const struct brw_vs_prog_key *key,
>  struct brw_vs_prog_data *prog_data,
> -struct gl_vertex_program *vp,
> -struct gl_shader_program *prog,
> +const nir_shader *shader,
> +gl_clip_plane *clip_planes,
> +bool use_legacy_snorm_formula,
>  int shader_time_index,
> -unsigned *final_assembly_size)
> +unsigned *final_assembly_size,
> +char **error_str)
>  {
> const unsigned *assembly = NULL;
>
> -   if (brw->intelScreen->compiler->scalar_vs) {
> +   if (compiler->scalar_vs) {
>prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
>
> -  fs_visitor v(brw->intelScreen->compiler, brw,
> -   mem_ctx, key, &prog_data->base.base,
> +  fs_visitor v(compiler, log_data, mem_ctx, key, &prog_data->base.base,
> NULL, /* prog; Only used for TEXTURE_RECTANGLE on gen < 8 
> */
> -   vp->Base.nir, 8, shader_time_index);
> -  if (!v.run_vs(brw_select_clip_planes(&brw->ctx))) {
> - if (prog) {
> -prog->LinkStatus = false;
> -ralloc_strcat(&prog->InfoLog, v.fail_msg);
> - }
> -
> - _mesa_problem(NULL, "Failed to compile vertex shader: %s\n",
> -   v.fail_msg);
> +   shader, 8, shader_time_index);
> +  if (!v.run_vs(clip_planes)) {
> + if (error_str)
> +*error_str = ralloc_strdup(mem_ctx, v.fail_msg);
>
>   return NULL;
>}
>
> -  fs_generator g(brw->intelScreen->compiler, brw,
> - mem_ctx, (void *) key, &prog_data->base.base,
> - v.promoted_constants,
> +  fs_generator g(compiler, log_data, mem_ctx, (void *) key,
> + &prog_data->base.base, v.promoted_constants,
>   v.runtime_check_aads_emit, "VS");
>if (INTEL_DEBUG & DEBUG_VS) {
> - char *name;
> - if (prog) {
> -name = ralloc_asprintf(mem_ctx, "%s vertex shader %d",
> -   prog->Label ? prog->Label : "unnamed",
> -   prog->Name);
> - } else {
> -name = ralloc_asprintf(mem_ctx, "vertex program %d",
> -   vp->Base.Id);
> - }
> - g.enable_debug(name);
> + const char *debug_name =
> +ralloc_asprintf(mem_ctx, "%s vertex shader %s",
> +shader->info.label ? shader->info.label : 
> "unnamed",
> +shader->info.name);
> +
> + g.enable_debug(debug_name);
>}
>g.generate_code(v.cfg, 8);
>assembly = g.get_assembly(final_assembly_size);
> @@ -1990,26 +1981,19 @@ brw_vs_emit(struct brw_context *brw,
> if (!assembly) {
>prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
>
> -  vec4_vs_visitor v(brw->intelScreen->compiler, brw, key, prog_data,
> -vp->Base.nir, brw_select_clip_planes(&brw->ctx),
> -mem_ctx, shader_time_index,
> -!_mesa_is_gles3(&brw->ctx));
> +  vec4_vs_visitor v(compiler, log_data, key, prog_data,
> +shader, clip_planes, mem_ctx,
> +shader_time_index, use_legacy_snorm_formula);
>if (!v.run()) {
> - if (prog) {
> -prog->LinkStatus = false;
> -ralloc_strcat(&prog->InfoLog, v.fail_msg);
> - }
> -
> - _mesa_problem(NULL, "Failed to 

Re: [Mesa-dev] [PATCH] nouveau: avoid emitting new fences unnecessarily

2015-10-10 Thread Samuel Pitoiset

Does this fix those texelFetch piglit tests? Or is it the second patch?

Anyway, this patch is:

Reviewed-by: Samuel Pitoiset 

On 10/10/2015 08:12 AM, Ilia Mirkin wrote:

Right now we emit on every kick, but this is only necessary if something
will ever be able to observe that the fence completed. If there are no
refs, leave the fence alone and emit it another day.

This also happens to work around an issue for the kick handler -- a kick
can be a result of e.g. nouveau_bo_wait or explicit kick, or it can be
due to lack of space in the pushbuf. We want the emit to happen in the
current batch, so we want there to always be enough space. However an
explicit kick could take the reserved space for the implicitly-triggered
kick's fence emission if it happened right after. With the new mechanism,
hopefully there's no way to cause two fences to be emitted into the same
reserved space.

Signed-off-by: Ilia Mirkin 
Cc: mesa-sta...@lists.freedesktop.org
Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
---
  src/gallium/drivers/nouveau/nouveau_fence.c | 12 +---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nouveau_fence.c 
b/src/gallium/drivers/nouveau/nouveau_fence.c
index ee4e08d..18b1592 100644
--- a/src/gallium/drivers/nouveau/nouveau_fence.c
+++ b/src/gallium/drivers/nouveau/nouveau_fence.c
@@ -190,8 +190,10 @@ nouveau_fence_wait(struct nouveau_fence *fence)
 /* wtf, someone is waiting on a fence in flush_notify handler? */
 assert(fence->state != NOUVEAU_FENCE_STATE_EMITTING);
  
-   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED)

+   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED) {
+  PUSH_SPACE(screen->pushbuf, 8);
nouveau_fence_emit(fence);
+   }
  
 if (fence->state < NOUVEAU_FENCE_STATE_FLUSHED)

if (nouveau_pushbuf_kick(screen->pushbuf, screen->pushbuf->channel))
@@ -224,8 +226,12 @@ nouveau_fence_wait(struct nouveau_fence *fence)
  void
  nouveau_fence_next(struct nouveau_screen *screen)
  {
-   if (screen->fence.current->state < NOUVEAU_FENCE_STATE_EMITTING)
-  nouveau_fence_emit(screen->fence.current);
+   if (screen->fence.current->state < NOUVEAU_FENCE_STATE_EMITTING) {
+  if (screen->fence.current->ref > 1)
+ nouveau_fence_emit(screen->fence.current);
+  else
+ return;
+   }
  
 nouveau_fence_ref(NULL, &screen->fence.current);
  

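The behaviour described in the commit message (skip the emit while nobody else holds a reference to the current fence) can be modelled roughly as follows. This is a simplified sketch, not the nouveau code: the types and function names are hypothetical, the state list is cut down, and std::shared_ptr's use count stands in for the driver's own reference counter.

```cpp
#include <cassert>
#include <memory>

// Simplified fence states, ordered so that "<" means "earlier in life".
enum class FenceState { Available, Emitting, Emitted };

struct Fence {
    FenceState state = FenceState::Available;
};

struct Screen {
    std::shared_ptr<Fence> current = std::make_shared<Fence>();
    int emits = 0;  // how many fences were written into the (imaginary) pushbuf
};

// Stand-in for the real emit: mark the fence and count the emission.
void fence_emit(Screen& screen, Fence& fence) {
    assert(fence.state == FenceState::Available);
    fence.state = FenceState::Emitted;
    ++screen.emits;
}

// Returns true when a fresh fence replaced the current one.
bool fence_next(Screen& screen) {
    if (screen.current->state < FenceState::Emitting) {
        // A use count of 1 means only the screen itself holds the fence,
        // so nobody could ever observe its completion: keep it for later.
        if (screen.current.use_count() > 1)
            fence_emit(screen, *screen.current);
        else
            return false;
    }
    screen.current = std::make_shared<Fence>();
    return true;
}
```

In this model a kick with no outside observers leaves the current fence untouched, so two back-to-back kicks cannot both try to emit a fence into the same reserved space.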



Re: [Mesa-dev] [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space

2015-10-10 Thread Samuel Pitoiset
This patch looks fine, except that it should be a bit more consistent. I
mean, sometimes you break when PUSH_SPACE fails and sometimes not. The same
goes for the PUSH_SPACE calls themselves: sometimes you add one, sometimes not.

Did you run a full piglit run this time? :)

See my comment below.

On 10/10/2015 11:09 AM, Ilia Mirkin wrote:

We still have to push everything out, might as well kick earlier and
flip pushbufs when we know we'll need it. This resolves some issues with
the new policy of making sure that we always leave a bit of room at the
end for fences.

Signed-off-by: Ilia Mirkin 
Cc: mesa-sta...@lists.freedesktop.org
---
  src/gallium/drivers/nouveau/nv50/nv50_shader_state.c |  9 ++---
  src/gallium/drivers/nouveau/nv50/nv50_transfer.c | 16 +++-
  src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c | 20 +---
  3 files changed, 10 insertions(+), 35 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c 
b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
index fdde11f..941555f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -65,14 +65,9 @@ nv50_constbufs_validate(struct nv50_context *nv50)
 PUSH_DATA (push, (b << 12) | (i << 8) | p | 1);
  }
  while (words) {
-   unsigned nr;
-
-   if (!PUSH_SPACE(push, 16))
-  break;
-   nr = PUSH_AVAIL(push);
-   assert(nr >= 16);
-   nr = MIN2(MIN2(nr - 3, words), NV04_PFIFO_MAX_PACKET_LEN);
+   unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
  
+   PUSH_SPACE(push, nr + 3);


This PUSH_SPACE call doesn't seem to be needed to me, because
NV50_PUSH_EXPLICIT_SPACE_CHECKING is not set and the following BEGIN_XXX
calls will allocate space.



 BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
 PUSH_DATA (push, (start << 8) | b);
 BEGIN_NI04(push, NV50_3D(CB_DATA(0)), nr);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c 
b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
index be51407..9a3fd1e 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
@@ -187,14 +187,7 @@ nv50_sifc_linear_u8(struct nouveau_context *nv,
 PUSH_DATA (push, 0);
  
 while (count) {

-  unsigned nr;
-
-  if (!PUSH_SPACE(push, 16))
- break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 1);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
+  unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
  
BEGIN_NI04(push, NV50_2D(SIFC_DATA), nr);

PUSH_DATAp(push, src, nr);
@@ -395,12 +388,9 @@ nv50_cb_push(struct nouveau_context *nv,
 nouveau_pushbuf_validate(push);
  
 while (words) {

-  unsigned nr;
-
-  nr = PUSH_AVAIL(push);
-  nr = MIN2(nr - 7, words);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
+  unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
  
+  PUSH_SPACE(push, nr + 7);

BEGIN_NV04(push, NV50_3D(CB_DEF_ADDRESS_HIGH), 3);
PUSH_DATAh(push, bo->offset + base);
PUSH_DATA (push, bo->offset + base);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
index aaec60a..d459dd6 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
@@ -188,14 +188,10 @@ nvc0_m2mf_push_linear(struct nouveau_context *nv,
 nouveau_pushbuf_validate(push);
  
 while (count) {

-  unsigned nr;
+  unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
  
-  if (!PUSH_SPACE(push, 16))

+  if (!PUSH_SPACE(push, nr + 9))
   break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 9);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
  
BEGIN_NVC0(push, NVC0_M2MF(OFFSET_OUT_HIGH), 2);

PUSH_DATAh(push, dst->offset + offset);
@@ -234,14 +230,10 @@ nve4_p2mf_push_linear(struct nouveau_context *nv,
 nouveau_pushbuf_validate(push);
  
 while (count) {

-  unsigned nr;
+  unsigned nr = MIN2(count, (NV04_PFIFO_MAX_PACKET_LEN - 1));
  
-  if (!PUSH_SPACE(push, 16))

+  if (!PUSH_SPACE(push, nr + 10))
   break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 8);
-  nr = MIN2(nr, (NV04_PFIFO_MAX_PACKET_LEN - 1));
  
BEGIN_NVC0(push, NVE4_P2MF(UPLOAD_DST_ADDRESS_HIGH), 2);

PUSH_DATAh(push, dst->offset + offset);
@@ -571,9 +563,7 @@ nvc0_cb_bo_push(struct nouveau_context *nv,
 PUSH_DATA (push, bo->offset + base);
  
 while (words) {

-  unsigned nr = PUSH_AVAIL(push);
-  nr = MIN2(nr, words);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
+  unsigned nr = MIN2(words, 

Re: [Mesa-dev] [PATCH 11/17] i965/fs: Rework wm_fs_emit to take a nir_shader and a brw_compiler

2015-10-10 Thread Jason Ekstrand
Ignore this.  It's just an accidental re-send.

On Sat, Oct 10, 2015 at 8:04 AM, Jason Ekstrand  wrote:
> This commit removes all dependence on GL state by getting rid of the
> brw_context parameter and the GL data structures.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 59 
> 
>  src/mesa/drivers/dri/i965/brw_wm.c   | 14 +++--
>  src/mesa/drivers/dri/i965/brw_wm.h   | 13 +---
>  3 files changed, 47 insertions(+), 39 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 3c83f2a..8bdc676 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -5115,40 +5115,39 @@ fs_visitor::run_cs()
>  }
>
>  const unsigned *
> -brw_wm_fs_emit(struct brw_context *brw,
> +brw_wm_fs_emit(const struct brw_compiler *compiler, void *log_data,
> void *mem_ctx,
> const struct brw_wm_prog_key *key,
> struct brw_wm_prog_data *prog_data,
> -   struct gl_fragment_program *fp,
> -   struct gl_shader_program *prog,
> +   const nir_shader *shader,
> +   struct gl_program *prog,
> int shader_time_index8, int shader_time_index16,
> -   unsigned *final_assembly_size)
> +   bool use_rep_send,
> +   unsigned *final_assembly_size,
> +   char **error_str)
>  {
> -   /* Now the main event: Visit the shader IR and generate our FS IR for it.
> -*/
> -   fs_visitor v(brw->intelScreen->compiler, brw, mem_ctx, key,
> -&prog_data->base, &fp->Base, fp->Base.nir, 8, 
> shader_time_index8);
> +   fs_visitor v(compiler, log_data, mem_ctx, key,
> +&prog_data->base, prog, shader, 8,
> +shader_time_index8);
> if (!v.run_fs(false /* do_rep_send */)) {
> -  if (prog) {
> - prog->LinkStatus = false;
> - ralloc_strcat(&prog->InfoLog, v.fail_msg);
> -  }
> -
> -  _mesa_problem(NULL, "Failed to compile fragment shader: %s\n",
> -v.fail_msg);
> +  if (error_str)
> + *error_str = ralloc_strdup(mem_ctx, v.fail_msg);
>
>return NULL;
> }
>
> cfg_t *simd16_cfg = NULL;
> -   fs_visitor v2(brw->intelScreen->compiler, brw, mem_ctx, key,
> - &prog_data->base, &fp->Base, fp->Base.nir, 16, 
> shader_time_index16);
> -   if (likely(!(INTEL_DEBUG & DEBUG_NO16) || brw->use_rep_send)) {
> +   fs_visitor v2(compiler, log_data, mem_ctx, key,
> + &prog_data->base, prog, shader, 16,
> + shader_time_index16);
> +   if (likely(!(INTEL_DEBUG & DEBUG_NO16) || use_rep_send)) {
>if (!v.simd16_unsupported) {
>   /* Try a SIMD16 compile */
>   v2.import_uniforms();
> - if (!v2.run_fs(brw->use_rep_send)) {
> -perf_debug("SIMD16 shader failed to compile: %s", v2.fail_msg);
> + if (!v2.run_fs(use_rep_send)) {
> +compiler->shader_perf_log(log_data,
> +  "SIMD16 shader failed to compile: %s",
> +  v2.fail_msg);
>   } else {
>  simd16_cfg = v2.cfg;
>   }
> @@ -5156,8 +5155,8 @@ brw_wm_fs_emit(struct brw_context *brw,
> }
>
> cfg_t *simd8_cfg;
> -   int no_simd8 = (INTEL_DEBUG & DEBUG_NO8) || brw->no_simd8;
> -   if ((no_simd8 || brw->gen < 5) && simd16_cfg) {
> +   int no_simd8 = (INTEL_DEBUG & DEBUG_NO8) || use_rep_send;
> +   if ((no_simd8 || compiler->devinfo->gen < 5) && simd16_cfg) {
>simd8_cfg = NULL;
>prog_data->no_8 = true;
> } else {
> @@ -5165,20 +5164,14 @@ brw_wm_fs_emit(struct brw_context *brw,
>prog_data->no_8 = false;
> }
>
> -   fs_generator g(brw->intelScreen->compiler, brw,
> -  mem_ctx, (void *) key, &prog_data->base,
> +   fs_generator g(compiler, log_data, mem_ctx, (void *) key, 
> &prog_data->base,
>v.promoted_constants, v.runtime_check_aads_emit, "FS");
>
> if (unlikely(INTEL_DEBUG & DEBUG_WM)) {
> -  char *name;
> -  if (prog)
> - name = ralloc_asprintf(mem_ctx, "%s fragment shader %d",
> -prog->Label ? prog->Label : "unnamed",
> -prog->Name);
> -  else
> - name = ralloc_asprintf(mem_ctx, "fragment program %d", fp->Base.Id);
> -
> -  g.enable_debug(name);
> +  g.enable_debug(ralloc_asprintf(mem_ctx, "%s fragment shader %s",
> + shader->info.label ? shader->info.label 
> :
> +  "unnamed",
> + shader->info.name));
> }
>
> if (simd8_cfg)
> diff --git a/src/mesa/drivers/dri/i965/brw_wm.c 
> b/src/mesa/drivers/dri/i965/brw_wm.c
> index 4d5e7f6..ab9461a 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm.c
> +++ 

[Mesa-dev] [PATCH v2 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler

2015-10-10 Thread Jason Ekstrand
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.

v2 (Jason Ekstrand):
   - Patch use_legacy_snorm_formula through as a function argument rather
 than trying to go through the shader key.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 70 +-
 src/mesa/drivers/dri/i965/brw_vs.c | 16 +++-
 src/mesa/drivers/dri/i965/brw_vs.h | 12 --
 3 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 4b8390f..8e38729 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1937,51 +1937,42 @@ extern "C" {
  * Returns the final assembly and the program's size.
  */
 const unsigned *
-brw_vs_emit(struct brw_context *brw,
+brw_vs_emit(const struct brw_compiler *compiler, void *log_data,
 void *mem_ctx,
 const struct brw_vs_prog_key *key,
 struct brw_vs_prog_data *prog_data,
-struct gl_vertex_program *vp,
-struct gl_shader_program *prog,
+const nir_shader *shader,
+gl_clip_plane *clip_planes,
+bool use_legacy_snorm_formula,
 int shader_time_index,
-unsigned *final_assembly_size)
+unsigned *final_assembly_size,
+char **error_str)
 {
const unsigned *assembly = NULL;
 
-   if (brw->intelScreen->compiler->scalar_vs) {
+   if (compiler->scalar_vs) {
   prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
 
-  fs_visitor v(brw->intelScreen->compiler, brw,
-   mem_ctx, key, &prog_data->base.base,
+  fs_visitor v(compiler, log_data, mem_ctx, key, &prog_data->base.base,
NULL, /* prog; Only used for TEXTURE_RECTANGLE on gen < 8 */
-   vp->Base.nir, 8, shader_time_index);
-  if (!v.run_vs(brw_select_clip_planes(&brw->ctx))) {
- if (prog) {
-prog->LinkStatus = false;
-ralloc_strcat(&prog->InfoLog, v.fail_msg);
- }
-
- _mesa_problem(NULL, "Failed to compile vertex shader: %s\n",
-   v.fail_msg);
+   shader, 8, shader_time_index);
+  if (!v.run_vs(clip_planes)) {
+ if (error_str)
+*error_str = ralloc_strdup(mem_ctx, v.fail_msg);
 
  return NULL;
   }
 
-  fs_generator g(brw->intelScreen->compiler, brw,
- mem_ctx, (void *) key, &prog_data->base.base,
- v.promoted_constants,
+  fs_generator g(compiler, log_data, mem_ctx, (void *) key,
+ &prog_data->base.base, v.promoted_constants,
  v.runtime_check_aads_emit, "VS");
   if (INTEL_DEBUG & DEBUG_VS) {
- char *name;
- if (prog) {
-name = ralloc_asprintf(mem_ctx, "%s vertex shader %d",
-   prog->Label ? prog->Label : "unnamed",
-   prog->Name);
- } else {
-name = ralloc_asprintf(mem_ctx, "vertex program %d",
-   vp->Base.Id);
- }
- g.enable_debug(name);
+ const char *debug_name =
+ralloc_asprintf(mem_ctx, "%s vertex shader %s",
+shader->info.label ? shader->info.label : 
"unnamed",
+shader->info.name);
+
+ g.enable_debug(debug_name);
   }
   g.generate_code(v.cfg, 8);
   assembly = g.get_assembly(final_assembly_size);
@@ -1990,26 +1981,19 @@ brw_vs_emit(struct brw_context *brw,
if (!assembly) {
   prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
 
-  vec4_vs_visitor v(brw->intelScreen->compiler, brw, key, prog_data,
-vp->Base.nir, brw_select_clip_planes(&brw->ctx),
-mem_ctx, shader_time_index,
-!_mesa_is_gles3(&brw->ctx));
+  vec4_vs_visitor v(compiler, log_data, key, prog_data,
+shader, clip_planes, mem_ctx,
+shader_time_index, use_legacy_snorm_formula);
   if (!v.run()) {
- if (prog) {
-prog->LinkStatus = false;
-ralloc_strcat(&prog->InfoLog, v.fail_msg);
- }
-
- _mesa_problem(NULL, "Failed to compile vertex shader: %s\n",
-   v.fail_msg);
+ if (error_str)
+*error_str = ralloc_strdup(mem_ctx, v.fail_msg);
 
  return NULL;
   }
 
-  vec4_generator g(brw->intelScreen->compiler, brw,
-   &prog_data->base,
+  vec4_generator g(compiler, log_data, &prog_data->base,
mem_ctx, INTEL_DEBUG & DEBUG_VS, "vertex", "VS");
-  assembly = g.generate_assembly(v.cfg, final_assembly_size, vp->Base.nir);
+  assembly = g.generate_assembly(v.cfg, final_assembly_size, shader);
}
 
return assembly;
diff --git 

Re: [Mesa-dev] [PATCH 3/8] radeonsi: Enable DCC.

2015-10-10 Thread Marek Olšák
On Sat, Oct 10, 2015 at 4:15 PM, Bas Nieuwenhuizen
 wrote:
> Hi Marek,
>
> The revised series is mostly done. I wanted to do more testing and to
> try to make sure that the added cache flushes I am doing now (a
> CACHE_FLUSH event before a fast clear and on switching framebuffers)
> are the minimal needed.
>
>> Also, it looks like we don't need DCC decompression at all, right? It
>> might be better to get rid of it and only use the 3D engine to access
>> DCC-encoded surfaces.
>
> I still use it for flush_resource. I could make this redundant by
> sharing the DCC buffer by appending the DCC buffer to the texture
> resource similarly to how the CMASK is appended to the resource of a
> MSAA buffer. This has the secondary benefit of not needing to
> reference as many resources for command submission.

IIRC, flush_resource is only used for shared (scanout) surfaces where
DCC is always disabled.

Marek


Re: [Mesa-dev] [PATCH V4 2/6] glsl: assign hidden uniforms their slot id earlier

2015-10-10 Thread Marek Olšák
Hi Timothy,

One of these 3 commits breaks compilation for Talos shaders with
gallium. My piglit patch "glsl-1.30/sampler-bug: ..." contains a
minimal test case. I can't say which commit, because Mesa fails to
build between them. It has something to do with uniforms, structures,
and samplers.

commit dcd9cd03837545055ce2a315e7e8840cc3254d1a
Author: Timothy Arceri 
Date:   Sun Aug 30 12:50:34 2015 +1000

glsl: store uniform slot id in var location field

...
commit 9788700caf61ff8beee5fd836f5efd98a931a976
Author: Timothy Arceri 
Date:   Wed Sep 2 11:29:11 2015 +1000

glsl: assign hidden uniforms their slot id earlier

...
commit 874a0217fd8bba83b0bc2448f5156fdb82f77d7c
Author: Timothy Arceri 
Date:   Sun Aug 30 12:49:46 2015 +1000

glsl: order indices for samplers inside a struct array

...

Any idea?

Thanks,
Marek

On Tue, Sep 15, 2015 at 9:51 AM, Timothy Arceri  wrote:
> This is required so that the next patch can safely assign the slot id
> to the var.
>
> The ids are now assigned in the order we want before allocating storage
> so there is no need to sort the storage array and move things around.
>
> V2: rename variable to make code easier to follow as suggested by Jason
>
> Reviewed-by: Jason Ekstrand 
> ---
>  src/glsl/link_uniforms.cpp | 90 
> +-
>  1 file changed, 41 insertions(+), 49 deletions(-)
>


Re: [Mesa-dev] [PATCH 1/5] i965/vec4: nir_emit_if doesn't need to predicate based on all the channels

2015-10-10 Thread Matt Turner
On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeiro  wrote:
> ---
>
> I already talked about this with Jason Ekstrand and Matt Turner
> privately, but just in case somebody else jump to the review:
>
> When using BRW_PREDICATE_NORMAL, the if will use all the channels of
> the register flag.  But nir_if only reads from one channel, so that
> is not needed. Another hint showing that this is safe: the MOV that
> put the condition on f0 is calling get_nir_src with just one
> component. That will return always a source with swizzle
> BRW_SWIZZLE_XXXX, so that component is the only to be used.
>
> This commit is not needed/solving anything per-se, but it is needed in
> order to be able to implement vec4_cmod_propagation with a good
> overall outcome.
>
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index 41bd80d..e05745f 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -193,7 +193,9 @@ vec4_visitor::nir_emit_if(nir_if *if_stmt)
> vec4_instruction *inst = emit(MOV(dst_null_d(), condition));
> inst->conditional_mod = BRW_CONDITIONAL_NZ;
>
> -   emit(IF(BRW_PREDICATE_NORMAL));
> +   /* We can just predicate based on the X channel, as the condition only
> +* reads from one channel */

*/ goes on its own line.

> +   emit(IF(BRW_PREDICATE_ALIGN16_REPLICATE_X));

I agree with what Jason says -- seems like we should have been doing
this all along.

Reviewed-by: Matt Turner 


Re: [Mesa-dev] [PATCH 3/6] glsl: move half<->float convertion to util

2015-10-10 Thread Matt Turner
On Sat, Oct 10, 2015 at 11:47 AM, Rob Clark  wrote:
> From: Rob Clark 
>
> Needed in NIR too, so move out of mesa/main/imports.c
>
> Signed-off-by: Rob Clark 
> ---
>  src/glsl/Makefile.am  |   1 +
>  src/mesa/main/imports.c   | 148 --
>  src/mesa/main/imports.h   |  38 --
>  src/util/Makefile.sources |   2 +
>  src/util/convert.c| 179 
> ++
>  src/util/convert.h|  43 +++
>  6 files changed, 259 insertions(+), 152 deletions(-)
>  create mode 100644 src/util/convert.c
>  create mode 100644 src/util/convert.h
>
> diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am
> index 3265391..347919b 100644
> --- a/src/glsl/Makefile.am
> +++ b/src/glsl/Makefile.am
> @@ -160,6 +160,7 @@ glsl_compiler_SOURCES = \
>  glsl_compiler_LDADD =  \
> libglsl.la  \
> $(top_builddir)/src/libglsl_util.la \
> +   $(top_builddir)/src/util/libmesautil.la \
> $(PTHREAD_LIBS)
>
>  glsl_test_SOURCES = \
> diff --git a/src/mesa/main/imports.c b/src/mesa/main/imports.c
> index 350e675..230ebbc 100644
> --- a/src/mesa/main/imports.c
> +++ b/src/mesa/main/imports.c
> @@ -307,154 +307,6 @@ _mesa_bitcount_64(uint64_t n)
>  }
>  #endif
>
> -
> -/**
> - * Convert a 4-byte float to a 2-byte half float.
> - *
> - * Not all float32 values can be represented exactly as a float16 value. We
> - * round such intermediate float32 values to the nearest float16. When the
> - * float32 lies exactly between to float16 values, we round to the one with
> - * an even mantissa.
> - *
> - * This rounding behavior has several benefits:
> - *   - It has no sign bias.
> - *
> - *   - It reproduces the behavior of real hardware: opcode F32TO16 in Intel's
> - * GPU ISA.
> - *
> - *   - By reproducing the behavior of the GPU (at least on Intel hardware),
> - * compile-time evaluation of constant packHalf2x16 GLSL expressions will
> - * result in the same value as if the expression were executed on the 
> GPU.
> - */
> -GLhalfARB
> -_mesa_float_to_half(float val)
> -{
> -   const fi_type fi = {val};
> -   const int flt_m = fi.i & 0x7fffff;
> -   const int flt_e = (fi.i >> 23) & 0xff;
> -   const int flt_s = (fi.i >> 31) & 0x1;
> -   int s, e, m = 0;
> -   GLhalfARB result;
> -
> -   /* sign bit */
> -   s = flt_s;
> -
> -   /* handle special cases */
> -   if ((flt_e == 0) && (flt_m == 0)) {
> -  /* zero */
> -  /* m = 0; - already set */
> -  e = 0;
> -   }
> -   else if ((flt_e == 0) && (flt_m != 0)) {
> -  /* denorm -- denorm float maps to 0 half */
> -  /* m = 0; - already set */
> -  e = 0;
> -   }
> -   else if ((flt_e == 0xff) && (flt_m == 0)) {
> -  /* infinity */
> -  /* m = 0; - already set */
> -  e = 31;
> -   }
> -   else if ((flt_e == 0xff) && (flt_m != 0)) {
> -  /* NaN */
> -  m = 1;
> -  e = 31;
> -   }
> -   else {
> -  /* regular number */
> -  const int new_exp = flt_e - 127;
> -  if (new_exp < -14) {
> - /* The float32 lies in the range (0.0, min_normal16) and is rounded
> -  * to a nearby float16 value. The result will be either zero, 
> subnormal,
> -  * or normal.
> -  */
> - e = 0;
> - m = _mesa_lroundevenf((1 << 24) * fabsf(fi.f));
> -  }
> -  else if (new_exp > 15) {
> - /* map this value to infinity */
> - /* m = 0; - already set */
> - e = 31;
> -  }
> -  else {
> - /* The float32 lies in the range
> -  *   [min_normal16, max_normal16 + max_step16)
> -  * and is rounded to a nearby float16 value. The result will be
> -  * either normal or infinite.
> -  */
> - e = new_exp + 15;
> - m = _mesa_lroundevenf(flt_m / (float) (1 << 13));
> -  }
> -   }
> -
> -   assert(0 <= m && m <= 1024);
> -   if (m == 1024) {
> -  /* The float32 was rounded upwards into the range of the next exponent,
> -   * so bump the exponent. This correctly handles the case where f32
> -   * should be rounded up to float16 infinity.
> -   */
> -  ++e;
> -  m = 0;
> -   }
> -
> -   result = (s << 15) | (e << 10) | m;
> -   return result;
> -}
> -
> -
> -/**
> - * Convert a 2-byte half float to a 4-byte float.
> - * Based on code from:
> - * http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/008786.html
> - */
> -float
> -_mesa_half_to_float(GLhalfARB val)
> -{
> -   /* XXX could also use a 64K-entry lookup table */
> -   const int m = val & 0x3ff;
> -   const int e = (val >> 10) & 0x1f;
> -   const int s = (val >> 15) & 0x1;
> -   int flt_m, flt_e, flt_s;
> -   fi_type fi;
> -   float result;
> -
> -   /* sign bit */
> -   flt_s = s;
> -
> -   /* handle special cases */
> -   if ((e == 0) && (m == 0)) {
> -  /* zero */
> 

Re: [Mesa-dev] [PATCH] i965/vec4: Implement b2f and b2i using negation.

2015-10-10 Thread Francisco Jerez
Matt Turner  writes:

> Curro added this in commit 3ee2daf23d (before the vec4/NIR backend was
> added) but it was missed in the new NIR backend. Add it there as well.
>
> instructions in affected programs: 1857 -> 1810 (-2.53%)
> helped:15
> ---
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 8 +---
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index 41bd80d..fdf767d 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -1237,14 +1237,8 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
>break;
>  
> case nir_op_b2i:
> -  emit(AND(dst, op[0], src_reg(1)));
> -  break;
> -
> case nir_op_b2f:
> -  op[0].type = BRW_REGISTER_TYPE_D;
> -  dst.type = BRW_REGISTER_TYPE_D;
> -  emit(AND(dst, op[0], src_reg(0x3f800000u)));
> -  dst.type = BRW_REGISTER_TYPE_F;
> +  emit(MOV(dst, negate(op[0])));
>break;

Looks good to me,

Reviewed-by: Francisco Jerez 

>  
> case nir_op_f2b:
> -- 
> 2.4.9
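
A minimal sketch of why a single negated MOV can stand in for both AND forms, assuming the backend encodes booleans as 0 for false and ~0 (all bits set, -1 as a signed integer) for true. The helper names are hypothetical and only model the arithmetic in plain C; the int-to-float cast stands in for the type conversion a MOV would perform from a D source to an F destination.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed boolean encoding: 0 for false, ~0 (all bits set) for true. */
#define BOOL_FALSE ((int32_t)0)
#define BOOL_TRUE  ((int32_t)-1)

/* Old b2i: mask the boolean down to its lowest bit. */
static inline int32_t b2i_and(int32_t b) { return b & 1; }

/* Old b2f: AND with 0x3f800000, the bit pattern of 1.0f. */
static inline float b2f_and(int32_t b)
{
   union { uint32_t u; float f; } v;
   v.u = (uint32_t)b & 0x3f800000u;
   return v.f;
}

/* New b2i: negate, since -(-1) == 1 and -0 == 0. */
static inline int32_t b2i_neg(int32_t b) { return -b; }

/* New b2f: negate as an integer, then convert to float (models the MOV). */
static inline float b2f_neg(int32_t b) { return (float)-b; }

/* Both forms agree on the two valid boolean encodings. */
static inline void check_b2x_equivalence(void)
{
   const int32_t values[2] = { BOOL_FALSE, BOOL_TRUE };
   for (int i = 0; i < 2; i++) {
      assert(b2i_and(values[i]) == b2i_neg(values[i]));
      assert(b2f_and(values[i]) == b2f_neg(values[i]));
   }
}
```

Under that encoding assumption the two approaches give identical results for valid booleans; the negated form simply needs no per-type immediate.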


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space

2015-10-10 Thread Samuel Pitoiset



On 10/10/2015 09:42 PM, Ilia Mirkin wrote:
> On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset wrote:
>> This patch looks fine except that it should be a bit more normalized. I
>> mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for
>> PUSH_SPACE calls, sometimes you add it sometimes not.
>
> Meh. We need to get our error checking situation straight, but this
> isn't the patch to do it in.

Yeah, but this needs to be clarified.

>> Did you run a full piglit test this time ? :)
>
> Nope, but I ran a full piglit before this patch. Almost took down my
> box. Probably won't be running it again for this patch.

Ok, I'll run a full piglit this night then.

>> See my comment below.


On 10/10/2015 11:09 AM, Ilia Mirkin wrote:

We still have to push everything out, might as well kick earlier and
flip pushbufs when we know we'll need it. This resolves some issues with
the new policy of making sure that we always leave a bit of room at the
end for fences.

Signed-off-by: Ilia Mirkin 
Cc: mesa-sta...@lists.freedesktop.org
---
   src/gallium/drivers/nouveau/nv50/nv50_shader_state.c |  9 ++---
   src/gallium/drivers/nouveau/nv50/nv50_transfer.c     | 16 +++-
   src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c     | 20 +---
   3 files changed, 10 insertions(+), 35 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
index fdde11f..941555f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -65,14 +65,9 @@ nv50_constbufs_validate(struct nv50_context *nv50)
  PUSH_DATA (push, (b << 12) | (i << 8) | p | 1);
   }
   while (words) {
-   unsigned nr;
-
-   if (!PUSH_SPACE(push, 16))
-  break;
-   nr = PUSH_AVAIL(push);
-   assert(nr >= 16);
-   nr = MIN2(MIN2(nr - 3, words), NV04_PFIFO_MAX_PACKET_LEN);
+   unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
   +   PUSH_SPACE(push, nr + 3);


This PUSH_SPACE call doesn't seem to be needed for me because
NV50_PUSH_EXPLICIT_SPACE_CHECKING is not set and the following BEGIN_XXX
calls will allocate space.

I want to ensure that both of the below commands are in the same
batch. Not sure if it's necessary, but... don't want to find out. They
were in the same batch before. And this batch stuff is what was
causing the M2MF errors I was seeing earlier.




  BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
  PUSH_DATA (push, (start << 8) | b);
  BEGIN_NI04(push, NV50_3D(CB_DATA(0)), nr);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
index be51407..9a3fd1e 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
@@ -187,14 +187,7 @@ nv50_sifc_linear_u8(struct nouveau_context *nv,
  PUSH_DATA (push, 0);
while (count) {
-  unsigned nr;
-
-  if (!PUSH_SPACE(push, 16))
- break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 1);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
+  unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
   BEGIN_NI04(push, NV50_2D(SIFC_DATA), nr);
 PUSH_DATAp(push, src, nr);
@@ -395,12 +388,9 @@ nv50_cb_push(struct nouveau_context *nv,
  nouveau_pushbuf_validate(push);
while (words) {
-  unsigned nr;
-
-  nr = PUSH_AVAIL(push);
-  nr = MIN2(nr - 7, words);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
+  unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
   +  PUSH_SPACE(push, nr + 7);
 BEGIN_NV04(push, NV50_3D(CB_DEF_ADDRESS_HIGH), 3);
 PUSH_DATAh(push, bo->offset + base);
 PUSH_DATA (push, bo->offset + base);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
index aaec60a..d459dd6 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
@@ -188,14 +188,10 @@ nvc0_m2mf_push_linear(struct nouveau_context *nv,
  nouveau_pushbuf_validate(push);
while (count) {
-  unsigned nr;
+  unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
   -  if (!PUSH_SPACE(push, 16))
+  if (!PUSH_SPACE(push, nr + 9))
break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 9);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
   BEGIN_NVC0(push, NVC0_M2MF(OFFSET_OUT_HIGH), 2);
 PUSH_DATAh(push, dst->offset + offset);
@@ -234,14 +230,10 @@ nve4_p2mf_push_linear(struct nouveau_context *nv,
  nouveau_pushbuf_validate(push);
while (count) {
-  unsigned nr;
+  unsigned 

[Mesa-dev] [PATCH 11/17] i965/fs: Rework wm_fs_emit to take a nir_shader and a brw_compiler

2015-10-10 Thread Jason Ekstrand
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 59 
 src/mesa/drivers/dri/i965/brw_wm.c   | 14 +++--
 src/mesa/drivers/dri/i965/brw_wm.h   | 13 +---
 3 files changed, 47 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 3c83f2a..8bdc676 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -5115,40 +5115,39 @@ fs_visitor::run_cs()
 }
 
 const unsigned *
-brw_wm_fs_emit(struct brw_context *brw,
+brw_wm_fs_emit(const struct brw_compiler *compiler, void *log_data,
void *mem_ctx,
const struct brw_wm_prog_key *key,
struct brw_wm_prog_data *prog_data,
-   struct gl_fragment_program *fp,
-   struct gl_shader_program *prog,
+   const nir_shader *shader,
+   struct gl_program *prog,
int shader_time_index8, int shader_time_index16,
-   unsigned *final_assembly_size)
+   bool use_rep_send,
+   unsigned *final_assembly_size,
+   char **error_str)
 {
-   /* Now the main event: Visit the shader IR and generate our FS IR for it.
-*/
-   fs_visitor v(brw->intelScreen->compiler, brw, mem_ctx, key,
-&prog_data->base, &fp->Base, fp->Base.nir, 8, shader_time_index8);
+   fs_visitor v(compiler, log_data, mem_ctx, key,
+&prog_data->base, prog, shader, 8,
+shader_time_index8);
if (!v.run_fs(false /* do_rep_send */)) {
-  if (prog) {
- prog->LinkStatus = false;
- ralloc_strcat(&prog->InfoLog, v.fail_msg);
-  }
-
-  _mesa_problem(NULL, "Failed to compile fragment shader: %s\n",
-v.fail_msg);
+  if (error_str)
+ *error_str = ralloc_strdup(mem_ctx, v.fail_msg);
 
   return NULL;
}
 
cfg_t *simd16_cfg = NULL;
-   fs_visitor v2(brw->intelScreen->compiler, brw, mem_ctx, key,
- &prog_data->base, &fp->Base, fp->Base.nir, 16, shader_time_index16);
-   if (likely(!(INTEL_DEBUG & DEBUG_NO16) || brw->use_rep_send)) {
+   fs_visitor v2(compiler, log_data, mem_ctx, key,
+ &prog_data->base, prog, shader, 16,
+ shader_time_index16);
+   if (likely(!(INTEL_DEBUG & DEBUG_NO16) || use_rep_send)) {
   if (!v.simd16_unsupported) {
  /* Try a SIMD16 compile */
  v2.import_uniforms();
- if (!v2.run_fs(brw->use_rep_send)) {
-perf_debug("SIMD16 shader failed to compile: %s", v2.fail_msg);
+ if (!v2.run_fs(use_rep_send)) {
+compiler->shader_perf_log(log_data,
+  "SIMD16 shader failed to compile: %s",
+  v2.fail_msg);
  } else {
 simd16_cfg = v2.cfg;
  }
@@ -5156,8 +5155,8 @@ brw_wm_fs_emit(struct brw_context *brw,
}
 
cfg_t *simd8_cfg;
-   int no_simd8 = (INTEL_DEBUG & DEBUG_NO8) || brw->no_simd8;
-   if ((no_simd8 || brw->gen < 5) && simd16_cfg) {
+   int no_simd8 = (INTEL_DEBUG & DEBUG_NO8) || use_rep_send;
+   if ((no_simd8 || compiler->devinfo->gen < 5) && simd16_cfg) {
   simd8_cfg = NULL;
   prog_data->no_8 = true;
} else {
@@ -5165,20 +5164,14 @@ brw_wm_fs_emit(struct brw_context *brw,
   prog_data->no_8 = false;
}
 
-   fs_generator g(brw->intelScreen->compiler, brw,
-  mem_ctx, (void *) key, &prog_data->base,
+   fs_generator g(compiler, log_data, mem_ctx, (void *) key, &prog_data->base,
   v.promoted_constants, v.runtime_check_aads_emit, "FS");
 
if (unlikely(INTEL_DEBUG & DEBUG_WM)) {
-  char *name;
-  if (prog)
- name = ralloc_asprintf(mem_ctx, "%s fragment shader %d",
-prog->Label ? prog->Label : "unnamed",
-prog->Name);
-  else
- name = ralloc_asprintf(mem_ctx, "fragment program %d", fp->Base.Id);
-
-  g.enable_debug(name);
+  g.enable_debug(ralloc_asprintf(mem_ctx, "%s fragment shader %s",
+ shader->info.label ? shader->info.label :
+  "unnamed",
+ shader->info.name));
}
 
if (simd8_cfg)
diff --git a/src/mesa/drivers/dri/i965/brw_wm.c b/src/mesa/drivers/dri/i965/brw_wm.c
index 4d5e7f6..ab9461a 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -230,9 +230,19 @@ brw_codegen_wm_prog(struct brw_context *brw,
   st_index16 = brw_get_shader_time_index(brw, prog, >program.Base, 
ST_FS16);
}
 
-   program = brw_wm_fs_emit(brw, mem_ctx, key, _data,
->program, prog, st_index8, st_index16, 
_size);
+   char *error_str = NULL;
+   program = 

Re: [Mesa-dev] [PATCH 0/6] Remove NIR dependency on GLSL

2015-10-10 Thread Rob Clark
On Sat, Oct 10, 2015 at 2:47 PM, Rob Clark  wrote:
> From: Rob Clark 
>
> This patchset removes the NIR dependency on GLSL (and includes resend
> of shader_enums cleanups w/ addition of STATIC_ASSERT()'s)
>
> Split up glsl_types so the builtin-types go w/ glsl_types but the parts
> that add them to glsl_symbol_table stay with glsl.  This way we can move
> glsl_types into NIR without dragging along glsl_symbol_table and all of
> its dependencies.
>
> Also move the half/float conversion into util so it can be used from NIR
> without bringing an external dependency.
>
> With this we can move glsl_types into NIR and drop the dependency on
> GLSL, and mostly remove the libglsl_util hack.  (The standalone glsl-
> compiler util still needs libglsl_util, so we can't remove it completely
> yet, but we can remove the dependency on libglsl_util from non-mesa
> state trackers.  And a hypothetical vulkan implementation using NIR
> should also not need to suck in libglsl_util.)
>
> Probably there is some room to rename things to complete the cleanup,
> but I figured it was good to split things up into moving things first,
> and do flag-day renames second (if desired).
>
> Rob Clark (6):
>   glsl: couple shader_enums cleanups
>   glsl: move builtin types to glsl_types.cpp
>   glsl: move half<->float convertion to util
>   nir: use util/convert.h
>   nir: remove dependency on glsl

btw, this one seems to have bounced due to size (since moving files),
but you can find it here:

https://github.com/freedreno/mesa/commits/wip-nir-no-glsl

BR,
-R

>   glsl: (mostly) remove libglsl_util
>
>  src/gallium/drivers/freedreno/Makefile.am  |3 +-
>  src/gallium/targets/d3dadapter9/Makefile.am|1 -
>  src/gallium/targets/pipe-loader/Makefile.am|1 -
>  src/gallium/targets/xa/Makefile.am |1 -
>  src/glsl/Makefile.am   |   10 +-
>  src/glsl/Makefile.sources  |4 +-
>  src/glsl/builtin_type_macros.h |  172 --
>  src/glsl/builtin_types.cpp |4 +-
>  src/glsl/glsl_types.cpp| 1715 ---
>  src/glsl/glsl_types.h  |  867 --
>  src/glsl/nir/builtin_type_macros.h |  172 ++
>  src/glsl/nir/glsl_types.cpp| 1729 
> 
>  src/glsl/nir/glsl_types.h  |  867 ++
>  src/glsl/nir/nir_constant_expressions.py   |5 +-
>  src/glsl/nir/nir_types.h   |2 +-
>  src/glsl/nir/shader_enums.c|8 +
>  src/glsl/nir/shader_enums.h|7 +
>  .../drivers/dri/i965/brw_cubemap_normalize.cpp |2 +-
>  src/mesa/drivers/dri/i965/brw_fs.cpp   |2 +-
>  src/mesa/drivers/dri/i965/brw_fs.h |2 +-
>  .../dri/i965/brw_fs_channel_expressions.cpp|2 +-
>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |2 +-
>  .../drivers/dri/i965/brw_fs_vector_splitting.cpp   |2 +-
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |2 +-
>  .../dri/i965/brw_lower_unnormalized_offset.cpp |2 +-
>  .../drivers/dri/i965/brw_schedule_instructions.cpp |2 +-
>  src/mesa/main/ff_fragment_shader.cpp   |2 +-
>  src/mesa/main/imports.c|  148 --
>  src/mesa/main/imports.h|   38 +-
>  src/mesa/main/mtypes.h |5 -
>  src/mesa/main/uniforms.h   |2 +-
>  src/mesa/program/ir_to_mesa.cpp|2 +-
>  src/mesa/program/sampler.cpp   |2 +-
>  src/util/Makefile.sources  |2 +
>  src/util/convert.c |  179 ++
>  src/util/convert.h |   43 +
>  36 files changed, 3063 insertions(+), 2946 deletions(-)
>  delete mode 100644 src/glsl/builtin_type_macros.h
>  delete mode 100644 src/glsl/glsl_types.cpp
>  delete mode 100644 src/glsl/glsl_types.h
>  create mode 100644 src/glsl/nir/builtin_type_macros.h
>  create mode 100644 src/glsl/nir/glsl_types.cpp
>  create mode 100644 src/glsl/nir/glsl_types.h
>  create mode 100644 src/util/convert.c
>  create mode 100644 src/util/convert.h
>
> --
> 2.4.3
>


Re: [Mesa-dev] [PATCH] nouveau: avoid emitting new fences unnecessarily

2015-10-10 Thread Ilia Mirkin
On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset
 wrote:
> Does this fix those texelFetch piglit tests ? Or is it the second patch ?

This patch "fixes" the initial texelFetch piglit failures. However it
creates some fresh texelFetch piglit failures -- that test is
interesting because it does a lot of draws with minimal state changes
between them. Those ones are fixed by the second patch. But really
these are all different problems, which interact with each other in
frustrating ways.

>
> Anyway, this patch is :
>
> Reviewed-by: Samuel Pitoiset 
>
>
> On 10/10/2015 08:12 AM, Ilia Mirkin wrote:
>>
>> Right now we emit on every kick, but this is only necessary if something
>> will ever be able to observe that the fence completed. If there are no
>> refs, leave the fence alone and emit it another day.
>>
>> This also happens to work around an issue for the kick handler -- a kick
>> can be a result of e.g. nouveau_bo_wait or explicit kick, or it can be
>> due to lack of space in the pushbuf. We want the emit to happen in the
>> current batch, so we want there to always be enough space. However an
>> explicit kick could take the reserved space for the implicitly-triggered
>> kick's fence emission if it happened right after. With the new mechanism,
>> hopefully there's no way to cause two fences to be emitted into the same
>> reserved space.
>>
>> Signed-off-by: Ilia Mirkin 
>> Cc: mesa-sta...@lists.freedesktop.org
>> Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
>> ---
>>   src/gallium/drivers/nouveau/nouveau_fence.c | 12 +---
>>   1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/nouveau_fence.c b/src/gallium/drivers/nouveau/nouveau_fence.c
>> index ee4e08d..18b1592 100644
>> --- a/src/gallium/drivers/nouveau/nouveau_fence.c
>> +++ b/src/gallium/drivers/nouveau/nouveau_fence.c
>> @@ -190,8 +190,10 @@ nouveau_fence_wait(struct nouveau_fence *fence)
>>  /* wtf, someone is waiting on a fence in flush_notify handler? */
>>  assert(fence->state != NOUVEAU_FENCE_STATE_EMITTING);
>>   -   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED)
>> +   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED) {
>> +  PUSH_SPACE(screen->pushbuf, 8);
>> nouveau_fence_emit(fence);
>> +   }
>>if (fence->state < NOUVEAU_FENCE_STATE_FLUSHED)
>> if (nouveau_pushbuf_kick(screen->pushbuf,
>> screen->pushbuf->channel))
>> @@ -224,8 +226,12 @@ nouveau_fence_wait(struct nouveau_fence *fence)
>>   void
>>   nouveau_fence_next(struct nouveau_screen *screen)
>>   {
>> -   if (screen->fence.current->state < NOUVEAU_FENCE_STATE_EMITTING)
>> -  nouveau_fence_emit(screen->fence.current);
>> +   if (screen->fence.current->state < NOUVEAU_FENCE_STATE_EMITTING) {
>> +  if (screen->fence.current->ref > 1)
>> + nouveau_fence_emit(screen->fence.current);
>> +  else
>> + return;
>> +   }
>>nouveau_fence_ref(NULL, &screen->fence.current);
>>
>
>
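
For readers following the quoted commit message, the "emit only when someone can observe it" rule can be sketched as below. This is a toy model under stated assumptions, not the nouveau code: the `demo_*` types and functions are hypothetical, the fence pool is a fixed array, and "emitting" just bumps a counter. The one idea taken from the patch is that a reference count of 1 means only the screen itself holds the current fence, so emission can be deferred.

```c
#include <stddef.h>

enum demo_fence_state {
   DEMO_FENCE_AVAILABLE,
   DEMO_FENCE_EMITTING,
   DEMO_FENCE_EMITTED,
};

struct demo_fence {
   int ref;                        /* 1 == only the screen holds it */
   enum demo_fence_state state;
};

struct demo_screen {
   struct demo_fence pool[8];      /* toy storage instead of malloc */
   size_t next_free;
   struct demo_fence *current;
   unsigned emits;                 /* how many fences were actually emitted */
};

static struct demo_fence *demo_fence_new(struct demo_screen *s)
{
   struct demo_fence *f = &s->pool[s->next_free++ % 8];
   f->ref = 1;
   f->state = DEMO_FENCE_AVAILABLE;
   return f;
}

static void demo_screen_init(struct demo_screen *s)
{
   s->next_free = 0;
   s->emits = 0;
   s->current = demo_fence_new(s);
}

static void demo_fence_emit(struct demo_screen *s, struct demo_fence *f)
{
   f->state = DEMO_FENCE_EMITTED;
   s->emits++;
}

/* Called on every kick: skip the emit when nobody else references the fence. */
static void demo_fence_next(struct demo_screen *s)
{
   if (s->current->state < DEMO_FENCE_EMITTING) {
      if (s->current->ref > 1)
         demo_fence_emit(s, s->current);
      else
         return;                   /* keep the same fence for a later kick */
   }
   s->current->ref--;              /* the screen drops its reference */
   s->current = demo_fence_new(s);
}
```

In this model a kick with no outside references leaves the current fence untouched, so no pushbuf space is consumed for it; once a second reference exists, the next kick emits it and moves on to a fresh fence.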


[Mesa-dev] [PATCH 12/17] i965/vs: Rework vs_emit to take a nir_shader and a brw_compiler

2015-10-10 Thread Jason Ekstrand
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 67 +-
 src/mesa/drivers/dri/i965/brw_vs.c | 14 ++-
 src/mesa/drivers/dri/i965/brw_vs.h | 11 --
 3 files changed, 44 insertions(+), 48 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 4b03967..d6549de 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1937,51 +1937,41 @@ extern "C" {
  * Returns the final assembly and the program's size.
  */
 const unsigned *
-brw_vs_emit(struct brw_context *brw,
+brw_vs_emit(const struct brw_compiler *compiler, void *log_data,
 void *mem_ctx,
 const struct brw_vs_prog_key *key,
 struct brw_vs_prog_data *prog_data,
-struct gl_vertex_program *vp,
-struct gl_shader_program *prog,
+const nir_shader *shader,
+gl_clip_plane *clip_planes,
 int shader_time_index,
-unsigned *final_assembly_size)
+unsigned *final_assembly_size,
+char **error_str)
 {
const unsigned *assembly = NULL;
 
-   if (brw->intelScreen->compiler->scalar_vs) {
+   if (compiler->scalar_vs) {
   prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
 
-  fs_visitor v(brw->intelScreen->compiler, brw,
-   mem_ctx, key, &prog_data->base.base,
+  fs_visitor v(compiler, log_data, mem_ctx, key, &prog_data->base.base,
 NULL, /* prog; Only used for TEXTURE_RECTANGLE on gen < 8 */
-   vp->Base.nir, 8, shader_time_index);
-  if (!v.run_vs(brw_select_clip_planes(&brw->ctx))) {
- if (prog) {
-prog->LinkStatus = false;
-ralloc_strcat(&prog->InfoLog, v.fail_msg);
- }
-
- _mesa_problem(NULL, "Failed to compile vertex shader: %s\n",
-   v.fail_msg);
+   shader, 8, shader_time_index);
+  if (!v.run_vs(clip_planes)) {
+ if (error_str)
+*error_str = ralloc_strdup(mem_ctx, v.fail_msg);
 
  return NULL;
   }
 
-  fs_generator g(brw->intelScreen->compiler, brw,
- mem_ctx, (void *) key, &prog_data->base.base,
- v.promoted_constants,
+  fs_generator g(compiler, log_data, mem_ctx, (void *) key,
+ &prog_data->base.base, v.promoted_constants,
  v.runtime_check_aads_emit, "VS");
   if (INTEL_DEBUG & DEBUG_VS) {
- char *name;
- if (prog) {
-name = ralloc_asprintf(mem_ctx, "%s vertex shader %d",
-   prog->Label ? prog->Label : "unnamed",
-   prog->Name);
- } else {
-name = ralloc_asprintf(mem_ctx, "vertex program %d",
-   vp->Base.Id);
- }
- g.enable_debug(name);
+ const char *debug_name =
+ralloc_asprintf(mem_ctx, "%s vertex shader %s",
+shader->info.label ? shader->info.label : "unnamed",
+shader->info.name);
+
+ g.enable_debug(debug_name);
   }
   g.generate_code(v.cfg, 8);
   assembly = g.get_assembly(final_assembly_size);
@@ -1990,25 +1980,18 @@ brw_vs_emit(struct brw_context *brw,
if (!assembly) {
   prog_data->base.dispatch_mode = DISPATCH_MODE_4X2_DUAL_OBJECT;
 
-  vec4_vs_visitor v(brw->intelScreen->compiler, brw, key, prog_data,
-vp->Base.nir, brw_select_clip_planes(&brw->ctx),
-mem_ctx, shader_time_index);
+  vec4_vs_visitor v(compiler, log_data, key, prog_data,
+shader, clip_planes, mem_ctx, shader_time_index);
   if (!v.run()) {
- if (prog) {
-prog->LinkStatus = false;
-ralloc_strcat(&prog->InfoLog, v.fail_msg);
- }
-
- _mesa_problem(NULL, "Failed to compile vertex shader: %s\n",
-   v.fail_msg);
+ if (error_str)
+*error_str = ralloc_strdup(mem_ctx, v.fail_msg);
 
  return NULL;
   }
 
-  vec4_generator g(brw->intelScreen->compiler, brw,
-   &prog_data->base,
+  vec4_generator g(compiler, log_data, &prog_data->base,
mem_ctx, INTEL_DEBUG & DEBUG_VS, "vertex", "VS");
-  assembly = g.generate_assembly(v.cfg, final_assembly_size, vp->Base.nir);
+  assembly = g.generate_assembly(v.cfg, final_assembly_size, shader);
}
 
return assembly;
diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c
index ecaeefa..f54c9a3 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -180,9 +180,19 @@ brw_codegen_vs_prog(struct brw_context *brw,
 
/* Emit GEN4 code.
 */
-   program = 

[Mesa-dev] [PATCH 1/6] glsl: couple shader_enums cleanups

2015-10-10 Thread Rob Clark
From: Rob Clark 

Add missing enum to gl_system_value_name() and move VARYING_SLOT_MAX /
FRAG_RESULT_MAX / etc into shader_enums.h as suggested by Emil.

v2: add STATIC_ASSERT()'s

Reported-by: Emil Velikov 
Signed-off-by: Rob Clark 
---
 src/glsl/nir/shader_enums.c | 8 
 src/glsl/nir/shader_enums.h | 7 +++
 src/mesa/main/mtypes.h  | 5 -
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/glsl/nir/shader_enums.c b/src/glsl/nir/shader_enums.c
index 3722475..66a25e7 100644
--- a/src/glsl/nir/shader_enums.c
+++ b/src/glsl/nir/shader_enums.c
@@ -28,6 +28,7 @@
 
 #include "shader_enums.h"
 #include "util/macros.h"
+#include "mesa/main/config.h"
 
 #define ENUM(x) [x] = #x
 #define NAME(val) ((((val) < ARRAY_SIZE(names)) && names[(val)]) ? names[(val)] : "UNKNOWN")
@@ -42,6 +43,7 @@ const char * gl_shader_stage_name(gl_shader_stage stage)
   ENUM(MESA_SHADER_FRAGMENT),
   ENUM(MESA_SHADER_COMPUTE),
};
+   STATIC_ASSERT(ARRAY_SIZE(names) == MESA_SHADER_STAGES);
return NAME(stage);
 }
 
@@ -82,6 +84,7 @@ const char * gl_vert_attrib_name(gl_vert_attrib attrib)
   ENUM(VERT_ATTRIB_GENERIC14),
   ENUM(VERT_ATTRIB_GENERIC15),
};
+   STATIC_ASSERT(ARRAY_SIZE(names) == VERT_ATTRIB_MAX);
return NAME(attrib);
 }
 
@@ -147,6 +150,7 @@ const char * gl_varying_slot_name(gl_varying_slot slot)
   ENUM(VARYING_SLOT_VAR30),
   ENUM(VARYING_SLOT_VAR31),
};
+   STATIC_ASSERT(ARRAY_SIZE(names) == VARYING_SLOT_MAX);
return NAME(slot);
 }
 
@@ -169,8 +173,10 @@ const char * gl_system_value_name(gl_system_value sysval)
  ENUM(SYSTEM_VALUE_TESS_LEVEL_INNER),
  ENUM(SYSTEM_VALUE_LOCAL_INVOCATION_ID),
  ENUM(SYSTEM_VALUE_WORK_GROUP_ID),
+ ENUM(SYSTEM_VALUE_NUM_WORK_GROUPS),
  ENUM(SYSTEM_VALUE_VERTEX_CNT),
};
+   STATIC_ASSERT(ARRAY_SIZE(names) == SYSTEM_VALUE_MAX);
return NAME(sysval);
 }
 
@@ -182,6 +188,7 @@ const char * glsl_interp_qualifier_name(enum glsl_interp_qualifier qual)
   ENUM(INTERP_QUALIFIER_FLAT),
   ENUM(INTERP_QUALIFIER_NOPERSPECTIVE),
};
+   STATIC_ASSERT(ARRAY_SIZE(names) == INTERP_QUALIFIER_COUNT);
return NAME(qual);
 }
 
@@ -201,5 +208,6 @@ const char * gl_frag_result_name(gl_frag_result result)
   ENUM(FRAG_RESULT_DATA6),
   ENUM(FRAG_RESULT_DATA7),
};
+   STATIC_ASSERT(ARRAY_SIZE(names) == FRAG_RESULT_MAX);
return NAME(result);
 }
diff --git a/src/glsl/nir/shader_enums.h b/src/glsl/nir/shader_enums.h
index 2a5d2c5..77638ba 100644
--- a/src/glsl/nir/shader_enums.h
+++ b/src/glsl/nir/shader_enums.h
@@ -233,6 +233,11 @@ typedef enum
VARYING_SLOT_VAR31,
 } gl_varying_slot;
 
+
+#define VARYING_SLOT_MAX   (VARYING_SLOT_VAR0 + MAX_VARYING)
+#define VARYING_SLOT_PATCH0    (VARYING_SLOT_MAX)
+#define VARYING_SLOT_TESS_MAX  (VARYING_SLOT_PATCH0 + MAX_VARYING)
+
 const char * gl_varying_slot_name(gl_varying_slot slot);
 
 /**
@@ -473,4 +478,6 @@ typedef enum
 
 const char * gl_frag_result_name(gl_frag_result result);
 
+#define FRAG_RESULT_MAX        (FRAG_RESULT_DATA0 + MAX_DRAW_BUFFERS)
+
 #endif /* SHADER_ENUMS_H */
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 0a54b20..ba94402 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -94,11 +94,6 @@ struct vbo_context;
 #define PRIM_OUTSIDE_BEGIN_END   (PRIM_MAX + 1)
 #define PRIM_UNKNOWN (PRIM_MAX + 2)
 
-#define VARYING_SLOT_MAX   (VARYING_SLOT_VAR0 + MAX_VARYING)
-#define VARYING_SLOT_PATCH0    (VARYING_SLOT_MAX)
-#define VARYING_SLOT_TESS_MAX  (VARYING_SLOT_PATCH0 + MAX_VARYING)
-#define FRAG_RESULT_MAX        (FRAG_RESULT_DATA0 + MAX_DRAW_BUFFERS)
-
 /**
  * Determine if the given gl_varying_slot appears in the fragment shader.
  */
-- 
2.4.3
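
As a rough illustration of the pattern this patch extends, here is a self-contained sketch of an enum-to-name table built with designated initializers and guarded by a compile-time size check. The `demo_stage` enum and `demo_stage_name()` are hypothetical stand-ins for the real enums, and C11 `_Static_assert` is used in place of Mesa's `STATIC_ASSERT()` macro, which is an assumption made to keep the example standalone.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define ENUM(x) [x] = #x
#define NAME(val) ((((val) < ARRAY_SIZE(names)) && names[(val)]) ? \
                   names[(val)] : "UNKNOWN")

/* Hypothetical enum standing in for gl_shader_stage and friends. */
typedef enum {
   DEMO_STAGE_VERTEX,
   DEMO_STAGE_FRAGMENT,
   DEMO_STAGE_COMPUTE,
   DEMO_STAGE_COUNT
} demo_stage;

const char *demo_stage_name(demo_stage stage)
{
   /* Each entry lands at the index given by its enum value. */
   static const char *names[] = {
      ENUM(DEMO_STAGE_VERTEX),
      ENUM(DEMO_STAGE_FRAGMENT),
      ENUM(DEMO_STAGE_COMPUTE),
   };
   /* Fails to compile if an enum value is added past the end of the table. */
   _Static_assert(ARRAY_SIZE(names) == DEMO_STAGE_COUNT,
                  "names[] is out of sync with demo_stage");
   return NAME(stage);
}
```

One limitation worth noting: the size check only notices a table that is too short or too long. An entry missing from the middle leaves a NULL slot without changing the array size, and that case is handled at run time by `NAME()` returning "UNKNOWN".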



[Mesa-dev] [PATCH 3/6] glsl: move half<->float convertion to util

2015-10-10 Thread Rob Clark
From: Rob Clark 

Needed in NIR too, so move out of mesa/main/imports.c

Signed-off-by: Rob Clark 
---
 src/glsl/Makefile.am  |   1 +
 src/mesa/main/imports.c   | 148 --
 src/mesa/main/imports.h   |  38 --
 src/util/Makefile.sources |   2 +
 src/util/convert.c| 179 ++
 src/util/convert.h|  43 +++
 6 files changed, 259 insertions(+), 152 deletions(-)
 create mode 100644 src/util/convert.c
 create mode 100644 src/util/convert.h

diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am
index 3265391..347919b 100644
--- a/src/glsl/Makefile.am
+++ b/src/glsl/Makefile.am
@@ -160,6 +160,7 @@ glsl_compiler_SOURCES = \
 glsl_compiler_LDADD =  \
libglsl.la  \
$(top_builddir)/src/libglsl_util.la \
+   $(top_builddir)/src/util/libmesautil.la \
$(PTHREAD_LIBS)
 
 glsl_test_SOURCES = \
diff --git a/src/mesa/main/imports.c b/src/mesa/main/imports.c
index 350e675..230ebbc 100644
--- a/src/mesa/main/imports.c
+++ b/src/mesa/main/imports.c
@@ -307,154 +307,6 @@ _mesa_bitcount_64(uint64_t n)
 }
 #endif
 
-
-/**
- * Convert a 4-byte float to a 2-byte half float.
- *
- * Not all float32 values can be represented exactly as a float16 value. We
- * round such intermediate float32 values to the nearest float16. When the
- * float32 lies exactly between to float16 values, we round to the one with
- * an even mantissa.
- *
- * This rounding behavior has several benefits:
- *   - It has no sign bias.
- *
- *   - It reproduces the behavior of real hardware: opcode F32TO16 in Intel's
- * GPU ISA.
- *
- *   - By reproducing the behavior of the GPU (at least on Intel hardware),
- * compile-time evaluation of constant packHalf2x16 GLSL expressions will
- * result in the same value as if the expression were executed on the GPU.
- */
-GLhalfARB
-_mesa_float_to_half(float val)
-{
-   const fi_type fi = {val};
-   const int flt_m = fi.i & 0x7fffff;
-   const int flt_e = (fi.i >> 23) & 0xff;
-   const int flt_s = (fi.i >> 31) & 0x1;
-   int s, e, m = 0;
-   GLhalfARB result;
-   
-   /* sign bit */
-   s = flt_s;
-
-   /* handle special cases */
-   if ((flt_e == 0) && (flt_m == 0)) {
-  /* zero */
-  /* m = 0; - already set */
-  e = 0;
-   }
-   else if ((flt_e == 0) && (flt_m != 0)) {
-  /* denorm -- denorm float maps to 0 half */
-  /* m = 0; - already set */
-  e = 0;
-   }
-   else if ((flt_e == 0xff) && (flt_m == 0)) {
-  /* infinity */
-  /* m = 0; - already set */
-  e = 31;
-   }
-   else if ((flt_e == 0xff) && (flt_m != 0)) {
-  /* NaN */
-  m = 1;
-  e = 31;
-   }
-   else {
-  /* regular number */
-  const int new_exp = flt_e - 127;
-  if (new_exp < -14) {
- /* The float32 lies in the range (0.0, min_normal16) and is rounded
-  * to a nearby float16 value. The result will be either zero, subnormal,
-  * or normal.
-  */
- e = 0;
- m = _mesa_lroundevenf((1 << 24) * fabsf(fi.f));
-  }
-  else if (new_exp > 15) {
- /* map this value to infinity */
- /* m = 0; - already set */
- e = 31;
-  }
-  else {
- /* The float32 lies in the range
-  *   [min_normal16, max_normal16 + max_step16)
-  * and is rounded to a nearby float16 value. The result will be
-  * either normal or infinite.
-  */
- e = new_exp + 15;
- m = _mesa_lroundevenf(flt_m / (float) (1 << 13));
-  }
-   }
-
-   assert(0 <= m && m <= 1024);
-   if (m == 1024) {
-  /* The float32 was rounded upwards into the range of the next exponent,
-   * so bump the exponent. This correctly handles the case where f32
-   * should be rounded up to float16 infinity.
-   */
-  ++e;
-  m = 0;
-   }
-
-   result = (s << 15) | (e << 10) | m;
-   return result;
-}
-
-
-/**
- * Convert a 2-byte half float to a 4-byte float.
- * Based on code from:
- * http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/008786.html
- */
-float
-_mesa_half_to_float(GLhalfARB val)
-{
-   /* XXX could also use a 64K-entry lookup table */
-   const int m = val & 0x3ff;
-   const int e = (val >> 10) & 0x1f;
-   const int s = (val >> 15) & 0x1;
-   int flt_m, flt_e, flt_s;
-   fi_type fi;
-   float result;
-
-   /* sign bit */
-   flt_s = s;
-
-   /* handle special cases */
-   if ((e == 0) && (m == 0)) {
-  /* zero */
-  flt_m = 0;
-  flt_e = 0;
-   }
-   else if ((e == 0) && (m != 0)) {
-  /* denorm -- denorm half will fit in non-denorm single */
-  const float half_denorm = 1.0f / 16384.0f; /* 2^-14 */
-  float mantissa = ((float) (m)) / 1024.0f;
-  float sign = s ? -1.0f : 1.0f;
-  return sign * mantissa * half_denorm;
-   }
-   else if ((e == 31) && (m == 

[Mesa-dev] [PATCH 2/6] glsl: move builtin types to glsl_types.cpp

2015-10-10 Thread Rob Clark
From: Rob Clark 

First step at untangling NIR's dependency on glsl_types without bringing
in the dependency on glsl_symbol_table.  The builtin types are now in
glsl_types (which will end up in NIR), but adding them to the symbol-
table stays in builtin_types.cpp (which will not be part of NIR).

Signed-off-by: Rob Clark 
---
 src/glsl/builtin_types.cpp |  4 +---
 src/glsl/glsl_types.cpp| 14 ++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/glsl/builtin_types.cpp b/src/glsl/builtin_types.cpp
index 0aedbb3..bbdcd19 100644
--- a/src/glsl/builtin_types.cpp
+++ b/src/glsl/builtin_types.cpp
@@ -43,9 +43,7 @@
  * convenience pointers (glsl_type::foo_type).
  * @{
  */
-#define DECL_TYPE(NAME, ...)\
-   const glsl_type glsl_type::_##NAME##_type = glsl_type(__VA_ARGS__, #NAME); \
-   const glsl_type *const glsl_type::NAME##_type = &glsl_type::_##NAME##_type;
+#define DECL_TYPE(NAME, ...)
 
 #define STRUCT_TYPE(NAME)   \
const glsl_type glsl_type::_struct_##NAME##_type =   \
diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
index b9cb97c..b0bb2ff 100644
--- a/src/glsl/glsl_types.cpp
+++ b/src/glsl/glsl_types.cpp
@@ -1713,3 +1713,17 @@ glsl_type::coordinate_components() const
 
return size;
 }
+
+/**
+ * Declarations of type flyweights (glsl_type::_foo_type) and
+ * convenience pointers (glsl_type::foo_type).
+ * @{
+ */
+#define DECL_TYPE(NAME, ...)\
+   const glsl_type glsl_type::_##NAME##_type = glsl_type(__VA_ARGS__, #NAME); \
+   const glsl_type *const glsl_type::NAME##_type = &glsl_type::_##NAME##_type;
+
+#define STRUCT_TYPE(NAME)
+
+#include "builtin_type_macros.h"
+/** @} */
-- 
2.4.3
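
For readers less familiar with the DECL_TYPE() trick used above, here is a
minimal sketch of the same X-macro idea in plain C. Everything named sketch_*
and the SKETCH_BUILTIN_TYPES list are made up for illustration (the list stands
in for builtin_type_macros.h); this is not the actual glsl_type code. One
expansion defines the flyweight objects and convenience pointers, and a second,
separate expansion registers them in a lookup table, which is roughly the split
between glsl_types.cpp and builtin_types.cpp described in the commit message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for glsl_type; not Mesa's real class. */
struct sketch_type {
   unsigned base_type;
   unsigned vector_elements;
   const char *name;
};

/* Plays the role of builtin_type_macros.h: one DECL_TYPE() per builtin. */
#define SKETCH_BUILTIN_TYPES \
   DECL_TYPE(float, 0, 1)    \
   DECL_TYPE(vec2,  0, 2)    \
   DECL_TYPE(ivec3, 1, 3)

/* First expansion: define each flyweight and its convenience pointer. */
#define DECL_TYPE(NAME, BASE, ELEMS)                                  \
   const struct sketch_type NAME##_storage = { BASE, ELEMS, #NAME };  \
   const struct sketch_type *const NAME##_type = &NAME##_storage;
SKETCH_BUILTIN_TYPES
#undef DECL_TYPE

/* Second expansion: register the same types in a (toy) symbol table. */
#define DECL_TYPE(NAME, BASE, ELEMS) &NAME##_storage,
static const struct sketch_type *const sketch_symbol_table[] = {
   SKETCH_BUILTIN_TYPES
};
#undef DECL_TYPE

/* Linear lookup by name; returns NULL when the name is not registered. */
static const struct sketch_type *
sketch_lookup(const char *name)
{
   size_t n = sizeof(sketch_symbol_table) / sizeof(sketch_symbol_table[0]);

   for (size_t i = 0; i < n; i++) {
      if (strcmp(sketch_symbol_table[i]->name, name) == 0)
         return sketch_symbol_table[i];
   }
   return NULL;
}
```

Redefining DECL_TYPE() as empty in one file, as the patch does in
builtin_types.cpp, simply makes that file's expansion of the list a no-op.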

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/6] Remove NIR dependency on GLSL

2015-10-10 Thread Rob Clark
From: Rob Clark 

This patchset removes the NIR dependency on GLSL (and includes a resend
of the shader_enums cleanups, now with STATIC_ASSERT()s added).

Split up glsl_types so the builtin-types go w/ glsl_types but the parts
that add them to glsl_symbol_table stay with glsl.  This way we can move
glsl_types into NIR without dragging along glsl_symbol_table and all of
its dependencies.

Also move the half/float conversion into util so it can be used from NIR
without bringing an external dependency.

With this we can move glsl_types into NIR and drop the dependency on
GLSL, and mostly remove the libglsl_util hack.  (The standalone glsl-
compiler util still needs libglsl_util, so we can't remove it completely
yet, but we can remove the dependency on libglsl_util from non-mesa
state trackers.  And a hypothetical vulkan implementation using NIR
should also not need to suck in libglsl_util.)

Probably there is some room to rename things to complete the cleanup,
but I figured it was good to split things up into moving things first,
and do flag-day renames second (if desired).

Rob Clark (6):
  glsl: couple shader_enums cleanups
  glsl: move builtin types to glsl_types.cpp
  glsl: move half<->float convertion to util
  nir: use util/convert.h
  nir: remove dependency on glsl
  glsl: (mostly) remove libglsl_util

 src/gallium/drivers/freedreno/Makefile.am  |3 +-
 src/gallium/targets/d3dadapter9/Makefile.am|1 -
 src/gallium/targets/pipe-loader/Makefile.am|1 -
 src/gallium/targets/xa/Makefile.am |1 -
 src/glsl/Makefile.am   |   10 +-
 src/glsl/Makefile.sources  |4 +-
 src/glsl/builtin_type_macros.h |  172 --
 src/glsl/builtin_types.cpp |4 +-
 src/glsl/glsl_types.cpp| 1715 ---
 src/glsl/glsl_types.h  |  867 --
 src/glsl/nir/builtin_type_macros.h |  172 ++
 src/glsl/nir/glsl_types.cpp| 1729 
 src/glsl/nir/glsl_types.h  |  867 ++
 src/glsl/nir/nir_constant_expressions.py   |5 +-
 src/glsl/nir/nir_types.h   |2 +-
 src/glsl/nir/shader_enums.c|8 +
 src/glsl/nir/shader_enums.h|7 +
 .../drivers/dri/i965/brw_cubemap_normalize.cpp |2 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp   |2 +-
 src/mesa/drivers/dri/i965/brw_fs.h |2 +-
 .../dri/i965/brw_fs_channel_expressions.cpp|2 +-
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |2 +-
 .../drivers/dri/i965/brw_fs_vector_splitting.cpp   |2 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |2 +-
 .../dri/i965/brw_lower_unnormalized_offset.cpp |2 +-
 .../drivers/dri/i965/brw_schedule_instructions.cpp |2 +-
 src/mesa/main/ff_fragment_shader.cpp   |2 +-
 src/mesa/main/imports.c|  148 --
 src/mesa/main/imports.h|   38 +-
 src/mesa/main/mtypes.h |5 -
 src/mesa/main/uniforms.h   |2 +-
 src/mesa/program/ir_to_mesa.cpp|2 +-
 src/mesa/program/sampler.cpp   |2 +-
 src/util/Makefile.sources  |2 +
 src/util/convert.c |  179 ++
 src/util/convert.h |   43 +
 36 files changed, 3063 insertions(+), 2946 deletions(-)
 delete mode 100644 src/glsl/builtin_type_macros.h
 delete mode 100644 src/glsl/glsl_types.cpp
 delete mode 100644 src/glsl/glsl_types.h
 create mode 100644 src/glsl/nir/builtin_type_macros.h
 create mode 100644 src/glsl/nir/glsl_types.cpp
 create mode 100644 src/glsl/nir/glsl_types.h
 create mode 100644 src/util/convert.c
 create mode 100644 src/util/convert.h

-- 
2.4.3



[Mesa-dev] [PATCH 6/6] glsl: (mostly) remove libglsl_util

2015-10-10 Thread Rob Clark
From: Rob Clark 

Now that NIR does not depend on glsl, we can (mostly[*]) get rid of the
libglsl_util hack.

[*] glsl_compiler is the one remaining user of libglsl_util

Signed-off-by: Rob Clark 
---
 src/gallium/drivers/freedreno/Makefile.am   | 3 +--
 src/gallium/targets/d3dadapter9/Makefile.am | 1 -
 src/gallium/targets/pipe-loader/Makefile.am | 1 -
 src/gallium/targets/xa/Makefile.am  | 1 -
 src/glsl/Makefile.am| 6 --
 5 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/src/gallium/drivers/freedreno/Makefile.am b/src/gallium/drivers/freedreno/Makefile.am
index dff95ba..3de8e0f 100644
--- a/src/gallium/drivers/freedreno/Makefile.am
+++ b/src/gallium/drivers/freedreno/Makefile.am
@@ -19,7 +19,7 @@ libfreedreno_la_SOURCES = \
 
 noinst_PROGRAMS = ir3_compiler
 
-# XXX: Required due to the C++ sources in libnir/libglsl_util
+# XXX: Required due to the C++ sources in libnir
 nodist_EXTRA_ir3_compiler_SOURCES = dummy.cpp
 ir3_compiler_SOURCES = \
ir3/ir3_cmdline.c
@@ -28,7 +28,6 @@ ir3_compiler_LDADD = \
libfreedreno.la \
$(top_builddir)/src/gallium/auxiliary/libgallium.la \
$(top_builddir)/src/glsl/libnir.la \
-   $(top_builddir)/src/libglsl_util.la \
$(top_builddir)/src/util/libmesautil.la \
$(GALLIUM_COMMON_LIB_DEPS) \
$(FREEDRENO_LIBS)
diff --git a/src/gallium/targets/d3dadapter9/Makefile.am b/src/gallium/targets/d3dadapter9/Makefile.am
index e26ca33..b522147 100644
--- a/src/gallium/targets/d3dadapter9/Makefile.am
+++ b/src/gallium/targets/d3dadapter9/Makefile.am
@@ -76,7 +76,6 @@ d3dadapter9_la_LIBADD = \
$(top_builddir)/src/gallium/auxiliary/libgalliumvl_stub.la \
$(top_builddir)/src/gallium/auxiliary/libgallium.la \
$(top_builddir)/src/glsl/libnir.la \
-   $(top_builddir)/src/libglsl_util.la \
$(top_builddir)/src/gallium/state_trackers/nine/libninetracker.la \
$(top_builddir)/src/util/libmesautil.la \
$(top_builddir)/src/gallium/winsys/sw/wrapper/libwsw.la \
diff --git a/src/gallium/targets/pipe-loader/Makefile.am b/src/gallium/targets/pipe-loader/Makefile.am
index 4d9f7be..4f25b4f 100644
--- a/src/gallium/targets/pipe-loader/Makefile.am
+++ b/src/gallium/targets/pipe-loader/Makefile.am
@@ -53,7 +53,6 @@ endif
 PIPE_LIBS += \
$(top_builddir)/src/gallium/auxiliary/libgallium.la \
$(top_builddir)/src/glsl/libnir.la \
-   $(top_builddir)/src/libglsl_util.la \
$(top_builddir)/src/util/libmesautil.la \
$(top_builddir)/src/gallium/drivers/rbug/librbug.la \
$(top_builddir)/src/gallium/drivers/trace/libtrace.la \
diff --git a/src/gallium/targets/xa/Makefile.am b/src/gallium/targets/xa/Makefile.am
index 92173de..02c42c6 100644
--- a/src/gallium/targets/xa/Makefile.am
+++ b/src/gallium/targets/xa/Makefile.am
@@ -38,7 +38,6 @@ libxatracker_la_LIBADD = \
$(top_builddir)/src/gallium/auxiliary/libgalliumvl_stub.la \
$(top_builddir)/src/gallium/auxiliary/libgallium.la \
$(top_builddir)/src/glsl/libnir.la \
-   $(top_builddir)/src/libglsl_util.la \
$(top_builddir)/src/util/libmesautil.la \
$(LIBDRM_LIBS) \
$(GALLIUM_COMMON_LIB_DEPS)
diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am
index 437c6a5..ebea816 100644
--- a/src/glsl/Makefile.am
+++ b/src/glsl/Makefile.am
@@ -96,7 +96,6 @@ tests_general_ir_test_CFLAGS = \
 tests_general_ir_test_LDADD =  \
$(top_builddir)/src/gtest/libgtest.la   \
$(top_builddir)/src/glsl/libglsl.la \
-   $(top_builddir)/src/libglsl_util.la \
$(PTHREAD_LIBS)
 
 tests_uniform_initializer_test_SOURCES =   \
@@ -109,7 +108,6 @@ tests_uniform_initializer_test_CFLAGS = \
 tests_uniform_initializer_test_LDADD = \
$(top_builddir)/src/gtest/libgtest.la   \
$(top_builddir)/src/glsl/libglsl.la \
-   $(top_builddir)/src/libglsl_util.la \
$(PTHREAD_LIBS)
 
 tests_sampler_types_test_SOURCES = \
@@ -119,7 +117,6 @@ tests_sampler_types_test_CFLAGS =   \
 tests_sampler_types_test_LDADD =   \
$(top_builddir)/src/gtest/libgtest.la   \
$(top_builddir)/src/glsl/libglsl.la \
-   $(top_builddir)/src/libglsl_util.la \
$(PTHREAD_LIBS)
 
 libglcpp_la_LIBADD =   \
@@ -134,7 +131,6 @@ glcpp_glcpp_SOURCES = \
glcpp/glcpp.c
 glcpp_glcpp_LDADD =\
libglcpp.la \
-   $(top_builddir)/src/libglsl_util.la \
-lm
 
 libglsl_la_LIBADD = libglcpp.la
@@ -168,7 +164,6 @@ glsl_test_SOURCES = \
 
 

[Mesa-dev] [PATCH 4/6] nir: use util/convert.h

2015-10-10 Thread Rob Clark
From: Rob Clark 

Signed-off-by: Rob Clark 
---
 src/glsl/nir/nir_constant_expressions.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/glsl/nir/nir_constant_expressions.py b/src/glsl/nir/nir_constant_expressions.py
index 8fd9b10..ba28c0e 100644
--- a/src/glsl/nir/nir_constant_expressions.py
+++ b/src/glsl/nir/nir_constant_expressions.py
@@ -28,6 +28,7 @@ template = """\
 
 #include 
 #include "main/core.h"
+#include "util/convert.h"
 #include "util/rounding.h" /* for _mesa_roundeven */
 #include "nir_constant_expressions.h"
 
@@ -199,7 +200,7 @@ unpack_unorm_1x16(uint16_t u)
 static uint16_t
 pack_half_1x16(float x)
 {
-   return _mesa_float_to_half(x);
+   return float_to_half(x);
 }
 
 /**
@@ -208,7 +209,7 @@ pack_half_1x16(float x)
 static float
 unpack_half_1x16(uint16_t u)
 {
-   return _mesa_half_to_float(u);
+   return half_to_float(u);
 }
 
 /* Some typed vector structures to make things like src0.y work */
-- 
2.4.3
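
As a rough illustration of what a half_to_float() helper like the one called
above has to do, here is a minimal sketch of an IEEE 754 binary16 to binary32
conversion. It is an assumption-laden example, not the code in
util/convert.c: the sketch_* names are invented, and it builds the result from
bit fields rather than following Mesa's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float without aliasing problems. */
static float
sketch_bits_to_float(uint32_t bits)
{
   float f;

   memcpy(&f, &bits, sizeof(f));
   return f;
}

/* Hypothetical helper: decode a 16-bit half float into a 32-bit float. */
static float
sketch_half_to_float(uint16_t val)
{
   const uint32_t m = val & 0x3ff;          /* 10-bit mantissa */
   const uint32_t e = (val >> 10) & 0x1f;   /* 5-bit exponent, bias 15 */
   const uint32_t s = (val >> 15) & 0x1;    /* sign bit */

   if (e == 0) {
      /* Zero or denormal half: value is m * 2^-24, which is exactly
       * representable as a normal single. */
      const float magnitude = (float) m * (1.0f / 16777216.0f);
      return s ? -magnitude : magnitude;
   }

   if (e == 31) {
      /* Infinity (m == 0) or NaN (m != 0): all-ones float exponent. */
      return sketch_bits_to_float((s << 31) | (0xffu << 23) | (m << 13));
   }

   /* Normal number: rebias the exponent (127 - 15 = 112) and widen the
    * mantissa from 10 to 23 bits. */
   return sketch_bits_to_float((s << 31) | ((e + 112) << 23) | (m << 13));
}
```

For example, 0x3c00 decodes to 1.0f and 0x7bff to 65504.0f, the largest finite
half value.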



[Mesa-dev] [PATCH 09/10] radeonsi: cleanup copy-pasted scratch buffer updates

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 39 +
 1 file changed, 13 insertions(+), 26 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index c1d61d5..9395c31 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1243,7 +1243,6 @@ static bool si_update_spi_tmpring_size(struct si_context *sctx)
int r;
 
if (scratch_needed_size > 0) {
-
if (scratch_needed_size > current_scratch_buffer_size) {
/* Create a bigger scratch buffer */
pipe_resource_reference(
@@ -1282,38 +1281,26 @@ static bool si_update_spi_tmpring_size(struct si_context *sctx)
si_pm4_bind_state(sctx, hs, sctx->tcs_shader->current->pm4);
 
/* VS can be bound as LS, ES, or VS. */
-   if (sctx->tes_shader) {
-   r = si_update_scratch_buffer(sctx, sctx->vs_shader);
-   if (r < 0)
-   return false;
-   if (r == 1)
+   r = si_update_scratch_buffer(sctx, sctx->vs_shader);
+   if (r < 0)
+   return false;
+   if (r == 1) {
+   if (sctx->tes_shader)
si_pm4_bind_state(sctx, ls, sctx->vs_shader->current->pm4);
-   } else if (sctx->gs_shader) {
-   r = si_update_scratch_buffer(sctx, sctx->vs_shader);
-   if (r < 0)
-   return false;
-   if (r == 1)
+   else if (sctx->gs_shader)
si_pm4_bind_state(sctx, es, sctx->vs_shader->current->pm4);
-   } else {
-   r = si_update_scratch_buffer(sctx, sctx->vs_shader);
-   if (r < 0)
-   return false;
-   if (r == 1)
+   else
si_pm4_bind_state(sctx, vs, sctx->vs_shader->current->pm4);
}
 
/* TES can be bound as ES or VS. */
-   if (sctx->gs_shader) {
-   r = si_update_scratch_buffer(sctx, sctx->tes_shader);
-   if (r < 0)
-   return false;
-   if (r == 1)
+   r = si_update_scratch_buffer(sctx, sctx->tes_shader);
+   if (r < 0)
+   return false;
+   if (r == 1) {
+   if (sctx->gs_shader)
si_pm4_bind_state(sctx, es, sctx->tes_shader->current->pm4);
-   } else {
-   r = si_update_scratch_buffer(sctx, sctx->tes_shader);
-   if (r < 0)
-   return false;
-   if (r == 1)
+   else
si_pm4_bind_state(sctx, vs, sctx->tes_shader->current->pm4);
}
}
-- 
2.1.4



[Mesa-dev] [PATCH 01/10] tgsi: move pipe_shader_from_tgsi_processor function to util

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/auxiliary/tgsi/tgsi_ureg.c | 26 ++
 src/gallium/auxiliary/util/u_inlines.h | 22 ++
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
index 3d21319..f2f5181 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
@@ -35,6 +35,7 @@
 #include "tgsi/tgsi_dump.h"
 #include "tgsi/tgsi_sanity.h"
 #include "util/u_debug.h"
+#include "util/u_inlines.h"
 #include "util/u_memory.h"
 #include "util/u_math.h"
 #include "util/u_bitmask.h"
@@ -1830,29 +1831,6 @@ void ureg_free_tokens( const struct tgsi_token *tokens )
 }
 
 
-static inline unsigned
-pipe_shader_from_tgsi_processor(unsigned processor)
-{
-   switch (processor) {
-   case TGSI_PROCESSOR_VERTEX:
-  return PIPE_SHADER_VERTEX;
-   case TGSI_PROCESSOR_TESS_CTRL:
-  return PIPE_SHADER_TESS_CTRL;
-   case TGSI_PROCESSOR_TESS_EVAL:
-  return PIPE_SHADER_TESS_EVAL;
-   case TGSI_PROCESSOR_GEOMETRY:
-  return PIPE_SHADER_GEOMETRY;
-   case TGSI_PROCESSOR_FRAGMENT:
-  return PIPE_SHADER_FRAGMENT;
-   case TGSI_PROCESSOR_COMPUTE:
-  return PIPE_SHADER_COMPUTE;
-   default:
-  assert(0);
-  return PIPE_SHADER_VERTEX;
-   }
-}
-
-
 struct ureg_program *
 ureg_create(unsigned processor)
 {
@@ -1872,7 +1850,7 @@ ureg_create_with_screen(unsigned processor, struct pipe_screen *screen)
ureg->supports_any_inout_decl_range =
   screen &&
   screen->get_shader_param(screen,
-   pipe_shader_from_tgsi_processor(processor),
+   util_pipe_shader_from_tgsi_processor(processor),
PIPE_SHADER_CAP_TGSI_ANY_INOUT_DECL_RANGE) != 0;
 
for (i = 0; i < Elements(ureg->properties); i++)
diff --git a/src/gallium/auxiliary/util/u_inlines.h b/src/gallium/auxiliary/util/u_inlines.h
index bb99a02..384e267 100644
--- a/src/gallium/auxiliary/util/u_inlines.h
+++ b/src/gallium/auxiliary/util/u_inlines.h
@@ -651,6 +651,28 @@ util_max_layer(const struct pipe_resource *r, unsigned level)
}
 }
 
+static inline unsigned
+util_pipe_shader_from_tgsi_processor(unsigned processor)
+{
+   switch (processor) {
+   case TGSI_PROCESSOR_VERTEX:
+  return PIPE_SHADER_VERTEX;
+   case TGSI_PROCESSOR_TESS_CTRL:
+  return PIPE_SHADER_TESS_CTRL;
+   case TGSI_PROCESSOR_TESS_EVAL:
+  return PIPE_SHADER_TESS_EVAL;
+   case TGSI_PROCESSOR_GEOMETRY:
+  return PIPE_SHADER_GEOMETRY;
+   case TGSI_PROCESSOR_FRAGMENT:
+  return PIPE_SHADER_FRAGMENT;
+   case TGSI_PROCESSOR_COMPUTE:
+  return PIPE_SHADER_COMPUTE;
+   default:
+  assert(0);
+  return PIPE_SHADER_VERTEX;
+   }
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.1.4



[Mesa-dev] [PATCH 05/10] radeonsi: remove an unused ctx parameter in si_shader_destroy

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_compute.c   | 4 ++--
 src/gallium/drivers/radeonsi/si_shader.c| 4 ++--
 src/gallium/drivers/radeonsi/si_shader.h| 2 +-
 src/gallium/drivers/radeonsi/si_state_shaders.c | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c
index c660534..697e60a 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -469,7 +469,7 @@ static void si_delete_compute_state(struct pipe_context *ctx, void* state){
if (program->kernels) {
for (int i = 0; i < program->num_kernels; i++){
if (program->kernels[i].bo){
-   si_shader_destroy(ctx, &program->kernels[i]);
+   si_shader_destroy(&program->kernels[i]);
}
}
FREE(program->kernels);
@@ -482,7 +482,7 @@ static void si_delete_compute_state(struct pipe_context *ctx, void* state){
FREE(program->shader.binary.config);
FREE(program->shader.binary.rodata);
FREE(program->shader.binary.global_symbol_offsets);
-   si_shader_destroy(ctx, &program->shader);
+   si_shader_destroy(&program->shader);
 #endif
 
pipe_resource_reference(
diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 789b1b7..0e98915 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4190,10 +4190,10 @@ out:
return r;
 }
 
-void si_shader_destroy(struct pipe_context *ctx, struct si_shader *shader)
+void si_shader_destroy(struct si_shader *shader)
 {
if (shader->gs_copy_shader)
-   si_shader_destroy(ctx, shader->gs_copy_shader);
+   si_shader_destroy(shader->gs_copy_shader);
 
if (shader->scratch_bo)
r600_resource_reference(&shader->scratch_bo, NULL);
diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h
index b92fa02..460 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -324,7 +324,7 @@ int si_shader_create(struct si_screen *sscreen, LLVMTargetMachineRef tm,
 void si_dump_shader_key(unsigned shader, union si_shader_key *key, FILE *f);
 int si_compile_llvm(struct si_screen *sscreen, struct si_shader *shader,
LLVMTargetMachineRef tm, LLVMModuleRef mod);
-void si_shader_destroy(struct pipe_context *ctx, struct si_shader *shader);
+void si_shader_destroy(struct si_shader *shader);
 unsigned si_shader_io_get_unique_index(unsigned semantic_name, unsigned index);
 int si_shader_binary_upload(struct si_screen *sscreen, struct si_shader *shader);
 int si_shader_binary_read(struct si_screen *sscreen, struct si_shader *shader);
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 2489101..9d05cb5 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -942,7 +942,7 @@ static void si_delete_shader_selector(struct pipe_context *ctx,
break;
}
 
-   si_shader_destroy(ctx, p);
+   si_shader_destroy(p);
free(p);
p = c;
}
-- 
2.1.4



[Mesa-dev] [PATCH 04/10] radeonsi: print export_prim_id from the shader key

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_shader.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 109a805..789b1b7 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -3974,6 +3974,7 @@ void si_dump_shader_key(unsigned shader, union si_shader_key *key, FILE *f)
key->vs.es_enabled_outputs);
fprintf(f, "  as_es = %u\n", key->vs.as_es);
fprintf(f, "  as_ls = %u\n", key->vs.as_ls);
+   fprintf(f, "  export_prim_id = %u\n", key->vs.export_prim_id);
break;
 
case PIPE_SHADER_TESS_CTRL:
@@ -3985,6 +3986,7 @@ void si_dump_shader_key(unsigned shader, union si_shader_key *key, FILE *f)
fprintf(f, "  es_enabled_outputs = 0x%"PRIx64"\n",
key->tes.es_enabled_outputs);
fprintf(f, "  as_es = %u\n", key->tes.as_es);
+   fprintf(f, "  export_prim_id = %u\n", key->tes.export_prim_id);
break;
 
case PIPE_SHADER_GEOMETRY:
-- 
2.1.4



[Mesa-dev] [PATCH 07/10] radeonsi: unify shader delete functions

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 84 +
 1 file changed, 17 insertions(+), 67 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 9d05cb5..cc053bb 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -907,11 +907,21 @@ static void si_bind_ps_shader(struct pipe_context *ctx, void *state)
si_mark_atom_dirty(sctx, &sctx->cb_target_mask);
 }
 
-static void si_delete_shader_selector(struct pipe_context *ctx,
- struct si_shader_selector *sel)
+static void si_delete_shader_selector(struct pipe_context *ctx, void *state)
 {
struct si_context *sctx = (struct si_context *)ctx;
+   struct si_shader_selector *sel = (struct si_shader_selector *)state;
struct si_shader *p = sel->current, *c;
+   struct si_shader_selector **current_shader[SI_NUM_SHADERS] = {
+   [PIPE_SHADER_VERTEX] = &sctx->vs_shader,
+   [PIPE_SHADER_TESS_CTRL] = &sctx->tcs_shader,
+   [PIPE_SHADER_TESS_EVAL] = &sctx->tes_shader,
+   [PIPE_SHADER_GEOMETRY] = &sctx->gs_shader,
+   [PIPE_SHADER_FRAGMENT] = &sctx->ps_shader,
+   };
+
+   if (*current_shader[sel->type] == sel)
+   *current_shader[sel->type] = NULL;
 
while (p) {
c = p->next_variant;
@@ -951,66 +961,6 @@ static void si_delete_shader_selector(struct pipe_context *ctx,
free(sel);
 }
 
-static void si_delete_vs_shader(struct pipe_context *ctx, void *state)
-{
-   struct si_context *sctx = (struct si_context *)ctx;
-   struct si_shader_selector *sel = (struct si_shader_selector *)state;
-
-   if (sctx->vs_shader == sel) {
-   sctx->vs_shader = NULL;
-   }
-
-   si_delete_shader_selector(ctx, sel);
-}
-
-static void si_delete_gs_shader(struct pipe_context *ctx, void *state)
-{
-   struct si_context *sctx = (struct si_context *)ctx;
-   struct si_shader_selector *sel = (struct si_shader_selector *)state;
-
-   if (sctx->gs_shader == sel) {
-   sctx->gs_shader = NULL;
-   }
-
-   si_delete_shader_selector(ctx, sel);
-}
-
-static void si_delete_ps_shader(struct pipe_context *ctx, void *state)
-{
-   struct si_context *sctx = (struct si_context *)ctx;
-   struct si_shader_selector *sel = (struct si_shader_selector *)state;
-
-   if (sctx->ps_shader == sel) {
-   sctx->ps_shader = NULL;
-   }
-
-   si_delete_shader_selector(ctx, sel);
-}
-
-static void si_delete_tcs_shader(struct pipe_context *ctx, void *state)
-{
-   struct si_context *sctx = (struct si_context *)ctx;
-   struct si_shader_selector *sel = (struct si_shader_selector *)state;
-
-   if (sctx->tcs_shader == sel) {
-   sctx->tcs_shader = NULL;
-   }
-
-   si_delete_shader_selector(ctx, sel);
-}
-
-static void si_delete_tes_shader(struct pipe_context *ctx, void *state)
-{
-   struct si_context *sctx = (struct si_context *)ctx;
-   struct si_shader_selector *sel = (struct si_shader_selector *)state;
-
-   if (sctx->tes_shader == sel) {
-   sctx->tes_shader = NULL;
-   }
-
-   si_delete_shader_selector(ctx, sel);
-}
-
 static void si_emit_spi_map(struct si_context *sctx, struct r600_atom *atom)
 {
struct radeon_winsys_cs *cs = sctx->b.rings.gfx.cs;
@@ -1675,9 +1625,9 @@ void si_init_shader_functions(struct si_context *sctx)
sctx->b.b.bind_gs_state = si_bind_gs_shader;
sctx->b.b.bind_fs_state = si_bind_ps_shader;
 
-   sctx->b.b.delete_vs_state = si_delete_vs_shader;
-   sctx->b.b.delete_tcs_state = si_delete_tcs_shader;
-   sctx->b.b.delete_tes_state = si_delete_tes_shader;
-   sctx->b.b.delete_gs_state = si_delete_gs_shader;
-   sctx->b.b.delete_fs_state = si_delete_ps_shader;
+   sctx->b.b.delete_vs_state = si_delete_shader_selector;
+   sctx->b.b.delete_tcs_state = si_delete_shader_selector;
+   sctx->b.b.delete_tes_state = si_delete_shader_selector;
+   sctx->b.b.delete_gs_state = si_delete_shader_selector;
+   sctx->b.b.delete_fs_state = si_delete_shader_selector;
 }
-- 
2.1.4
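
A minimal, self-contained sketch of the table-driven unbind that the unified
delete function above uses, in case it helps review. All sketch_* names are
hypothetical stand-ins (three stages instead of five, no real si_context); the
only point is the pattern: index an array of pointers to the per-stage binding
slots by the selector's own type, and clear the slot only if it still points
at the selector being deleted.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_shader_type {
   SKETCH_VERTEX,
   SKETCH_GEOMETRY,
   SKETCH_FRAGMENT,
   SKETCH_NUM_SHADERS
};

/* Hypothetical stand-ins for si_shader_selector and si_context. */
struct sketch_selector {
   enum sketch_shader_type type;
};

struct sketch_context {
   struct sketch_selector *vs_shader;
   struct sketch_selector *gs_shader;
   struct sketch_selector *ps_shader;
};

/* Clear the binding for sel's stage if sel is currently bound there.
 * Returns 1 when a binding was cleared, 0 otherwise. */
static int
sketch_unbind_if_current(struct sketch_context *ctx,
                         struct sketch_selector *sel)
{
   /* One slot pointer per stage, indexed by shader type. */
   struct sketch_selector **current_shader[SKETCH_NUM_SHADERS] = {
      [SKETCH_VERTEX] = &ctx->vs_shader,
      [SKETCH_GEOMETRY] = &ctx->gs_shader,
      [SKETCH_FRAGMENT] = &ctx->ps_shader,
   };

   if (*current_shader[sel->type] == sel) {
      *current_shader[sel->type] = NULL;
      return 1;
   }
   return 0;
}

/* Small scenario: a stale GS selector is ignored, the bound VS is cleared.
 * Returns 1 when everything behaved as expected. */
static int
sketch_demo(void)
{
   struct sketch_selector vs = { SKETCH_VERTEX };
   struct sketch_selector old_gs = { SKETCH_GEOMETRY };
   struct sketch_selector gs = { SKETCH_GEOMETRY };
   struct sketch_context ctx = { &vs, &gs, NULL };

   int cleared_stale = sketch_unbind_if_current(&ctx, &old_gs);
   int cleared_vs = sketch_unbind_if_current(&ctx, &vs);

   return cleared_stale == 0 && cleared_vs == 1 &&
          ctx.vs_shader == NULL && ctx.gs_shader == &gs;
}
```

Because the stage comes from the selector itself, one callback can serve every
delete_*_state hook, which is what lets the five per-stage wrappers go away.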



[Mesa-dev] [PATCH 02/10] radeonsi: cleanup si_llvm_init_export_args

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_shader.c | 76 ++--
 1 file changed, 34 insertions(+), 42 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 32a702f..109a805 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1306,6 +1306,22 @@ static void si_llvm_init_export_args(struct lp_build_tgsi_context *bld_base,
unsigned compressed = 0;
unsigned chan;
 
+   /* XXX: This controls which components of the output
+* registers actually get exported. (e.g bit 0 means export
+* X component, bit 1 means export Y component, etc.)  I'm
+* hard coding this to 0xf for now.  In the future, we might
+* want to do something else. */
+   args[0] = lp_build_const_int32(base->gallivm, 0xf);
+
+   /* Specify whether the EXEC mask represents the valid mask */
+   args[1] = uint->zero;
+
+   /* Specify whether this is the last export */
+   args[2] = uint->zero;
+
+   /* Specify the target we are exporting */
+   args[3] = lp_build_const_int32(base->gallivm, target);
+
if (si_shader_ctx->type == TGSI_PROCESSOR_FRAGMENT) {
int cbuf = target - V_008DFC_SQ_EXP_MRT;
 
@@ -1323,55 +1339,31 @@ static void si_llvm_init_export_args(struct lp_build_tgsi_context *bld_base,
}
}
 
+   /* Set COMPR flag */
+   args[4] = compressed ? uint->one : uint->zero;
+
if (compressed) {
/* Pixel shader needs to pack output values before export */
-   for (chan = 0; chan < 2; chan++ ) {
-   args[0] = values[2 * chan];
-   args[1] = values[2 * chan + 1];
-   args[chan + 5] =
-   lp_build_intrinsic(base->gallivm->builder,
-   "llvm.SI.packf16",
-   LLVMInt32TypeInContext(base->gallivm->context),
-   args, 2,
-   LLVMReadNoneAttribute | LLVMNoUnwindAttribute);
+   for (chan = 0; chan < 2; chan++) {
+   LLVMValueRef pack_args[2] = {
+   values[2 * chan],
+   values[2 * chan + 1]
+   };
+   LLVMValueRef packed;
+
+   packed = lp_build_intrinsic(base->gallivm->builder,
+   "llvm.SI.packf16",
+   LLVMInt32TypeInContext(base->gallivm->context),
+   pack_args, 2,
+   LLVMReadNoneAttribute | LLVMNoUnwindAttribute);
args[chan + 7] = args[chan + 5] =
LLVMBuildBitCast(base->gallivm->builder,
-args[chan + 5],
+packed,
 LLVMFloatTypeInContext(base->gallivm->context),
 "");
}
-
-   /* Set COMPR flag */
-   args[4] = uint->one;
-   } else {
-   for (chan = 0; chan < 4; chan++ )
-   /* +5 because the first output value will be
-* the 6th argument to the intrinsic. */
-   args[chan + 5] = values[chan];
-
-   /* Clear COMPR flag */
-   args[4] = uint->zero;
-   }
-
-   /* XXX: This controls which components of the output
-* registers actually get exported. (e.g bit 0 means export
-* X component, bit 1 means export Y component, etc.)  I'm
-* hard coding this to 0xf for now.  In the future, we might
-* want to do something else. */
-   args[0] = lp_build_const_int32(base->gallivm, 0xf);
-
-   /* Specify whether the EXEC mask represents the valid mask */
-   args[1] = uint->zero;
-
-   /* Specify whether this is the last export */
-   args[2] = uint->zero;
-
-   /* Specify the target we are exporting */
-   args[3] = lp_build_const_int32(base->gallivm, target);
-
-   /* XXX: We probably need to keep track of the output
-* values, so we know what we are passing to the next
-* stage. */
+   } else
+   memcpy(&args[5], values, sizeof(values[0]) * 4);
 }
 
 /* Load from output pointers and initialize arguments for the shader export intrinsic */
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 00/10] RadeonSI cleanups

2015-10-10 Thread Marek Olšák
Nothing special here other than cleanups. One patch disables NaNs for LS and 
HS, and there's also one GS shader leak fix.

Please review.

Marek


[Mesa-dev] [PATCH 03/10] radeonsi: disable NaNs for LS and HS

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

They're disabled for all other shaders except compute, but I forgot
to do this for tess stages.
---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index f673388..2489101 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -122,7 +122,8 @@ static void si_shader_ls(struct si_shader *shader)
 
shader->ls_rsrc1 = S_00B528_VGPRS((shader->num_vgprs - 1) / 4) |
   S_00B528_SGPRS((num_sgprs - 1) / 8) |
-  S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt);
+  S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) |
+  S_00B528_DX10_CLAMP(shader->dx10_clamp_mode);
shader->ls_rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) |
   S_00B52C_SCRATCH_EN(shader->scratch_bytes_per_wave > 0);
 }
@@ -154,7 +155,8 @@ static void si_shader_hs(struct si_shader *shader)
si_pm4_set_reg(pm4, R_00B424_SPI_SHADER_PGM_HI_HS, va >> 40);
si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS,
   S_00B428_VGPRS((shader->num_vgprs - 1) / 4) |
-  S_00B428_SGPRS((num_sgprs - 1) / 8));
+  S_00B428_SGPRS((num_sgprs - 1) / 8) |
+  S_00B428_DX10_CLAMP(shader->dx10_clamp_mode));
si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS,
   S_00B42C_USER_SGPR(num_user_sgprs) |
   S_00B42C_SCRATCH_EN(shader->scratch_bytes_per_wave > 0));
-- 
2.1.4



[Mesa-dev] [PATCH 06/10] radeonsi: fix a GS copy shader leak

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

Cc: mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/radeonsi/si_shader.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 0e98915..012d708 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4192,8 +4192,10 @@ out:
 
 void si_shader_destroy(struct si_shader *shader)
 {
-   if (shader->gs_copy_shader)
+   if (shader->gs_copy_shader) {
si_shader_destroy(shader->gs_copy_shader);
+   FREE(shader->gs_copy_shader);
+   }
 
if (shader->scratch_bo)
r600_resource_reference(&shader->scratch_bo, NULL);
-- 
2.1.4



[Mesa-dev] [PATCH 10/10] radeonsi: cleanup other scratch buffer functions

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 9395c31..71349a5 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1205,30 +1205,23 @@ static int si_update_scratch_buffer(struct si_context *sctx,
 
 static unsigned si_get_current_scratch_buffer_size(struct si_context *sctx)
 {
-   if (!sctx->scratch_buffer)
-   return 0;
-
-   return sctx->scratch_buffer->b.b.width0;
+   return sctx->scratch_buffer ? sctx->scratch_buffer->b.b.width0 : 0;
 }
 
-static unsigned si_get_scratch_buffer_bytes_per_wave(struct si_context *sctx,
-   struct si_shader_selector *sel)
+static unsigned si_get_scratch_buffer_bytes_per_wave(struct si_shader_selector *sel)
 {
-   if (!sel)
-   return 0;
-
-   return sel->current->scratch_bytes_per_wave;
+   return sel ? sel->current->scratch_bytes_per_wave : 0;
 }
 
 static unsigned si_get_max_scratch_bytes_per_wave(struct si_context *sctx)
 {
unsigned bytes = 0;
 
-   bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx, sctx->ps_shader));
-   bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx, sctx->gs_shader));
-   bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx, sctx->vs_shader));
-   bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx, sctx->tcs_shader));
-   bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx, sctx->tes_shader));
+   bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx->ps_shader));
+   bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx->gs_shader));
+   bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx->vs_shader));
+   bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx->tcs_shader));
+   bytes = MAX2(bytes, si_get_scratch_buffer_bytes_per_wave(sctx->tes_shader));
return bytes;
 }
 
-- 
2.1.4



[Mesa-dev] [PATCH 08/10] radeonsi: unify shader create functions

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

The shader specifies the processor type, so use that instead.
---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 49 +
 1 file changed, 9 insertions(+), 40 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index cc053bb..c1d61d5 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -647,9 +647,8 @@ static int si_shader_select(struct pipe_context *ctx,
return 0;
 }
 
-static void *si_create_shader_state(struct pipe_context *ctx,
-   const struct pipe_shader_state *state,
-   unsigned pipe_shader_type)
+static void *si_create_shader_selector(struct pipe_context *ctx,
+  const struct pipe_shader_state *state)
 {
struct si_screen *sscreen = (struct si_screen *)ctx->screen;
struct si_shader_selector *sel = CALLOC_STRUCT(si_shader_selector);
@@ -658,7 +657,6 @@ static void *si_create_shader_state(struct pipe_context 
*ctx,
if (!sel)
return NULL;
 
-   sel->type = pipe_shader_type;
sel->tokens = tgsi_dup_tokens(state->tokens);
if (!sel->tokens) {
FREE(sel);
@@ -667,6 +665,7 @@ static void *si_create_shader_state(struct pipe_context 
*ctx,
 
sel->so = state->stream_output;
tgsi_scan_shader(state->tokens, &sel->info);
+   sel->type = util_pipe_shader_from_tgsi_processor(sel->info.processor);
p_atomic_inc(&sscreen->b.num_shaders_created);
 
/* First set which opcode uses which (i,j) pair. */
@@ -697,7 +696,7 @@ static void *si_create_shader_state(struct pipe_context 
*ctx,
sel->info.uses_linear_centroid +
sel->info.uses_linear_sample >= 2;
 
-   switch (pipe_shader_type) {
+   switch (sel->type) {
case PIPE_SHADER_GEOMETRY:
sel->gs_output_prim =
sel->info.properties[TGSI_PROPERTY_GS_OUTPUT_PRIM];
@@ -763,36 +762,6 @@ static void *si_create_shader_state(struct pipe_context 
*ctx,
return sel;
 }
 
-static void *si_create_fs_state(struct pipe_context *ctx,
-   const struct pipe_shader_state *state)
-{
-   return si_create_shader_state(ctx, state, PIPE_SHADER_FRAGMENT);
-}
-
-static void *si_create_gs_state(struct pipe_context *ctx,
-   const struct pipe_shader_state *state)
-{
-   return si_create_shader_state(ctx, state, PIPE_SHADER_GEOMETRY);
-}
-
-static void *si_create_vs_state(struct pipe_context *ctx,
-   const struct pipe_shader_state *state)
-{
-   return si_create_shader_state(ctx, state, PIPE_SHADER_VERTEX);
-}
-
-static void *si_create_tcs_state(struct pipe_context *ctx,
-const struct pipe_shader_state *state)
-{
-   return si_create_shader_state(ctx, state, PIPE_SHADER_TESS_CTRL);
-}
-
-static void *si_create_tes_state(struct pipe_context *ctx,
-const struct pipe_shader_state *state)
-{
-   return si_create_shader_state(ctx, state, PIPE_SHADER_TESS_EVAL);
-}
-
 /**
  * Normally, we only emit 1 viewport and 1 scissor if no shader is using
  * the VIEWPORT_INDEX output, and emitting the other viewports and scissors
@@ -1613,11 +1582,11 @@ void si_init_shader_functions(struct si_context *sctx)
si_init_atom(sctx, &sctx->spi_map, &sctx->atoms.s.spi_map, si_emit_spi_map);
si_init_atom(sctx, &sctx->spi_ps_input, &sctx->atoms.s.spi_ps_input, si_emit_spi_ps_input);
 
-   sctx->b.b.create_vs_state = si_create_vs_state;
-   sctx->b.b.create_tcs_state = si_create_tcs_state;
-   sctx->b.b.create_tes_state = si_create_tes_state;
-   sctx->b.b.create_gs_state = si_create_gs_state;
-   sctx->b.b.create_fs_state = si_create_fs_state;
+   sctx->b.b.create_vs_state = si_create_shader_selector;
+   sctx->b.b.create_tcs_state = si_create_shader_selector;
+   sctx->b.b.create_tes_state = si_create_shader_selector;
+   sctx->b.b.create_gs_state = si_create_shader_selector;
+   sctx->b.b.create_fs_state = si_create_shader_selector;
 
sctx->b.b.bind_vs_state = si_bind_vs_shader;
sctx->b.b.bind_tcs_state = si_bind_tcs_shader;
-- 
2.1.4



[Mesa-dev] [PATCH 07/10] radeonsi: don't use the AMDGPU intrinsic for CMP

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

The increase in VGPRs is unfortunate, but the decrease in the scratch size
is always welcome.

Totals:
SGPRS: 344552 -> 344368 (-0.05 %)
VGPRS: 197132 -> 197552 (0.21 %)
Code Size: 7375376 -> 7366304 (-0.12 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1679360 -> 1615872 (-3.78 %) bytes per wave

Totals from affected shaders:
SGPRS: 47736 -> 47552 (-0.39 %)
VGPRS: 27952 -> 28372 (1.50 %)
Code Size: 1392724 -> 1383652 (-0.65 %) bytes
LDS: 39 -> 39 (0.00 %) blocks
Scratch: 513024 -> 449536 (-12.38 %) bytes per wave
---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 31 +++---
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index c22ea7c..ac99e73 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -919,7 +919,21 @@ static void emit_ucmp(
LLVMBuildSelect(builder, v, emit_data->args[1], 
emit_data->args[2], "");
 }
 
-static void emit_cmp(
+static void emit_cmp(const struct lp_build_tgsi_action *action,
+struct lp_build_tgsi_context *bld_base,
+struct lp_build_emit_data *emit_data)
+{
+   LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+   LLVMValueRef cond, *args = emit_data->args;
+
+   cond = LLVMBuildFCmp(builder, LLVMRealOLT, args[0],
+bld_base->base.zero, "");
+
+   emit_data->output[emit_data->chan] =
+   LLVMBuildSelect(builder, cond, args[1], args[2], "");
+}
+
+static void emit_set_cond(
const struct lp_build_tgsi_action *action,
struct lp_build_tgsi_context * bld_base,
struct lp_build_emit_data * emit_data)
@@ -1503,8 +1517,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx)
bld_base->op_actions[TGSI_OPCODE_CEIL].intr_name = "llvm.ceil.f32";
bld_base->op_actions[TGSI_OPCODE_CLAMP].emit = 
build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_CLAMP].intr_name = "llvm.AMDIL.clamp.";
-   bld_base->op_actions[TGSI_OPCODE_CMP].emit = build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_CMP].intr_name = "llvm.AMDGPU.cndlt";
+   bld_base->op_actions[TGSI_OPCODE_CMP].emit = emit_cmp;
bld_base->op_actions[TGSI_OPCODE_CONT].emit = cont_emit;
bld_base->op_actions[TGSI_OPCODE_COS].emit = build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_COS].intr_name = "llvm.cos.f32";
@@ -1573,13 +1586,13 @@ void radeon_llvm_context_init(struct 
radeon_llvm_context * ctx)
bld_base->op_actions[TGSI_OPCODE_ROUND].intr_name = "llvm.rint.f32";
bld_base->op_actions[TGSI_OPCODE_RSQ].intr_name = 
"llvm.AMDGPU.rsq.clamped.f32";
bld_base->op_actions[TGSI_OPCODE_RSQ].emit = build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_SGE].emit = emit_cmp;
-   bld_base->op_actions[TGSI_OPCODE_SEQ].emit = emit_cmp;
+   bld_base->op_actions[TGSI_OPCODE_SGE].emit = emit_set_cond;
+   bld_base->op_actions[TGSI_OPCODE_SEQ].emit = emit_set_cond;
bld_base->op_actions[TGSI_OPCODE_SHL].emit = emit_shl;
-   bld_base->op_actions[TGSI_OPCODE_SLE].emit = emit_cmp;
-   bld_base->op_actions[TGSI_OPCODE_SLT].emit = emit_cmp;
-   bld_base->op_actions[TGSI_OPCODE_SNE].emit = emit_cmp;
-   bld_base->op_actions[TGSI_OPCODE_SGT].emit = emit_cmp;
+   bld_base->op_actions[TGSI_OPCODE_SLE].emit = emit_set_cond;
+   bld_base->op_actions[TGSI_OPCODE_SLT].emit = emit_set_cond;
+   bld_base->op_actions[TGSI_OPCODE_SNE].emit = emit_set_cond;
+   bld_base->op_actions[TGSI_OPCODE_SGT].emit = emit_set_cond;
bld_base->op_actions[TGSI_OPCODE_SIN].emit = build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_SIN].intr_name = "llvm.sin.f32";
bld_base->op_actions[TGSI_OPCODE_SQRT].emit = 
build_tgsi_intrinsic_nomem;
-- 
2.1.4



[Mesa-dev] [PATCH 10/10] radeonsi: really enable the no-nans-fp-math option

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

Include compute shaders too, which includes OpenGL, but not OpenCL.

LLVM doesn't use this much according to shader-db:

Totals:
SGPRS: 344944 -> 344944 (0.00 %)
VGPRS: 197024 -> 197024 (0.00 %)
Code Size: 7325688 -> 7325624 (-0.00 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1510400 -> 1510400 (0.00 %) bytes per wave

Totals from affected shaders:
SGPRS: 664 -> 664 (0.00 %)
VGPRS: 480 -> 480 (0.00 %)
Code Size: 25356 -> 25292 (-0.25 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
---
 src/gallium/drivers/radeonsi/si_shader.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 8da2f77..aa4cfa0 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -3587,7 +3587,7 @@ static void create_function(struct si_shader_context 
*si_shader_ctx)
 
if (shader->dx10_clamp_mode)

LLVMAddTargetDependentFunctionAttr(si_shader_ctx->radeon_bld.main_fn,
-  "enable-no-nans-fp-math", 
"true");
+  "no-nans-fp-math", "true");
 
for (i = 0; i <= last_sgpr; ++i) {
LLVMValueRef P = 
LLVMGetParam(si_shader_ctx->radeon_bld.main_fn, i);
@@ -4095,8 +4095,7 @@ int si_shader_create(struct si_screen *sscreen, 
LLVMTargetMachineRef tm,
radeon_llvm_context_init(&si_shader_ctx.radeon_bld);
bld_base = &si_shader_ctx.radeon_bld.soa.bld_base;
 
-   if (sel->type != PIPE_SHADER_COMPUTE)
-   shader->dx10_clamp_mode = true;
+   shader->dx10_clamp_mode = true;
 
if (sel->info.uses_kill)
shader->db_shader_control |= S_02880C_KILL_ENABLE(1);
-- 
2.1.4



[Mesa-dev] [PATCH 00/10] RadeonSI: Better LLVM IR generation

2015-10-10 Thread Marek Olšák
Hi,

This patch series improves IR generation for radeonsi. Most of it removes uses 
of AMDGPU intrinsics.

There is one piglit regression caused by aggressive handling of "undef" in 
LLVM, breaking piglit/glsl-routing. I have a lit test which I'll send later.

Complete stats from shader-db are below. Decreasing scratch usage is certainly 
nice, but there is not much else.

7063 shaders
Totals:
SGPRS: 345216 -> 344944 (-0.08 %)
VGPRS: 197684 -> 197024 (-0.33 %)
Code Size: 7390408 -> 7325624 (-0.88 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1842176 -> 1510400 (-18.01 %) bytes per wave

Totals from affected shaders:
SGPRS: 229736 -> 229464 (-0.12 %)
VGPRS: 130668 -> 130008 (-0.51 %)
Code Size: 5554088 -> 5489304 (-1.17 %) bytes
LDS: 56 -> 56 (0.00 %) blocks
Scratch: 1764352 -> 1432576 (-18.80 %) bytes per wave

Increases:
SGPRS: 485 (0.07 %)
VGPRS: 508 (0.07 %)
Code Size: 1355 (0.19 %)
LDS: 0 (0.00 %)
Scratch: 65 (0.01 %)

Decreases:
SGPRS: 462 (0.07 %)
VGPRS: 631 (0.09 %)
Code Size: 2226 (0.32 %)
LDS: 0 (0.00 %)
Scratch: 137 (0.02 %)

*** BY PERCENTAGE ***

Max Increase:

SGPRS: 40 -> 104 (160.00 %)  (lines 10083 -> 10083)
VGPRS: 4 -> 8 (100.00 %)  (lines 19667 -> 19667)
Code Size: 388 -> 444 (14.43 %)  (lines 32696 -> 32696) bytes
LDS: 0 -> 0 (0.00 %)  (lines -1 -> -1) blocks
Scratch: 1024 -> 12288 (1100.00 %)  (lines 29684 -> 29684) bytes per wave

Max Decrease:

SGPRS: 80 -> 32 (-60.00 %)  (lines 3125 -> 3125)
VGPRS: 36 -> 16 (-55.56 %)  (lines 18113 -> 18113)
Code Size: 2548 -> 1160 (-54.47 %)  (lines 18617 -> 18617) bytes
LDS: 0 -> 0 (0.00 %)  (lines -1 -> -1) blocks
Scratch: 3072 -> 0 (-100.00 %)  (lines 1522 -> 1522) bytes per wave

*** BY UNIT ***

Max Increase:

SGPRS: 40 -> 104 (160.00 %)  (lines 10083 -> 10083)
VGPRS: 76 -> 92 (21.05 %)  (lines 528 -> 528)
Code Size: 3064 -> 3336 (8.88 %)  (lines 29684 -> 29684) bytes
LDS: 0 -> 0 (0.00 %)  (lines -1 -> -1) blocks
Scratch: 1024 -> 12288 (1100.00 %)  (lines 29684 -> 29684) bytes per wave

Max Decrease:

SGPRS: 80 -> 32 (-60.00 %)  (lines 3125 -> 3125)
VGPRS: 156 -> 124 (-20.51 %)  (lines 29866 -> 29866)
Code Size: 2940 -> 1408 (-52.11 %)  (lines 17413 -> 17413) bytes
LDS: 0 -> 0 (0.00 %)  (lines -1 -> -1) blocks
Scratch: 14336 -> 0 (-100.00 %)  (lines 30496 -> 30496) bytes per wave

Marek


[Mesa-dev] [PATCH 08/10] radeonsi: re-enable unsafe-fp-math for LLVM 3.8

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

Required for 1/sqrt ==> rsq.

We should finally fix the hang instead of running away from the issue. This
assumes the bug is in LLVM and we have time to fix it before the release.
Include compute shaders as well, which only affects TGSI and thus OpenGL.

Totals:
SGPRS: 344368 -> 345104 (0.21 %)
VGPRS: 197552 -> 197420 (-0.07 %)
Code Size: 7366304 -> 7324692 (-0.56 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1615872 -> 1524736 (-5.64 %) bytes per wave

Totals from affected shaders:
SGPRS: 146696 -> 147432 (0.50 %)
VGPRS: 87212 -> 87080 (-0.15 %)
Code Size: 3852664 -> 3811052 (-1.08 %) bytes
LDS: 48 -> 48 (0.00 %) blocks
Scratch: 1179648 -> 1088512 (-7.73 %) bytes per wave
---
 src/gallium/drivers/radeon/radeon_llvm_emit.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c 
b/src/gallium/drivers/radeon/radeon_llvm_emit.c
index 6b2ebde..4bda4a4 100644
--- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
+++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
@@ -84,6 +84,13 @@ void radeon_llvm_shader_type(LLVMValueRef F, unsigned type)
sprintf(Str, "%1d", llvm_type);
 
LLVMAddTargetDependentFunctionAttr(F, "ShaderType", Str);
+
+#if HAVE_LLVM >= 0x0308
+   /* This only affects TGSI (OpenGL), so it's okay to set it for
+* compute shaders too.
+*/
+   LLVMAddTargetDependentFunctionAttr(F, "unsafe-fp-math", "true");
+#endif
 }
 
 static void init_r600_target()
-- 
2.1.4



[Mesa-dev] [PATCH 09/10] radeonsi: don't emit AMDGPU intrinsics for RSQ opcodes

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

Intel and Nouveau use IEEE opcodes, so we should too.
If there is a bug caused by not using the clamped RSQ variant, there must
be another way to fix it. I don't think the RSQ behavior matters much now
that NaNs are disabled.

Nine and Wine should implement necessary workarounds for DX9 games.
(they probably already do)

Not many shaders are affected.

Totals:
SGPRS: 345104 -> 344944 (-0.05 %)
VGPRS: 197420 -> 197024 (-0.20 %)
Code Size: 7324692 -> 7325688 (0.01 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1524736 -> 1510400 (-0.94 %) bytes per wave

Totals from affected shaders:
SGPRS: 25160 -> 25000 (-0.64 %)
VGPRS: 17336 -> 16940 (-2.28 %)
Code Size: 843412 -> 844408 (0.12 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Scratch: 139264 -> 124928 (-10.29 %) bytes per wave
---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 28 ++
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index ac99e73..1172244 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1452,6 +1452,28 @@ static void emit_minmax_int(const struct 
lp_build_tgsi_action *action,
emit_data->args[1], "");
 }
 
+/* This requires "unsafe-fp-math" for LLVM to convert it to RSQ. */
+static void emit_rsq(const struct lp_build_tgsi_action *action,
+struct lp_build_tgsi_context *bld_base,
+struct lp_build_emit_data *emit_data)
+{
+   LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+   LLVMValueRef src = emit_data->args[0];
+   bool is_f64 = LLVMGetTypeKind(LLVMTypeOf(src)) == LLVMDoubleTypeKind;
+
+   LLVMValueRef sqrt =
+   lp_build_emit_llvm_unary(bld_base,
+is_f64 ? TGSI_OPCODE_DSQRT
+   : TGSI_OPCODE_SQRT,
+src);
+
+   emit_data->output[emit_data->chan] =
+   LLVMBuildFDiv(builder,
+ is_f64 ? bld_base->dbl_bld.one
+: bld_base->base.one,
+ sqrt, "");
+}
+
 void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
 {
struct lp_type type;
@@ -1531,8 +1553,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx)
bld_base->op_actions[TGSI_OPCODE_DSGE].emit = emit_dcmp;
bld_base->op_actions[TGSI_OPCODE_DSLT].emit = emit_dcmp;
bld_base->op_actions[TGSI_OPCODE_DSNE].emit = emit_dcmp;
-   bld_base->op_actions[TGSI_OPCODE_DRSQ].emit = 
build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_DRSQ].intr_name = 
"llvm.AMDGPU.rsq.f64";
+   bld_base->op_actions[TGSI_OPCODE_DRSQ].emit = emit_rsq;
bld_base->op_actions[TGSI_OPCODE_DSQRT].emit = 
build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_DSQRT].intr_name = "llvm.sqrt.f64";
bld_base->op_actions[TGSI_OPCODE_ELSE].emit = else_emit;
@@ -1584,8 +1605,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx)
bld_base->op_actions[TGSI_OPCODE_POW].intr_name = "llvm.pow.f32";
bld_base->op_actions[TGSI_OPCODE_ROUND].emit = 
build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_ROUND].intr_name = "llvm.rint.f32";
-   bld_base->op_actions[TGSI_OPCODE_RSQ].intr_name = 
"llvm.AMDGPU.rsq.clamped.f32";
-   bld_base->op_actions[TGSI_OPCODE_RSQ].emit = build_tgsi_intrinsic_nomem;
+   bld_base->op_actions[TGSI_OPCODE_RSQ].emit = emit_rsq;
bld_base->op_actions[TGSI_OPCODE_SGE].emit = emit_set_cond;
bld_base->op_actions[TGSI_OPCODE_SEQ].emit = emit_set_cond;
bld_base->op_actions[TGSI_OPCODE_SHL].emit = emit_shl;
-- 
2.1.4



Re: [Mesa-dev] [PATCH 03/10] radeonsi: disable NaNs for LS and HS

2015-10-10 Thread Roland Scheidegger
FWIW I'm still baffled by this shader bit.
NaNs are absolutely required to be generated and handled as NaNs in
shaders (albeit conversion to ints will make them 0) by DX10 (there's
plenty of tests which actually check for this). And generally, you
really want to generate NaNs for newer glsl versions too I think, albeit
this may not be strictly required (of course, currently you can't
distinguish this in tgsi, but particularly gs/ls/hs will always be newer
glsl versions).
So I'm REALLY wondering why there's a shader bit named that way...

Roland

On 11.10.2015 at 03:11, Marek Olšák wrote:
> From: Marek Olšák 
> 
> They're disabled for all other shaders except compute, but I forgot
> to do this for tess stages.
> ---
>  src/gallium/drivers/radeonsi/si_state_shaders.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
> b/src/gallium/drivers/radeonsi/si_state_shaders.c
> index f673388..2489101 100644
> --- a/src/gallium/drivers/radeonsi/si_state_shaders.c
> +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
> @@ -122,7 +122,8 @@ static void si_shader_ls(struct si_shader *shader)
>  
>   shader->ls_rsrc1 = S_00B528_VGPRS((shader->num_vgprs - 1) / 4) |
>  S_00B528_SGPRS((num_sgprs - 1) / 8) |
> -S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt);
> +S_00B528_VGPR_COMP_CNT(vgpr_comp_cnt) |
> +S_00B528_DX10_CLAMP(shader->dx10_clamp_mode);
>   shader->ls_rsrc2 = S_00B52C_USER_SGPR(num_user_sgprs) |
>  S_00B52C_SCRATCH_EN(shader->scratch_bytes_per_wave > 
> 0);
>  }
> @@ -154,7 +155,8 @@ static void si_shader_hs(struct si_shader *shader)
>   si_pm4_set_reg(pm4, R_00B424_SPI_SHADER_PGM_HI_HS, va >> 40);
>   si_pm4_set_reg(pm4, R_00B428_SPI_SHADER_PGM_RSRC1_HS,
>  S_00B428_VGPRS((shader->num_vgprs - 1) / 4) |
> -S_00B428_SGPRS((num_sgprs - 1) / 8));
> +S_00B428_SGPRS((num_sgprs - 1) / 8) |
> +S_00B428_DX10_CLAMP(shader->dx10_clamp_mode));
>   si_pm4_set_reg(pm4, R_00B42C_SPI_SHADER_PGM_RSRC2_HS,
>  S_00B42C_USER_SGPR(num_user_sgprs) |
>  S_00B42C_SCRATCH_EN(shader->scratch_bytes_per_wave > 0));
> 



Re: [Mesa-dev] [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space

2015-10-10 Thread Ilia Mirkin
On Sat, Oct 10, 2015 at 4:21 PM, Samuel Pitoiset
 wrote:
>
>
> On 10/10/2015 09:58 PM, Ilia Mirkin wrote:
>>
>> On Sat, Oct 10, 2015 at 3:55 PM, Samuel Pitoiset
>>  wrote:
>>>
>>>
>>> On 10/10/2015 09:42 PM, Ilia Mirkin wrote:

>>>> On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset
>>>>  wrote:
>>>>>
>>>>> This patch looks fine except that it should be a bit more normalized. I
>>>>> mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for
>>>>> PUSH_SPACE calls, sometimes you add it sometimes not.
>>>>
>>>> Meh. We need to get our error checking situation straight, but this
>>>> isn't the patch to do it in.
>>>
>>>
>>> Yeah, but this needs to be clarified.
>>
>> What does?
>
>
> I mean, we should either use PUSH_SPACE everywhere or not at all, and always
> break (or not) when PUSH_SPACE fails.
> That's really a minor issue.

It's actually a major issue. Error-handling is practically
non-existent. There are a couple of spots here and there, but it
doesn't really scale up. I guess I (semi-)accidentally removed a
couple of spots that error checked, but, again, meh. Doing this for
real will require some careful thought.

  -ilia


[Mesa-dev] [PATCH 02/10] gallivm: implement the correct version of LRP

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

The previous version has precision issues. This can be a problem
with tessellation. Sadly, I can't find the article where I read it
anymore. I'm not sure if the unsafe-fp-math flag would be enough to revert
this.
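
To make the difference concrete, the two lowerings can be modelled in scalar C as below. This is a sketch only: lrp_old and lrp_new are hypothetical names, and the operand order follows the diff (args[0] is the weight, args[1] and args[2] are the endpoints). It shows one way the old form can lose precision; it is not necessarily the case the article described.

```c
/* Old lowering: one SUB feeding one MAD,
 *    LRP = src0 * (src1 - src2) + src2
 * The volatile store forces the difference to be rounded to single
 * precision, as it would be when held in a 32-bit register. */
float lrp_old(float src0, float src1, float src2)
{
   volatile float diff = src1 - src2;
   return src0 * diff + src2;
}

/* New lowering: two MULs and an ADD,
 *    LRP = src1 * src0 + src2 * (1 - src0) */
float lrp_new(float src0, float src1, float src2)
{
   float inv = 1.0f - src0;
   return src1 * src0 + src2 * inv;
}
```

With a weight of 1 the result should be exactly src1. For src1 = 1 and src2 = 1e8, the old form rounds (1 - 1e8) to -1e8 and returns 0, while the new form returns 1.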
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
index 0ad78b0..512558b 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
@@ -538,12 +538,13 @@ lrp_emit(
struct lp_build_tgsi_context * bld_base,
struct lp_build_emit_data * emit_data)
 {
-   LLVMValueRef tmp;
-   tmp = lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_SUB,
-   emit_data->args[1],
-   emit_data->args[2]);
-   emit_data->output[emit_data->chan] = lp_build_emit_llvm_ternary(bld_base,
-TGSI_OPCODE_MAD, emit_data->args[0], tmp, 
emit_data->args[2]);
+   struct lp_build_context *bld = &bld_base->base;
+   LLVMValueRef inv, a, b;
+
+   inv = lp_build_sub(bld, bld_base->base.one, emit_data->args[0]);
+   a = lp_build_mul(bld, emit_data->args[1], emit_data->args[0]);
+   b = lp_build_mul(bld, emit_data->args[2], inv);
+   emit_data->output[emit_data->chan] = lp_build_add(bld, a, b);
 }
 
 /* TGSI_OPCODE_MAD */
-- 
2.1.4



[Mesa-dev] [PATCH 03/10] radeonsi: initialize output, temp, and address registers to "undef"

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

This removes "v_mov v0, 0" which typically occurs before exports.

Totals:
SGPRS: 345216 -> 344552 (-0.19 %)
VGPRS: 197684 -> 197132 (-0.28 %)
Code Size: 7390408 -> 7375376 (-0.20 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1842176 -> 1679360 (-8.84 %) bytes per wave

Totals from affected shaders:
SGPRS: 101336 -> 100672 (-0.66 %)
VGPRS: 53920 -> 53368 (-1.02 %)
Code Size: 2170176 -> 2155144 (-0.69 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 1015808 -> 852992 (-16.03 %) bytes per wave
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 2e9a013..f548d1a 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -272,6 +272,15 @@ static LLVMValueRef fetch_system_value(
return bitcast(bld_base, type, cval);
 }
 
+static LLVMValueRef si_build_alloca_undef(struct gallivm_state *gallivm,
+ LLVMTypeRef type,
+ const char *name)
+{
+   LLVMValueRef ptr = lp_build_alloca(gallivm, type, name);
+   LLVMBuildStore(gallivm->builder, LLVMGetUndef(type), ptr);
+   return ptr;
+}
+
 static void emit_declaration(
struct lp_build_tgsi_context * bld_base,
const struct tgsi_full_declaration *decl)
@@ -285,7 +294,7 @@ static void emit_declaration(
for (idx = decl->Range.First; idx <= decl->Range.Last; idx++) {
unsigned chan;
for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) {
-ctx->soa.addr[idx][chan] = lp_build_alloca(
+ctx->soa.addr[idx][chan] = 
si_build_alloca_undef(
&ctx->gallivm,
ctx->soa.bld_base.uint_bld.elem_type, 
"");
}
@@ -315,8 +324,9 @@ static void emit_declaration(
for (idx = first; idx <= last; idx++) {
for (i = 0; i < TGSI_NUM_CHANNELS; i++) {
ctx->temps[idx * TGSI_NUM_CHANNELS + i] =
-   lp_build_alloca(bld_base->base.gallivm, 
bld_base->base.vec_type,
-   "temp");
+   
si_build_alloca_undef(bld_base->base.gallivm,
+ 
bld_base->base.vec_type,
+ "temp");
}
}
break;
@@ -347,7 +357,8 @@ static void emit_declaration(
unsigned chan;
assert(idx < RADEON_LLVM_MAX_OUTPUTS);
for (chan = 0; chan < TGSI_NUM_CHANNELS; chan++) {
-   ctx->soa.outputs[idx][chan] = 
lp_build_alloca(&ctx->gallivm,
+   ctx->soa.outputs[idx][chan] = 
si_build_alloca_undef(
+   &ctx->gallivm,
ctx->soa.bld_base.base.elem_type, "");
}
}
-- 
2.1.4



[Mesa-dev] [PATCH 01/10] gallivm: supply correct opcode info to emit functions

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

This is useful only when emit functions use it.
The new radeonsi min/max opcode implementation requires this.
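
The set-and-restore idea can be sketched in isolation as follows. The structures here are hypothetical, heavily simplified stand-ins for tgsi_opcode_info and lp_build_emit_data, and emit_nested / record_opcode are made-up names used only for illustration.

```c
/* Hypothetical, simplified stand-ins; the real structures have many
 * more fields. */
struct opcode_info { unsigned opcode; };
struct emit_data {
   const struct opcode_info *info;
   unsigned seen_opcode;
};

typedef void (*emit_fn)(struct emit_data *data);

/* Example callback that relies on data->info describing its own opcode,
 * the way a shared min/max emit function would. */
void record_opcode(struct emit_data *data)
{
   data->seen_opcode = data->info->opcode;
}

/* Point data->info at the nested opcode for the duration of the call,
 * then restore whatever the outer instruction had set. */
void emit_nested(struct emit_data *data, const struct opcode_info *nested,
                 emit_fn emit)
{
   const struct opcode_info *old_info = data->info;

   data->info = nested;
   emit(data);
   data->info = old_info;
}
```

Without the restore, the outer emit function would keep seeing the nested opcode's info after the helper returns.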
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
index c4ae304..c50d83e 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
@@ -114,12 +114,17 @@ lp_build_emit_llvm(
struct lp_build_emit_data * emit_data)
 {
struct lp_build_tgsi_action * action = &bld_base->op_actions[tgsi_opcode];
+   const struct tgsi_opcode_info *old_info = emit_data->info;
/* XXX: Assert that this is a componentwise or replicate instruction */
 
lp_build_action_set_dst_type(emit_data, bld_base, tgsi_opcode);
emit_data->chan = 0;
+
+   /* Set and restore the opcode info. */
+   emit_data->info = tgsi_get_opcode_info(tgsi_opcode);
assert(action->emit);
action->emit(action, bld_base, emit_data);
+   emit_data->info = old_info;
return emit_data->output[0];
 }
 
-- 
2.1.4



[Mesa-dev] [PATCH 06/10] radeonsi: use LRP from gallivm

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 23ea23a..c22ea7c 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1561,8 +1561,6 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx)
bld_base->op_actions[TGSI_OPCODE_LSB].emit = emit_lsb;
bld_base->op_actions[TGSI_OPCODE_LG2].emit = build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_LG2].intr_name = "llvm.log2.f32";
-   bld_base->op_actions[TGSI_OPCODE_LRP].emit = build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_LRP].intr_name = "llvm.AMDGPU.lrp";
bld_base->op_actions[TGSI_OPCODE_MOD].emit = emit_mod;
bld_base->op_actions[TGSI_OPCODE_UMSB].emit = emit_umsb;
bld_base->op_actions[TGSI_OPCODE_NOT].emit = emit_not;
-- 
2.1.4



[Mesa-dev] [PATCH 05/10] radeonsi: don't emit AMDGPU intrinsics for integer abs, min, max

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

No difference according to shader-db. (with the new S_ABS_I32 pattern)
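
As a rough scalar model of the generic forms the patch emits (a compare plus a select for min/max, and IABS expressed as max(x, -x)), consider the sketch below. All function names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Compare-and-select forms, mirroring the ICmp + Select pairs in the
 * patch (SGT, SLT, UGT, ULT). */
int32_t imax_ref(int32_t a, int32_t b) { return a > b ? a : b; }
int32_t imin_ref(int32_t a, int32_t b) { return a < b ? a : b; }
uint32_t umax_ref(uint32_t a, uint32_t b) { return a > b ? a : b; }
uint32_t umin_ref(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* IABS as max(x, -x). The negation is done on the unsigned bit pattern
 * so that INT32_MIN wraps to itself instead of overflowing, which would
 * be undefined for signed arithmetic in C. The conversion back to
 * int32_t assumes the usual two's-complement behaviour of GCC and
 * Clang. */
int32_t iabs_ref(int32_t x)
{
   int32_t neg = (int32_t)(0u - (uint32_t)x);
   return imax_ref(x, neg);
}
```

One consequence of this formulation is that the most negative value maps to itself, since its negation wraps.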
---
 .../drivers/radeon/radeon_setup_tgsi_llvm.c| 60 ++
 1 file changed, 50 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index 91cf658..23ea23a 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1393,6 +1393,51 @@ static void emit_imsb(const struct lp_build_tgsi_action 
* action,
LLVMBuildSelect(builder, cond, all_ones, msb, "");
 }
 
+static void emit_iabs(const struct lp_build_tgsi_action *action,
+ struct lp_build_tgsi_context *bld_base,
+ struct lp_build_emit_data *emit_data)
+{
+   LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+
+   emit_data->output[emit_data->chan] =
+   lp_build_emit_llvm_binary(bld_base, TGSI_OPCODE_IMAX,
+ emit_data->args[0],
+ LLVMBuildNeg(builder,
+  emit_data->args[0], ""));
+}
+
+static void emit_minmax_int(const struct lp_build_tgsi_action *action,
+   struct lp_build_tgsi_context *bld_base,
+   struct lp_build_emit_data *emit_data)
+{
+   LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+   LLVMIntPredicate op;
+
+   switch (emit_data->info->opcode) {
+   default:
+   assert(0);
+   case TGSI_OPCODE_IMAX:
+   op = LLVMIntSGT;
+   break;
+   case TGSI_OPCODE_IMIN:
+   op = LLVMIntSLT;
+   break;
+   case TGSI_OPCODE_UMAX:
+   op = LLVMIntUGT;
+   break;
+   case TGSI_OPCODE_UMIN:
+   op = LLVMIntULT;
+   break;
+   }
+
+   emit_data->output[emit_data->chan] =
+   LLVMBuildSelect(builder,
+   LLVMBuildICmp(builder, op, emit_data->args[0],
+ emit_data->args[1], ""),
+   emit_data->args[0],
+   emit_data->args[1], "");
+}
+
 void radeon_llvm_context_init(struct radeon_llvm_context * ctx)
 {
struct lp_type type;
@@ -1493,17 +1538,14 @@ void radeon_llvm_context_init(struct 
radeon_llvm_context * ctx)
bld_base->op_actions[TGSI_OPCODE_FSGE].emit = emit_fcmp;
bld_base->op_actions[TGSI_OPCODE_FSLT].emit = emit_fcmp;
bld_base->op_actions[TGSI_OPCODE_FSNE].emit = emit_fcmp;
-   bld_base->op_actions[TGSI_OPCODE_IABS].emit = 
build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_IABS].intr_name = "llvm.AMDIL.abs.";
+   bld_base->op_actions[TGSI_OPCODE_IABS].emit = emit_iabs;
bld_base->op_actions[TGSI_OPCODE_IBFE].emit = 
build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_IBFE].intr_name = 
"llvm.AMDGPU.bfe.i32";
bld_base->op_actions[TGSI_OPCODE_IDIV].emit = emit_idiv;
bld_base->op_actions[TGSI_OPCODE_IF].emit = if_emit;
bld_base->op_actions[TGSI_OPCODE_UIF].emit = uif_emit;
-   bld_base->op_actions[TGSI_OPCODE_IMAX].emit = 
build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_IMAX].intr_name = "llvm.AMDGPU.imax";
-   bld_base->op_actions[TGSI_OPCODE_IMIN].emit = 
build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_IMIN].intr_name = "llvm.AMDGPU.imin";
+   bld_base->op_actions[TGSI_OPCODE_IMAX].emit = emit_minmax_int;
+   bld_base->op_actions[TGSI_OPCODE_IMIN].emit = emit_minmax_int;
bld_base->op_actions[TGSI_OPCODE_IMSB].emit = emit_imsb;
bld_base->op_actions[TGSI_OPCODE_INEG].emit = emit_ineg;
bld_base->op_actions[TGSI_OPCODE_ISHR].emit = emit_ishr;
@@ -1551,10 +1593,8 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx)
bld_base->op_actions[TGSI_OPCODE_UBFE].emit = 
build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_UBFE].intr_name = 
"llvm.AMDGPU.bfe.u32";
bld_base->op_actions[TGSI_OPCODE_UDIV].emit = emit_udiv;
-   bld_base->op_actions[TGSI_OPCODE_UMAX].emit = 
build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_UMAX].intr_name = "llvm.AMDGPU.umax";
-   bld_base->op_actions[TGSI_OPCODE_UMIN].emit = 
build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_UMIN].intr_name = "llvm.AMDGPU.umin";
+   bld_base->op_actions[TGSI_OPCODE_UMAX].emit = emit_minmax_int;
+   bld_base->op_actions[TGSI_OPCODE_UMIN].emit = emit_minmax_int;
bld_base->op_actions[TGSI_OPCODE_UMOD].emit = emit_umod;
bld_base->op_actions[TGSI_OPCODE_USEQ].emit = emit_icmp;
bld_base->op_actions[TGSI_OPCODE_USGE].emit = emit_icmp;
-- 
2.1.4


[Mesa-dev] [PATCH 04/10] radeonsi: don't emit AMDGPU intrinsics for EX2, ROUND, TRUNC

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

No difference according to shader-db.
---
 src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index f548d1a..91cf658 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1481,7 +1481,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx)
bld_base->op_actions[TGSI_OPCODE_ENDIF].emit = endif_emit;
bld_base->op_actions[TGSI_OPCODE_ENDLOOP].emit = endloop_emit;
bld_base->op_actions[TGSI_OPCODE_EX2].emit = build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_EX2].intr_name = "llvm.AMDIL.exp.";
+   bld_base->op_actions[TGSI_OPCODE_EX2].intr_name = "llvm.exp2.f32";
bld_base->op_actions[TGSI_OPCODE_FLR].emit = build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_FLR].intr_name = "llvm.floor.f32";
bld_base->op_actions[TGSI_OPCODE_FMA].emit = build_tgsi_intrinsic_nomem;
@@ -1530,7 +1530,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx)
bld_base->op_actions[TGSI_OPCODE_POW].emit = build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_POW].intr_name = "llvm.pow.f32";
bld_base->op_actions[TGSI_OPCODE_ROUND].emit = 
build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_ROUND].intr_name = 
"llvm.AMDIL.round.nearest.";
+   bld_base->op_actions[TGSI_OPCODE_ROUND].intr_name = "llvm.rint.f32";
bld_base->op_actions[TGSI_OPCODE_RSQ].intr_name = 
"llvm.AMDGPU.rsq.clamped.f32";
bld_base->op_actions[TGSI_OPCODE_RSQ].emit = build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_SGE].emit = emit_cmp;
@@ -1546,7 +1546,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx)
bld_base->op_actions[TGSI_OPCODE_SQRT].intr_name = "llvm.sqrt.f32";
bld_base->op_actions[TGSI_OPCODE_SSG].emit = emit_ssg;
bld_base->op_actions[TGSI_OPCODE_TRUNC].emit = 
build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_TRUNC].intr_name = "llvm.AMDGPU.trunc";
+   bld_base->op_actions[TGSI_OPCODE_TRUNC].intr_name = "llvm.trunc.f32";
bld_base->op_actions[TGSI_OPCODE_UADD].emit = emit_uadd;
bld_base->op_actions[TGSI_OPCODE_UBFE].emit = 
build_tgsi_intrinsic_nomem;
bld_base->op_actions[TGSI_OPCODE_UBFE].intr_name = 
"llvm.AMDGPU.bfe.u32";
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/10] radeonsi: re-enable unsafe-fp-math for LLVM 3.8

2015-10-10 Thread Connor Abbott
FWIW, this isn't quite correct with ARB_shader_precision or GL4.1 --
it specifies that infinities should be correctly generated through
division by 0, which unsafe-fp-math doesn't guarantee. At least,
that's assuming this is similar to the "fast" per-instruction flag
(http://llvm.org/docs/LangRef.html#fast-math-flags) which says "This
flag implies all the others."
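
To make the concern concrete, here is a minimal C sketch of the IEEE 754 behavior at stake. It is an illustration only, not code from the patch, and the helper name `div_is_inf` is hypothetical. Under strict floating-point semantics a finite non-zero value divided by zero yields an infinity. Once fast-math-style assumptions are enabled, the compiler may assume infinities never occur, so this result is no longer guaranteed.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when x / y evaluates to +/-infinity.
 * The volatile copies keep the compiler from folding the division away. */
static bool div_is_inf(float x, float y)
{
    volatile float num = x;
    volatile float den = y;
    float q = num / den;

    return isinf(q) != 0;
}
```

Built without any fast-math option, `div_is_inf(1.0f, 0.0f)` should report an infinity; that is the property ARB_shader_precision relies on.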

On Sat, Oct 10, 2015 at 9:29 PM, Marek Olšák  wrote:
> From: Marek Olšák 
>
> Required for 1/sqrt ==> rsq.
>
> We should finally fix the hang instead of running away from the issue. This
> assumes the bug is in LLVM and we have time to fix it before the release.
> Include compute shaders as well, which only affects TGSI and thus OpenGL.
>
> Totals:
> SGPRS: 344368 -> 345104 (0.21 %)
> VGPRS: 197552 -> 197420 (-0.07 %)
> Code Size: 7366304 -> 7324692 (-0.56 %) bytes
> LDS: 91 -> 91 (0.00 %) blocks
> Scratch: 1615872 -> 1524736 (-5.64 %) bytes per wave
>
> Totals from affected shaders:
> SGPRS: 146696 -> 147432 (0.50 %)
> VGPRS: 87212 -> 87080 (-0.15 %)
> Code Size: 3852664 -> 3811052 (-1.08 %) bytes
> LDS: 48 -> 48 (0.00 %) blocks
> Scratch: 1179648 -> 1088512 (-7.73 %) bytes per wave
> ---
>  src/gallium/drivers/radeon/radeon_llvm_emit.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c 
> b/src/gallium/drivers/radeon/radeon_llvm_emit.c
> index 6b2ebde..4bda4a4 100644
> --- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
> +++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
> @@ -84,6 +84,13 @@ void radeon_llvm_shader_type(LLVMValueRef F, unsigned type)
> sprintf(Str, "%1d", llvm_type);
>
> LLVMAddTargetDependentFunctionAttr(F, "ShaderType", Str);
> +
> +#if HAVE_LLVM >= 0x0308
> +   /* This only affects TGSI (OpenGL), so it's okay to set it for
> +* compute shaders too.
> +*/
> +   LLVMAddTargetDependentFunctionAttr(F, "unsafe-fp-math", "true");
> +#endif
>  }
>
>  static void init_r600_target()
> --
> 2.1.4
>


Re: [Mesa-dev] [PATCH 01/10] gallivm: supply correct opcode info to emit functions

2015-10-10 Thread Roland Scheidegger
On 11.10.2015 at 03:29, Marek Olšák wrote:
> From: Marek Olšák 
> 
> This is useful only when emit functions use it.
> The new radeonsi min/max opcode implementation requires this.
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_tgsi.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c 
> b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
> index c4ae304..c50d83e 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
> @@ -114,12 +114,17 @@ lp_build_emit_llvm(
> struct lp_build_emit_data * emit_data)
>  {
> struct lp_build_tgsi_action * action = &bld_base->op_actions[tgsi_opcode];
> +   const struct tgsi_opcode_info *old_info = emit_data->info;
> /* XXX: Assert that this is a componentwise or replicate instruction */
>  
> lp_build_action_set_dst_type(emit_data, bld_base, tgsi_opcode);
> emit_data->chan = 0;
> +
> +   /* Set and restore the opcode info. */
> +   emit_data->info = tgsi_get_opcode_info(tgsi_opcode);
> assert(action->emit);
> action->emit(action, bld_base, emit_data);
> +   emit_data->info = old_info;
> return emit_data->output[0];
>  }
>  
> 

Could you elaborate why this is necessary? Looks like a hack and I can't
see why opcode info would be wrong in the first place. Or if that's
never set correctly and you just need it to be able to distinguish
min/max later, I'd suggest you shouldn't do that and just use different
functions.

Roland



[Mesa-dev] [PATCH 3/3] radeonsi: support thread-safe shaders shared by multiple contexts

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

The "current" shader pointer is moved from the CSO to the context, so that
the CSO is mostly immutable.

The only drawback is that the "current" pointer isn't saved when unbinding
a shader and it must be looked up when the shader is bound again.

This is also a prerequisite for multithreaded shader compilation.
---
 src/gallium/drivers/radeonsi/si_blit.c  |  10 +-
 src/gallium/drivers/radeonsi/si_debug.c |  18 +-
 src/gallium/drivers/radeonsi/si_descriptors.c   |  12 +-
 src/gallium/drivers/radeonsi/si_pipe.c  |   6 +-
 src/gallium/drivers/radeonsi/si_pipe.h  |  21 +-
 src/gallium/drivers/radeonsi/si_shader.h|  31 +--
 src/gallium/drivers/radeonsi/si_state.c |   2 +-
 src/gallium/drivers/radeonsi/si_state_draw.c|  44 ++--
 src/gallium/drivers/radeonsi/si_state_shaders.c | 279 +---
 9 files changed, 224 insertions(+), 199 deletions(-)
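
As a reading aid, the ownership split described above can be sketched in plain C. This is a simplified illustration under stated assumptions, not the driver code: only the field names `cso` and `current` come from the patch, while the struct names, the integer key, and both helper functions are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for a compiled shader variant. */
struct shader_variant { int key; };

/* Shared between contexts: holds no per-context "current" pointer. */
struct shader_selector {
    struct shader_variant variants[4];
    size_t num_variants;
};

/* Per-context state: which CSO is bound and which variant is in use. */
struct shader_ctx_state {
    const struct shader_selector *cso;
    const struct shader_variant *current;
};

/* Binding drops the cached variant, so it has to be looked up again. */
static void bind_shader(struct shader_ctx_state *state,
                        const struct shader_selector *sel)
{
    state->cso = sel;
    state->current = NULL;
}

/* Linear lookup of the variant matching a key; NULL if none exists. */
static const struct shader_variant *
select_variant(struct shader_ctx_state *state, int key)
{
    state->current = NULL;
    if (!state->cso)
        return NULL;

    for (size_t i = 0; i < state->cso->num_variants; i++) {
        if (state->cso->variants[i].key == key) {
            state->current = &state->cso->variants[i];
            break;
        }
    }
    return state->current;
}
```

Two contexts can bind the same selector and each keep a different `current` variant without writing to the shared object, which is the property the commit message is after.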

diff --git a/src/gallium/drivers/radeonsi/si_blit.c 
b/src/gallium/drivers/radeonsi/si_blit.c
index d5c5db3..082ea85 100644
--- a/src/gallium/drivers/radeonsi/si_blit.c
+++ b/src/gallium/drivers/radeonsi/si_blit.c
@@ -55,11 +55,11 @@ static void si_blitter_begin(struct pipe_context *ctx, enum 
si_blitter_op op)
util_blitter_save_depth_stencil_alpha(sctx->blitter, 
sctx->queued.named.dsa);
util_blitter_save_stencil_ref(sctx->blitter, &sctx->stencil_ref.state);
util_blitter_save_rasterizer(sctx->blitter, 
sctx->queued.named.rasterizer);
-   util_blitter_save_fragment_shader(sctx->blitter, sctx->ps_shader);
-   util_blitter_save_geometry_shader(sctx->blitter, sctx->gs_shader);
-   util_blitter_save_tessctrl_shader(sctx->blitter, sctx->tcs_shader);
-   util_blitter_save_tesseval_shader(sctx->blitter, sctx->tes_shader);
-   util_blitter_save_vertex_shader(sctx->blitter, sctx->vs_shader);
+   util_blitter_save_fragment_shader(sctx->blitter, sctx->ps_shader.cso);
+   util_blitter_save_geometry_shader(sctx->blitter, sctx->gs_shader.cso);
+   util_blitter_save_tessctrl_shader(sctx->blitter, sctx->tcs_shader.cso);
+   util_blitter_save_tesseval_shader(sctx->blitter, sctx->tes_shader.cso);
+   util_blitter_save_vertex_shader(sctx->blitter, sctx->vs_shader.cso);
util_blitter_save_vertex_elements(sctx->blitter, sctx->vertex_elements);
util_blitter_save_sample_mask(sctx->blitter, 
sctx->sample_mask.sample_mask);
util_blitter_save_viewport(sctx->blitter, &sctx->viewports.states[0]);
diff --git a/src/gallium/drivers/radeonsi/si_debug.c 
b/src/gallium/drivers/radeonsi/si_debug.c
index 7d41e8d..5306218 100644
--- a/src/gallium/drivers/radeonsi/si_debug.c
+++ b/src/gallium/drivers/radeonsi/si_debug.c
@@ -31,15 +31,15 @@
 #include "ddebug/dd_util.h"
 
 
-static void si_dump_shader(struct si_shader_selector *sel, const char *name,
+static void si_dump_shader(struct si_shader_ctx_state *state, const char *name,
   FILE *f)
 {
-   if (!sel || !sel->current)
+   if (!state->cso || !state->current)
return;
 
fprintf(f, "%s shader disassembly:\n", name);
-   si_dump_shader_key(sel->type, &sel->current->key, f);
-   fprintf(f, "%s\n\n", sel->current->binary.disasm_string);
+   si_dump_shader_key(state->cso->type, &state->current->key, f);
+   fprintf(f, "%s\n\n", state->current->binary.disasm_string);
 }
 
 /* Parsed IBs are difficult to read without colors. Use "less -R file" to
@@ -536,11 +536,11 @@ static void si_dump_debug_state(struct pipe_context *ctx, 
FILE *f,
if (flags & PIPE_DEBUG_DEVICE_IS_HUNG)
si_dump_debug_registers(sctx, f);
 
-   si_dump_shader(sctx->vs_shader, "Vertex", f);
-   si_dump_shader(sctx->tcs_shader, "Tessellation control", f);
-   si_dump_shader(sctx->tes_shader, "Tessellation evaluation", f);
-   si_dump_shader(sctx->gs_shader, "Geometry", f);
-   si_dump_shader(sctx->ps_shader, "Fragment", f);
+   si_dump_shader(&sctx->vs_shader, "Vertex", f);
+   si_dump_shader(&sctx->tcs_shader, "Tessellation control", f);
+   si_dump_shader(&sctx->tes_shader, "Tessellation evaluation", f);
+   si_dump_shader(&sctx->gs_shader, "Geometry", f);
+   si_dump_shader(&sctx->ps_shader, "Fragment", f);
 
si_dump_last_bo_list(sctx, f);
si_dump_last_ib(sctx, f);
diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 19dd14f..13738da 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -915,10 +915,10 @@ static void si_set_user_data_base(struct si_context *sctx,
 void si_shader_change_notify(struct si_context *sctx)
 {
/* VS can be bound as VS, ES, or LS. */
-   if (sctx->tes_shader)
+   if (sctx->tes_shader.cso)
si_set_user_data_base(sctx, PIPE_SHADER_VERTEX,
  R_00B530_SPI_SHADER_USER_DATA_LS_0);
-   else if 

[Mesa-dev] [PATCH 2/3] radeonsi: implement vertex color clamping

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

This is only supported in the compatibility profile (without GS and tess).
---
 src/gallium/drivers/radeonsi/si_pipe.c  |  2 +-
 src/gallium/drivers/radeonsi/si_shader.c| 42 +
 src/gallium/drivers/radeonsi/si_shader.h|  8 +++--
 src/gallium/drivers/radeonsi/si_state.c |  2 ++
 src/gallium/drivers/radeonsi/si_state_shaders.c |  2 +-
 5 files changed, 52 insertions(+), 4 deletions(-)
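
The behavior the shader epilogue below implements in LLVM IR can be sketched on the CPU in plain C. This is an illustration under assumptions, not the driver code: the function names are hypothetical, and mapping NaN to 0 is a choice made for this sketch rather than something the patch states. The one detail taken from the patch is that the enable flag lives in the first bit of the state word.

```c
#include <stdint.h>

/* Clamp one channel to [0, 1]. NaN maps to 0 here (sketch assumption). */
static float saturate(float v)
{
    if (!(v > 0.0f))
        return 0.0f;
    return v > 1.0f ? 1.0f : v;
}

/* Clamp an RGBA color in place when bit 0 of state_bits is set. */
static void clamp_vertex_color(float color[4], uint32_t state_bits)
{
    if (!(state_bits & 1u))
        return;

    for (int j = 0; j < 4; j++)
        color[j] = saturate(color[j]);
}
```

Because the flag is read at run time, the same compiled shader serves both the clamped and unclamped cases, at the cost of one branch.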

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 894fc59..d4be6f9 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -271,6 +271,7 @@ static int si_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_START_INSTANCE:
case PIPE_CAP_NPOT_TEXTURES:
case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
+   case PIPE_CAP_VERTEX_COLOR_CLAMPED:
case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
 case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
case PIPE_CAP_TGSI_INSTANCEID:
@@ -331,7 +332,6 @@ static int si_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
/* Unsupported features. */
case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
-   case PIPE_CAP_VERTEX_COLOR_CLAMPED:
case PIPE_CAP_USER_VERTEX_BUFFERS:
case PIPE_CAP_FAKE_SW_MSAA:
case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 1f9b2b6..8da2f77 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -2075,6 +2075,45 @@ static void si_llvm_emit_vs_epilogue(struct 
lp_build_tgsi_context * bld_base)
 
outputs = MALLOC((info->num_outputs + 1) * sizeof(outputs[0]));
 
+   /* Vertex color clamping.
+*
+* This uses a state constant loaded in a user data SGPR and
+* an IF statement is added that clamps all colors if the constant
+* is true.
+*/
+   if (si_shader_ctx->type == TGSI_PROCESSOR_VERTEX &&
+   !si_shader_ctx->shader->is_gs_copy_shader) {
+   struct lp_build_if_state if_ctx;
+   LLVMValueRef cond = NULL;
+   LLVMValueRef addr, val;
+
+   for (i = 0; i < info->num_outputs; i++) {
+   if (info->output_semantic_name[i] != 
TGSI_SEMANTIC_COLOR &&
+   info->output_semantic_name[i] != 
TGSI_SEMANTIC_BCOLOR)
+   continue;
+
+   /* We've found a color. */
+   if (!cond) {
+   /* The state is in the first bit of the user 
SGPR. */
+   cond = 
LLVMGetParam(si_shader_ctx->radeon_bld.main_fn,
+   SI_PARAM_VS_STATE_BITS);
+   cond = LLVMBuildTrunc(gallivm->builder, cond,
+ 
LLVMInt1TypeInContext(gallivm->context), "");
+   lp_build_if(&if_ctx, gallivm, cond);
+   }
+
+   for (j = 0; j < 4; j++) {
+   addr = 
si_shader_ctx->radeon_bld.soa.outputs[i][j];
+   val = LLVMBuildLoad(gallivm->builder, addr, "");
+   val = radeon_llvm_saturate(bld_base, val);
+   LLVMBuildStore(gallivm->builder, val, addr);
+   }
+   }
+
+   if (cond)
+   lp_build_endif(&if_ctx);
+   }
+
for (i = 0; i < info->num_outputs; i++) {
outputs[i].name = info->output_semantic_name[i];
outputs[i].sid = info->output_semantic_index[i];
@@ -3444,6 +3483,9 @@ static void create_function(struct si_shader_context 
*si_shader_ctx)
if (shader->is_gs_copy_shader) {
last_array_pointer = SI_PARAM_CONST;
num_params = SI_PARAM_CONST+1;
+   } else {
+   params[SI_PARAM_VS_STATE_BITS] = i32;
+   num_params = SI_PARAM_VS_STATE_BITS+1;
}
 
/* The locations of the other parameters are assigned 
dynamically. */
diff --git a/src/gallium/drivers/radeonsi/si_shader.h 
b/src/gallium/drivers/radeonsi/si_shader.h
index fa5930a..54dad72 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -83,6 +83,7 @@ struct radeon_shader_reloc;
 #define SI_SGPR_VERTEX_BUFFER  8  /* VS only */
 #define SI_SGPR_BASE_VERTEX10 /* VS only */
 #define SI_SGPR_START_INSTANCE 11 /* VS only */
+#define SI_SGPR_VS_STATE_BITS  12 /* VS(VS) only */
 #define 

[Mesa-dev] [PATCH 0/3] RadeonSI forcing st/mesa to create shaders at link time

2015-10-10 Thread Marek Olšák
Hi,

This patch series implements all features needed for st/mesa to send shaders to 
the driver immediately.

The good thing about thread-safe shader CSOs is that multithreaded shader 
compilation suddenly seems easy.

Please review.

Marek


[Mesa-dev] [PATCH 1/3] radeonsi: implement fragment color clamping

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

using the shader key for now.
---
 src/gallium/drivers/radeonsi/si_pipe.c  |  2 +-
 src/gallium/drivers/radeonsi/si_shader.c| 13 +
 src/gallium/drivers/radeonsi/si_shader.h|  1 +
 src/gallium/drivers/radeonsi/si_state.c |  2 +-
 src/gallium/drivers/radeonsi/si_state.h |  1 +
 src/gallium/drivers/radeonsi/si_state_shaders.c |  1 +
 6 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index aa5a9ea..894fc59 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -271,6 +271,7 @@ static int si_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_START_INSTANCE:
case PIPE_CAP_NPOT_TEXTURES:
case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
+   case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
 case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
case PIPE_CAP_TGSI_INSTANCEID:
case PIPE_CAP_COMPUTE:
@@ -330,7 +331,6 @@ static int si_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
/* Unsupported features. */
case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
-   case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
case PIPE_CAP_VERTEX_COLOR_CLAMPED:
case PIPE_CAP_USER_VERTEX_BUFFERS:
case PIPE_CAP_FAKE_SW_MSAA:
diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 012d708..1f9b2b6 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -2109,6 +2109,7 @@ static void si_llvm_emit_fs_epilogue(struct 
lp_build_tgsi_context * bld_base)
struct lp_build_context * base = &bld_base->base;
struct lp_build_context * uint = &bld_base->uint_bld;
struct tgsi_shader_info *info = &shader->selector->info;
+   LLVMBuilderRef builder = base->gallivm->builder;
LLVMValueRef args[9];
LLVMValueRef last_args[9] = { 0 };
int depth_index = -1, stencil_index = -1, samplemask_index = -1;
@@ -2135,6 +2136,16 @@ static void si_llvm_emit_fs_epilogue(struct 
lp_build_tgsi_context * bld_base)
target = V_008DFC_SQ_EXP_MRT + semantic_index;
alpha_ptr = si_shader_ctx->radeon_bld.soa.outputs[i][3];
 
+   if (si_shader_ctx->shader->key.ps.clamp_color) {
+   for (int j = 0; j < 4; j++) {
+   LLVMValueRef ptr = 
si_shader_ctx->radeon_bld.soa.outputs[i][j];
+   LLVMValueRef result = 
LLVMBuildLoad(builder, ptr, "");
+
+   result = radeon_llvm_saturate(bld_base, 
result);
+   LLVMBuildStore(builder, result, ptr);
+   }
+   }
+
if (si_shader_ctx->shader->key.ps.alpha_to_one)
LLVMBuildStore(base->gallivm->builder,
   base->one, alpha_ptr);
@@ -2145,6 +2156,7 @@ static void si_llvm_emit_fs_epilogue(struct 
lp_build_tgsi_context * bld_base)
 
if (si_shader_ctx->shader->key.ps.poly_line_smoothing)
si_scale_alpha_by_sample_mask(bld_base, 
alpha_ptr);
+
break;
default:
target = 0;
@@ -3999,6 +4011,7 @@ void si_dump_shader_key(unsigned shader, union 
si_shader_key *key, FILE *f)
fprintf(f, "  alpha_func = %u\n", key->ps.alpha_func);
fprintf(f, "  alpha_to_one = %u\n", key->ps.alpha_to_one);
fprintf(f, "  poly_stipple = %u\n", key->ps.poly_stipple);
+   fprintf(f, "  clamp_color = %u\n", key->ps.clamp_color);
break;
 
default:
diff --git a/src/gallium/drivers/radeonsi/si_shader.h 
b/src/gallium/drivers/radeonsi/si_shader.h
index 460..fa5930a 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -227,6 +227,7 @@ union si_shader_key {
unsignedalpha_to_one:1;
unsignedpoly_stipple:1;
unsignedpoly_line_smoothing:1;
+   unsignedclamp_color:1;
} ps;
struct {
unsignedinstance_divisors[SI_NUM_VERTEX_BUFFERS];
diff --git a/src/gallium/drivers/radeonsi/si_state.c 
b/src/gallium/drivers/radeonsi/si_state.c
index 00d4bc1..3aafe8a 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -694,7 +694,7 @@ static void *si_create_rs_state(struct pipe_context *ctx,
rs->poly_smooth = state->poly_smooth;
rs->uses_poly_offset = state->offset_point || state->offset_line 

Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-10 Thread Alejandro Piñeiro
On 10/10/15 16:54, Jason Ekstrand wrote:
> On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeiro  
> wrote:
>> vec4 port of fs_cmod_propagation.
>>
>> Shader-db results:
>> total instructions in shared programs: 6241226 -> 6224469 (-0.27%)
>> instructions in affected programs: 498213 -> 481456 (-3.36%)
>> helped:3082
>> HURT:  0
> Would you mind cherry-picking this back onto
> 4e0a8e0a50c9ac91cb7a70b92b8d9c6fcc02b7aa (the commit right before we
> made NIR non-optional) and get some GLSL IR vs. NIR vec4-only numbers
> with this patch?  I'd like to know what it does to that delta as well.

FWIW, the previous shader-db numbers were done without grepping for
vec4. Matt mentioned that he preferred it that way. As asked, the numbers
in this email are vec4-only numbers (so grepping for vec4).

So, the shader-db numbers of IR vs NIR at that reference commit are the
following:
  total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
  instructions in affected programs: 1660279 -> 1460468 (-12.03%)
  helped:14668
  HURT:  1369

And the IR vs NIR numbers with the optimization cherry-picked are the following:
  total instructions in shared programs: 1845902 -> 1631459 (-11.62%)
  instructions in affected programs: 1663398 -> 1448955 (-12.89%)
  helped:14863
  HURT:  1203

FWIW, we need to take into account that this commit also helps IR.
The shader-db numbers of IR at the reference commit vs IR with the
cherry-pick are the following:
  total instructions in shared programs: 1848027 -> 1845902 (-0.11%)
  instructions in affected programs: 195042 -> 192917 (-1.09%)
  helped:1033
  HURT:  0

And for that reason, it is probably worth comparing IR at the reference
commit against the NIR results with the cherry-pick, which are the following:
  total instructions in shared programs: 1848027 -> 1631459 (-11.72%)
  instructions in affected programs: 1672237 -> 1455669 (-12.95%)
  helped:14955
  HURT:  1194
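
The percentages above are consistent with a plain relative change in instruction count. A minimal sketch of that arithmetic follows; the function name is hypothetical and this is not shader-db's own code.

```c
/* Relative change from `before` to `after`, in percent. */
static double percent_change(long before, long after)
{
    return 100.0 * (double)(after - before) / (double)before;
}
```

For example, `percent_change(1848027, 1631459)` comes out at roughly -11.72, matching the last "total instructions" line.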

>
> Thanks!
> --Jason

You are welcome. Thanks for reviewing the patch.

Best regards

>
>> ---
>>
>> The final outcome is really similar to fs_brw_cmod_propagation. In
>> fact the only difference is that on fs we have this:
>>  if (scan_inst->overwrites_reg(inst->src[0])) {
>> if (scan_inst->is_partial_write() ||
>> scan_inst->dst.reg_offset != inst->src[0].reg_offset)
>>break;
>>
>> And on vec4 (this commit) we have this:
>>  if (inst->src[0].in_range(scan_inst->dst,
>>scan_inst->regs_written)) {
>>
>> if ((scan_inst->predicate && scan_inst->opcode != 
>> BRW_OPCODE_SEL) ||
>> scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
>> (scan_inst->dst.writemask != WRITEMASK_X && 
>> scan_inst->dst.writemask != WRITEMASK_XYZW))
>>break;
>>
>> if (scan_inst->dst.writemask == WRITEMASK_XYZW &&
>> inst->src[0].swizzle != BRW_SWIZZLE_XYZW) {
>>break;
>> }
>>
>> So at some point I thought about refactoring it and having one common,
>> like with opt_predicated_break, but that one was possible with just
>> backend_instructions, while here we would need to deal with
>> vec4_instructions and fs_inst, that could be somewhat messy, so
>> I'm leaving this as it is.
>>
>>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp |   1 +
>>  src/mesa/drivers/dri/i965/brw_vec4.h   |   1 +
>>  .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163 
>> +
>>  4 files changed, 166 insertions(+)
>>  create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
>>
>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
>> b/src/mesa/drivers/dri/i965/Makefile.sources
>> index 81ef628..c1836d6 100644
>> --- a/src/mesa/drivers/dri/i965/Makefile.sources
>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
>> @@ -56,6 +56,7 @@ i965_compiler_FILES = \
>> brw_util.c \
>> brw_util.h \
>> brw_vec4_builder.h \
>> +   brw_vec4_cmod_propagation.cpp \
>> brw_vec4_copy_propagation.cpp \
>> brw_vec4.cpp \
>> brw_vec4_cse.cpp \
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> index e966b96..55e381b 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> @@ -1867,6 +1867,7 @@ vec4_visitor::run()
>>OPT(dead_code_eliminate);
>>OPT(dead_control_flow_eliminate, this);
>>OPT(opt_copy_propagation);
>> +  OPT(opt_cmod_propagation);
>>

[Mesa-dev] [PATCH shader-db 1/3] Makefile: avoid undefined reference build errors with LIBS

2015-10-10 Thread Rhys Kidd
Signed-off-by: Rhys Kidd 
---
 .gitignore |  1 +
 Makefile   | 14 +++---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/.gitignore b/.gitignore
index f69750a..cffa19c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,3 @@
 bin
 run
+*.o
diff --git a/Makefile b/Makefile
index 1ae0776..a4bfb8f 100644
--- a/Makefile
+++ b/Makefile
@@ -21,9 +21,17 @@
 
 CFLAGS ?= -g -O2 -march=native -pipe
 CFLAGS += -std=gnu99 -fopenmp
-LDFLAGS = -lepoxy -lgbm
+LIBS = -lepoxy -lgbm
 
-run:
+OBJ = run.o
+
+all: run
+
+%.o: %.c
+   $(CC) -c -o $@ $< $(CFLAGS)
+
+run: $(OBJ)
+   $(CC) $(CFLAGS) -o $@ $^ $(LIBS)
 
 clean:
-   rm -f run
+   rm -f run $(OBJ)
-- 
2.1.4



[Mesa-dev] [PATCH shader-db 0/3] Makefile and documentation cleanup

2015-10-10 Thread Rhys Kidd
Patchset adds Makefile and documentation improvements.

I aimed to write these as I would have found most helpful when seeking to
understand shader-db's operation, as a new Mesa developer.

The first patch resolves the build errors [0] experienced on Ubuntu 15.04 and
permits a simple 'make' to work if the dependencies are met. The following two
patches improve the documentation of those dependencies.

[0]
$ cc --version
cc (Ubuntu 4.9.2-10ubuntu13) 4.9.2
...
$ make
cc -g -O2 -march=native -pipe -std=gnu99 -fopenmp  -lepoxy -lgbm  run.c   -o run
/tmp/ccaZrtAC.o: In function `main._omp_fn.0':
/home/usera/Coding/shader-db/run.c:511: undefined reference to 
`epoxy_eglBindAPI'
/home/usera/Coding/shader-db/run.c:513: undefined reference to 
`epoxy_eglCreateContext'
/home/usera/Coding/shader-db/run.c:516: undefined reference to 
`epoxy_eglMakeCurrent'
/home/usera/Coding/shader-db/run.c:528: undefined reference to 
`epoxy_eglCreateContext'
/home/usera/Coding/shader-db/run.c:536: undefined reference to 
`epoxy_eglMakeCurrent'
/home/usera/Coding/shader-db/run.c:541: undefined reference to `epoxy_glEnable'
/home/usera/Coding/shader-db/run.c:542: undefined reference to `epoxy_glEnable'
/home/usera/Coding/shader-db/run.c:543: undefined reference to 
`epoxy_glDebugMessageControl'
/home/usera/Coding/shader-db/run.c:545: undefined reference to 
`epoxy_glDebugMessageControl'
/home/usera/Coding/shader-db/run.c:548: undefined reference to 
`epoxy_glDebugMessageCallback'
/home/usera/Coding/shader-db/run.c:642: undefined reference to 
`epoxy_eglDestroyContext'
/home/usera/Coding/shader-db/run.c:643: undefined reference to 
`epoxy_eglDestroyContext'
/home/usera/Coding/shader-db/run.c:644: undefined reference to 
`epoxy_eglReleaseThread'
/home/usera/Coding/shader-db/run.c:585: undefined reference to 
`epoxy_eglMakeCurrent'
/home/usera/Coding/shader-db/run.c:620: undefined reference to 
`epoxy_glGenProgramsARB'
/home/usera/Coding/shader-db/run.c:621: undefined reference to 
`epoxy_glBindProgramARB'
/home/usera/Coding/shader-db/run.c:622: undefined reference to 
`epoxy_glProgramStringARB'
/home/usera/Coding/shader-db/run.c:624: undefined reference to 
`epoxy_glDeleteProgramsARB'
/home/usera/Coding/shader-db/run.c:625: undefined reference to 
`epoxy_glGetError'
/home/usera/Coding/shader-db/run.c:594: undefined reference to 
`epoxy_glCreateProgram'
/home/usera/Coding/shader-db/run.c:611: undefined reference to 
`epoxy_glAttachShader'
/home/usera/Coding/shader-db/run.c:612: undefined reference to 
`epoxy_glDeleteShader'
/home/usera/Coding/shader-db/run.c:597: undefined reference to 
`epoxy_glCreateShader'
/home/usera/Coding/shader-db/run.c:598: undefined reference to 
`epoxy_glShaderSource'
/home/usera/Coding/shader-db/run.c:599: undefined reference to 
`epoxy_glCompileShader'
/home/usera/Coding/shader-db/run.c:602: undefined reference to 
`epoxy_glGetShaderiv'
/home/usera/Coding/shader-db/run.c:606: undefined reference to 
`epoxy_glGetShaderInfoLog'
/home/usera/Coding/shader-db/run.c:615: undefined reference to 
`epoxy_glLinkProgram'
/home/usera/Coding/shader-db/run.c:616: undefined reference to 
`epoxy_glDeleteProgram'
/home/usera/Coding/shader-db/run.c:517: undefined reference to `epoxy_glEnable'
/home/usera/Coding/shader-db/run.c:518: undefined reference to `epoxy_glEnable'
/home/usera/Coding/shader-db/run.c:519: undefined reference to 
`epoxy_glDebugMessageControl'
/home/usera/Coding/shader-db/run.c:521: undefined reference to 
`epoxy_glDebugMessageControl'
/home/usera/Coding/shader-db/run.c:525: undefined reference to 
`epoxy_glDebugMessageCallback'
/tmp/ccaZrtAC.o: In function `main':
/home/usera/Coding/shader-db/run.c:334: undefined reference to 
`epoxy_eglQueryString'
/home/usera/Coding/shader-db/run.c:354: undefined reference to 
`gbm_create_device'
/home/usera/Coding/shader-db/run.c:361: undefined reference to 
`epoxy_eglGetPlatformDisplayEXT'
/home/usera/Coding/shader-db/run.c:369: undefined reference to 
`epoxy_eglInitialize'
/home/usera/Coding/shader-db/run.c:379: undefined reference to 
`epoxy_eglQueryString'
/home/usera/Coding/shader-db/run.c:395: undefined reference to 
`epoxy_eglChooseConfig'
/home/usera/Coding/shader-db/run.c:659: undefined reference to 
`epoxy_eglTerminate'
/home/usera/Coding/shader-db/run.c:661: undefined reference to 
`gbm_device_destroy'
/home/usera/Coding/shader-db/run.c:401: undefined reference to 
`epoxy_eglBindAPI'
/home/usera/Coding/shader-db/run.c:412: undefined reference to 
`epoxy_eglCreateContext'
/home/usera/Coding/shader-db/run.c:415: undefined reference to 
`epoxy_eglMakeCurrent'
/home/usera/Coding/shader-db/run.c:462: undefined reference to 
`epoxy_eglCreateContext'
/home/usera/Coding/shader-db/run.c:470: undefined reference to 
`epoxy_eglMakeCurrent'
/home/usera/Coding/shader-db/run.c:475: undefined reference to 
`epoxy_glGetString'
/home/usera/Coding/shader-db/run.c:478: undefined reference to 
`epoxy_glGetString'
/home/usera/Coding/shader-db/run.c:417: undefined reference to 

[Mesa-dev] [PATCH shader-db 2/3] docs: Improve dependencies documentation

2015-10-10 Thread Rhys Kidd
Signed-off-by: Rhys Kidd 
---
 README | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/README b/README
index e301d0e..6ed3244 100644
--- a/README
+++ b/README
@@ -47,7 +47,18 @@ ST_DEBUG=precompile R600_DEBUG=ps,vs,gs,precompile ./run 
shaders -1 2> new-run
 
 === Dependencies ===
 run requires some GNU C extensions, render nodes (/dev/dri/renderD128),
-libepoxy, OpenMP, and Mesa configured with --with-egl-platforms=x11,drm
+libepoxy, libgbm, OpenMP, and Mesa configured with --with-egl-platforms=x11,drm
+
+Install necessary dependencies on Ubuntu:
+```
+# Developers will probably have a local build of Mesa
+sudo apt-get install build-essential libepoxy-dev libgbm-dev
+```
+
+Build with:
+```
+make
+```
 
 === jemalloc ===
 Since run compiles shaders in different threads, malloc/free locking overhead
-- 
2.1.4



[Mesa-dev] [PATCH shader-db 3/3] docs: Add symbolic link generation step

2015-10-10 Thread Rhys Kidd
Signed-off-by: Rhys Kidd 
---
 README | 5 +
 1 file changed, 5 insertions(+)

diff --git a/README b/README
index 6ed3244..03be4e7 100644
--- a/README
+++ b/README
@@ -60,6 +60,11 @@ Build with:
 make
 ```
 
+run.py relies on a symbolic link to a built piglit bin directory, as follows:
+```
+ln -s /bin "$PWD"/bin
+```
+
 === jemalloc ===
 Since run compiles shaders in different threads, malloc/free locking overhead
 from inside Mesa can be expensive. Preloading jemalloc can cut significant
-- 
2.1.4



Re: [Mesa-dev] [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space

2015-10-10 Thread Samuel Pitoiset



On 10/10/2015 09:58 PM, Ilia Mirkin wrote:

On Sat, Oct 10, 2015 at 3:55 PM, Samuel Pitoiset
 wrote:


On 10/10/2015 09:42 PM, Ilia Mirkin wrote:

On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset
 wrote:

This patch looks fine except that it should be a bit more normalized. I
mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for
PUSH_SPACE calls, sometimes you add it sometimes not.

Meh. We need to get our error checking situation straight, but this
isn't the patch to do it in.


Yeah, but this needs to be clarified.

What does?


I mean, we should either use PUSH_SPACE everywhere or not at all, and
always break (or not) when PUSH_SPACE fails.

That's really a minor issue.




[Mesa-dev] [PATCH 1/3] gallium: add PIPE_CAP_SHAREABLE_SHADERS

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

I'll let drivers figure out how to do it.
---
 src/gallium/docs/source/screen.rst   | 2 ++
 src/gallium/drivers/freedreno/freedreno_screen.c | 1 +
 src/gallium/drivers/i915/i915_screen.c   | 1 +
 src/gallium/drivers/ilo/ilo_screen.c | 1 +
 src/gallium/drivers/llvmpipe/lp_screen.c | 1 +
 src/gallium/drivers/nouveau/nv30/nv30_screen.c   | 1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c   | 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   | 1 +
 src/gallium/drivers/r300/r300_screen.c   | 1 +
 src/gallium/drivers/r600/r600_pipe.c | 1 +
 src/gallium/drivers/radeonsi/si_pipe.c   | 1 +
 src/gallium/drivers/softpipe/sp_screen.c | 1 +
 src/gallium/drivers/svga/svga_screen.c   | 1 +
 src/gallium/drivers/vc4/vc4_screen.c | 1 +
 src/gallium/include/pipe/p_defines.h | 1 +
 15 files changed, 16 insertions(+)

diff --git a/src/gallium/docs/source/screen.rst 
b/src/gallium/docs/source/screen.rst
index e08844b..72f7596 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -276,6 +276,8 @@ The integer capabilities:
   GL4 hardware will likely need to emulate it with a shader variant, or by
   selecting the interpolation weights with a conditional assignment
   in the shader.
+* ``PIPE_CAP_SHAREABLE_SHADERS``: Whether shader CSOs can be used by any
+  pipe_context.
 
 
 
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
b/src/gallium/drivers/freedreno/freedreno_screen.c
index 0d01005..2e8bf47 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -236,6 +236,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_DEPTH_BOUNDS_TEST:
case PIPE_CAP_TGSI_TXQS:
case PIPE_CAP_FORCE_PERSAMPLE_INTERP:
+   case PIPE_CAP_SHAREABLE_SHADERS:
return 0;
 
case PIPE_CAP_MAX_VIEWPORTS:
diff --git a/src/gallium/drivers/i915/i915_screen.c 
b/src/gallium/drivers/i915/i915_screen.c
index 9d6b3d3..c91408d 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -249,6 +249,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap 
cap)
case PIPE_CAP_DEPTH_BOUNDS_TEST:
case PIPE_CAP_TGSI_TXQS:
case PIPE_CAP_FORCE_PERSAMPLE_INTERP:
+   case PIPE_CAP_SHAREABLE_SHADERS:
   return 0;
 
case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS:
diff --git a/src/gallium/drivers/ilo/ilo_screen.c 
b/src/gallium/drivers/ilo/ilo_screen.c
index 76812a6..acf688f 100644
--- a/src/gallium/drivers/ilo/ilo_screen.c
+++ b/src/gallium/drivers/ilo/ilo_screen.c
@@ -471,6 +471,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
case PIPE_CAP_DEPTH_BOUNDS_TEST:
case PIPE_CAP_TGSI_TXQS:
case PIPE_CAP_FORCE_PERSAMPLE_INTERP:
+   case PIPE_CAP_SHAREABLE_SHADERS:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c 
b/src/gallium/drivers/llvmpipe/lp_screen.c
index 50c3781..e2ed267 100644
--- a/src/gallium/drivers/llvmpipe/lp_screen.c
+++ b/src/gallium/drivers/llvmpipe/lp_screen.c
@@ -298,6 +298,7 @@ llvmpipe_get_param(struct pipe_screen *screen, enum 
pipe_cap param)
case PIPE_CAP_DEPTH_BOUNDS_TEST:
case PIPE_CAP_TGSI_TXQS:
case PIPE_CAP_FORCE_PERSAMPLE_INTERP:
+   case PIPE_CAP_SHAREABLE_SHADERS:
   return 0;
}
/* should only get here on unhandled cases */
diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c 
b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
index 335c163..d4cf143 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
@@ -171,6 +171,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
case PIPE_CAP_TGSI_TXQS:
case PIPE_CAP_FORCE_PERSAMPLE_INTERP:
+   case PIPE_CAP_SHAREABLE_SHADERS:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 812b246..a4431f2 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -216,6 +216,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
case PIPE_CAP_FORCE_PERSAMPLE_INTERP:
+   case PIPE_CAP_SHAREABLE_SHADERS:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index afd91e6..57c9c6c 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -202,6 +202,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
 

[Mesa-dev] [PATCH 3/3] st/mesa: create shaders which have only one variant immediately

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

---
 src/mesa/state_tracker/st_cb_program.c |  5 +++--
 src/mesa/state_tracker/st_context.c| 14 ++
 src/mesa/state_tracker/st_context.h|  7 +++
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_program.c 
b/src/mesa/state_tracker/st_cb_program.c
index 40f2af0..611aea7 100644
--- a/src/mesa/state_tracker/st_cb_program.c
+++ b/src/mesa/state_tracker/st_cb_program.c
@@ -222,6 +222,7 @@ st_program_string_notify( struct gl_context *ctx,
struct gl_program *prog )
 {
struct st_context *st = st_context(ctx);
+   gl_shader_stage stage = _mesa_program_enum_to_shader_stage(target);
 
if (target == GL_FRAGMENT_PROGRAM_ARB) {
   struct st_fragment_program *stfp = (struct st_fragment_program *) prog;
@@ -276,10 +277,10 @@ st_program_string_notify( struct gl_context *ctx,
  st->dirty.st |= ST_NEW_TESSEVAL_PROGRAM;
}
 
-   if (ST_DEBUG & DEBUG_PRECOMPILE)
+   if (ST_DEBUG & DEBUG_PRECOMPILE ||
+   st->shader_has_one_variant[stage])
   st_precompile_shader_variant(st, prog);
 
-   /* XXX check if program is legal, within limits */
return GL_TRUE;
 }
 
diff --git a/src/mesa/state_tracker/st_context.c 
b/src/mesa/state_tracker/st_context.c
index 6256c0b..4f3f525 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -293,6 +293,20 @@ st_create_context_priv( struct gl_context *ctx, struct 
pipe_context *pipe,
  ctx->Const.ShaderCompilerOptions[i].EmitNoIndirectSampler = true;
}
 
+   /* Set which shader types can be compiled at link time. */
+   st->shader_has_one_variant[MESA_SHADER_VERTEX] =
+ st->has_shareable_shaders &&
+ !st->clamp_vert_color_in_shader;
+
+   st->shader_has_one_variant[MESA_SHADER_FRAGMENT] =
+ st->has_shareable_shaders &&
+ !st->clamp_frag_color_in_shader &&
+ st->can_force_persample_interp;
+
+   st->shader_has_one_variant[MESA_SHADER_TESS_CTRL] = 
st->has_shareable_shaders;
+   st->shader_has_one_variant[MESA_SHADER_TESS_EVAL] = 
st->has_shareable_shaders;
+   st->shader_has_one_variant[MESA_SHADER_GEOMETRY] = 
st->has_shareable_shaders;
+
_mesa_compute_version(ctx);
 
if (ctx->Version == 0) {
diff --git a/src/mesa/state_tracker/st_context.h 
b/src/mesa/state_tracker/st_context.h
index 446fe5d..d0aed7e 100644
--- a/src/mesa/state_tracker/st_context.h
+++ b/src/mesa/state_tracker/st_context.h
@@ -101,6 +101,13 @@ struct st_context
boolean can_force_persample_interp;
boolean has_shareable_shaders;
 
+   /**
+* If a shader can be created when we get its source.
+* This means it has only 1 variant, not counting glBitmap and
+* glDrawPixels.
+*/
+   boolean shader_has_one_variant[MESA_SHADER_STAGES];
+
boolean needs_texcoord_semantic;
boolean apply_texture_swizzle_to_border_color;
 
-- 
2.1.4
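
For anyone following the series, the per-stage conditions this patch sets can
be restated as a small standalone sketch. This is not the st/mesa code:
`struct caps`, `enum stage`, `compute_one_variant` and `should_precompile` are
hypothetical names, and only the boolean conditions are taken from the hunks
above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the shader stages indexed in the patch. */
enum stage {
   STAGE_VERTEX,
   STAGE_TESS_CTRL,
   STAGE_TESS_EVAL,
   STAGE_GEOMETRY,
   STAGE_FRAGMENT,
   STAGE_COUNT
};

/* Hypothetical stand-in for the st_context fields the patch reads. */
struct caps {
   bool has_shareable_shaders;
   bool clamp_vert_color_in_shader;
   bool clamp_frag_color_in_shader;
   bool can_force_persample_interp;
};

/* Mark every stage whose shader has exactly one variant and can therefore
 * be compiled as soon as its source is known. */
static void
compute_one_variant(const struct caps *c, bool out[STAGE_COUNT])
{
   out[STAGE_VERTEX] = c->has_shareable_shaders &&
                       !c->clamp_vert_color_in_shader;

   out[STAGE_FRAGMENT] = c->has_shareable_shaders &&
                         !c->clamp_frag_color_in_shader &&
                         c->can_force_persample_interp;

   out[STAGE_TESS_CTRL] = c->has_shareable_shaders;
   out[STAGE_TESS_EVAL] = c->has_shareable_shaders;
   out[STAGE_GEOMETRY] = c->has_shareable_shaders;
}

/* Same shape as the condition added to st_program_string_notify():
 * precompile when debugging asks for it or the stage has one variant. */
static bool
should_precompile(bool debug_precompile,
                  const bool one_variant[STAGE_COUNT], enum stage s)
{
   return debug_precompile || one_variant[s];
}
```

With shareable shaders but fragment color clamping done in the shader, the
vertex stage would be precompiled while the fragment stage would still wait
for draw-time state.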



[Mesa-dev] [PATCH 2/3] st/mesa: decouple shaders from contexts if they are shareable

2015-10-10 Thread Marek Olšák
From: Marek Olšák 

---
 src/mesa/state_tracker/st_atom_shader.c   | 10 +-
 src/mesa/state_tracker/st_cb_bitmap.c |  2 +-
 src/mesa/state_tracker/st_cb_drawpixels.c |  2 +-
 src/mesa/state_tracker/st_context.c   |  3 ++-
 src/mesa/state_tracker/st_context.h   |  1 +
 src/mesa/state_tracker/st_program.c   | 16 +++-
 6 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/src/mesa/state_tracker/st_atom_shader.c 
b/src/mesa/state_tracker/st_atom_shader.c
index 1e880a1..3941454 100644
--- a/src/mesa/state_tracker/st_atom_shader.c
+++ b/src/mesa/state_tracker/st_atom_shader.c
@@ -64,7 +64,7 @@ update_fp( struct st_context *st )
assert(stfp->Base.Base.Target == GL_FRAGMENT_PROGRAM_ARB);
 
memset(&key, 0, sizeof(key));
-   key.st = st;
+   key.st = st->has_shareable_shaders ? NULL : st;
 
/* _NEW_FRAG_CLAMP */
key.clamp_color = st->clamp_frag_color_in_shader &&
@@ -119,7 +119,7 @@ update_vp( struct st_context *st )
assert(stvp->Base.Base.Target == GL_VERTEX_PROGRAM_ARB);
 
memset(&key, 0, sizeof key);
-   key.st = st;  /* variants are per-context */
+   key.st = st->has_shareable_shaders ? NULL : st;
 
/* When this is true, we will add an extra input to the vertex
 * shader translation (for edgeflags), an extra output with
@@ -174,7 +174,7 @@ update_gp( struct st_context *st )
assert(stgp->Base.Base.Target == GL_GEOMETRY_PROGRAM_NV);
 
memset(&key, 0, sizeof(key));
-   key.st = st;
+   key.st = st->has_shareable_shaders ? NULL : st;
 
st->gp_variant = st_get_gp_variant(st, stgp, &key);
 
@@ -210,7 +210,7 @@ update_tcp( struct st_context *st )
assert(sttcp->Base.Base.Target == GL_TESS_CONTROL_PROGRAM_NV);
 
memset(&key, 0, sizeof(key));
-   key.st = st;
+   key.st = st->has_shareable_shaders ? NULL : st;
 
st->tcp_variant = st_get_tcp_variant(st, sttcp, &key);
 
@@ -246,7 +246,7 @@ update_tep( struct st_context *st )
assert(sttep->Base.Base.Target == GL_TESS_EVALUATION_PROGRAM_NV);
 
memset(&key, 0, sizeof(key));
-   key.st = st;
+   key.st = st->has_shareable_shaders ? NULL : st;
 
st->tep_variant = st_get_tep_variant(st, sttep, &key);
 
diff --git a/src/mesa/state_tracker/st_cb_bitmap.c 
b/src/mesa/state_tracker/st_cb_bitmap.c
index bb6dfe8..cbc6845 100644
--- a/src/mesa/state_tracker/st_cb_bitmap.c
+++ b/src/mesa/state_tracker/st_cb_bitmap.c
@@ -269,7 +269,7 @@ draw_bitmap_quad(struct gl_context *ctx, GLint x, GLint y, 
GLfloat z,
struct pipe_resource *vbuf = NULL;
 
memset(&key, 0, sizeof(key));
-   key.st = st;
+   key.st = st->has_shareable_shaders ? NULL : st;
key.bitmap = GL_TRUE;
key.clamp_color = st->clamp_frag_color_in_shader &&
  st->ctx->Color._ClampFragmentColor;
diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c 
b/src/mesa/state_tracker/st_cb_drawpixels.c
index 7e8633e..20cbfde 100644
--- a/src/mesa/state_tracker/st_cb_drawpixels.c
+++ b/src/mesa/state_tracker/st_cb_drawpixels.c
@@ -914,7 +914,7 @@ get_color_fp_variant(struct st_context *st)
 
memset(&key, 0, sizeof(key));
 
-   key.st = st;
+   key.st = st->has_shareable_shaders ? NULL : st;
key.drawpixels = 1;
key.scaleAndBias = (ctx->Pixel.RedBias != 0.0 ||
ctx->Pixel.RedScale != 1.0 ||
diff --git a/src/mesa/state_tracker/st_context.c 
b/src/mesa/state_tracker/st_context.c
index bef7307..6256c0b 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -237,7 +237,8 @@ st_create_context_priv( struct gl_context *ctx, struct 
pipe_context *pipe,
   PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER);
st->can_force_persample_interp = screen->get_param(screen,
   PIPE_CAP_FORCE_PERSAMPLE_INTERP);
-
+   st->has_shareable_shaders = screen->get_param(screen,
+ PIPE_CAP_SHAREABLE_SHADERS);
st->needs_texcoord_semantic =
   screen->get_param(screen, PIPE_CAP_TGSI_TEXCOORD);
st->apply_texture_swizzle_to_border_color =
diff --git a/src/mesa/state_tracker/st_context.h 
b/src/mesa/state_tracker/st_context.h
index f187d82..446fe5d 100644
--- a/src/mesa/state_tracker/st_context.h
+++ b/src/mesa/state_tracker/st_context.h
@@ -99,6 +99,7 @@ struct st_context
boolean has_etc2;
boolean prefer_blit_based_texture_transfer;
boolean can_force_persample_interp;
+   boolean has_shareable_shaders;
 
boolean needs_texcoord_semantic;
boolean apply_texture_swizzle_to_border_color;
diff --git a/src/mesa/state_tracker/st_program.c 
b/src/mesa/state_tracker/st_program.c
index 6a69ba7..87571a8 100644
--- a/src/mesa/state_tracker/st_program.c
+++ b/src/mesa/state_tracker/st_program.c
@@ -1728,6 +1728,12 @@ destroy_program_variants_cb(GLuint key, void *data, void 
*userData)
 void
 st_destroy_program_variants(struct st_context *st)
 {
+   /* If shaders can be shared with other contexts, the last context will
+* call DeleteProgram 

[Mesa-dev] [PATCH 0/3] Creating gallium shaders at link time

2015-10-10 Thread Marek Olšák
Hi,

This is a continuation of the previous series. It allows drivers to have only 1 
shader variant for every user shader in st/mesa, not counting glDrawPixels and 
glBitmap variants.

In such a case, the shader variant is created in LinkShader or 
ProgramStringNotify.

Please review.

Marek
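
A rough illustration of why one variant per shader becomes possible: patch 2/3
stops tying variant keys to a context when the driver reports shareable
shaders, so two contexts looking up the same shader produce identical keys.
The sketch below assumes a much smaller key than the real ones; `struct ctx`,
`struct variant_key`, `make_key` and `keys_match` are hypothetical names, and
the field-by-field comparison is a simplification chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for st_context. */
struct ctx {
   bool has_shareable_shaders;
};

/* Hypothetical, reduced variant key: an owner plus one piece of state. */
struct variant_key {
   const struct ctx *owner;   /* NULL when any context may use the variant */
   int clamp_color;
};

/* Build a key the way the patch does: drop the context pointer when
 * shaders are shareable, keep it otherwise. */
static void
make_key(struct variant_key *key, const struct ctx *c, int clamp_color)
{
   key->owner = c->has_shareable_shaders ? NULL : c;
   key->clamp_color = clamp_color;
}

/* Two lookups hit the same variant only if every key field agrees. */
static bool
keys_match(const struct variant_key *a, const struct variant_key *b)
{
   return a->owner == b->owner && a->clamp_color == b->clamp_color;
}
```

Under these assumptions, two shareable contexts with the same state share one
variant, while two non-shareable contexts always get separate ones because
their owner pointers differ.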


Re: [Mesa-dev] [PATCH] nir/glsl: Use shader_prog->Name for naming the NIR shader

2015-10-10 Thread Kenneth Graunke
On Friday, October 09, 2015 07:45:20 AM Jason Ekstrand wrote:
> This has the better name to use. Apparently, sh->Name is usually 0.
> ---
>  src/glsl/nir/glsl_to_nir.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index 6e1dd84..3284bdc 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -150,7 +150,7 @@ glsl_to_nir(const struct gl_shader_program *shader_prog,
>if (sh->Program->SamplersUsed & (1 << i))
>   num_textures = i;
>  
> -   shader->info.name = ralloc_asprintf(shader, "GLSL%d", sh->Name);
> +   shader->info.name = ralloc_asprintf(shader, "GLSL%d", shader_prog->Name);
> if (shader_prog->Label)
>shader->info.label = ralloc_strdup(shader, shader_prog->Label);
> shader->info.num_textures = num_textures;
> 

Whoops.  Right, this is more useful.

Reviewed-by: Kenneth Graunke 




Re: [Mesa-dev] [Nouveau] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space

2015-10-10 Thread Samuel Pitoiset



On 10/10/2015 10:17 PM, Ilia Mirkin wrote:

On Sat, Oct 10, 2015 at 4:21 PM, Samuel Pitoiset
 wrote:


On 10/10/2015 09:58 PM, Ilia Mirkin wrote:

On Sat, Oct 10, 2015 at 3:55 PM, Samuel Pitoiset
 wrote:


On 10/10/2015 09:42 PM, Ilia Mirkin wrote:

On Sat, Oct 10, 2015 at 3:41 PM, Samuel Pitoiset
 wrote:

This patch looks fine except that it should be a bit more normalized. I
mean, sometimes you break when PUSH_SPACE fails, sometimes not. Same for
PUSH_SPACE calls, sometimes you add it sometimes not.

Meh. We need to get our error checking situation straight, but this
isn't the patch to do it in.


Yeah, but this needs to be clarified.

What does?


I mean, we should either use PUSH_SPACE everywhere or not at all, and always
break (or not) when PUSH_SPACE fails.
That's really a minor issue.

It's actually a major issue. Error-handling is practically
non-existent. There are a couple of spots here and there, but it
doesn't really scale up. I guess I (semi-)accidentally removed a
couple of spots that error checked, but, again, meh. Doing this for
real will require some careful thought.


Yeah, okay. So we really need to improve error-handling. :)


   -ilia




Re: [Mesa-dev] [PATCH 08/10] radeonsi: re-enable unsafe-fp-math for LLVM 3.8

2015-10-10 Thread Nick Sarnie
Hi Marek,

I don't get the hang on Dota 2 Reborn with this patch and LLVM/Mesa git.

Tested-by: Nick Sarnie 

Thanks!

On Sat, Oct 10, 2015 at 10:12 PM, Connor Abbott  wrote:

> FWIW, this isn't quite correct with ARB_shader_precision or GL4.1 --
> it specifies that infinities should be correctly generated through
> division by 0, which unsafe-fp-math doesn't guarantee. At least,
> that's assuming this is similar to the "fast" per-instruction flag
> (http://llvm.org/docs/LangRef.html#fast-math-flags) which says "This
> flag implies all the others."
>
> On Sat, Oct 10, 2015 at 9:29 PM, Marek Olšák  wrote:
> > From: Marek Olšák 
> >
> > Required for 1/sqrt ==> rsq.
> >
> > We should finally fix the hang instead of running away from the issue. This
> > assumes the bug is in LLVM and we have time to fix it before the release.
> > Include compute shaders as well, which only affects TGSI and thus OpenGL.
> >
> > Totals:
> > SGPRS: 344368 -> 345104 (0.21 %)
> > VGPRS: 197552 -> 197420 (-0.07 %)
> > Code Size: 7366304 -> 7324692 (-0.56 %) bytes
> > LDS: 91 -> 91 (0.00 %) blocks
> > Scratch: 1615872 -> 1524736 (-5.64 %) bytes per wave
> >
> > Totals from affected shaders:
> > SGPRS: 146696 -> 147432 (0.50 %)
> > VGPRS: 87212 -> 87080 (-0.15 %)
> > Code Size: 3852664 -> 3811052 (-1.08 %) bytes
> > LDS: 48 -> 48 (0.00 %) blocks
> > Scratch: 1179648 -> 1088512 (-7.73 %) bytes per wave
> > ---
> >  src/gallium/drivers/radeon/radeon_llvm_emit.c | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c b/src/gallium/drivers/radeon/radeon_llvm_emit.c
> > index 6b2ebde..4bda4a4 100644
> > --- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
> > +++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
> > @@ -84,6 +84,13 @@ void radeon_llvm_shader_type(LLVMValueRef F, unsigned type)
> > sprintf(Str, "%1d", llvm_type);
> >
> > LLVMAddTargetDependentFunctionAttr(F, "ShaderType", Str);
> > +
> > +#if HAVE_LLVM >= 0x0308
> > +   /* This only affects TGSI (OpenGL), so it's okay to set it for
> > +* compute shaders too.
> > +*/
> > +   LLVMAddTargetDependentFunctionAttr(F, "unsafe-fp-math", "true");
> > +#endif
> >  }
> >
> >  static void init_r600_target()
> > --
> > 2.1.4
> >


[Mesa-dev] [PATCH] nv50, nvc0: don't base decisions on available pushbuf space

2015-10-10 Thread Ilia Mirkin
We still have to push everything out, might as well kick earlier and
flip pushbufs when we know we'll need it. This resolves some issues with
the new policy of making sure that we always leave a bit of room at the
end for fences.

Signed-off-by: Ilia Mirkin 
Cc: mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/nouveau/nv50/nv50_shader_state.c |  9 ++---
 src/gallium/drivers/nouveau/nv50/nv50_transfer.c | 16 +++-
 src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c | 20 +---
 3 files changed, 10 insertions(+), 35 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c 
b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
index fdde11f..941555f 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_shader_state.c
@@ -65,14 +65,9 @@ nv50_constbufs_validate(struct nv50_context *nv50)
PUSH_DATA (push, (b << 12) | (i << 8) | p | 1);
 }
 while (words) {
-   unsigned nr;
-
-   if (!PUSH_SPACE(push, 16))
-  break;
-   nr = PUSH_AVAIL(push);
-   assert(nr >= 16);
-   nr = MIN2(MIN2(nr - 3, words), NV04_PFIFO_MAX_PACKET_LEN);
+   unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
 
+   PUSH_SPACE(push, nr + 3);
BEGIN_NV04(push, NV50_3D(CB_ADDR), 1);
PUSH_DATA (push, (start << 8) | b);
BEGIN_NI04(push, NV50_3D(CB_DATA(0)), nr);
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c 
b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
index be51407..9a3fd1e 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_transfer.c
@@ -187,14 +187,7 @@ nv50_sifc_linear_u8(struct nouveau_context *nv,
PUSH_DATA (push, 0);
 
while (count) {
-  unsigned nr;
-
-  if (!PUSH_SPACE(push, 16))
- break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 1);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
+  unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
 
   BEGIN_NI04(push, NV50_2D(SIFC_DATA), nr);
   PUSH_DATAp(push, src, nr);
@@ -395,12 +388,9 @@ nv50_cb_push(struct nouveau_context *nv,
nouveau_pushbuf_validate(push);
 
while (words) {
-  unsigned nr;
-
-  nr = PUSH_AVAIL(push);
-  nr = MIN2(nr - 7, words);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
+  unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN);
 
+  PUSH_SPACE(push, nr + 7);
   BEGIN_NV04(push, NV50_3D(CB_DEF_ADDRESS_HIGH), 3);
   PUSH_DATAh(push, bo->offset + base);
   PUSH_DATA (push, bo->offset + base);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
index aaec60a..d459dd6 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_transfer.c
@@ -188,14 +188,10 @@ nvc0_m2mf_push_linear(struct nouveau_context *nv,
nouveau_pushbuf_validate(push);
 
while (count) {
-  unsigned nr;
+  unsigned nr = MIN2(count, NV04_PFIFO_MAX_PACKET_LEN);
 
-  if (!PUSH_SPACE(push, 16))
+  if (!PUSH_SPACE(push, nr + 9))
  break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 9);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN);
 
   BEGIN_NVC0(push, NVC0_M2MF(OFFSET_OUT_HIGH), 2);
   PUSH_DATAh(push, dst->offset + offset);
@@ -234,14 +230,10 @@ nve4_p2mf_push_linear(struct nouveau_context *nv,
nouveau_pushbuf_validate(push);
 
while (count) {
-  unsigned nr;
+  unsigned nr = MIN2(count, (NV04_PFIFO_MAX_PACKET_LEN - 1));
 
-  if (!PUSH_SPACE(push, 16))
+  if (!PUSH_SPACE(push, nr + 10))
  break;
-  nr = PUSH_AVAIL(push);
-  assert(nr >= 16);
-  nr = MIN2(count, nr - 8);
-  nr = MIN2(nr, (NV04_PFIFO_MAX_PACKET_LEN - 1));
 
   BEGIN_NVC0(push, NVE4_P2MF(UPLOAD_DST_ADDRESS_HIGH), 2);
   PUSH_DATAh(push, dst->offset + offset);
@@ -571,9 +563,7 @@ nvc0_cb_bo_push(struct nouveau_context *nv,
PUSH_DATA (push, bo->offset + base);
 
while (words) {
-  unsigned nr = PUSH_AVAIL(push);
-  nr = MIN2(nr, words);
-  nr = MIN2(nr, NV04_PFIFO_MAX_PACKET_LEN - 1);
+  unsigned nr = MIN2(words, NV04_PFIFO_MAX_PACKET_LEN - 1);
 
   PUSH_SPACE(push, nr + 2);
   PUSH_REFN (push, bo, NOUVEAU_BO_WR | domain);
-- 
2.4.9
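
The pattern the patch converges on (pick the chunk size from the data and the
packet limit, then reserve exactly that plus the header words) can be
sketched outside the driver. Everything below is a toy model: `struct
pushbuf`, `push_space`, `push_chunked` and the `MAX_PACKET_LEN` value are
hypothetical stand-ins, not the nouveau API, and a "kick" is reduced to
resetting a counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for NV04_PFIFO_MAX_PACKET_LEN. */
#define MAX_PACKET_LEN 2047u

/* Toy push buffer: a fixed capacity in words and a fill level. */
struct pushbuf {
   unsigned capacity;   /* words per buffer */
   unsigned used;       /* words already written to the current buffer */
   unsigned kicks;      /* how many times the buffer was flushed */
};

/* Plays the role of PUSH_SPACE: guarantee room for `words`, flushing the
 * current buffer first if it is too full. Fails only if the request can
 * never fit. */
static bool
push_space(struct pushbuf *p, unsigned words)
{
   if (words > p->capacity)
      return false;
   if (p->capacity - p->used < words) {
      p->kicks++;
      p->used = 0;
   }
   return true;
}

/* Upload `words` data words in packets that each carry `overhead` header
 * words. The chunk size depends only on the remaining data and the packet
 * limit, never on how much room happens to be left.
 * Returns the number of packets emitted, or -1 on failure. */
static int
push_chunked(struct pushbuf *p, unsigned words, unsigned overhead)
{
   int packets = 0;

   while (words) {
      unsigned nr = words < MAX_PACKET_LEN ? words : MAX_PACKET_LEN;

      if (!push_space(p, nr + overhead))
         return -1;

      p->used += nr + overhead;   /* header words plus payload */
      words -= nr;
      packets++;
   }
   return packets;
}
```

Because the chunk size no longer depends on the free space, a nearly full
buffer triggers an early flush instead of a run of tiny packets. The cost is
that part of a buffer can go unused before a kick.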



[Mesa-dev] [PATCH v2] configure.ac: ensure RM is set

2015-10-10 Thread Jonathan Gray
GNU make predefines RM to rm -f but this is not required by POSIX
so ensure that RM is set.  This fixes "make clean" on OpenBSD.

v2: use AC_CHECK_PROG

Signed-off-by: Jonathan Gray 
CC: "10.6 11.0" 
---
 configure.ac | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/configure.ac b/configure.ac
index 3feec19..f99545f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -107,6 +107,8 @@ AC_SYS_LARGEFILE
 LT_PREREQ([2.2])
 LT_INIT([disable-static])
 
+AC_CHECK_PROG(RM, rm, [rm -f])
+
 AX_PROG_BISON([],
   AS_IF([test ! -f "$srcdir/src/glsl/glcpp/glcpp-parse.c"],
 [AC_MSG_ERROR([bison not found - unable to compile 
glcpp-parse.y])]))
-- 
2.5.3



[Mesa-dev] [PATCH shader-db] check_dependencies: refactor to a python script

2015-10-10 Thread Rhys Kidd
Deliver consistency with all other shader-db scripts, which are Python scripts.

No change in features or output strings.

Passed pep8, except for two comment lines suggesting commands to add
dependencies to the [require] section of *.shader_test files.

Although this is not a performance-critical feature, performance is equivalent
to the Perl script, apart from the recursive directory traversal in
process_directories().

os.scandir(item) would be significantly faster than os.walk(item), however
its use would introduce a minimum dependency on Python 3.5 which is preferably
avoided at this time.

Signed-off-by: Rhys Kidd 
---
 check_dependencies.pl | 107 --
 check_dependencies.py |  82 ++
 2 files changed, 82 insertions(+), 107 deletions(-)
 delete mode 100755 check_dependencies.pl
 create mode 100755 check_dependencies.py

diff --git a/check_dependencies.pl b/check_dependencies.pl
deleted file mode 100755
index 3e49f7f..000
--- a/check_dependencies.pl
+++ /dev/null
@@ -1,107 +0,0 @@
-#!/usr/bin/perl
-#
-# Copyright © 2014 Intel Corporation
-#
-# Permission is hereby granted, free of charge, to any person obtaining a
-# copy of this software and associated documentation files (the "Software"),
-# to deal in the Software without restriction, including without limitation
-# the rights to use, copy, modify, merge, publish, distribute, sublicense,
-# and/or sell copies of the Software, and to permit persons to whom the
-# Software is furnished to do so, subject to the following conditions:
-#
-# The above copyright notice and this permission notice (including the next
-# paragraph) shall be included in all copies or substantial portions of the
-# Software.
-#
-# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
-# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
-# IN THE SOFTWARE.
-
-# For checking that shader_test's dependencies are correct.
-#
-# Run with
-#  ./check_dependencies.pl shaders/
-#
-# And then run a command like these to add dependencies to the [require]
-# section:
-#
-# find shaders/ -name '*.shader_test' -exec grep -l '#version 120' {} + | 
xargs sed -i -e 's/GLSL >= 1.10/GLSL >= 1.20/'
-# find shaders/ -name '*.shader_test' -exec grep -l '#extension 
GL_ARB_texture_rectangle : require' {} + | xargs sed -i -e 's/GLSL >= 1.20/GLSL 
>= 1.20\nGL_ARB_texture_rectangle/'
-
-use strict;
-use File::Find;
-
-die("Not enough arguments: specify a directory\n") if ($#ARGV < 0);
-
-# The array_diff function is copied from the Array::Utils package and contains
-# this copyright:
-#
-# This module is Copyright (c) 2007 Sergei A. Fedorov.
-# All rights reserved.
-#
-# You may distribute under the terms of either the GNU General Public
-# License or the Artistic License, as specified in the Perl README file.
-sub array_diff(\@\@) {
-   my %e = map { $_ => undef } @{$_[1]};
-   return @{[ ( grep { (exists $e{$_}) ? ( delete $e{$_} ) : ( 1 ) } @{ 
$_[0] } ), keys %e ] };
-}
-
-my @shader_test;
-
-sub wanted {
-   push(@shader_test, $File::Find::name) if (/\.shader_test$/);
-}
-
-finddepth(\&wanted,  @ARGV);
-
-my $fail = 0;
-
-foreach my $shader_test (@shader_test) {
-   my $expected;
-   my $actual;
-   my @expected_ext;
-   my @actual_ext;
-
-   open(my $fh, "<", $shader_test)
-   or die("cannot open < $shader_test: $!\n");
-   
-   while (<$fh>) {
-   chomp;
-
-   if (/^GLSL >= (\d)\.(\d\d)/) {
-   $expected = $1 * 100 + $2;
-   }
-   if (/^\s*#\s*version\s+(\d{3})/) {
-   $actual = $1 if $actual == undef;
-   $actual = $1 if $actual < $1;
-   }
-
-   if (/^(GL_\S+)/) {
-   next if ($1 eq "GL_ARB_fragment_program" ||
-$1 eq "GL_ARB_vertex_program");
-   push(@expected_ext, $1);
-   }
-   if (/^\s*#\s*extension\s+(GL_\S+)\s*:\s*require/) {
-   push(@actual_ext, $1);
-   }
-   }
-
-   close($fh);
-
-   if ($actual != undef && $expected != $actual) {
-   print "$shader_test requested $expected, but requires 
$actual\n";
-   $fail = 1;
-   }
-
-   my @extension = array_diff(@expected_ext, @actual_ext);
-   foreach my $extension (@extension) {
-   print "$shader_test extension $extension mismatch\n";
-   $fail = 1;
-   }
-}
-
-exit($fail);
diff --git a/check_dependencies.py b/check_dependencies.py

[Mesa-dev] [PATCH] nouveau: avoid emitting new fences unnecessarily

2015-10-10 Thread Ilia Mirkin
Right now we emit on every kick, but this is only necessary if something
will ever be able to observe that the fence completed. If there are no
refs, leave the fence alone and emit it another day.

This also happens to work around an issue for the kick handler -- a kick
can be a result of e.g. nouveau_bo_wait or explicit kick, or it can be
due to lack of space in the pushbuf. We want the emit to happen in the
current batch, so we want there to always be enough space. However an
explicit kick could take the reserved space for the implicitly-triggered
kick's fence emission if it happened right after. With the new mechanism,
hopefully there's no way to cause two fences to be emitted into the same
reserved space.

Signed-off-by: Ilia Mirkin 
Cc: mesa-sta...@lists.freedesktop.org
Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
---
 src/gallium/drivers/nouveau/nouveau_fence.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nouveau_fence.c 
b/src/gallium/drivers/nouveau/nouveau_fence.c
index ee4e08d..18b1592 100644
--- a/src/gallium/drivers/nouveau/nouveau_fence.c
+++ b/src/gallium/drivers/nouveau/nouveau_fence.c
@@ -190,8 +190,10 @@ nouveau_fence_wait(struct nouveau_fence *fence)
/* wtf, someone is waiting on a fence in flush_notify handler? */
assert(fence->state != NOUVEAU_FENCE_STATE_EMITTING);
 
-   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED)
+   if (fence->state < NOUVEAU_FENCE_STATE_EMITTED) {
+  PUSH_SPACE(screen->pushbuf, 8);
   nouveau_fence_emit(fence);
+   }
 
if (fence->state < NOUVEAU_FENCE_STATE_FLUSHED)
   if (nouveau_pushbuf_kick(screen->pushbuf, screen->pushbuf->channel))
@@ -224,8 +226,12 @@ nouveau_fence_wait(struct nouveau_fence *fence)
 void
 nouveau_fence_next(struct nouveau_screen *screen)
 {
-   if (screen->fence.current->state < NOUVEAU_FENCE_STATE_EMITTING)
-  nouveau_fence_emit(screen->fence.current);
+   if (screen->fence.current->state < NOUVEAU_FENCE_STATE_EMITTING) {
+  if (screen->fence.current->ref > 1)
+ nouveau_fence_emit(screen->fence.current);
+  else
+ return;
+   }
 
nouveau_fence_ref(NULL, &screen->fence.current);
 
-- 
2.4.9
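
The decision added to nouveau_fence_next() can be modelled in isolation. The
sketch below is only an approximation of the control flow in the hunk above:
`struct fence`, `enum fence_state` and `fence_next` are hypothetical names,
and emitting a fence is reduced to a state change.

```c
#include <stdbool.h>

/* Hypothetical, ordered fence states; only the ordering matters here. */
enum fence_state {
   FENCE_AVAILABLE,
   FENCE_EMITTING,
   FENCE_EMITTED,
   FENCE_FLUSHED,
   FENCE_SIGNALLED
};

/* Hypothetical stand-in for the current fence held by the screen. */
struct fence {
   enum fence_state state;
   int ref;   /* 1 means only the screen itself holds a reference */
};

/* Returns true when the caller should move on to a fresh fence, false when
 * the current fence is left alone to be emitted on a later kick. */
static bool
fence_next(struct fence *current)
{
   if (current->state < FENCE_EMITTING) {
      if (current->ref <= 1)
         return false;                  /* nobody can observe completion */
      current->state = FENCE_EMITTED;   /* stands in for the emit call */
   }
   return true;
}
```

A fence that nothing else references is skipped, so back-to-back kicks no
longer each emit one.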



[Mesa-dev] [PATCH 0/5] Implementation of vec4 equivalent to fs_cmod_propagation optimization

2015-10-10 Thread Alejandro Piñeiro
This series implements a vec4 equivalent to fs_cmod_propagation optimization.

The last two commits are not strictly needed for the optimization; they are
nice-to-haves (imho) that I added while implementing it.

Alejandro Piñeiro (5):
  i965/vec4: nir_emit_if doesn't need to predicate based on all the
channels
  i965/vec4: adding vec4_cmod_propagation optimization
  i965/vec4: Add unit tests for cmod propagation pass.
  i965/vec4: use a custom envvar to decide to print the assembly on
test_vec4_cmod_propagation
  i965/vec4: print predicate control at brw_vec4 dump_instruction

 src/mesa/drivers/dri/i965/Makefile.am  |   7 +
 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp |  17 +-
 src/mesa/drivers/dri/i965/brw_vec4.h   |   1 +
 .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163 +
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |   4 +-
 .../dri/i965/test_vec4_cmod_propagation.cpp| 736 +
 7 files changed, 926 insertions(+), 3 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
 create mode 100644 src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp

-- 
2.1.4



[Mesa-dev] [PATCH 3/5] i965/vec4: Add unit tests for cmod propagation pass.

2015-10-10 Thread Alejandro Piñeiro
This includes the same tests as test_fs_cmod_propagation (non-vector
GLSL types included), plus some new ones with vec4 types, inspired by
the regressions found while the optimization was a work in progress.

Additionally, the check on the number of instructions after the
optimization was changed from EXPECT_EQ to ASSERT_EQ. This avoids a
crash in failing tests that expected no optimization: after checking
the number of instructions, those tests go on to check the opcode and
conditional mod of the last instruction.
---
 src/mesa/drivers/dri/i965/Makefile.am  |   7 +
 .../dri/i965/test_vec4_cmod_propagation.cpp| 736 +
 2 files changed, 743 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp

diff --git a/src/mesa/drivers/dri/i965/Makefile.am 
b/src/mesa/drivers/dri/i965/Makefile.am
index 2e24151..63228a5 100644
--- a/src/mesa/drivers/dri/i965/Makefile.am
+++ b/src/mesa/drivers/dri/i965/Makefile.am
@@ -58,6 +58,7 @@ TESTS = \
test_fs_saturate_propagation \
 test_eu_compact \
test_vf_float_conversions \
+   test_vec4_cmod_propagation \
 test_vec4_copy_propagation \
 test_vec4_register_coalesce
 
@@ -93,6 +94,12 @@ test_vec4_copy_propagation_LDADD = \
 $(top_builddir)/src/gtest/libgtest.la \
 $(TEST_LIBS)
 
+test_vec4_cmod_propagation_SOURCES = \
+   test_vec4_cmod_propagation.cpp
+test_vec4_cmod_propagation_LDADD = \
+   $(top_builddir)/src/gtest/libgtest.la \
+   $(TEST_LIBS)
+
 test_eu_compact_SOURCES = \
test_eu_compact.c
 nodist_EXTRA_test_eu_compact_SOURCES = dummy.cpp
diff --git a/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp 
b/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp
new file mode 100644
index 000..d2fba1b
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp
@@ -0,0 +1,736 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Alejandro Piñeiro Iglesias 
+ *
+ * Based on test_fs_cmod_propagation.cpp
+ */
+
+#include <gtest/gtest.h>
+#include "brw_vec4.h"
+#include "brw_vec4_builder.h"
+#include "brw_cfg.h"
+#include "program/program.h"
+
+using namespace brw;
+
+class cmod_propagation_test : public ::testing::Test {
+   virtual void SetUp();
+
+public:
+   struct brw_compiler *compiler;
+   struct brw_device_info *devinfo;
+   struct gl_context *ctx;
+   struct gl_shader_program *shader_prog;
+   struct brw_vertex_program *vp;
+   vec4_visitor *v;
+};
+
+class cmod_propagation_vec4_visitor : public vec4_visitor
+{
+public:
+   cmod_propagation_vec4_visitor(struct brw_compiler *compiler,
+ nir_shader *shader)
+  : vec4_visitor(compiler, NULL, NULL, NULL, shader, NULL,
+ false, -1) {}
+
+protected:
+   /* Dummy implementation for pure virtual methods */
+   virtual dst_reg *make_reg_for_system_value(int location,
+  const glsl_type *type)
+   {
+  unreachable("Not reached");
+   }
+
+   virtual void setup_payload()
+   {
+  unreachable("Not reached");
+   }
+
+   virtual void emit_prolog()
+   {
+  unreachable("Not reached");
+   }
+
+   virtual void emit_program_code()
+   {
+  unreachable("Not reached");
+   }
+
+   virtual void emit_thread_end()
+   {
+  unreachable("Not reached");
+   }
+
+   virtual void emit_urb_write_header(int mrf)
+   {
+  unreachable("Not reached");
+   }
+
+   virtual vec4_instruction *emit_urb_write_opcode(bool complete)
+   {
+  unreachable("Not reached");
+   }
+};
+
+
+void cmod_propagation_test::SetUp()
+{
+   ctx = (struct gl_context *)calloc(1, sizeof(*ctx));
+   compiler = (struct brw_compiler *)calloc(1, sizeof(*compiler));
+   devinfo = (struct brw_device_info *)calloc(1, sizeof(*devinfo));

[Mesa-dev] [PATCH 4/5] i965/vec4: use a custom envvar to decide to print the assembly on test_vec4_cmod_propagation

2015-10-10 Thread Alejandro Piñeiro
The complete way to do this would be to parse INTEL_DEBUG and
print the output if DEBUG_VS (or a new flag) is present
(see intel_debug.c).

But that seems like overkill for unit tests, whose most
common use case, after all, is being run by
make check.
---

Just added the envvar because while working on the optimization
I didn't want to recompile if I wanted to see the instructions.
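
As a side note on the getenv() approach: the patch enables printing whenever
TEST_DEBUG is set to anything, including "0" or an empty string. If a
stricter check were ever wanted, it could look roughly like the sketch below.
The helper name env_flag_enabled is hypothetical and is not part of the patch
or of Mesa.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper, not part of the patch: returns true only when the
// variable is set to something other than "" or "0". The patch itself
// treats any set value (even "0") as true.
static bool env_flag_enabled(const char *name)
{
   const char *value = std::getenv(name);
   return value != nullptr && value[0] != '\0' &&
          std::strcmp(value, "0") != 0;
}
```

With this variant, TEST_DEBUG=0 make check would stay quiet, while
TEST_DEBUG=1 make check would print the before/after instruction dumps.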


 src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp 
b/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp
index d2fba1b..e840cb9 100644
--- a/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/test_vec4_cmod_propagation.cpp
@@ -125,7 +125,7 @@ instruction(bblock_t *block, int num)
 static bool
 cmod_propagation(vec4_visitor *v)
 {
-   const bool print = false;
+   const bool print = getenv("TEST_DEBUG");
 
if (print) {
   fprintf(stderr, "= Before =\n");
-- 
2.1.4



[Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization

2015-10-10 Thread Alejandro Piñeiro
vec4 port of fs_cmod_propagation.

Shader-db results:
total instructions in shared programs: 6241226 -> 6224469 (-0.27%)
instructions in affected programs: 498213 -> 481456 (-3.36%)
helped:3082
HURT:  0
---

The final outcome is really similar to brw_fs_cmod_propagation. In
fact the only difference is that on fs we have this:

         if (scan_inst->overwrites_reg(inst->src[0])) {
            if (scan_inst->is_partial_write() ||
                scan_inst->dst.reg_offset != inst->src[0].reg_offset)
               break;

And on vec4 (this commit) we have this:

         if (inst->src[0].in_range(scan_inst->dst,
                                   scan_inst->regs_written)) {
            if ((scan_inst->predicate && scan_inst->opcode != BRW_OPCODE_SEL) ||
                scan_inst->dst.reg_offset != inst->src[0].reg_offset ||
                (scan_inst->dst.writemask != WRITEMASK_X &&
                 scan_inst->dst.writemask != WRITEMASK_XYZW))
               break;

            if (scan_inst->dst.writemask == WRITEMASK_XYZW &&
                inst->src[0].swizzle != BRW_SWIZZLE_XYZW) {
               break;
            }

So at some point I thought about refactoring this into one common pass,
as was done with opt_predicated_break. That one was possible with just
backend_instruction, though, while here we would need to deal with both
vec4_instruction and fs_inst. That could get somewhat messy, so
I'm leaving this as it is.
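
To make the vec4 bail-out conditions easier to reason about in isolation,
here is a minimal standalone sketch of the same checks. The structs and
constants are simplified stand-ins that I am assuming for illustration
(ScanInst, CmpSrc, write_blocks_propagation and the numeric values are not
the real i965 definitions); only the boolean logic mirrors the excerpt above.

```cpp
// Hypothetical, simplified stand-ins for the i965 types and constants.
constexpr unsigned WRITEMASK_X = 0x1;
constexpr unsigned WRITEMASK_XYZW = 0xf;
// Assumed encoding of the identity swizzle: 2 bits per channel, x=0 .. w=3.
constexpr unsigned SWIZZLE_XYZW = 0u | (1u << 2) | (2u << 4) | (3u << 6);

struct ScanInst {
   bool predicated;         // stands in for scan_inst->predicate != 0
   bool is_sel;             // stands in for opcode == BRW_OPCODE_SEL
   unsigned dst_reg_offset;
   unsigned dst_writemask;
};

struct CmpSrc {
   unsigned reg_offset;
   unsigned swizzle;
};

// Returns true when the scanned write must stop the search ("break"),
// assuming the write already overlaps the source of the comparison.
bool write_blocks_propagation(const ScanInst &scan, const CmpSrc &src)
{
   // Predicated writes (other than SEL), mismatched offsets, and
   // writemasks other than .x or .xyzw are rejected.
   if ((scan.predicated && !scan.is_sel) ||
       scan.dst_reg_offset != src.reg_offset ||
       (scan.dst_writemask != WRITEMASK_X &&
        scan.dst_writemask != WRITEMASK_XYZW))
      return true;

   // A full-register write only matches an unswizzled read.
   if (scan.dst_writemask == WRITEMASK_XYZW && src.swizzle != SWIZZLE_XYZW)
      return true;

   return false;
}
```

For example, an unpredicated .xyzw write read back with the identity swizzle
passes, while a .xy write or a predicated non-SEL write stops the scan.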

 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp |   1 +
 src/mesa/drivers/dri/i965/brw_vec4.h   |   1 +
 .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163 +
 4 files changed, 166 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 81ef628..c1836d6 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -56,6 +56,7 @@ i965_compiler_FILES = \
brw_util.c \
brw_util.h \
brw_vec4_builder.h \
+   brw_vec4_cmod_propagation.cpp \
brw_vec4_copy_propagation.cpp \
brw_vec4.cpp \
brw_vec4_cse.cpp \
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index e966b96..55e381b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1867,6 +1867,7 @@ vec4_visitor::run()
   OPT(dead_code_eliminate);
   OPT(dead_control_flow_eliminate, this);
   OPT(opt_copy_propagation);
+  OPT(opt_cmod_propagation);
   OPT(opt_cse);
   OPT(opt_algebraic);
   OPT(opt_register_coalesce);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 5e3500c..3c1711d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -149,6 +149,7 @@ public:
int var_range_start(unsigned v, unsigned n) const;
int var_range_end(unsigned v, unsigned n) const;
bool virtual_grf_interferes(int a, int b);
+   bool opt_cmod_propagation();
bool opt_copy_propagation(bool do_constant_prop = true);
bool opt_cse_local(bblock_t *block);
bool opt_cse();
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
new file mode 100644
index 000..7e39d2b
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
@@ -0,0 +1,163 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Alejandro Piñeiro Iglesias 
+ *
+ * Based on brw_fs_cmod_propagation.cpp
+ */
+
+/** @file brw_vec4_cmod_propagation.cpp
+ *
+ * Really similar to 

[Mesa-dev] [PATCH 5/5] i965/vec4: print predicate control at brw_vec4 dump_instruction

2015-10-10 Thread Alejandro Piñeiro
---

I found this useful while I was using INTEL_DEBUG=optimizer after
changing how the ifs are emitted. And after all, that info is
also included by brw_disasm.c

I assumed that in the vec4_visitor we would not need to handle
pred_ctrl_align1, but I'm not totally sure.
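
For reference, this is roughly how the new output is built, as a standalone
sketch. The suffix strings are copied from the table in the diff below;
format_predicate is a hypothetical helper, and the bounds check is my own
addition (the table in the patch is declared with 16 slots but only 8
initializers, so the remaining entries are null pointers).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Suffixes copied from the patch's pred_ctrl_align16 table.
static const char *const pred_suffix[] = {
   "", "", ".x", ".y", ".z", ".w", ".any4h", ".all4h",
};

// Hypothetical helper: formats the predicate prefix of a dumped instruction.
// Out-of-range predicate values fall back to an empty suffix instead of
// reading past the initialized entries.
std::string format_predicate(unsigned predicate, bool inverse,
                             unsigned flag_subreg)
{
   const std::size_t count = sizeof(pred_suffix) / sizeof(pred_suffix[0]);
   const char *suffix = predicate < count ? pred_suffix[predicate] : "";

   char buf[32];
   std::snprintf(buf, sizeof(buf), "(%cf0.%u%s) ",
                 inverse ? '-' : '+', flag_subreg, suffix);
   return buf;
}
```

With index 2 mapping to ".x", an if predicated on the X channel would be
dumped with a "(+f0.0.x) " prefix.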


 src/mesa/drivers/dri/i965/brw_vec4.cpp | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 55e381b..eb81523 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1358,9 +1358,21 @@ vec4_visitor::dump_instruction(backend_instruction 
*be_inst, FILE *file)
vec4_instruction *inst = (vec4_instruction *)be_inst;
 
if (inst->predicate) {
-  fprintf(file, "(%cf0.%d) ",
+  static const char *const pred_ctrl_align16[16] = {
+ "",
+ "",
+ ".x",
+ ".y",
+ ".z",
+ ".w",
+ ".any4h",
+ ".all4h",
+  };
+
+  fprintf(file, "(%cf0.%d%s) ",
   inst->predicate_inverse ? '-' : '+',
-  inst->flag_subreg);
+  inst->flag_subreg,
+  pred_ctrl_align16[inst->predicate]);
}
 
fprintf(file, "%s", brw_instruction_name(inst->opcode));
-- 
2.1.4



[Mesa-dev] [PATCH 1/5] i965/vec4: nir_emit_if doesn't need to predicate based on all the channels

2015-10-10 Thread Alejandro Piñeiro
---

I already talked about this with Jason Ekstrand and Matt Turner
privately, but just in case somebody else jumps into the review:

When using BRW_PREDICATE_NORMAL, the if will use all the channels of
the flag register. But nir_if only reads from one channel, so that
is not needed. Another hint that this is safe: the MOV that
puts the condition in f0 calls get_nir_src with just one
component. That will always return a source with swizzle
BRW_SWIZZLE_, so that component is the only one used.

This commit does not solve anything per se, but it is needed in
order to implement vec4_cmod_propagation with a good
overall outcome.
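
A toy model of the difference between the two predicate modes, under the
simplifying assumption that the flag register holds one bit per channel
(bit 0 = X ... bit 3 = W). The enum and function names are hypothetical and
this is not a description of the full hardware behaviour, just the idea the
patch relies on.

```cpp
// Hypothetical model: one flag bit per vec4 channel, bit 0 = X .. bit 3 = W.
enum class Predicate { Normal, ReplicateX };

// Returns whether the given channel (0..3) is enabled under the predicate.
bool channel_enabled(Predicate mode, unsigned flags, unsigned channel)
{
   // Normal: each channel looks at its own flag bit.
   // ReplicateX: every channel looks at the X flag bit.
   const unsigned bit = (mode == Predicate::ReplicateX) ? 0u : channel;
   return ((flags >> bit) & 1u) != 0;
}
```

In this model, with only the X bit set, ReplicateX enables all four channels
while Normal enables just X, so the if only depends on the single component
the condition actually reads.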

 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 41bd80d..e05745f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -193,7 +193,9 @@ vec4_visitor::nir_emit_if(nir_if *if_stmt)
vec4_instruction *inst = emit(MOV(dst_null_d(), condition));
inst->conditional_mod = BRW_CONDITIONAL_NZ;
 
-   emit(IF(BRW_PREDICATE_NORMAL));
+   /* We can just predicate based on the X channel, as the condition only
+* reads from one channel */
+   emit(IF(BRW_PREDICATE_ALIGN16_REPLICATE_X));
 
nir_emit_cf_list(&if_stmt->then_list);
 
-- 
2.1.4



[Mesa-dev] [Bug 91643] mesa-demos-8.2.0 (latest released version) fails to build against mesa-10.6.4-2.mga6.tainted.src.rpm

2015-10-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=91643

Dennis Schridde  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |devuran...@gmx.net

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [Mesa-dev] New stable-branch 11.0 candidate pushed

2015-10-10 Thread Marek Olšák
On Thu, Oct 8, 2015 at 11:50 AM, Emil Velikov  wrote:
> Hello list,
>
> The candidate for the Mesa 11.0.3 is now available. Currently we have:
>  - 46 queued
>  - 18 nominated (outstanding)
>  - and 7 rejected/obsolete patches
>
> This time around we have a bunch of EGL patches, mangledGL build fixes
> and a healthy amount of driver bugfixes - radeonsi, nouveau, i915 and i965.

Hi,

I've tested this branch on radeonsi. There are no regressions.

Marek