Re: [Mesa-dev] [PATCH 1/2] util: move brw_env_var_as_boolean() to util

2015-11-19 Thread Nicolai Hähnle

Hi Rob,

On 18.11.2015 23:20, Rob Clark wrote:

From: Rob Clark 

Kind of a handy function.  And I'll what it available outside of i965
for common nir-pass helpers.

Signed-off-by: Rob Clark 
---
  src/mesa/drivers/dri/i965/brw_context.c  |  5 +++--
  src/mesa/drivers/dri/i965/brw_nir.c  |  4 +++-
  src/mesa/drivers/dri/i965/brw_shader.cpp |  3 ++-
  src/mesa/drivers/dri/i965/intel_debug.c  | 25 -
  src/mesa/drivers/dri/i965/intel_debug.h  |  2 --
  src/util/debug.c | 25 +
  src/util/debug.h |  2 ++
  7 files changed, 35 insertions(+), 31 deletions(-)

[.. snip ...]

diff --git a/src/util/debug.c b/src/util/debug.c
index 3729ce8..98b1853 100644
--- a/src/util/debug.c
+++ b/src/util/debug.c
@@ -51,3 +51,28 @@ parse_debug_string(const char *debug,

 return flag;
  }
+
+/**
+ * Reads an environment variable and interprets its value as a boolean.
+ *
+ * Recognizes 0/false/no and 1/true/yes.  Other values result in the default 
value.
+ */
+bool
+env_var_as_boolean(const char *var_name, bool default_value)
+{
+   const char *str = getenv(var_name);
+   if (str == NULL)
+  return default_value;
+
+   if (strcmp(str, "1") == 0 ||
+   strcasecmp(str, "true") == 0 ||
+   strcasecmp(str, "yes") == 0) {
+  return true;
+   } else if (strcmp(str, "0") == 0 ||
+  strcasecmp(str, "false") == 0 ||
+  strcasecmp(str, "no") == 0) {
+  return false;
+   } else {
+  return default_value;
+   }
+}


This all looks good to me. I do have two suggestions to slightly improve 
usability:


1) Add "on" and "off" as recognized values.

2) Add something to the effect of `fprintf(stderr, "%s: value not 
recognized, using default.\n", var_name);` to the default value branch.


Either way, feel free to add my R-b.

Cheers,
Nicolai


diff --git a/src/util/debug.h b/src/util/debug.h
index 801736a..3555417 100644
--- a/src/util/debug.h
+++ b/src/util/debug.h
@@ -38,6 +38,8 @@ struct debug_control {
  uint64_t
  parse_debug_string(const char *debug,
 const struct debug_control *control);
+bool
+env_var_as_boolean(const char *var_name, bool default_value);

  #ifdef __cplusplus
  } /* extern C */



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] glsl: don't eliminate unused inputs to SSO

2015-11-19 Thread Timothy Arceri
From: Timothy Arceri 

We do however allow unused builtins to be removed.

Cc: Ian Romanick 
Cc: Gregory Hainaut 
https://bugs.freedesktop.org/show_bug.cgi?id=79783
---
 src/glsl/glsl_parser_extras.cpp|  8 +---
 src/glsl/ir_optimization.h |  5 +++--
 src/glsl/linker.cpp| 11 ++-
 src/glsl/lower_named_interface_blocks.cpp  |  1 +
 src/glsl/opt_dead_code.cpp | 10 --
 src/glsl/test_optpass.cpp  |  5 +++--
 src/mesa/drivers/dri/i965/brw_link.cpp |  3 ++-
 src/mesa/main/ff_fragment_shader.cpp   |  3 ++-
 src/mesa/program/ir_to_mesa.cpp|  3 ++-
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp |  3 ++-
 10 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index 0dbef67..6348ac0 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -1798,7 +1798,7 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct 
gl_shader *shader,
* and reduce later work if the same shader is linked multiple times
*/
   while (do_common_optimization(shader->ir, false, false, options,
-ctx->Const.NativeIntegers))
+ctx->Const.NativeIntegers, false))
  ;
 
   validate_ir_tree(shader->ir);
@@ -1896,7 +1896,8 @@ bool
 do_common_optimization(exec_list *ir, bool linked,
   bool uniform_locations_assigned,
const struct gl_shader_compiler_options *options,
-   bool native_integers)
+   bool native_integers,
+   bool separate_shader)
 {
GLboolean progress = GL_FALSE;
 
@@ -1921,7 +1922,8 @@ do_common_optimization(exec_list *ir, bool linked,
}
 
if (linked)
-  progress = do_dead_code(ir, uniform_locations_assigned) || progress;
+  progress = do_dead_code(ir, uniform_locations_assigned, separate_shader)
+|| progress;
else
   progress = do_dead_code_unlinked(ir) || progress;
progress = do_dead_code_local(ir) || progress;
diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
index 2fee81c..c2628f1 100644
--- a/src/glsl/ir_optimization.h
+++ b/src/glsl/ir_optimization.h
@@ -75,7 +75,7 @@ enum lower_packing_builtins_op {
 bool do_common_optimization(exec_list *ir, bool linked,
bool uniform_locations_assigned,
 const struct gl_shader_compiler_options *options,
-bool native_integers);
+bool native_integers, bool separate_shader);
 
 bool do_rebalance_tree(exec_list *instructions);
 bool do_algebraic(exec_list *instructions, bool native_integers,
@@ -91,7 +91,8 @@ void do_dead_builtin_varyings(struct gl_context *ctx,
   gl_shader *producer, gl_shader *consumer,
   unsigned num_tfeedback_decls,
   class tfeedback_decl *tfeedback_decls);
-bool do_dead_code(exec_list *instructions, bool uniform_locations_assigned);
+bool do_dead_code(exec_list *instructions, bool uniform_locations_assigned,
+  bool separate_shader);
 bool do_dead_code_local(exec_list *instructions);
 bool do_dead_code_unlinked(exec_list *instructions);
 bool do_dead_functions(exec_list *instructions);
diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 8e7d92e..49f181f 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -4224,7 +4224,8 @@ link_shaders(struct gl_context *ctx, struct 
gl_shader_program *prog)
 
   while (do_common_optimization(prog->_LinkedShaders[i]->ir, true, false,
 >Const.ShaderCompilerOptions[i],
-ctx->Const.NativeIntegers))
+ctx->Const.NativeIntegers,
+prog->SeparateShader))
 ;
 
   lower_const_arrays_to_uniforms(prog->_LinkedShaders[i]->ir);
@@ -4338,7 +4339,7 @@ link_shaders(struct gl_context *ctx, struct 
gl_shader_program *prog)
 
   /* Eliminate code that is now dead due to unused outputs being demoted.
*/
-  while (do_dead_code(sh->ir, false))
+  while (do_dead_code(sh->ir, false, prog->SeparateShader))
  ;
}
else if (first == MESA_SHADER_FRAGMENT) {
@@ -4359,7 +4360,7 @@ link_shaders(struct gl_context *ctx, struct 
gl_shader_program *prog)
   } else
  demote_shader_inputs_and_outputs(sh, ir_var_shader_in);
 
-  while (do_dead_code(sh->ir, false))
+  while (do_dead_code(sh->ir, false, prog->SeparateShader))
  ;
}
 
@@ -4387,9 +4388,9 @@ link_shaders(struct gl_context *ctx, struct 
gl_shader_program *prog)
  /* Eliminate code that is now dead 

[Mesa-dev] [PATCH 1/3] glsl: don't eliminate unused outputs in SSO

2015-11-19 Thread Timothy Arceri
From: Timothy Arceri 

Cc: Ian Romanick 
Cc: Gregory Hainaut 
---
 src/glsl/linker.cpp | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 331d9a2..8e7d92e 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -4380,15 +4380,18 @@ link_shaders(struct gl_context *ctx, struct 
gl_shader_program *prog)
 next == MESA_SHADER_FRAGMENT ? num_tfeedback_decls : 0,
 tfeedback_decls);
 
-  demote_shader_inputs_and_outputs(sh_i, ir_var_shader_out);
-  demote_shader_inputs_and_outputs(sh_next, ir_var_shader_in);
+  if (!prog->SeparateShader) {
+ demote_shader_inputs_and_outputs(sh_i, ir_var_shader_out);
+ demote_shader_inputs_and_outputs(sh_next, ir_var_shader_in);
 
-  /* Eliminate code that is now dead due to unused outputs being demoted.
-   */
-  while (do_dead_code(sh_i->ir, false))
- ;
-  while (do_dead_code(sh_next->ir, false))
- ;
+ /* Eliminate code that is now dead due to unused outputs being
+  * demoted.
+  */
+ while (do_dead_code(sh_i->ir, false))
+;
+ while (do_dead_code(sh_next->ir, false))
+;
+  }
 
   /* This must be done after all dead varyings are eliminated. */
   if (!check_against_output_limit(ctx, prog, sh_i))
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] glsl: assign location for all outputs of SSO

2015-11-19 Thread Timothy Arceri
From: Timothy Arceri 

We can have multistage separate programs so assign locations
for outputs even when a consumer is available for SSO.

Cc: Ian Romanick 
Cc: Gregory Hainaut 
---
 src/glsl/link_varyings.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
index 7e77a67..cc8321d 100644
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -1501,7 +1501,7 @@ assign_varying_locations(struct gl_context *ctx,
   * Always add TCS outputs. They are shared by all invocations
   * within a patch and can be used as shared memory.
   */
- if (input_var || (prog->SeparateShader && consumer == NULL) ||
+ if (input_var || (prog->SeparateShader) ||
  producer->Type == GL_TESS_CONTROL_SHADER) {
 matches.record(output_var, input_var);
  }
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] nv50: allow using inline vertex data submit when gl_VertexID is used

2015-11-19 Thread Samuel Pitoiset
The hardware can actually generates vertexid when vertices come from
a client-side buffer like when glDrawElements is used.

This doesn't fix (or break) any piglit tests but it improves the
previous attempt of Ilia (c830d19 "nv50: avoid using inline vertex
data submit when gl_VertexID is used")

The only disadvantage is that only works on G84+, but we don't really
care of that weird and old NV50 chipset.

Changes from v2:
 - use NV84_3D previously introduced
 - reset gl_VertexID after the draw is done

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nv50/nv50_program.c|  3 +-
 src/gallium/drivers/nouveau/nv50/nv50_program.h|  2 +-
 src/gallium/drivers/nouveau/nv50/nv50_push.c   | 42 +-
 .../drivers/nouveau/nv50/nv50_state_validate.c |  3 +-
 src/gallium/drivers/nouveau/nv50/nv50_vbo.c| 11 +-
 5 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.c 
b/src/gallium/drivers/nouveau/nv50/nv50_program.c
index 48057d2..a4b8ddf 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.c
@@ -66,7 +66,6 @@ nv50_vertprog_assign_slots(struct nv50_ir_prog_info *info)
   case TGSI_SEMANTIC_VERTEXID:
  prog->vp.attrs[2] |= NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID;
  prog->vp.attrs[2] |= 
NV50_3D_VP_GP_BUILTIN_ATTR_EN_VERTEX_ID_DRAW_ARRAYS_ADD_START;
- prog->vp.vertexid = 1;
  continue;
   default:
  break;
@@ -383,6 +382,8 @@ nv50_program_translate(struct nv50_program *prog, uint16_t 
chipset,
prog->max_gpr = MAX2(4, (info->bin.maxGPR >> 1) + 1);
prog->tls_space = info->bin.tlsSpace;
 
+   prog->vp.need_vertex_id = info->io.vertexId < PIPE_MAX_SHADER_INPUTS;
+
if (prog->type == PIPE_SHADER_FRAGMENT) {
   if (info->prop.fp.writesDepth) {
  prog->fp.flags[0] |= NV50_3D_FP_CONTROL_EXPORTS_Z;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_program.h 
b/src/gallium/drivers/nouveau/nv50/nv50_program.h
index f001670..1de5122 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_program.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_program.h
@@ -76,9 +76,9 @@ struct nv50_program {
   ubyte psiz;/* output slot of point size */
   ubyte bfc[2];  /* indices into varying for FFC (FP) or BFC (VP) */
   ubyte edgeflag;
-  ubyte vertexid;
   ubyte clpd[2]; /* output slot of clip distance[i]'s 1st component */
   ubyte clpd_nr;
+  bool need_vertex_id;
} vp;
 
struct {
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_push.c 
b/src/gallium/drivers/nouveau/nv50/nv50_push.c
index f31eaa0..cbef95d 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_push.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_push.c
@@ -24,6 +24,10 @@ struct push_context {
struct translate *translate;
 
bool primitive_restart;
+
+   bool need_vertex_id;
+   int32_t index_bias;
+
uint32_t prim;
uint32_t restart_index;
uint32_t instance_id;
@@ -74,6 +78,11 @@ emit_vertices_i08(struct push_context *ctx, unsigned start, 
unsigned count)
 
   size = ctx->vertex_words * nr;
 
+  if (unlikely(ctx->need_vertex_id)) {
+ BEGIN_NV04(ctx->push, NV84_3D(VERTEX_ID_BASE), 1);
+ PUSH_DATA (ctx->push, *elts + ctx->index_bias);
+  }
+
   BEGIN_NI04(ctx->push, NV50_3D(VERTEX_DATA), size);
 
   ctx->translate->run_elts8(ctx->translate, elts, nr, 0, ctx->instance_id,
@@ -107,6 +116,11 @@ emit_vertices_i16(struct push_context *ctx, unsigned 
start, unsigned count)
 
   size = ctx->vertex_words * nr;
 
+  if (unlikely(ctx->need_vertex_id)) {
+ BEGIN_NV04(ctx->push, NV84_3D(VERTEX_ID_BASE), 1);
+ PUSH_DATA (ctx->push, *elts + ctx->index_bias);
+  }
+
   BEGIN_NI04(ctx->push, NV50_3D(VERTEX_DATA), size);
 
   ctx->translate->run_elts16(ctx->translate, elts, nr, 0, ctx->instance_id,
@@ -140,6 +154,11 @@ emit_vertices_i32(struct push_context *ctx, unsigned 
start, unsigned count)
 
   size = ctx->vertex_words * nr;
 
+  if (unlikely(ctx->need_vertex_id)) {
+ BEGIN_NV04(ctx->push, NV84_3D(VERTEX_ID_BASE), 1);
+ PUSH_DATA (ctx->push, *elts + ctx->index_bias);
+  }
+
   BEGIN_NI04(ctx->push, NV50_3D(VERTEX_DATA), size);
 
   ctx->translate->run_elts(ctx->translate, elts, nr, 0, ctx->instance_id,
@@ -161,10 +180,18 @@ emit_vertices_i32(struct push_context *ctx, unsigned 
start, unsigned count)
 static void
 emit_vertices_seq(struct push_context *ctx, unsigned start, unsigned count)
 {
+   uint32_t elts = 0;
+
while (count) {
   unsigned push = MIN2(count, ctx->packet_vertex_limit);
   unsigned size = ctx->vertex_words * push;
 
+  if (unlikely(ctx->need_vertex_id)) {
+ /* For non-indexed draws, gl_VertexID goes up after each vertex. */
+ BEGIN_NV04(ctx->push, NV84_3D(VERTEX_ID_BASE), 1);
+ PUSH_DATA (ctx->push, 

[Mesa-dev] [PATCH 8/9] nir/builder: only read meaningful channels in nir_swizzle()

2015-11-19 Thread Iago Toral Quiroga
From: Connor Abbott 

This way the caller doesn't have to initialize all 4 channels when they
aren't using them.

v2: Fix signed/unsigned comparison warning (Iago)

Reviewed-by: Iago Toral Quiroga 
---
 src/glsl/nir/nir_builder.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/nir/nir_builder.h b/src/glsl/nir/nir_builder.h
index 624329d..1939d82 100644
--- a/src/glsl/nir/nir_builder.h
+++ b/src/glsl/nir/nir_builder.h
@@ -242,7 +242,7 @@ nir_swizzle(nir_builder *build, nir_ssa_def *src, unsigned 
swiz[4],
 {
nir_alu_src alu_src = { NIR_SRC_INIT };
alu_src.src = nir_src_for_ssa(src);
-   for (int i = 0; i < 4; i++)
+   for (unsigned i = 0; i < num_components; i++)
   alu_src.swizzle[i] = swiz[i];
 
return use_fmov ? nir_fmov_alu(build, alu_src, num_components) :
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 9/9] nir: s/nir_type_unsigned/nir_type_uint

2015-11-19 Thread Iago Toral Quiroga
From: Jason Ekstrand 

v2: do the same in tgsi_to_nir (Samuel)

v3: added missing cases after rebase (Iago)

Reviewed-by: Iago Toral Quiroga 
---
 src/gallium/auxiliary/nir/tgsi_to_nir.c|  2 +-
 src/glsl/nir/glsl_to_nir.cpp   |  2 +-
 src/glsl/nir/nir.h |  2 +-
 src/glsl/nir/nir_constant_expressions.py   |  2 +-
 src/glsl/nir/nir_opcodes.py| 78 +++---
 src/glsl/nir/nir_search.c  |  4 +-
 src/mesa/drivers/dri/i965/brw_nir.c|  4 +-
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |  2 +-
 8 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c 
b/src/gallium/auxiliary/nir/tgsi_to_nir.c
index 0539cfc..8918240 100644
--- a/src/gallium/auxiliary/nir/tgsi_to_nir.c
+++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c
@@ -295,7 +295,7 @@ ttn_emit_declaration(struct ttn_compile *c)
  type = nir_type_int;
  break;
   case TGSI_RETURN_TYPE_UINT:
- type = nir_type_unsigned;
+ type = nir_type_uint;
  break;
   case TGSI_RETURN_TYPE_FLOAT:
   default:
diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
index e149d73..3455b49 100644
--- a/src/glsl/nir/glsl_to_nir.cpp
+++ b/src/glsl/nir/glsl_to_nir.cpp
@@ -1826,7 +1826,7 @@ nir_visitor::visit(ir_texture *ir)
   instr->dest_type = nir_type_int;
   break;
case GLSL_TYPE_UINT:
-  instr->dest_type = nir_type_unsigned;
+  instr->dest_type = nir_type_uint;
   break;
default:
   unreachable("not reached");
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index e9d722e..7304517 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -633,7 +633,7 @@ typedef enum {
nir_type_invalid = 0, /* Not a valid type */
nir_type_float,
nir_type_int,
-   nir_type_unsigned,
+   nir_type_uint,
nir_type_bool
 } nir_alu_type;
 
diff --git a/src/glsl/nir/nir_constant_expressions.py 
b/src/glsl/nir/nir_constant_expressions.py
index 2ba8554..b16ef50 100644
--- a/src/glsl/nir/nir_constant_expressions.py
+++ b/src/glsl/nir/nir_constant_expressions.py
@@ -213,7 +213,7 @@ unpack_half_1x16(uint16_t u)
 }
 
 /* Some typed vector structures to make things like src0.y work */
-% for type in ["float", "int", "unsigned", "bool"]:
+% for type in ["float", "int", "uint", "bool"]:
 struct ${type}_vec {
${type} x;
${type} y;
diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
index 729f695..8db24eb 100644
--- a/src/glsl/nir/nir_opcodes.py
+++ b/src/glsl/nir/nir_opcodes.py
@@ -91,7 +91,7 @@ class Opcode(object):
 tfloat = "float"
 tint = "int"
 tbool = "bool"
-tunsigned = "unsigned"
+tuint = "uint"
 
 commutative = "commutative "
 associative = "associative "
@@ -156,7 +156,7 @@ unop("fsqrt", tfloat, "sqrtf(src0)")
 unop("fexp2", tfloat, "exp2f(src0)")
 unop("flog2", tfloat, "log2f(src0)")
 unop_convert("f2i", tfloat, tint, "src0") # Float-to-integer conversion.
-unop_convert("f2u", tfloat, tunsigned, "src0") # Float-to-unsigned conversion
+unop_convert("f2u", tfloat, tuint, "src0") # Float-to-unsigned conversion
 unop_convert("i2f", tint, tfloat, "src0") # Integer-to-float conversion.
 # Float-to-boolean conversion
 unop_convert("f2b", tfloat, tbool, "src0 != 0.0f")
@@ -165,7 +165,7 @@ unop_convert("b2f", tbool, tfloat, "src0 ? 1.0f : 0.0f")
 # Int-to-boolean conversion
 unop_convert("i2b", tint, tbool, "src0 != 0")
 unop_convert("b2i", tbool, tint, "src0 ? 1 : 0") # Boolean-to-int conversion
-unop_convert("u2f", tunsigned, tfloat, "src0") #Unsigned-to-float conversion.
+unop_convert("u2f", tuint, tfloat, "src0") #Unsigned-to-float conversion.
 
 unop_reduce("bany", 1, tbool, tbool, "{src}", "{src0} || {src1}", "{src}")
 unop_reduce("ball", 1, tbool, tbool, "{src}", "{src0} && {src1}", "{src}")
@@ -205,13 +205,13 @@ unop("fddy_coarse", tfloat, "0.0f")
 # Floating point pack and unpack operations.
 
 def pack_2x16(fmt):
-   unop_horiz("pack_" + fmt + "_2x16", 1, tunsigned, 2, tfloat, """
+   unop_horiz("pack_" + fmt + "_2x16", 1, tuint, 2, tfloat, """
 dst.x = (uint32_t) pack_fmt_1x16(src0.x);
 dst.x |= ((uint32_t) pack_fmt_1x16(src0.y)) << 16;
 """.replace("fmt", fmt))
 
 def pack_4x8(fmt):
-   unop_horiz("pack_" + fmt + "_4x8", 1, tunsigned, 4, tfloat, """
+   unop_horiz("pack_" + fmt + "_4x8", 1, tuint, 4, tfloat, """
 dst.x = (uint32_t) pack_fmt_1x8(src0.x);
 dst.x |= ((uint32_t) pack_fmt_1x8(src0.y)) << 8;
 dst.x |= ((uint32_t) pack_fmt_1x8(src0.z)) << 16;
@@ -219,13 +219,13 @@ dst.x |= ((uint32_t) pack_fmt_1x8(src0.w)) << 24;
 """.replace("fmt", fmt))
 
 def unpack_2x16(fmt):
-   unop_horiz("unpack_" + fmt + "_2x16", 2, tfloat, 1, tunsigned, """
+   unop_horiz("unpack_" + fmt + "_2x16", 2, tfloat, 1, tuint, """
 dst.x = unpack_fmt_1x16((uint16_t)(src0.x & 0x));
 dst.y = unpack_fmt_1x16((uint16_t)(src0.x << 16));
 """.replace("fmt", fmt))
 
 def unpack_4x8(fmt):
-   

[Mesa-dev] [PATCH 6/9] i965/fs: add stride restrictions for copy propagation

2015-11-19 Thread Iago Toral Quiroga
From: Connor Abbott 

There are various restrictions on what the hstride can be that depend on
the Gen, and now that we're using hstride == 2 for packing/unpacking
doubles, we're going to run into these restrictions a lot more often.
Pull them out into a separate function, and move the one restriction we
checked previously into it.
---
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 58 +-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index 426ea57..3ae5ead 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -275,6 +275,61 @@ is_logic_op(enum opcode opcode)
opcode == BRW_OPCODE_NOT);
 }
 
+static bool
+can_take_stride(fs_inst *inst, unsigned arg, unsigned stride,
+const brw_device_info *devinfo)
+{
+   if (stride > 4)
+  return false;
+
+   /* 3-source instructions can only be Align16, which restricts what strides
+* they can take. They can only take a stride of 1 (the usual case), or 0
+* with a special "repctrl" bit. But the repctrl bit doesn't work for
+* 64-bit datatypes, so if the source type is 64-bit then only a stride of
+* 1 is allowed. From the Broadwell PRM, Volume 7 "3D Media GPGPU", page
+* 944:
+*
+*This is applicable to 32b datatypes and 16b datatype. 64b datatypes
+*cannot use the replicate control.
+*/
+
+   if (inst->is_3src()) {
+  if (type_sz(inst->src[arg].type) > 4)
+ return stride == 1;
+  else
+ return stride == 1 || stride == 0;
+   }
+
+   /* From the Broadwell PRM, Volume 2a "Command Reference - Instructions",
+* page 391 ("Extended Math Function"):
+*
+* The following restrictions apply for align1 mode: Scalar source is
+* supported. Source and destination horizontal stride must be the
+* same.
+*
+* From the Haswell PRM Volume 2b "Command Reference - Instructions", page
+* 134 ("Extended Math Function"):
+*
+*Scalar source is supported. Source and destination horizontal stride
+*must be 1.
+*
+* and similar language exists for IVB and SNB. Pre-SNB, math instructions
+* are sends, so the sources are moved to MRF's and there are no
+* restrictions.
+*/
+
+   if (inst->is_math()) {
+  if (devinfo->gen == 6 || devinfo->gen == 7) {
+ assert(inst->dst.stride == 1);
+ return stride == 1 || stride == 0;
+  } else if (devinfo->gen >= 8) {
+ return stride == inst->dst.stride || stride == 0;
+  }
+   }
+
+   return true;
+}
+
 bool
 fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
 {
@@ -326,7 +381,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
acp_entry *entry)
/* Bail if the result of composing both strides would exceed the
 * hardware limit.
 */
-   if (entry->src.stride * inst->src[arg].stride > 4)
+   if (!can_take_stride(inst, arg, entry->src.stride * inst->src[arg].stride,
+devinfo))
   return false;
 
/* Bail if the instruction type is larger than the execution type of the
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Iago Toral Quiroga
From: Connor Abbott 

It appears that not only math instructions, but also MOV_BYTES or
any instruction that uses Align1 mode cannot be in the middle
of a dependency control sequence or the GPU will hang (at least on my
BDW). This fixes GPU hangs in some fp64 tests.

Reviewed-by: Iago Toral Quiroga 
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 3bcd5cb..bc0a33b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction 
*inst)
}
 
/*
+* Instructions that use Align1 mode cause the GPU to hang when inserted
+* between a NoDDClr and NoDDChk in Align16 mode. Discovered empirically.
+*/
+
+   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
+   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
+   inst->is_math())
+  return true;
+
+
+   /*
 * mlen:
 * In the presence of send messages, totally interrupt dependency
 * control. They're long enough that the chance of dependency
@@ -851,12 +862,8 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction 
*inst)
 * enable of the last instruction, the optimization must be avoided. This is
 * to avoid instructions being shot down the pipeline when no writes are
 * required.
-*
-* math:
-* Dependency control does not work well over math instructions.
-* NB: Discovered empirically
 */
-   return (inst->mlen || inst->predicate || inst->is_math());
+   return (inst->mlen || inst->predicate);
 }
 
 /**
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/9] Early fp64 fixes

2015-11-19 Thread Iago Toral Quiroga
These patches are fixes extracted from Connor's fp64 branch that
I would like to review and land ahead of the rest. They are
independent of the rest of the series, some of them are even
general fixes unrelated to fp64 that could fix issues in master,
others are fairly trivial and we can easily land them now.

It will help reduce the size of a rather big series (115 patches
and counting) that is still in development and ease future
rebases a little, specially since there is a patch that renames
one of the nir types, so we probably want to land that soon.

Apparently, Connor sent most of these for review back in August but
they did not get reviewed back then. I went ahead and added my Rb to
all but one:

i965/fs: add stride restrictions for copy propagation

I could not verify all the conditions (particularly, the cases where we allow
0 strides), so it would be nice if someone else could review that or point me
to the place in the docs where it says that 0 strides are safe for these
situations. Comments/Reviews to the other patches are also welcome :)

No piglit regressions on IVB.

Connor Abbott (8):
  i965/fs: print non-1 strides when dumping instructions
  i965/fs: print writemask_all when it's enabled
  i965: fix 64-bit immediates in brw_inst(_set)_bits
  i965/fs: respect force_sechalf/force_writemask_all in CSE
  i965/fs: don't propagate cmod when the exec sizes differ
  i965/fs: add stride restrictions for copy propagation
  i965/vec4: avoid dependency control around Align1 instructions
  nir/builder: only read meaningful channels in nir_swizzle()

Jason Ekstrand (1):
  nir: s/nir_type_unsigned/nir_type_uint

 src/gallium/auxiliary/nir/tgsi_to_nir.c|  2 +-
 src/glsl/nir/glsl_to_nir.cpp   |  2 +-
 src/glsl/nir/nir.h |  2 +-
 src/glsl/nir/nir_builder.h |  2 +-
 src/glsl/nir/nir_constant_expressions.py   |  2 +-
 src/glsl/nir/nir_opcodes.py| 78 +++---
 src/glsl/nir/nir_search.c  |  4 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 15 +
 .../drivers/dri/i965/brw_fs_cmod_propagation.cpp   |  3 +
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 58 +++-
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |  2 +
 src/mesa/drivers/dri/i965/brw_inst.h   |  6 +-
 src/mesa/drivers/dri/i965/brw_nir.c|  4 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 +++--
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |  2 +-
 15 files changed, 142 insertions(+), 57 deletions(-)

-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/9] i965/fs: print non-1 strides when dumping instructions

2015-11-19 Thread Iago Toral Quiroga
From: Connor Abbott 

v2:
  - Simplify code (Iago)

Reviewed-by: Iago Toral Quiroga 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 72a2158..2d4ed6a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -4669,6 +4669,8 @@ fs_visitor::dump_instruction(backend_instruction 
*be_inst, FILE *file)
case IMM:
   unreachable("not reached");
}
+   if (inst->dst.stride != 1)
+  fprintf(file, "<%u>", inst->dst.stride);
fprintf(file, ":%s, ", brw_reg_type_letters(inst->dst.type));
 
for (int i = 0; i < inst->sources; i++) {
@@ -4756,6 +4758,16 @@ fs_visitor::dump_instruction(backend_instruction 
*be_inst, FILE *file)
  fprintf(file, "|");
 
   if (inst->src[i].file != IMM) {
+ unsigned stride;
+ if (inst->src[i].file == ARF || inst->src[i].file == FIXED_GRF) {
+unsigned hstride = inst->src[i].hstride;
+stride = (hstride == 0 ? 0 : (1 << (hstride - 1)));
+ } else {
+stride = inst->src[i].stride;
+ }
+ if (stride != 1)
+fprintf(file, "<%u>", stride);
+
  fprintf(file, ":%s", brw_reg_type_letters(inst->src[i].type));
   }
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/9] i965/fs: respect force_sechalf/force_writemask_all in CSE

2015-11-19 Thread Iago Toral Quiroga
From: Connor Abbott 

Reviewed-by: Iago Toral Quiroga 
---
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index 3c40fcd..3b65a38 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -210,6 +210,8 @@ create_copy_instr(const fs_builder , fs_inst *inst, 
fs_reg src, bool negate)
   copy = bld.LOAD_PAYLOAD(inst->dst, payload, sources, header_size);
} else {
   copy = bld.MOV(inst->dst, src);
+  copy->force_sechalf = inst->force_sechalf;
+  copy->force_writemask_all = inst->force_writemask_all;
   copy->src[0].negate = negate;
}
assert(copy->regs_written == written);
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/9] i965/fs: don't propagate cmod when the exec sizes differ

2015-11-19 Thread Iago Toral Quiroga
From: Connor Abbott 

This can happen when the source of the compare was split by the SIMD
lowering pass. Potentially, we could allow the case where the exec size
of scan_inst is larger, and scan_inst has the right quarter selected,
but doing that seems a little more risky.

Reviewed-by: Iago Toral Quiroga 
---
 src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
index 8fdc959..93461f7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
@@ -93,6 +93,9 @@ opt_cmod_propagation_local(bblock_t *block)
 scan_inst->dst.reg_offset != inst->src[0].reg_offset)
break;
 
+if (scan_inst->exec_size != inst->exec_size)
+   break;
+
 /* CMP's result is the same regardless of dest type. */
 if (inst->conditional_mod == BRW_CONDITIONAL_NZ &&
 scan_inst->opcode == BRW_OPCODE_CMP &&
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/9] i965/fs: print writemask_all when it's enabled

2015-11-19 Thread Iago Toral Quiroga
From: Connor Abbott 

Reviewed-by: Iago Toral Quiroga 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 2d4ed6a..63dcba7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -4787,6 +4787,9 @@ fs_visitor::dump_instruction(backend_instruction 
*be_inst, FILE *file)
  fprintf(file, "1sthalf ");
}
 
+   if (inst->force_writemask_all)
+  fprintf(file, "WE_all ");
+
fprintf(file, "\n");
 }
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] nv50: add NV84_3D macro

2015-11-19 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 +-
 src/gallium/drivers/nouveau/nv50/nv50_vbo.c| 4 ++--
 src/gallium/drivers/nouveau/nv50/nv50_winsys.h | 1 +
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index 4e7201d..cc7984d 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -686,7 +686,7 @@ nv50_screen_init_hwctx(struct nv50_screen *screen)
BEGIN_NV04(push, NV50_3D(VB_ELEMENT_BASE), 1);
PUSH_DATA (push, 0);
if (screen->base.class_3d >= NV84_3D_CLASS) {
-  BEGIN_NV04(push, SUBC_3D(NV84_3D_VERTEX_ID_BASE), 1);
+  BEGIN_NV04(push, NV84_3D(VERTEX_ID_BASE), 1);
   PUSH_DATA (push, 0);
}
 
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c 
b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
index 9aa593f..ac0c4d9 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
@@ -487,7 +487,7 @@ nv50_draw_arrays(struct nv50_context *nv50,
   BEGIN_NV04(push, NV50_3D(VB_ELEMENT_BASE), 1);
   PUSH_DATA (push, 0);
   if (nv50->screen->base.class_3d >= NV84_3D_CLASS) {
- BEGIN_NV04(push, SUBC_3D(NV84_3D_VERTEX_ID_BASE), 1);
+ BEGIN_NV04(push, NV84_3D(VERTEX_ID_BASE), 1);
  PUSH_DATA (push, 0);
   }
   nv50->state.index_bias = 0;
@@ -613,7 +613,7 @@ nv50_draw_elements(struct nv50_context *nv50, bool shorten,
   BEGIN_NV04(push, NV50_3D(VB_ELEMENT_BASE), 1);
   PUSH_DATA (push, index_bias);
   if (nv50->screen->base.class_3d >= NV84_3D_CLASS) {
- BEGIN_NV04(push, SUBC_3D(NV84_3D_VERTEX_ID_BASE), 1);
+ BEGIN_NV04(push, NV84_3D(VERTEX_ID_BASE), 1);
  PUSH_DATA (push, index_bias);
   }
   nv50->state.index_bias = index_bias;
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_winsys.h 
b/src/gallium/drivers/nouveau/nv50/nv50_winsys.h
index 76f1b41..6800230 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_winsys.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_winsys.h
@@ -49,6 +49,7 @@ PUSH_REFN(struct nouveau_pushbuf *push, struct nouveau_bo 
*bo, uint32_t flags)
 
 #define SUBC_3D(m) 3, (m)
 #define NV50_3D(n) SUBC_3D(NV50_3D_##n)
+#define NV84_3D(n) SUBC_3D(NV84_3D_##n)
 #define NVA0_3D(n) SUBC_3D(NVA0_3D_##n)
 
 #define SUBC_2D(m) 4, (m)
-- 
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] st/va: properly set max number of reference frames

2015-11-19 Thread Julien Isorce
It fixes asserts like assert(templ->max_references <= 2) in
nvc0_video.c::nvc0_create_decodier
This patch also post-update the correct value for the number
of reference frames in the h264 case, see below.

In VA-API the max num ref is retrieved later in handlePictureParameterBuffer.
Unfortunately by this time the decoder has been already created
in vaCreateContextwhich which one does not have any max_references
param compared to VDPAU api.

In vdpau-driver they delay the decoder creation to endPicture.
This is not practicable with gallium. But that's ok our buffer
will be bigger but at least they will have enough space and
st/va will still write correct value for the number of reference
frames in the hw buffer.

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/context.c | 10 +++---
 src/gallium/state_trackers/va/picture.c |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/va/context.c 
b/src/gallium/state_trackers/va/context.c
index ec9e048..b3dd293 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -248,10 +248,14 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID 
config_id, int picture_width,
   templat.max_references = num_render_targets;
   templat.expect_chunked_decode = true;
 
+  /* XXX HEVC ? */
   if (u_reduce_video_profile(templat.profile) ==
-PIPE_VIDEO_FORMAT_MPEG4_AVC)
-templat.level = u_get_h264_level(templat.width, templat.height,
- _references);
+ PIPE_VIDEO_FORMAT_MPEG4_AVC) {
+ templat.level = u_get_h264_level(templat.width, templat.height,
+_references);
+  } else {
+ templat.max_references = 2;
+  }
 
   context->decoder = drv->pipe->create_video_codec(drv->pipe, );
   if (!context->decoder) {
diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 5e7841a..9d4d1a8 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -146,6 +146,7 @@ handlePictureParameterBuffer(vlVaDriver *drv, vlVaContext 
*context, vlVaBuffer *
   /*bit_depth_luma_minus8*/
   /*bit_depth_chroma_minus8*/
   context->desc.h264.num_ref_frames = h264->num_ref_frames;
+  context->decoder->max_references = 
MIN2(context->desc.h264.num_ref_frames, 16);
   /*chroma_format_idc*/
   /*residual_colour_transform_flag*/
   /*gaps_in_frame_num_value_allowed_flag*/
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/4] st/va: fixes h264

2015-11-19 Thread Julien Isorce
This patches make working the pending series:
"[PATCH v4 0/6] nouveau: add support for vaapi"

When comparing bsp_bo nouveau_bo content between vaapi
and vdpau I found that the h264 case was completely missing
parsing of reference frames info in st/va.

Note that for patch 3/4 I am not sure for h265 case.
Note that for patch 4/4 I am not sure which solution
to take, 1 or 2. But both are working.

Tested on clips available here: http://www.h264info.com/clips.html

Julien Isorce (4):
  nouveau: move interlaced assert down in
nouveau_vp3_video_buffer_create
  st/va: add missing profiles in PipeToProfile's switch.
  st/va: properly set max number of reference frames
  st/va: also retrieve reference frames info for h264

 src/gallium/drivers/nouveau/nouveau_vp3_video.c |  2 +-
 src/gallium/state_trackers/va/context.c | 10 +++-
 src/gallium/state_trackers/va/picture.c | 77 +
 src/gallium/state_trackers/va/va_private.h  |  7 +++
 4 files changed, 92 insertions(+), 4 deletions(-)

-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/4] st/va: also retrieve reference frames info for h264

2015-11-19 Thread Julien Isorce
st/va was completely missing parsing of:
VAPictureParameterBufferH264.ReferenceFrames[16]

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/picture.c | 76 +
 1 file changed, 76 insertions(+)

diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 9d4d1a8..1259a88 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -97,6 +97,8 @@ handlePictureParameterBuffer(vlVaDriver *drv, vlVaContext 
*context, vlVaBuffer *
vlVaSurface *surf_forward;
vlVaSurface *surf_backward;
unsigned int i;
+   /* Solution 1 */
+   unsigned int top_or_bottom_field;
static const uint8_t default_intra_quant_matrix[64] = { 0 };
static const uint8_t default_non_intra_quant_matrix[64] = { 0 };
 
@@ -193,8 +195,82 @@ handlePictureParameterBuffer(vlVaDriver *drv, vlVaContext 
*context, vlVaBuffer *
  h264->pic_fields.bits.deblocking_filter_control_present_flag;
   context->desc.h264.pps->redundant_pic_cnt_present_flag =
  h264->pic_fields.bits.redundant_pic_cnt_present_flag;
+
+  /* Solution 1 */
+  context->desc.h264.bottom_field_flag =
+ h264->pic_fields.bits.field_pic_flag &&
+ (h264->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD) != 0;
+  /* Solution 2 */
+  /*context->desc.h264.bottom_field_flag =
+ !!(h264->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD);*/
+
   /*reference_pic_flag*/
+  context->desc.h264.is_reference = 
h264->pic_fields.bits.reference_pic_flag;
   context->desc.h264.frame_num = h264->frame_num;
+
+  for (i = 0; i < context->decoder->max_references; ++i) {
+ if ((h264->ReferenceFrames[i].flags & VA_PICTURE_H264_INVALID) ||
+ (h264->ReferenceFrames[i].picture_id == VA_INVALID_SURFACE)) {
+context->desc.h264.ref[i] = NULL;
+context->desc.h264.frame_num_list[i] = 0;
+context->desc.h264.is_long_term[i] = 0;
+context->desc.h264.top_is_reference[i] = 0;
+context->desc.h264.bottom_is_reference[i] = 0;
+context->desc.h264.field_order_cnt_list[i][0] = 0;
+context->desc.h264.field_order_cnt_list[i][1] = 0;
+break;
+ }
+ getReferenceFrame(drv, h264->ReferenceFrames[i].picture_id, 
>desc.h264.ref[i]);
+ context->desc.h264.frame_num_list[i] = 
h264->ReferenceFrames[i].frame_idx;
+
+ /* Solution 1 */
+ top_or_bottom_field = h264->ReferenceFrames[i].flags &
+(VA_PICTURE_H264_TOP_FIELD | VA_PICTURE_H264_BOTTOM_FIELD);
+ context->desc.h264.is_long_term[i] = (h264->ReferenceFrames[i].flags &
+(VA_PICTURE_H264_SHORT_TERM_REFERENCE |
+VA_PICTURE_H264_LONG_TERM_REFERENCE)) !=
+VA_PICTURE_H264_SHORT_TERM_REFERENCE;
+ context->desc.h264.top_is_reference[i] =
+!context->desc.h264.is_long_term[i] ||
+!!(h264->ReferenceFrames[i].flags & VA_PICTURE_H264_TOP_FIELD);
+ context->desc.h264.bottom_is_reference[i] =
+!context->desc.h264.is_long_term[i] ||
+!!(h264->ReferenceFrames[i].flags & VA_PICTURE_H264_BOTTOM_FIELD);
+ context->desc.h264.field_order_cnt_list[i][0] =
+top_or_bottom_field != VA_PICTURE_H264_BOTTOM_FIELD ?
+h264->ReferenceFrames[i].TopFieldOrderCnt: INT_MAX;
+ context->desc.h264.field_order_cnt_list[i][1] =
+top_or_bottom_field != VA_PICTURE_H264_TOP_FIELD ?
+h264->ReferenceFrames[i].BottomFieldOrderCnt: INT_MAX;
+ /* Solution 2 */
+ /*context->desc.h264.is_long_term[i] = 
(h264->ReferenceFrames[i].flags &
+VA_PICTURE_H264_LONG_TERM_REFERENCE) != 0;
+ if ((h264->ReferenceFrames[i].flags &
+(VA_PICTURE_H264_TOP_FIELD | VA_PICTURE_H264_BOTTOM_FIELD)) == 0) {
+context->desc.h264.top_is_reference[i]  = 1;
+context->desc.h264.bottom_is_reference[i] = 1;
+ } else {
+context->desc.h264.top_is_reference[i] =
+   (h264->ReferenceFrames[i].flags & VA_PICTURE_H264_TOP_FIELD) != 
0;
+context->desc.h264.bottom_is_reference[i] =
+   (h264->ReferenceFrames[i].flags & VA_PICTURE_H264_BOTTOM_FIELD) 
!= 0;
+ }
+ context->desc.h264.field_order_cnt_list[i][0] =
+h264->ReferenceFrames[i].TopFieldOrderCnt;
+ context->desc.h264.field_order_cnt_list[i][1] =
+h264->ReferenceFrames[i].BottomFieldOrderCnt;*/
+  }
+
+  /* Make sure remaining elements are clean */
+  for (; i < 16; ++i) {
+ context->desc.h264.ref[i] = NULL;
+ context->desc.h264.frame_num_list[i] = 0;
+ context->desc.h264.is_long_term[i] = 0;
+ context->desc.h264.top_is_reference[i] = 0;
+ context->desc.h264.bottom_is_reference[i] = 0;
+ 

[Mesa-dev] [PATCH 2/4] st/va: add missing profiles in PipeToProfile's switch.

2015-11-19 Thread Julien Isorce
Otherwise assert is raised from vlVaQueryConfigProfiles's for loop.

Signed-off-by: Julien Isorce 
---
 src/gallium/state_trackers/va/va_private.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/gallium/state_trackers/va/va_private.h 
b/src/gallium/state_trackers/va/va_private.h
index 2b645d0..312bda9 100644
--- a/src/gallium/state_trackers/va/va_private.h
+++ b/src/gallium/state_trackers/va/va_private.h
@@ -148,6 +148,13 @@ PipeToProfile(enum pipe_video_profile profile)
case PIPE_VIDEO_PROFILE_HEVC_MAIN:
   return VAProfileHEVCMain;
case PIPE_VIDEO_PROFILE_MPEG4_AVC_EXTENDED:
+   case PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH10:
+   case PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH422:
+   case PIPE_VIDEO_PROFILE_MPEG4_AVC_HIGH444:
+   case PIPE_VIDEO_PROFILE_HEVC_MAIN_10:
+   case PIPE_VIDEO_PROFILE_HEVC_MAIN_12:
+   case PIPE_VIDEO_PROFILE_HEVC_MAIN_STILL:
+   case PIPE_VIDEO_PROFILE_HEVC_MAIN_444:
case PIPE_VIDEO_PROFILE_UNKNOWN:
   return VAProfileNone;
default:
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/4] nouveau: move interlaced assert down in nouveau_vp3_video_buffer_create

2015-11-19 Thread Julien Isorce
templat->interlaced is 0 if not NV12 which is the case currently
when using VPP.

Signed-off-by: Julien Isorce 
---
 src/gallium/drivers/nouveau/nouveau_vp3_video.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nouveau_vp3_video.c 
b/src/gallium/drivers/nouveau/nouveau_vp3_video.c
index f3a64b2..411b853 100644
--- a/src/gallium/drivers/nouveau/nouveau_vp3_video.c
+++ b/src/gallium/drivers/nouveau/nouveau_vp3_video.c
@@ -83,10 +83,10 @@ nouveau_vp3_video_buffer_create(struct pipe_context *pipe,
struct pipe_sampler_view sv_templ;
struct pipe_surface surf_templ;
 
-   assert(templat->interlaced);
if (getenv("XVMC_VL") || templat->buffer_format != PIPE_FORMAT_NV12)
   return vl_video_buffer_create(pipe, templat);
 
+   assert(templat->interlaced);
assert(templat->chroma_format == PIPE_VIDEO_CHROMA_FORMAT_420);
 
buffer = CALLOC_STRUCT(nouveau_vp3_video_buffer);
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeon/llvm: Use llvm.AMDIL.exp intrinsic again for now

2015-11-19 Thread Nicolai Hähnle

On 19.11.2015 03:55, Tom Stellard wrote:

On Thu, Nov 19, 2015 at 11:31:55AM +0900, Michel Dänzer wrote:

From: Michel Dänzer 

llvm.exp2.f32 doesn't work in some cases yet.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709
Signed-off-by: Michel Dänzer 
---

Once the problem is fixed in the LLVM AMDGPU backend, we can re-enable
llvm.exp2.f32 for the fixed LLVM version.

  src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c 
b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
index ac99e73..c94f109 100644
--- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
+++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
@@ -1539,7 +1539,7 @@ void radeon_llvm_context_init(struct radeon_llvm_context 
* ctx)
bld_base->op_actions[TGSI_OPCODE_ENDIF].emit = endif_emit;
bld_base->op_actions[TGSI_OPCODE_ENDLOOP].emit = endloop_emit;
bld_base->op_actions[TGSI_OPCODE_EX2].emit = build_tgsi_intrinsic_nomem;
-   bld_base->op_actions[TGSI_OPCODE_EX2].intr_name = "llvm.exp2.f32";
+   bld_base->op_actions[TGSI_OPCODE_EX2].intr_name = "llvm.AMDIL.exp.";


Do we want a native instruction here, or do we want IEEE precise exp2?
If it's the former then we shouldn't be using llvm.exp2.f32 anyway.

I know that we need to use llvm.AMDIL.exp. for older LLVM, but for newer
LLVM, I would really like to start doing intrinsics the correct way.  In
this case, it means adding an llvm.amdgcn.exp.f32 intrinsic to
include/llvm/IR/IntrinsicsAMDGPU.td.  In the section with the amdgcn
TargetPrefix.


Doesn't AMDGPU currently emit native instructions for the various math 
intrinsics anyway? If your plan is to change that and we add our own 
intrinsics, what's the plan to still benefit from existing LLVM 
optimization passes?


FWIW, the original bug that Michel addresses here is caused by LLVM 
libcall optimizations replacing the exp2 _intrinsic_ by a ldexp 
_libcall_, which AMDGPU codegen cannot deal with. I suggested to address 
this with http://reviews.llvm.org/D14327. Could you take a look at that?


Cheers,
Nicolai
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 10/42] i965: Calculate appropriate L3 partition weights for the current pipeline state.

2015-11-19 Thread Kristian Høgsberg
On Thu, Nov 19, 2015 at 4:24 AM, Francisco Jerez  wrote:
> Kristian Høgsberg  writes:
>
>> On Tue, Nov 17, 2015 at 9:54 PM, Jordan Justen
>>  wrote:
>>> From: Francisco Jerez 
>>>
>>> This calculates a rather conservative partitioning of the L3 cache
>>> based on the shaders currently bound to the pipeline and whether they
>>> use SLM, atomics, images or scratch space.  The result is intended to
>>> be fine-tuned later on based on other pipeline state.
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_compiler.h  |  1 +
>>>  src/mesa/drivers/dri/i965/gen7_l3_state.c | 53 
>>> +++
>>>  2 files changed, 54 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
>>> b/src/mesa/drivers/dri/i965/brw_compiler.h
>>> index 8f147d3..ef8bddb 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_compiler.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_compiler.h
>>> @@ -300,6 +300,7 @@ struct brw_stage_prog_data {
>>>
>>> unsigned curb_read_length;
>>> unsigned total_scratch;
>>> +   unsigned total_shared;
>>>
>>> /**
>>>  * Register where the thread expects to find input data from the URB
>>> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
>>> b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>> index 4d0cfcd..1a88261 100644
>>> --- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>>> @@ -258,6 +258,59 @@ get_l3_config(const struct brw_device_info *devinfo, 
>>> struct brw_l3_weights w0)
>>>  }
>>>
>>>  /**
>>> + * Return a reasonable default L3 configuration for the specified device 
>>> based
>>> + * on whether SLM and DC are required.  In the non-SLM non-DC case the 
>>> result
>>> + * is intended to approximately resemble the hardware defaults.
>>> + */
>>> +static struct brw_l3_weights
>>> +get_default_l3_weights(const struct brw_device_info *devinfo,
>>> +   bool needs_dc, bool needs_slm)
>>> +{
>>> +   struct brw_l3_weights w = {{ 0 }};
>>> +
>>> +   w.w[L3P_SLM] = needs_slm;
>>> +   w.w[L3P_URB] = 1.0;
>>> +
>>> +   if (devinfo->gen >= 8) {
>>> +  w.w[L3P_ALL] = 1.0;
>>> +   } else {
>>> +  w.w[L3P_DC] = needs_dc ? 0.1 : 0;
>>> +  w.w[L3P_RO] = devinfo->is_baytrail ? 0.5 : 1.0;
>>> +   }
>>> +
>>> +   return norm_l3_weights(w);
>>> +}
>>> +
>>> +/**
>>> + * Calculate the desired L3 partitioning based on the current state of the
>>> + * pipeline.  For now this simply returns the conservative defaults 
>>> calculated
>>> + * by get_default_l3_weights(), but we could probably do better by 
>>> gathering
>>> + * more statistics from the pipeline state (e.g. guess of expected URB 
>>> usage
>>> + * and bound surfaces), or by using feed-back from performance counters.
>>> + */
>>> +static struct brw_l3_weights
>>> +get_pipeline_state_l3_weights(const struct brw_context *brw)
>>> +{
>>> +   const struct brw_stage_state *stage_states[] = {
>>> +  >vs.base, >gs.base, >wm.base, >cs.base
>>> +   };
>>> +   bool needs_dc = false, needs_slm = false;
>>
>> This doesn't seem optimal - we should evaluate the 3D pipe and the
>> compute pipe separately depending on which  one is active. For
>> example, if we have a current compute program that uses SLM, but are
>> using the 3D pipeline, we'll get a partition that includes SLM even
>> for the 3D pipe.
>>
> The intention of this patch is not to provide an optimal heuristic, but
> to implement a simple heuristic that calculates conservative defaults in
> order to guarantee functional correctness.  It would be possible to base
> the result on the currently active pipeline with minimal changes to this
> function (and making sure that the L3 config atom is invalidated while
> switching pipelines), but I don't think we want to switch back and forth
> between SLM and non-SLM configurations if the application interleaves
> draw and compute operations, because the L3 partitioning is global
> rather than per-pipeline and transitions are expensive -- Switching
> between L3 configuration requires a full pipeline stall and flushing and
> invalidation of all L3-backed caches, doing what you suggest would
> likely lead to cache-thrashing and might prevent pipelining of compute
> and render workloads [At least on BDW+ -- On Gen7 it might be impossible
> to achieve pipelining of render and compute workloads due to hardware
> issues].  If the result of the heuristic is based on the currently
> active pipeline some sort of hysteresis will likely be desirable, which
> in this patch I get for free by basing the calculation on the context
> state rather than on the currently active pipeline.  I agree with you
> that we'll probably need a more sophisticated heuristic in the future,
> but that falls outside the scope of this series -- If anything we need
> to be able to run benchmarks exercising CS and SLM before we can find
> out which kind of heuristic we want and what degree 

Re: [Mesa-dev] [RFCv2 03/13] nir: allow pre-resolved sampler uniform locations

2015-11-19 Thread Rob Clark
On Mon, Nov 9, 2015 at 4:08 PM, Timothy Arceri  wrote:
> On Mon, 2015-11-09 at 07:43 -0500, Rob Clark wrote:
>> On Sun, Nov 8, 2015 at 7:58 PM, Timothy Arceri 
>> wrote:
>> > On Sun, 2015-11-08 at 15:12 -0500, Rob Clark wrote:
>> > > From: Rob Clark 
>> > >
>> > > With TGSI, the ir_variable::data.location gets fixed up to be a stage
>> > > local location (rather than program global).  In this case we need to
>> > > skip the UniformStorage[location] lookup.
>> > > ---
>> > >  src/glsl/nir/nir_lower_samplers.c | 23 ---
>> > >  1 file changed, 16 insertions(+), 7 deletions(-)
>> > >
>> > > diff --git a/src/glsl/nir/nir_lower_samplers.c
>> > > b/src/glsl/nir/nir_lower_samplers.c
>> > > index 5df79a6..d99ba4c 100644
>> > > --- a/src/glsl/nir/nir_lower_samplers.c
>> > > +++ b/src/glsl/nir/nir_lower_samplers.c
>> > > @@ -130,14 +130,18 @@ lower_sampler(nir_tex_instr *instr, const struct
>> > > gl_shader_program *shader_progr
>> > >instr->sampler_array_size = array_elements;
>> > > }
>> > >
>> > > -   if (location > shader_program->NumUniformStorage - 1 ||
>> > > -   !shader_program->UniformStorage[location].opaque[stage].active)
>> > > {
>> > > -  assert(!"cannot return a sampler");
>> > > -  return;
>> > > -   }
>> > > +   if (!shader_program) {
>> > > +  instr->sampler_index = location;
>> > > +   } else {
>> > > +  if (location > shader_program->NumUniformStorage - 1 ||
>> > > +  !shader_program
>> > > ->UniformStorage[location].opaque[stage].active) {
>> > > + assert(!"cannot return a sampler");
>> > > + return;
>> > > +  }
>> > >
>> > > -   instr->sampler_index +=
>> > > -  shader_program->UniformStorage[location].opaque[stage].index;
>> > > +  instr->sampler_index =
>> > > + shader_program->UniformStorage[location].opaque[stage].index;
>> >
>> > Hi Rob,
>> >
>> > This will break arrays as instr->sampler_index is increamented inside
>> >  calc_sampler_offsets()
>>
>> oh, whoops, I didn't notice that.. ok, that part is easy enough to fix..
>>
>> > calc_sampler_offsets() also modifies the value of location is this what
>> > you
>> > want? I would assume not as we are counting uniforms not just samplers
>> > here.
>>
>> hmm, tbh I'm not entirely sure..  offhand, what piglit's should I
>> check?
>
> tests/spec/arb_gpu_shader5/execution/sampler_array_indexing
>
> Contains the tests you probably want to try out.

oh, hmm, looks like they all need at least gl3.2..

BR,
-R

>>  I guess it would be easier to debug if it worked correctly
>> with glsl_to_tgsi, but I guess I could try to get the non-indirect
>> case working..
>>
>> BR,
>> -R
>>
>> > The other thing to note is that glsl to tgsi doesn't handle indirects on
>> > structs or arrays of arrays correctly (Ilia was trying to fix this).
>> >
>> > Tim
>> >
>> >
>> >
>> > > +   }
>> > >
>> > > instr->sampler = NULL;
>> > >  }
>> > > @@ -177,6 +181,11 @@ lower_impl(nir_function_impl *impl, const struct
>> > > gl_shader_program *shader_progr
>> > > nir_foreach_block(impl, lower_block_cb, );
>> > >  }
>> > >
>> > > +/* Call with a null 'shader_program' if uniform locations are
>> >
>> > uniform locations -> sampler indices?
>> >
>> > > + * already local to the shader, ie. skipping the
>> > > + * shader_program->UniformStorage[location].opaque[stage].index
>> > > + * lookup
>> > > + */
>> > >  void
>> > >  nir_lower_samplers(nir_shader *shader,
>> > > const struct gl_shader_program *shader_program)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>> wrote:
>>> From: Connor Abbott 
>>>
>>> It appears that not only math instructions, but also MOV_BYTES or
>>> any instruction that uses Align1 mode cannot be in the middle
>>> of a dependency control sequence or the GPU will hang (at least on my
>>> BDW). This fixes GPU hangs in some fp64 tests.
>>
>> I'm pretty surprised by this assessment. Doubtful even.
>>
>>> Reviewed-by: Iago Toral Quiroga 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> index 3bcd5cb..bc0a33b 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
>>> vec4_instruction *inst)
>>> }
>>>
>>> /*
>>> +* Instructions that use Align1 mode cause the GPU to hang when inserted
>>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
>>> empirically.
>>> +*/
>>> +
>>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>>> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>>
>> PACK_BYTES sets depctrl itself in the generator, and at the time I
>> added it I made a test that did
>>
>>   vec4 foo = vec4(packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...))
>>
>> and confirmed that it set depctrl properly on the whole sequence.
>> There could of course be bugs somewhere, but the "hardware doesn't
>> work if you mix align1 and align16 depctrl" seems unlikely.
>>
>> Do you know of a test that this affects?
>
> This only affects FP64 tests, since there we use an align1 mov to do
> double-to-float and float-to-double. However, I tried commenting out
> emit_nir_code() and just doing essentially:
>
> emit(MOV(...))->force_writemask_all = true;
> emit(VEC4_OPCODE_PACK_BYTES, ...);
> emit(MOV(...))->force_writemask_all = true;
>
> and on my BDW it hanged. In case it's not clear: this isn't about
> setting depctrl on the instruction itself, it just can't be inside of
> a depctrl sequence (which we were already disallowing for math
> instructions anyways).

Very weird. I'll take a look. So I understand, are the MOV
instructions writing different channels of the same register? And
VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
as the MOVs? (I saw your fixup reply)

By the way, the math code is too heavy handed as far as I know. The
BDW+ docs say that the MATH instruction itself cannot take dependency
control hints (and empirically earlier platforms seem to have problems
with this as well, see
tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a
math instruction being in the middle of a NoDDC* block. The person who
implemented the math did the minimal amount of work to fix the
problem.

The PRM also says:

"""
Instructions other than send, may use this control as long as
operations that have different pipeline latencies are not mixed. The
operations that have longer latencies are:

Opcodes pln, lrp, dp*.
Operations involving double precision computation.
Integer DW multiplication where both source operands are DWs.
"""

I would say that mixing a double-precision operation and something
else might cause problems, but that seems like we have a different
problem thus far.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 19/42] glsl ubo/ssbo: Move common code into lower_buffer_access::setup_buffer_access

2015-11-19 Thread Jordan Justen
On 2015-11-18 00:00:08, Iago Toral wrote:
> On Tue, 2015-11-17 at 21:54 -0800, Jordan Justen wrote:
> > This code will also be usable by the pass to lower shared variables.
> > 
> > Note, that *const_offset is adjusted by setup_buffer_access so it must
> > be initialized before calling setup_buffer_access.
> > 
> > v2:
> >  * Add comment for lower_buffer_access::setup_buffer_access
> > 
> > Signed-off-by: Jordan Justen 
> > Cc: Samuel Iglesias Gonsalvez 
> > Cc: Iago Toral Quiroga 
> > Reviewed-by: Iago Toral Quiroga 
> > ---
> >  src/glsl/lower_buffer_access.cpp | 177 
> > +++
> >  src/glsl/lower_buffer_access.h   |   5 ++
> >  src/glsl/lower_ubo_reference.cpp | 160 +--
> >  3 files changed, 185 insertions(+), 157 deletions(-)
> > 
> > diff --git a/src/glsl/lower_buffer_access.cpp 
> > b/src/glsl/lower_buffer_access.cpp
> > index b5fe6e3..297ed69 100644
> > --- a/src/glsl/lower_buffer_access.cpp
> > +++ b/src/glsl/lower_buffer_access.cpp
> > @@ -305,4 +305,181 @@ 
> > lower_buffer_access::is_dereferenced_thing_row_major(const ir_rvalue *deref)
> > return false;
> >  }
> >  
> > +/**
> > + * This function initializes various values that will be used later by
> > + * emit_access when actually emitting loads or stores.
> > + *
> > + * Note: const_offset is an input as well as an output. For UBO and SSBO, 
> > the
> > + * caller should initialize it to 0 to point to the start of the buffer
> > + * object. For compute shader shared variables it will be initialized to 
> > the
> > + * offset of variable in the shared variable storage block.
> 
> I think this is not true, your version changes UBO and SSBO to behave
> just like shader variables since you no longer ever update *const_offset
> when you find an ir_type_dereference_variable node. Instead, you always
> initialize *const_offset to the value of ubo_var->Offset in
> setup_for_load_or_store. I think you can just rewrite the note comment
> above as:
> 
> "const_offset is an input as well as an output, clients must initialize
> it to the offset of the variable in the underlying block, and this
> function will adjust it by adding the constant offset of the member
> being accessed into that variable"

Yeah, that sounds good. I'll use your version.

Thanks,

-Jordan

> 
> > + */
> > +void
> > +lower_buffer_access::setup_buffer_access(void *mem_ctx,
> > + ir_variable *var,
> > + ir_rvalue *deref,
> > + ir_rvalue **offset,
> > + unsigned *const_offset,
> > + bool *row_major,
> > + int *matrix_columns,
> > + unsigned packing)
> > +{
> > +   *offset = new(mem_ctx) ir_constant(0u);
> > +   *row_major = is_dereferenced_thing_row_major(deref);
> > +   *matrix_columns = 1;
> > +
> > +   /* Calculate the offset to the start of the region of the UBO
> > +* dereferenced by *rvalue.  This may be a variable offset if an
> > +* array dereference has a variable index.
> > +*/
> > +   while (deref) {
> > +  switch (deref->ir_type) {
> > +  case ir_type_dereference_variable: {
> > + deref = NULL;
> > + break;
> > +  }
> > +
> > +  case ir_type_dereference_array: {
> > + ir_dereference_array *deref_array = (ir_dereference_array *) 
> > deref;
> > + unsigned array_stride;
> > + if (deref_array->array->type->is_vector()) {
> > +/* We get this when storing or loading a component out of a 
> > vector
> > + * with a non-constant index. This happens for v[i] = f where 
> > v is
> > + * a vector (or m[i][j] = f where m is a matrix). If we don't
> > + * lower that here, it gets turned into v = vector_insert(v, i,
> > + * f), which loads the entire vector, modifies one component 
> > and
> > + * then write the entire thing back.  That breaks if another
> > + * thread or SIMD channel is modifying the same vector.
> > + */
> > +array_stride = 4;
> > +if (deref_array->array->type->is_double())
> > +   array_stride *= 2;
> > + } else if (deref_array->array->type->is_matrix() && *row_major) {
> > +/* When loading a vector out of a row major matrix, the
> > + * step between the columns (vectors) is the size of a
> > + * float, while the step between the rows (elements of a
> > + * vector) is handled below in emit_ubo_loads.
> > + */
> > +array_stride = 4;
> > +if (deref_array->array->type->is_double())
> > +   array_stride *= 2;
> > +*matrix_columns = 

Re: [Mesa-dev] [RFCv2 03/13] nir: allow pre-resolved sampler uniform locations

2015-11-19 Thread Ilia Mirkin
On Thu, Nov 19, 2015 at 2:54 PM, Rob Clark  wrote:
> On Mon, Nov 9, 2015 at 4:08 PM, Timothy Arceri  wrote:
>> On Mon, 2015-11-09 at 07:43 -0500, Rob Clark wrote:
>>> On Sun, Nov 8, 2015 at 7:58 PM, Timothy Arceri 
>>> wrote:
>>> > On Sun, 2015-11-08 at 15:12 -0500, Rob Clark wrote:
>>> > > From: Rob Clark 
>>> > >
>>> > > With TGSI, the ir_variable::data.location gets fixed up to be a stage
>>> > > local location (rather than program global).  In this case we need to
>>> > > skip the UniformStorage[location] lookup.
>>> > > ---
>>> > >  src/glsl/nir/nir_lower_samplers.c | 23 ---
>>> > >  1 file changed, 16 insertions(+), 7 deletions(-)
>>> > >
>>> > > diff --git a/src/glsl/nir/nir_lower_samplers.c
>>> > > b/src/glsl/nir/nir_lower_samplers.c
>>> > > index 5df79a6..d99ba4c 100644
>>> > > --- a/src/glsl/nir/nir_lower_samplers.c
>>> > > +++ b/src/glsl/nir/nir_lower_samplers.c
>>> > > @@ -130,14 +130,18 @@ lower_sampler(nir_tex_instr *instr, const struct
>>> > > gl_shader_program *shader_progr
>>> > >instr->sampler_array_size = array_elements;
>>> > > }
>>> > >
>>> > > -   if (location > shader_program->NumUniformStorage - 1 ||
>>> > > -   !shader_program->UniformStorage[location].opaque[stage].active)
>>> > > {
>>> > > -  assert(!"cannot return a sampler");
>>> > > -  return;
>>> > > -   }
>>> > > +   if (!shader_program) {
>>> > > +  instr->sampler_index = location;
>>> > > +   } else {
>>> > > +  if (location > shader_program->NumUniformStorage - 1 ||
>>> > > +  !shader_program
>>> > > ->UniformStorage[location].opaque[stage].active) {
>>> > > + assert(!"cannot return a sampler");
>>> > > + return;
>>> > > +  }
>>> > >
>>> > > -   instr->sampler_index +=
>>> > > -  shader_program->UniformStorage[location].opaque[stage].index;
>>> > > +  instr->sampler_index =
>>> > > + shader_program->UniformStorage[location].opaque[stage].index;
>>> >
>>> > Hi Rob,
>>> >
>>> > This will break arrays as instr->sampler_index is increamented inside
>>> >  calc_sampler_offsets()
>>>
>>> oh, whoops, I didn't notice that.. ok, that part is easy enough to fix..
>>>
>>> > calc_sampler_offsets() also modifies the value of location is this what
>>> > you
>>> > want? I would assume not as we are counting uniforms not just samplers
>>> > here.
>>>
>>> hmm, tbh I'm not entirely sure..  offhand, what piglit's should I
>>> check?
>>
>> tests/spec/arb_gpu_shader5/execution/sampler_array_indexing
>>
>> Contains the tests you probably want to try out.
>
> oh, hmm, looks like they all need at least gl3.2..

Not to mention ARB_gpu_shader5 :)

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1.9/2] i965: Add brw_imm_uv().

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 9:53 AM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 8:00 AM, Jason Ekstrand  wrote:
>> On Wed, Nov 18, 2015 at 2:25 PM, Matt Turner  wrote:
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 3 +++
>>>  src/mesa/drivers/dri/i965/brw_reg.h| 9 +
>>>  2 files changed, 12 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> index e5a286a..78c10e9 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>>> @@ -116,6 +116,9 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, 
>>> unsigned gen)
>>>case BRW_REGISTER_TYPE_V:
>>>   brw_reg = brw_imm_v(reg->ud);
>>>   break;
>>> +  case BRW_REGISTER_TYPE_UV:
>>> + brw_reg = brw_imm_uv(reg->ud);
>>> + break;
>>>default:
>>>  unreachable("not reached");
>>>}
>>> diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
>>> b/src/mesa/drivers/dri/i965/brw_reg.h
>>> index 759bd93..b77cdeb 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_reg.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_reg.h
>>> @@ -623,6 +623,15 @@ brw_imm_v(unsigned v)
>>> return imm;
>>>  }
>>>
>>> +/** Construct vector of eight unsigned half-byte values */
>>> +static inline struct brw_reg
>>> +brw_imm_uv(unsigned uv)
>>> +{
>>
>> Please add a GEN assertion either here or in the generator.  This only
>> becomes available on Haswell or Broadwell if I remember correctly.  I
>> do know it's not universally available.
>
> Good idea. (It's Sandybridge+)

On second thought, I can't (without adding a devinfo parameter to this
function, which I'm not going to do)

This is something the instruction validator can be tasked with verifying.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1.9/2] i965: Add brw_imm_uv().

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 8:00 AM, Jason Ekstrand  wrote:
> On Wed, Nov 18, 2015 at 2:25 PM, Matt Turner  wrote:
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 3 +++
>>  src/mesa/drivers/dri/i965/brw_reg.h| 9 +
>>  2 files changed, 12 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
>> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> index e5a286a..78c10e9 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
>> @@ -116,6 +116,9 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned 
>> gen)
>>case BRW_REGISTER_TYPE_V:
>>   brw_reg = brw_imm_v(reg->ud);
>>   break;
>> +  case BRW_REGISTER_TYPE_UV:
>> + brw_reg = brw_imm_uv(reg->ud);
>> + break;
>>default:
>>  unreachable("not reached");
>>}
>> diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
>> b/src/mesa/drivers/dri/i965/brw_reg.h
>> index 759bd93..b77cdeb 100644
>> --- a/src/mesa/drivers/dri/i965/brw_reg.h
>> +++ b/src/mesa/drivers/dri/i965/brw_reg.h
>> @@ -623,6 +623,15 @@ brw_imm_v(unsigned v)
>> return imm;
>>  }
>>
>> +/** Construct vector of eight unsigned half-byte values */
>> +static inline struct brw_reg
>> +brw_imm_uv(unsigned uv)
>> +{
>
> Please add a GEN assertion either here or in the generator.  This only
> becomes available on Haswell or Broadwell if I remember correctly.  I
> do know it's not universally available.

Good idea. (It's Sandybridge+)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa: Add test for sorted extension table

2015-11-19 Thread Nanley Chery
From: Nanley Chery 

Enable developers to know if the table's alphabetical sorting
is maintained or lost.

v2: Move "*" next to pointer name (Matt)
Include extensions_table.h instead of extensions.h (Ian)
Remove extra " *" in comment (Ian)

Signed-off-by: Nanley Chery 
---

Does the original Rb still apply?

 src/mesa/main/tests/Makefile.am |  1 +
 src/mesa/main/tests/mesa_extensions.cpp | 51 +
 2 files changed, 52 insertions(+)
 create mode 100644 src/mesa/main/tests/mesa_extensions.cpp

diff --git a/src/mesa/main/tests/Makefile.am b/src/mesa/main/tests/Makefile.am
index bd7ab73..d6977e2 100644
--- a/src/mesa/main/tests/Makefile.am
+++ b/src/mesa/main/tests/Makefile.am
@@ -27,6 +27,7 @@ AM_CPPFLAGS += -DHAVE_SHARED_GLAPI
 main_test_SOURCES +=   \
dispatch_sanity.cpp \
mesa_formats.cpp\
+   mesa_extensions.cpp \
program_state_string.cpp
 
 main_test_LDADD += \
diff --git a/src/mesa/main/tests/mesa_extensions.cpp 
b/src/mesa/main/tests/mesa_extensions.cpp
new file mode 100644
index 000..0c7addd
--- /dev/null
+++ b/src/mesa/main/tests/mesa_extensions.cpp
@@ -0,0 +1,51 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+/**
+ * \name mesa_extensions.cpp
+ *
+ * Verify that the extensions table is sorted.
+ */
+
+#include 
+#include "util/macros.h"
+
+/**
+ * Debug/test: verify the extension table is alphabetically sorted.
+ */
+TEST(MesaExtensionsTest, AlphabeticallySorted)
+{
+   const char *ext_names[] = {
+   #define EXT(name_str, ...) #name_str,
+   #include "main/extensions_table.h"
+   #undef EXT
+   };
+
+   for (unsigned i = 0; i < ARRAY_SIZE(ext_names) - 1; ++i) {
+  const char *current_str = ext_names[i];
+  const char *next_str = ext_names[i+1];
+
+  /* We expect the extension table to be alphabetically sorted */
+  ASSERT_LT(strcmp(current_str, next_str), 0);
+   }
+}
-- 
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Connor Abbott
On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
>> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
>>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>>> wrote:
 From: Connor Abbott 

 It appears that not only math instructions, but also MOV_BYTES or
 any instruction that uses Align1 mode cannot be in the middle
 of a dependency control sequence or the GPU will hang (at least on my
 BDW). This fixes GPU hangs in some fp64 tests.
>>>
>>> I'm pretty surprised by this assessment. Doubtful even.
>>>
 Reviewed-by: Iago Toral Quiroga 
 ---
  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
 b/src/mesa/drivers/dri/i965/brw_vec4.cpp
 index 3bcd5cb..bc0a33b 100644
 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
 @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
 vec4_instruction *inst)
 }

 /*
 +* Instructions that use Align1 mode cause the GPU to hang when 
 inserted
 +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
 empirically.
 +*/
 +
 +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
 +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>>>
>>> PACK_BYTES sets depctrl itself in the generator, and at the time I
>>> added it I made a test that did
>>>
>>>   vec4 foo = vec4(packUnorm4x8(...),
>>>   packUnorm4x8(...),
>>>   packUnorm4x8(...),
>>>   packUnorm4x8(...))
>>>
>>> and confirmed that it set depctrl properly on the whole sequence.
>>> There could of course be bugs somewhere, but the "hardware doesn't
>>> work if you mix align1 and align16 depctrl" seems unlikely.
>>>
>>> Do you know of a test that this affects?
>>
>> This only affects FP64 tests, since there we use an align1 mov to do
>> double-to-float and float-to-double. However, I tried commenting out
>> emit_nir_code() and just doing essentially:
>>
>> emit(MOV(...))->force_writemask_all = true;
>> emit(VEC4_OPCODE_PACK_BYTES, ...);
>> emit(MOV(...))->force_writemask_all = true;
>>
>> and on my BDW it hanged. In case it's not clear: this isn't about
>> setting depctrl on the instruction itself, it just can't be inside of
>> a depctrl sequence (which we were already disallowing for math
>> instructions anyways).
>
> Very weird. I'll take a look. So I understand, are the MOV
> instructions writing different channels of the same register? And
> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
> as the MOVs? (I saw your fixup reply)

Actually, I had them writing the same thing so the second overwrote
the first one. The PACK_BYTES/MOV_BYTES/F2D/D2F (I think I tested all
of them, not sure) were operating on completely different registers,
and in the FP64 test that actually hung the GPU they were as well.
Using d2f since it's simpler and I remember what the operands are
(it's just an align1 mov with a dest stride of 2), the test code was
something like:

mov g50, g51 { no_dd_clear }
d2f g52, g54
mov g50, g53 { no_dd_check }

and changing the d2f to a normal align16 mov or commenting it out
prevented the hang. It would be interesting to see if a math
instruction instead of d2f also hangs.

>
> By the way, the math code is too heavy handed as far as I know. The
> BDW+ docs say that the MATH instruction itself cannot take dependency
> control hints (and empirically earlier platforms seem to have problems
> with this as well, see
> tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a
> math instruction being in the middle of a NoDDC* block. The person who
> implemented the math did the minimal amount of work to fix the
> problem.
>
> The PRM also says:
>
> """
> Instructions other than send, may use this control as long as
> operations that have different pipeline latencies are not mixed. The
> operations that have longer latencies are:
>
> Opcodes pln, lrp, dp*.
> Operations involving double precision computation.
> Integer DW multiplication where both source operands are DWs.
> """
>
> I would say that mixing a double-precision operation and something
> else might cause problems, but that seems like we have a different
> problem thus far.

Yeah, these are all just mov's so I would expect that section to
apply. It still seems like we're not taking it into account, though...
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1.9/2] i965: Add brw_imm_uv().

2015-11-19 Thread Matt Turner
On Wed, Nov 18, 2015 at 2:25 PM, Matt Turner  wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 3 +++
>  src/mesa/drivers/dri/i965/brw_reg.h| 9 +
>  2 files changed, 12 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index e5a286a..78c10e9 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> @@ -116,6 +116,9 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, unsigned 
> gen)
>case BRW_REGISTER_TYPE_V:
>   brw_reg = brw_imm_v(reg->ud);
>   break;
> +  case BRW_REGISTER_TYPE_UV:
> + brw_reg = brw_imm_uv(reg->ud);
> + break;

This hunk of the patch is now unnecessary with a patch I pushed this morning.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Connor Abbott
On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott  wrote:
> On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
>>> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
 On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
 wrote:
> From: Connor Abbott 
>
> It appears that not only math instructions, but also MOV_BYTES or
> any instruction that uses Align1 mode cannot be in the middle
> of a dependency control sequence or the GPU will hang (at least on my
> BDW). This fixes GPU hangs in some fp64 tests.

 I'm pretty surprised by this assessment. Doubtful even.

> Reviewed-by: Iago Toral Quiroga 
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 3bcd5cb..bc0a33b 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
> vec4_instruction *inst)
> }
>
> /*
> +* Instructions that use Align1 mode cause the GPU to hang when 
> inserted
> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
> empirically.
> +*/
> +
> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||

 PACK_BYTES sets depctrl itself in the generator, and at the time I
 added it I made a test that did

   vec4 foo = vec4(packUnorm4x8(...),
   packUnorm4x8(...),
   packUnorm4x8(...),
   packUnorm4x8(...))

 and confirmed that it set depctrl properly on the whole sequence.
 There could of course be bugs somewhere, but the "hardware doesn't
 work if you mix align1 and align16 depctrl" seems unlikely.

 Do you know of a test that this affects?
>>>
>>> This only affects FP64 tests, since there we use an align1 mov to do
>>> double-to-float and float-to-double. However, I tried commenting out
>>> emit_nir_code() and just doing essentially:
>>>
>>> emit(MOV(...))->force_writemask_all = true;
>>> emit(VEC4_OPCODE_PACK_BYTES, ...);
>>> emit(MOV(...))->force_writemask_all = true;
>>>
>>> and on my BDW it hanged. In case it's not clear: this isn't about
>>> setting depctrl on the instruction itself, it just can't be inside of
>>> a depctrl sequence (which we were already disallowing for math
>>> instructions anyways).
>>
>> Very weird. I'll take a look. So I understand, are the MOV
>> instructions writing different channels of the same register? And
>> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
>> as the MOVs? (I saw your fixup reply)
>
> Actually, I had them writing the same thing so the second overwrote
> the first one.

Err, just to be clear... in the actual test that led to me discovering
this, the instructions (not mov's but a bfi and a mov IIRC) were
writing different components of the same register. In my hacked-up
test to see if align1 in between a depctrl sequence was the problem, I
just had them both write to the same thing since it was easier. In any
case, I don't think it should matter too much.

> The PACK_BYTES/MOV_BYTES/F2D/D2F (I think I tested all
> of them, not sure) were operating on completely different registers,
> and in the FP64 test that actually hung the GPU they were as well.
> Using d2f since it's simpler and I remember what the operands are
> (it's just an align1 mov with a dest stride of 2), the test code was
> something like:
>
> mov g50, g51 { no_dd_clear }
> d2f g52, g54
> mov g50, g53 { no_dd_check }
>
> and changing the d2f to a normal align16 mov or commenting it out
> prevented the hang. It would be interesting to see if a math
> instruction instead of d2f also hangs.
>
>>
>> By the way, the math code is too heavy handed as far as I know. The
>> BDW+ docs say that the MATH instruction itself cannot take dependency
>> control hints (and empirically earlier platforms seem to have problems
>> with this as well, see
>> tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a
>> math instruction being in the middle of a NoDDC* block. The person who
>> implemented the math did the minimal amount of work to fix the
>> problem.
>>
>> The PRM also says:
>>
>> """
>> Instructions other than send, may use this control as long as
>> operations that have different pipeline latencies are not mixed. The
>> operations that have longer latencies are:
>>
>> Opcodes pln, lrp, dp*.
>> Operations involving double precision computation.
>> Integer DW multiplication where both source operands are DWs.
>> """
>>

Re: [Mesa-dev] [PATCH 3/9] i965: fix 64-bit immediates in brw_inst(_set)_bits

2015-11-19 Thread Kristian Høgsberg
On Thu, Nov 19, 2015 at 11:24 AM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 9:30 AM, Kristian Høgsberg  wrote:
>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>> wrote:
>>> From: Connor Abbott 
>>>
>>> If we tried to get/set something that was exactly 64 bits, we would
>>> try to do (1 << 64) - 1 to calculate the mask which doesn't give us all
>>> 1's like we want.
>>>
>>> v2 (Iago)
>>>  - Replace ~0 by ~0ull
>>>  - Removed unnecessary parenthesis
>>>
>>> Reviewed-by: Iago Toral Quiroga 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_inst.h | 6 --
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_inst.h 
>>> b/src/mesa/drivers/dri/i965/brw_inst.h
>>> index 4ed95c4..ec08194 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_inst.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_inst.h
>>> @@ -694,7 +694,8 @@ brw_inst_bits(const brw_inst *inst, unsigned high, 
>>> unsigned low)
>>> high %= 64;
>>> low %= 64;
>>>
>>> -   const uint64_t mask = (1ull << (high - low + 1)) - 1;
>>> +   const uint64_t mask = (high - low == 63) ? ~0ull :
>>> +  (1ull << (high - low + 1)) - 1;
>>
>> Can we do
>>
>> const uint64_t mask = (~0ul >> (64 - (high - low + 1)));
>>
>> instead?
>
> I don't think so, because ~0ul is of type unsigned, so right shifting
> it shifts in zeros. I was going to make a similar comment on the
> original patch -- "-1" is preferable over ~0u with an increasingly
> long sequence of l's because it's signed, so it's sign extended to
> fill whatever you assign it to. In your code though, since it's an
> operand we'd need -1ll, I think...

No, shifting in zeros is the whole point. We start out with 64 1 bits,
then shift it down enough that we end up with (high - low + 1) 1 bits
at the bottom, which is what we're trying to compute.

Kristian

>
> So with s/~0ul/-1ll/, I think that'll work? It's all evaluated at
> compile-time in any case, so clarity is the only metric. I don't have
> a preference.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] MESA_EXTENSION_OVERRIDE problem

2015-11-19 Thread Brian Paul

On 11/19/2015 01:19 PM, Ilia Mirkin wrote:

On Thu, Nov 19, 2015 at 3:13 PM, Brian Paul  wrote:

Hi Nanley,

Maybe you can fix an issue I have with the new extension code.

Previously, I could do something like export
MESA_EXTENSION_OVERRIDE="-ARB_clear_buffer_object" and I would no longer see
it in the GL_EXTENSIONS string, even if it was an "always on" extension.


How sure are you that this worked?


I'm pretty sure it worked that way when I first wrote it.



AFAIK there was no way to
enable/disable an always-on ext.


That's not the point.  The point is to change what 
glGetString(GL_EXTENSIONS) (or the glGetStringi() method) returns to the 
application.  I don't care if the underlying functionality is still 
working in Mesa.  We're trying to tell the application to not use a 
feature.  I've found this extremely helpful when debugging some apps.




That env var never affected the
extension strings, but only the actual ctx->Extensions struct, which
would indirectly modify the ext string. If the entry in that ext table
was a dummy one, it had logic to not futz with it. (But it did so
silently.)


Yeah, I don't care about the table.  I only care about the return value 
of glGetString(GL_EXTENSIONS).


-Brian

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 2/2] radeonsi/compute: Use the compiler's COMPUTE_PGM_RSRC* register values

2015-11-19 Thread Tom Stellard
On Wed, Nov 18, 2015 at 05:43:31PM +, Emil Velikov wrote:
> Hi Tom,
> 
> Please flip the order of the patches and drop the now patch 1/2 from
> the stable queue.
> 

I'm confused about why I need to flip the order of the patches.

-Tom

> On 16 November 2015 at 20:03, Tom Stellard  wrote:
> > The compiler has more information and is able to optimize the bits
> > it sets in these registers.
> >
> > CC: 
> > ---
> >  src/gallium/drivers/radeonsi/si_compute.c | 37 
> > ++-
> >  src/gallium/drivers/radeonsi/si_shader.c  |  2 ++
> >  2 files changed, 9 insertions(+), 30 deletions(-)
> >
> > diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
> > b/src/gallium/drivers/radeonsi/si_compute.c
> > index 2d551dd..a461b2c 100644
> > --- a/src/gallium/drivers/radeonsi/si_compute.c
> > +++ b/src/gallium/drivers/radeonsi/si_compute.c
> > @@ -34,11 +34,6 @@
> >
> >  #define MAX_GLOBAL_BUFFERS 20
> >
> > -/* XXX: Even though we don't pass the scratch buffer via user sgprs any 
> > more
> > - * LLVM still expects that we specify 4 USER_SGPRS so it can remain 
> > compatible
> > - * with older mesa. */
> > -#define NUM_USER_SGPRS 4
> > -
> >  struct si_compute {
> > struct si_context *ctx;
> >
> > @@ -238,7 +233,6 @@ static void si_launch_grid(
> > uint64_t kernel_args_va;
> > uint64_t scratch_buffer_va = 0;
> > uint64_t shader_va;
> > -   unsigned arg_user_sgpr_count = NUM_USER_SGPRS;
> > unsigned i;
> > struct si_shader *shader = >shader;
> > unsigned lds_blocks;
> > @@ -366,19 +360,7 @@ static void si_launch_grid(
> > si_pm4_set_reg(pm4, R_00B830_COMPUTE_PGM_LO, shader_va >> 8);
> > si_pm4_set_reg(pm4, R_00B834_COMPUTE_PGM_HI, shader_va >> 40);
> >
> > -   si_pm4_set_reg(pm4, R_00B848_COMPUTE_PGM_RSRC1,
> > -   /* We always use at least 3 VGPRS, these come from
> > -* TIDIG_COMP_CNT.
> > -* XXX: The compiler should account for this.
> > -*/
> > -   S_00B848_VGPRS((MAX2(3, shader->num_vgprs) - 1) / 4)
> > -   /* We always use at least 4 + arg_user_sgpr_count.  The 4 
> > extra
> > -* sgprs are from TGID_X_EN, TGID_Y_EN, TGID_Z_EN, 
> > TG_SIZE_EN
> > -* XXX: The compiler should account for this.
> > -*/
> > -   |  S_00B848_SGPRS(((MAX2(4 + arg_user_sgpr_count,
> > -   shader->num_sgprs)) - 1) / 8)
> > -   |  S_00B028_FLOAT_MODE(shader->float_mode))
> > +   si_pm4_set_reg(pm4, R_00B848_COMPUTE_PGM_RSRC1, shader->rsrc1);
> > ;
> The above semicolon should be nuked as well, shouldn't it ?
> 
> Thanks
> Emil
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] MESA_EXTENSION_OVERRIDE problem

2015-11-19 Thread Ian Romanick
On 11/19/2015 12:30 PM, Ilia Mirkin wrote:
> On Thu, Nov 19, 2015 at 3:25 PM, Brian Paul  wrote:
>> On 11/19/2015 01:19 PM, Ilia Mirkin wrote:
>>> On Thu, Nov 19, 2015 at 3:13 PM, Brian Paul  wrote:

 Hi Nanley,

 Maybe you can fix an issue I have with the new extension code.

 Previously, I could do something like export
 MESA_EXTENSION_OVERRIDE="-ARB_clear_buffer_object" and I would no longer
 see
 it in the GL_EXTENSIONS string, even if it was an "always on" extension.
>>>
>>> How sure are you that this worked?
>>
>> I'm pretty sure it worked that way when I first wrote it.
>>
>>> AFAIK there was no way to
>>> enable/disable an always-on ext.
>>
>> That's not the point.  The point is to change what
>> glGetString(GL_EXTENSIONS) (or the glGetStringi() method) returns to the
>> application.  I don't care if the underlying functionality is still working
>> in Mesa.  We're trying to tell the application to not use a feature.  I've
>> found this extremely helpful when debugging some apps.
>>
>>> That env var never affected the
>>> extension strings, but only the actual ctx->Extensions struct, which
>>> would indirectly modify the ext string. If the entry in that ext table
>>> was a dummy one, it had logic to not futz with it. (But it did so
>>> silently.)
>>
>> Yeah, I don't care about the table.  I only care about the return value of
>> glGetString(GL_EXTENSIONS).
> 
> I was pretty annoyed at that too -- having it affect exts returned by
> the driver seemed to be the important bit, but I think I investigated
> why it didn't work ~6 months ago, and came to the above conclusion
> [that it only futzed with the table]. In fact recently someone was
> unable to work around an ARB_dsa shortcoming in nouveau by disabling
> it with an override.

Ugh... How much should this work?  Extensions are used to determine GL
versions, and compute_version can't check that "always on" extensions
have been disabled.  If you try to disable GL_EXT_texture, you'll still
get at least OpenGL 1.1. :)  Is it okay that the version override is
still needed to disable newer versions?

> IMHO it makes sense for it to affect both the table and the ext
> string... i.e. if there's a ctx->Extensions.foo listed, it should turn
> that on/off, but if there isn't, it should still touch up the returned
> ext list.
> 
>   -ilia
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965: Add src/dst interference for certain instructions with hazards.

2015-11-19 Thread Matt Turner
From: Kenneth Graunke 

When working on tessellation shaders, I created some vec4 virtual
opcodes for creating message headers through a sequence like:

   mov(8) g7<1>UD  0xUD{ align1 WE_all 1Q compacted };
   mov(1) g7.5<1>UD0x0100UD{ align1 WE_all };
   mov(1) g7<1>UD  g0<0,1,0>UD { align1 WE_all compacted };
   mov(1) g7.3<1>UDg8<0,1,0>UD { align1 WE_all };

This is done in the generator since the vec4 backend can't handle align1
regioning.  From the visitor's point of view, this is a single opcode:

   hs_set_output_urb_offsets vgrf7.0:UD, 1U, vgrf8.:UD

Normally, there's no hazard between sources and destinations - an
instruction (naturally) reads its sources, then writes the result to the
destination.  However, when the virtual instruction generates multiple
hardware instructions, we can get into trouble.

In the above example, if the register allocator assigned vgrf7 and vgrf8
to the same hardware register, then we'd clobber the source with 0 in
the first instruction, and read back the wrong value in the last one.

It occured to me that this is exactly the same problem we have with
SIMD16 instructions that use W/UW or B/UB types with 0 stride.  The
hardware implicitly decodes them as two SIMD8 instructions, and with
the overlapping regions, the first would clobber the second.

Previously, we handled that by incrementing the live range end IP by 1,
which works, but is excessive: the next instruction doesn't actually
care about that.  It might also be the end of control flow.  This might
keep values alive too long.  What we really want is to say "my source
and destinations interfere".

This patch creates new infrastructure for doing just that, and teaches
the register allocator to add interference when there's a hazard.  For
my vec4 case, we can determine this by switching on opcodes.  For the
SIMD16 case, we just move the existing code there.

I audited our existing virtual opcodes that generate multiple
instructions; I believe FS_OPCODE_PACK_HALF_2x16_SPLIT needs this
treatment as well, but no others.

Reviewed-by: Matt Turner 
Signed-off-by: Kenneth Graunke 
---
[mattst88] Rebased, applied R-b, and ran shader-db on ILK and HSW (no change)

 src/mesa/drivers/dri/i965/brw_fs.cpp   | 65 ++
 .../drivers/dri/i965/brw_fs_live_variables.cpp | 36 +---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  | 13 +
 src/mesa/drivers/dri/i965/brw_ir_fs.h  |  1 +
 src/mesa/drivers/dri/i965/brw_ir_vec4.h|  1 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 29 ++
 .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 13 +
 7 files changed, 123 insertions(+), 35 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e9c990d..b1f0164 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -299,6 +299,71 @@ fs_inst::is_send_from_grf() const
}
 }
 
+/**
+ * Returns true if this instruction's sources and destinations cannot
+ * safely be the same register.
+ *
+ * In most cases, a register can be written over safely by the same
+ * instruction that is its last use.  For a single instruction, the
+ * sources are dereferenced before writing of the destination starts
+ * (naturally).
+ *
+ * However, there are a few cases where this can be problematic:
+ *
+ * - Virtual opcodes that translate to multiple instructions in the
+ *   code generator: if src == dst and one instruction writes the
+ *   destination before a later instruction reads the source, then
+ *   src will have been clobbered.
+ *
+ * - SIMD16 compressed instructions with certain regioning (see below).
+ *
+ * The register allocator uses this information to set up conflicts between
+ * GRF sources and the destination.
+ */
+bool
+fs_inst::has_source_and_destination_hazard() const
+{
+   switch (opcode) {
+   case FS_OPCODE_PACK_HALF_2x16_SPLIT:
+  /* Multiple partial writes to the destination */
+  return true;
+   default:
+  /* The SIMD16 compressed instruction
+   *
+   * add(16)  g4<1>F  g4<8,8,1>F   g6<8,8,1>F
+   *
+   * is actually decoded in hardware as:
+   *
+   * add(8)   g4<1>F  g4<8,8,1>F   g6<8,8,1>F
+   * add(8)   g5<1>F  g5<8,8,1>F   g7<8,8,1>F
+   *
+   * Which is safe.  However, if we have uniform accesses
+   * happening, we get into trouble:
+   *
+   * add(8)   g4<1>F  g4<0,1,0>F   g6<8,8,1>F
+   * add(8)   g5<1>F  g4<0,1,0>F   g7<8,8,1>F
+   *
+   * Now our destination for the first instruction overwrote the
+   * second instruction's src0, and we get garbage for those 8
+   * pixels.  There's a similar issue for the pre-gen6
+   * pixel_x/pixel_y, which are registers of 16-bit values and thus
+   * would get 

Re: [Mesa-dev] [PATCH 2/5] i965/gen8: Allow rendering to B8G8R8X8

2015-11-19 Thread Ben Widawsky
On Thu, Nov 19, 2015 at 04:25:18PM +0100, Neil Roberts wrote:
> Since Gen8 this is allowed as a rendering target so we don't need to
> override it to B8G8R8A8. This is helpful on Gen9+ where using this
> override causes fast clears not to work.
> ---
>  src/mesa/drivers/dri/i965/brw_surface_formats.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
> b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> index 55e7e64..7c38431 100644
> --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
> +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> @@ -167,8 +167,8 @@ const struct surface_format_info surface_formats[] = {
> SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   I32_FLOAT)
> SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   L32_FLOAT)
> SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   A32_FLOAT)
> -   SF( Y,  Y,  x,  Y,  x,  x,  x,  x, 60,   90,   B8G8R8X8_UNORM)
> -   SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   B8G8R8X8_UNORM_SRGB)
> +   SF( Y,  Y,  x,  Y, 80, 80,  x,  x, 60,   90,   B8G8R8X8_UNORM)
> +   SF( Y,  Y,  x,  x, 80, 80,  x,  x,  x,x,   B8G8R8X8_UNORM_SRGB)
> SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R8G8B8X8_UNORM)
> SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R8G8B8X8_UNORM_SRGB)
> SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R9G9B9E5_SHAREDEXP)
> @@ -670,9 +670,10 @@ brw_init_surface_formats(struct brw_context *brw)
> * mask writes to alpha (ala glColorMask) and reconfigure the
> * alpha blending hardware to use GL_ONE (or GL_ZERO) for
> * cases where GL_DST_ALPHA (or GL_ONE_MINUS_DST_ALPHA) is
> -   * used.
> +   * used. On Gen8+ BGRX is actually allowed (but not RGBX).
> */
> -  render = BRW_SURFACEFORMAT_B8G8R8A8_UNORM;
> + if (gen < tinfo->render_target)
> +render = BRW_SURFACEFORMAT_B8G8R8A8_UNORM;
>break;
>case BRW_SURFACEFORMAT_R8G8B8X8_UNORM:
>   render = BRW_SURFACEFORMAT_R8G8B8A8_UNORM;

There's probably something I don't know about the format naming but at the top
of the of the comment you modified they refer to the formats with the components
reversed (XRGB and ARGB) whereas you named them as in order (BGRX).

BTW, I pointed this out to Nanley, but I didn't see a patch, I think we need to
add is_braswell (and maybe broxton)) to the gen += 5 at the top of this
function.

lgtm
Reviewed-by: Ben Widawsky 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/5] i965/gen9: Don't allow the RGBX formats for texturing/rendering

2015-11-19 Thread Ben Widawsky
On Thu, Nov 19, 2015 at 04:25:21PM +0100, Neil Roberts wrote:
> The RGBX surface formats aren't renderable so we internally remap them
> to RGBA when rendering. They are retained as RGBX when used as
> textures. However since the previous patch fast clears are disabled
> for surfaces that use a different format for rendering than for
> texturing. To avoid this situation we can just pretend not to support
> RGBX formats at all. This will cause the upper layers of mesa to pick
> an RGBA format internally instead. This should be safe because we
> always override the alpha component to 1.0 for RGBX in the texture
> swizzle anyway. We could also do this for all gens except that it's a
> bit more difficult when the hardware doesn't support texture
> swizzling. Gens using the blorp have further problems because that
> doesn't implement this swizzle override.
> ---
>  src/mesa/drivers/dri/i965/brw_surface_formats.c | 28 
> +
>  1 file changed, 28 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
> b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> index 7c38431..7594aca 100644
> --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
> +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> @@ -733,6 +733,34 @@ brw_init_surface_formats(struct brw_context *brw)
> if (brw->gen >= 8)
>ctx->TextureFormatSupported[MESA_FORMAT_Z_UNORM16] = true;
>  
> +   /* The RGBX formats are not renderable. Normally these get mapped
> +* internally to RGBA formats when rendering. However on Gen9+ when this
> +* internal override is used fast clears don't work so they are disabled 
> in
> +* brw_meta_fast_clear. To avoid this problem we can just pretend not to
> +* support RGBX formats at all. This will cause the upper layers of Mesa 
> to
> +* pick the RGBA formats instead. This works fine because when it is used

A lot of these formats are already unsupported for fast clears. In fact, I
believe only MESA_FORMAT_R8G8B8X8_UNORM is a problem. Are you trying to
accomplish something else here as well?

> +* as a texture source the swizzle state is programmed to force the alpha
> +* channel to 1.0 anyway. We could also do this for all gens except that
> +* it's a bit more difficult when the hardware doesn't support texture
> +* swizzling. Gens using the blorp have further problems because that
> +* doesn't implement this swizzle override. We don't need to do this for
> +* BGRX because that actually is supported natively on Gen8+.
> +*/
> +   if (brw->gen >= 9) {
> +  static const mesa_format rgbx_formats[] = {
> + MESA_FORMAT_R8G8B8X8_UNORM,
> + MESA_FORMAT_R8G8B8X8_SRGB,
> + MESA_FORMAT_RGBX_UNORM16,
> + MESA_FORMAT_RGBX_FLOAT16,
> + MESA_FORMAT_RGBX_FLOAT32
> +  };
> +
> +  for (int i = 0; i < ARRAY_SIZE(rgbx_formats); i++) {
> + ctx->TextureFormatSupported[rgbx_formats[i]] = false;
> + brw->format_supported_as_render_target[rgbx_formats[i]] = false;
> +  }
> +   }
> +


> /* On hardware that lacks support for ETC1, we map ETC1 to RGBX
>  * during glCompressedTexImage2D(). See intel_mipmap_tree::wraps_etc1.
>  */
> -- 
> 1.9.3
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 3/4] glsl: avoid linker and user varying location to overlap

2015-11-19 Thread Timothy Arceri
On Sun, 2015-10-25 at 15:01 +0100, Gregory Hainaut wrote:
> Current behavior on the interface matching:
> 
> layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user
> out1; // Assigned to VARYING_SLOT_VAR0 by the linker
> 
> New behavior on the interface matching:
> 
> layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user
> out1; // Assigned to VARYING_SLOT_VAR1 by the linker
> 
> piglit: arb_separate_shader_object-rendezvous_by_name
> 
> v4:
> * Fix variable name in assert
> 
> Signed-off-by: Gregory Hainaut 
> ---
>  src/glsl/link_varyings.cpp | 46 +++
> ---
>  1 file changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
> index 7e77a67..67d04cb 100644
> --- a/src/glsl/link_varyings.cpp
> +++ b/src/glsl/link_varyings.cpp
> @@ -766,7 +766,7 @@ public:
> gl_shader_stage consumer_stage);
> ~varying_matches();
> void record(ir_variable *producer_var, ir_variable *consumer_var);
> -   unsigned assign_locations();
> +   unsigned assign_locations(uint64_t reserved_slots);
> void store_locations() const;
>  
>  private:
> @@ -986,7 +986,7 @@ varying_matches::record(ir_variable *producer_var,
> ir_variable *consumer_var)
>   * passed to varying_matches::record().
>   */
>  unsigned
> -varying_matches::assign_locations()
> +varying_matches::assign_locations(uint64_t reserved_slots)
>  {
> /* Sort varying matches into an order that makes them easy to pack. */
> qsort(this->matches, this->num_matches, sizeof(*this->matches),
> @@ -1013,6 +1013,10 @@ varying_matches::assign_locations()
>!= this->matches[i].packing_class) {
>   *location = ALIGN(*location, 4);
>}
> +  while ((*location < MAX_VARYING * 4u) &&
> +(reserved_slots & (1u << *location / 4u))) {
> + *location = ALIGN(*location + 1, 4);
> +  }
>  
>this->matches[i].generic_location = *location;
>  
> @@ -1376,6 +1380,38 @@ canonicalize_shader_io(exec_list *ir, enum
> ir_variable_mode io_mode)
>  }
>  
>  /**
> + * Generate a bitfield map of the already reserved slots for a shader.

Maybe change this to:

Generate a bitfield map of the explicit locations for shader varyings.

Otherwise:

Reviewed-by: Timothy Arceri  + *
> + * In theory a 32 bits value will be enough but a 64 bits value is future
> proof.
> + */
> +uint64_t
> +reserved_varying_slot(struct gl_shader *stage, ir_variable_mode io_mode)
> +{
> +   assert(io_mode == ir_var_shader_in || io_mode == ir_var_shader_out);
> +   assert(MAX_VARYING <= 64); /* avoid an overflow of the returned value */
> +
> +   uint64_t slots = 0;
> +   int var_slot;
> +
> +   if (!stage)
> +  return slots;
> +
> +   foreach_in_list(ir_instruction, node, stage->ir) {
> +  ir_variable *const var = node->as_variable();
> +
> +  if (var == NULL || var->data.mode != io_mode || !var
> ->data.explicit_location)
> + continue;
> +
> +  var_slot = var->data.location - VARYING_SLOT_VAR0;
> +  if (var_slot >= 0 && var_slot < MAX_VARYING)
> + slots |= 1u << var_slot;
> +   }
> +
> +   return slots;
> +}
> +
> +
> +/**
>   * Assign locations for all variables that are produced in one pipeline
> stage
>   * (the "producer") and consumed in the next stage (the "consumer").
>   *
> @@ -1550,7 +1586,11 @@ assign_varying_locations(struct gl_context *ctx,
>   matches.record(matched_candidate->toplevel_var, NULL);
> }
>  
> -   const unsigned slots_used = matches.assign_locations();
> +   const uint64_t reserved_slots =
> +  reserved_varying_slot(producer, ir_var_shader_out) |
> +  reserved_varying_slot(consumer, ir_var_shader_in);
> +
> +   const unsigned slots_used = matches.assign_locations(reserved_slots);
> matches.store_locations();
>  
> for (unsigned i = 0; i < num_tfeedback_decls; ++i) {
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] MESA_EXTENSION_OVERRIDE problem

2015-11-19 Thread Nanley Chery
On Thu, Nov 19, 2015 at 12:13 PM, Brian Paul  wrote:

> Hi Nanley,
>
>
Hi Brian,


> Maybe you can fix an issue I have with the new extension code.
>
> Previously, I could do something like export
> MESA_EXTENSION_OVERRIDE="-ARB_clear_buffer_object" and I would no longer
> see it in the GL_EXTENSIONS string, even if it was an "always on" extension.
>
> Now when I try that I get:
>
> Mesa 11.1.0-devel implementation error: Trying to disable permanently
> enabled extensions: GL_ARB_get_texture_sub_image
> Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
>
> The whole point of the "-GL_EXT_foobar" syntax was to hide an extension
> from the application when it queries the driver's extensions.
>
> Can you please fix this so it works as before?
>
>
I have two branches that provide the ability to disable permanently enabled
extensions:
1. The first only modifies the extension strings and is located here:
http://cgit.freedesktop.org/~nchery/mesa/commit/?h=mod_always_on
2. The second modifies the extension strings and disables the extension
within the driver (assuming appropriate the helper function is used). It
also provides some performance benefits. :
http://cgit.freedesktop.org/~nchery/mesa/commit/?h=init_ext_vals

I'd appreciate any feedback on the two approaches as I work to get the
feature upstreamed.

Regards,
Nanley


> Thanks.
>
> -Brian
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/5] i965/gen9: Don't disallow fast clear for MSRT formats matching render

2015-11-19 Thread Ben Widawsky
On Thu, Nov 19, 2015 at 04:25:17PM +0100, Neil Roberts wrote:
> Previously fast clear was disallowed on Gen9 for MSRTs with the claim
> that some formats don't work but we didn't understand why. On further
> investigation it seems the formats that don't work are the ones where
> the render surface format is being overriden to a different format
> than the one used for texturing. The one used for texturing is not
> actually a renderable format. It arguably makes sense that the sampler
> hardware doesn't handle the fast color correctly in these cases
> because it shouldn't be possible to end up with a fast cleared surface
> that is non-renderable.
> 
> This patch changes the limitation to prevent fast clear for surfaces
> where the format for rendering is overriden.
> ---
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> index 1f8bfdf..f2e894a 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
> @@ -48,6 +48,7 @@
>  #include "brw_defines.h"
>  #include "brw_context.h"
>  #include "brw_draw.h"
> +#include "brw_state.h"
>  #include "intel_fbo.h"
>  #include "intel_batchbuffer.h"
>  
> @@ -549,11 +550,17 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
> gl_framebuffer *fb,
>if (brw->gen < 7)
>   clear_type = REP_CLEAR;
>  
> -  /* Certain formats have unresolved issues with sampling from the MCS
> -   * buffer on Gen9. This disables fast clears altogether for MSRTs until
> -   * we can figure out what's going on.
> +  /* If we're mapping the render format to a different format than the
> +   * format we use for texturing then it is a bit questionable whether it
> +   * should be possible to use a fast clear. Although we only actually
> +   * render using a renderable format, without the override workaround it
> +   * wouldn't be possible to have a non-renderable surface in a fast 
> clear
> +   * state so the hardware probably legitimately doesn't need to support
> +   * this case. At least on Gen9 this really does seem to cause problems.
> */
> -  if (brw->gen >= 9 && irb->mt->num_samples > 1)
> +  if (brw->gen >= 9 &&
> +  brw_format_for_mesa_format(irb->mt->format) !=
> +  brw->render_target_format[irb->mt->format])

Could you just do !brw->format_supported_as_render_target[irb->mt->format]?

>   clear_type = REP_CLEAR;
>  
>if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS)

I forget, did you find failures for this in the non-MSRT case? If not, maybe we
could skip this patch and just take the rest of the series? That way we can
avoid the "perf regression" of RGBX clears which we do hit on certain workloads.

Assuming we have a failure in non-MSRT case, and we can't just bypass this
patch, this is:
Reviewed-by: Ben Widawsky 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/5] i965: Check base format to determine whether to use tiled memcpy

2015-11-19 Thread Ben Widawsky
On Thu, Nov 19, 2015 at 04:25:19PM +0100, Neil Roberts wrote:
> The tiled memcpy doesn't work for copying from RGBX to RGBA because it
> doesn't override the alpha component to 1.0. Commit 2cebaac479d4 added
> a check to disable it for RGBX formats by looking at the TexFormat.
> However a lot of the rest of the code base is written with the
> assumption that an RGBA texture can be used internally to implement a
> GL_RGB texture. If that is done then this check breaks. This patch
> makes it instead check the base format of the texture which I think
> more directly matches the intention.
> 
> Cc: Jason Ekstrand 


I really don't know the corner cases well enough, but concept seems good to me.
In particular, I assume this is now going to return false for more cases than
previously - and that's okay, I guess?

> ---
>  src/mesa/drivers/dri/i965/intel_pixel_read.c | 7 ---
>  src/mesa/drivers/dri/i965/intel_tex_image.c  | 7 ---
>  2 files changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_pixel_read.c 
> b/src/mesa/drivers/dri/i965/intel_pixel_read.c
> index 9bcbbd1..c8aef65 100644
> --- a/src/mesa/drivers/dri/i965/intel_pixel_read.c
> +++ b/src/mesa/drivers/dri/i965/intel_pixel_read.c
> @@ -135,10 +135,11 @@ intel_readpixels_tiled_memcpy(struct gl_context * ctx,
>return false;
>  
> /* We can't handle copying from RGBX or BGRX because the tiled_memcpy
> -* function doesn't set the last channel to 1.
> +* function doesn't set the last channel to 1. Note this checks BaseFormat
> +* rather than TexFormat in case the RGBX format is being simulated with 
> an
> +* RGBA format.
>  */
> -   if (rb->Format == MESA_FORMAT_B8G8R8X8_UNORM ||
> -   rb->Format == MESA_FORMAT_R8G8B8X8_UNORM)
> +   if (rb->_BaseFormat == GL_RGB)
>return false;
>  
> if (!intel_get_memcpy(rb->Format, format, type, _copy, ,
> diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
> b/src/mesa/drivers/dri/i965/intel_tex_image.c
> index 34b91e8..e3710da7 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex_image.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
> @@ -399,10 +399,11 @@ intel_gettexsubimage_tiled_memcpy(struct gl_context 
> *ctx,
>return false;
>  
> /* We can't handle copying from RGBX or BGRX because the tiled_memcpy
> -* function doesn't set the last channel to 1.
> +* function doesn't set the last channel to 1. Note this checks BaseFormat
> +* rather than TexFormat in case the RGBX format is being simulated with 
> an
> +* RGBA format.
>  */
> -   if (texImage->TexFormat == MESA_FORMAT_B8G8R8X8_UNORM ||
> -   texImage->TexFormat == MESA_FORMAT_R8G8B8X8_UNORM)
> +   if (texImage->_BaseFormat == GL_RGB)
>return false;
>  
> if (!intel_get_memcpy(texImage->TexFormat, format, type, _copy, ,
> -- 
> 1.9.3
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] MESA_EXTENSION_OVERRIDE problem

2015-11-19 Thread Ian Romanick
On 11/19/2015 05:40 PM, Nanley Chery wrote:
> 
> On Thu, Nov 19, 2015 at 12:13 PM, Brian Paul  > wrote:
> 
> Hi Nanley,
> 
>  
> Hi Brian,
>  
> 
> Maybe you can fix an issue I have with the new extension code.
> 
> Previously, I could do something like export
> MESA_EXTENSION_OVERRIDE="-ARB_clear_buffer_object" and I would no
> longer see it in the GL_EXTENSIONS string, even if it was an "always
> on" extension.
> 
> Now when I try that I get:
> 
> Mesa 11.1.0-devel implementation error: Trying to disable
> permanently enabled extensions: GL_ARB_get_texture_sub_image
> Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
> 
> The whole point of the "-GL_EXT_foobar" syntax was to hide an
> extension from the application when it queries the driver's extensions.
> 
> Can you please fix this so it works as before?
> 
> 
> I have two branches that provide the ability to disable permanently
> enabled extensions:
> 1. The first only modifies the extension strings and is located here:
> http://cgit.freedesktop.org/~nchery/mesa/commit/?h=mod_always_on
> 
> 2. The second modifies the extension strings and disables the extension
> within the driver (assuming appropriate the helper function is used). It
> also provides some performance benefits. :
> http://cgit.freedesktop.org/~nchery/mesa/commit/?h=init_ext_vals
> 
> 
> I'd appreciate any feedback on the two approaches as I work to get the
> feature upstreamed.

I think #2 might be better, but there's a lot of churn.  I don't know
that we want that much churn right around the time of the release branch
point, and I think it would be good to have this resolved in 11.1.  I
also have a few bits of feedback in #2, so it might take a couple
iterations before that could land.

Whatever we do, I think we should create some 'make check' tests to
verify that the functionality continues to work in the future.

> Regards,
> Nanley
>  
> 
> Thanks.
> 
> -Brian
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org 
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/4] automake: loader: don't create an empty dri3 helper

2015-11-19 Thread Matt Turner
Patches 1, 3, and 4 are

Reviewed-by: Matt Turner 

(I gave an Ack on 2 since I didn't review the list of functions)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/4] automake: egl: add symbols test

2015-11-19 Thread Matt Turner
Seems like a good plan. I haven't reviewed the list, so have an

Acked-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Add test for sorted extension table

2015-11-19 Thread Ian Romanick
On 11/19/2015 10:44 AM, Nanley Chery wrote:
> From: Nanley Chery 
> 
> Enable developers to know if the table's alphabetical sorting
> is maintained or lost.
> 
> v2: Move "*" next to pointer name (Matt)
> Include extensions_table.h instead of extensions.h (Ian)

That actually does make it a lot nicer.

Reviewed-by: Ian Romanick 

> Remove extra " *" in comment (Ian)
> 
> Signed-off-by: Nanley Chery 
> ---
> 
> Does the original Rb still apply?
> 
>  src/mesa/main/tests/Makefile.am |  1 +
>  src/mesa/main/tests/mesa_extensions.cpp | 51 
> +
>  2 files changed, 52 insertions(+)
>  create mode 100644 src/mesa/main/tests/mesa_extensions.cpp
> 
> diff --git a/src/mesa/main/tests/Makefile.am b/src/mesa/main/tests/Makefile.am
> index bd7ab73..d6977e2 100644
> --- a/src/mesa/main/tests/Makefile.am
> +++ b/src/mesa/main/tests/Makefile.am
> @@ -27,6 +27,7 @@ AM_CPPFLAGS += -DHAVE_SHARED_GLAPI
>  main_test_SOURCES += \
>   dispatch_sanity.cpp \
>   mesa_formats.cpp\
> + mesa_extensions.cpp \
>   program_state_string.cpp
>  
>  main_test_LDADD += \
> diff --git a/src/mesa/main/tests/mesa_extensions.cpp 
> b/src/mesa/main/tests/mesa_extensions.cpp
> new file mode 100644
> index 000..0c7addd
> --- /dev/null
> +++ b/src/mesa/main/tests/mesa_extensions.cpp
> @@ -0,0 +1,51 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +/**
> + * \name mesa_extensions.cpp
> + *
> + * Verify that the extensions table is sorted.
> + */
> +
> +#include 
> +#include "util/macros.h"
> +
> +/**
> + * Debug/test: verify the extension table is alphabetically sorted.
> + */
> +TEST(MesaExtensionsTest, AlphabeticallySorted)
> +{
> +   const char *ext_names[] = {
> +   #define EXT(name_str, ...) #name_str,
> +   #include "main/extensions_table.h"
> +   #undef EXT
> +   };
> +
> +   for (unsigned i = 0; i < ARRAY_SIZE(ext_names) - 1; ++i) {
> +  const char *current_str = ext_names[i];
> +  const char *next_str = ext_names[i+1];
> +
> +  /* We expect the extension table to be alphabetically sorted */
> +  ASSERT_LT(strcmp(current_str, next_str), 0);
> +   }
> +}
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] i965: fix 64-bit immediates in brw_inst(_set)_bits

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 11:35 AM, Kristian Høgsberg  wrote:
> On Thu, Nov 19, 2015 at 11:24 AM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 9:30 AM, Kristian Høgsberg  
>> wrote:
>>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>>> wrote:
 From: Connor Abbott 

 If we tried to get/set something that was exactly 64 bits, we would
 try to do (1 << 64) - 1 to calculate the mask which doesn't give us all
 1's like we want.

 v2 (Iago)
  - Replace ~0 by ~0ull
  - Removed unnecessary parenthesis

 Reviewed-by: Iago Toral Quiroga 
 ---
  src/mesa/drivers/dri/i965/brw_inst.h | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_inst.h 
 b/src/mesa/drivers/dri/i965/brw_inst.h
 index 4ed95c4..ec08194 100644
 --- a/src/mesa/drivers/dri/i965/brw_inst.h
 +++ b/src/mesa/drivers/dri/i965/brw_inst.h
 @@ -694,7 +694,8 @@ brw_inst_bits(const brw_inst *inst, unsigned high, 
 unsigned low)
 high %= 64;
 low %= 64;

 -   const uint64_t mask = (1ull << (high - low + 1)) - 1;
 +   const uint64_t mask = (high - low == 63) ? ~0ull :
 +  (1ull << (high - low + 1)) - 1;
>>>
>>> Can we do
>>>
>>> const uint64_t mask = (~0ul >> (64 - (high - low + 1)));
>>>
>>> instead?
>>
>> I don't think so, because ~0ul is of type unsigned, so right shifting
>> it shifts in zeros. I was going to make a similar comment on the
>> original patch -- "-1" is preferable over ~0u with an increasingly
>> long sequence of l's because it's signed, so it's sign extended to
>> fill whatever you assign it to. In your code though, since it's an
>> operand we'd need -1ll, I think...
>
> No, shifting in zeros is the whole point. We start out with 64 1 bits,
> then shift it down enough that we end up with (high - low + 1) 1 bits
> at the bottom, which is what we're trying to compute.

Doh. Of course. :)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] MESA_EXTENSION_OVERRIDE problem

2015-11-19 Thread Brian Paul

Hi Nanley,

Maybe you can fix an issue I have with the new extension code.

Previously, I could do something like export 
MESA_EXTENSION_OVERRIDE="-ARB_clear_buffer_object" and I would no longer 
see it in the GL_EXTENSIONS string, even if it was an "always on" extension.


Now when I try that I get:

Mesa 11.1.0-devel implementation error: Trying to disable permanently 
enabled extensions: GL_ARB_get_texture_sub_image

Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa

The whole point of the "-GL_EXT_foobar" syntax was to hide an extension 
from the application when it queries the driver's extensions.


Can you please fix this so it works as before?

Thanks.

-Brian
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 11:35 AM, Connor Abbott  wrote:
> On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott  wrote:
>> On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner  wrote:
>>> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
 On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
> wrote:
>> From: Connor Abbott 
>>
>> It appears that not only math instructions, but also MOV_BYTES or
>> any instruction that uses Align1 mode cannot be in the middle
>> of a dependency control sequence or the GPU will hang (at least on my
>> BDW). This fixes GPU hangs in some fp64 tests.
>
> I'm pretty surprised by this assessment. Doubtful even.
>
>> Reviewed-by: Iago Toral Quiroga 
>> ---
>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> index 3bcd5cb..bc0a33b 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
>> vec4_instruction *inst)
>> }
>>
>> /*
>> +* Instructions that use Align1 mode cause the GPU to hang when 
>> inserted
>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
>> empirically.
>> +*/
>> +
>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>
> PACK_BYTES sets depctrl itself in the generator, and at the time I
> added it I made a test that did
>
>   vec4 foo = vec4(packUnorm4x8(...),
>   packUnorm4x8(...),
>   packUnorm4x8(...),
>   packUnorm4x8(...))
>
> and confirmed that it set depctrl properly on the whole sequence.
> There could of course be bugs somewhere, but the "hardware doesn't
> work if you mix align1 and align16 depctrl" seems unlikely.
>
> Do you know of a test that this affects?

 This only affects FP64 tests, since there we use an align1 mov to do
 double-to-float and float-to-double. However, I tried commenting out
 emit_nir_code() and just doing essentially:

 emit(MOV(...))->force_writemask_all = true;
 emit(VEC4_OPCODE_PACK_BYTES, ...);
 emit(MOV(...))->force_writemask_all = true;

 and on my BDW it hanged. In case it's not clear: this isn't about
 setting depctrl on the instruction itself, it just can't be inside of
 a depctrl sequence (which we were already disallowing for math
 instructions anyways).
>>>
>>> Very weird. I'll take a look. So I understand, are the MOV
>>> instructions writing different channels of the same register? And
>>> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
>>> as the MOVs? (I saw your fixup reply)
>>
>> Actually, I had them writing the same thing so the second overwrote
>> the first one.
>
> Err, just to be clear... in the actual test that led to me discovering
> this, the instructions (not mov's but a bfi and a mov IIRC) were
> writing different components of the same register. In my hacked-up
> test to see if align1 in between a depctrl sequence was the problem, I
> just had them both write to the same thing since it was easier. In any
> case, I don't think it should matter too much.

Interesting. If it was a BFI and a MOV writing different components of
the same register, I can see that failing. Though I haven't measured,
3-src instructions usually have 2+ extra cycles of latency and with a
2-cycle issue time, a BFI followed immediately by a MOV would mean
that both of them retired in the same cycle. That would seem bad if
one of them was supposed to signal the scoreboard.
×
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] MESA_EXTENSION_OVERRIDE problem

2015-11-19 Thread Ilia Mirkin
On Thu, Nov 19, 2015 at 3:25 PM, Brian Paul  wrote:
> On 11/19/2015 01:19 PM, Ilia Mirkin wrote:
>>
>> On Thu, Nov 19, 2015 at 3:13 PM, Brian Paul  wrote:
>>>
>>> Hi Nanley,
>>>
>>> Maybe you can fix an issue I have with the new extension code.
>>>
>>> Previously, I could do something like export
>>> MESA_EXTENSION_OVERRIDE="-ARB_clear_buffer_object" and I would no longer
>>> see
>>> it in the GL_EXTENSIONS string, even if it was an "always on" extension.
>>
>>
>> How sure are you that this worked?
>
>
> I'm pretty sure it worked that way when I first wrote it.
>
>
>> AFAIK there was no way to
>> enable/disable an always-on ext.
>
>
> That's not the point.  The point is to change what
> glGetString(GL_EXTENSIONS) (or the glGetStringi() method) returns to the
> application.  I don't care if the underlying functionality is still working
> in Mesa.  We're trying to tell the application to not use a feature.  I've
> found this extremely helpful when debugging some apps.
>
>
>> That env var never affected the
>> extension strings, but only the actual ctx->Extensions struct, which
>> would indirectly modify the ext string. If the entry in that ext table
>> was a dummy one, it had logic to not futz with it. (But it did so
>> silently.)
>
>
> Yeah, I don't care about the table.  I only care about the return value of
> glGetString(GL_EXTENSIONS).

I was pretty annoyed at that too -- having it affect exts returned by
the driver seemed to be the important bit, but I think I investigated
why it didn't work ~6 months ago, and came to the above conclusion
[that it only futzed with the table]. In fact recently someone was
unable to work around an ARB_dsa shortcoming in nouveau by disabling
it with an override.

IMHO it makes sense for it to affect both the table and the ext
string... i.e. if there's a ctx->Extensions.foo listed, it should turn
that on/off, but if there isn't, it should still touch up the returned
ext list.

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 2/2] radeonsi/compute: Use the compiler's COMPUTE_PGM_RSRC* register values

2015-11-19 Thread Emil Velikov
On 19 November 2015 at 20:37, Tom Stellard  wrote:
> On Wed, Nov 18, 2015 at 05:43:31PM +, Emil Velikov wrote:
>> Hi Tom,
>>
>> Please flip the order of the patches and drop the now patch 1/2 from
>> the stable queue.
>>
>
> I'm confused about why I need to flip the order of the patches.
>
As mentioned on IRC - patch 1 is not a bugfix and despite how trivial
I'd rather have only bugfixes in stable.
Is flipping things too much to ask ?

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl/x11/dri3: do not expose the preserved swap behavior (to be squashed)

2015-11-19 Thread Eric Anholt
Martin Graesslin  writes:

> On Thursday, November 19, 2015 1:54:22 PM CET Martin Peres wrote:
>> On 11/11/15 00:44, Eric Anholt wrote:
>> > Martin Peres  writes:
>> >> The preserved swap behavior is currently untested in piglit and not
>> >> supported on the GLX side. Before working on implementing it for
>> >> EGL/DRI3, let's disable it until support comes.
>> >> 
>> >> This patch is trivial enough and will likely be squashed in the commit
>> >> egl/x11: Implement dri3 support with loader's dri3 helper
>> >> 
>> >> Signed-off-by: Martin Peres 
>> > 
>> > This looks good to me.  I don't think anybody really cares about
>> > SWAP_BEHAVIOR_PRESERVED -- the buffer_age stuff was what you really
>> > wanted all along.
>> 
>> Hey Eric and Martin,
>> 
>> It seems like KWin is relying on SWAP_BEHAVIOR_PRESERVED for its EGL
>> backend. Should we add proper support for it in mesa or should we
>> propose a patch for kwin to test for the extension to be present before
>> using it?
>> 
>> Martin, what is your opinion on this since buffer age is what you need
>> and you already have support for it for the glx and wayland backends?
>
> That's certainly a left over from before buffer age.  I don't know whether we 
> can remove it unconditionally. Not every GLES harware which does support X11 
> supports buffer age.
>
> I assume the smartest thing to do is not ask for PRESERVED if buffer age is 
> supported.

Yeah, you definitely want to not ask for PRESERVED if you can handle
buffer age, since PRESERVED means losing page flipping.  (Part of why
PRESERVED was a bad idea from day 1)


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] mesa: Add test for sorted extension table

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 10:44 AM, Nanley Chery  wrote:
> From: Nanley Chery 
>
> Enable developers to know if the table's alphabetical sorting
> is maintained or lost.
>
> v2: Move "*" next to pointer name (Matt)
> Include extensions_table.h instead of extensions.h (Ian)
> Remove extra " *" in comment (Ian)
>
> Signed-off-by: Nanley Chery 
> ---

Looks good.

Reviewed-by: Matt Turner 

Thanks Nanley!
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] i965: fix 64-bit immediates in brw_inst(_set)_bits

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 9:30 AM, Kristian Høgsberg  wrote:
> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  wrote:
>> From: Connor Abbott 
>>
>> If we tried to get/set something that was exactly 64 bits, we would
>> try to do (1 << 64) - 1 to calculate the mask which doesn't give us all
>> 1's like we want.
>>
>> v2 (Iago)
>>  - Replace ~0 by ~0ull
>>  - Removed unnecessary parenthesis
>>
>> Reviewed-by: Iago Toral Quiroga 
>> ---
>>  src/mesa/drivers/dri/i965/brw_inst.h | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_inst.h 
>> b/src/mesa/drivers/dri/i965/brw_inst.h
>> index 4ed95c4..ec08194 100644
>> --- a/src/mesa/drivers/dri/i965/brw_inst.h
>> +++ b/src/mesa/drivers/dri/i965/brw_inst.h
>> @@ -694,7 +694,8 @@ brw_inst_bits(const brw_inst *inst, unsigned high, 
>> unsigned low)
>> high %= 64;
>> low %= 64;
>>
>> -   const uint64_t mask = (1ull << (high - low + 1)) - 1;
>> +   const uint64_t mask = (high - low == 63) ? ~0ull :
>> +  (1ull << (high - low + 1)) - 1;
>
> Can we do
>
> const uint64_t mask = (~0ul >> (64 - (high - low + 1)));
>
> instead?

I don't think so, because ~0ul is of type unsigned, so right shifting
it shifts in zeros. I was going to make a similar comment on the
original patch -- "-1" is preferable over ~0u with an increasingly
long sequence of l's because it's signed, so it's sign extended to
fill whatever you assign it to. In your code though, since it's an
operand we'd need -1ll, I think...

So with s/~0ul/-1ll/, I think that'll work? It's all evaluated at
compile-time in any case, so clarity is the only metric. I don't have
a preference.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/9] Early fp64 fixes

2015-11-19 Thread Matt Turner
For the next patches you send from Connor, please use
--suppress-cc=author so that every reply doesn't generate a bounce
message about his disabled @intel email. :)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/8 v2] i965: Enable EXT_shader_samples_identical

2015-11-19 Thread Ian Romanick
On 11/19/2015 09:34 AM, Neil Roberts wrote:
> Jason Ekstrand  writes:
> 
>> On Nov 18, 2015 6:38 PM, "Ian Romanick"  wrote:
>>>
>>> From: Ian Romanick 
>>>
>>> v2: Handle immediate value for MCS smarter.  Rebase on changes to
>>> nir_texop_sampels_identical (missing second parameter).  Suggested by
>>> Jason.  This still doesn't handle the 16x MSAA case.
> 
> 16x MSAA has a field in the program sampler key because it needs a new
> instruction to fetch from the texture. For 16x MSAA it just uses two of
> the registers that are returned by the MCS fetch instruction instead of
> just one. Maybe you could do something like this:
> 
>   if (mcs.file == BRW_IMMEDIATE_VALUE) {
>  bld.MOV(dst, fs_reg(0));
>   } else if ((key_tex->msaa_16 & (1 << sampler))) {
>  fs_reg tmp = vgrf(glsl_type::uint_type);
>  bld.OR(tmp, mcs, offset(mcs, bld, 1));
>  bld.CMP(dst, tmp, src_reg(0u), BRW_CONDITIONAL_EQ);
>   } else {
>  bld.CMP(dst, mcs, src_reg(0u), BRW_CONDITIONAL_EQ);
>   }
> 
> I tested this on my SKL and it passes all of your Piglit tests. The
> optimiser quite neatly removes the second instruction and the temporary
> register in the 16x case and makes the OR directly update the flag
> registers, so it is possibly the same cost as the 8x case in practice.

I was looking at the msaa_16 flags last night after I sent out v2.  I
was thinking of something along these lines.  Am I correct that nothing
special is needed in the vec4 backend?  It seems like mcs should know
the register size, and the CMP with 0 should just do the right thing.

I'll add your code to the FS and your S-o-b to the commit message... and
hopefully this can land soon. :)

> I can confirm that I've already pushed the 16x MSAA patches to master.
> 
> Regards,
> - Neil


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/8] Implement EXT_shader_samples_identical

2015-11-19 Thread Chris Forbes
Series (with the v2 changes) is:

Reviewed-by: Chris Forbes 

On Thu, Nov 19, 2015 at 12:46 PM, Ian Romanick  wrote:

> This patch series implements a new GL extension,
> EXT_shader_samples_identical.  This extension allows shaders to
> determine when all of the samples in a particular texel are the same.
> This takes advantage of the way compressed multisample surfaces are
> stored on modern Intel and AMD hardware.  This enables optimizations in
> application multisample resolve filters, etc.
>
> I really wanted to get this in the next Mesa release.  For some reason,
> I thought the branch point was after Thanksgiving (which is next
> Thursday).  Ken reminded me yesterday that the branch point is actually
> this Friday. :( As a result, I'm sending it out today to get review as
> soon as possible.
>
> I also wanted to get as much time as possible for other drivers to get
> implementations.  I worked with Graham Sellers on this extension, and he
> assures me that the implementation on modern Radeons is trivial.  My
> expectation is that it should be about the same as the Intel
> implementation.
>
> There will be some extra TGSI bits needed, but that should also be
> trivial.  For the NIR and i965 backend bits, I mostly copied and blended
> the implementations of txf_ms and query_samples.
>
> There are currently only trivial piglit tests, but I am working on more.
> I basically hacked up tests/spec/arb_texture_multisample/texelfetch.c to
> use the extension to render different colors based on whether
> textureSamplesIdenticalEXT returned true or false.  The resulting image
> and the generated assembly look good.  My plan is to get a set of real
> tests out by midday tomorrow.
>
> As soon as we're confident that the spec is good, I'll submit it to
> Khronos for publication in the registry.  I'm still waiting on feedback
> from another closed-source driver writer.
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Connor Abbott
On Thu, Nov 19, 2015 at 3:11 PM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 11:35 AM, Connor Abbott  wrote:
>> On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott  wrote:
>>> On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner  wrote:
 On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott  wrote:
> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  
>> wrote:
>>> From: Connor Abbott 
>>>
>>> It appears that not only math instructions, but also MOV_BYTES or
>>> any instruction that uses Align1 mode cannot be in the middle
>>> of a dependency control sequence or the GPU will hang (at least on my
>>> BDW). This fixes GPU hangs in some fp64 tests.
>>
>> I'm pretty surprised by this assessment. Doubtful even.
>>
>>> Reviewed-by: Iago Toral Quiroga 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> index 3bcd5cb..bc0a33b 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
>>> vec4_instruction *inst)
>>> }
>>>
>>> /*
>>> +* Instructions that use Align1 mode cause the GPU to hang when 
>>> inserted
>>> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered 
>>> empirically.
>>> +*/
>>> +
>>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>>> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>>
>> PACK_BYTES sets depctrl itself in the generator, and at the time I
>> added it I made a test that did
>>
>>   vec4 foo = vec4(packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...),
>>   packUnorm4x8(...))
>>
>> and confirmed that it set depctrl properly on the whole sequence.
>> There could of course be bugs somewhere, but the "hardware doesn't
>> work if you mix align1 and align16 depctrl" seems unlikely.
>>
>> Do you know of a test that this affects?
>
> This only affects FP64 tests, since there we use an align1 mov to do
> double-to-float and float-to-double. However, I tried commenting out
> emit_nir_code() and just doing essentially:
>
> emit(MOV(...))->force_writemask_all = true;
> emit(VEC4_OPCODE_PACK_BYTES, ...);
> emit(MOV(...))->force_writemask_all = true;
>
> and on my BDW it hanged. In case it's not clear: this isn't about
> setting depctrl on the instruction itself, it just can't be inside of
> a depctrl sequence (which we were already disallowing for math
> instructions anyways).

 Very weird. I'll take a look. So I understand, are the MOV
 instructions writing different channels of the same register? And
 VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
 as the MOVs? (I saw your fixup reply)
>>>
>>> Actually, I had them writing the same thing so the second overwrote
>>> the first one.
>>
>> Err, just to be clear... in the actual test that led to me discovering
>> this, the instructions (not mov's but a bfi and a mov IIRC) were
>> writing different components of the same register. In my hacked-up
>> test to see if align1 in between a depctrl sequence was the problem, I
>> just had them both write to the same thing since it was easier. In any
>> case, I don't think it should matter too much.
>
> Interesting. If it was a BFI and a MOV writing different components of
> the same register, I can see that failing. Though I haven't measured,
> 3-src instructions usually have 2+ extra cycles of latency and with a
> 2-cycle issue time, a BFI followed immediately by a MOV would mean
> that both of them retired in the same cycle. That would seem bad if
> one of them was supposed to signal the scoreboard.
> ×

They weren't right next to each other, though -- there was a sequence
of 5-6 instructions between them, one of which happens to be an align1
mov which (according to my testing) hangs the GPU in between any
depctrl sequence.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] MESA_EXTENSION_OVERRIDE problem

2015-11-19 Thread Ilia Mirkin
On Thu, Nov 19, 2015 at 3:13 PM, Brian Paul  wrote:
> Hi Nanley,
>
> Maybe you can fix an issue I have with the new extension code.
>
> Previously, I could do something like export
> MESA_EXTENSION_OVERRIDE="-ARB_clear_buffer_object" and I would no longer see
> it in the GL_EXTENSIONS string, even if it was an "always on" extension.

How sure are you that this worked? AFAIK there was no way to
enable/disable an always-on ext. That env var never affected the
extension strings, but only the actual ctx->Extensions struct, which
would indirectly modify the ext string. If the entry in that ext table
was a dummy one, it had logic to not futz with it. (But it did so
silently.)

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1.9/2] i965: Add brw_imm_uv().

2015-11-19 Thread Jason Ekstrand
On Thu, Nov 19, 2015 at 11:33 AM, Matt Turner  wrote:
> On Thu, Nov 19, 2015 at 9:53 AM, Matt Turner  wrote:
>> On Thu, Nov 19, 2015 at 8:00 AM, Jason Ekstrand  wrote:
>>> On Wed, Nov 18, 2015 at 2:25 PM, Matt Turner  wrote:
 ---
  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 3 +++
  src/mesa/drivers/dri/i965/brw_reg.h| 9 +
  2 files changed, 12 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
 index e5a286a..78c10e9 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
 @@ -116,6 +116,9 @@ brw_reg_from_fs_reg(fs_inst *inst, fs_reg *reg, 
 unsigned gen)
case BRW_REGISTER_TYPE_V:
   brw_reg = brw_imm_v(reg->ud);
   break;
 +  case BRW_REGISTER_TYPE_UV:
 + brw_reg = brw_imm_uv(reg->ud);
 + break;
default:
  unreachable("not reached");
}
 diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
 b/src/mesa/drivers/dri/i965/brw_reg.h
 index 759bd93..b77cdeb 100644
 --- a/src/mesa/drivers/dri/i965/brw_reg.h
 +++ b/src/mesa/drivers/dri/i965/brw_reg.h
 @@ -623,6 +623,15 @@ brw_imm_v(unsigned v)
 return imm;
  }

 +/** Construct vector of eight unsigned half-byte values */
 +static inline struct brw_reg
 +brw_imm_uv(unsigned uv)
 +{
>>>
>>> Please add a GEN assertion either here or in the generator.  This only
>>> becomes available on Haswell or Broadwell if I remember correctly.  I
>>> do know it's not universally available.
>>
>> Good idea. (It's Sandybridge+)
>
> On second thought, I can't (without adding a devinfo parameter to this
> function, which I'm not going to do)
>
> This is something the instruction validator can be tasked with verifying.

You could also do it in the generator, but the validator is fine too.
As long as it's somewhere.
--Jason
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/8 v2] i965: Enable EXT_shader_samples_identical

2015-11-19 Thread Neil Roberts
Ian Romanick  writes:

> Am I correct that nothing special is needed in the vec4 backend? It
> seems like mcs should know the register size, and the CMP with 0
> should just do the right thing.

I think you probably will have to do something for the vec4 backend.
Currently it generates an instruction like this:

cmp.z.f0(8) g8<1>.xUD   g9<4,4,1>.xUD   0xUD

g9 contains the MCS data. If I understand correctly for 16x MSAA the
second half of the MCS data is in the y component which is currently
ignored by this instruction. Maybe something like this would work:

  if (mcs.file == BRW_IMMEDIATE_VALUE) {
 emit(MOV(dest, src_reg(0u)));
  } else if ((key_tex->msaa_16 & (1 << sampler))) {
 dst_reg tmp(this, glsl_type::uint_type);
 dst_reg swizzled_mcs = mcs;
 swizzled_mcs.swizzle = BRW_SWIZZLE_;
 emit(OR(tmp, mcs, swizzled_mcs));
 emit(CMP(dest, tmp, src_reg(0u), BRW_CONDITIONAL_EQ));
  } else {
 emit(CMP(dest, mcs, src_reg(0u), BRW_CONDITIONAL_EQ));
  }

Sadly the optimiser doesn't optimise out the extra instruction this
time. There's probably a better way to do it. On the other hand I can't
think why anyone would be using this function in the vec4 backend so
it's probably not worth worrying about.

I haven't tested this at all. I guess to test it you would have to write
a GS version of the Piglit test? I think that is the only place that
uses the vec4 backend on SKL.

If we don't want to risk this we could always just make it return false
when msaa_16 is set for the sampler. On the other hand if there is
currently no test for the 8x version either then we should probably
assume that neither of them work… maybe it wouldn't be so bad to just
always return false in the vec4 backend until we've tested it?

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Fix JIP to properly skip over unrelated control flow.

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 2:05 PM, Kenneth Graunke  wrote:
> We've apparently always been botching JIP for sequences such as:

Wrong since Dec 1 2015! Nice find.

>
>do
>cmp.f0.0 ...
>(+f0.0) break
>...
>if
>   ...
>else
>   ...
>endif
>...
>while
>
> Normally, UIP is supposed to point to the final destination of the jump,
> while in nested control flow, JIP is supposed to point to the end of the
> current nesting level.  It essentially bounces out of the current nested
> control flow, to an instruction that has a JIP which bounces out another
> level, and so on.
>
> In the above example, when setting JIP for the BREAK, we call
> brw_find_next_block_end(), which begins a search after the BREAK for the
> next ENDIF, ELSE, WHILE, or HALT.  It ignores the IF and finds the ELSE,
> setting JIP there.
>
> This makes no sense at all.  The break is supposed to skip over the
> whole if/else/endif block entirely.  They have a sibling relationship,
> not a nesting relationship.
>
> This patch fixes brw_find_next_block_end() to track depth as it does
> its search, and ignore anything not at depth 0.  So when it sees the
> IF, it ignores everything until after the ENDIF.  That way, it finds
> the end of the right block.
>
> Caught while debugging a tessellation shader - no apparent effect on
> Piglit.  I did look for actual applications that were affected, and
> found that GLBenchmark Manhattan had a BREAK with a bogus JIP.
>
> Cc: mesa-sta...@lists.freedesktop.org
> Signed-off-by: Kenneth Graunke 
> ---
>  src/mesa/drivers/dri/i965/brw_eu_emit.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index da1ddfd..e457fd2 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -2604,17 +2604,27 @@ brw_find_next_block_end(struct brw_codegen *p, int 
> start_offset)
> void *store = p->store;
> const struct brw_device_info *devinfo = p->devinfo;
>
> +   int depth = 0;
> +
> for (offset = next_offset(devinfo, store, start_offset);
>  offset < p->next_insn_offset;
>  offset = next_offset(devinfo, store, offset)) {
>brw_inst *insn = store + offset;
>
>switch (brw_inst_opcode(devinfo, insn)) {
> +  case BRW_OPCODE_IF:
> + depth++;
> + break;
>case BRW_OPCODE_ENDIF:
> + if (depth == 0)
> +return offset;
> + depth--;
> + break;
>case BRW_OPCODE_ELSE:
>case BRW_OPCODE_WHILE:
>case BRW_OPCODE_HALT:
> -return offset;
> + if (depth == 0)
> +return offset;


Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/8] i965: Enable EXT_shader_samples_identical

2015-11-19 Thread Jason Ekstrand
On Wed, Nov 18, 2015 at 9:29 PM, Jason Ekstrand  wrote:
>
> On Nov 18, 2015 5:02 PM, "Jason Ekstrand"  wrote:
>>
>> On Wed, Nov 18, 2015 at 4:06 PM, Kenneth Graunke 
>> wrote:
>> > On Wednesday, November 18, 2015 03:46:54 PM Ian Romanick wrote:
>> >> From: Ian Romanick 
>> >>
>> >> Signed-off-by: Ian Romanick 
>> >> ---
>> >>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   |  1 +
>> >>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 16 
>> >>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |  1 +
>> >>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 11 +++
>> >>  src/mesa/drivers/dri/i965/intel_extensions.c   |  1 +
>> >>  5 files changed, 30 insertions(+)
>> >>
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> index 1f71f66..4af1234 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> >> @@ -2550,6 +2550,7 @@ fs_visitor::nir_emit_texture(const fs_builder
>> >> , nir_tex_instr *instr)
>> >>   switch (instr->op) {
>> >>   case nir_texop_txf:
>> >>   case nir_texop_txf_ms:
>> >> + case nir_texop_samples_identical:
>> >>  coordinate = retype(src, BRW_REGISTER_TYPE_D);
>> >>  break;
>> >>   default:
>> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> >> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> >> index a7bd9ce..6688f6a 100644
>> >> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
>> >> @@ -259,6 +259,22 @@ fs_visitor::emit_texture(ir_texture_opcode op,
>> >>lod = fs_reg(0u);
>> >> }
>> >>
>> >> +   if (op == ir_samples_identical) {
>> >> +  fs_reg dst = vgrf(glsl_type::get_instance(dest_type->base_type,
>> >> 1, 1));
>> >> +
>> >> +  if (mcs.file == BRW_IMMEDIATE_VALUE) {
>> >> + fs_reg tmp = vgrf(glsl_type::uint_type);
>> >> +
>> >> + bld.MOV(tmp, mcs);
>> >> + bld.CMP(dst, tmp, src_reg(0u), BRW_CONDITIONAL_EQ);
>> >
>> > Seems a little strange to emit assembly to do the comparison when
>> > you've already determined that the value is a compile time constant.
>> >
>> > Why not just:
>> >
>> >bld.MOV(dst, fs_reg(mcs.ud == 0u ? ~0u : 0u));
>>
>> Actually, getting an immediate here means we don't have an MCS and we
>> have no idea of the samples are identical, so we should return false
>> always.
>>
>> >> +  } else {
>> >> + bld.CMP(dst, mcs, src_reg(0u), BRW_CONDITIONAL_EQ);
>>
>> We should also consider handling the clear color case.  In this case,
>> we'll get 0xff for 2x and 0x for 4x or 8x.  Do we know the
>> number of samples in the shader?  We should be able to get that from
>> the sampler or something but then we would have to pass that through
>> the key and that would get gross.
>
> First off, I realized that the numbers I have there are wrong. It's 0xff for
> 2x and 4x and 0x for 8x.  However, I also just realized that the 8
> 8-bit values you get for 8x MSAA range from 0 to 7 but take up 4 bits each.
> This means that no valid 8x MSAA MCS value can have 0xff as its bottom 8
> bits unless it's the clear color.  This means that a simple and with 0xff
> will get us a clear color check on all but 16x.  Unfortunately, 16x has a
> 64-bit MCS value and, unless the hardware provides is with some extra
> guarantees, 0xff would be valid in the bottom 8 bits.
>
> Going off into the world of speculation just a bit, can we make some
> assumptions about the hardware?  Suppose for a moment that the used a fairly
> greedy algorithm for determining which plane to store a value in:
>
> 1) If all samples are affected, store in slice zero
> 2) If not, store in the first available empty or completely overwritten
> slice.
>
> Such an algorithm would make sense and have the nice property of tending to
> pack the samples in the earlier slices this decreasing the possibility of
> ever touching slice 15.  This is good for cache locality.  It also has
> another property that would be very useful for us, namely that it only
> touches slice 15 if all 16 samples have different colors.  In particular, it
> would mean that you can never have two samples that both lie in slice 15
> and, more specifically, 0xff would also be invalid for 16x.
>
> Unfortunately, that's entirely based on my speculation as to how the
> hardware works.  Given that we don't actually know, it's not documented, and
> that we're not liable to ever find anyone willing to give us those kinds of
> details, we're not likely to find out without a very clever experiment.

So, chad has convinced my that my speculation is quite possibly bogus
and *very* hard to actually test, so just checking the bottom 8 bits
probably won't work.

Something else that came out of that 

[Mesa-dev] [PATCH] i965: Fix JIP to properly skip over unrelated control flow.

2015-11-19 Thread Kenneth Graunke
We've apparently always been botching JIP for sequences such as:

   do
   cmp.f0.0 ...
   (+f0.0) break
   ...
   if
  ...
   else
  ...
   endif
   ...
   while

Normally, UIP is supposed to point to the final destination of the jump,
while in nested control flow, JIP is supposed to point to the end of the
current nesting level.  It essentially bounces out of the current nested
control flow, to an instruction that has a JIP which bounces out another
level, and so on.

In the above example, when setting JIP for the BREAK, we call
brw_find_next_block_end(), which begins a search after the BREAK for the
next ENDIF, ELSE, WHILE, or HALT.  It ignores the IF and finds the ELSE,
setting JIP there.

This makes no sense at all.  The break is supposed to skip over the
whole if/else/endif block entirely.  They have a sibling relationship,
not a nesting relationship.

This patch fixes brw_find_next_block_end() to track depth as it does
its search, and ignore anything not at depth 0.  So when it sees the
IF, it ignores everything until after the ENDIF.  That way, it finds
the end of the right block.

Caught while debugging a tessellation shader - no apparent effect on
Piglit.  I did look for actual applications that were affected, and
found that GLBenchmark Manhattan had a BREAK with a bogus JIP.

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/brw_eu_emit.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index da1ddfd..e457fd2 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2604,17 +2604,27 @@ brw_find_next_block_end(struct brw_codegen *p, int 
start_offset)
void *store = p->store;
const struct brw_device_info *devinfo = p->devinfo;
 
+   int depth = 0;
+
for (offset = next_offset(devinfo, store, start_offset);
 offset < p->next_insn_offset;
 offset = next_offset(devinfo, store, offset)) {
   brw_inst *insn = store + offset;
 
   switch (brw_inst_opcode(devinfo, insn)) {
+  case BRW_OPCODE_IF:
+ depth++;
+ break;
   case BRW_OPCODE_ENDIF:
+ if (depth == 0)
+return offset;
+ depth--;
+ break;
   case BRW_OPCODE_ELSE:
   case BRW_OPCODE_WHILE:
   case BRW_OPCODE_HALT:
-return offset;
+ if (depth == 0)
+return offset;
   }
}
 
-- 
2.6.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/8] i965: Enable EXT_shader_samples_identical

2015-11-19 Thread Neil Roberts
Jason Ekstrand  writes:

> Something else that came out of that conversation is that, for 2x
> MSAA, we may get bogus data in all but the bottom 4 bits.  In other
> words, just blindly checking for zero is probably a bad idea.  It'll
> work because the extension spec lets us return false negatives, but it
> isn't a good idea in general.  If we really want the implementation to
> be solid, we need to mask off all but the bottom n * log2(n) bits
> where n = number of samples.

I just noticed that this optimisation is actually suggested in the PRM.
I'll paste it here (sorry for the noise if everyone was already aware of
this):

IVB vol4 part 1, 2.7.1

“A simple optimization with probable large return in performance is to
 compare the MCS value to zero (indicating all samples are on sample
 slice 0), and sample only from sample slice 0 using ld2dss if MCS is
 zero. Sample slice 0 is the pixel color in this case. If MCS is not
 zero, each sample is then obtained using ld2dms messages and the
 results are averaged in the kernel after being returned. Refer to the
 multisample storage format in the GPU Overview volume for more
 details.”

This at least implies that this is an expected use case and maybe is a
good hint that it might do the right thing and clear the extra bits for
the 2x case? Of course it would be good to be more sure. I can't find
the more details that it's referring to. (Why don't the specs have
hyperlinks…?)

Regards,
- Neil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/5] i965/gen8: Allow rendering to B8G8R8X8

2015-11-19 Thread Anuj Phogat
On Thu, Nov 19, 2015 at 7:25 AM, Neil Roberts  wrote:
> Since Gen8 this is allowed as a rendering target so we don't need to
> override it to B8G8R8A8. This is helpful on Gen9+ where using this
> override causes fast clears not to work.
> ---
>  src/mesa/drivers/dri/i965/brw_surface_formats.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
> b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> index 55e7e64..7c38431 100644
> --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
> +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
> @@ -167,8 +167,8 @@ const struct surface_format_info surface_formats[] = {
> SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   I32_FLOAT)
> SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   L32_FLOAT)
> SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   A32_FLOAT)
> -   SF( Y,  Y,  x,  Y,  x,  x,  x,  x, 60,   90,   B8G8R8X8_UNORM)
> -   SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   B8G8R8X8_UNORM_SRGB)
> +   SF( Y,  Y,  x,  Y, 80, 80,  x,  x, 60,   90,   B8G8R8X8_UNORM)
> +   SF( Y,  Y,  x,  x, 80, 80,  x,  x,  x,x,   B8G8R8X8_UNORM_SRGB)
> SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R8G8B8X8_UNORM)
> SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R8G8B8X8_UNORM_SRGB)
> SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R9G9B9E5_SHAREDEXP)
> @@ -670,9 +670,10 @@ brw_init_surface_formats(struct brw_context *brw)
>   * mask writes to alpha (ala glColorMask) and reconfigure the
>   * alpha blending hardware to use GL_ONE (or GL_ZERO) for
>   * cases where GL_DST_ALPHA (or GL_ONE_MINUS_DST_ALPHA) is
> - * used.
> + * used. On Gen8+ BGRX is actually allowed (but not RGBX).
>   */
> -render = BRW_SURFACEFORMAT_B8G8R8A8_UNORM;
> + if (gen < tinfo->render_target)
> +render = BRW_SURFACEFORMAT_B8G8R8A8_UNORM;
>  break;
>case BRW_SURFACEFORMAT_R8G8B8X8_UNORM:
>   render = BRW_SURFACEFORMAT_R8G8B8A8_UNORM;
> --
> 1.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Verified in the hardware specs.
Reviewed-by: Anuj Phogat 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92706] glBlitFramebuffer refuses to blit RGBA to RGB with MSAA

2015-11-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92706

--- Comment #7 from EoD  ---
OpenGL renderer string: Gallium 0.4 on AMD BARTS (DRM 2.43.0, LLVM 3.7.0)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 11.1.0-devel
(git-0cfc130)

After applying both patches, the patched piglit test passes just fine:

http://patchwork.freedesktop.org/patch/63670/
http://patchwork.freedesktop.org/patch/63672/

PIGLIT: {"result": "pass" }

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 0/4] V4: Improve GLSL support of GL_ARB_separate_shader_objects

2015-11-19 Thread Timothy Arceri
Hi Gregory,

My apologies I only just found this series, when you said your Mesa patchs I
thought you meant the ones in the bug comments.

It seems we have taken very different appoaches to the problem with inputs, I
do think my change here is simpler and my series also address outputs being
removed too. It would be great if you could test my series with PCSX2.

I will review and test out your patch 3 tomorrow, thanks for this and sorry
for the confusion on my side.

Tim

On Sun, 2015-10-25 at 15:01 +0100, Gregory Hainaut wrote:
> v4:
> Rebase against lastest master and fix bad variable name in assert.
> 
> A new test was developed to properly check commit 4 behavior.
> I ran most of the piglit test without regression.
> 
> v3:
> Squash old commit 1&2
> * Use a better name for the new attribute: always_active_io
> * Use ir_variable directly instead of ir_variable_refcount_visitor
> * Put related code in linker.cpp
> 
> Add 2 new commits to fix wrong interface matching in more complex case.
> Commit 3: avoid collision between user and linker slot assignment
> Commit 4: avoid unpredictable sorting of varying
> 
> Commit 1/2/3 fix the piglit test: arb_separate_shader_object
> -rendezvous_by_name posted on piglit ML
> Commit 4 was tested on the PCSX2 application.
> 
> Gregory Hainaut (4):
>   glsl IR: add always_active_io attribute to ir_variable
>   glsl IR: only allow optimization of interstage variable
>   glsl: avoid linker and user varying location to overlap
>   glsl: don't sort varying in separate shader mode
> 
>  src/glsl/ir.cpp|  1 +
>  src/glsl/ir.h  |  7 +
>  src/glsl/link_varyings.cpp | 76 ++-
> ---
>  src/glsl/linker.cpp| 73
> 
>  src/glsl/opt_dead_code.cpp | 18 +++
>  5 files changed, 169 insertions(+), 6 deletions(-)
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] i965/vec4: avoid dependency control around Align1 instructions

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  wrote:
> From: Connor Abbott 
>
> It appears that not only math instructions, but also MOV_BYTES or
> any instruction that uses Align1 mode cannot be in the middle
> of a dependency control sequence or the GPU will hang (at least on my
> BDW). This fixes GPU hangs in some fp64 tests.

I'm pretty surprised by this assessment. Doubtful even.

> Reviewed-by: Iago Toral Quiroga 
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 3bcd5cb..bc0a33b 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction 
> *inst)
> }
>
> /*
> +* Instructions that use Align1 mode cause the GPU to hang when inserted
> +* between a NoDDClr and NoDDChk in Align16 mode. Discovered empirically.
> +*/
> +
> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
> +   inst->opcode == VEC4_OPCODE_MOV_BYTES ||

PACK_BYTES sets depctrl itself in the generator, and at the time I
added it I made a test that did

  vec4 foo = vec4(packUnorm4x8(...),
  packUnorm4x8(...),
  packUnorm4x8(...),
  packUnorm4x8(...))

and confirmed that it set depctrl properly on the whole sequence.
There could of course be bugs somewhere, but the "hardware doesn't
work if you mix align1 and align16 depctrl" seems unlikely.

Do you know of a test that this affects?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/9] i965/fs: don't propagate cmod when the exec sizes differ

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  wrote:
> From: Connor Abbott 
>
> This can happen when the source of the compare was split by the SIMD
> lowering pass. Potentially, we could allow the case where the exec size
> of scan_inst is larger, and scan_inst has the right quarter selected,
> but doing that seems a little more risky.
>
> Reviewed-by: Iago Toral Quiroga 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
> index 8fdc959..93461f7 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
> @@ -93,6 +93,9 @@ opt_cmod_propagation_local(bblock_t *block)
>  scan_inst->dst.reg_offset != inst->src[0].reg_offset)
> break;
>
> +if (scan_inst->exec_size != inst->exec_size)
> +   break;

I'd combine this with the previous if (...) break block.

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] i965: fix 64-bit immediates in brw_inst(_set)_bits

2015-11-19 Thread Matt Turner
Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/9] i965/fs: add stride restrictions for copy propagation

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  wrote:
> From: Connor Abbott 
>
> There are various restrictions on what the hstride can be that depend on
> the Gen, and now that we're using hstride == 2 for packing/unpacking
> doubles, we're going to run into these restrictions a lot more often.
> Pull them out into a separate function, and move the one restriction we
> checked previously into it.
> ---
>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 58 
> +-
>  1 file changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> index 426ea57..3ae5ead 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> @@ -275,6 +275,61 @@ is_logic_op(enum opcode opcode)
> opcode == BRW_OPCODE_NOT);
>  }
>
> +static bool
> +can_take_stride(fs_inst *inst, unsigned arg, unsigned stride,
> +const brw_device_info *devinfo)
> +{
> +   if (stride > 4)
> +  return false;
> +
> +   /* 3-source instructions can only be Align16, which restricts what strides
> +* they can take. They can only take a stride of 1 (the usual case), or 0
> +* with a special "repctrl" bit. But the repctrl bit doesn't work for
> +* 64-bit datatypes, so if the source type is 64-bit then only a stride of
> +* 1 is allowed. From the Broadwell PRM, Volume 7 "3D Media GPGPU", page
> +* 944:
> +*
> +*This is applicable to 32b datatypes and 16b datatype. 64b datatypes
> +*cannot use the replicate control.
> +*/
> +

Remove this blank line.

> +   if (inst->is_3src()) {
> +  if (type_sz(inst->src[arg].type) > 4)
> + return stride == 1;
> +  else
> + return stride == 1 || stride == 0;
> +   }
> +
> +   /* From the Broadwell PRM, Volume 2a "Command Reference - Instructions",
> +* page 391 ("Extended Math Function"):
> +*
> +* The following restrictions apply for align1 mode: Scalar source is
> +* supported. Source and destination horizontal stride must be the
> +* same.
> +*
> +* From the Haswell PRM Volume 2b "Command Reference - Instructions", page
> +* 134 ("Extended Math Function"):
> +*
> +*Scalar source is supported. Source and destination horizontal stride
> +*must be 1.
> +*
> +* and similar language exists for IVB and SNB. Pre-SNB, math instructions
> +* are sends, so the sources are moved to MRF's and there are no
> +* restrictions.
> +*/
> +

Remove this blank line.

> +   if (inst->is_math()) {
> +  if (devinfo->gen == 6 || devinfo->gen == 7) {
> + assert(inst->dst.stride == 1);
> + return stride == 1 || stride == 0;
> +  } else if (devinfo->gen >= 8) {
> + return stride == inst->dst.stride || stride == 0;
> +  }
> +   }
> +
> +   return true;
> +}
> +
>  bool
>  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>  {
> @@ -326,7 +381,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, 
> acp_entry *entry)
> /* Bail if the result of composing both strides would exceed the
>  * hardware limit.
>  */
> -   if (entry->src.stride * inst->src[arg].stride > 4)
> +   if (!can_take_stride(inst, arg, entry->src.stride * inst->src[arg].stride,
> +devinfo))
>return false;
>
> /* Bail if the instruction type is larger than the execution type of the
> --

Reviewed-by: Matt Turner 

That all looks right to me and splitting it out into a separate
function is a good idea.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl/x11/dri3: do not expose the preserved swap behavior (to be squashed)

2015-11-19 Thread Martin Peres

On 11/11/15 00:44, Eric Anholt wrote:

Martin Peres  writes:


The preserved swap behavior is currently untested in piglit and not supported
on the GLX side. Before working on implementing it for EGL/DRI3, let's
disable it until support comes.

This patch is trivial enough and will likely be squashed in the commit
egl/x11: Implement dri3 support with loader's dri3 helper

Signed-off-by: Martin Peres 

This looks good to me.  I don't think anybody really cares about
SWAP_BEHAVIOR_PRESERVED -- the buffer_age stuff was what you really
wanted all along.


Hey Eric and Martin,

It seems like KWin is relying on SWAP_BEHAVIOR_PRESERVED for its EGL 
backend. Should we add proper support for it in mesa or should we 
propose a patch for kwin to test for the extension to be present before 
using it?


Martin, what is your opinion on this since buffer age is what you need 
and you already have support for it for the glx and wayland backends?


Cheers,
Martin
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/4] st/va: properly set max number of reference frames

2015-11-19 Thread Christian König

On 19.11.2015 10:37, Julien Isorce wrote:

It fixes asserts like assert(templ->max_references <= 2) in
nvc0_video.c::nvc0_create_decodier
This patch also post-update the correct value for the number
of reference frames in the h264 case, see below.

In VA-API the max num ref is retrieved later in handlePictureParameterBuffer.
Unfortunately by this time the decoder has been already created
in vaCreateContextwhich which one does not have any max_references
param compared to VDPAU api.

In vdpau-driver they delay the decoder creation to endPicture.
This is not practicable with gallium. But that's ok our buffer
will be bigger but at least they will have enough space and
st/va will still write correct value for the number of reference
frames in the hw buffer.


Why is delaying the decoder creation till you know all the parameters 
not practical here?


Adjusting an internal parameter after the decoder was created sounds 
rather awkward to me.


Regards,
Christian.




Signed-off-by: Julien Isorce 
---
  src/gallium/state_trackers/va/context.c | 10 +++---
  src/gallium/state_trackers/va/picture.c |  1 +
  2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/gallium/state_trackers/va/context.c 
b/src/gallium/state_trackers/va/context.c
index ec9e048..b3dd293 100644
--- a/src/gallium/state_trackers/va/context.c
+++ b/src/gallium/state_trackers/va/context.c
@@ -248,10 +248,14 @@ vlVaCreateContext(VADriverContextP ctx, VAConfigID 
config_id, int picture_width,
templat.max_references = num_render_targets;
templat.expect_chunked_decode = true;
  
+  /* XXX HEVC ? */

if (u_reduce_video_profile(templat.profile) ==
-PIPE_VIDEO_FORMAT_MPEG4_AVC)
-templat.level = u_get_h264_level(templat.width, templat.height,
- _references);
+ PIPE_VIDEO_FORMAT_MPEG4_AVC) {
+ templat.level = u_get_h264_level(templat.width, templat.height,
+_references);
+  } else {
+ templat.max_references = 2;
+  }
  
context->decoder = drv->pipe->create_video_codec(drv->pipe, );

if (!context->decoder) {
diff --git a/src/gallium/state_trackers/va/picture.c 
b/src/gallium/state_trackers/va/picture.c
index 5e7841a..9d4d1a8 100644
--- a/src/gallium/state_trackers/va/picture.c
+++ b/src/gallium/state_trackers/va/picture.c
@@ -146,6 +146,7 @@ handlePictureParameterBuffer(vlVaDriver *drv, vlVaContext 
*context, vlVaBuffer *
/*bit_depth_luma_minus8*/
/*bit_depth_chroma_minus8*/
context->desc.h264.num_ref_frames = h264->num_ref_frames;
+  context->decoder->max_references = 
MIN2(context->desc.h264.num_ref_frames, 16);
/*chroma_format_idc*/
/*residual_colour_transform_flag*/
/*gaps_in_frame_num_value_allowed_flag*/


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl: handle case where index is array deref in optimize_split_arrays

2015-11-19 Thread Tapani Pälli
Previously pass did not traverse to those array dereferences which were
used as indices to arrays. This fixes Synmark2 Gl42CSCloth application
issues.

Signed-off-by: Tapani Pälli 
---
 src/glsl/opt_array_splitting.cpp | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/glsl/opt_array_splitting.cpp b/src/glsl/opt_array_splitting.cpp
index 9e73f3c..89ce76b 100644
--- a/src/glsl/opt_array_splitting.cpp
+++ b/src/glsl/opt_array_splitting.cpp
@@ -188,6 +188,10 @@ 
ir_array_reference_visitor::visit_enter(ir_dereference_array *ir)
if (entry && !ir->array_index->as_constant())
   entry->split = false;
 
+   /* If the index is also array dereference, visit index. */
+   if (ir->array_index->as_dereference_array())
+  visit_enter(ir->array_index->as_dereference_array());
+
return visit_continue_with_parent;
 }
 
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/9] i965/fs: print non-1 strides when dumping instructions

2015-11-19 Thread Matt Turner
Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/9] i965/fs: respect force_sechalf/force_writemask_all in CSE

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  wrote:
> From: Connor Abbott 
>
> Reviewed-by: Iago Toral Quiroga 
> ---

In fact, I think Ken ran into this very bug last week.

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/9] i965/fs: print writemask_all when it's enabled

2015-11-19 Thread Matt Turner
Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 9/9] nir: s/nir_type_unsigned/nir_type_uint

2015-11-19 Thread Matt Turner
On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga  wrote:
> From: Jason Ekstrand 
>
> v2: do the same in tgsi_to_nir (Samuel)
>
> v3: added missing cases after rebase (Iago)
>
> Reviewed-by: Iago Toral Quiroga 
> ---
>  src/gallium/auxiliary/nir/tgsi_to_nir.c|  2 +-
>  src/glsl/nir/glsl_to_nir.cpp   |  2 +-
>  src/glsl/nir/nir.h |  2 +-
>  src/glsl/nir/nir_constant_expressions.py   |  2 +-
>  src/glsl/nir/nir_opcodes.py| 78 
> +++---
>  src/glsl/nir/nir_search.c  |  4 +-
>  src/mesa/drivers/dri/i965/brw_nir.c|  4 +-
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp |  2 +-
>  8 files changed, 48 insertions(+), 48 deletions(-)
>
> diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c 
> b/src/gallium/auxiliary/nir/tgsi_to_nir.c
> index 0539cfc..8918240 100644
> --- a/src/gallium/auxiliary/nir/tgsi_to_nir.c
> +++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c
> @@ -295,7 +295,7 @@ ttn_emit_declaration(struct ttn_compile *c)
>   type = nir_type_int;
>   break;
>case TGSI_RETURN_TYPE_UINT:
> - type = nir_type_unsigned;
> + type = nir_type_uint;
>   break;
>case TGSI_RETURN_TYPE_FLOAT:
>default:
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index e149d73..3455b49 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -1826,7 +1826,7 @@ nir_visitor::visit(ir_texture *ir)
>instr->dest_type = nir_type_int;
>break;
> case GLSL_TYPE_UINT:
> -  instr->dest_type = nir_type_unsigned;
> +  instr->dest_type = nir_type_uint;
>break;
> default:
>unreachable("not reached");
> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index e9d722e..7304517 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -633,7 +633,7 @@ typedef enum {
> nir_type_invalid = 0, /* Not a valid type */
> nir_type_float,
> nir_type_int,
> -   nir_type_unsigned,
> +   nir_type_uint,
> nir_type_bool
>  } nir_alu_type;
>
> diff --git a/src/glsl/nir/nir_constant_expressions.py 
> b/src/glsl/nir/nir_constant_expressions.py
> index 2ba8554..b16ef50 100644
> --- a/src/glsl/nir/nir_constant_expressions.py
> +++ b/src/glsl/nir/nir_constant_expressions.py
> @@ -213,7 +213,7 @@ unpack_half_1x16(uint16_t u)
>  }
>
>  /* Some typed vector structures to make things like src0.y work */
> -% for type in ["float", "int", "unsigned", "bool"]:
> +% for type in ["float", "int", "uint", "bool"]:
>  struct ${type}_vec {
> ${type} x;
> ${type} y;
> diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
> index 729f695..8db24eb 100644
> --- a/src/glsl/nir/nir_opcodes.py
> +++ b/src/glsl/nir/nir_opcodes.py
> @@ -91,7 +91,7 @@ class Opcode(object):
>  tfloat = "float"
>  tint = "int"
>  tbool = "bool"
> -tunsigned = "unsigned"
> +tuint = "uint"
>
>  commutative = "commutative "
>  associative = "associative "
> @@ -156,7 +156,7 @@ unop("fsqrt", tfloat, "sqrtf(src0)")
>  unop("fexp2", tfloat, "exp2f(src0)")
>  unop("flog2", tfloat, "log2f(src0)")
>  unop_convert("f2i", tfloat, tint, "src0") # Float-to-integer conversion.
> -unop_convert("f2u", tfloat, tunsigned, "src0") # Float-to-unsigned conversion
> +unop_convert("f2u", tfloat, tuint, "src0") # Float-to-unsigned conversion
>  unop_convert("i2f", tint, tfloat, "src0") # Integer-to-float conversion.
>  # Float-to-boolean conversion
>  unop_convert("f2b", tfloat, tbool, "src0 != 0.0f")
> @@ -165,7 +165,7 @@ unop_convert("b2f", tbool, tfloat, "src0 ? 1.0f : 0.0f")
>  # Int-to-boolean conversion
>  unop_convert("i2b", tint, tbool, "src0 != 0")
>  unop_convert("b2i", tbool, tint, "src0 ? 1 : 0") # Boolean-to-int conversion
> -unop_convert("u2f", tunsigned, tfloat, "src0") #Unsigned-to-float conversion.
> +unop_convert("u2f", tuint, tfloat, "src0") #Unsigned-to-float conversion.

While you're modifying this line, could you put a space after the # ?

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/9] nir/builder: only read meaningful channels in nir_swizzle()

2015-11-19 Thread Matt Turner
Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl/x11/dri3: do not expose the preserved swap behavior (to be squashed)

2015-11-19 Thread Martin Graesslin
On Thursday, November 19, 2015 1:54:22 PM CET Martin Peres wrote:
> On 11/11/15 00:44, Eric Anholt wrote:
> > Martin Peres  writes:
> >> The preserved swap behavior is currently untested in piglit and not
> >> supported on the GLX side. Before working on implementing it for
> >> EGL/DRI3, let's disable it until support comes.
> >> 
> >> This patch is trivial enough and will likely be squashed in the commit
> >> egl/x11: Implement dri3 support with loader's dri3 helper
> >> 
> >> Signed-off-by: Martin Peres 
> > 
> > This looks good to me.  I don't think anybody really cares about
> > SWAP_BEHAVIOR_PRESERVED -- the buffer_age stuff was what you really
> > wanted all along.
> 
> Hey Eric and Martin,
> 
> It seems like KWin is relying on SWAP_BEHAVIOR_PRESERVED for its EGL
> backend. Should we add proper support for it in mesa or should we
> propose a patch for kwin to test for the extension to be present before
> using it?
> 
> Martin, what is your opinion on this since buffer age is what you need
> and you already have support for it for the glx and wayland backends?

That's certainly a left over from before buffer age.  I don't know whether we 
can remove it unconditionally. Not every GLES harware which does support X11 
supports buffer age.

I assume the smartest thing to do is not ask for PRESERVED if buffer age is 
supported.

Cheers
Martin

signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] egl/x11/dri3: do not expose the preserved swap behavior (to be squashed)

2015-11-19 Thread Martin Peres

On 19/11/15 14:06, Martin Graesslin wrote:

On Thursday, November 19, 2015 1:54:22 PM CET Martin Peres wrote:

On 11/11/15 00:44, Eric Anholt wrote:

Martin Peres  writes:

The preserved swap behavior is currently untested in piglit and not
supported on the GLX side. Before working on implementing it for
EGL/DRI3, let's disable it until support comes.

This patch is trivial enough and will likely be squashed in the commit
egl/x11: Implement dri3 support with loader's dri3 helper

Signed-off-by: Martin Peres 

This looks good to me.  I don't think anybody really cares about
SWAP_BEHAVIOR_PRESERVED -- the buffer_age stuff was what you really
wanted all along.

Hey Eric and Martin,

It seems like KWin is relying on SWAP_BEHAVIOR_PRESERVED for its EGL
backend. Should we add proper support for it in mesa or should we
propose a patch for kwin to test for the extension to be present before
using it?

Martin, what is your opinion on this since buffer age is what you need
and you already have support for it for the glx and wayland backends?

That's certainly a left over from before buffer age.  I don't know whether we
can remove it unconditionally. Not every GLES harware which does support X11
supports buffer age.

I assume the smartest thing to do is not ask for PRESERVED if buffer age is
supported.


Sounds good! Checking for the availability of both buffer age and 
preserved, preferring buffer age but not forcing behaviour preserved if 
buffer age is missing (only enable it if the extension exists).


Thanks for your quick answer Martin G.
Martin P.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 10/42] i965: Calculate appropriate L3 partition weights for the current pipeline state.

2015-11-19 Thread Francisco Jerez
Kristian Høgsberg  writes:

> On Tue, Nov 17, 2015 at 9:54 PM, Jordan Justen
>  wrote:
>> From: Francisco Jerez 
>>
>> This calculates a rather conservative partitioning of the L3 cache
>> based on the shaders currently bound to the pipeline and whether they
>> use SLM, atomics, images or scratch space.  The result is intended to
>> be fine-tuned later on based on other pipeline state.
>> ---
>>  src/mesa/drivers/dri/i965/brw_compiler.h  |  1 +
>>  src/mesa/drivers/dri/i965/gen7_l3_state.c | 53 
>> +++
>>  2 files changed, 54 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
>> b/src/mesa/drivers/dri/i965/brw_compiler.h
>> index 8f147d3..ef8bddb 100644
>> --- a/src/mesa/drivers/dri/i965/brw_compiler.h
>> +++ b/src/mesa/drivers/dri/i965/brw_compiler.h
>> @@ -300,6 +300,7 @@ struct brw_stage_prog_data {
>>
>> unsigned curb_read_length;
>> unsigned total_scratch;
>> +   unsigned total_shared;
>>
>> /**
>>  * Register where the thread expects to find input data from the URB
>> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c 
>> b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>> index 4d0cfcd..1a88261 100644
>> --- a/src/mesa/drivers/dri/i965/gen7_l3_state.c
>> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c
>> @@ -258,6 +258,59 @@ get_l3_config(const struct brw_device_info *devinfo, 
>> struct brw_l3_weights w0)
>>  }
>>
>>  /**
>> + * Return a reasonable default L3 configuration for the specified device 
>> based
>> + * on whether SLM and DC are required.  In the non-SLM non-DC case the 
>> result
>> + * is intended to approximately resemble the hardware defaults.
>> + */
>> +static struct brw_l3_weights
>> +get_default_l3_weights(const struct brw_device_info *devinfo,
>> +   bool needs_dc, bool needs_slm)
>> +{
>> +   struct brw_l3_weights w = {{ 0 }};
>> +
>> +   w.w[L3P_SLM] = needs_slm;
>> +   w.w[L3P_URB] = 1.0;
>> +
>> +   if (devinfo->gen >= 8) {
>> +  w.w[L3P_ALL] = 1.0;
>> +   } else {
>> +  w.w[L3P_DC] = needs_dc ? 0.1 : 0;
>> +  w.w[L3P_RO] = devinfo->is_baytrail ? 0.5 : 1.0;
>> +   }
>> +
>> +   return norm_l3_weights(w);
>> +}
>> +
>> +/**
>> + * Calculate the desired L3 partitioning based on the current state of the
>> + * pipeline.  For now this simply returns the conservative defaults 
>> calculated
>> + * by get_default_l3_weights(), but we could probably do better by gathering
>> + * more statistics from the pipeline state (e.g. guess of expected URB usage
>> + * and bound surfaces), or by using feed-back from performance counters.
>> + */
>> +static struct brw_l3_weights
>> +get_pipeline_state_l3_weights(const struct brw_context *brw)
>> +{
>> +   const struct brw_stage_state *stage_states[] = {
>> +  >vs.base, >gs.base, >wm.base, >cs.base
>> +   };
>> +   bool needs_dc = false, needs_slm = false;
>
> This doesn't seem optimal - we should evaluate the 3D pipe and the
> compute pipe separately depending on which  one is active. For
> example, if we have a current compute program that uses SLM, but are
> using the 3D pipeline, we'll get a partition that includes SLM even
> for the 3D pipe.
>
The intention of this patch is not to provide an optimal heuristic, but
to implement a simple heuristic that calculates conservative defaults in
order to guarantee functional correctness.  It would be possible to base
the result on the currently active pipeline with minimal changes to this
function (and making sure that the L3 config atom is invalidated while
switching pipelines), but I don't think we want to switch back and forth
between SLM and non-SLM configurations if the application interleaves
draw and compute operations, because the L3 partitioning is global
rather than per-pipeline and transitions are expensive -- Switching
between L3 configuration requires a full pipeline stall and flushing and
invalidation of all L3-backed caches, doing what you suggest would
likely lead to cache-thrashing and might prevent pipelining of compute
and render workloads [At least on BDW+ -- On Gen7 it might be impossible
to achieve pipelining of render and compute workloads due to hardware
issues].  If the result of the heuristic is based on the currently
active pipeline some sort of hysteresis will likely be desirable, which
in this patch I get for free by basing the calculation on the context
state rather than on the currently active pipeline.  I agree with you
that we'll probably need a more sophisticated heuristic in the future,
but that falls outside the scope of this series -- If anything we need
to be able to run benchmarks exercising CS and SLM before we can find
out which kind of heuristic we want and what degree of hysteresis is
desirable.

> Kristian
>
>> +   for (unsigned i = 0; i < ARRAY_SIZE(stage_states); i++) {
>> +  const struct gl_shader_program *prog =
>> + 

Re: [Mesa-dev] [PATCH 1/8] Import current draft of EXT_shader_samples_identical spec

2015-11-19 Thread Nicolai Hähnle

On 19.11.2015 00:46, Ian Romanick wrote:

From: Ian Romanick 

Signed-off-by: Ian Romanick 
Cc: "Chris Forbes" 
---
  docs/specs/EXT_shader_samples_identical.txt | 174 
  1 file changed, 174 insertions(+)
  create mode 100644 docs/specs/EXT_shader_samples_identical.txt

diff --git a/docs/specs/EXT_shader_samples_identical.txt 
b/docs/specs/EXT_shader_samples_identical.txt
new file mode 100644
index 000..bae6c73
--- /dev/null
+++ b/docs/specs/EXT_shader_samples_identical.txt
@@ -0,0 +1,174 @@
+Name
+
+EXT_shader_samples_identical
+
+Name Strings
+
+GL_EXT_shader_samples_identical
+
+Contact
+
+Ian Romanick, Intel (ian.d.romanick 'at' intel.com)
+
+Contributors
+
+Chris Forbes, Mesa
+Magnus Wendt, Intel
+Graham Sellers, AMD
+
+Status
+
+XXX - Not complete yet.
+
+Version
+
+Last Modified Date: November 18, 2015
+Revision: 5
+
+Number
+
+TBD
+
+Dependencies
+
+OpenGL 3.2, or OpenGL ES 3.1, or ARB_texture_multisample is required.
+
+This extension is written against the OpenGL 4.5 (Core Profile)
+Specification
+
+Overview
+
+Multisampled antialiasing has become a common method for improving the
+quality of rendered images.  Multisampling differs from supersampling in
+that the color of a primitive that covers all or part of a pixel is
+resolved once, regardless of the number of samples covered.  If a large
+polygon is rendered, the colors of all samples in each interior pixel will
+be the same.  This suggests a simple compression scheme that can reduce
+the necessary memory bandwidth requirements.  In one such scheme, each
+sample is stored in a separate slice of the multisample surface.  An
+additional multisample control surface (MCS) contains a mapping from pixel
+samples to slices.
+
+If all the values stored in the MCS for a particular pixel are the same,
+then all the samples have the same value.  Applications can take advantage
+of this information to reduce the bandwidth of reading multisample
+textures.  A custom multisample resolve filter could optimize resolving
+pixels where every sample is identical by reading the color once.
+
+color = texelFetch(sampler, coordinate, 0);
+if (!textureSamplesAllIdenticalEXT(sampler, coordinate)) {


textureSamplesIdenticalEXT


+for (int i = 1; i < MAX_SAMPLES; i++) {
+vec4 c = texelFetch(sampler, coordinate, i);
+
+//... accumulate c into color
+
+}
+}
+
+New Procedures and Functions
+
+None.
+
+New Tokens
+
+None.
+
+Additions to the OpenGL 4.5 (Core Profile) Specification
+
+None.
+
+Modifications to The OpenGL Shading Language Specification, Version 4.50.5
+
+Including the following line in a shader can be used to control the
+language features described in this extension:
+
+#extension GL_EXT_shader_samples_identical
+
+A new preprocessor #define is added to the OpenGL Shading Language:
+
+#define GL_EXT_shader_samples_identical
+
+Add to the table in section 8.7 "Texture Lookup Functions"
+
+Syntax:
+
+bool textureSamplesIdenticalEXT(gsampler2DMS sampler, ivec2 coord)
+
+bool textureSamplesIdenticalEXT(gsampler2DMSArray sampler,
+ivec3 coord)
+
+Description:
+
+Returns true if it can be determined that all samples within the texel
+of the multisample texture bound to  at  contain the
+same values or false if this cannot be determined."
+
+Additions to the AGL/EGL/GLX/WGL Specifications
+
+None
+
+Errors
+
+None
+
+New State
+
+None
+
+New Implementation Dependent State
+
+None
+
+Issues
+
+1) What should the new functions be called?
+
+RESOLVED: textureSamplesIdenticalEXT.  Initially
+textureAllSamplesIdenticalEXT was considered, but
+textureSamplesIdenticalEXT is more similar to the existing textureSamples
+function.
+
+2) It seems like applications could implement additional optimization if
+   they were provided with raw MCS data.  Should this extension also
+   provide that data?
+
+There are a number of challenges in providing raw MCS data.  The biggest
+problem being that the amount of MCS data depends on the number of
+samples, and that is not known at compile time.  Additionally, without new
+texelFetch functions, applications would have difficulty utilizing the
+information.
+
+Another option is to have a function that returns an array of tuples of
+sample number and count.  This also has difficulties with the maximum
+array size not being known at compile time.
+
+RESOLVED: Do not expose raw MCS data in this extension.
+
+3) Should this extension also extend SPIR-V?
+
+RESOLVED: Yes, but this has not yet been written.
+
+4) Is it possible for 

[Mesa-dev] SSO fixes

2015-11-19 Thread Timothy Arceri
Before implementing ARB_enhanced_layouts it makes sense to fix up some bugs in 
SSO first.

This is the first of two fixes I intend to send, the other fix will be for 
shaders with mixed explicit and non explicit locations for inputs and outputs.

This series fixes glsl bugs where inputs and outputs of SSOs 
are dead code eliminated when they shouldn't be.

I ran this series on Intel CI system with no regressions.

I haven't got any piglit tests for this yet but this fixes the bug and apitrace 
mentioned in patch 3 where a vertex shader input was being optimised away.

From the GL 4.5 spec Chapter 7.4.1

"With separable program objects, interfaces between shader stages may involve
the outputs from one program object and the inputs from a second program object.
For such interfaces, it is not possible to detect mismatches at link time, 
because the programs are linked separately.

When each such program is linked, all inputs or outputs interfacing with 
another program stage are treated as active.

 The linker will generate an executable that assumes the presence of a 
compatible program on the other side of the interface. If a mismatch between 
programs occurs, no GL error is generated, but some or all of the inputs on the 
interface will be undefined."


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/8] Implement EXT_shader_samples_identical

2015-11-19 Thread Emil Velikov
On 19 November 2015 at 14:53, Ilia Mirkin  wrote:
> On Thu, Nov 19, 2015 at 9:47 AM, Emil Velikov  
> wrote:
>> Hi Ian,
>>
>> Attempting to high-jack the thread :-P
>>
>> On 18 November 2015 at 23:46, Ian Romanick  wrote:
>>
>>> I really wanted to get this in the next Mesa release.  For some reason,
>>> I thought the branch point was after Thanksgiving (which is next
>>> Thursday).  Ken reminded me yesterday that the branch point is actually
>>> this Friday. :( As a result, I'm sending it out today to get review as
>>> soon as possible.
>>>
>> Any suggestions on how we make dates more prominent/obvious ? I was
>> going to say "we even have it in #irc-devel channel topic" but you
>> don't seem to be around these days.
>>
>> Alternatively you can set a filter in your email client - the subject
>> is always "Mesa #VERSION release plan"
>
> A common thing is to have a $project-announce mailing list which would
> receive such emails along with actual release notifications.
>
Good idea - we have mesa-announce so might as well use it :P
It is kind of strange that we do not use it (be that myself or
Ian/Carl priorly) for these things.

Will do from now on. Thanks !
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/9] i965/fs: print writemask_all when it's enabled

2015-11-19 Thread Connor Abbott
I think Ken already pushed a similar patch so we can drop this.

On Thu, Nov 19, 2015 at 5:05 AM, Iago Toral Quiroga  wrote:
> From: Connor Abbott 
>
> Reviewed-by: Iago Toral Quiroga 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 2d4ed6a..63dcba7 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -4787,6 +4787,9 @@ fs_visitor::dump_instruction(backend_instruction 
> *be_inst, FILE *file)
>   fprintf(file, "1sthalf ");
> }
>
> +   if (inst->force_writemask_all)
> +  fprintf(file, "WE_all ");
> +
> fprintf(file, "\n");
>  }
>
> --
> 1.9.1
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/5] i965/gen9: Don't disallow fast clear for MSRT formats matching render

2015-11-19 Thread Neil Roberts
Previously fast clear was disallowed on Gen9 for MSRTs with the claim
that some formats don't work but we didn't understand why. On further
investigation it seems the formats that don't work are the ones where
the render surface format is being overriden to a different format
than the one used for texturing. The one used for texturing is not
actually a renderable format. It arguably makes sense that the sampler
hardware doesn't handle the fast color correctly in these cases
because it shouldn't be possible to end up with a fast cleared surface
that is non-renderable.

This patch changes the limitation to prevent fast clear for surfaces
where the format for rendering is overriden.
---
 src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c 
b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 1f8bfdf..f2e894a 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -48,6 +48,7 @@
 #include "brw_defines.h"
 #include "brw_context.h"
 #include "brw_draw.h"
+#include "brw_state.h"
 #include "intel_fbo.h"
 #include "intel_batchbuffer.h"
 
@@ -549,11 +550,17 @@ brw_meta_fast_clear(struct brw_context *brw, struct 
gl_framebuffer *fb,
   if (brw->gen < 7)
  clear_type = REP_CLEAR;
 
-  /* Certain formats have unresolved issues with sampling from the MCS
-   * buffer on Gen9. This disables fast clears altogether for MSRTs until
-   * we can figure out what's going on.
+  /* If we're mapping the render format to a different format than the
+   * format we use for texturing then it is a bit questionable whether it
+   * should be possible to use a fast clear. Although we only actually
+   * render using a renderable format, without the override workaround it
+   * wouldn't be possible to have a non-renderable surface in a fast clear
+   * state so the hardware probably legitimately doesn't need to support
+   * this case. At least on Gen9 this really does seem to cause problems.
*/
-  if (brw->gen >= 9 && irb->mt->num_samples > 1)
+  if (brw->gen >= 9 &&
+  brw_format_for_mesa_format(irb->mt->format) !=
+  brw->render_target_format[irb->mt->format])
  clear_type = REP_CLEAR;
 
   if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS)
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/5] i965/gen8: Allow rendering to B8G8R8X8

2015-11-19 Thread Neil Roberts
Since Gen8 this is allowed as a rendering target so we don't need to
override it to B8G8R8A8. This is helpful on Gen9+ where using this
override causes fast clears not to work.
---
 src/mesa/drivers/dri/i965/brw_surface_formats.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
b/src/mesa/drivers/dri/i965/brw_surface_formats.c
index 55e7e64..7c38431 100644
--- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
+++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
@@ -167,8 +167,8 @@ const struct surface_format_info surface_formats[] = {
SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   I32_FLOAT)
SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   L32_FLOAT)
SF( Y, 50,  Y,  x,  x,  x,  x,  x,  x,x,   A32_FLOAT)
-   SF( Y,  Y,  x,  Y,  x,  x,  x,  x, 60,   90,   B8G8R8X8_UNORM)
-   SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   B8G8R8X8_UNORM_SRGB)
+   SF( Y,  Y,  x,  Y, 80, 80,  x,  x, 60,   90,   B8G8R8X8_UNORM)
+   SF( Y,  Y,  x,  x, 80, 80,  x,  x,  x,x,   B8G8R8X8_UNORM_SRGB)
SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R8G8B8X8_UNORM)
SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R8G8B8X8_UNORM_SRGB)
SF( Y,  Y,  x,  x,  x,  x,  x,  x,  x,x,   R9G9B9E5_SHAREDEXP)
@@ -670,9 +670,10 @@ brw_init_surface_formats(struct brw_context *brw)
  * mask writes to alpha (ala glColorMask) and reconfigure the
  * alpha blending hardware to use GL_ONE (or GL_ZERO) for
  * cases where GL_DST_ALPHA (or GL_ONE_MINUS_DST_ALPHA) is
- * used.
+ * used. On Gen8+ BGRX is actually allowed (but not RGBX).
  */
-render = BRW_SURFACEFORMAT_B8G8R8A8_UNORM;
+ if (gen < tinfo->render_target)
+render = BRW_SURFACEFORMAT_B8G8R8A8_UNORM;
 break;
   case BRW_SURFACEFORMAT_R8G8B8X8_UNORM:
  render = BRW_SURFACEFORMAT_R8G8B8A8_UNORM;
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/5] i965: Check base format to determine whether to use tiled memcpy

2015-11-19 Thread Neil Roberts
The tiled memcpy doesn't work for copying from RGBX to RGBA because it
doesn't override the alpha component to 1.0. Commit 2cebaac479d4 added
a check to disable it for RGBX formats by looking at the TexFormat.
However a lot of the rest of the code base is written with the
assumption that an RGBA texture can be used internally to implement a
GL_RGB texture. If that is done then this check breaks. This patch
makes it instead check the base format of the texture which I think
more directly matches the intention.

Cc: Jason Ekstrand 
---
 src/mesa/drivers/dri/i965/intel_pixel_read.c | 7 ---
 src/mesa/drivers/dri/i965/intel_tex_image.c  | 7 ---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_pixel_read.c 
b/src/mesa/drivers/dri/i965/intel_pixel_read.c
index 9bcbbd1..c8aef65 100644
--- a/src/mesa/drivers/dri/i965/intel_pixel_read.c
+++ b/src/mesa/drivers/dri/i965/intel_pixel_read.c
@@ -135,10 +135,11 @@ intel_readpixels_tiled_memcpy(struct gl_context * ctx,
   return false;
 
/* We can't handle copying from RGBX or BGRX because the tiled_memcpy
-* function doesn't set the last channel to 1.
+* function doesn't set the last channel to 1. Note this checks BaseFormat
+* rather than TexFormat in case the RGBX format is being simulated with an
+* RGBA format.
 */
-   if (rb->Format == MESA_FORMAT_B8G8R8X8_UNORM ||
-   rb->Format == MESA_FORMAT_R8G8B8X8_UNORM)
+   if (rb->_BaseFormat == GL_RGB)
   return false;
 
if (!intel_get_memcpy(rb->Format, format, type, _copy, ,
diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 34b91e8..e3710da7 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -399,10 +399,11 @@ intel_gettexsubimage_tiled_memcpy(struct gl_context *ctx,
   return false;
 
/* We can't handle copying from RGBX or BGRX because the tiled_memcpy
-* function doesn't set the last channel to 1.
+* function doesn't set the last channel to 1. Note this checks BaseFormat
+* rather than TexFormat in case the RGBX format is being simulated with an
+* RGBA format.
 */
-   if (texImage->TexFormat == MESA_FORMAT_B8G8R8X8_UNORM ||
-   texImage->TexFormat == MESA_FORMAT_R8G8B8X8_UNORM)
+   if (texImage->_BaseFormat == GL_RGB)
   return false;
 
if (!intel_get_memcpy(texImage->TexFormat, format, type, _copy, ,
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/5] i965/gen9: Don't allow the RGBX formats for texturing/rendering

2015-11-19 Thread Neil Roberts
The RGBX surface formats aren't renderable so we internally remap them
to RGBA when rendering. They are retained as RGBX when used as
textures. However since the previous patch fast clears are disabled
for surfaces that use a different format for rendering than for
texturing. To avoid this situation we can just pretend not to support
RGBX formats at all. This will cause the upper layers of mesa to pick
an RGBA format internally instead. This should be safe because we
always override the alpha component to 1.0 for RGBX in the texture
swizzle anyway. We could also do this for all gens except that it's a
bit more difficult when the hardware doesn't support texture
swizzling. Gens using the blorp have further problems because that
doesn't implement this swizzle override.
---
 src/mesa/drivers/dri/i965/brw_surface_formats.c | 28 +
 1 file changed, 28 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c 
b/src/mesa/drivers/dri/i965/brw_surface_formats.c
index 7c38431..7594aca 100644
--- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
+++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
@@ -733,6 +733,34 @@ brw_init_surface_formats(struct brw_context *brw)
if (brw->gen >= 8)
   ctx->TextureFormatSupported[MESA_FORMAT_Z_UNORM16] = true;
 
+   /* The RGBX formats are not renderable. Normally these get mapped
+* internally to RGBA formats when rendering. However on Gen9+ when this
+* internal override is used fast clears don't work so they are disabled in
+* brw_meta_fast_clear. To avoid this problem we can just pretend not to
+* support RGBX formats at all. This will cause the upper layers of Mesa to
+* pick the RGBA formats instead. This works fine because when it is used
+* as a texture source the swizzle state is programmed to force the alpha
+* channel to 1.0 anyway. We could also do this for all gens except that
+* it's a bit more difficult when the hardware doesn't support texture
+* swizzling. Gens using the blorp have further problems because that
+* doesn't implement this swizzle override. We don't need to do this for
+* BGRX because that actually is supported natively on Gen8+.
+*/
+   if (brw->gen >= 9) {
+  static const mesa_format rgbx_formats[] = {
+ MESA_FORMAT_R8G8B8X8_UNORM,
+ MESA_FORMAT_R8G8B8X8_SRGB,
+ MESA_FORMAT_RGBX_UNORM16,
+ MESA_FORMAT_RGBX_FLOAT16,
+ MESA_FORMAT_RGBX_FLOAT32
+  };
+
+  for (int i = 0; i < ARRAY_SIZE(rgbx_formats); i++) {
+ ctx->TextureFormatSupported[rgbx_formats[i]] = false;
+ brw->format_supported_as_render_target[rgbx_formats[i]] = false;
+  }
+   }
+
/* On hardware that lacks support for ETC1, we map ETC1 to RGBX
 * during glCompressedTexImage2D(). See intel_mipmap_tree::wraps_etc1.
 */
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/5] blit: Don't take into account the Mesa format when checking MSRT blit

2015-11-19 Thread Neil Roberts
According to the GLES3 spec, blitting between multisample FBOs with
different internal formats should not be allowed. The
compatible_resolve_formats function implements this check. Previously
it had a shortcut where if the Mesa formats of the two renderbuffers
were the same then it would assume the blit is ok. However some
drivers map different internal formats to the same Mesa format, for
example it might implement both GL_RGB and GL_RGBA textures with
MESA_FORMAT_R8G8B8A_UNORM. The function is used to generate a GL error
according to what the GL spec requires so the blit should not be
allowed in that case. This patch just removes the shortcut so that it
only ever looks at the internal format.

Note that I posted a related patch to disable this check altogether
for desktop GL. However this function is still used on GLES3 because
there are conformance tests that require this behaviour so this patch
is still useful.

Cc: Marek Olšák 
---
 src/mesa/main/blit.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/src/mesa/main/blit.c b/src/mesa/main/blit.c
index a32f1a4..deb4f5a 100644
--- a/src/mesa/main/blit.c
+++ b/src/mesa/main/blit.c
@@ -129,20 +129,22 @@ compatible_resolve_formats(const struct gl_renderbuffer 
*readRb,
 {
GLenum readFormat, drawFormat;
 
-   /* The simple case where we know the backing Mesa formats are the same.
-*/
-   if (_mesa_get_srgb_format_linear(readRb->Format) ==
-   _mesa_get_srgb_format_linear(drawRb->Format)) {
-  return GL_TRUE;
-   }
-
-   /* The Mesa formats are different, so we must check whether the internal
-* formats are compatible.
+   /* This checks whether the internal formats are compatible rather than the
+* Mesa format for two reasons:
+*
+* • Under some circumstances, the user may request e.g. two GL_RGBA8
+*   textures and get two entirely different Mesa formats like RGBA and
+*   ARGB. Drivers behaving like that should be able to cope with
+*   non-matching formats by themselves, because it's not the user's fault.
+*
+* • Picking two different internal formats can end up with the same Mesa
+*   format. For example the driver might be simulating GL_RGB textures
+*   with GL_RGBA internally and in that case both internal formats would
+*   end up with RGBA.
 *
-* Under some circumstances, the user may request e.g. two GL_RGBA8
-* textures and get two entirely different Mesa formats like RGBA and
-* ARGB. Drivers behaving like that should be able to cope with
-* non-matching formats by themselves, because it's not the user's fault.
+* This function is used to generate a GL error according to the spec so in
+* both cases we want to be looking at the appliation-level format, which
+* is InternalFormat.
 *
 * Blits between linear and sRGB formats are also allowed.
 */
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeon/llvm: Use llvm.AMDIL.exp intrinsic again for now

2015-11-19 Thread Michel Dänzer
On 20.11.2015 02:11, Nicolai Hähnle wrote:
> On 19.11.2015 17:37, Tom Stellard wrote:
>> On Thu, Nov 19, 2015 at 11:17:12AM +0100, Nicolai Hähnle wrote:
>>> On 19.11.2015 03:55, Tom Stellard wrote:
 On Thu, Nov 19, 2015 at 11:31:55AM +0900, Michel Dänzer wrote:
> From: Michel Dänzer 
>
> llvm.exp2.f32 doesn't work in some cases yet.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709
> Signed-off-by: Michel Dänzer 
> ---
>
> Once the problem is fixed in the LLVM AMDGPU backend, we can re-enable
> llvm.exp2.f32 for the fixed LLVM version.
>
>   src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> index ac99e73..c94f109 100644
> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c
> @@ -1539,7 +1539,7 @@ void radeon_llvm_context_init(struct
> radeon_llvm_context * ctx)
>   bld_base->op_actions[TGSI_OPCODE_ENDIF].emit = endif_emit;
>   bld_base->op_actions[TGSI_OPCODE_ENDLOOP].emit = endloop_emit;
>   bld_base->op_actions[TGSI_OPCODE_EX2].emit =
> build_tgsi_intrinsic_nomem;
> -bld_base->op_actions[TGSI_OPCODE_EX2].intr_name =
> "llvm.exp2.f32";
> +bld_base->op_actions[TGSI_OPCODE_EX2].intr_name =
> "llvm.AMDIL.exp.";

 Do we want a native instruction here, or do we want IEEE precise exp2?
 If it's the former then we shouldn't be using llvm.exp2.f32 anyway.

 I know that we need to use llvm.AMDIL.exp. for older LLVM, but for
 newer
 LLVM, I would really like to start doing intrinsics the correct
 way.  In
 this case, it means adding an llvm.amdgcn.exp.f32 intrinsic to
 include/llvm/IR/IntrinsicsAMDGPU.td.  In the section with the amdgcn
 TargetPrefix.
>>>
>>> Doesn't AMDGPU currently emit native instructions for the various
>>> math intrinsics anyway? If your plan is to change that and we add
>>> our own intrinsics, what's the plan to still benefit from existing
>>> LLVM optimization passes?
>>>
>>
>> AMDGPU does emit native instruction for the math intrinsics, but this is
>> wrong in most cases, because it assumes that a less precise
>> calculation is
>> acceptable when this is not always the case.
> 
> Okay, that makes sense.
> 
>> This a good point about optimizations.  I think the main benefits we
>> get from
>> using the LLVM intrinsic are constant folding and also speculative
>> execution.
>>
>> An alternate solution would be to use the llvm intrinsic, but then
>> communicate
>> to the backend via a function attribute that it is OK to use the less
>> precise
>> version of exp.
> 
> I like that alternate solution.
> 
>>> FWIW, the original bug that Michel addresses here is caused by LLVM
>>> libcall optimizations replacing the exp2 _intrinsic_ by a ldexp
>>> _libcall_, which AMDGPU codegen cannot deal with. I suggested to
>>> address this with http://reviews.llvm.org/D14327. Could you take a
>>> look at that?
>>>
>>
>> Please take a look at the comment I added to this review.
> 
> Done :)

Meanwhile, can I get an R-b for this patch to fix the immediate
regression in Mesa Git until LLVM is fixed (and for released versions of
LLVM)?


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/8] Implement EXT_shader_samples_identical

2015-11-19 Thread Ian Romanick
On 11/19/2015 06:47 AM, Emil Velikov wrote:
> Hi Ian,
> 
> Attempting to high-jack the thread :-P
> 
> On 18 November 2015 at 23:46, Ian Romanick  wrote:
> 
>> I really wanted to get this in the next Mesa release.  For some reason,
>> I thought the branch point was after Thanksgiving (which is next
>> Thursday).  Ken reminded me yesterday that the branch point is actually
>> this Friday. :( As a result, I'm sending it out today to get review as
>> soon as possible.
>>
> Any suggestions on how we make dates more prominent/obvious ? I was
> going to say "we even have it in #irc-devel channel topic" but you
> don't seem to be around these days.

VPN problems have made it basically impossible for me to be on IRC.
It's a long, annoying story.  It's one of those "too busy to fix it"
things too.  Ugh.

> Alternatively you can set a filter in your email client - the subject
> is always "Mesa #VERSION release plan"

All my mesa-dev email goes in one folder, and it's really easy for
single messages to get lost between the 40 patch series.  I like Ilia's
suggestion of using mesa-announce.  That just dumps in my main inbox,
and I'm less likely to miss it there.

One other thing... what time are you planning to make the branch
tomorrow? :)

> Thanks
> Emil

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 13/14] glsl: add subroutine index qualifier support

2015-11-19 Thread Timothy Arceri
Hi Dave/Tapani,

This is the last patch in this series unreviewed is either of you able to
review is as you guys implemented the respective extensions.

Thanks,
Tim
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 13/14] glsl: add subroutine index qualifier support

2015-11-19 Thread Timothy Arceri
On Fri, 2015-11-20 at 14:25 +1100, Timothy Arceri wrote:
> Hi Dave/Tapani,
> 
> This is the last patch in this series unreviewed is either of you able to
> review is as you guys implemented the respective extensions.

Forgot to say that there are piglit compile tests in master under
 ARB_explicit_uniform_location

and an unreviewed patch that test the correct values are returned by progrm
query

http://patchwork.freedesktop.org/patch/63795/

> 
> Thanks,
> Tim
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 8/8 v2] i965: Enable EXT_shader_samples_identical

2015-11-19 Thread Ian Romanick
On 11/19/2015 02:13 PM, Neil Roberts wrote:
> Ian Romanick  writes:
> 
>> Am I correct that nothing special is needed in the vec4 backend? It
>> seems like mcs should know the register size, and the CMP with 0
>> should just do the right thing.
> 
> I think you probably will have to do something for the vec4 backend.
> Currently it generates an instruction like this:
> 
> cmp.z.f0(8) g8<1>.xUD   g9<4,4,1>.xUD   0xUD
> 
> g9 contains the MCS data. If I understand correctly for 16x MSAA the
> second half of the MCS data is in the y component which is currently
> ignored by this instruction. Maybe something like this would work:
> 
>   if (mcs.file == BRW_IMMEDIATE_VALUE) {
>  emit(MOV(dest, src_reg(0u)));
>   } else if ((key_tex->msaa_16 & (1 << sampler))) {
>  dst_reg tmp(this, glsl_type::uint_type);
>  dst_reg swizzled_mcs = mcs;
>  swizzled_mcs.swizzle = BRW_SWIZZLE_;
>  emit(OR(tmp, mcs, swizzled_mcs));
>  emit(CMP(dest, tmp, src_reg(0u), BRW_CONDITIONAL_EQ));
>   } else {
>  emit(CMP(dest, mcs, src_reg(0u), BRW_CONDITIONAL_EQ));
>   }
> 
> Sadly the optimiser doesn't optimise out the extra instruction this
> time. There's probably a better way to do it. On the other hand I can't
> think why anyone would be using this function in the vec4 backend so
> it's probably not worth worrying about.
> 
> I haven't tested this at all. I guess to test it you would have to write
> a GS version of the Piglit test? I think that is the only place that
> uses the vec4 backend on SKL.
> 
> If we don't want to risk this we could always just make it return false
> when msaa_16 is set for the sampler. On the other hand if there is

As far as I could tell, there is no msaa_16 for vec4.  It looks like it
always takes the equivalent of the FS msaa_16 path when gen >= 9.

> currently no test for the 8x version either then we should probably
> assume that neither of them work… maybe it wouldn't be so bad to just
> always return false in the vec4 backend until we've tested it?

Hmm... yeah, I'll do that.  It's an optimization, and I don't think
anyone will miss it in vec4 on any platform.  That way I don't have to
worry about applying a bunch of fixes to the 11.1 branch after I create
tests.  We can either implement it ("fix") in 11.1 or 11.2.

> Regards,
> - Neil

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH 2/2] radeonsi/compute: Use the compiler's COMPUTE_PGM_RSRC* register values

2015-11-19 Thread Michel Dänzer
On 20.11.2015 05:53, Emil Velikov wrote:
> On 19 November 2015 at 20:37, Tom Stellard  wrote:
>> On Wed, Nov 18, 2015 at 05:43:31PM +, Emil Velikov wrote:
>>> Hi Tom,
>>>
>>> Please flip the order of the patches and drop the now patch 1/2 from
>>> the stable queue.
>>>
>>
>> I'm confused about why I need to flip the order of the patches.
>>
> As mentioned on IRC - patch 1 is not a bugfix and despite how trivial
> I'd rather have only bugfixes in stable.
> Is flipping things too much to ask ?

It would make the current patch 2 a bit ugly, because it would use the
ls_rsrc field even for non-LS shaders.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/4] automake: loader: don't create an empty dri3 helper

2015-11-19 Thread Vinson Lee
On Thu, Nov 19, 2015 at 9:36 PM, Vinson Lee  wrote:
> On Thu, Nov 19, 2015 at 8:32 AM, Emil Velikov  
> wrote:
>> On 19 November 2015 at 16:33, Emil Velikov  wrote:
>>> From: Emil Velikov 
>>>
>>> Seems that creating an empty one does not fair too well with MacOSX's
>>> ar. Considering that all the users of the helper include it only when
>>> needed, let's reshuffle the makefile.
>>>
>> Forgot to mention - the cause is guesswork, as I'm not that intimate
>> familiar with different AR implementations. Yet this seems the most
>> likely one.
>>
>> Vinson, can you test this ?
>>
>> -Emil
>
> This patch fixed my Mac OS X build error.
>
> Tested-by: Vinson Lee 

Fixed a typo in Tested-by line.

Tested-by: Vinson Lee 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


  1   2   >