Re: [Mesa-dev] [PATCH 0/2] fix load of unpacked double vector input varyings

2016-06-01 Thread Samuel Iglesias Gonsálvez


On 02/06/16 07:43, Timothy Arceri wrote:
> On Thu, 2016-06-02 at 07:22 +0200, Samuel Iglesias Gonsálvez wrote:
>> On 26/05/16 07:56, Samuel Iglesias Gonsálvez wrote:
>>>
>>> Hello,
>>>
>>> Timothy found that tests with unpacked double vector input varyings
>>> were failing in the i965 driver. For example, this is happening when
>>> using explicit locations because Mesa disables varying packing for
>>> that case.
>>>
>>> These patches fix the following piglit test:
>>>
>>> spec/arb_gpu_shader_fp64/execution/vs-fs-explicit-locations
>>>
>>> Samuel Iglesias Gonsálvez (2):
>>>   i965/fs: fix offset when loading double vector input varyings
>>>   i965/fs: fix FS_OPCODE_CINTERP for unpacked double input varyings
>>>
>> These patches have a Tested-by (Thanks Timothy!) but they are still
>> unreviewed. Can someone take a look at them?
>>
>> Also, this patch [0] is still unreviewed.
> 
> If no one else takes a look at them over the next few days, feel free to
> add my r-b to all three. For what it's worth, they look OK to me.
> 
> Also don't forget to add tags for 12.0 stable before pushing.
> 

Sure :)

Thanks a lot!

Sam

> 
>>
>> Thanks,
>>
>> Sam
>>
>> [0] https://lists.freedesktop.org/archives/mesa-dev/2016-May/118759.html
>>
>>
>>>
>>>  src/mesa/drivers/dri/i965/brw_fs.cpp | 13 -
>>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 22 +-
>>>  2 files changed, 33 insertions(+), 2 deletions(-)
>>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/2] fix load of unpacked double vector input varyings

2016-06-01 Thread Timothy Arceri
On Thu, 2016-06-02 at 07:22 +0200, Samuel Iglesias Gonsálvez wrote:
> On 26/05/16 07:56, Samuel Iglesias Gonsálvez wrote:
> > 
> > Hello,
> > 
> > Timothy found that tests with unpacked double vector input varyings
> > were failing in the i965 driver. For example, this is happening when
> > using explicit locations because Mesa disables varying packing for
> > that case.
> > 
> > These patches fix the following piglit test:
> > 
> > spec/arb_gpu_shader_fp64/execution/vs-fs-explicit-locations
> > 
> > Samuel Iglesias Gonsálvez (2):
> >   i965/fs: fix offset when loading double vector input varyings
> >   i965/fs: fix FS_OPCODE_CINTERP for unpacked double input varyings
> > 
> These patches have a Tested-by (Thanks Timothy!) but they are still
> unreviewed. Can someone take a look at them?
> 
> Also, this patch [0] is still unreviewed.

If no one else takes a look at them over the next few days, feel free to
add my r-b to all three. For what it's worth, they look OK to me.

Also don't forget to add tags for 12.0 stable before pushing.


> 
> Thanks,
> 
> Sam
> 
> [0] https://lists.freedesktop.org/archives/mesa-dev/2016-May/118759.html
> 
> 
> > 
> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 13 -
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 22 +-
> >  2 files changed, 33 insertions(+), 2 deletions(-)
> > 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Add missing types to type_sz().

2016-06-01 Thread Francisco Jerez
Matt Turner writes:

> Coverity warns in multiple places about the potential for division by
> zero, caused by this function's default case.
>
> Cc: Francisco Jerez 

Reviewed-by: Francisco Jerez 

> ---
>  src/mesa/drivers/dri/i965/brw_reg.h | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
> b/src/mesa/drivers/dri/i965/brw_reg.h
> index b0ef94e..be23678 100644
> --- a/src/mesa/drivers/dri/i965/brw_reg.h
> +++ b/src/mesa/drivers/dri/i965/brw_reg.h
> @@ -292,15 +292,19 @@ type_sz(unsigned type)
> case BRW_REGISTER_TYPE_UD:
> case BRW_REGISTER_TYPE_D:
> case BRW_REGISTER_TYPE_F:
> +   case BRW_REGISTER_TYPE_VF:
>return 4;
> case BRW_REGISTER_TYPE_UW:
> case BRW_REGISTER_TYPE_W:
> +   case BRW_REGISTER_TYPE_UV:
> +   case BRW_REGISTER_TYPE_V:
> +   case BRW_REGISTER_TYPE_HF:
>return 2;
> case BRW_REGISTER_TYPE_UB:
> case BRW_REGISTER_TYPE_B:
>return 1;
> default:
> -  return 0;
> +  unreachable("not reached");
> }
>  }
>  
> -- 
> 2.7.3
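
For readers following along, here is a minimal standalone sketch of why the default case matters. The `reg_type` enum and `elements_per_reg()` are hypothetical stand-ins, not the i965 definitions, and `abort()` stands in for Mesa's `unreachable()` macro. It only illustrates the pattern: once every type is listed, no path returns 0, so callers that divide by the size are safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical register types; the real BRW_REGISTER_TYPE_* values differ. */
enum reg_type {
   TYPE_UD, TYPE_D, TYPE_F, TYPE_VF,
   TYPE_UW, TYPE_W, TYPE_UV, TYPE_V, TYPE_HF,
   TYPE_UB, TYPE_B
};

/* Size in bytes of one element of the given type.  Every known type is
 * listed, so no path returns 0.
 */
static unsigned
type_size_bytes(enum reg_type type)
{
   switch (type) {
   case TYPE_UD: case TYPE_D: case TYPE_F: case TYPE_VF:
      return 4;
   case TYPE_UW: case TYPE_W: case TYPE_UV: case TYPE_V: case TYPE_HF:
      return 2;
   case TYPE_UB: case TYPE_B:
      return 1;
   }

   /* Stand-in for unreachable("not reached"). */
   abort();
}

/* Typical caller: it divides by the element size, which is the pattern a
 * static analyzer flags when a "return 0" path exists.
 */
static unsigned
elements_per_reg(enum reg_type type, unsigned reg_size_bytes)
{
   return reg_size_bytes / type_size_bytes(type);
}
```

With a 32-byte register this gives 8 float elements and 32 byte elements.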




[Mesa-dev] [PATCH] glsl: stop allocating memory for SSBOs and builtins

2016-06-01 Thread Timothy Arceri
---
 src/compiler/glsl/link_uniforms.cpp | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/src/compiler/glsl/link_uniforms.cpp 
b/src/compiler/glsl/link_uniforms.cpp
index a7f136c..571c49f 100644
--- a/src/compiler/glsl/link_uniforms.cpp
+++ b/src/compiler/glsl/link_uniforms.cpp
@@ -402,7 +402,9 @@ private:
* uniforms.
*/
   this->num_active_uniforms++;
-  this->num_values += values;
+
+  if(!is_gl_identifier(name) && !is_shader_storage)
+ this->num_values += values;
}
 
struct string_to_uint_map *hidden_map;
@@ -762,13 +764,14 @@ private:
  current_var->data.how_declared == ir_var_hidden;
   this->uniforms[id].builtin = is_gl_identifier(name);
 
-  /* Do not assign storage if the uniform is builtin */
-  if (!this->uniforms[id].builtin)
- this->uniforms[id].storage = this->values;
-
   this->uniforms[id].is_shader_storage =
  current_var->is_in_shader_storage_block();
 
+  /* Do not assign storage if the uniform is builtin */
+  if (!this->uniforms[id].builtin &&
+  !this->uniforms[id].is_shader_storage)
+ this->uniforms[id].storage = this->values;
+
   if (this->buffer_block_index != -1) {
  this->uniforms[id].block_index = this->buffer_block_index;
 
@@ -819,7 +822,9 @@ private:
  this->uniforms[id].row_major = false;
   }
 
-  this->values += values_for_type(type);
+  if (!this->uniforms[id].builtin &&
+  !this->uniforms[id].is_shader_storage)
+ this->values += values_for_type(type);
}
 
/**
@@ -1270,7 +1275,8 @@ link_assign_uniform_locations(struct gl_shader_program 
*prog,
 
 #ifndef NDEBUG
for (unsigned i = 0; i < num_uniforms; i++) {
-  assert(uniforms[i].storage != NULL || uniforms[i].builtin);
+  assert(uniforms[i].storage != NULL || uniforms[i].builtin ||
+ uniforms[i].is_shader_storage);
}
 
assert(parcel.values == data_end);
-- 
2.5.5
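
The patch above has no commit message, so here is a rough standalone sketch of the counting rule it appears to introduce: storage slots are only counted for uniforms that are neither built-ins nor SSBO members. The `uniform_info` struct and both helpers are hypothetical, and the "gl_" prefix check is an assumption about how built-ins are recognized.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified uniform record for illustration only. */
struct uniform_info {
   const char *name;
   bool is_shader_storage;
   unsigned values;          /* number of storage slots the type needs */
};

/* Assumption: built-ins are recognized by the reserved "gl_" prefix. */
static bool
name_is_builtin(const char *name)
{
   return strncmp(name, "gl_", 3) == 0;
}

/* Count storage slots, skipping built-ins and SSBO members. */
static unsigned
count_storage_values(const struct uniform_info *u, unsigned count)
{
   unsigned total = 0;

   for (unsigned i = 0; i < count; i++) {
      if (!name_is_builtin(u[i].name) && !u[i].is_shader_storage)
         total += u[i].values;
   }

   return total;
}
```

Both the counting pass and the assignment pass have to apply the same filter, which is presumably why the patch touches both and keeps the final assert on the total.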



Re: [Mesa-dev] [PATCH 0/2] fix load of unpacked double vector input varyings

2016-06-01 Thread Samuel Iglesias Gonsálvez
On 26/05/16 07:56, Samuel Iglesias Gonsálvez wrote:
> Hello,
> 
> Timothy found that tests with unpacked double vector input varyings
> were failing in the i965 driver. For example, this is happening when
> using explicit locations because Mesa disables varying packing for
> that case.
> 
> These patches fix the following piglit test:
> 
> spec/arb_gpu_shader_fp64/execution/vs-fs-explicit-locations
> 
> Samuel Iglesias Gonsálvez (2):
>   i965/fs: fix offset when loading double vector input varyings
>   i965/fs: fix FS_OPCODE_CINTERP for unpacked double input varyings
> 

These patches have a Tested-by (Thanks Timothy!) but they are still
unreviewed. Can someone take a look at them?

Also, this patch [0] is still unreviewed.

Thanks,

Sam

[0] https://lists.freedesktop.org/archives/mesa-dev/2016-May/118759.html


>  src/mesa/drivers/dri/i965/brw_fs.cpp | 13 -
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 22 +-
>  2 files changed, 33 insertions(+), 2 deletions(-)
> 





Re: [Mesa-dev] [PATCHv2 13/25] i965/fs: Skip SIMD lowering destination zipping if possible.

2016-06-01 Thread Jason Ekstrand
Looks good.  R-B

On Wed, Jun 1, 2016 at 5:21 PM, Francisco Jerez wrote:

> Skipping the temporary allocation and copy instructions is easy (just
> return dst), but the conditions used to find out whether the copy can
> be optimized out safely without breaking the program are rather
> complex: The destination must be exactly one component of at most the
> execution width of the lowered instruction, and all source regions of
> the instruction must be either fully disjoint from the destination or
> be aligned with it group by group.
>
> v2: Don't handle partial source-destination overlap for simplicity
> (Jason).  No instruction count regressions with respect to v1 in
> either shader-db or the few FP64 shader_runner test-cases with
> partial overlap I've checked manually.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 55
> 
>  1 file changed, 55 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 00d937e..bfae1d7 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -5078,6 +5078,52 @@ emit_unzip(const fs_builder &lbld, bblock_t *block,
> fs_inst *inst,
>  }
>
>  /**
> + * Return true if splitting out the group of channels of instruction \p
> inst
> + * given by lbld.group() requires allocating a temporary for the
> destination
> + * of the lowered instruction and copying the data back to the original
> + * destination region.
> + */
> +static inline bool
> +needs_dst_copy(const fs_builder &lbld, const fs_inst *inst)
> +{
> +   /* If the instruction writes more than one component we'll have to
> shuffle
> +* the results of multiple lowered instructions in order to make sure
> that
> +* they end up arranged correctly in the original destination region.
> +*/
> +   if (inst->regs_written * REG_SIZE >
> +   inst->dst.component_size(inst->exec_size))
> +  return true;
> +
> +   /* If the lowered execution size is larger than the original the
> result of
> +* the instruction won't fit in the original destination, so we'll
> have to
> +* allocate a temporary in any case.
> +*/
> +   if (lbld.dispatch_width() > inst->exec_size)
> +  return true;
> +
> +   for (unsigned i = 0; i < inst->sources; i++) {
> +  /* If we already made a copy of the source for other reasons there
> won't
> +   * be any overlap with the destination.
> +   */
> +  if (needs_src_copy(lbld, inst, i))
> + continue;
> +
> +  /* In order to keep the logic simple we emit a copy whenever the
> +   * destination region doesn't exactly match an overlapping source,
> which
> +   * may point at the source and destination not being aligned group
> by
> +   * group which could cause one of the lowered instructions to
> overwrite
> +   * the data read from the same source by other lowered instructions.
> +   */
> +  if (regions_overlap(inst->dst, inst->regs_written * REG_SIZE,
> +  inst->src[i], inst->regs_read(i) * REG_SIZE) &&
> +  !inst->dst.equals(inst->src[i]))
> +return true;
> +   }
> +
> +   return false;
> +}
> +
> +/**
>   * Insert data from a packed temporary into the channel group given by
>   * lbld.group() of the destination region of instruction \p inst and
> return
>   * the temporary as result.  If any copy instructions are required they
> will
> @@ -5097,6 +5143,8 @@ emit_zip(const fs_builder &lbld, bblock_t *block,
> fs_inst *inst)
> const fs_reg dst = horiz_offset(inst->dst, lbld.group());
> const unsigned dst_size = inst->regs_written * REG_SIZE /
>  inst->dst.component_size(inst->exec_size);
> +
> +   if (needs_dst_copy(lbld, inst)) {
> const fs_reg tmp = lbld.vgrf(inst->dst.type, dst_size);
>
> if (inst->predicate) {
> @@ -5114,6 +5162,13 @@ emit_zip(const fs_builder &lbld, bblock_t *block,
> fs_inst *inst)
>.MOV(offset(dst, inst->exec_size, k), offset(tmp, lbld, k));
>
> return tmp;
> +
> +   } else {
> +  /* No need to allocate a temporary for the lowered instruction, just
> +   * take the right group of channels from the original region.
> +   */
> +  return dst;
> +   }
>  }
>
>  bool
> --
> 2.7.3
>
>
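
As an aside, the source/destination overlap rule in the quoted patch can be sketched in isolation. This is a simplified model, not the i965 code: a region is reduced to a hypothetical byte range, and the first two checks (multi-component destinations, wider lowered execution size) and the needs_src_copy() shortcut are left out. It only shows the "overlapping but not identical means copy" test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a register region as the byte range
 * [offset, offset + size).
 */
struct region {
   unsigned offset;
   unsigned size;
};

/* True if the two byte ranges share at least one byte. */
static bool
regions_overlap(struct region a, struct region b)
{
   return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

/* True if the two byte ranges are identical. */
static bool
regions_equal(struct region a, struct region b)
{
   return a.offset == b.offset && a.size == b.size;
}

/* A destination copy is needed if any source overlaps the destination
 * without matching it exactly.
 */
static bool
needs_copy_for_overlap(struct region dst, const struct region *srcs,
                       unsigned num_srcs)
{
   for (unsigned i = 0; i < num_srcs; i++) {
      if (regions_overlap(dst, srcs[i]) && !regions_equal(dst, srcs[i]))
         return true;
   }

   return false;
}
```

Fully disjoint sources and sources identical to the destination both avoid the copy; anything in between takes the conservative path, which matches the v2 note about not handling partial overlap.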


Re: [Mesa-dev] [PATCH v11] mesa: Add MESA_SHADER_CAPTURE_PATH for writing .shader_test files.

2016-06-01 Thread Matt Turner
On Wed, Jun 1, 2016 at 12:01 PM, Kenneth Graunke wrote:
> This writes linked shader programs to .shader_test files to
> $MESA_SHADER_CAPTURE_PATH in the format used by shader-db
> (http://cgit.freedesktop.org/mesa/shader-db).
>
> It supports both GLSL shaders and ARB programs.  All stages that
> are linked together are written in a single .shader_test file.
>
> This eliminates the need for shader-db's split-to-files.py, as Mesa
> produces the desired format directly.  It's much more reliable than
> parsing stdout/stderr, as those may contain extraneous messages, or
> simply be closed by the application and unavailable.
>
> We have many similar features already, but this is a bit different:
> - MESA_GLSL=dump writes to stdout, not files.
> - MESA_GLSL=log writes each stage to separate files (rather than
>   all linked shaders in one file), at draw time (not link time),
>   with uniform data and state flag info.
> - Tapani's shader replacement mechanism (MESA_SHADER_DUMP_PATH and
>   MESA_SHADER_READ_PATH) also uses separate files per shader stage,
>   but allows reading in files to replace an app's shader code.
>
> v2:  Dump ARB programs too, not just GLSL.
> v3:  Don't dump bogus 0.shader_test file.
> v4:  Add "GL_ARB_separate_shader_objects" to the [require] block.
> v5:  Print "GLSL 4.00" instead of "GLSL 4.0" in the [require] block.
> v6:  Don't hardcode /tmp/mesa.
> v7:  Fix memoization of getenv().
> v8:  Also print "SSO ENABLED" (suggested by Timothy).
> v9:  Also handle ES shaders (suggested by Ilia).
> v10: Guard against MESA_SHADER_CAPTURE_PATH being too long; add
>  _mesa_warning calls on error handling (suggested by Ben).
> v11: Fix crash when variable is unset introduced in v10.
>
> Signed-off-by: Kenneth Graunke 

Thanks for sending this. With the stray change to mtypes.h removed,

Reviewed-by: Matt Turner 
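
To make the v7, v10 and v11 changelog items concrete, here is a small sketch of the two pieces involved: reading the environment variable once, and refusing to build a filename that would not fit. This is not the code from the patch. Both function names and the "<dir>/<id>.shader_test" naming are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read MESA_SHADER_CAPTURE_PATH once and cache the result, including the
 * "unset" case (NULL), so getenv() is not called on every link.
 */
static const char *
get_capture_path(void)
{
   static bool read_env_var = false;
   static const char *path = NULL;

   if (!read_env_var) {
      path = getenv("MESA_SHADER_CAPTURE_PATH");
      read_env_var = true;
   }

   return path;
}

/* Build "<dir>/<id>.shader_test" into buf.  Returns false if the directory
 * is unset or the result would not fit in the buffer.
 */
static bool
build_capture_filename(char *buf, size_t buf_size,
                       const char *dir, unsigned program_id)
{
   if (dir == NULL)
      return false;

   int len = snprintf(buf, buf_size, "%s/%u.shader_test", dir, program_id);
   return len >= 0 && (size_t)len < buf_size;
}
```

The separate "have we read it yet" flag is what lets an unset variable be cached too, and the NULL check is the kind of guard the v11 note refers to.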


[Mesa-dev] [PATCH 1/2] i965: Add _NEW_POINT to a couple of comments.

2016-06-01 Thread Kenneth Graunke
Signed-off-by: Kenneth Graunke 
Cc: "12.0" 
---
 src/mesa/drivers/dri/i965/gen6_sf_state.c | 2 +-
 src/mesa/drivers/dri/i965/gen7_sf_state.c | 2 +-
 src/mesa/drivers/dri/i965/gen8_sf_state.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c 
b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index 8b1b7eb..0538ab7 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -378,7 +378,7 @@ upload_sf_state(struct brw_context *brw)
 ctx->Point._Attenuated))
   dw4 |= GEN6_SF_USE_STATE_POINT_WIDTH;
 
-   /* Clamp to ARB_point_parameters user limits */
+   /* _NEW_POINT - Clamp to ARB_point_parameters user limits */
point_size = CLAMP(ctx->Point.Size, ctx->Point.MinSize, ctx->Point.MaxSize);
 
/* Clamp to the hardware limits and convert to fixed point */
diff --git a/src/mesa/drivers/dri/i965/gen7_sf_state.c 
b/src/mesa/drivers/dri/i965/gen7_sf_state.c
index 7a3cc53..d3a658c 100644
--- a/src/mesa/drivers/dri/i965/gen7_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sf_state.c
@@ -217,7 +217,7 @@ upload_sf_state(struct brw_context *brw)
if (!(ctx->VertexProgram.PointSizeEnabled || ctx->Point._Attenuated))
   dw3 |= GEN6_SF_USE_STATE_POINT_WIDTH;
 
-   /* Clamp to ARB_point_parameters user limits */
+   /* _NEW_POINT - Clamp to ARB_point_parameters user limits */
point_size = CLAMP(ctx->Point.Size, ctx->Point.MinSize, ctx->Point.MaxSize);
 
/* Clamp to the hardware limits and convert to fixed point */
diff --git a/src/mesa/drivers/dri/i965/gen8_sf_state.c 
b/src/mesa/drivers/dri/i965/gen8_sf_state.c
index 60e8c94..d854b6d 100644
--- a/src/mesa/drivers/dri/i965/gen8_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_sf_state.c
@@ -166,7 +166,7 @@ upload_sf(struct brw_context *brw)
   dw2 |= GEN6_SF_LINE_END_CAP_WIDTH_1_0;
}
 
-   /* Clamp to ARB_point_parameters user limits */
+   /* _NEW_POINT - Clamp to ARB_point_parameters user limits */
point_size = CLAMP(ctx->Point.Size, ctx->Point.MinSize, ctx->Point.MaxSize);
 
/* Clamp to the hardware limits and convert to fixed point */
-- 
2.8.3



[Mesa-dev] [PATCH 2/2] i965: Fix point size with tessellation/geometry shaders.

2016-06-01 Thread Kenneth Graunke
I believe we were using the state-based point size when either the
tessellation evaluation or geometry shader wrote it, but the vertex
shader didn't write it.  If the last enabled stage writes it
(corresponding to brw_vue_map_geom_out), we should use the shader's
value; otherwise we should use the state.  This should handle the
Attenuated case as well, as the fixed-function shader will also write it.

Fixes a number of dEQP tests with EXT_tessellation_shader enabled:
dEQP-GLES31.functional.tessellation_geometry_interaction.point_size.*

Cc: "12.0" 
Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/gen6_sf_state.c | 5 ++---
 src/mesa/drivers/dri/i965/gen7_sf_state.c | 8 
 src/mesa/drivers/dri/i965/gen8_sf_state.c | 8 
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c 
b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index 0538ab7..f35926e 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -373,9 +373,8 @@ upload_sf_state(struct brw_context *brw)
if (multisampled_fbo && ctx->Multisample.Enabled)
   dw3 |= GEN6_SF_MSRAST_ON_PATTERN;
 
-   /* _NEW_PROGRAM | _NEW_POINT */
-   if (!(ctx->VertexProgram.PointSizeEnabled ||
-ctx->Point._Attenuated))
+   /* BRW_NEW_VUE_MAP_GEOM_OUT */
+   if ((brw->vue_map_geom_out.slots_valid & VARYING_BIT_PSIZ) == 0)
   dw4 |= GEN6_SF_USE_STATE_POINT_WIDTH;
 
/* _NEW_POINT - Clamp to ARB_point_parameters user limits */
diff --git a/src/mesa/drivers/dri/i965/gen7_sf_state.c 
b/src/mesa/drivers/dri/i965/gen7_sf_state.c
index d3a658c..7957614 100644
--- a/src/mesa/drivers/dri/i965/gen7_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sf_state.c
@@ -213,8 +213,8 @@ upload_sf_state(struct brw_context *brw)
 
dw3 = GEN6_SF_LINE_AA_MODE_TRUE;
 
-   /* _NEW_PROGRAM | _NEW_POINT */
-   if (!(ctx->VertexProgram.PointSizeEnabled || ctx->Point._Attenuated))
+   /* BRW_NEW_VUE_MAP_GEOM_OUT */
+   if ((brw->vue_map_geom_out.slots_valid & VARYING_BIT_PSIZ) == 0)
   dw3 |= GEN6_SF_USE_STATE_POINT_WIDTH;
 
/* _NEW_POINT - Clamp to ARB_point_parameters user limits */
@@ -252,11 +252,11 @@ const struct brw_tracked_state gen7_sf_state = {
_NEW_MULTISAMPLE |
_NEW_POINT |
_NEW_POLYGON |
-   _NEW_PROGRAM |
_NEW_SCISSOR,
   .brw   = BRW_NEW_BLORP |
BRW_NEW_CONTEXT |
-   BRW_NEW_PRIMITIVE,
+   BRW_NEW_PRIMITIVE |
+   BRW_NEW_VUE_MAP_GEOM_OUT,
},
.emit = upload_sf_state,
 };
diff --git a/src/mesa/drivers/dri/i965/gen8_sf_state.c 
b/src/mesa/drivers/dri/i965/gen8_sf_state.c
index d854b6d..b22a5f0 100644
--- a/src/mesa/drivers/dri/i965/gen8_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_sf_state.c
@@ -172,8 +172,8 @@ upload_sf(struct brw_context *brw)
/* Clamp to the hardware limits and convert to fixed point */
dw3 |= U_FIXED(CLAMP(point_size, 0.125f, 255.875f), 3);
 
-   /* _NEW_PROGRAM | _NEW_POINT */
-   if (!(ctx->VertexProgram.PointSizeEnabled || ctx->Point._Attenuated))
+   /* BRW_NEW_VUE_MAP_GEOM_OUT */
+   if ((brw->vue_map_geom_out.slots_valid & VARYING_BIT_PSIZ) == 0)
   dw3 |= GEN6_SF_USE_STATE_POINT_WIDTH;
 
/* _NEW_POINT | _NEW_MULTISAMPLE */
@@ -204,12 +204,12 @@ upload_sf(struct brw_context *brw)
 const struct brw_tracked_state gen8_sf_state = {
.dirty = {
   .mesa  = _NEW_LIGHT |
-   _NEW_PROGRAM |
_NEW_LINE |
_NEW_MULTISAMPLE |
_NEW_POINT,
   .brw   = BRW_NEW_BLORP |
-   BRW_NEW_CONTEXT,
+   BRW_NEW_CONTEXT |
+   BRW_NEW_VUE_MAP_GEOM_OUT,
},
.emit = upload_sf,
 };
-- 
2.8.3
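
The new condition boils down to a single bit test on the outputs of the last enabled geometry stage. A minimal sketch, assuming made-up bit positions (the real VARYING_BIT_* values are defined by Mesa and are not reproduced here) and a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define SLOT_BIT_POS  (UINT64_C(1) << 0)
#define SLOT_BIT_PSIZ (UINT64_C(1) << 12)

/* Use the state-based point width only when the last enabled stage before
 * rasterization does not write a point size.
 */
static bool
use_state_point_width(uint64_t last_stage_slots_valid)
{
   return (last_stage_slots_valid & SLOT_BIT_PSIZ) == 0;
}
```

Because the test looks at the last stage's output map rather than at vertex program state, it gives the same answer whether that stage is a vertex, tessellation evaluation or geometry shader.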



[Mesa-dev] [PATCH] mesa/copyimage: report INVALID_VALUE for missing cube array

2016-06-01 Thread Dave Airlie
From: Dave Airlie 

The spec says INVALID_VALUE for exceeding dimensions,
which is really what is happening here.

This fixes:
GL45-CTS.copy_image.non_existent_mipmap

Signed-off-by: Dave Airlie 
---
 src/mesa/main/copyimage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/copyimage.c b/src/mesa/main/copyimage.c
index 6aa6bcb..7e5df61 100644
--- a/src/mesa/main/copyimage.c
+++ b/src/mesa/main/copyimage.c
@@ -180,7 +180,7 @@ prepare_target(struct gl_context *ctx, GLuint name, GLenum 
target,
  for (i = 0; i < depth; i++) {
 if (!texObj->Image[z+i][level]) {
/* missing cube face */
-   _mesa_error(ctx, GL_INVALID_OPERATION,
+   _mesa_error(ctx, GL_INVALID_VALUE,
"glCopyImageSubData(missing cube face)");
return false;
 }
-- 
2.5.5



[Mesa-dev] [PATCH 1/6] anv/pipeline: Silently pass tests if depth or stencil is missing

2016-06-01 Thread Jason Ekstrand
Signed-off-by: Jason Ekstrand 
Cc: "12.0" 
---
 src/intel/vulkan/gen7_pipeline.c  |  4 +++-
 src/intel/vulkan/gen8_pipeline.c  |  4 +++-
 src/intel/vulkan/genX_pipeline_util.h | 30 +-
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/src/intel/vulkan/gen7_pipeline.c b/src/intel/vulkan/gen7_pipeline.c
index ad45ecb..a53bdc4 100644
--- a/src/intel/vulkan/gen7_pipeline.c
+++ b/src/intel/vulkan/gen7_pipeline.c
@@ -155,6 +155,8 @@ genX(graphics_pipeline_create)(
 VkPipeline* pPipeline)
 {
ANV_FROM_HANDLE(anv_device, device, _device);
+   ANV_FROM_HANDLE(anv_render_pass, pass, pCreateInfo->renderPass);
+   struct anv_subpass *subpass = &pass->subpasses[pCreateInfo->subpass];
struct anv_pipeline *pipeline;
VkResult result;
 
@@ -178,7 +180,7 @@ genX(graphics_pipeline_create)(
assert(pCreateInfo->pRasterizationState);
gen7_emit_rs_state(pipeline, pCreateInfo->pRasterizationState, extra);
 
-   emit_ds_state(pipeline, pCreateInfo->pDepthStencilState);
+   emit_ds_state(pipeline, pCreateInfo->pDepthStencilState, pass, subpass);
 
gen7_emit_cb_state(pipeline, pCreateInfo->pColorBlendState,
 pCreateInfo->pMultisampleState);
diff --git a/src/intel/vulkan/gen8_pipeline.c b/src/intel/vulkan/gen8_pipeline.c
index cbeb791..77c81f4 100644
--- a/src/intel/vulkan/gen8_pipeline.c
+++ b/src/intel/vulkan/gen8_pipeline.c
@@ -268,6 +268,8 @@ genX(graphics_pipeline_create)(
 VkPipeline* pPipeline)
 {
ANV_FROM_HANDLE(anv_device, device, _device);
+   ANV_FROM_HANDLE(anv_render_pass, pass, pCreateInfo->renderPass);
+   struct anv_subpass *subpass = &pass->subpasses[pCreateInfo->subpass];
struct anv_pipeline *pipeline;
VkResult result;
uint32_t offset, length;
@@ -294,7 +296,7 @@ genX(graphics_pipeline_create)(
emit_rs_state(pipeline, pCreateInfo->pRasterizationState,
  pCreateInfo->pMultisampleState, extra);
emit_ms_state(pipeline, pCreateInfo->pMultisampleState);
-   emit_ds_state(pipeline, pCreateInfo->pDepthStencilState);
+   emit_ds_state(pipeline, pCreateInfo->pDepthStencilState, pass, subpass);
emit_cb_state(pipeline, pCreateInfo->pColorBlendState,
pCreateInfo->pMultisampleState);
 
diff --git a/src/intel/vulkan/genX_pipeline_util.h 
b/src/intel/vulkan/genX_pipeline_util.h
index fe24048..669b456 100644
--- a/src/intel/vulkan/genX_pipeline_util.h
+++ b/src/intel/vulkan/genX_pipeline_util.h
@@ -21,6 +21,8 @@
  * IN THE SOFTWARE.
  */
 
+#include "vk_format_info.h"
+
 static uint32_t
 vertex_element_comp_control(enum isl_format format, unsigned comp)
 {
@@ -428,7 +430,9 @@ static const uint32_t vk_to_gen_stencil_op[] = {
 
 static void
 emit_ds_state(struct anv_pipeline *pipeline,
-  const VkPipelineDepthStencilStateCreateInfo *info)
+  const VkPipelineDepthStencilStateCreateInfo *info,
+  const struct anv_render_pass *pass,
+  const struct anv_subpass *subpass)
 {
 #if GEN_GEN == 7
 #  define depth_stencil_dw pipeline->gen7.depth_stencil_state
@@ -470,6 +474,30 @@ emit_ds_state(struct anv_pipeline *pipeline,
   .BackfaceStencilTestFunction = 
vk_to_gen_compare_op[info->back.compareOp],
};
 
+   VkImageAspectFlags aspects = 0;
+   if (pass->attachments == NULL) {
+  /* This comes from meta.  Assume we have everything. */
+  aspects = VK_IMAGE_ASPECT_DEPTH_BIT | VK_IMAGE_ASPECT_STENCIL_BIT;
+   } else if (subpass->depth_stencil_attachment != VK_ATTACHMENT_UNUSED) {
+  VkFormat depth_stencil_format =
+ pass->attachments[subpass->depth_stencil_attachment].format;
+  aspects = vk_format_aspects(depth_stencil_format);
+   }
+
+   /* The Vulkan spec requires that if either depth or stencil is not present,
+* the pipeline is to act as if the test silently passes.
+*/
+   if (!(aspects & VK_IMAGE_ASPECT_DEPTH_BIT)) {
+  depth_stencil.DepthBufferWriteEnable = false;
+  depth_stencil.DepthTestFunction = PREFILTEROPALWAYS;
+   }
+
+   if (!(aspects & VK_IMAGE_ASPECT_STENCIL_BIT)) {
+  depth_stencil.StencilBufferWriteEnable = false;
+  depth_stencil.StencilTestFunction = PREFILTEROPALWAYS;
+  depth_stencil.BackfaceStencilTestFunction = PREFILTEROPALWAYS;
+   }
+
/* From the Broadwell PRM:
 *
 *"If Depth_Test_Enable = 1 AND Depth_Test_func = EQUAL, the
-- 
2.5.0.400.gff86faf
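
The "silently pass" rule in this patch can be shown on its own. The sketch below uses hypothetical names throughout (`ds_state`, the `ASPECT_*` flags, `compare_op`), not the anv or genxml definitions. A missing aspect disables writes and forces the compare function to "always".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical aspect flags and compare ops, for illustration only. */
enum { ASPECT_DEPTH = 1 << 0, ASPECT_STENCIL = 1 << 1 };
enum compare_op { COMPARE_NEVER, COMPARE_LESS, COMPARE_ALWAYS };

struct ds_state {
   bool depth_write;
   enum compare_op depth_func;
   bool stencil_write;
   enum compare_op stencil_func;
   enum compare_op back_stencil_func;
};

/* If an aspect is absent from the attachment, make the corresponding test
 * always pass and disable writes to it.
 */
static void
apply_missing_aspects(struct ds_state *ds, unsigned aspects)
{
   if (!(aspects & ASPECT_DEPTH)) {
      ds->depth_write = false;
      ds->depth_func = COMPARE_ALWAYS;
   }

   if (!(aspects & ASPECT_STENCIL)) {
      ds->stencil_write = false;
      ds->stencil_func = COMPARE_ALWAYS;
      ds->back_stencil_func = COMPARE_ALWAYS;
   }
}
```

With a depth-only attachment the depth settings are left untouched and only the stencil side is neutralized.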



[Mesa-dev] [PATCH 6/6] anv/pipeline: Add support for early depth stencil

2016-06-01 Thread Jason Ekstrand
Signed-off-by: Jason Ekstrand 
Cc: "12.0" 
---
 src/intel/vulkan/gen7_pipeline.c | 10 +-
 src/intel/vulkan/gen8_pipeline.c |  9 -
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/gen7_pipeline.c b/src/intel/vulkan/gen7_pipeline.c
index a53bdc4..f069db9 100644
--- a/src/intel/vulkan/gen7_pipeline.c
+++ b/src/intel/vulkan/gen7_pipeline.c
@@ -365,12 +365,20 @@ genX(graphics_pipeline_create)(
  wm.ThreadDispatchEnable= true;
  wm.LineEndCapAntialiasingRegionWidth   = 0; /* 0.5 pixels */
  wm.LineAntialiasingRegionWidth = 1; /* 1.0 pixels */
- wm.EarlyDepthStencilControl= EDSC_NORMAL;
  wm.PointRasterizationRule  = RASTRULE_UPPER_RIGHT;
  wm.PixelShaderComputedDepthMode= 
wm_prog_data->computed_depth_mode;
  wm.PixelShaderUsesSourceDepth  = wm_prog_data->uses_src_depth;
  wm.PixelShaderUsesSourceW  = wm_prog_data->uses_src_w;
  wm.PixelShaderUsesInputCoverageMask= 
wm_prog_data->uses_sample_mask;
+
+ if (wm_prog_data->early_fragment_tests) {
+wm.EarlyDepthStencilControl = EDSC_PREPS;
+ } else if (wm_prog_data->has_side_effects) {
+wm.EarlyDepthStencilControl = EDSC_PSEXEC;
+ } else {
+wm.EarlyDepthStencilControl = EDSC_NORMAL;
+ }
+
  wm.BarycentricInterpolationMode= 
wm_prog_data->barycentric_interp_modes;
   }
}
diff --git a/src/intel/vulkan/gen8_pipeline.c b/src/intel/vulkan/gen8_pipeline.c
index 77c81f4..1300c0d 100644
--- a/src/intel/vulkan/gen8_pipeline.c
+++ b/src/intel/vulkan/gen8_pipeline.c
@@ -329,10 +329,17 @@ genX(graphics_pipeline_create)(
   wm.StatisticsEnable= true;
   wm.LineEndCapAntialiasingRegionWidth   = _05pixels;
   wm.LineAntialiasingRegionWidth = _10pixels;
-  wm.EarlyDepthStencilControl= NORMAL;
   wm.ForceThreadDispatchEnable   = NORMAL;
   wm.PointRasterizationRule  = RASTRULE_UPPER_RIGHT;
 
+  if (wm_prog_data && wm_prog_data->early_fragment_tests) {
+ wm.EarlyDepthStencilControl = PREPS;
+  } else if (wm_prog_data && wm_prog_data->has_side_effects) {
+ wm.EarlyDepthStencilControl = PSEXEC;
+  } else {
+ wm.EarlyDepthStencilControl = NORMAL;
+  }
+
   wm.BarycentricInterpolationMode = pipeline->ps_ksp0 == NO_KERNEL ?
  0 : wm_prog_data->barycentric_interp_modes;
}
-- 
2.5.0.400.gff86faf
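
The selection logic added in both files is a three-way priority choice. A standalone sketch, with a hypothetical enum and function name rather than the generated hardware definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three modes named in the patch. */
enum edsc_mode { EDSC_MODE_NORMAL, EDSC_MODE_PSEXEC, EDSC_MODE_PREPS };

/* early_fragment_tests takes priority over has_side_effects; with neither
 * set, the normal mode is used.
 */
static enum edsc_mode
choose_edsc(bool early_fragment_tests, bool has_side_effects)
{
   if (early_fragment_tests)
      return EDSC_MODE_PREPS;
   else if (has_side_effects)
      return EDSC_MODE_PSEXEC;
   else
      return EDSC_MODE_NORMAL;
}
```

The gen8 version in the patch additionally checks that wm_prog_data is non-NULL before reading either flag, which this sketch leaves to the caller.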



[Mesa-dev] [PATCH 2/6] nir/info: Get rid of uses_interp_var_at_offset

2016-06-01 Thread Jason Ekstrand
We were using this briefly in the i965 driver to trigger recompiles but we
haven't been using it since we switched to the NIR y-transform lowering
pass.

Signed-off-by: Jason Ekstrand 
---
 src/compiler/glsl/glsl_to_nir.cpp  | 3 ---
 src/compiler/nir/nir.h | 3 ---
 src/compiler/nir/nir_gather_info.c | 4 
 3 files changed, 10 deletions(-)

diff --git a/src/compiler/glsl/glsl_to_nir.cpp 
b/src/compiler/glsl/glsl_to_nir.cpp
index 63a2cfd..daf237e 100644
--- a/src/compiler/glsl/glsl_to_nir.cpp
+++ b/src/compiler/glsl/glsl_to_nir.cpp
@@ -1284,9 +1284,6 @@ nir_visitor::visit(ir_expression *ir)
   intrin->intrinsic == nir_intrinsic_interp_var_at_sample)
  intrin->src[0] = nir_src_for_ssa(evaluate_rvalue(ir->operands[1]));
 
-  if (intrin->intrinsic == nir_intrinsic_interp_var_at_offset)
- shader->info.uses_interp_var_at_offset = true;
-
   unsigned bit_size =  glsl_get_bit_size(deref->type);
       add_instr(&intrin->instr, deref->type->vector_elements, bit_size);
 
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 2e1bdfb..f7d4fff 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1718,9 +1718,6 @@ typedef struct nir_shader_info {
/* Whether or not this shader ever uses textureGather() */
bool uses_texture_gather;
 
-   /** Whether or not this shader uses nir_intrinsic_interp_var_at_offset */
-   bool uses_interp_var_at_offset;
-
/* Whether or not this shader uses the gl_ClipDistance output */
bool uses_clip_distance_out;
 
diff --git a/src/compiler/nir/nir_gather_info.c 
b/src/compiler/nir/nir_gather_info.c
index 7900fd1..89a6302 100644
--- a/src/compiler/nir/nir_gather_info.c
+++ b/src/compiler/nir/nir_gather_info.c
@@ -56,10 +56,6 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, nir_shader 
*shader)
   shader->info.gs.uses_end_primitive = 1;
   break;
 
-   case nir_intrinsic_interp_var_at_offset:
-  shader->info.uses_interp_var_at_offset = 1;
-  break;
-
default:
   break;
}
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 3/6] i965/fs Add a wm_prog_data bit for has_side_effects

2016-06-01 Thread Jason Ekstrand
This is more accurate than calling
_mesa_active_fragment_shader_has_side_effects because it looks at whether
or not the SSBOs, images, or atomic buffers are actually written rather
than just existing in the program.

Signed-off-by: Jason Ekstrand 
Cc: "12.0" 
---
 src/mesa/drivers/dri/i965/brw_compiler.h |  1 +
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 14 ++
 2 files changed, 15 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
b/src/mesa/drivers/dri/i965/brw_compiler.h
index 0844694..375028b 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -402,6 +402,7 @@ struct brw_wm_prog_data {
bool uses_src_depth;
bool uses_src_w;
bool uses_sample_mask;
+   bool has_side_effects;
bool pulls_bary;
 
/**
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 9b6093c..54b3e7b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -3338,6 +3338,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
case nir_intrinsic_atomic_counter_inc:
case nir_intrinsic_atomic_counter_dec:
case nir_intrinsic_atomic_counter_read: {
+  if (stage == MESA_SHADER_FRAGMENT &&
+  instr->intrinsic != nir_intrinsic_atomic_counter_read)
+ ((struct brw_wm_prog_data *)prog_data)->has_side_effects = true;
+
   /* Get the arguments of the atomic intrinsic. */
   const fs_reg offset = get_nir_src(instr->src[0]);
   const unsigned surface = (stage_prog_data->binding_table.abo_start +
@@ -3384,6 +3388,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
case nir_intrinsic_image_atomic_comp_swap: {
   using namespace image_access;
 
+  if (stage == MESA_SHADER_FRAGMENT &&
+  instr->intrinsic != nir_intrinsic_image_load)
+ ((struct brw_wm_prog_data *)prog_data)->has_side_effects = true;
+
   /* Get the referenced image variable and type. */
   const nir_variable *var = instr->variables[0]->var;
   const glsl_type *type = var->type->without_array();
@@ -3697,6 +3705,9 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
case nir_intrinsic_store_ssbo: {
   assert(devinfo->gen >= 7);
 
+  if (stage == MESA_SHADER_FRAGMENT)
+ ((struct brw_wm_prog_data *)prog_data)->has_side_effects = true;
+
   /* Block index */
   fs_reg surf_index;
   nir_const_value *const_uniform_block =
@@ -3885,6 +3896,9 @@ void
 fs_visitor::nir_emit_ssbo_atomic(const fs_builder &bld,
  int op, nir_intrinsic_instr *instr)
 {
+   if (stage == MESA_SHADER_FRAGMENT)
+  ((struct brw_wm_prog_data *)prog_data)->has_side_effects = true;
+
fs_reg dest;
if (nir_intrinsic_infos[instr->intrinsic].has_dest)
   dest = get_nir_dest(instr->dest);
-- 
2.5.0.400.gff86faf
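
The difference between the old declaration-based check and the new per-instruction bit can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the `sketch_*` names, the reduced intrinsic list and the `sketch_linked_shader` struct are hypothetical stand-ins, not the real NIR or i965 types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a handful of NIR intrinsics. */
enum sketch_intrinsic {
   SKETCH_ATOMIC_COUNTER_INC,
   SKETCH_ATOMIC_COUNTER_READ,
   SKETCH_IMAGE_LOAD,
   SKETCH_IMAGE_STORE,
   SKETCH_LOAD_SSBO,
   SKETCH_STORE_SSBO,
   SKETCH_SSBO_ATOMIC_ADD
};

/* Hypothetical summary of what a linked fragment shader declares. */
struct sketch_linked_shader {
   unsigned num_atomic_buffers;
   unsigned num_images;
   unsigned num_ssbos;
};

/* Declaration-based check: true as soon as any such resource exists,
 * whether or not the shader ever writes to it.
 */
static bool
sketch_declares_resources(const struct sketch_linked_shader *sh)
{
   assert(sh != NULL);
   return sh->num_atomic_buffers > 0 ||
          sh->num_images > 0 ||
          sh->num_ssbos > 0;
}

/* True only for intrinsics that modify memory; plain loads and atomic
 * counter reads do not count.
 */
static bool
sketch_intrinsic_writes(enum sketch_intrinsic op)
{
   switch (op) {
   case SKETCH_ATOMIC_COUNTER_INC:
   case SKETCH_IMAGE_STORE:
   case SKETCH_STORE_SSBO:
   case SKETCH_SSBO_ATOMIC_ADD:
      return true;
   default:
      return false;
   }
}

/* Instruction-based check: scan the intrinsics the shader actually uses. */
static bool
sketch_has_side_effects(const enum sketch_intrinsic *ops, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      if (sketch_intrinsic_writes(ops[i]))
         return true;
   }
   return false;
}
```

A shader that declares an image and an SSBO but only loads from them is flagged by the first check and not by the second, which is the case the commit message describes.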



[Mesa-dev] [PATCH 5/6] mesa: Get rid of _mesa_active_fragment_shader_has_side_effects

2016-06-01 Thread Jason Ekstrand
It is no longer used.

Signed-off-by: Jason Ekstrand 
---
 src/mesa/main/mtypes.h | 18 --
 1 file changed, 18 deletions(-)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 2233526..4f93a39 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -4682,24 +4682,6 @@ enum _debug
DEBUG_INCOMPLETE_FBO = (1 << 3)
 };
 
-/**
- * Checks if the active fragment shader program can have side effects due
- * to use of things like atomic buffers or images
- */
-static inline bool
-_mesa_active_fragment_shader_has_side_effects(const struct gl_context *ctx)
-{
-   const struct gl_shader *sh;
-
-   if (!ctx->_Shader->_CurrentFragmentProgram)
-  return false;
-
-   sh = 
ctx->_Shader->_CurrentFragmentProgram->_LinkedShaders[MESA_SHADER_FRAGMENT];
-   return sh->NumAtomicBuffers > 0 ||
-  sh->NumImages > 0 ||
-  sh->NumShaderStorageBlocks > 0;
-}
-
 #ifdef __cplusplus
 }
 #endif
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 4/6] i965/ps_state: Use wm_prog_data.has_side_effects

2016-06-01 Thread Jason Ekstrand
Signed-off-by: Jason Ekstrand 
---
 src/mesa/drivers/dri/i965/gen7_wm_state.c | 8 +++-
 src/mesa/drivers/dri/i965/gen8_ps_state.c | 7 +++
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c 
b/src/mesa/drivers/dri/i965/gen7_wm_state.c
index a618c3e..8d4d4fc 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
@@ -78,10 +78,8 @@ upload_wm_state(struct brw_context *brw)
}
 
/* _NEW_BUFFERS | _NEW_COLOR */
-   const bool active_fs_has_side_effects =
-  _mesa_active_fragment_shader_has_side_effects(&brw->ctx);
if (brw_color_buffer_write_enabled(brw) || writes_depth ||
-   active_fs_has_side_effects || dw1 & GEN7_WM_KILL_ENABLE) {
+   prog_data->has_side_effects || dw1 & GEN7_WM_KILL_ENABLE) {
   dw1 |= GEN7_WM_DISPATCH_ENABLE;
}
if (multisampled_fbo) {
@@ -107,7 +105,7 @@ upload_wm_state(struct brw_context *brw)
/* BRW_NEW_FS_PROG_DATA */
if (prog_data->early_fragment_tests)
   dw1 |= GEN7_WM_EARLY_DS_CONTROL_PREPS;
-   else if (active_fs_has_side_effects)
+   else if (prog_data->has_side_effects)
   dw1 |= GEN7_WM_EARLY_DS_CONTROL_PSEXEC;
 
/* The "UAV access enable" bits are unnecessary on HSW because they only
@@ -120,7 +118,7 @@ upload_wm_state(struct brw_context *brw)
 */
if (brw->is_haswell &&
!(brw_color_buffer_write_enabled(brw) || writes_depth) &&
-   active_fs_has_side_effects)
+   prog_data->has_side_effects)
   dw2 |= HSW_WM_UAV_ONLY;
 
BEGIN_BATCH(3);
diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c 
b/src/mesa/drivers/dri/i965/gen8_ps_state.c
index c475a52..51a3121 100644
--- a/src/mesa/drivers/dri/i965/gen8_ps_state.c
+++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c
@@ -32,7 +32,6 @@ void
 gen8_upload_ps_extra(struct brw_context *brw,
  const struct brw_wm_prog_data *prog_data)
 {
-   struct gl_context *ctx = &brw->ctx;
uint32_t dw1 = 0;
 
dw1 |= GEN8_PSX_PIXEL_SHADER_VALID;
@@ -95,8 +94,8 @@ gen8_upload_ps_extra(struct brw_context *brw,
 *
 * BRW_NEW_FS_PROG_DATA | BRW_NEW_FRAGMENT_PROGRAM | _NEW_BUFFERS | 
_NEW_COLOR
 */
-   if ((_mesa_active_fragment_shader_has_side_effects(ctx) ||
-prog_data->uses_kill) && !brw_color_buffer_write_enabled(brw))
+   if ((prog_data->has_side_effects || prog_data->uses_kill) &&
+   !brw_color_buffer_write_enabled(brw))
   dw1 |= GEN8_PSX_SHADER_HAS_UAV;
 
if (prog_data->computed_stencil) {
@@ -155,7 +154,7 @@ upload_wm_state(struct brw_context *brw)
/* BRW_NEW_FS_PROG_DATA */
if (brw->wm.prog_data->early_fragment_tests)
   dw1 |= GEN7_WM_EARLY_DS_CONTROL_PREPS;
-   else if (_mesa_active_fragment_shader_has_side_effects(&brw->ctx))
+   else if (brw->wm.prog_data->has_side_effects)
   dw1 |= GEN7_WM_EARLY_DS_CONTROL_PSEXEC;
 
BEGIN_BATCH(2);
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH] mesa/copyimage: fix num samples check to handle renderbuffers.

2016-06-01 Thread Dave Airlie
From: Dave Airlie 

This test was only happening for textures, but there is
nothing in the spec to say this, so test it for all cases.

This fixes:
GL45-CTS.copy_image.invalid_target

Signed-off-by: Dave Airlie 
---
 src/mesa/main/copyimage.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/copyimage.c b/src/mesa/main/copyimage.c
index 63ce13a..6aa6bcb 100644
--- a/src/mesa/main/copyimage.c
+++ b/src/mesa/main/copyimage.c
@@ -65,6 +65,7 @@ prepare_target(struct gl_context *ctx, GLuint name, GLenum 
target,
GLenum *internalFormat,
GLuint *width,
GLuint *height,
+   GLuint *num_samples,
const char *dbg_prefix)
 {
if (name == 0) {
@@ -131,6 +132,7 @@ prepare_target(struct gl_context *ctx, GLuint name, GLenum 
target,
   *internalFormat = rb->InternalFormat;
   *width = rb->Width;
   *height = rb->Height;
+  *num_samples = rb->NumSamples;
   *tex_image = NULL;
} else {
   struct gl_texture_object *texObj = _mesa_lookup_texture(ctx, name);
@@ -201,6 +203,7 @@ prepare_target(struct gl_context *ctx, GLuint name, GLenum 
target,
   *internalFormat = (*tex_image)->InternalFormat;
   *width = (*tex_image)->Width;
   *height = (*tex_image)->Height;
+  *num_samples = (*tex_image)->NumSamples;
}
 
return true;
@@ -456,6 +459,7 @@ _mesa_CopyImageSubData(GLuint srcName, GLenum srcTarget, 
GLint srcLevel,
GLenum srcIntFormat, dstIntFormat;
GLuint src_w, src_h, dst_w, dst_h;
GLuint src_bw, src_bh, dst_bw, dst_bh;
+   GLuint src_num_samples, dst_num_samples;
int dstWidth, dstHeight, dstDepth;
int i;
 
@@ -477,12 +481,12 @@ _mesa_CopyImageSubData(GLuint srcName, GLenum srcTarget, 
GLint srcLevel,
 
if (!prepare_target(ctx, srcName, srcTarget, srcLevel, srcZ, srcDepth,
, , ,
-   , _w, _h, "src"))
+   , _w, _h, _num_samples, "src"))
   return;
 
if (!prepare_target(ctx, dstName, dstTarget, dstLevel, dstZ, srcDepth,
, , ,
-   , _w, _h, "dst"))
+   , _w, _h, _num_samples, "dst"))
   return;
 
   _mesa_get_format_block_size(srcFormat, &src_bw, &src_bh);
@@ -565,8 +569,7 @@ _mesa_CopyImageSubData(GLuint srcName, GLenum srcTarget, 
GLint srcLevel,
   return;
}
 
-   if (srcTexImage && dstTexImage &&
-   srcTexImage->NumSamples != dstTexImage->NumSamples) {
+   if (src_num_samples != dst_num_samples) {
   _mesa_error(ctx, GL_INVALID_OPERATION,
   "glCopyImageSubData(number of samples mismatch)");
   return;
-- 
2.5.5
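
As a rough illustration of why the old check missed renderbuffers, here is a minimal sketch. The `sketch_*` types and helpers are hypothetical simplifications; the real code works on Mesa's texture image and renderbuffer structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the two kinds of copy target. */
struct sketch_tex_image    { unsigned num_samples; };
struct sketch_renderbuffer { unsigned num_samples; };

/* Old check: only compares when BOTH targets are textures, so a
 * renderbuffer on either side skips the comparison entirely.
 */
static bool
sketch_old_mismatch(const struct sketch_tex_image *src_tex,
                    const struct sketch_tex_image *dst_tex)
{
   return src_tex && dst_tex &&
          src_tex->num_samples != dst_tex->num_samples;
}

/* New approach: resolve the sample count per target, whichever kind it is. */
static unsigned
sketch_target_samples(const struct sketch_tex_image *tex,
                      const struct sketch_renderbuffer *rb)
{
   assert(tex != NULL || rb != NULL);
   return tex ? tex->num_samples : rb->num_samples;
}

/* ...and compare the resolved counts unconditionally. */
static bool
sketch_new_mismatch(unsigned src_samples, unsigned dst_samples)
{
   return src_samples != dst_samples;
}
```

With a 4-sample renderbuffer as source and a single-sample texture as destination, the old check reports no mismatch because the source texture pointer is NULL, while the new one does.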



Re: [Mesa-dev] [PATCH 2/2] clover: fix getting scalar args api size

2016-06-01 Thread Jan Vesely
On Wed, 2016-06-01 at 14:36 +0200, Serge Martin wrote:
> This fix getting the size of a struct arg. vec3 types still work ok.
> Only buit-in args need to have power of two alignement, getTypeAllocSize
> reports the corect size.

"alignment", "correct"

otherwise LGTM. you probably want to cc Francisco on clover patches.

Jan

> 
> Cc: 12.0 
> ---
>  src/gallium/state_trackers/clover/llvm/invocation.cpp | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp 
> b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> index 03487d6..bb0faaa 100644
> --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp
> +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp
> @@ -469,10 +469,9 @@ namespace {
>  
>   // OpenCL 1.2 specification, Ch. 6.1.5: "A built-in data
>   // type that is not a power of two bytes in size must be
> - // aligned to the next larger power of two".  We need this
> - // alignment for three element vectors, which have
> - // non-power-of-2 store size.
> - const unsigned arg_api_size = 
> util_next_power_of_two(arg_store_size);
> + // aligned to the next larger power of two.
> + // This rule applies to built-in types only, not structs or unions."
> + const unsigned arg_api_size = TD.getTypeAllocSize(arg_type);
>  
>   llvm::Type *target_type = arg_type->isIntegerTy() ?
> TD.getSmallestLegalIntType(mod->getContext(), arg_store_size 
> * 8)
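
The size difference the patch addresses can be sketched with plain arithmetic. This is an illustration only: the helper names are hypothetical, and the 16-byte ABI alignment used for a three-component float vector is an assumption about the target data layout rather than something stated in the patch.

```c
#include <assert.h>

/* Smallest power of two that is >= x (for x >= 1), which is what
 * rounding the store size up to a power of two amounts to.
 */
static unsigned
sketch_next_pow2(unsigned x)
{
   unsigned p = 1;
   assert(x >= 1);
   while (p < x)
      p <<= 1;
   return p;
}

/* Allocation size modelled as the store size rounded up to the type's
 * ABI alignment.
 */
static unsigned
sketch_alloc_size(unsigned store_size, unsigned abi_align)
{
   assert(abi_align >= 1);
   return (store_size + abi_align - 1) / abi_align * abi_align;
}
```

For a struct of three 32-bit ints (store size 12, alignment 4) the power-of-two rule yields 16 while the allocation size stays at 12. For a three-component float vector with the assumed 16-byte alignment both give 16, which matches the remark that vec3 types still work.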





Re: [Mesa-dev] [PATCH 2/9] nir/lower_indirect_derefs: Use the direct array deref for recursion

2016-06-01 Thread Kenneth Graunke
On Wednesday, June 1, 2016 2:44:53 PM PDT Jason Ekstrand wrote:
> This fixes about 100 of the new Vulkan CTS tests.
> 
> Signed-off-by: Jason Ekstrand 
> Cc: "12.0" 
> Cc: Connor Abbott 
> Cc: Ian Romanick 
> Cc: Kenneth Graunke 
> ---
>  src/compiler/nir/nir_lower_indirect_derefs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/compiler/nir/nir_lower_indirect_derefs.c 
> b/src/compiler/nir/nir_lower_indirect_derefs.c
> index 694a6e0..1bf4bf6 100644
> --- a/src/compiler/nir/nir_lower_indirect_derefs.c
> +++ b/src/compiler/nir/nir_lower_indirect_derefs.c
> @@ -50,7 +50,7 @@ emit_indirect_load_store(nir_builder *b, 
> nir_intrinsic_instr *orig_instr,
>direct.indirect = NIR_SRC_INIT;
>  
>arr_parent->child = 
> -  emit_load_store(b, orig_instr, deref, >deref, dest, src);
> +  emit_load_store(b, orig_instr, deref, , dest, src);
>arr_parent->child = >deref;
> } else {
>int mid = start + (end - start) / 2;
> 

This looks right to me - we want to be storing using the direct deref,
not the indirect one we're lowering.

Reviewed-by: Kenneth Graunke 




Re: [Mesa-dev] [PATCH 7/9] anv/pipeline: Silently pass tests if depth or stencil is missing

2016-06-01 Thread Kenneth Graunke
On Wednesday, June 1, 2016 2:44:58 PM PDT Jason Ekstrand wrote:
> Signed-off-by: Jason Ekstrand 
> Cc: "12.0" 
> Cc: Ian Romanick 
> ---
>  src/intel/vulkan/gen7_pipeline.c  | 12 ++--
>  src/intel/vulkan/gen8_pipeline.c  | 12 ++--
>  src/intel/vulkan/genX_pipeline_util.h | 30 +-
>  3 files changed, 49 insertions(+), 5 deletions(-)
> 
> diff --git a/src/intel/vulkan/gen7_pipeline.c 
> b/src/intel/vulkan/gen7_pipeline.c
> index 243b18b..0d2d086 100644
> --- a/src/intel/vulkan/gen7_pipeline.c
> +++ b/src/intel/vulkan/gen7_pipeline.c
> @@ -155,6 +155,8 @@ genX(graphics_pipeline_create)(
>  VkPipeline* pPipeline)
>  {
> ANV_FROM_HANDLE(anv_device, device, _device);
> +   ANV_FROM_HANDLE(anv_render_pass, pass, pCreateInfo->renderPass);
> +   struct anv_subpass *subpass = &pass->subpasses[pCreateInfo->subpass];
> struct anv_pipeline *pipeline;
> VkResult result;
>  
> @@ -178,7 +180,7 @@ genX(graphics_pipeline_create)(
> assert(pCreateInfo->pRasterizationState);
> gen7_emit_rs_state(pipeline, pCreateInfo->pRasterizationState, extra);
>  
> -   emit_ds_state(pipeline, pCreateInfo->pDepthStencilState);
> +   emit_ds_state(pipeline, pCreateInfo->pDepthStencilState, pass, subpass);
>  
> gen7_emit_cb_state(pipeline, pCreateInfo->pColorBlendState,
>  pCreateInfo->pMultisampleState);
> @@ -369,10 +371,16 @@ genX(graphics_pipeline_create)(
>   wm.PixelShaderUsesSourceW  = wm_prog_data->uses_src_w;
>   wm.PixelShaderUsesInputCoverageMask= 
> wm_prog_data->uses_sample_mask;
>  
> + /* TODO: We could probably do something a bit more intelligent here.
> +  * However, CTS tests expect that if early fragment tests are not
> +  * performed, the shader *will* be executed for every fragment.  In
> +  * order to work around this we would have to check whether or not
> +  * the shader has side-effects before we can set the mode to NORMAL.
> +  */

That's what we do in i965 - we should really do it for anv as well...




Re: [Mesa-dev] [PATCH 01/68] i965: move program id generation

2016-06-01 Thread Timothy Arceri
On Wed, 2016-06-01 at 16:22 +1000, Timothy Arceri wrote:
> This generates the program ids at cache upload time rather than at
> program creation time.
> 
> Moving the id generation here will be useful for on-disk shader
> cache support because it means we don't generate ids if there was
> a cache miss and we had to fall back to compiling from source.
> This increases the likelihood of finding a match.
> 
> This also changes the workaround that checked for an id of 0 to
> detect a missing TCS. Now we check for the name "tcs_passthrough"
> since id will now always be 0 at this point.

The above paragraph was only true in V1 of this patch; Ken made this
change as part of another series recently. I've removed it from the
commit message locally.


Re: [Mesa-dev] [PATCH 00/10] R600: Cache flush fixes and cleanup v2

2016-06-01 Thread Grazvydas Ignotas
On Wed, Jun 1, 2016 at 9:57 PM, Marek Olšák  wrote:
> Hi,
>
> This is version 2 of the previous series. This time it's been tested!!
>
> Tested cards:
> - RV670
> - RV730
> - EG/REDWOOD
> - CAYMAN

All good on JUNIPER now (piglit and a few random games).
Tested-by: Grazvydas Ignotas  # on JUNIPER

Gražvydas


[Mesa-dev] [PATCHv2 13/25] i965/fs: Skip SIMD lowering destination zipping if possible.

2016-06-01 Thread Francisco Jerez
Skipping the temporary allocation and copy instructions is easy (just
return dst), but the conditions used to find out whether the copy can
be optimized out safely without breaking the program are rather
complex: The destination must be exactly one component of at most the
execution width of the lowered instruction, and all source regions of
the instruction must be either fully disjoint from the destination or
be aligned with it group by group.

v2: Don't handle partial source-destination overlap for simplicity
(Jason).  No instruction count regressions with respect to v1 in
either shader-db or the few FP64 shader_runner test-cases with
partial overlap I've checked manually.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 55 
 1 file changed, 55 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 00d937e..bfae1d7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -5078,6 +5078,52 @@ emit_unzip(const fs_builder &lbld, bblock_t *block, fs_inst *inst,
 }
 
 /**
+ * Return true if splitting out the group of channels of instruction \p inst
+ * given by lbld.group() requires allocating a temporary for the destination
+ * of the lowered instruction and copying the data back to the original
+ * destination region.
+ */
+static inline bool
+needs_dst_copy(const fs_builder &lbld, const fs_inst *inst)
+{
+   /* If the instruction writes more than one component we'll have to shuffle
+* the results of multiple lowered instructions in order to make sure that
+* they end up arranged correctly in the original destination region.
+*/
+   if (inst->regs_written * REG_SIZE >
+   inst->dst.component_size(inst->exec_size))
+  return true;
+
+   /* If the lowered execution size is larger than the original the result of
+* the instruction won't fit in the original destination, so we'll have to
+* allocate a temporary in any case.
+*/
+   if (lbld.dispatch_width() > inst->exec_size)
+  return true;
+
+   for (unsigned i = 0; i < inst->sources; i++) {
+  /* If we already made a copy of the source for other reasons there won't
+   * be any overlap with the destination.
+   */
+  if (needs_src_copy(lbld, inst, i))
+ continue;
+
+  /* In order to keep the logic simple we emit a copy whenever the
+   * destination region doesn't exactly match an overlapping source, which
+   * may point at the source and destination not being aligned group by
+   * group which could cause one of the lowered instructions to overwrite
+   * the data read from the same source by other lowered instructions.
+   */
+  if (regions_overlap(inst->dst, inst->regs_written * REG_SIZE,
+  inst->src[i], inst->regs_read(i) * REG_SIZE) &&
+  !inst->dst.equals(inst->src[i]))
+return true;
+   }
+
+   return false;
+}
+
+/**
  * Insert data from a packed temporary into the channel group given by
  * lbld.group() of the destination region of instruction \p inst and return
  * the temporary as result.  If any copy instructions are required they will
@@ -5097,6 +5143,8 @@ emit_zip(const fs_builder &lbld, bblock_t *block, fs_inst *inst)
const fs_reg dst = horiz_offset(inst->dst, lbld.group());
const unsigned dst_size = inst->regs_written * REG_SIZE /
 inst->dst.component_size(inst->exec_size);
+
+   if (needs_dst_copy(lbld, inst)) {
const fs_reg tmp = lbld.vgrf(inst->dst.type, dst_size);
 
if (inst->predicate) {
@@ -5114,6 +5162,13 @@ emit_zip(const fs_builder &lbld, bblock_t *block, fs_inst *inst)
   .MOV(offset(dst, inst->exec_size, k), offset(tmp, lbld, k));
 
return tmp;
+
+   } else {
+  /* No need to allocate a temporary for the lowered instruction, just
+   * take the right group of channels from the original region.
+   */
+  return dst;
+   }
 }
 
 bool
-- 
2.7.3



Re: [Mesa-dev] [PATCH 10/10] gallium/radeon: don't use the DMA ring for pipelined buffer uploads

2016-06-01 Thread Vedran Miletić

On 06/01/2016 08:57 PM, Marek Olšák wrote:

From: Marek Olšák 

Submitting a DMA IB flushes the GFX IB and all GPU caches.

Reviewed-by: Alex Deucher 


On Tonga 380X, this improves The Talos Principle from 8.3 fps to 28.3 
fps (all graphics settings Ultra, 4xAA, 1080p resolution with 
downsampling from 1200p).


Wow!

Tested-by: Vedran Miletić 

Regards,
Vedran

--
Vedran Miletić
vedran.miletic.net


Re: [Mesa-dev] [PATCH 0/9] anv: Fix several of the new Vulkan CTS tests

2016-06-01 Thread Kenneth Graunke
On Wednesday, June 1, 2016 2:44:51 PM PDT Jason Ekstrand wrote:
> I recently grabbed the latest dev version of the Vulkan CTS and ran it on
> our driver.  This series fixes a bunch of the bugs that it exposed.  In an
> effort to get more people involved in Vulkan development and in the hopes
> of actually getting reviews, I've CC'd at least one person on each patch.
> If you got CC'd, it doesn't necessarily mean you *have* to review it, just
> that I think you seemed like a good candidate. :-)
> 
> Jason Ekstrand (9):
>   anv/clear: Handle ClearImage on 3-D images
>   nir/lower_indirect_derefs: Use the direct array deref for recursion
>   anv/pipeline: Refactor specialization constant handling a bit
>   anv/pipeline: Add support for early depth stencil
>   genxml/gen6,7,75: s/BackFace/Backface
>   anv/pipeline: Unify gen7/8 emit_ds_state
>   anv/pipeline: Silently pass tests if depth or stencil is missing
>   nir/spirv: Use breaks instead of returns in constant handling
>   nir/spirv: Handle the WorkgroupSize builtin decoration
> 
>  src/compiler/nir/nir_lower_indirect_derefs.c |  2 +-
>  src/compiler/spirv/spirv_to_nir.c| 29 +-
>  src/intel/genxml/gen6.xml|  4 +-
>  src/intel/genxml/gen7.xml|  4 +-
>  src/intel/genxml/gen75.xml   |  4 +-
>  src/intel/vulkan/anv_meta_clear.c|  6 +-
>  src/intel/vulkan/anv_pipeline.c  |  9 ++-
>  src/intel/vulkan/gen7_cmd_buffer.c   |  2 +-
>  src/intel/vulkan/gen7_pipeline.c | 53 +
>  src/intel/vulkan/gen8_pipeline.c | 66 +
>  src/intel/vulkan/genX_pipeline_util.h| 87 
> 
>  11 files changed, 160 insertions(+), 106 deletions(-)

Patches 1, 3-6, and 8-9 are:
Reviewed-by: Kenneth Graunke 




[Mesa-dev] [PATCH 16.1/24] i965/vec4: Fix cmod propagation not to propagate non-identity cmod into CMP(N).

2016-06-01 Thread Francisco Jerez
The conditional mod of these instructions determines the semantics of
the comparison itself (rather than being evaluated based on the result
of the instruction as is usually the case for most other instructions
that allow conditional mods), so it's in general not legal to
propagate a conditional mod into a CMP instruction.  This prevents
cmod propagation from (mis)optimizing:

 cmp.z.f0 tmp, ...
 mov.z.f0 null, tmp

into:

 cmp.z.f0 tmp, ...

which gives the negation of the flag result of the original sequence.
I originally noticed this while working on SIMD32 in the scalar
back-end, but the same scenario is likely to be possible in vec4
programs so this commit ports the bugfix with the same name from the
scalar back-end to the vec4 cmod propagation pass.

Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
index 0c8224f..c376beb 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
@@ -115,6 +115,18 @@ opt_cmod_propagation_local(bblock_t *block)
break;
 }
 
+/* The conditional mod of the CMP/CMPN instructions behaves
+ * specially because the flag output is not calculated from the
+ * result of the instruction, but the other way around, which
+ * means that even if the condmod to propagate and the condmod
+ * from the CMP instruction are the same they will in general give
+ * different results because they are evaluated based on different
+ * inputs.
+ */
+if (scan_inst->opcode == BRW_OPCODE_CMP ||
+scan_inst->opcode == BRW_OPCODE_CMPN)
+   break;
+
 /* Otherwise, try propagating the conditional. */
 enum brw_conditional_mod cond =
inst->src[0].negate ? brw_swap_cmod(inst->conditional_mod)
-- 
2.7.3



Re: [Mesa-dev] [Mesa-stable] [PATCH 3/5] mesa: Fix add_index_to_name logic

2016-06-01 Thread Timothy Arceri
On Wed, 2016-06-01 at 14:55 -0700, Ian Romanick wrote:
> On 05/31/2016 04:45 PM, Ian Romanick wrote:
> > 
> > On 05/31/2016 02:44 PM, Timothy Arceri wrote:
> > > 
> > > On Tue, 2016-05-31 at 11:52 -0700, Ian Romanick wrote:
> > > > 
> > > > From: Ian Romanick 
> > > > 
> > > > Our piglit tests for geometry and tessellation shader inputs
> > > > were
> > > > incorrect.  Array shader inputs and output should have '[0]' on
> > > > the
> > > > end
> > > > regardless of stage.  In addition, transform feedback varyings
> > > > should
> > > > not.
> > > Is there a spec quote for this? It doesn't seem right to me since
> > > for
> > > arrays of arrays would that mean we should end up with gs inputs
> > > like
> > > this
> > Here are all the rules that I think applies:
> > 
> >   * For an active variable declared as an array of basic types,
> > a single
> > entry will be generated, with its name string formed by
> > concatenating
> > the name of the array and the string "[0]".
> > 
> >   * For an active variable declared as a structure, a separate
> > entry will
> > be generated for each active structure member.  The name of
> > each entry
> > is formed by concatenating the name of the structure, the
> > "."
> > character, and the name of the structure member.  If a
> > structure
> > member to enumerate is itself a structure or array, these
> > enumeration
> > rules are applied recursively.
> > 
> >   * For an active variable declared as an array of an aggregate
> > data type
> > (structures or arrays), a separate entry will be generated
> > for each
> > active array element, unless noted immediately below.  The
> > name of
> > each entry is formed by concatenating the name of the
> > array, the "["
> > character, an integer identifying the element number, and
> > the "]"
> > character.  These enumeration rules are applied
> > recursively, treating
> > each enumerated array element as a separate active
> > variable.
> > 
> > > 
> > > input_name[0][0]
> > > input_name[...][0]
> > > input_name[num_vertices-1][0]
> > Yes, this is correct. We don't do this with or without this patch.
> > I
> > don't know of any tests that exercise this.  Alas.
> > 
> > > 
> > > otherwise
> > > 
> > > in vec4 input1[];
> > > and
> > > in vec4 input2[][3];
> > > 
> > > Would both end up as:
> > > input1[0]
> > > input2[0]
> > > 
> > > > 
> > > > 
> > > > Signed-off-by: Ian Romanick 
> > > > Cc: "12.0" 
> > > > ---
> > > >  src/mesa/main/shader_query.cpp | 23 ++-
> > > >  1 file changed, 10 insertions(+), 13 deletions(-)
> > > > 
> > > > diff --git a/src/mesa/main/shader_query.cpp
> > > > b/src/mesa/main/shader_query.cpp
> > > > index eec933c..f4b7243 100644
> > > > --- a/src/mesa/main/shader_query.cpp
> > > > +++ b/src/mesa/main/shader_query.cpp
> > > > @@ -696,20 +696,17 @@ _mesa_program_resource_find_index(struct
> > > > gl_shader_program *shProg,
> > > >  static bool
> > > >  add_index_to_name(struct gl_program_resource *res)
> > > >  {
> > > > -   bool add_index = !((res->Type == GL_PROGRAM_INPUT &&
> > > > -   res->StageReferences & (1 <<
> > > > MESA_SHADER_GEOMETRY |
> > > > -   1 <<
> > > > MESA_SHADER_TESS_CTRL |
> > > > -   1 <<
> > > > MESA_SHADER_TESS_EVAL)) ||
> > > > -  (res->Type == GL_PROGRAM_OUTPUT &&
> > > > -   res->StageReferences & 1 <<
> > > > MESA_SHADER_TESS_CTRL));
> > > > -
> > > > -   /* Transform feedback varyings have array index already
> > > > appended
> > > > -* in their names.
> > > > -*/
> > > > -   if (res->Type == GL_TRANSFORM_FEEDBACK_VARYING)
> > > > -  add_index = false;
> > > > +   if (res->Type != GL_PROGRAM_INPUT && res->Type !=
> > > > GL_PROGRAM_OUTPUT)
> > > > +  return res->Type != GL_TRANSFORM_FEEDBACK_VARYING;
> > > I'm slightly confused by this. When does this return true? And
> > > for
> > > transform feedback won't this always end up as false?
> > > 
> > > So isn't it just 
> > > 
> > >    if (res->Type == GL_TRANSFORM_FEEDBACK_VARYING)
> > >   return false;
> > I thought I tried that but it regressed some dEQP tests.  I'll
> > double
> > check.
> It makes a huge pile of dEQP tests fail because lots of things,
> including UBOs and SSBOs, come through this function.  Inputs and
> outputs get some special treatment because of the array-of-interface
> handing, and xfb variables never get the [0] suffix.  Everything else
> gets the [0] suffix.

Hmm ... I guess that's because lowering of blocks works differently to
interface blocks.
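
For reference, the behaviour Ian describes for everything other than program inputs and outputs can be sketched like this. It is only a rough model: the enum and helper names are hypothetical, and the special handling of inputs and outputs is deliberately left out because it is not spelled out in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of program resource types. */
enum sketch_resource_type {
   SKETCH_UNIFORM,
   SKETCH_BUFFER_VARIABLE,
   SKETCH_TRANSFORM_FEEDBACK_VARYING
};

/* Transform feedback varyings never get the suffix; every other
 * non-input/output resource does.
 */
static bool
sketch_add_index_to_name(enum sketch_resource_type type)
{
   return type != SKETCH_TRANSFORM_FEEDBACK_VARYING;
}

/* Build the enumerated name of a top-level resource into out. */
static void
sketch_resource_name(char *out, size_t out_size, const char *name,
                     enum sketch_resource_type type, bool is_array)
{
   /* Room for the name, an optional "[0]" and the terminator. */
   assert(out != NULL && name != NULL && out_size > strlen(name) + 3);
   snprintf(out, out_size, "%s%s", name,
            (is_array && sketch_add_index_to_name(type)) ? "[0]" : "");
}
```

An array uniform named `colors` would be reported as `colors[0]`, while a transform feedback varying keeps the name it was recorded with.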

I've taken another look at the code and it seems to me that the change
should be made in _mesa_program_resource_array_size() rather than here:

e.g.

   case GL_PROGRAM_INPUT:
   case 

Re: [Mesa-dev] [PATCH 5/7] isl: automake: ensure that we (can) clean the generated files

2016-06-01 Thread Jason Ekstrand
On Jun 1, 2016 4:21 PM, "Jason Ekstrand"  wrote:
>
>
> On May 31, 2016 9:27 AM, "Emil Velikov"  wrote:
> >
> > From: Emil Velikov 
> >
> > Signed-off-by: Emil Velikov 
> > ---
> >  src/gallium/drivers/swr/Makefile.am | 1 +
> >  src/intel/isl/Makefile.am   | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/src/gallium/drivers/swr/Makefile.am
b/src/gallium/drivers/swr/Makefile.am
> > index b8035c7..e160084 100644
> > --- a/src/gallium/drivers/swr/Makefile.am
> > +++ b/src/gallium/drivers/swr/Makefile.am
> > @@ -55,6 +55,7 @@ BUILT_SOURCES = \
> > rasterizer/jitter/builder_gen.cpp \
> > rasterizer/jitter/builder_x86.h \
> > rasterizer/jitter/builder_x86.cpp
> > +
>
> Wrong commit?

With that line dropped,

Reviewed-by: Jason Ekstrand 

> >  CLEANFILES = $(BUILT_SOURCES)
> >
> >  MKDIR_GEN = $(AM_V_at)$(MKDIR_P) $(@D)
> > diff --git a/src/intel/isl/Makefile.am b/src/intel/isl/Makefile.am
> > index 74f863a..388e7e9 100644
> > --- a/src/intel/isl/Makefile.am
> > +++ b/src/intel/isl/Makefile.am
> > @@ -65,6 +65,7 @@ libisl_gen9_la_SOURCES = $(ISL_GEN9_FILES)
> >  libisl_gen9_la_CFLAGS = $(libisl_la_CFLAGS) -DGEN_VERSIONx10=90
> >
> >  BUILT_SOURCES = $(ISL_GENERATED_FILES)
> > +CLEANFILES = $(BUILT_SOURCES)
> >
> >  isl_format_layout.c: isl_format_layout_gen.bash \
> >   isl_format_layout.csv
> > --
> > 2.8.2
> >


Re: [Mesa-dev] [PATCH 1/7] isl: automake: don't include isl_format_layout.c in two lists.

2016-06-01 Thread Jason Ekstrand
On Jun 1, 2016 4:20 PM, "Jason Ekstrand"  wrote:
>
>
> On May 31, 2016 9:27 AM, "Emil Velikov"  wrote:
> >
> > From: Emil Velikov 
> >
> > Including the file in both ISL_FILES and ISL_GENERATED_FILES makes
> > the actual dependency list less obvious.
> >
> > Signed-off-by: Emil Velikov 
> > ---
> >  src/intel/isl/Makefile.am  | 2 +-
> >  src/intel/isl/Makefile.sources | 1 -
> >  src/intel/vulkan/Makefile.am   | 1 +
> >  3 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/intel/isl/Makefile.am b/src/intel/isl/Makefile.am
> > index 4922b1f..74f863a 100644
> > --- a/src/intel/isl/Makefile.am
> > +++ b/src/intel/isl/Makefile.am
> > @@ -50,7 +50,7 @@ libisl_la_CFLAGS = $(CFLAGS) -Wno-override-init
> >
> >  libisl_la_LIBADD = $(ISL_GEN_LIBS)
> >
> > -libisl_la_SOURCES = $(ISL_FILES)
> > +libisl_la_SOURCES = $(ISL_FILES) $(ISL_GENERATED_FILES)
> >
> >  libisl_gen7_la_SOURCES = $(ISL_GEN7_FILES)
> >  libisl_gen7_la_CFLAGS = $(libisl_la_CFLAGS) -DGEN_VERSIONx10=70
> > diff --git a/src/intel/isl/Makefile.sources
b/src/intel/isl/Makefile.sources
> > index fe6a00f..89b1418 100644
> > --- a/src/intel/isl/Makefile.sources
> > +++ b/src/intel/isl/Makefile.sources
> > @@ -2,7 +2,6 @@ ISL_FILES = \
> > isl.c \
> > isl.h \
> > isl_format.c \
> > -   isl_format_layout.c \
> > isl_gen4.c \
> > isl_gen4.h \
> > isl_gen6.c \
> > diff --git a/src/intel/vulkan/Makefile.am b/src/intel/vulkan/Makefile.am
> > index 37c2986..9d36b22 100644
> > --- a/src/intel/vulkan/Makefile.am
> > +++ b/src/intel/vulkan/Makefile.am
> > @@ -129,6 +129,7 @@ VULKAN_ENTRYPOINT_CPPFLAGS = \
> > -DVK_USE_PLATFORM_WAYLAND_KHR
> >
> >  anv_entrypoints.h : anv_entrypoints_gen.py $(vulkan_include_HEADERS)
> > +   $(CPP) $(VULKAN_ENTRYPOINT_CPPFLAGS)
$(top_srcdir)/include/vulkan/vulkan_intel.h >parsed_header.log 2>&1
>
> I don't think this line was intended.

With that fixed

Reviewed-by: Jason Ekstrand 

> > $(AM_V_GEN)$(CPP) $(VULKAN_ENTRYPOINT_CPPFLAGS)
$(top_srcdir)/include/vulkan/vulkan_intel.h |\
> > $(PYTHON2) $(srcdir)/anv_entrypoints_gen.py header > $@
> >
> > --
> > 2.8.2
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/7] isl: automake: ensure that we (can) clean the generated files

2016-06-01 Thread Jason Ekstrand
On May 31, 2016 9:27 AM, "Emil Velikov"  wrote:
>
> From: Emil Velikov 
>
> Signed-off-by: Emil Velikov 
> ---
>  src/gallium/drivers/swr/Makefile.am | 1 +
>  src/intel/isl/Makefile.am   | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/src/gallium/drivers/swr/Makefile.am
b/src/gallium/drivers/swr/Makefile.am
> index b8035c7..e160084 100644
> --- a/src/gallium/drivers/swr/Makefile.am
> +++ b/src/gallium/drivers/swr/Makefile.am
> @@ -55,6 +55,7 @@ BUILT_SOURCES = \
> rasterizer/jitter/builder_gen.cpp \
> rasterizer/jitter/builder_x86.h \
> rasterizer/jitter/builder_x86.cpp
> +

Wrong commit?

>  CLEANFILES = $(BUILT_SOURCES)
>
>  MKDIR_GEN = $(AM_V_at)$(MKDIR_P) $(@D)
> diff --git a/src/intel/isl/Makefile.am b/src/intel/isl/Makefile.am
> index 74f863a..388e7e9 100644
> --- a/src/intel/isl/Makefile.am
> +++ b/src/intel/isl/Makefile.am
> @@ -65,6 +65,7 @@ libisl_gen9_la_SOURCES = $(ISL_GEN9_FILES)
>  libisl_gen9_la_CFLAGS = $(libisl_la_CFLAGS) -DGEN_VERSIONx10=90
>
>  BUILT_SOURCES = $(ISL_GENERATED_FILES)
> +CLEANFILES = $(BUILT_SOURCES)
>
>  isl_format_layout.c: isl_format_layout_gen.bash \
>   isl_format_layout.csv
> --
> 2.8.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/7] isl: automake: don't include isl_format_layout.c in two lists.

2016-06-01 Thread Jason Ekstrand
On May 31, 2016 9:27 AM, "Emil Velikov"  wrote:
>
> From: Emil Velikov 
>
> Including the file in both ISL_FILES and ISL_GENERATED_FILES makes
> the actual dependency list less obvious.
>
> Signed-off-by: Emil Velikov 
> ---
>  src/intel/isl/Makefile.am  | 2 +-
>  src/intel/isl/Makefile.sources | 1 -
>  src/intel/vulkan/Makefile.am   | 1 +
>  3 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/isl/Makefile.am b/src/intel/isl/Makefile.am
> index 4922b1f..74f863a 100644
> --- a/src/intel/isl/Makefile.am
> +++ b/src/intel/isl/Makefile.am
> @@ -50,7 +50,7 @@ libisl_la_CFLAGS = $(CFLAGS) -Wno-override-init
>
>  libisl_la_LIBADD = $(ISL_GEN_LIBS)
>
> -libisl_la_SOURCES = $(ISL_FILES)
> +libisl_la_SOURCES = $(ISL_FILES) $(ISL_GENERATED_FILES)
>
>  libisl_gen7_la_SOURCES = $(ISL_GEN7_FILES)
>  libisl_gen7_la_CFLAGS = $(libisl_la_CFLAGS) -DGEN_VERSIONx10=70
> diff --git a/src/intel/isl/Makefile.sources
b/src/intel/isl/Makefile.sources
> index fe6a00f..89b1418 100644
> --- a/src/intel/isl/Makefile.sources
> +++ b/src/intel/isl/Makefile.sources
> @@ -2,7 +2,6 @@ ISL_FILES = \
> isl.c \
> isl.h \
> isl_format.c \
> -   isl_format_layout.c \
> isl_gen4.c \
> isl_gen4.h \
> isl_gen6.c \
> diff --git a/src/intel/vulkan/Makefile.am b/src/intel/vulkan/Makefile.am
> index 37c2986..9d36b22 100644
> --- a/src/intel/vulkan/Makefile.am
> +++ b/src/intel/vulkan/Makefile.am
> @@ -129,6 +129,7 @@ VULKAN_ENTRYPOINT_CPPFLAGS = \
> -DVK_USE_PLATFORM_WAYLAND_KHR
>
>  anv_entrypoints.h : anv_entrypoints_gen.py $(vulkan_include_HEADERS)
> +   $(CPP) $(VULKAN_ENTRYPOINT_CPPFLAGS)
$(top_srcdir)/include/vulkan/vulkan_intel.h >parsed_header.log 2>&1

I don't think this line was intended.

> $(AM_V_GEN)$(CPP) $(VULKAN_ENTRYPOINT_CPPFLAGS)
$(top_srcdir)/include/vulkan/vulkan_intel.h |\
> $(PYTHON2) $(srcdir)/anv_entrypoints_gen.py header > $@
>
> --
> 2.8.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Mesa-stable] [PATCH] i965/fs: Copy the offset when lowering logical pull constant sends

2016-06-01 Thread Jason Ekstrand
On Wed, Jun 1, 2016 at 3:58 PM, Francisco Jerez 
wrote:

> Jason Ekstrand  writes:
>
> > On Wed, Jun 1, 2016 at 3:33 PM, Francisco Jerez 
> > wrote:
> >
> >> Jason Ekstrand  writes:
> >>
> >> > This fixes 64 Vulkan CTS tests per gen
> >> >
> >> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299
> >> > Cc: "12.0" 
> >> > Cc: Francisco Jerez 
> >> > Cc: Mark Janes 
> >> > ---
> >> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 8 
> >> >  1 file changed, 8 insertions(+)
> >> >
> >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> > index 00d937e..20bb900 100644
> >> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> >> > @@ -4448,6 +4448,14 @@ lower_varying_pull_constant_logical_send(const fs_builder &bld, fs_inst *inst)
> >> > const brw_device_info *devinfo = bld.shader->devinfo;
> >> >
> >> > if (devinfo->gen >= 7) {
> >> > +  /* We are switching the instruction from an ALU-like
> instruction
> >> to a
> >> > +   * send-from-grf instruction.  Since sends can't handle
> strides or
> >> > +   * source modifiers, we have to make a copy of the offset
> source.
> >> > +   */
> >> > +  fs_reg tmp = bld.vgrf(inst->src[1].type);
> >>
> >> I suggest you use a fixed UD type for the temporary (since that is what
> >> the varying pull constant load instruction requires and using any other
> >> type will cause the send-message to silently reinterpret the offset as
> >> something else than what it is.  E.g. think non-32bit integers or
> >> integer vectors, a type-converting move is what you want below in such
> >> cases).  With that change this patch is:
> >>
> >
> > Hrm... How would we ever get anything in there other than a UD?  If
> > something else, say a float, got copy-propagated in here, that seems to
> be
> > a problem with the optimizer not lowering.  I'm fine making the change if
> > you'd like.
> >
> If you don't expect anything else than a UD as source, how about
> 'assert(...type == BRW_REGISTER_TYPE_UD)'?  I think it should be okay
> either way, I suggested using a fixed type above instead because that
> way you'd save an artificial restriction on the source register type of
> this virtual opcode, making the assertion redundant.
>

Fair enough.  Certainly, an implicit format conversion is a better failure
mode than a GPU hang or something similarly stupid.


> > Reviewed-by: Francisco Jerez 
> >>
> >
> > Thanks!
> >
> >
> >>
> >> > +  bld.MOV(tmp, inst->src[1]);
> >> > +  inst->src[1] = tmp;
> >> > +
> >> >inst->opcode = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7;
> >> >
> >> > } else {
> >> > --
> >> > 2.5.0.400.gff86faf
> >> >
> >> > ___
> >> > mesa-stable mailing list
> >> > mesa-sta...@lists.freedesktop.org
> >> > https://lists.freedesktop.org/mailman/listinfo/mesa-stable
> >>
>


Re: [Mesa-dev] [Mesa-stable] [PATCH] i965/fs: Copy the offset when lowering logical pull constant sends

2016-06-01 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Wed, Jun 1, 2016 at 3:33 PM, Francisco Jerez 
> wrote:
>
>> Jason Ekstrand  writes:
>>
>> > This fixes 64 Vulkan CTS tests per gen
>> >
>> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299
>> > Cc: "12.0" 
>> > Cc: Francisco Jerez 
>> > Cc: Mark Janes 
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 8 
>> >  1 file changed, 8 insertions(+)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > index 00d937e..20bb900 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> > @@ -4448,6 +4448,14 @@ lower_varying_pull_constant_logical_send(const fs_builder &bld, fs_inst *inst)
>> > const brw_device_info *devinfo = bld.shader->devinfo;
>> >
>> > if (devinfo->gen >= 7) {
>> > +  /* We are switching the instruction from an ALU-like instruction
>> to a
>> > +   * send-from-grf instruction.  Since sends can't handle strides or
>> > +   * source modifiers, we have to make a copy of the offset source.
>> > +   */
>> > +  fs_reg tmp = bld.vgrf(inst->src[1].type);
>>
>> I suggest you use a fixed UD type for the temporary (since that is what
>> the varying pull constant load instruction requires and using any other
>> type will cause the send-message to silently reinterpret the offset as
>> something else than what it is.  E.g. think non-32bit integers or
>> integer vectors, a type-converting move is what you want below in such
>> cases).  With that change this patch is:
>>
>
> Hrm... How would we ever get anything in there other than a UD?  If
> something else, say a float, got copy-propagated in here, that seems to be
> a problem with the optimizer not lowering.  I'm fine making the change if
> you'd like.
>
If you don't expect anything else than a UD as source, how about
'assert(...type == BRW_REGISTER_TYPE_UD)'?  I think it should be okay
either way, I suggested using a fixed type above instead because that
way you'd save an artificial restriction on the source register type of
this virtual opcode, making the assertion redundant.

> Reviewed-by: Francisco Jerez 
>>
>
> Thanks!
>
>
>>
>> > +  bld.MOV(tmp, inst->src[1]);
>> > +  inst->src[1] = tmp;
>> > +
>> >inst->opcode = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7;
>> >
>> > } else {
>> > --
>> > 2.5.0.400.gff86faf
>> >
>> > ___
>> > mesa-stable mailing list
>> > mesa-sta...@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-stable
>>




Re: [Mesa-dev] [PATCH v4 00/12] Rework CS local IDs for gen7+

2016-06-01 Thread Jason Ekstrand
I made a few cosmetic comments on patch 9 and I made a comment on patch 5
that might make a good future cleanup.  However, I think it's good enough
to land.

Reviewed-by: Jason Ekstrand 

Don't forget to cc 12.0
--Jason

On Wed, Jun 1, 2016 at 3:04 PM, Jordan Justen 
wrote:

> git://people.freedesktop.org/~jljusten/mesa
> hsw-cs-cross-thread-constants-v4
>
> v4:
>  * Support both the old and new layouts until the switch-over to the
>new layout. This minimizes the size of the switch over patch.
>(Jason)
>
> v3:
>  * https://lists.freedesktop.org/archives/mesa-dev/2016-May/118722.html
>
> v2:
>  * https://lists.freedesktop.org/archives/mesa-dev/2016-May/118566.html
>
> v1:
>  * https://lists.freedesktop.org/archives/mesa-dev/2016-May/117952.html
>
>
> Jordan Justen (12):
>   glsl: Add glsl LowerCsDerivedVariables option
>   nir: Make lowering gl_LocalInvocationIndex optional
>   i965: Add nir channel_num system value
>   i965: Add uniform for a CS thread local base ID
>   i965: Put CS local thread ID uniform in last push register
>   i965: Add nir based intrinsic lowering and thread ID uniform
>   i965: Store number of threads in brw_cs_prog_data
>   i965: Add CS push constant info to brw_cs_prog_data
>   i965: Support new local ID push constant & cross-thread constants
>   anv: Support new local ID generation & cross-thread constants
>   i965: Enable cross-thread constants and compact local IDs for hsw+
>   i965: Remove old CS local ID handling
>
>  src/compiler/glsl/builtin_variables.cpp|  29 ++--
>  src/compiler/glsl/glsl_parser_extras.cpp   |   2 +-
>  src/compiler/glsl/ir.h |   3 +-
>  src/compiler/nir/nir.c |   4 +
>  src/compiler/nir/nir.h |   2 +
>  src/compiler/nir/nir_gather_info.c |   1 +
>  src/compiler/nir/nir_intrinsics.h  |   2 +
>  src/compiler/nir/nir_lower_system_values.c |  16 +-
>  src/intel/vulkan/anv_cmd_buffer.c  |  52 +++
>  src/intel/vulkan/anv_pipeline.c|   4 +
>  src/intel/vulkan/anv_private.h |   1 -
>  src/intel/vulkan/gen7_cmd_buffer.c |  15 +-
>  src/intel/vulkan/gen8_cmd_buffer.c |  13 +-
>  src/intel/vulkan/genX_cmd_buffer.c |   4 +-
>  src/intel/vulkan/genX_pipeline.c   |  12 +-
>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>  src/mesa/drivers/dri/i965/brw_compiler.h   |  22 ++-
>  src/mesa/drivers/dri/i965/brw_cs.c |   3 +
>  src/mesa/drivers/dri/i965/brw_defines.h|   3 +
>  src/mesa/drivers/dri/i965/brw_fs.cpp   | 197
> +
>  src/mesa/drivers/dri/i965/brw_fs.h |   1 -
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   |  22 ++-
>  src/mesa/drivers/dri/i965/brw_nir.h|   2 +
>  src/mesa/drivers/dri/i965/brw_nir_intrinsics.c | 179
> ++
>  src/mesa/drivers/dri/i965/gen7_cs_state.c  | 124 
>  src/mesa/main/mtypes.h |   3 +
>  src/mesa/state_tracker/st_extensions.c |   1 +
>  27 files changed, 472 insertions(+), 246 deletions(-)
>  create mode 100644 src/mesa/drivers/dri/i965/brw_nir_intrinsics.c
>
> --
> 2.8.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Re: [Mesa-dev] [PATCH v4 09/12] i965: Support new local ID push constant & cross-thread constants

2016-06-01 Thread Jason Ekstrand
On Wed, Jun 1, 2016 at 3:04 PM, Jordan Justen 
wrote:

> The cross thread constant support appears on Haswell. It allows us to
> upload a set of uniform data for all threads without duplicating it
> per thread.
>
> We also support per-thread data which allows us to store a per-thread
> ID in one of the uniforms that can be used to calculate the
> gl_LocalInvocationIndex and gl_LocalInvocationID variables.
>
> v4:
>  * Support the old local ID push constant layout as well (Jason)
>
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h   |  3 +
>  src/mesa/drivers/dri/i965/gen7_cs_state.c | 99
> +--
>  2 files changed, 56 insertions(+), 46 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 4eb6b1f..e7d1a9f 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -2943,6 +2943,9 @@ enum brw_wm_barycentric_interp_mode {
>  # define MEDIA_GPGPU_THREAD_COUNT_MASK  INTEL_MASK(7, 0)
>  # define GEN8_MEDIA_GPGPU_THREAD_COUNT_SHIFT0
>  # define GEN8_MEDIA_GPGPU_THREAD_COUNT_MASK INTEL_MASK(9, 0)
> +/* GEN7 DW6, GEN8+ DW7 */
> +# define CROSS_THREAD_READ_LENGTH_SHIFT 0
> +# define CROSS_THREAD_READ_LENGTH_MASK  INTEL_MASK(7, 0)
>  #define MEDIA_STATE_FLUSH   0x7004
>  #define GPGPU_WALKER0x7105
>  /* GEN7 DW0 */
> diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c
> b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> index 619edfb..2fee02d 100644
> --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> @@ -42,7 +42,6 @@ brw_upload_cs_state(struct brw_context *brw)
> uint32_t offset;
> uint32_t *desc = (uint32_t*) brw_state_batch(brw,
> AUB_TRACE_SURFACE_STATE,
>  8 * 4, 64, &offset);
> -   struct gl_program *prog = (struct gl_program *) brw->compute_program;
> struct brw_stage_state *stage_state = &brw->cs.base;
> struct brw_cs_prog_data *cs_prog_data = brw->cs.prog_data;
> struct brw_stage_prog_data *prog_data = &cs_prog_data->base;
> @@ -59,16 +58,6 @@ brw_upload_cs_state(struct brw_context *brw)
>
>  prog_data->binding_table.size_bytes,
>  32,
> _state->bind_bo_offset);
>
> -   unsigned local_id_dwords = 0;
> -
> -   if (prog->SystemValuesRead & SYSTEM_BIT_LOCAL_INVOCATION_ID)
> -  local_id_dwords = cs_prog_data->local_invocation_id_regs * 8;
> -
> -   unsigned push_constant_data_size =
> -  (prog_data->nr_params + local_id_dwords) *
> sizeof(gl_constant_value);
> -   unsigned reg_aligned_constant_size = ALIGN(push_constant_data_size,
> 32);
> -   unsigned push_constant_regs = reg_aligned_constant_size / 32;
> -
> uint32_t dwords = brw->gen < 8 ? 8 : 9;
> BEGIN_BATCH(dwords);
> OUT_BATCH(MEDIA_VFE_STATE << 16 | (dwords - 2));
> @@ -118,7 +107,8 @@ brw_upload_cs_state(struct brw_context *brw)
>  * Note: The constant data is built in brw_upload_cs_push_constants
> below.
>  */
> const uint32_t vfe_curbe_allocation =
> -  push_constant_regs * cs_prog_data->threads;
> +  ALIGN(cs_prog_data->push.per_thread.regs * cs_prog_data->threads +
> +cs_prog_data->push.cross_thread.regs, 2);
> OUT_BATCH(SET_FIELD(vfe_urb_allocation, MEDIA_VFE_STATE_URB_ALLOC) |
>   SET_FIELD(vfe_curbe_allocation,
> MEDIA_VFE_STATE_CURBE_ALLOC));
> OUT_BATCH(0);
> @@ -126,11 +116,11 @@ brw_upload_cs_state(struct brw_context *brw)
> OUT_BATCH(0);
> ADVANCE_BATCH();
>
> -   if (reg_aligned_constant_size > 0) {
> +   if (cs_prog_data->push.total.size > 0) {
>BEGIN_BATCH(4);
>OUT_BATCH(MEDIA_CURBE_LOAD << 16 | (4 - 2));
>OUT_BATCH(0);
> -  OUT_BATCH(ALIGN(reg_aligned_constant_size * cs_prog_data->threads,
> 64));
> +  OUT_BATCH(ALIGN(cs_prog_data->push.total.size, 64));
>OUT_BATCH(stage_state->push_const_offset);
>ADVANCE_BATCH();
> }
> @@ -149,7 +139,8 @@ brw_upload_cs_state(struct brw_context *brw)
> desc[dw++] = stage_state->sampler_offset |
>((stage_state->sampler_count + 3) / 4);
> desc[dw++] = stage_state->bind_bo_offset;
> -   desc[dw++] = SET_FIELD(push_constant_regs, MEDIA_CURBE_READ_LENGTH);
> +   desc[dw++] = SET_FIELD(cs_prog_data->push.per_thread.regs,
> +  MEDIA_CURBE_READ_LENGTH);
> const uint32_t media_threads =
>brw->gen >= 8 ?
>SET_FIELD(cs_prog_data->threads, GEN8_MEDIA_GPGPU_THREAD_COUNT) :
> @@ -171,6 +162,10 @@ brw_upload_cs_state(struct brw_context *brw)
>SET_FIELD(slm_size, MEDIA_SHARED_LOCAL_MEMORY_SIZE) |
>media_threads;
>
> +   desc[dw++] =
> +  SET_FIELD(cs_prog_data->push.cross_thread.regs,
> +CROSS_THREAD_READ_LENGTH);
>

I don't think this needs 3 

Re: [Mesa-dev] [PATCH v4 07/12] i965: Store number of threads in brw_cs_prog_data

2016-06-01 Thread Jason Ekstrand
On Wed, Jun 1, 2016 at 3:04 PM, Jordan Justen 
wrote:

> Signed-off-by: Jordan Justen 
> ---
>  src/intel/vulkan/anv_cmd_buffer.c |  7 +++
>  src/intel/vulkan/anv_private.h|  1 -
>  src/intel/vulkan/gen7_cmd_buffer.c|  2 +-
>  src/intel/vulkan/gen8_cmd_buffer.c|  2 +-
>  src/intel/vulkan/genX_cmd_buffer.c|  4 ++--
>  src/intel/vulkan/genX_pipeline.c  |  4 +---
>  src/mesa/drivers/dri/i965/brw_compiler.h  |  1 +
>  src/mesa/drivers/dri/i965/brw_fs.cpp  | 15 ---
>  src/mesa/drivers/dri/i965/gen7_cs_state.c | 32
> ++-
>  9 files changed, 31 insertions(+), 37 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_cmd_buffer.c
> b/src/intel/vulkan/anv_cmd_buffer.c
> index 4d0fd7c..63d096c 100644
> --- a/src/intel/vulkan/anv_cmd_buffer.c
> +++ b/src/intel/vulkan/anv_cmd_buffer.c
> @@ -1076,9 +1076,8 @@ anv_cmd_buffer_cs_push_constants(struct
> anv_cmd_buffer *cmd_buffer)
> if (reg_aligned_constant_size == 0)
>return (struct anv_state) { .offset = 0 };
>
> -   const unsigned threads = pipeline->cs_thread_width_max;
> const unsigned total_push_constants_size =
> -  reg_aligned_constant_size * threads;
> +  reg_aligned_constant_size * cs_prog_data->threads;
> const unsigned push_constant_alignment =
>cmd_buffer->device->info.gen < 8 ? 32 : 64;
> const unsigned aligned_total_push_constants_size =
> @@ -1091,7 +1090,7 @@ anv_cmd_buffer_cs_push_constants(struct
> anv_cmd_buffer *cmd_buffer)
> /* Walk through the param array and fill the buffer with data */
> uint32_t *u32_map = state.map;
>
> -   brw_cs_fill_local_id_payload(cs_prog_data, u32_map, threads,
> +   brw_cs_fill_local_id_payload(cs_prog_data, u32_map,
> cs_prog_data->threads,
>  reg_aligned_constant_size);
>
> /* Setup uniform data for the first thread */
> @@ -1102,7 +1101,7 @@ anv_cmd_buffer_cs_push_constants(struct
> anv_cmd_buffer *cmd_buffer)
>
> /* Copy uniform data from the first thread to every other thread */
> const size_t uniform_data_size = prog_data->nr_params *
> sizeof(uint32_t);
> -   for (unsigned t = 1; t < threads; t++) {
> +   for (unsigned t = 1; t < cs_prog_data->threads; t++) {
>memcpy(_map[t * param_aligned_count + local_id_dwords],
>   _map[local_id_dwords],
>   uniform_data_size);
> diff --git a/src/intel/vulkan/anv_private.h
> b/src/intel/vulkan/anv_private.h
> index 7325f3f..26ffbd6 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -1474,7 +1474,6 @@ struct anv_pipeline {
> bool primitive_restart;
> uint32_t topology;
>
> -   uint32_t cs_thread_width_max;
>

Hooray!  Less crap in the pipeline!


> uint32_t cs_right_mask;
>
> struct {
> diff --git a/src/intel/vulkan/gen7_cmd_buffer.c
> b/src/intel/vulkan/gen7_cmd_buffer.c
> index 331275e..40ab008 100644
> --- a/src/intel/vulkan/gen7_cmd_buffer.c
> +++ b/src/intel/vulkan/gen7_cmd_buffer.c
> @@ -271,7 +271,7 @@ flush_compute_descriptor_set(struct anv_cmd_buffer
> *cmd_buffer)
>.BarrierEnable = cs_prog_data->uses_barrier,
>.SharedLocalMemorySize = slm_size,
>.NumberofThreadsinGPGPUThreadGroup =
> - pipeline->cs_thread_width_max);
> + cs_prog_data->threads);
>
> const uint32_t size = GENX(INTERFACE_DESCRIPTOR_DATA_length) *
> sizeof(uint32_t);
> anv_batch_emit(_buffer->batch,
> diff --git a/src/intel/vulkan/gen8_cmd_buffer.c
> b/src/intel/vulkan/gen8_cmd_buffer.c
> index 547fedd..e139e8a 100644
> --- a/src/intel/vulkan/gen8_cmd_buffer.c
> +++ b/src/intel/vulkan/gen8_cmd_buffer.c
> @@ -356,7 +356,7 @@ flush_compute_descriptor_set(struct anv_cmd_buffer
> *cmd_buffer)
>.BarrierEnable = cs_prog_data->uses_barrier,
>.SharedLocalMemorySize = slm_size,
>.NumberofThreadsinGPGPUThreadGroup =
> - pipeline->cs_thread_width_max);
> + cs_prog_data->threads);
>
> uint32_t size = GENX(INTERFACE_DESCRIPTOR_DATA_length) *
> sizeof(uint32_t);
> anv_batch_emit(_buffer->batch,
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> b/src/intel/vulkan/genX_cmd_buffer.c
> index e7d322c..d9acf58 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -773,7 +773,7 @@ void genX(CmdDispatch)(
>ggw.SIMDSize = prog_data->simd_size / 16;
>ggw.ThreadDepthCounterMaximum= 0;
>ggw.ThreadHeightCounterMaximum   = 0;
> -  ggw.ThreadWidthCounterMaximum= 

Re: [Mesa-dev] [Mesa-stable] [PATCH] i965/fs: Copy the offset when lowering logical pull constant sends

2016-06-01 Thread Jason Ekstrand
On Wed, Jun 1, 2016 at 3:33 PM, Francisco Jerez 
wrote:

> Jason Ekstrand  writes:
>
> > This fixes 64 Vulkan CTS tests per gen
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299
> > Cc: "12.0" 
> > Cc: Francisco Jerez 
> > Cc: Mark Janes 
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index 00d937e..20bb900 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -4448,6 +4448,14 @@ lower_varying_pull_constant_logical_send(const fs_builder &bld, fs_inst *inst)
> > const brw_device_info *devinfo = bld.shader->devinfo;
> >
> > if (devinfo->gen >= 7) {
> > +  /* We are switching the instruction from an ALU-like instruction
> to a
> > +   * send-from-grf instruction.  Since sends can't handle strides or
> > +   * source modifiers, we have to make a copy of the offset source.
> > +   */
> > +  fs_reg tmp = bld.vgrf(inst->src[1].type);
>
> I suggest you use a fixed UD type for the temporary (since that is what
> the varying pull constant load instruction requires and using any other
> type will cause the send-message to silently reinterpret the offset as
> something else than what it is.  E.g. think non-32bit integers or
> integer vectors, a type-converting move is what you want below in such
> cases).  With that change this patch is:
>

Hrm... How would we ever get anything in there other than a UD?  If
something else, say a float, got copy-propagated in here, that seems to be
a problem with the optimizer not lowering.  I'm fine making the change if
you'd like.

Reviewed-by: Francisco Jerez 
>

Thanks!


>
> > +  bld.MOV(tmp, inst->src[1]);
> > +  inst->src[1] = tmp;
> > +
> >inst->opcode = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7;
> >
> > } else {
> > --
> > 2.5.0.400.gff86faf
> >
> > ___
> > mesa-stable mailing list
> > mesa-sta...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-stable
>


Re: [Mesa-dev] [Mesa-stable] [PATCH] i965/fs: Copy the offset when lowering logical pull constant sends

2016-06-01 Thread Francisco Jerez
Jason Ekstrand  writes:

> This fixes 64 Vulkan CTS tests per gen
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299
> Cc: "12.0" 
> Cc: Francisco Jerez 
> Cc: Mark Janes 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 00d937e..20bb900 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -4448,6 +4448,14 @@ lower_varying_pull_constant_logical_send(const fs_builder &bld, fs_inst *inst)
> const brw_device_info *devinfo = bld.shader->devinfo;
>  
> if (devinfo->gen >= 7) {
> +  /* We are switching the instruction from an ALU-like instruction to a
> +   * send-from-grf instruction.  Since sends can't handle strides or
> +   * source modifiers, we have to make a copy of the offset source.
> +   */
> +  fs_reg tmp = bld.vgrf(inst->src[1].type);

I suggest you use a fixed UD type for the temporary, since that is what
the varying pull constant load instruction requires; using any other
type will cause the send message to silently reinterpret the offset as
something other than what it is.  E.g. think of non-32-bit integers or
integer vectors: a type-converting move is what you want below in such
cases.  With that change this patch is:

Reviewed-by: Francisco Jerez 
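
To illustrate the reinterpretation concern, here is a minimal standalone
sketch.  It is not Mesa code: raw_copy_bits() and converting_move_to_ud()
are hypothetical helpers that only model the difference between copying a
value's bits unchanged and doing a type-converting move to an unsigned
dword, using a float offset as the example.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in (not a Mesa function): models a same-typed copy
// whose bits are later read by the consumer as an unsigned dword.
static uint32_t raw_copy_bits(float offset)
{
   static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");
   uint32_t bits;
   std::memcpy(&bits, &offset, sizeof(bits));
   return bits;
}

// Hypothetical stand-in (not a Mesa function): models a MOV into a
// UD-typed temporary, which converts the value rather than its bits.
static uint32_t converting_move_to_ud(float offset)
{
   return static_cast<uint32_t>(offset);
}
```

With an offset of 16.0f, the raw copy yields 0x41800000 while the
converting move yields 16, which is the value a consumer expecting an
unsigned dword offset would want.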

> +  bld.MOV(tmp, inst->src[1]);
> +  inst->src[1] = tmp;
> +
>inst->opcode = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7;
>  
> } else {
> -- 
> 2.5.0.400.gff86faf
>
> ___
> mesa-stable mailing list
> mesa-sta...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-stable




Re: [Mesa-dev] [PATCH v4 05/12] i965: Put CS local thread ID uniform in last push register

2016-06-01 Thread Jason Ekstrand
On Wed, Jun 1, 2016 at 3:04 PM, Jordan Justen 
wrote:

> This thread ID uniform will be used to compute the
> gl_LocalInvocationIndex and gl_LocalInvocationID values.
>
> It is important for this uniform to be added in the last push constant
> register. fs_visitor::assign_constant_locations is updated to make
> sure this happens.
>
> The reason this is important is that the cross-thread push constant
> registers are loaded first, and the per-thread push constant registers
> are loaded after that. (Broadwell adds another push constant upload
> mechanism which reverses this order, but we are ignoring this for
> now.)
>
> v2:
>  * Add variable in intrinsics lowering pass
>  * Make sure the ID is pushed last in assign_constant_locations, and
>that we save a spot for the ID in the push constants
>
> v3:
>  * Simplify code based with Jason's suggestions.
>
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 26 +-
>  1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index e8a3aab..bb1bf7a 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2097,6 +2097,10 @@ fs_visitor::assign_constant_locations()
> bool contiguous[uniforms];
> memset(contiguous, 0, sizeof(contiguous));
>
> +   int thread_local_id_index =
> +  (stage == MESA_SHADER_COMPUTE) ?
> +  ((brw_cs_prog_data*)stage_prog_data)->thread_local_id_index : -1;
> +
> /* First, we walk through the instructions and do two things:
>  *
>  *  1) Figure out which uniforms are live.
> @@ -2141,6 +2145,9 @@ fs_visitor::assign_constant_locations()
>}
> }
>
> +   if (thread_local_id_index >= 0 && !is_live[thread_local_id_index])
> +  thread_local_id_index = -1;
> +
> /* Only allow 16 registers (128 uniform components) as push constants.
>  *
>  * Just demote the end of the list.  We could probably do better
> @@ -2149,7 +2156,9 @@ fs_visitor::assign_constant_locations()
>  * If changing this value, note the limitation about total_regs in
>  * brw_curbe.c.
>  */
> -   const unsigned int max_push_components = 16 * 8;
> +   unsigned int max_push_components = 16 * 8;
> +   if (thread_local_id_index >= 0)
> +  max_push_components--; /* Save a slot for the thread ID */
>
> /* We push small arrays, but no bigger than 16 floats.  This is big
> enough
>  * for a vec4 but hopefully not large enough to push out other stuff.
> We
> @@ -2187,6 +2196,10 @@ fs_visitor::assign_constant_locations()
>if (!is_live[u] || is_live_64bit[u])
>   continue;
>
> +  /* Skip thread_local_id_index to put it in the last push register.
> */
> +  if (thread_local_id_index == (int)u)
> + continue;
> +
>set_push_pull_constant_loc(u, _start, contiguous[u],
>   push_constant_loc, pull_constant_loc,
>   _push_constants, _pull_constants,
> @@ -2194,6 +2207,10 @@ fs_visitor::assign_constant_locations()
>   stage_prog_data);
> }
>
> +   /* Add the CS local thread ID uniform at the end of the push constants
> */
> +   if (thread_local_id_index >= 0)
> +  push_constant_loc[thread_local_id_index] = num_push_constants++;
> +
> /* As the uniforms are going to be reordered, take the data from a
> temporary
>  * copy of the original param[].
>  */
> @@ -2212,6 +2229,7 @@ fs_visitor::assign_constant_locations()
>  * push_constant_loc[i] <= i and we can do it in one smooth loop
> without
>  * having to make a copy.
>  */
> +   int new_thread_local_id_index = -1;
> for (unsigned int i = 0; i < uniforms; i++) {
>const gl_constant_value *value = param[i];
>
> @@ -2219,9 +2237,15 @@ fs_visitor::assign_constant_locations()
>   stage_prog_data->pull_param[pull_constant_loc[i]] = value;
>} else if (push_constant_loc[i] != -1) {
>   stage_prog_data->param[push_constant_loc[i]] = value;
> + if (thread_local_id_index == (int)i)
> +new_thread_local_id_index = push_constant_loc[i];
>

First off, I think the following is better done as a fix-up patch if we do
it at all :-)

If we make this

if ((int)i == thread_local_id_index) {
   assert(stage == MESA_SHADER_COMPUTE);
   ((brw_cs_prog_data *)stage_prog_data)->thread_local_id_index =
      push_constant_loc[i];
   continue;
}

at the top of the loop, then we may be able to avoid having a "param" entry
for the local id.  This would mean we could get rid of the extra code where
we set up param and nr_param.  Just a thought.
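
For reference, the ordering the patch relies on can be sketched in
isolation.  This is a simplified, hypothetical model and not the Mesa
implementation: assign_push_slots() and its parameters are made up for
illustration, and it ignores pull constants, 64-bit uniforms and
contiguous arrays.  It only shows the three steps from the patch: drop
the ID if it is dead, reserve one component for it, and append it after
all other pushed uniforms.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Mesa function).  Returns one push slot per
// uniform; -1 means the uniform is not pushed.
static std::vector<int> assign_push_slots(const std::vector<bool> &is_live,
                                          int thread_local_id_index,
                                          unsigned max_push_components)
{
   std::vector<int> push_loc(is_live.size(), -1);
   unsigned num_push = 0;

   // A dead thread ID uniform needs no slot at all.
   if (thread_local_id_index >= 0 && !is_live[thread_local_id_index])
      thread_local_id_index = -1;

   // Save a slot for the thread ID so it always fits at the end.
   if (thread_local_id_index >= 0) {
      assert(max_push_components >= 1);
      max_push_components--;
   }

   for (size_t u = 0; u < is_live.size(); u++) {
      // Skip dead uniforms, and skip the thread ID so it goes last.
      if (!is_live[u] || static_cast<int>(u) == thread_local_id_index)
         continue;
      if (num_push < max_push_components)
         push_loc[u] = static_cast<int>(num_push++);
   }

   // Add the thread ID after every other pushed uniform.
   if (thread_local_id_index >= 0)
      push_loc[thread_local_id_index] = static_cast<int>(num_push++);

   return push_loc;
}
```

With four uniforms where uniform 0 is the thread ID and uniform 2 is
dead, the other live uniforms take slots 0 and 1 and the ID lands in
slot 2, i.e. the last pushed component.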


>}
> }
> ralloc_free(param);
> +
> +   if (stage == MESA_SHADER_COMPUTE)
> +  ((brw_cs_prog_data*)stage_prog_data)->thread_local_id_index =
> + new_thread_local_id_index;
>  }
>
>  /**
> --
> 

[Mesa-dev] [PATCH] i965/fs: Copy the offset when lowering logical pull constant sends

2016-06-01 Thread Jason Ekstrand
This fixes 64 Vulkan CTS tests per gen

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299
Cc: "12.0" 
Cc: Francisco Jerez 
Cc: Mark Janes 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 00d937e..20bb900 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -4448,6 +4448,14 @@ lower_varying_pull_constant_logical_send(const fs_builder &bld, fs_inst *inst)
const brw_device_info *devinfo = bld.shader->devinfo;
 
if (devinfo->gen >= 7) {
+  /* We are switching the instruction from an ALU-like instruction to a
+   * send-from-grf instruction.  Since sends can't handle strides or
+   * source modifiers, we have to make a copy of the offset source.
+   */
+  fs_reg tmp = bld.vgrf(inst->src[1].type);
+  bld.MOV(tmp, inst->src[1]);
+  inst->src[1] = tmp;
+
   inst->opcode = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7;
 
} else {
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 03/12] i965: Add nir channel_num system value

2016-06-01 Thread Jordan Justen
v2:
 * simd16/32 fixes (curro)

Signed-off-by: Jordan Justen 
---
 src/compiler/nir/nir_intrinsics.h|  1 +
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 +++
 2 files changed, 16 insertions(+)

diff --git a/src/compiler/nir/nir_intrinsics.h 
b/src/compiler/nir/nir_intrinsics.h
index aeb6038..6f86c9f 100644
--- a/src/compiler/nir/nir_intrinsics.h
+++ b/src/compiler/nir/nir_intrinsics.h
@@ -304,6 +304,7 @@ SYSTEM_VALUE(work_group_id, 3, 0, xx, xx, xx)
 SYSTEM_VALUE(user_clip_plane, 4, 1, UCP_ID, xx, xx)
 SYSTEM_VALUE(num_work_groups, 3, 0, xx, xx, xx)
 SYSTEM_VALUE(helper_invocation, 1, 0, xx, xx, xx)
+SYSTEM_VALUE(channel_num, 1, 0, xx, xx, xx)
 
 /*
  * Load operations pull data from some piece of GPU memory.  All load
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 9b6093c..81c7204 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -3876,6 +3876,21 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
   break;
}
 
+   case nir_intrinsic_load_channel_num: {
+  fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UW);
+  dest = retype(dest, BRW_REGISTER_TYPE_UD);
+  const fs_builder allbld8 = bld.group(8, 0).exec_all();
+  allbld8.MOV(tmp, brw_imm_v(0x76543210));
+  if (dispatch_width > 8)
+ allbld8.ADD(byte_offset(tmp, 16), tmp, brw_imm_uw(8u));
+  if (dispatch_width > 16) {
+ const fs_builder allbld16 = bld.group(16, 0).exec_all();
+ allbld16.ADD(byte_offset(tmp, 32), tmp, brw_imm_uw(16u));
+  }
+  bld.MOV(dest, tmp);
+  break;
+   }
+
default:
   unreachable("unknown intrinsic");
}
-- 
2.8.1



[Mesa-dev] [PATCH v4 08/12] i965: Add CS push constant info to brw_cs_prog_data

2016-06-01 Thread Jordan Justen
We need information about push constants in a few places for the GL
driver, and in a couple more places for the Vulkan driver.

When we add support for uploading a common (cross-thread) set of
push constants alongside the existing per-thread push constant
data, things are going to get even more complicated. To simplify
things, we add push constant info to the cs prog_data struct.

The cross-thread constant support is added as of Haswell. To support
it we need to make sure all push constants with uniform values are
added to earlier registers. The register that varies per thread and
holds the thread invocation's unique local ID needs to be added last.

For now we add the code that would calculate cross-thread constant
information for hsw+, but we force it (cross_thread_supported) off
until the other parts of the driver support it.
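
To illustrate the bookkeeping described above, here is a minimal
standalone sketch. It assumes 8 dwords (32 bytes) per push constant
register and that the thread ID, when present, is the last param. The
names push_block, make_push_block and split_push_dwords are hypothetical
and only loosely mirror the patch:

```c
#include <assert.h>

/* Hypothetical stand-in for the push constant block info in this patch. */
struct push_block {
   unsigned dwords; /* Dword count, not register aligned */
   unsigned regs;   /* Number of 8-dword registers */
   unsigned size;   /* Bytes, register aligned */
};

static struct push_block
make_push_block(unsigned dwords)
{
   struct push_block block;
   block.dwords = dwords;
   block.regs = (dwords + 7) / 8; /* round up to whole registers */
   block.size = block.regs * 32;  /* 32 bytes per register */
   return block;
}

/* Split nr_params dwords into a cross-thread part and a per-thread part.
 * A negative thread_id_index means no thread ID uniform is in use.
 */
static void
split_push_dwords(unsigned nr_params, int thread_id_index,
                  unsigned *cross_thread_dwords,
                  unsigned *per_thread_dwords)
{
   if (thread_id_index >= 0) {
      /* The thread ID must be the last param dword. */
      assert((unsigned)thread_id_index == nr_params - 1);
      /* Every register except the one holding the ID is cross-thread. */
      *cross_thread_dwords = 8 * ((unsigned)thread_id_index / 8);
      *per_thread_dwords = nr_params - *cross_thread_dwords;
   } else {
      *cross_thread_dwords = nr_params;
      *per_thread_dwords = 0;
   }
}
```

For example, with 20 params and the thread ID at index 19, this gives
16 cross-thread dwords (2 registers) and 4 per-thread dwords
(1 register).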

v4:
 * Support older local ID push constant layout as well. (Jason)

Signed-off-by: Jordan Justen 
---
 src/mesa/drivers/dri/i965/brw_compiler.h | 12 +++
 src/mesa/drivers/dri/i965/brw_fs.cpp | 61 
 2 files changed, 73 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
b/src/mesa/drivers/dri/i965/brw_compiler.h
index f1f9e56..dda6297 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -424,6 +424,12 @@ struct brw_wm_prog_data {
int urb_setup[VARYING_SLOT_MAX];
 };
 
+struct brw_push_const_block {
+   unsigned dwords; /* Dword count, not reg aligned */
+   unsigned regs;
+   unsigned size;   /* Bytes, register aligned */
+};
+
 struct brw_cs_prog_data {
struct brw_stage_prog_data base;
 
@@ -437,6 +443,12 @@ struct brw_cs_prog_data {
int thread_local_id_index;
 
struct {
+  struct brw_push_const_block cross_thread;
+  struct brw_push_const_block per_thread;
+  struct brw_push_const_block total;
+   } push;
+
+   struct {
   /** @{
* surface indices the CS-specific surfaces
*/
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 7e5d583..d461e2f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -6562,6 +6562,64 @@ fs_visitor::emit_cs_work_group_id_setup()
 }
 
 static void
+fill_push_const_block_info(struct brw_push_const_block *block, unsigned dwords)
+{
+   block->dwords = dwords;
+   block->regs = DIV_ROUND_UP(dwords, 8);
+   block->size = block->regs * 32;
+}
+
+static void
+cs_fill_push_const_info(const struct brw_device_info *devinfo,
+struct brw_cs_prog_data *cs_prog_data)
+{
+   const struct brw_stage_prog_data *prog_data =
+  (struct brw_stage_prog_data*) cs_prog_data;
+   bool fill_thread_id =
+  cs_prog_data->thread_local_id_index >= 0 &&
+  cs_prog_data->thread_local_id_index < (int)prog_data->nr_params;
+   bool cross_thread_supported = false; /* Not yet supported by driver. */
+
+   /* The thread ID should be stored in the last param dword */
+   assert(prog_data->nr_params > 0 || !fill_thread_id);
+   assert(!fill_thread_id ||
+  cs_prog_data->thread_local_id_index ==
+ (int)prog_data->nr_params - 1);
+
+   unsigned cross_thread_dwords, per_thread_dwords;
+   if (!cross_thread_supported) {
+  cross_thread_dwords = 0u;
+  per_thread_dwords =
+ 8 * cs_prog_data->local_invocation_id_regs +
+ prog_data->nr_params;
+   } else if (fill_thread_id) {
+  /* Fill all but the last register with cross-thread payload */
+  cross_thread_dwords = 8 * (cs_prog_data->thread_local_id_index / 8);
+  per_thread_dwords = prog_data->nr_params - cross_thread_dwords;
+  assert(per_thread_dwords > 0 && per_thread_dwords <= 8);
+   } else {
+  /* Fill all data using cross-thread payload */
+  cross_thread_dwords = prog_data->nr_params;
+  per_thread_dwords = 0u;
+   }
+
+   fill_push_const_block_info(&cs_prog_data->push.cross_thread, 
cross_thread_dwords);
+   fill_push_const_block_info(&cs_prog_data->push.per_thread, 
per_thread_dwords);
+
+   unsigned total_dwords =
+  (cs_prog_data->push.per_thread.size * cs_prog_data->threads +
+   cs_prog_data->push.cross_thread.size) / 4;
+   fill_push_const_block_info(&cs_prog_data->push.total, total_dwords);
+
+   assert(cs_prog_data->push.cross_thread.dwords % 8 == 0 ||
+  cs_prog_data->push.per_thread.size == 0);
+   assert(cs_prog_data->push.cross_thread.dwords +
+  cs_prog_data->push.per_thread.dwords ==
+ 8 * cs_prog_data->local_invocation_id_regs +
+ prog_data->nr_params);
+}
+
+static void
 cs_set_simd_size(struct brw_cs_prog_data *cs_prog_data, unsigned size)
 {
cs_prog_data->simd_size = size;
@@ -6627,6 +6685,7 @@ brw_compile_cs(const struct brw_compiler *compiler, void 
*log_data,
   } else {
  cfg = v8.cfg;
  cs_set_simd_size(prog_data, 8);
+ 

[Mesa-dev] [PATCH v4 04/12] i965: Add uniform for a CS thread local base ID

2016-06-01 Thread Jordan Justen
v4:
 * Force thread_local_id_index to -1 for now, and have
   fs_visitor::setup_cs_payload look at thread_local_id_index. This
   enables us to more easily cut over from the old local ID layout to
   the new layout, as suggested by Jason.

Signed-off-by: Jordan Justen 
---
 src/intel/vulkan/anv_pipeline.c  |  4 
 src/mesa/drivers/dri/i965/brw_compiler.h |  1 +
 src/mesa/drivers/dri/i965/brw_cs.c   |  3 +++
 src/mesa/drivers/dri/i965/brw_fs.cpp | 18 +-
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 789bc1a..504f0be 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -338,6 +338,10 @@ anv_pipeline_compile(struct anv_pipeline *pipeline,
   pipeline->needs_data_cache = true;
}
 
+   if (stage == MESA_SHADER_COMPUTE)
+  ((struct brw_cs_prog_data *)prog_data)->thread_local_id_index =
+ prog_data->nr_params++; /* The CS Thread ID uniform */
+
if (nir->info.num_ssbos > 0)
   pipeline->needs_data_cache = true;
 
diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
b/src/mesa/drivers/dri/i965/brw_compiler.h
index 0844694..bed969c 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -433,6 +433,7 @@ struct brw_cs_prog_data {
bool uses_barrier;
bool uses_num_work_groups;
unsigned local_invocation_id_regs;
+   int thread_local_id_index;
 
struct {
   /** @{
diff --git a/src/mesa/drivers/dri/i965/brw_cs.c 
b/src/mesa/drivers/dri/i965/brw_cs.c
index a9cbde9..2a25584 100644
--- a/src/mesa/drivers/dri/i965/brw_cs.c
+++ b/src/mesa/drivers/dri/i965/brw_cs.c
@@ -93,6 +93,9 @@ brw_codegen_cs_prog(struct brw_context *brw,
 */
int param_count = cp->program.Base.nir->num_uniforms / 4;
 
+   /* The backend also sometimes adds a param for the thread local id. */
+   prog_data.thread_local_id_index = param_count++;
+
/* The backend also sometimes adds params for texture size. */
param_count += 2 * 
ctx->Const.Program[MESA_SHADER_COMPUTE].MaxTextureImageUnits;
prog_data.base.param =
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 00d937e..e8a3aab 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -5621,7 +5621,8 @@ fs_visitor::setup_cs_payload()
 
payload.num_regs = 1;
 
-   if (nir->info.system_values_read & SYSTEM_BIT_LOCAL_INVOCATION_ID) {
+   if (nir->info.system_values_read & SYSTEM_BIT_LOCAL_INVOCATION_ID &&
+   prog_data->thread_local_id_index < 0) {
   prog_data->local_invocation_id_regs = dispatch_width * 3 / 8;
   payload.local_invocation_id_reg = payload.num_regs;
   payload.num_regs += prog_data->local_invocation_id_regs;
@@ -6551,6 +6552,21 @@ brw_compile_cs(const struct brw_compiler *compiler, void 
*log_data,
   true);
brw_nir_lower_cs_shared(shader);
prog_data->base.total_shared += shader->num_shared;
+
+   /* The driver isn't yet ready to support thread_local_id_index, so we force
+* it to disabled for now.
+*/
+   prog_data->thread_local_id_index = -1;
+
+   /* Now that we cloned the nir_shader, we can update num_uniforms based on
+* the thread_local_id_index.
+*/
+   if (prog_data->thread_local_id_index >= 0) {
+  shader->num_uniforms =
+ MAX2(shader->num_uniforms,
+  (unsigned)4 * (prog_data->thread_local_id_index + 1));
+   }
+
shader = brw_postprocess_nir(shader, compiler->devinfo, true);
 
prog_data->local_size[0] = shader->info.cs.local_size[0];
-- 
2.8.1



[Mesa-dev] [PATCH v4 06/12] i965: Add nir based intrinsic lowering and thread ID uniform

2016-06-01 Thread Jordan Justen
We add a lowering pass for nir intrinsics. This pass can replace nir
intrinsics with driver-specific lowered nir code.

We lower the gl_LocalInvocationIndex intrinsic based on a uniform
which is loaded with a thread specific ID.

We also lower the gl_LocalInvocationID based on
gl_LocalInvocationIndex.
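
The relationship between the two values can be sketched as follows.
This is only an illustration in plain C, not the NIR that the pass
emits. It assumes the usual GLSL definition
gl_LocalInvocationIndex = z * size_x * size_y + y * size_x + x, and that
the per-thread uniform holds the index of the thread's first channel.
All names are hypothetical:

```c
#include <assert.h>

struct local_id {
   unsigned x, y, z;
};

/* Assumption: thread_base is the per-thread uniform and channel is the
 * SIMD channel number within the thread.
 */
static unsigned
local_invocation_index(unsigned thread_base, unsigned channel)
{
   return thread_base + channel;
}

/* Invert index = z * sx * sy + y * sx + x for a given local size. */
static struct local_id
local_id_from_index(unsigned index, const unsigned local_size[3])
{
   struct local_id id;
   assert(local_size[0] > 0 && local_size[1] > 0 && local_size[2] > 0);
   id.x = index % local_size[0];
   id.y = (index / local_size[0]) % local_size[1];
   id.z = index / (local_size[0] * local_size[1]);
   return id;
}
```

With a local size of 4x2x3, a thread base of 16 and channel 5, the
index is 21 and the derived ID is (1, 1, 2).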

v2:
 * Create variable during lowering pass. (Ken)

v3:
 * Don't create a variable, but instead just insert an intrinsic call
   to load a uniform from the allocated location. (Jason)

v4:
 * Don't run this pass if thread_local_id_index < 0

Signed-off-by: Jordan Justen 
---
 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp   |   1 +
 src/mesa/drivers/dri/i965/brw_nir.h|   2 +
 src/mesa/drivers/dri/i965/brw_nir_intrinsics.c | 186 +
 4 files changed, 190 insertions(+)
 create mode 100644 src/mesa/drivers/dri/i965/brw_nir_intrinsics.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index d8711ed..f448551 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -46,6 +46,7 @@ i965_compiler_FILES = \
brw_nir.c \
brw_nir_analyze_boolean_resolves.c \
brw_nir_attribute_workarounds.c \
+   brw_nir_intrinsics.c \
brw_nir_opt_peephole_ffma.c \
brw_packed_float.c \
brw_predicated_break.cpp \
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index bb1bf7a..d83d9e0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -6591,6 +6591,7 @@ brw_compile_cs(const struct brw_compiler *compiler, void 
*log_data,
   (unsigned)4 * (prog_data->thread_local_id_index + 1));
}
 
+   brw_nir_lower_intrinsics(shader, &prog_data->base);
shader = brw_postprocess_nir(shader, compiler->devinfo, true);
 
prog_data->local_size[0] = shader->info.cs.local_size[0];
diff --git a/src/mesa/drivers/dri/i965/brw_nir.h 
b/src/mesa/drivers/dri/i965/brw_nir.h
index 409e49a..74c354f 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.h
+++ b/src/mesa/drivers/dri/i965/brw_nir.h
@@ -91,6 +91,8 @@ void brw_nir_analyze_boolean_resolves(nir_shader *nir);
 nir_shader *brw_preprocess_nir(const struct brw_compiler *compiler,
nir_shader *nir);
 
+bool brw_nir_lower_intrinsics(nir_shader *nir,
+  struct brw_stage_prog_data *prog_data);
 void brw_nir_lower_vs_inputs(nir_shader *nir,
  const struct brw_device_info *devinfo,
  bool is_scalar,
diff --git a/src/mesa/drivers/dri/i965/brw_nir_intrinsics.c 
b/src/mesa/drivers/dri/i965/brw_nir_intrinsics.c
new file mode 100644
index 000..972b117
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_nir_intrinsics.c
@@ -0,0 +1,186 @@
+/*
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "brw_nir.h"
+#include "compiler/nir/nir_builder.h"
+
+struct lower_intrinsics_state {
+   nir_shader *nir;
+   union {
+  struct brw_stage_prog_data *prog_data;
+  struct brw_cs_prog_data *cs_prog_data;
+   };
+   nir_function_impl *impl;
+   bool progress;
+   nir_builder builder;
+   bool cs_thread_id_used;
+};
+
+static nir_ssa_def *
+read_thread_local_id(struct lower_intrinsics_state *state)
+{
+   assert(state->cs_prog_data->thread_local_id_index >= 0);
+   state->cs_thread_id_used = true;
+   const int id_index = state->cs_prog_data->thread_local_id_index;
+
+   nir_builder *b = &state->builder;
+   nir_shader *nir = state->nir;
+   nir_intrinsic_instr *load =
+  nir_intrinsic_instr_create(nir, nir_intrinsic_load_uniform);
+   load->num_components = 1;
+   load->src[0] = nir_src_for_ssa(nir_imm_int(b, 0));
+   nir_ssa_dest_init(&load->instr, &load->dest, 

[Mesa-dev] [PATCH v4 11/12] i965: Enable cross-thread constants and compact local IDs for hsw+

2016-06-01 Thread Jordan Justen
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

One complication is that cross-thread constants are loaded into
registers before per-thread constants. Previously, our local IDs were
loaded before the uniform data and treated as 'payload' data, even
though they were actually pushed into the registers like the other
uniform data.

Therefore, in this patch we simultaneously enable a newer layout where
each thread now uses a single uniform slot for a unique local ID for
the thread. This uniform is handled specially to make sure it is added
last into the uniform push constant registers. This minimizes our
usage of push constant registers, and maximizes our ability to use
cross-thread constants for registers.

To swap from the old to the new layout, we also need to flip some
lowering pass switches to let our driver handle the lowering instead.
We also no longer force thread_local_id_index to -1.

v4:
 * Minimize size of patch that switches from the old local ID layout
   to the new layout (Jason)

Signed-off-by: Jordan Justen 
---
 src/mesa/drivers/dri/i965/brw_compiler.c |  3 +--
 src/mesa/drivers/dri/i965/brw_context.c  |  1 -
 src/mesa/drivers/dri/i965/brw_fs.cpp | 16 +---
 3 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c 
b/src/mesa/drivers/dri/i965/brw_compiler.c
index bb06733..a4855a0 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.c
+++ b/src/mesa/drivers/dri/i965/brw_compiler.c
@@ -40,8 +40,7 @@
.lower_fdiv = true,\
.lower_flrp64 = true,  \
.native_integers = true,   \
-   .vertex_id_zero_based = true,  \
-   .lower_cs_local_index_from_id = true
+   .vertex_id_zero_based = true
 
 static const struct nir_shader_compiler_options scalar_nir_options = {
COMMON_OPTIONS,
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index ad8d514..97dc226 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -599,7 +599,6 @@ brw_initialize_context_constants(struct brw_context *brw)
   ctx->Const.MaxClipPlanes = 8;
 
ctx->Const.LowerTessLevel = true;
-   ctx->Const.LowerCsDerivedVariables = true;
ctx->Const.PrimitiveRestartForPatches = true;
 
ctx->Const.Program[MESA_SHADER_VERTEX].MaxNativeInstructions = 16 * 1024;
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index d461e2f..3dd795e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -6578,7 +6578,7 @@ cs_fill_push_const_info(const struct brw_device_info 
*devinfo,
bool fill_thread_id =
   cs_prog_data->thread_local_id_index >= 0 &&
   cs_prog_data->thread_local_id_index < (int)prog_data->nr_params;
-   bool cross_thread_supported = false; /* Not yet supported by driver. */
+   bool cross_thread_supported = devinfo->gen > 7 || devinfo->is_haswell;
 
/* The thread ID should be stored in the last param dword */
assert(prog_data->nr_params > 0 || !fill_thread_id);
@@ -6644,19 +6644,13 @@ brw_compile_cs(const struct brw_compiler *compiler, 
void *log_data,
brw_nir_lower_cs_shared(shader);
prog_data->base.total_shared += shader->num_shared;
 
-   /* The driver isn't yet ready to support thread_local_id_index, so we force
-* it to disabled for now.
-*/
-   prog_data->thread_local_id_index = -1;
-
/* Now that we cloned the nir_shader, we can update num_uniforms based on
 * the thread_local_id_index.
 */
-   if (prog_data->thread_local_id_index >= 0) {
-  shader->num_uniforms =
- MAX2(shader->num_uniforms,
-  (unsigned)4 * (prog_data->thread_local_id_index + 1));
-   }
+   assert(prog_data->thread_local_id_index >= 0);
+   shader->num_uniforms =
+  MAX2(shader->num_uniforms,
+   (unsigned)4 * (prog_data->thread_local_id_index + 1));
 
brw_nir_lower_intrinsics(shader, &prog_data->base);
shader = brw_postprocess_nir(shader, compiler->devinfo, true);
-- 
2.8.1



[Mesa-dev] [PATCH v4 07/12] i965: Store number of threads in brw_cs_prog_data

2016-06-01 Thread Jordan Justen
Signed-off-by: Jordan Justen 
---
 src/intel/vulkan/anv_cmd_buffer.c |  7 +++
 src/intel/vulkan/anv_private.h|  1 -
 src/intel/vulkan/gen7_cmd_buffer.c|  2 +-
 src/intel/vulkan/gen8_cmd_buffer.c|  2 +-
 src/intel/vulkan/genX_cmd_buffer.c|  4 ++--
 src/intel/vulkan/genX_pipeline.c  |  4 +---
 src/mesa/drivers/dri/i965/brw_compiler.h  |  1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp  | 15 ---
 src/mesa/drivers/dri/i965/gen7_cs_state.c | 32 ++-
 9 files changed, 31 insertions(+), 37 deletions(-)

diff --git a/src/intel/vulkan/anv_cmd_buffer.c 
b/src/intel/vulkan/anv_cmd_buffer.c
index 4d0fd7c..63d096c 100644
--- a/src/intel/vulkan/anv_cmd_buffer.c
+++ b/src/intel/vulkan/anv_cmd_buffer.c
@@ -1076,9 +1076,8 @@ anv_cmd_buffer_cs_push_constants(struct anv_cmd_buffer 
*cmd_buffer)
if (reg_aligned_constant_size == 0)
   return (struct anv_state) { .offset = 0 };
 
-   const unsigned threads = pipeline->cs_thread_width_max;
const unsigned total_push_constants_size =
-  reg_aligned_constant_size * threads;
+  reg_aligned_constant_size * cs_prog_data->threads;
const unsigned push_constant_alignment =
   cmd_buffer->device->info.gen < 8 ? 32 : 64;
const unsigned aligned_total_push_constants_size =
@@ -1091,7 +1090,7 @@ anv_cmd_buffer_cs_push_constants(struct anv_cmd_buffer 
*cmd_buffer)
/* Walk through the param array and fill the buffer with data */
uint32_t *u32_map = state.map;
 
-   brw_cs_fill_local_id_payload(cs_prog_data, u32_map, threads,
+   brw_cs_fill_local_id_payload(cs_prog_data, u32_map, cs_prog_data->threads,
 reg_aligned_constant_size);
 
/* Setup uniform data for the first thread */
@@ -1102,7 +1101,7 @@ anv_cmd_buffer_cs_push_constants(struct anv_cmd_buffer 
*cmd_buffer)
 
/* Copy uniform data from the first thread to every other thread */
const size_t uniform_data_size = prog_data->nr_params * sizeof(uint32_t);
-   for (unsigned t = 1; t < threads; t++) {
+   for (unsigned t = 1; t < cs_prog_data->threads; t++) {
   memcpy(&u32_map[t * param_aligned_count + local_id_dwords],
  &u32_map[local_id_dwords],
  uniform_data_size);
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 7325f3f..26ffbd6 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1474,7 +1474,6 @@ struct anv_pipeline {
bool primitive_restart;
uint32_t topology;
 
-   uint32_t cs_thread_width_max;
uint32_t cs_right_mask;
 
struct {
diff --git a/src/intel/vulkan/gen7_cmd_buffer.c 
b/src/intel/vulkan/gen7_cmd_buffer.c
index 331275e..40ab008 100644
--- a/src/intel/vulkan/gen7_cmd_buffer.c
+++ b/src/intel/vulkan/gen7_cmd_buffer.c
@@ -271,7 +271,7 @@ flush_compute_descriptor_set(struct anv_cmd_buffer 
*cmd_buffer)
   .BarrierEnable = cs_prog_data->uses_barrier,
   .SharedLocalMemorySize = slm_size,
   .NumberofThreadsinGPGPUThreadGroup =
- pipeline->cs_thread_width_max);
+ cs_prog_data->threads);
 
const uint32_t size = GENX(INTERFACE_DESCRIPTOR_DATA_length) * 
sizeof(uint32_t);
anv_batch_emit(&cmd_buffer->batch,
diff --git a/src/intel/vulkan/gen8_cmd_buffer.c 
b/src/intel/vulkan/gen8_cmd_buffer.c
index 547fedd..e139e8a 100644
--- a/src/intel/vulkan/gen8_cmd_buffer.c
+++ b/src/intel/vulkan/gen8_cmd_buffer.c
@@ -356,7 +356,7 @@ flush_compute_descriptor_set(struct anv_cmd_buffer 
*cmd_buffer)
   .BarrierEnable = cs_prog_data->uses_barrier,
   .SharedLocalMemorySize = slm_size,
   .NumberofThreadsinGPGPUThreadGroup =
- pipeline->cs_thread_width_max);
+ cs_prog_data->threads);
 
uint32_t size = GENX(INTERFACE_DESCRIPTOR_DATA_length) * sizeof(uint32_t);
anv_batch_emit(&cmd_buffer->batch,
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index e7d322c..d9acf58 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -773,7 +773,7 @@ void genX(CmdDispatch)(
   ggw.SIMDSize = prog_data->simd_size / 16;
   ggw.ThreadDepthCounterMaximum= 0;
   ggw.ThreadHeightCounterMaximum   = 0;
-  ggw.ThreadWidthCounterMaximum= pipeline->cs_thread_width_max - 1;
+  ggw.ThreadWidthCounterMaximum= prog_data->threads - 1;
   ggw.ThreadGroupIDXDimension  = x;
   ggw.ThreadGroupIDYDimension  = y;
   ggw.ThreadGroupIDZDimension  = z;
@@ -874,7 +874,7 @@ void genX(CmdDispatchIndirect)(
   ggw.SIMDSize   

[Mesa-dev] [PATCH v4 10/12] anv: Support new local ID generation & cross-thread constants

2016-06-01 Thread Jordan Justen
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.
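
As a rough sketch of the buffer layout this implies, assume the new
layout with no separate local ID payload registers: the cross-thread
registers come first, followed by one per-thread block for each thread.
The helper name write_thread_ids and its parameters are hypothetical;
the ID value (thread number times SIMD size) follows what this patch
stores:

```c
#include <assert.h>
#include <stdint.h>

/* Store each thread's base ID into its per-thread block.
 * id_offset is the dword position of the ID within the per-thread block.
 */
static void
write_thread_ids(uint32_t *map, unsigned threads, unsigned simd_size,
                 unsigned cross_thread_regs, unsigned per_thread_regs,
                 unsigned id_offset)
{
   assert(id_offset < 8 * per_thread_regs);
   for (unsigned t = 0; t < threads; t++) {
      /* 8 dwords per register; per-thread blocks follow the
       * cross-thread registers.
       */
      unsigned base = 8 * (cross_thread_regs + per_thread_regs * t);
      map[base + id_offset] = t * simd_size;
   }
}
```

With 2 cross-thread registers, 1 per-thread register, SIMD8 and the ID
at dword 3 of the per-thread block, thread 1 gets the value 8 written
at dword 27 of the buffer.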

v4:
 * Support the old local ID push constant layout as well (Jason)

Signed-off-by: Jordan Justen 
---
 src/intel/vulkan/anv_cmd_buffer.c  | 54 +-
 src/intel/vulkan/gen7_cmd_buffer.c | 13 -
 src/intel/vulkan/gen8_cmd_buffer.c | 13 -
 src/intel/vulkan/genX_pipeline.c   | 10 ++-
 4 files changed, 42 insertions(+), 48 deletions(-)

diff --git a/src/intel/vulkan/anv_cmd_buffer.c 
b/src/intel/vulkan/anv_cmd_buffer.c
index 63d096c..edaaa3d 100644
--- a/src/intel/vulkan/anv_cmd_buffer.c
+++ b/src/intel/vulkan/anv_cmd_buffer.c
@@ -1065,23 +1065,14 @@ anv_cmd_buffer_cs_push_constants(struct anv_cmd_buffer 
*cmd_buffer)
const struct brw_cs_prog_data *cs_prog_data = get_cs_prog_data(pipeline);
const struct brw_stage_prog_data *prog_data = &cs_prog_data->base;
 
-   const unsigned local_id_dwords = cs_prog_data->local_invocation_id_regs * 8;
-   const unsigned push_constant_data_size =
-  (local_id_dwords + prog_data->nr_params) * 4;
-   const unsigned reg_aligned_constant_size = ALIGN(push_constant_data_size, 
32);
-   const unsigned param_aligned_count =
-  reg_aligned_constant_size / sizeof(uint32_t);
-
/* If we don't actually have any push constants, bail. */
-   if (reg_aligned_constant_size == 0)
+   if (cs_prog_data->push.total.size == 0)
   return (struct anv_state) { .offset = 0 };
 
-   const unsigned total_push_constants_size =
-  reg_aligned_constant_size * cs_prog_data->threads;
const unsigned push_constant_alignment =
   cmd_buffer->device->info.gen < 8 ? 32 : 64;
const unsigned aligned_total_push_constants_size =
-  ALIGN(total_push_constants_size, push_constant_alignment);
+  ALIGN(cs_prog_data->push.total.size, push_constant_alignment);
struct anv_state state =
   anv_cmd_buffer_alloc_dynamic_state(cmd_buffer,
  aligned_total_push_constants_size,
@@ -1090,21 +1081,36 @@ anv_cmd_buffer_cs_push_constants(struct anv_cmd_buffer 
*cmd_buffer)
/* Walk through the param array and fill the buffer with data */
uint32_t *u32_map = state.map;
 
-   brw_cs_fill_local_id_payload(cs_prog_data, u32_map, cs_prog_data->threads,
-reg_aligned_constant_size);
-
-   /* Setup uniform data for the first thread */
-   for (unsigned i = 0; i < prog_data->nr_params; i++) {
-  uint32_t offset = (uintptr_t)prog_data->param[i];
-  u32_map[local_id_dwords + i] = *(uint32_t *)((uint8_t *)data + offset);
+   if (cs_prog_data->push.cross_thread.size > 0) {
+  assert(cs_prog_data->thread_local_id_index < 0 ||
+ cs_prog_data->thread_local_id_index >=
+cs_prog_data->push.cross_thread.dwords);
+  for (unsigned i = 0;
+   i < cs_prog_data->push.cross_thread.dwords;
+   i++) {
+ uint32_t offset = (uintptr_t)prog_data->param[i];
+ u32_map[i] = *(uint32_t *)((uint8_t *)data + offset);
+  }
}
 
-   /* Copy uniform data from the first thread to every other thread */
-   const size_t uniform_data_size = prog_data->nr_params * sizeof(uint32_t);
-   for (unsigned t = 1; t < cs_prog_data->threads; t++) {
-  memcpy(&u32_map[t * param_aligned_count + local_id_dwords],
- &u32_map[local_id_dwords],
- uniform_data_size);
+   if (cs_prog_data->push.per_thread.size > 0) {
+  brw_cs_fill_local_id_payload(cs_prog_data, u32_map, 
cs_prog_data->threads,
+   cs_prog_data->push.per_thread.size);
+  for (unsigned t = 0; t < cs_prog_data->threads; t++) {
+ unsigned dst =
+8 * (cs_prog_data->push.per_thread.regs * t +
+ cs_prog_data->push.cross_thread.regs +
+ cs_prog_data->local_invocation_id_regs);
+ unsigned src = cs_prog_data->push.cross_thread.dwords;
+ for ( ; src < prog_data->nr_params; src++, dst++) {
+if (src != cs_prog_data->thread_local_id_index) {
+   uint32_t offset = (uintptr_t)prog_data->param[src];
+   u32_map[dst] = *(uint32_t *)((uint8_t *)data + offset);
+} else {
+   u32_map[dst] = t * cs_prog_data->simd_size;
+}
+ }
+  }
}
 
if (!cmd_buffer->device->info.has_llc)
diff --git a/src/intel/vulkan/gen7_cmd_buffer.c 
b/src/intel/vulkan/gen7_cmd_buffer.c
index 40ab008..478122b 100644
--- a/src/intel/vulkan/gen7_cmd_buffer.c
+++ b/src/intel/vulkan/gen7_cmd_buffer.c
@@ -234,12 +234,6 @@ flush_compute_descriptor_set(struct anv_cmd_buffer 
*cmd_buffer)
const struct brw_cs_prog_data *cs_prog_data = 

[Mesa-dev] [PATCH v4 05/12] i965: Put CS local thread ID uniform in last push register

2016-06-01 Thread Jordan Justen
This thread ID uniform will be used to compute the
gl_LocalInvocationIndex and gl_LocalInvocationID values.

It is important for this uniform to be added in the last push constant
register. fs_visitor::assign_constant_locations is updated to make
sure this happens.

The reason this is important is that the cross-thread push constant
registers are loaded first, and the per-thread push constant registers
are loaded after that. (Broadwell adds another push constant upload
mechanism which reverses this order, but we are ignoring this for
now.)
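
The ordering requirement can be sketched in isolation like this. It is
a simplified model under stated assumptions, not the code in
fs_visitor::assign_constant_locations: it ignores pull constants, array
chunks and the push limit, and the function name assign_push_locations
is hypothetical:

```c
#include <assert.h>

/* Give every live uniform a push slot, keeping the thread ID for last.
 * Returns the number of push slots used. push_loc[u] is -1 for uniforms
 * that are not pushed. thread_id_index may be -1 if there is no thread ID.
 */
static int
assign_push_locations(const int *is_live, unsigned count,
                      int thread_id_index, int *push_loc)
{
   int next = 0;
   assert(thread_id_index < (int)count);

   for (unsigned u = 0; u < count; u++) {
      push_loc[u] = -1;
      /* Skip dead uniforms, and skip the thread ID for now. */
      if (!is_live[u] || (int)u == thread_id_index)
         continue;
      push_loc[u] = next++;
   }

   /* Add the thread ID at the end so it lands in the last register. */
   if (thread_id_index >= 0 && is_live[thread_id_index])
      push_loc[thread_id_index] = next++;

   return next;
}
```

Because the ID is assigned after every other uniform, all registers
before the one containing it hold only values that are the same for
every thread.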

v2:
 * Add variable in intrinsics lowering pass
 * Make sure the ID is pushed last in assign_constant_locations, and
   that we save a spot for the ID in the push constants

v3:
 * Simplify code based on Jason's suggestions.

Signed-off-by: Jordan Justen 
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e8a3aab..bb1bf7a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2097,6 +2097,10 @@ fs_visitor::assign_constant_locations()
bool contiguous[uniforms];
memset(contiguous, 0, sizeof(contiguous));
 
+   int thread_local_id_index =
+  (stage == MESA_SHADER_COMPUTE) ?
+  ((brw_cs_prog_data*)stage_prog_data)->thread_local_id_index : -1;
+
/* First, we walk through the instructions and do two things:
 *
 *  1) Figure out which uniforms are live.
@@ -2141,6 +2145,9 @@ fs_visitor::assign_constant_locations()
   }
}
 
+   if (thread_local_id_index >= 0 && !is_live[thread_local_id_index])
+  thread_local_id_index = -1;
+
/* Only allow 16 registers (128 uniform components) as push constants.
 *
 * Just demote the end of the list.  We could probably do better
@@ -2149,7 +2156,9 @@ fs_visitor::assign_constant_locations()
 * If changing this value, note the limitation about total_regs in
 * brw_curbe.c.
 */
-   const unsigned int max_push_components = 16 * 8;
+   unsigned int max_push_components = 16 * 8;
+   if (thread_local_id_index >= 0)
+  max_push_components--; /* Save a slot for the thread ID */
 
/* We push small arrays, but no bigger than 16 floats.  This is big enough
 * for a vec4 but hopefully not large enough to push out other stuff.  We
@@ -2187,6 +2196,10 @@ fs_visitor::assign_constant_locations()
   if (!is_live[u] || is_live_64bit[u])
  continue;
 
+  /* Skip thread_local_id_index to put it in the last push register. */
+  if (thread_local_id_index == (int)u)
+ continue;
+
   set_push_pull_constant_loc(u, &chunk_start, contiguous[u],
  push_constant_loc, pull_constant_loc,
  &num_push_constants, &num_pull_constants,
@@ -2194,6 +2207,10 @@ fs_visitor::assign_constant_locations()
  stage_prog_data);
}
 
+   /* Add the CS local thread ID uniform at the end of the push constants */
+   if (thread_local_id_index >= 0)
+  push_constant_loc[thread_local_id_index] = num_push_constants++;
+
/* As the uniforms are going to be reordered, take the data from a temporary
 * copy of the original param[].
 */
@@ -2212,6 +2229,7 @@ fs_visitor::assign_constant_locations()
 * push_constant_loc[i] <= i and we can do it in one smooth loop without
 * having to make a copy.
 */
+   int new_thread_local_id_index = -1;
for (unsigned int i = 0; i < uniforms; i++) {
   const gl_constant_value *value = param[i];
 
@@ -2219,9 +2237,15 @@ fs_visitor::assign_constant_locations()
  stage_prog_data->pull_param[pull_constant_loc[i]] = value;
   } else if (push_constant_loc[i] != -1) {
  stage_prog_data->param[push_constant_loc[i]] = value;
+ if (thread_local_id_index == (int)i)
+new_thread_local_id_index = push_constant_loc[i];
   }
}
ralloc_free(param);
+
+   if (stage == MESA_SHADER_COMPUTE)
+  ((brw_cs_prog_data*)stage_prog_data)->thread_local_id_index =
+ new_thread_local_id_index;
 }
 
 /**
-- 
2.8.1



[Mesa-dev] [PATCH v4 09/12] i965: Support new local ID push constant & cross-thread constants

2016-06-01 Thread Jordan Justen
Cross-thread constant support first appears on Haswell. It allows us to
upload one set of uniform data for all threads without duplicating it
per thread.

We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.
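
As a rough illustration of the saving described above, here is a minimal
sketch in C of how the CURBE register count changes between the two layouts.
The struct and function names are hypothetical stand-ins, not the driver's
own; only the arithmetic follows the vfe_curbe_allocation hunk in this patch.

```c
#include <assert.h>

/* Hypothetical stand-in for the push-constant bookkeeping kept in
 * brw_cs_prog_data; only the fields this sketch needs are included. */
struct cs_push_layout {
   unsigned cross_thread_regs; /* registers shared by all threads */
   unsigned per_thread_regs;   /* registers uploaded once per thread */
   unsigned threads;           /* hardware threads per workgroup */
};

/* Round v up to a multiple of a, where a must be a power of two. */
static unsigned align_up(unsigned v, unsigned a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

/* Old layout: every thread receives its own copy of all push registers. */
static unsigned curbe_regs_duplicated(unsigned regs_per_thread,
                                      unsigned threads)
{
   return regs_per_thread * threads;
}

/* New layout: one shared block plus one per-thread block for each thread,
 * rounded up to an even register count as in the patch. */
static unsigned curbe_regs_cross_thread(const struct cs_push_layout *l)
{
   return align_up(l->per_thread_regs * l->threads + l->cross_thread_regs, 2);
}
```

With, say, 3 shared registers, 1 per-thread register and 8 threads, the
duplicated layout needs 4 * 8 = 32 registers, while the cross-thread layout
needs ALIGN(1 * 8 + 3, 2) = 12.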

v4:
 * Support the old local ID push constant layout as well (Jason)

Signed-off-by: Jordan Justen 
---
 src/mesa/drivers/dri/i965/brw_defines.h   |  3 +
 src/mesa/drivers/dri/i965/gen7_cs_state.c | 99 +--
 2 files changed, 56 insertions(+), 46 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 4eb6b1f..e7d1a9f 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -2943,6 +2943,9 @@ enum brw_wm_barycentric_interp_mode {
 # define MEDIA_GPGPU_THREAD_COUNT_MASK  INTEL_MASK(7, 0)
 # define GEN8_MEDIA_GPGPU_THREAD_COUNT_SHIFT0
 # define GEN8_MEDIA_GPGPU_THREAD_COUNT_MASK INTEL_MASK(9, 0)
+/* GEN7 DW6, GEN8+ DW7 */
+# define CROSS_THREAD_READ_LENGTH_SHIFT 0
+# define CROSS_THREAD_READ_LENGTH_MASK  INTEL_MASK(7, 0)
 #define MEDIA_STATE_FLUSH   0x7004
 #define GPGPU_WALKER0x7105
 /* GEN7 DW0 */
diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c 
b/src/mesa/drivers/dri/i965/gen7_cs_state.c
index 619edfb..2fee02d 100644
--- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
@@ -42,7 +42,6 @@ brw_upload_cs_state(struct brw_context *brw)
uint32_t offset;
uint32_t *desc = (uint32_t*) brw_state_batch(brw, AUB_TRACE_SURFACE_STATE,
 8 * 4, 64, &offset);
-   struct gl_program *prog = (struct gl_program *) brw->compute_program;
struct brw_stage_state *stage_state = &brw->cs.base;
struct brw_cs_prog_data *cs_prog_data = brw->cs.prog_data;
struct brw_stage_prog_data *prog_data = &cs_prog_data->base;
@@ -59,16 +58,6 @@ brw_upload_cs_state(struct brw_context *brw)
 
prog_data->binding_table.size_bytes,
 32, &stage_state->bind_bo_offset);
 
-   unsigned local_id_dwords = 0;
-
-   if (prog->SystemValuesRead & SYSTEM_BIT_LOCAL_INVOCATION_ID)
-  local_id_dwords = cs_prog_data->local_invocation_id_regs * 8;
-
-   unsigned push_constant_data_size =
-  (prog_data->nr_params + local_id_dwords) * sizeof(gl_constant_value);
-   unsigned reg_aligned_constant_size = ALIGN(push_constant_data_size, 32);
-   unsigned push_constant_regs = reg_aligned_constant_size / 32;
-
uint32_t dwords = brw->gen < 8 ? 8 : 9;
BEGIN_BATCH(dwords);
OUT_BATCH(MEDIA_VFE_STATE << 16 | (dwords - 2));
@@ -118,7 +107,8 @@ brw_upload_cs_state(struct brw_context *brw)
 * Note: The constant data is built in brw_upload_cs_push_constants below.
 */
const uint32_t vfe_curbe_allocation =
-  push_constant_regs * cs_prog_data->threads;
+  ALIGN(cs_prog_data->push.per_thread.regs * cs_prog_data->threads +
+cs_prog_data->push.cross_thread.regs, 2);
OUT_BATCH(SET_FIELD(vfe_urb_allocation, MEDIA_VFE_STATE_URB_ALLOC) |
  SET_FIELD(vfe_curbe_allocation, MEDIA_VFE_STATE_CURBE_ALLOC));
OUT_BATCH(0);
@@ -126,11 +116,11 @@ brw_upload_cs_state(struct brw_context *brw)
OUT_BATCH(0);
ADVANCE_BATCH();
 
-   if (reg_aligned_constant_size > 0) {
+   if (cs_prog_data->push.total.size > 0) {
   BEGIN_BATCH(4);
   OUT_BATCH(MEDIA_CURBE_LOAD << 16 | (4 - 2));
   OUT_BATCH(0);
-  OUT_BATCH(ALIGN(reg_aligned_constant_size * cs_prog_data->threads, 64));
+  OUT_BATCH(ALIGN(cs_prog_data->push.total.size, 64));
   OUT_BATCH(stage_state->push_const_offset);
   ADVANCE_BATCH();
}
@@ -149,7 +139,8 @@ brw_upload_cs_state(struct brw_context *brw)
desc[dw++] = stage_state->sampler_offset |
   ((stage_state->sampler_count + 3) / 4);
desc[dw++] = stage_state->bind_bo_offset;
-   desc[dw++] = SET_FIELD(push_constant_regs, MEDIA_CURBE_READ_LENGTH);
+   desc[dw++] = SET_FIELD(cs_prog_data->push.per_thread.regs,
+  MEDIA_CURBE_READ_LENGTH);
const uint32_t media_threads =
   brw->gen >= 8 ?
   SET_FIELD(cs_prog_data->threads, GEN8_MEDIA_GPGPU_THREAD_COUNT) :
@@ -171,6 +162,10 @@ brw_upload_cs_state(struct brw_context *brw)
   SET_FIELD(slm_size, MEDIA_SHARED_LOCAL_MEMORY_SIZE) |
   media_threads;
 
+   desc[dw++] =
+  SET_FIELD(cs_prog_data->push.cross_thread.regs,
+CROSS_THREAD_READ_LENGTH);
+
BEGIN_BATCH(4);
OUT_BATCH(MEDIA_INTERFACE_DESCRIPTOR_LOAD << 16 | (4 - 2));
OUT_BATCH(0);
@@ -213,10 +208,6 @@ brw_upload_cs_push_constants(struct brw_context *brw,
struct gl_context *ctx = &brw->ctx;
const struct brw_stage_prog_data *prog_data =
   (struct 

[Mesa-dev] [PATCH v4 12/12] i965: Remove old CS local ID handling

2016-06-01 Thread Jordan Justen
The old method pushed each channel's uvec3 data of
gl_LocalInvocationID.

The new method pushes 1 dword of data that is a 'thread local ID'
value. Based on that value, we can generate gl_LocalInvocationIndex
and gl_LocalInvocationID with some calculations.
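
The "some calculations" can be sketched in C roughly as follows. This is an
illustration under stated assumptions, not the driver's code: it assumes the
pushed thread local ID is the invocation index of the thread's first SIMD
channel, and the uvec3 struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical three-component unsigned vector, standing in for uvec3. */
struct uvec3 { unsigned x, y, z; };

/* Assumption: thread_local_id is the invocation index of the thread's first
 * SIMD channel, so adding the channel number gives gl_LocalInvocationIndex. */
static unsigned local_invocation_index(unsigned thread_local_id,
                                       unsigned channel)
{
   return thread_local_id + channel;
}

/* Invert index = z * size.x * size.y + y * size.x + x to recover
 * gl_LocalInvocationID from the index and the workgroup size. */
static struct uvec3 local_invocation_id(unsigned index, struct uvec3 size)
{
   assert(size.x > 0 && size.y > 0 && size.z > 0);
   struct uvec3 id;
   id.x = index % size.x;
   id.y = (index / size.x) % size.y;
   id.z = index / (size.x * size.y);
   return id;
}
```

For a 4x2x2 workgroup, a thread whose pushed value is 8 sees index 11 in
channel 3, which maps back to the ID (3, 0, 1).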

Signed-off-by: Jordan Justen 
---
 src/intel/vulkan/anv_cmd_buffer.c  |  5 +-
 src/mesa/drivers/dri/i965/brw_compiler.h   |  8 ---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 94 +-
 src/mesa/drivers/dri/i965/brw_fs.h |  1 -
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   |  7 --
 src/mesa/drivers/dri/i965/brw_nir_intrinsics.c |  7 --
 src/mesa/drivers/dri/i965/gen7_cs_state.c  |  5 +-
 7 files changed, 3 insertions(+), 124 deletions(-)

diff --git a/src/intel/vulkan/anv_cmd_buffer.c 
b/src/intel/vulkan/anv_cmd_buffer.c
index edaaa3d..3d37de2 100644
--- a/src/intel/vulkan/anv_cmd_buffer.c
+++ b/src/intel/vulkan/anv_cmd_buffer.c
@@ -1094,13 +1094,10 @@ anv_cmd_buffer_cs_push_constants(struct anv_cmd_buffer 
*cmd_buffer)
}
 
if (cs_prog_data->push.per_thread.size > 0) {
-  brw_cs_fill_local_id_payload(cs_prog_data, u32_map, 
cs_prog_data->threads,
-   cs_prog_data->push.per_thread.size);
   for (unsigned t = 0; t < cs_prog_data->threads; t++) {
  unsigned dst =
 8 * (cs_prog_data->push.per_thread.regs * t +
- cs_prog_data->push.cross_thread.regs +
- cs_prog_data->local_invocation_id_regs);
+ cs_prog_data->push.cross_thread.regs);
  unsigned src = cs_prog_data->push.cross_thread.dwords;
  for ( ; src < prog_data->nr_params; src++, dst++) {
 if (src != cs_prog_data->thread_local_id_index) {
diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h 
b/src/mesa/drivers/dri/i965/brw_compiler.h
index dda6297..6e6d20c 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -439,7 +439,6 @@ struct brw_cs_prog_data {
unsigned threads;
bool uses_barrier;
bool uses_num_work_groups;
-   unsigned local_invocation_id_regs;
int thread_local_id_index;
 
struct {
@@ -831,13 +830,6 @@ brw_compile_cs(const struct brw_compiler *compiler, void 
*log_data,
unsigned *final_assembly_size,
char **error_str);
 
-/**
- * Fill out local id payload for compute shader according to cs_prog_data.
- */
-void
-brw_cs_fill_local_id_payload(const struct brw_cs_prog_data *cs_prog_data,
- void *buffer, uint32_t threads, uint32_t stride);
-
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 3dd795e..55d600a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -5573,31 +5573,6 @@ fs_visitor::setup_vs_payload()
payload.num_regs = 2;
 }
 
-/**
- * We are building the local ID push constant data using the simplest possible
- * method. We simply push the local IDs directly as they should appear in the
- * registers for the uvec3 gl_LocalInvocationID variable.
- *
- * Therefore, for SIMD8, we use 3 full registers, and for SIMD16 we use 6
- * registers worth of push constant space.
- *
- * Note: Any updates to brw_cs_prog_local_id_payload_dwords,
- * fill_local_id_payload or fs_visitor::emit_cs_local_invocation_id_setup need
- * to coordinated.
- *
- * FINISHME: There are a few easy optimizations to consider.
- *
- * 1. If gl_WorkGroupSize x, y or z is 1, we can just use zero, and there is
- *no need for using push constant space for that dimension.
- *
- * 2. Since GL_MAX_COMPUTE_WORK_GROUP_SIZE is currently 1024 or less, we can
- *easily use 16-bit words rather than 32-bit dwords in the push constant
- *data.
- *
- * 3. If gl_WorkGroupSize x, y or z is small, then we can use bytes for
- *conveying the data, and thereby reduce push constant usage.
- *
- */
 void
 fs_visitor::setup_gs_payload()
 {
@@ -5641,16 +5616,7 @@ void
 fs_visitor::setup_cs_payload()
 {
assert(devinfo->gen >= 7);
-   brw_cs_prog_data *prog_data = (brw_cs_prog_data*) this->prog_data;
-
payload.num_regs = 1;
-
-   if (nir->info.system_values_read & SYSTEM_BIT_LOCAL_INVOCATION_ID &&
-   prog_data->thread_local_id_index < 0) {
-  prog_data->local_invocation_id_regs = dispatch_width * 3 / 8;
-  payload.local_invocation_id_reg = payload.num_regs;
-  payload.num_regs += prog_data->local_invocation_id_regs;
-   }
 }
 
 void
@@ -6525,25 +6491,6 @@ brw_compile_fs(const struct brw_compiler *compiler, void 
*log_data,
 }
 
 fs_reg *
-fs_visitor::emit_cs_local_invocation_id_setup()
-{
-   assert(stage == MESA_SHADER_COMPUTE);
-
-   fs_reg *reg = new(this->mem_ctx) fs_reg(vgrf(glsl_type::uvec3_type));
-
-   struct brw_reg src =
-  brw_vec8_grf(payload.local_invocation_id_reg, 0);
-   src = retype(src, 

[Mesa-dev] [PATCH v4 02/12] nir: Make lowering gl_LocalInvocationIndex optional

2016-06-01 Thread Jordan Justen
Signed-off-by: Jordan Justen 
---
 src/compiler/nir/nir.c |  4 
 src/compiler/nir/nir.h |  2 ++
 src/compiler/nir/nir_gather_info.c |  1 +
 src/compiler/nir/nir_intrinsics.h  |  1 +
 src/compiler/nir/nir_lower_system_values.c | 16 
 src/mesa/drivers/dri/i965/brw_compiler.c   |  3 ++-
 6 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c
index 2741eb6..3c8b4e0 100644
--- a/src/compiler/nir/nir.c
+++ b/src/compiler/nir/nir.c
@@ -1752,6 +1752,8 @@ nir_intrinsic_from_system_value(gl_system_value val)
   return nir_intrinsic_load_sample_mask_in;
case SYSTEM_VALUE_LOCAL_INVOCATION_ID:
   return nir_intrinsic_load_local_invocation_id;
+   case SYSTEM_VALUE_LOCAL_INVOCATION_INDEX:
+  return nir_intrinsic_load_local_invocation_index;
case SYSTEM_VALUE_WORK_GROUP_ID:
   return nir_intrinsic_load_work_group_id;
case SYSTEM_VALUE_NUM_WORK_GROUPS:
@@ -1801,6 +1803,8 @@ nir_system_value_from_intrinsic(nir_intrinsic_op intrin)
   return SYSTEM_VALUE_SAMPLE_MASK_IN;
case nir_intrinsic_load_local_invocation_id:
   return SYSTEM_VALUE_LOCAL_INVOCATION_ID;
+   case nir_intrinsic_load_local_invocation_index:
+  return SYSTEM_VALUE_LOCAL_INVOCATION_INDEX;
case nir_intrinsic_load_num_work_groups:
   return SYSTEM_VALUE_NUM_WORK_GROUPS;
case nir_intrinsic_load_work_group_id:
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 2e1bdfb..20f6520 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1682,6 +1682,8 @@ typedef struct nir_shader_compiler_options {
 
/* Indicates that the driver only has zero-based vertex id */
bool vertex_id_zero_based;
+
+   bool lower_cs_local_index_from_id;
 } nir_shader_compiler_options;
 
 typedef struct nir_shader_info {
diff --git a/src/compiler/nir/nir_gather_info.c 
b/src/compiler/nir/nir_gather_info.c
index 7900fd1..15a9a4f 100644
--- a/src/compiler/nir/nir_gather_info.c
+++ b/src/compiler/nir/nir_gather_info.c
@@ -44,6 +44,7 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, nir_shader 
*shader)
case nir_intrinsic_load_primitive_id:
case nir_intrinsic_load_invocation_id:
case nir_intrinsic_load_local_invocation_id:
+   case nir_intrinsic_load_local_invocation_index:
case nir_intrinsic_load_work_group_id:
case nir_intrinsic_load_num_work_groups:
   shader->info.system_values_read |=
diff --git a/src/compiler/nir/nir_intrinsics.h 
b/src/compiler/nir/nir_intrinsics.h
index bd00fbb..aeb6038 100644
--- a/src/compiler/nir/nir_intrinsics.h
+++ b/src/compiler/nir/nir_intrinsics.h
@@ -299,6 +299,7 @@ SYSTEM_VALUE(tess_level_outer, 4, 0, xx, xx, xx)
 SYSTEM_VALUE(tess_level_inner, 2, 0, xx, xx, xx)
 SYSTEM_VALUE(patch_vertices_in, 1, 0, xx, xx, xx)
 SYSTEM_VALUE(local_invocation_id, 3, 0, xx, xx, xx)
+SYSTEM_VALUE(local_invocation_index, 1, 0, xx, xx, xx)
 SYSTEM_VALUE(work_group_id, 3, 0, xx, xx, xx)
 SYSTEM_VALUE(user_clip_plane, 4, 1, UCP_ID, xx, xx)
 SYSTEM_VALUE(num_work_groups, 3, 0, xx, xx, xx)
diff --git a/src/compiler/nir/nir_lower_system_values.c 
b/src/compiler/nir/nir_lower_system_values.c
index 8310e38..3ca8e08 100644
--- a/src/compiler/nir/nir_lower_system_values.c
+++ b/src/compiler/nir/nir_lower_system_values.c
@@ -48,7 +48,7 @@ convert_block(nir_block *block, nir_builder *b)
 
   b->cursor = nir_after_instr(_var->instr);
 
-  nir_ssa_def *sysval;
+  nir_ssa_def *sysval = NULL;
   switch (var->data.location) {
   case SYSTEM_VALUE_GLOBAL_INVOCATION_ID: {
  /* From the GLSL man page for gl_GlobalInvocationID:
@@ -74,6 +74,12 @@ convert_block(nir_block *block, nir_builder *b)
   }
 
   case SYSTEM_VALUE_LOCAL_INVOCATION_INDEX: {
+ /* If lower_cs_local_index_from_id is true, then we derive the local
+  * index from the local id.
+  */
+ if (!b->shader->options->lower_cs_local_index_from_id)
+break;
+
  /* From the GLSL man page for gl_LocalInvocationIndex:
   *
   *"The value of gl_LocalInvocationIndex is equal to
@@ -111,12 +117,14 @@ convert_block(nir_block *block, nir_builder *b)
 nir_load_system_value(b, nir_intrinsic_load_base_instance, 0));
  break;
 
-  default: {
+  default:
+ break;
+  }
+
+  if (sysval == NULL) {
  nir_intrinsic_op sysval_op =
 nir_intrinsic_from_system_value(var->data.location);
  sysval = nir_load_system_value(b, sysval_op, 0);
- break;
-  } /* default */
   }
 
   nir_ssa_def_rewrite_uses(_var->dest.ssa, nir_src_for_ssa(sysval));
diff --git a/src/mesa/drivers/dri/i965/brw_compiler.c 
b/src/mesa/drivers/dri/i965/brw_compiler.c
index a4855a0..bb06733 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.c
+++ b/src/mesa/drivers/dri/i965/brw_compiler.c
@@ -40,7 +40,8 @@
.lower_fdiv = 

[Mesa-dev] [PATCH v4 00/12] Rework CS local IDs for gen7+

2016-06-01 Thread Jordan Justen
git://people.freedesktop.org/~jljusten/mesa hsw-cs-cross-thread-constants-v4

v4:
 * Support both the old and new layouts until the switch-over to the
   new layout. This minimizes the size of the switch over patch.
   (Jason)

v3:
 * https://lists.freedesktop.org/archives/mesa-dev/2016-May/118722.html

v2:
 * https://lists.freedesktop.org/archives/mesa-dev/2016-May/118566.html

v1:
 * https://lists.freedesktop.org/archives/mesa-dev/2016-May/117952.html


Jordan Justen (12):
  glsl: Add glsl LowerCsDerivedVariables option
  nir: Make lowering gl_LocalInvocationIndex optional
  i965: Add nir channel_num system value
  i965: Add uniform for a CS thread local base ID
  i965: Put CS local thread ID uniform in last push register
  i965: Add nir based intrinsic lowering and thread ID uniform
  i965: Store number of threads in brw_cs_prog_data
  i965: Add CS push constant info to brw_cs_prog_data
  i965: Support new local ID push constant & cross-thread constants
  anv: Support new local ID generation & cross-thread constants
  i965: Enable cross-thread constants and compact local IDs for hsw+
  i965: Remove old CS local ID handling

 src/compiler/glsl/builtin_variables.cpp|  29 ++--
 src/compiler/glsl/glsl_parser_extras.cpp   |   2 +-
 src/compiler/glsl/ir.h |   3 +-
 src/compiler/nir/nir.c |   4 +
 src/compiler/nir/nir.h |   2 +
 src/compiler/nir/nir_gather_info.c |   1 +
 src/compiler/nir/nir_intrinsics.h  |   2 +
 src/compiler/nir/nir_lower_system_values.c |  16 +-
 src/intel/vulkan/anv_cmd_buffer.c  |  52 +++
 src/intel/vulkan/anv_pipeline.c|   4 +
 src/intel/vulkan/anv_private.h |   1 -
 src/intel/vulkan/gen7_cmd_buffer.c |  15 +-
 src/intel/vulkan/gen8_cmd_buffer.c |  13 +-
 src/intel/vulkan/genX_cmd_buffer.c |   4 +-
 src/intel/vulkan/genX_pipeline.c   |  12 +-
 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/brw_compiler.h   |  22 ++-
 src/mesa/drivers/dri/i965/brw_cs.c |   3 +
 src/mesa/drivers/dri/i965/brw_defines.h|   3 +
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 197 +
 src/mesa/drivers/dri/i965/brw_fs.h |   1 -
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp   |  22 ++-
 src/mesa/drivers/dri/i965/brw_nir.h|   2 +
 src/mesa/drivers/dri/i965/brw_nir_intrinsics.c | 179 ++
 src/mesa/drivers/dri/i965/gen7_cs_state.c  | 124 
 src/mesa/main/mtypes.h |   3 +
 src/mesa/state_tracker/st_extensions.c |   1 +
 27 files changed, 472 insertions(+), 246 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_nir_intrinsics.c

-- 
2.8.1



[Mesa-dev] [PATCH v4 01/12] glsl: Add glsl LowerCsDerivedVariables option

2016-06-01 Thread Jordan Justen
v2:
 * Move lower flag to context constants. (Ken)

Signed-off-by: Jordan Justen 
Reviewed-by: Kenneth Graunke  (v1)
---
 src/compiler/glsl/builtin_variables.cpp  | 29 ++---
 src/compiler/glsl/glsl_parser_extras.cpp |  2 +-
 src/compiler/glsl/ir.h   |  3 ++-
 src/mesa/drivers/dri/i965/brw_context.c  |  1 +
 src/mesa/main/mtypes.h   |  3 +++
 src/mesa/state_tracker/st_extensions.c   |  1 +
 6 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/src/compiler/glsl/builtin_variables.cpp 
b/src/compiler/glsl/builtin_variables.cpp
index 401c713..05b3b0b 100644
--- a/src/compiler/glsl/builtin_variables.cpp
+++ b/src/compiler/glsl/builtin_variables.cpp
@@ -1201,8 +1201,15 @@ builtin_variable_generator::generate_cs_special_vars()
 "gl_LocalInvocationID");
add_system_value(SYSTEM_VALUE_WORK_GROUP_ID, uvec3_t, "gl_WorkGroupID");
add_system_value(SYSTEM_VALUE_NUM_WORK_GROUPS, uvec3_t, "gl_NumWorkGroups");
-   add_variable("gl_GlobalInvocationID", uvec3_t, ir_var_auto, 0);
-   add_variable("gl_LocalInvocationIndex", uint_t, ir_var_auto, 0);
+   if (state->ctx->Const.LowerCsDerivedVariables) {
+  add_variable("gl_GlobalInvocationID", uvec3_t, ir_var_auto, 0);
+  add_variable("gl_LocalInvocationIndex", uint_t, ir_var_auto, 0);
+   } else {
+  add_system_value(SYSTEM_VALUE_GLOBAL_INVOCATION_ID,
+   uvec3_t, "gl_GlobalInvocationID");
+  add_system_value(SYSTEM_VALUE_LOCAL_INVOCATION_INDEX,
+   uint_t, "gl_LocalInvocationIndex");
+   }
 }
 
 
@@ -1431,16 +1438,16 @@ initialize_cs_derived_variables(gl_shader *shader,
  * These are initialized in the main function.
  */
 void
-_mesa_glsl_initialize_derived_variables(gl_shader *shader)
+_mesa_glsl_initialize_derived_variables(struct gl_context *ctx,
+gl_shader *shader)
 {
/* We only need to set CS variables currently. */
-   if (shader->Stage != MESA_SHADER_COMPUTE)
-  return;
+   if (shader->Stage == MESA_SHADER_COMPUTE &&
+   ctx->Const.LowerCsDerivedVariables) {
+  ir_function_signature *const main_sig =
+ _mesa_get_main_function_signature(shader);
 
-   ir_function_signature *const main_sig =
-  _mesa_get_main_function_signature(shader);
-   if (main_sig == NULL)
-  return;
-
-   initialize_cs_derived_variables(shader, main_sig);
+  if (main_sig != NULL)
+ initialize_cs_derived_variables(shader, main_sig);
+   }
 }
diff --git a/src/compiler/glsl/glsl_parser_extras.cpp 
b/src/compiler/glsl/glsl_parser_extras.cpp
index 2e3395e..c9654ac 100644
--- a/src/compiler/glsl/glsl_parser_extras.cpp
+++ b/src/compiler/glsl/glsl_parser_extras.cpp
@@ -1907,7 +1907,7 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct 
gl_shader *shader,
   }
}
 
-   _mesa_glsl_initialize_derived_variables(shader);
+   _mesa_glsl_initialize_derived_variables(ctx, shader);
 
delete state->symbols;
ralloc_free(state);
diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h
index e8efd27..93716c4 100644
--- a/src/compiler/glsl/ir.h
+++ b/src/compiler/glsl/ir.h
@@ -2562,7 +2562,8 @@ _mesa_glsl_initialize_variables(exec_list *instructions,
struct _mesa_glsl_parse_state *state);
 
 extern void
-_mesa_glsl_initialize_derived_variables(gl_shader *shader);
+_mesa_glsl_initialize_derived_variables(struct gl_context *ctx,
+gl_shader *shader);
 
 extern void
 _mesa_glsl_initialize_functions(_mesa_glsl_parse_state *state);
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 97dc226..ad8d514 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -599,6 +599,7 @@ brw_initialize_context_constants(struct brw_context *brw)
   ctx->Const.MaxClipPlanes = 8;
 
ctx->Const.LowerTessLevel = true;
+   ctx->Const.LowerCsDerivedVariables = true;
ctx->Const.PrimitiveRestartForPatches = true;
 
ctx->Const.Program[MESA_SHADER_VERTEX].MaxNativeInstructions = 16 * 1024;
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 2233526..d0f3760 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3763,6 +3763,9 @@ struct gl_constants
GLuint MaxTessControlTotalOutputComponents;
bool LowerTessLevel; /**< Lower gl_TessLevel* from float[n] to vecn? */
bool PrimitiveRestartForPatches;
+   bool LowerCsDerivedVariables;/**< Lower gl_GlobalInvocationID and
+ *   gl_LocalInvocationIndex based on
+ *   other builtin variables. */
 };
 
 
diff --git a/src/mesa/state_tracker/st_extensions.c 
b/src/mesa/state_tracker/st_extensions.c
index d35e19f..383983b 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c

Re: [Mesa-dev] [PATCH 3/5] mesa: Fix add_index_to_name logic

2016-06-01 Thread Ian Romanick
On 05/31/2016 04:45 PM, Ian Romanick wrote:
> On 05/31/2016 02:44 PM, Timothy Arceri wrote:
>> On Tue, 2016-05-31 at 11:52 -0700, Ian Romanick wrote:
>>> From: Ian Romanick 
>>>
>>> Our piglit tests for geometry and tessellation shader inputs were
>>> incorrect.  Array shader inputs and outputs should have '[0]' on the
>>> end
>>> regardless of stage.  In addition, transform feedback varyings should
>>> not.
>>
>> Is there a spec quote for this? It doesn't seem right to me since for
>> arrays of arrays would that mean we should end up with gs inputs like
>> this
> 
> Here are all the rules that I think apply:
> 
>   * For an active variable declared as an array of basic types, a single
> entry will be generated, with its name string formed by concatenating
> the name of the array and the string "[0]".
> 
>   * For an active variable declared as a structure, a separate entry will
> be generated for each active structure member.  The name of each entry
> is formed by concatenating the name of the structure, the "."
> character, and the name of the structure member.  If a structure
> member to enumerate is itself a structure or array, these enumeration
> rules are applied recursively.
> 
>   * For an active variable declared as an array of an aggregate data type
> (structures or arrays), a separate entry will be generated for each
> active array element, unless noted immediately below.  The name of
> each entry is formed by concatenating the name of the array, the "["
> character, an integer identifying the element number, and the "]"
> character.  These enumeration rules are applied recursively, treating
> each enumerated array element as a separate active variable.
> 
>> input_name[0][0]
>> input_name[...][0]
>> input_name[num_vertices-1][0]
> 
> Yes, this is correct. We don't do this with or without this patch. I
> don't know of any tests that exercise this.  Alas.
> 
>> otherwise
>>
>> in vec4 input1[];
>> and
>> in vec4 input2[][3];
>>
>> Would both end up as:
>> input1[0]
>> input2[0]
>>
>>>
>>> Signed-off-by: Ian Romanick 
>>> Cc: "12.0" 
>>> ---
>>>  src/mesa/main/shader_query.cpp | 23 ++-
>>>  1 file changed, 10 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/src/mesa/main/shader_query.cpp
>>> b/src/mesa/main/shader_query.cpp
>>> index eec933c..f4b7243 100644
>>> --- a/src/mesa/main/shader_query.cpp
>>> +++ b/src/mesa/main/shader_query.cpp
>>> @@ -696,20 +696,17 @@ _mesa_program_resource_find_index(struct
>>> gl_shader_program *shProg,
>>>  static bool
>>>  add_index_to_name(struct gl_program_resource *res)
>>>  {
>>> -   bool add_index = !((res->Type == GL_PROGRAM_INPUT &&
>>> -   res->StageReferences & (1 <<
>>> MESA_SHADER_GEOMETRY |
>>> -   1 <<
>>> MESA_SHADER_TESS_CTRL |
>>> -   1 <<
>>> MESA_SHADER_TESS_EVAL)) ||
>>> -  (res->Type == GL_PROGRAM_OUTPUT &&
>>> -   res->StageReferences & 1 <<
>>> MESA_SHADER_TESS_CTRL));
>>> -
>>> -   /* Transform feedback varyings have array index already appended
>>> -* in their names.
>>> -*/
>>> -   if (res->Type == GL_TRANSFORM_FEEDBACK_VARYING)
>>> -  add_index = false;
>>> +   if (res->Type != GL_PROGRAM_INPUT && res->Type !=
>>> GL_PROGRAM_OUTPUT)
>>> +  return res->Type != GL_TRANSFORM_FEEDBACK_VARYING;
>>
>> I'm slightly confused by this. When does this return true? And for
>> transform feedback won't this always end up as false?
>>
>> So isn't it just 
>>
>>if (res->Type == GL_TRANSFORM_FEEDBACK_VARYING)
>>   return false;
> 
> I thought I tried that but it regressed some dEQP tests.  I'll double
> check.

It makes a huge pile of dEQP tests fail because lots of things,
including UBOs and SSBOs, come through this function.  Inputs and
outputs get some special treatment because of the array-of-interface
handling, and xfb variables never get the [0] suffix.  Everything else
gets the [0] suffix.
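
The decision described here can be sketched in plain C as below. The enum and
struct are hypothetical simplifications (the real function inspects
gl_program_resource and gl_shader_variable), and the sketch inherits the
limitation Timothy points out further down: it does not handle interface
blocks that are themselves arrays of arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource kinds; the real code tests GL enums such as
 * GL_PROGRAM_INPUT and GL_TRANSFORM_FEEDBACK_VARYING. */
enum resource_kind {
   RES_PROGRAM_INPUT,
   RES_PROGRAM_OUTPUT,
   RES_TRANSFORM_FEEDBACK_VARYING,
   RES_OTHER /* uniforms, UBO and SSBO members, and so on */
};

/* Hypothetical summary of an array-typed input or output variable. */
struct array_var {
   bool in_interface_array; /* declared in an interface block array */
   bool element_is_array;   /* the array's element type is itself an array */
};

/* Returns true when "[0]" should be appended to the resource name. */
static bool add_index_to_name(enum resource_kind kind,
                              const struct array_var *var)
{
   /* Everything except inputs and outputs gets the suffix, apart from
    * xfb varyings, which never do. */
   if (kind != RES_PROGRAM_INPUT && kind != RES_PROGRAM_OUTPUT)
      return kind != RES_TRANSFORM_FEEDBACK_VARYING;

   assert(var != NULL);

   /* For a member of an interface block array, the outer dimension comes
    * from the block, so only a remaining inner array earns the suffix. */
   if (var->in_interface_array)
      return var->element_is_array;

   return true;
}
```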

>>> +
>>> +   const gl_shader_variable *const var = RESOURCE_VAR(res);
>>>  
>>> -   return add_index;
>>> +   assert(var->type->is_array());
>>> +
>>> +   if (var->interface_type != NULL && var->interface_type->is_array())
>>> +  return var->type->fields.array->is_array();
>>> +
>>
>> So I'm assuming you're doing this since after lowering block[3].foo we
>> end up with foo[3]. However something like block[3].foo[2] will end up
>> as foo[3][2] and blocks can also be arrays of arrays so I'm not sure
>> this will work.
> 
> Correct.  I had forgotten about blocks being arrays-of-arrays.  That
> should be easy enough to fix.
> 
>>> +   return true;
>>>  }
>>>  
>>>  /* Get name length of a program resource. This consists of
> 
> 

[Mesa-dev] [PATCH 7/9] anv/pipeline: Silently pass tests if depth or stencil is missing

2016-06-01 Thread Jason Ekstrand
Signed-off-by: Jason Ekstrand 
Cc: "12.0" 
Cc: Ian Romanick 
---
 src/intel/vulkan/gen7_pipeline.c  | 12 ++--
 src/intel/vulkan/gen8_pipeline.c  | 12 ++--
 src/intel/vulkan/genX_pipeline_util.h | 30 +-
 3 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/src/intel/vulkan/gen7_pipeline.c b/src/intel/vulkan/gen7_pipeline.c
index 243b18b..0d2d086 100644
--- a/src/intel/vulkan/gen7_pipeline.c
+++ b/src/intel/vulkan/gen7_pipeline.c
@@ -155,6 +155,8 @@ genX(graphics_pipeline_create)(
 VkPipeline* pPipeline)
 {
ANV_FROM_HANDLE(anv_device, device, _device);
+   ANV_FROM_HANDLE(anv_render_pass, pass, pCreateInfo->renderPass);
+   struct anv_subpass *subpass = &pass->subpasses[pCreateInfo->subpass];
struct anv_pipeline *pipeline;
VkResult result;
 
@@ -178,7 +180,7 @@ genX(graphics_pipeline_create)(
assert(pCreateInfo->pRasterizationState);
gen7_emit_rs_state(pipeline, pCreateInfo->pRasterizationState, extra);
 
-   emit_ds_state(pipeline, pCreateInfo->pDepthStencilState);
+   emit_ds_state(pipeline, pCreateInfo->pDepthStencilState, pass, subpass);
 
gen7_emit_cb_state(pipeline, pCreateInfo->pColorBlendState,
 pCreateInfo->pMultisampleState);
@@ -369,10 +371,16 @@ genX(graphics_pipeline_create)(
  wm.PixelShaderUsesSourceW  = wm_prog_data->uses_src_w;
  wm.PixelShaderUsesInputCoverageMask= 
wm_prog_data->uses_sample_mask;
 
+ /* TODO: We could probably do something a bit more intelligent here.
+  * However, CTS tests expect that if early fragment tests are not
+  * performed, the shader *will* be executed for every fragment.  In
+  * order to work around this we would have to check whether or not
+  * the shader has side-effects before we can set the mode to NORMAL.
+  */
  if (wm_prog_data->early_fragment_tests) {
 wm.EarlyDepthStencilControl = EDSC_PREPS;
  } else {
-wm.EarlyDepthStencilControl = EDSC_NORMAL;
+wm.EarlyDepthStencilControl = EDSC_PSEXEC;
  }
 
  wm.BarycentricInterpolationMode= 
wm_prog_data->barycentric_interp_modes;
diff --git a/src/intel/vulkan/gen8_pipeline.c b/src/intel/vulkan/gen8_pipeline.c
index 7cc7c51..4b477ee 100644
--- a/src/intel/vulkan/gen8_pipeline.c
+++ b/src/intel/vulkan/gen8_pipeline.c
@@ -268,6 +268,8 @@ genX(graphics_pipeline_create)(
 VkPipeline* pPipeline)
 {
ANV_FROM_HANDLE(anv_device, device, _device);
+   ANV_FROM_HANDLE(anv_render_pass, pass, pCreateInfo->renderPass);
+   struct anv_subpass *subpass = &pass->subpasses[pCreateInfo->subpass];
struct anv_pipeline *pipeline;
VkResult result;
uint32_t offset, length;
@@ -294,7 +296,7 @@ genX(graphics_pipeline_create)(
emit_rs_state(pipeline, pCreateInfo->pRasterizationState,
  pCreateInfo->pMultisampleState, extra);
emit_ms_state(pipeline, pCreateInfo->pMultisampleState);
-   emit_ds_state(pipeline, pCreateInfo->pDepthStencilState);
+   emit_ds_state(pipeline, pCreateInfo->pDepthStencilState, pass, subpass);
emit_cb_state(pipeline, pCreateInfo->pColorBlendState,
pCreateInfo->pMultisampleState);
 
@@ -330,10 +332,16 @@ genX(graphics_pipeline_create)(
   wm.ForceThreadDispatchEnable   = NORMAL;
   wm.PointRasterizationRule  = RASTRULE_UPPER_RIGHT;
 
+  /* TODO: We could probably do something a bit more intelligent here.
+   * However, CTS tests expect that if early fragment tests are not
+   * performed, the shader *will* be executed for every fragment.  In
+   * order to work around this we would have to check whether or not
+   * the shader has side-effects before we can set the mode to NORMAL.
+   */
   if (wm_prog_data && wm_prog_data->early_fragment_tests) {
  wm.EarlyDepthStencilControl = PREPS;
   } else {
- wm.EarlyDepthStencilControl = NORMAL;
+ wm.EarlyDepthStencilControl = PSEXEC;
   }
 
   wm.BarycentricInterpolationMode = pipeline->ps_ksp0 == NO_KERNEL ?
diff --git a/src/intel/vulkan/genX_pipeline_util.h 
b/src/intel/vulkan/genX_pipeline_util.h
index fe24048..669b456 100644
--- a/src/intel/vulkan/genX_pipeline_util.h
+++ b/src/intel/vulkan/genX_pipeline_util.h
@@ -21,6 +21,8 @@
  * IN THE SOFTWARE.
  */
 
+#include "vk_format_info.h"
+
 static uint32_t
 vertex_element_comp_control(enum isl_format format, unsigned comp)
 {
@@ -428,7 +430,9 @@ static const uint32_t vk_to_gen_stencil_op[] = {
 
 static void
 emit_ds_state(struct anv_pipeline *pipeline,
-  const VkPipelineDepthStencilStateCreateInfo *info)
+  const VkPipelineDepthStencilStateCreateInfo *info,
+  const 

[Mesa-dev] [PATCH 8/9] nir/spirv: Use breaks instead of returns in constant handling

2016-06-01 Thread Jason Ekstrand
Signed-off-by: Jason Ekstrand 
Cc: "12.0" 
Cc: Ian Romanick 
---
 src/compiler/spirv/spirv_to_nir.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/compiler/spirv/spirv_to_nir.c 
b/src/compiler/spirv/spirv_to_nir.c
index 4061b8a..bb7aba4 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -1028,7 +1028,7 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp opcode,
val->constant->value.u[i] = u[comp];
 }
  }
- return;
+ break;
   }
 
   case SpvOpCompositeExtract:
@@ -1105,7 +1105,7 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp opcode,
   (*c)->value.u[elem + i] = insert->constant->value.u[i];
 }
  }
- return;
+ break;
   }
 
   default: {
@@ -1134,9 +1134,10 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp opcode,
  for (unsigned k = 0; k < num_components; k++)
 val->constant->value.u[k] = res.u32[k];
 
- return;
+ break;
   } /* default */
   }
+  break;
}
 
case SpvOpConstantNull:
-- 
2.5.0.400.gff86faf
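
The motivation shows up in patch 9/9, which adds a call after the switch in
vtn_handle_constant: an early return inside a case would skip it. A minimal
sketch of that control-flow difference, using hypothetical names
(handle_with_return, handle_with_break, post_hook) that do not appear in
spirv_to_nir.c:

```c
#include <assert.h>

/* Hypothetical stand-ins; none of these names come from spirv_to_nir.c. */
enum op { OP_CONSTANT, OP_SPEC_CONSTANT_OP };

static int post_hook_calls;

/* Stands in for any work that must happen after the switch. */
static void post_hook(void)
{
   post_hook_calls++;
}

static void handle_with_return(enum op opcode)
{
   switch (opcode) {
   case OP_SPEC_CONSTANT_OP:
      /* An early return skips everything placed after the switch. */
      return;
   default:
      break;
   }
   post_hook();
}

static void handle_with_break(enum op opcode)
{
   switch (opcode) {
   case OP_SPEC_CONSTANT_OP:
      /* A break falls out of the switch, so post_hook() still runs. */
      break;
   default:
      break;
   }
   post_hook();
}
```

With breaks, any per-value work added after the switch is applied uniformly,
regardless of which opcode produced the value.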

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/9] nir/lower_indirect_derefs: Use the direct array deref for recursion

2016-06-01 Thread Jason Ekstrand
This fixes about 100 of the new Vulkan CTS tests.

Signed-off-by: Jason Ekstrand 
Cc: "12.0" 
Cc: Connor Abbott 
Cc: Ian Romanick 
Cc: Kenneth Graunke 
---
 src/compiler/nir/nir_lower_indirect_derefs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir_lower_indirect_derefs.c b/src/compiler/nir/nir_lower_indirect_derefs.c
index 694a6e0..1bf4bf6 100644
--- a/src/compiler/nir/nir_lower_indirect_derefs.c
+++ b/src/compiler/nir/nir_lower_indirect_derefs.c
@@ -50,7 +50,7 @@ emit_indirect_load_store(nir_builder *b, nir_intrinsic_instr *orig_instr,
   direct.indirect = NIR_SRC_INIT;
 
   arr_parent->child = &direct.deref;
-  emit_load_store(b, orig_instr, deref, &arr->deref, dest, src);
+  emit_load_store(b, orig_instr, deref, &direct.deref, dest, src);
   arr_parent->child = &arr->deref;
} else {
   int mid = start + (end - start) / 2;
-- 
2.5.0.400.gff86faf
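
For context, emit_indirect_load_store lowers an indirectly indexed array
access by bisecting the index range until a single element is left, and then
emits a direct access for that element. The sketch below models that recursion
on a plain C array. load_indirect is a hypothetical name, and the real pass
emits NIR control flow on the index instead of evaluating the comparison on
the CPU, so treat this as an illustration of the shape only:

```c
#include <assert.h>

/* Hypothetical model: read arr[index] using only accesses whose index is
 * known at the point of the access, by bisecting [start, end). */
static int load_indirect(const int *arr, int index, int start, int end)
{
   assert(start < end);
   if (start == end - 1) {
      /* Base case: only one candidate remains, so this is a direct access. */
      return arr[start];
   }

   int mid = start + (end - start) / 2;
   return index < mid ? load_indirect(arr, index, start, mid)
                      : load_indirect(arr, index, mid, end);
}
```

The base case is where the patch matters: the access emitted there has to use
the direct deref, not the original indirect one.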



[Mesa-dev] [PATCH 0/9] anv: Fix several of the new Vulkan CTS tests

2016-06-01 Thread Jason Ekstrand
I recently grabbed the latest dev version of the Vulkan CTS and ran it on
our driver.  This series fixes a bunch of the bugs that it exposed.  In an
effort to get more people involved in Vulkan development and in the hopes
of actually getting reviews, I've CC'd at least one person on each patch.
If you got CC'd, it doesn't necessarily mean you *have* to review it, just
that I think you seemed like a good candidate. :-)

Jason Ekstrand (9):
  anv/clear: Handle ClearImage on 3-D images
  nir/lower_indirect_derefs: Use the direct array deref for recursion
  anv/pipeline: Refactor specialization constant handling a bit
  anv/pipeline: Add support for early depth stencil
  genxml/gen6,7,75: s/BackFace/Backface
  anv/pipeline: Unify gen7/8 emit_ds_state
  anv/pipeline: Silently pass tests if depth or stencil is missing
  nir/spirv: Use breaks instead of returns in constant handling
  nir/spirv: Handle the WorkgroupSize builtin decoration

 src/compiler/nir/nir_lower_indirect_derefs.c |  2 +-
 src/compiler/spirv/spirv_to_nir.c| 29 +-
 src/intel/genxml/gen6.xml|  4 +-
 src/intel/genxml/gen7.xml|  4 +-
 src/intel/genxml/gen75.xml   |  4 +-
 src/intel/vulkan/anv_meta_clear.c|  6 +-
 src/intel/vulkan/anv_pipeline.c  |  9 ++-
 src/intel/vulkan/gen7_cmd_buffer.c   |  2 +-
 src/intel/vulkan/gen7_pipeline.c | 53 +
 src/intel/vulkan/gen8_pipeline.c | 66 +
 src/intel/vulkan/genX_pipeline_util.h| 87 
 11 files changed, 160 insertions(+), 106 deletions(-)

-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 3/9] anv/pipeline: Refactor specialization constant handling a bit

2016-06-01 Thread Jason Ekstrand
Signed-off-by: Jason Ekstrand 
Cc: "12.0" 
Cc: Jordan Justen 
---
 src/intel/vulkan/anv_pipeline.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 789bc1a..372feeb 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -123,13 +123,12 @@ anv_shader_compile_to_nir(struct anv_device *device,
  num_spec_entries = spec_info->mapEntryCount;
  spec_entries = malloc(num_spec_entries * sizeof(*spec_entries));
  for (uint32_t i = 0; i < num_spec_entries; i++) {
-const uint32_t *data =
-   spec_info->pData + spec_info->pMapEntries[i].offset;
-assert((const void *)(data + 1) <=
-   spec_info->pData + spec_info->dataSize);
+VkSpecializationMapEntry entry = spec_info->pMapEntries[i];
+const void *data = spec_info->pData + entry.offset;
+assert(data + entry.size <= spec_info->pData + spec_info->dataSize);
 
 spec_entries[i].id = spec_info->pMapEntries[i].constantID;
-spec_entries[i].data = *data;
+spec_entries[i].data = *(const uint32_t *)data;
  }
   }
 
-- 
2.5.0.400.gff86faf
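
The refactor bounds-checks each map entry against its own size before reading
the value. A minimal standalone sketch of that check follows. It assumes a
hypothetical struct map_entry and helper read_spec_u32 in place of the Vulkan
types, returns false where the patch asserts, and swaps the pointer cast for
memcpy so the read is safe at any alignment. It also avoids arithmetic on
void *, which is a GNU extension:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the fields the patch reads from
 * VkSpecializationMapEntry; this is not the real Vulkan header. */
struct map_entry {
   uint32_t constant_id;
   uint32_t offset;
   size_t size;
};

/* Returns true and writes *out if the entry lies entirely within the data
 * blob and is large enough to hold a 32-bit value. */
static bool read_spec_u32(const void *blob, size_t blob_size,
                          const struct map_entry *entry, uint32_t *out)
{
   /* Written to avoid overflow in offset + size. */
   if (entry->offset > blob_size || entry->size > blob_size - entry->offset)
      return false;
   if (entry->size < sizeof(*out))
      return false;

   /* memcpy instead of a pointer cast avoids unaligned reads. */
   memcpy(out, (const uint8_t *)blob + entry->offset, sizeof(*out));
   return true;
}
```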



[Mesa-dev] [PATCH 6/9] anv/pipeline: Unify gen7/8 emit_ds_state

2016-06-01 Thread Jason Ekstrand
Signed-off-by: Jason Ekstrand 
Cc: "12.0" 
Cc: Jordan Justen 
---
 src/intel/vulkan/gen7_pipeline.c  | 37 +-
 src/intel/vulkan/gen8_pipeline.c  | 49 -
 src/intel/vulkan/genX_pipeline_util.h | 59 +++
 3 files changed, 60 insertions(+), 85 deletions(-)

diff --git a/src/intel/vulkan/gen7_pipeline.c b/src/intel/vulkan/gen7_pipeline.c
index 14614ac..243b18b 100644
--- a/src/intel/vulkan/gen7_pipeline.c
+++ b/src/intel/vulkan/gen7_pipeline.c
@@ -76,41 +76,6 @@ gen7_emit_rs_state(struct anv_pipeline *pipeline,
 }
 
 static void
-gen7_emit_ds_state(struct anv_pipeline *pipeline,
-   const VkPipelineDepthStencilStateCreateInfo *info)
-{
-   if (info == NULL) {
-  /* We're going to OR this together with the dynamic state.  We need
-   * to make sure it's initialized to something useful.
-   */
-  memset(pipeline->gen7.depth_stencil_state, 0,
- sizeof(pipeline->gen7.depth_stencil_state));
-  return;
-   }
-
-   struct GENX(DEPTH_STENCIL_STATE) state = {
-  .DepthTestEnable = info->depthTestEnable,
-  .DepthBufferWriteEnable = info->depthWriteEnable,
-  .DepthTestFunction = vk_to_gen_compare_op[info->depthCompareOp],
-  .DoubleSidedStencilEnable = true,
-
-  .StencilTestEnable = info->stencilTestEnable,
-  .StencilBufferWriteEnable = info->stencilTestEnable,
-  .StencilFailOp = vk_to_gen_stencil_op[info->front.failOp],
-  .StencilPassDepthPassOp = vk_to_gen_stencil_op[info->front.passOp],
-  .StencilPassDepthFailOp = vk_to_gen_stencil_op[info->front.depthFailOp],
-  .StencilTestFunction = vk_to_gen_compare_op[info->front.compareOp],
-
-  .BackfaceStencilFailOp = vk_to_gen_stencil_op[info->back.failOp],
-  .BackfaceStencilPassDepthPassOp = 
vk_to_gen_stencil_op[info->back.passOp],
-  .BackfaceStencilPassDepthFailOp = 
vk_to_gen_stencil_op[info->back.depthFailOp],
-  .BackfaceStencilTestFunction = 
vk_to_gen_compare_op[info->back.compareOp],
-   };
-
-   GENX(DEPTH_STENCIL_STATE_pack)(NULL, &pipeline->gen7.depth_stencil_state, &state);
-}
-
-static void
 gen7_emit_cb_state(struct anv_pipeline *pipeline,
const VkPipelineColorBlendStateCreateInfo *info,
const VkPipelineMultisampleStateCreateInfo *ms_info)
@@ -213,7 +178,7 @@ genX(graphics_pipeline_create)(
assert(pCreateInfo->pRasterizationState);
gen7_emit_rs_state(pipeline, pCreateInfo->pRasterizationState, extra);
 
-   gen7_emit_ds_state(pipeline, pCreateInfo->pDepthStencilState);
+   emit_ds_state(pipeline, pCreateInfo->pDepthStencilState);
 
gen7_emit_cb_state(pipeline, pCreateInfo->pColorBlendState,
 pCreateInfo->pMultisampleState);
diff --git a/src/intel/vulkan/gen8_pipeline.c b/src/intel/vulkan/gen8_pipeline.c
index 48774a5..7cc7c51 100644
--- a/src/intel/vulkan/gen8_pipeline.c
+++ b/src/intel/vulkan/gen8_pipeline.c
@@ -216,55 +216,6 @@ emit_cb_state(struct anv_pipeline *pipeline,
 }
 
 static void
-emit_ds_state(struct anv_pipeline *pipeline,
-  const VkPipelineDepthStencilStateCreateInfo *info)
-{
-   uint32_t *dw = GEN_GEN == 8 ?
-  pipeline->gen8.wm_depth_stencil : pipeline->gen9.wm_depth_stencil;
-
-   if (info == NULL) {
-  /* We're going to OR this together with the dynamic state.  We need
-   * to make sure it's initialized to something useful.
-   */
-  memset(pipeline->gen8.wm_depth_stencil, 0,
- sizeof(pipeline->gen8.wm_depth_stencil));
-  memset(pipeline->gen9.wm_depth_stencil, 0,
- sizeof(pipeline->gen9.wm_depth_stencil));
-  return;
-   }
-
-   /* VkBool32 depthBoundsTestEnable; // optional (depth_bounds_test) */
-
-   struct GENX(3DSTATE_WM_DEPTH_STENCIL) wm_depth_stencil = {
-  .DepthTestEnable = info->depthTestEnable,
-  .DepthBufferWriteEnable = info->depthWriteEnable,
-  .DepthTestFunction = vk_to_gen_compare_op[info->depthCompareOp],
-  .DoubleSidedStencilEnable = true,
-
-  .StencilTestEnable = info->stencilTestEnable,
-  .StencilBufferWriteEnable = info->stencilTestEnable,
-  .StencilFailOp = vk_to_gen_stencil_op[info->front.failOp],
-  .StencilPassDepthPassOp = vk_to_gen_stencil_op[info->front.passOp],
-  .StencilPassDepthFailOp = vk_to_gen_stencil_op[info->front.depthFailOp],
-  .StencilTestFunction = vk_to_gen_compare_op[info->front.compareOp],
-  .BackfaceStencilFailOp = vk_to_gen_stencil_op[info->back.failOp],
-  .BackfaceStencilPassDepthPassOp = 
vk_to_gen_stencil_op[info->back.passOp],
-  .BackfaceStencilPassDepthFailOp 
=vk_to_gen_stencil_op[info->back.depthFailOp],
-  .BackfaceStencilTestFunction = 
vk_to_gen_compare_op[info->back.compareOp],
-   };
-
-   /* From the Broadwell PRM:
-*
-*"If Depth_Test_Enable = 1 AND Depth_Test_func = EQUAL, the
-*

[Mesa-dev] [PATCH 9/9] nir/spirv: Handle the WorkgroupSize builtin decoration

2016-06-01 Thread Jason Ekstrand
This fixes the 7 dEQP-VK.pipeline.spec_constant.compute.local_size.* tests
in the latest dev version of the Vulkan CTS.

Signed-off-by: Jason Ekstrand 
Cc: "12.0" 
Cc: Ian Romanick 
---
 src/compiler/spirv/spirv_to_nir.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/src/compiler/spirv/spirv_to_nir.c 
b/src/compiler/spirv/spirv_to_nir.c
index bb7aba4..cece645 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -932,6 +932,25 @@ get_specialization(struct vtn_builder *b, struct vtn_value 
*val,
 }
 
 static void
+handle_workgroup_size_decoration_cb(struct vtn_builder *b,
+struct vtn_value *val,
+int member,
+const struct vtn_decoration *dec,
+void *data)
+{
+   assert(member == -1);
+   if (dec->decoration != SpvDecorationBuiltIn ||
+   dec->literals[0] != SpvBuiltInWorkgroupSize)
+  return;
+
+   assert(val->const_type == glsl_vector_type(GLSL_TYPE_UINT, 3));
+
+   b->shader->info.cs.local_size[0] = val->constant->value.u[0];
+   b->shader->info.cs.local_size[1] = val->constant->value.u[1];
+   b->shader->info.cs.local_size[2] = val->constant->value.u[2];
+}
+
+static void
 vtn_handle_constant(struct vtn_builder *b, SpvOp opcode,
 const uint32_t *w, unsigned count)
 {
@@ -1151,6 +1170,9 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp opcode,
default:
   unreachable("Unhandled opcode");
}
+
+   /* Now that we have the value, update the workgroup size if needed */
+   vtn_foreach_decoration(b, val, handle_workgroup_size_decoration_cb, NULL);
 }
 
 static void
-- 
2.5.0.400.gff86faf
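
The pattern in this patch is a callback run over every decoration on a value,
which ignores everything except the one builtin it cares about. A simplified
sketch of that pattern is below. All types and names in it (struct decoration,
struct constant_value, foreach_decoration, workgroup_size_cb) are hypothetical
and much smaller than the real vtn_* structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types; the real vtn_* structures differ. */
enum decoration_kind { DEC_BUILTIN, DEC_OTHER };
enum builtin_kind { BUILTIN_OTHER, BUILTIN_WORKGROUP_SIZE };

struct decoration {
   enum decoration_kind kind;
   enum builtin_kind builtin;
   const struct decoration *next;
};

struct constant_value {
   uint32_t u[3];
   const struct decoration *decorations;
};

typedef void (*decoration_cb)(const struct constant_value *val,
                              const struct decoration *dec, void *data);

/* Calls cb once for each decoration attached to val. */
static void foreach_decoration(const struct constant_value *val,
                               decoration_cb cb, void *data)
{
   for (const struct decoration *d = val->decorations; d != NULL; d = d->next)
      cb(val, d, data);
}

/* Copies the constant into the three-element local size passed via data,
 * but only for the workgroup-size builtin. */
static void workgroup_size_cb(const struct constant_value *val,
                              const struct decoration *dec, void *data)
{
   if (dec->kind != DEC_BUILTIN || dec->builtin != BUILTIN_WORKGROUP_SIZE)
      return;

   uint32_t *local_size = data;
   for (int i = 0; i < 3; i++)
      local_size[i] = val->u[i];
}
```

Running the callback at the end of constant handling means the value is fully
built, including any specialization, before it is copied.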



[Mesa-dev] [PATCH 5/9] genxml/gen6,7,75: s/BackFace/Backface

2016-06-01 Thread Jason Ekstrand
This is more consistent with gen8+

Signed-off-by: Jason Ekstrand 
Cc: "12.0" 
Cc: Jordan Justen 
---
 src/intel/genxml/gen6.xml  | 4 ++--
 src/intel/genxml/gen7.xml  | 4 ++--
 src/intel/genxml/gen75.xml | 4 ++--
 src/intel/vulkan/gen7_cmd_buffer.c | 2 +-
 src/intel/vulkan/gen7_pipeline.c   | 2 +-
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/intel/genxml/gen6.xml b/src/intel/genxml/gen6.xml
index afaea7f..7525fce 100644
--- a/src/intel/genxml/gen6.xml
+++ b/src/intel/genxml/gen6.xml
@@ -176,7 +176,7 @@
 
   
 
-
+
 
 
   
@@ -216,7 +216,7 @@
 
 
 
-
+
   
   
   
diff --git a/src/intel/genxml/gen7.xml b/src/intel/genxml/gen7.xml
index 7417f55..6f3e8cc 100644
--- a/src/intel/genxml/gen7.xml
+++ b/src/intel/genxml/gen7.xml
@@ -199,7 +199,7 @@
 
   
 
-
+
 
 
   
@@ -239,7 +239,7 @@
 
 
 
-
+
   
   
   
diff --git a/src/intel/genxml/gen75.xml b/src/intel/genxml/gen75.xml
index 709904f..ac1b6e4 100644
--- a/src/intel/genxml/gen75.xml
+++ b/src/intel/genxml/gen75.xml
@@ -209,7 +209,7 @@
 
   
 
-
+
 
 
   
@@ -249,7 +249,7 @@
 
 
 
-
+
   
   
   
diff --git a/src/intel/vulkan/gen7_cmd_buffer.c 
b/src/intel/vulkan/gen7_cmd_buffer.c
index 331275e..714d14a 100644
--- a/src/intel/vulkan/gen7_cmd_buffer.c
+++ b/src/intel/vulkan/gen7_cmd_buffer.c
@@ -357,7 +357,7 @@ genX(cmd_buffer_flush_dynamic_state)(struct anv_cmd_buffer 
*cmd_buffer)
  .BlendConstantColorBlue = 
cmd_buffer->state.dynamic.blend_constants[2],
  .BlendConstantColorAlpha = 
cmd_buffer->state.dynamic.blend_constants[3],
  .StencilReferenceValue = d->stencil_reference.front & 0xff,
- .BackFaceStencilReferenceValue = d->stencil_reference.back & 0xff,
+ .BackfaceStencilReferenceValue = d->stencil_reference.back & 0xff,
   };
   GENX(COLOR_CALC_STATE_pack)(NULL, cc_state.map, );
   if (!cmd_buffer->device->info.has_llc)
diff --git a/src/intel/vulkan/gen7_pipeline.c b/src/intel/vulkan/gen7_pipeline.c
index 2cfd7bf..14614ac 100644
--- a/src/intel/vulkan/gen7_pipeline.c
+++ b/src/intel/vulkan/gen7_pipeline.c
@@ -104,7 +104,7 @@ gen7_emit_ds_state(struct anv_pipeline *pipeline,
   .BackfaceStencilFailOp = vk_to_gen_stencil_op[info->back.failOp],
   .BackfaceStencilPassDepthPassOp = 
vk_to_gen_stencil_op[info->back.passOp],
   .BackfaceStencilPassDepthFailOp = 
vk_to_gen_stencil_op[info->back.depthFailOp],
-  .BackFaceStencilTestFunction = 
vk_to_gen_compare_op[info->back.compareOp],
+  .BackfaceStencilTestFunction = 
vk_to_gen_compare_op[info->back.compareOp],
};
 
    GENX(DEPTH_STENCIL_STATE_pack)(NULL, &pipeline->gen7.depth_stencil_state, &state);
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 4/9] anv/pipeline: Add support for early depth stencil

2016-06-01 Thread Jason Ekstrand
Signed-off-by: Jason Ekstrand 
Cc: "12.0" 
Cc: Jordan Justen 
---
 src/intel/vulkan/gen7_pipeline.c | 8 +++-
 src/intel/vulkan/gen8_pipeline.c | 7 ++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/gen7_pipeline.c b/src/intel/vulkan/gen7_pipeline.c
index 285b191..2cfd7bf 100644
--- a/src/intel/vulkan/gen7_pipeline.c
+++ b/src/intel/vulkan/gen7_pipeline.c
@@ -398,12 +398,18 @@ genX(graphics_pipeline_create)(
  wm.ThreadDispatchEnable= true;
  wm.LineEndCapAntialiasingRegionWidth   = 0; /* 0.5 pixels */
  wm.LineAntialiasingRegionWidth = 1; /* 1.0 pixels */
- wm.EarlyDepthStencilControl= EDSC_NORMAL;
  wm.PointRasterizationRule  = RASTRULE_UPPER_RIGHT;
  wm.PixelShaderComputedDepthMode= 
wm_prog_data->computed_depth_mode;
  wm.PixelShaderUsesSourceDepth  = wm_prog_data->uses_src_depth;
  wm.PixelShaderUsesSourceW  = wm_prog_data->uses_src_w;
  wm.PixelShaderUsesInputCoverageMask= 
wm_prog_data->uses_sample_mask;
+
+ if (wm_prog_data->early_fragment_tests) {
+wm.EarlyDepthStencilControl = EDSC_PREPS;
+ } else {
+wm.EarlyDepthStencilControl = EDSC_NORMAL;
+ }
+
  wm.BarycentricInterpolationMode= 
wm_prog_data->barycentric_interp_modes;
   }
}
diff --git a/src/intel/vulkan/gen8_pipeline.c b/src/intel/vulkan/gen8_pipeline.c
index d966694..48774a5 100644
--- a/src/intel/vulkan/gen8_pipeline.c
+++ b/src/intel/vulkan/gen8_pipeline.c
@@ -376,10 +376,15 @@ genX(graphics_pipeline_create)(
   wm.StatisticsEnable= true;
   wm.LineEndCapAntialiasingRegionWidth   = _05pixels;
   wm.LineAntialiasingRegionWidth = _10pixels;
-  wm.EarlyDepthStencilControl= NORMAL;
   wm.ForceThreadDispatchEnable   = NORMAL;
   wm.PointRasterizationRule  = RASTRULE_UPPER_RIGHT;
 
+  if (wm_prog_data && wm_prog_data->early_fragment_tests) {
+ wm.EarlyDepthStencilControl = PREPS;
+  } else {
+ wm.EarlyDepthStencilControl = NORMAL;
+  }
+
   wm.BarycentricInterpolationMode = pipeline->ps_ksp0 == NO_KERNEL ?
  0 : wm_prog_data->barycentric_interp_modes;
}
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 1/9] anv/clear: Handle ClearImage on 3-D images

2016-06-01 Thread Jason Ekstrand
Signed-off-by: Jason Ekstrand 
Cc: "12.0" 
Cc: Nanley Chery 
---
 src/intel/vulkan/anv_meta_clear.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/anv_meta_clear.c 
b/src/intel/vulkan/anv_meta_clear.c
index 18dfae8..fe750c8 100644
--- a/src/intel/vulkan/anv_meta_clear.c
+++ b/src/intel/vulkan/anv_meta_clear.c
@@ -761,9 +761,11 @@ anv_cmd_clear_image(struct anv_cmd_buffer *cmd_buffer,
 
for (uint32_t r = 0; r < range_count; r++) {
   const VkImageSubresourceRange *range = &ranges[r];
-
   for (uint32_t l = 0; l < anv_get_levelCount(image, range); ++l) {
- for (uint32_t s = 0; s < anv_get_layerCount(image, range); ++s) {
+ const uint32_t layer_count = image->type == VK_IMAGE_TYPE_3D ?
+  anv_minify(image->extent.depth, l) :
+  anv_get_layerCount(image, range);
+ for (uint32_t s = 0; s < layer_count; ++s) {
 struct anv_image_view iview;
 anv_image_view_init(&iview, cmd_buffer->device,
&(VkImageViewCreateInfo) {
-- 
2.5.0.400.gff86faf
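
In Vulkan a 3-D image has a single array layer, and its depth halves at each
mip level, so the number of slices to clear depends on the level. A minimal
sketch of that computation is below. minify_u32 and clear_slice_count are
hypothetical names, and minify_u32 is only assumed to behave like anv_minify
for non-zero sizes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to follow the usual mip rule: halve per level, never below 1.
 * Levels are assumed to be less than 32. */
static uint32_t minify_u32(uint32_t n, uint32_t level)
{
   uint32_t m = n >> level;
   return m > 0 ? m : 1;
}

/* Number of slices to clear at a given mip level: depth slices for a 3-D
 * image, array layers for everything else. */
static uint32_t clear_slice_count(bool is_3d, uint32_t depth,
                                  uint32_t level, uint32_t layer_count)
{
   return is_3d ? minify_u32(depth, level) : layer_count;
}
```

For an 8-deep 3-D image this gives 8 slices at level 0, 2 at level 2, and 1
once the depth has bottomed out.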



Re: [Mesa-dev] [PATCH 00/10] R600: Cache flush fixes and cleanup v2

2016-06-01 Thread Alex Deucher
On Wed, Jun 1, 2016 at 2:57 PM, Marek Olšák  wrote:
> Hi,
>
> This is version 2 of the previous series. This time it's been tested!!
>
> Tested cards:
> - RV670
> - RV730
> - EG/REDWOOD
> - CAYMAN
>
> This patch series:
> - fixes several bugs around making 3D and CP DMA idle with respect to CP.PFP,
>   which allows removing a lot of cache flushes (= hacks really) and IB flushes
>   around CP DMA
> - removes unnecessary cache flushes
> - moves other cache flushes to places where their frequency is lower
>
> From the perspective of functions:
> - binding shader resources doesn't flush anything (why should it)
> - set_framebuffer_state flushes CB, DB, TC
> - CP DMA copy_buffer only flushes TC, VC, KC. Never CB or DB.
> - CP DMA clear_buffer only flushes TC, VC, KC when shader coherency is
>   requested, or CB when CB coherency is requested. Never DB.
> - fast color clear no longer flushes TC, VC, KC, DB. (implied by clear_buffer)
> - ending streamout newly flushes TC, VC, KC
>
> From the perspective of caches:
> - TC is flushed only by set_framebuffer_state, texture_barrier, before
>   CP DMA (except fast color clear), and after streamout
> - VC & KC are flushed only before CP DMA (except fast color clear) or after
>   streamout
> - CB is flushed by set_framebuffer_state or by fast color clear
> - DB is only flushed by set_framebuffer_state
>
> More testing may be needed, especially testing on GPUs not listed above.
>
> Also available here:
> https://cgit.freedesktop.org/~mareko/mesa/log/?h=r600-opt-flushes
>
> Please review.

For the series:
Reviewed-by: Alex Deucher 


Re: [Mesa-dev] [PATCH 4/5] mesa/glformats: add desktop gl checks on _mesa_base_tex_format

2016-06-01 Thread Ian Romanick
On 05/13/2016 07:57 AM, Alejandro Piñeiro wrote:
> Several internalformats are not supported on OpenGL ES, so
> _mesa_base_tex_format should return -1 for them there. This is needed so
> that the ARB_internalformat_query2 implementation can correctly decide
> whether a resource is supported on OpenGL ES.
> 
> FWIW, in some cases, _mesa_base_fbo_format has equivalent checks
> for those internalformats, although for this method it is implemented
> as a check/break in most cases, to keep consistency within the function.
> 
> Acked-by: Eduardo Lima 
> Acked-by: Antia Puentes 
> ---
>  src/mesa/main/glformats.c | 76 
> ++-
>  1 file changed, 62 insertions(+), 14 deletions(-)
> 
> diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
> index 24ce7b0..26644ec 100644
> --- a/src/mesa/main/glformats.c
> +++ b/src/mesa/main/glformats.c
> @@ -2293,25 +2293,28 @@ _mesa_base_tex_format(const struct gl_context *ctx, 
> GLint internalFormat)
> case 3:
>return (ctx->API != API_OPENGL_CORE) ? GL_RGB : -1;

I don't think 1, 2, 3, or 4 are allowed as internal formats in OpenGL
ES... they might be allowed in OpenGL ES 1.x, but I'm almost certain
they're not allowed in 2.0+.  If the extra ES checks aren't "reachable,"
I'm not sure what the value is in adding them.

> case GL_RGB:
> +   case GL_RGB8:
> +  return GL_RGB;
> case GL_R3_G3_B2:
> case GL_RGB4:
> case GL_RGB5:
> -   case GL_RGB8:
> case GL_RGB10:
> case GL_RGB12:
> case GL_RGB16:
> -  return GL_RGB;
> +  return _mesa_is_desktop_gl(ctx) ? GL_RGB : -1;
> case 4:
>return (ctx->API != API_OPENGL_CORE) ? GL_RGBA : -1;
> case GL_RGBA:
> -   case GL_RGBA2:
> case GL_RGBA4:
> case GL_RGB5_A1:
> case GL_RGBA8:
> -   case GL_RGB10_A2:
> +  return GL_RGBA;
> +   case GL_RGBA2:
> case GL_RGBA12:
> case GL_RGBA16:
> -  return GL_RGBA;
> +  return _mesa_is_desktop_gl(ctx) ? GL_RGBA : -1;
> +   case GL_RGB10_A2:
> +  return _mesa_is_desktop_gl(ctx) || _mesa_is_gles3(ctx) ? GL_RGBA : -1;
> default:
>; /* fallthrough */
> }
> @@ -2341,7 +2344,10 @@ _mesa_base_tex_format(const struct gl_context *ctx, 
> GLint internalFormat)
>case GL_DEPTH_COMPONENT:
>case GL_DEPTH_COMPONENT16:
>case GL_DEPTH_COMPONENT24:
> + return GL_DEPTH_COMPONENT;
>case GL_DEPTH_COMPONENT32:
> + if (!_mesa_is_desktop_gl(ctx))
> +break;
>   return GL_DEPTH_COMPONENT;
>case GL_DEPTH_STENCIL:
>case GL_DEPTH24_STENCIL8:
> @@ -2374,8 +2380,12 @@ _mesa_base_tex_format(const struct gl_context *ctx, 
> GLint internalFormat)
> case GL_COMPRESSED_INTENSITY:
>return GL_INTENSITY;
> case GL_COMPRESSED_RGB:
> +  if (!_mesa_is_desktop_gl(ctx))
> + break;
>return GL_RGB;
> case GL_COMPRESSED_RGBA:
> +  if (!_mesa_is_desktop_gl(ctx))
> + break;
>return GL_RGBA;
> default:
>; /* fallthrough */
> @@ -2426,37 +2436,57 @@ _mesa_base_tex_format(const struct gl_context *ctx, 
> GLint internalFormat)
>  
> if (ctx->Extensions.EXT_texture_snorm) {
>switch (internalFormat) {
> -  case GL_RED_SNORM:
>case GL_R8_SNORM:
> + return GL_RED;
> +  case GL_RED_SNORM:
>case GL_R16_SNORM:
> + if (!_mesa_is_desktop_gl(ctx))
> +break;
>   return GL_RED;
> -  case GL_RG_SNORM:
>case GL_RG8_SNORM:
> + return GL_RG;
> +  case GL_RG_SNORM:
>case GL_RG16_SNORM:
> + if (!_mesa_is_desktop_gl(ctx))
> +break;
>   return GL_RG;
> -  case GL_RGB_SNORM:
>case GL_RGB8_SNORM:
> + return GL_RGB;
> +  case GL_RGB_SNORM:
>case GL_RGB16_SNORM:
> + if (!_mesa_is_desktop_gl(ctx))
> +break;
>   return GL_RGB;
> -  case GL_RGBA_SNORM:
>case GL_RGBA8_SNORM:
> + return GL_RGBA;
> +  case GL_RGBA_SNORM:
>case GL_RGBA16_SNORM:
> + if (!_mesa_is_desktop_gl(ctx))
> +break;
>   return GL_RGBA;
>case GL_ALPHA_SNORM:
>case GL_ALPHA8_SNORM:
>case GL_ALPHA16_SNORM:
> + if (!_mesa_is_desktop_gl(ctx))
> +break;
>   return GL_ALPHA;
>case GL_LUMINANCE_SNORM:
>case GL_LUMINANCE8_SNORM:
>case GL_LUMINANCE16_SNORM:
> + if (!_mesa_is_desktop_gl(ctx))
> +break;
>   return GL_LUMINANCE;
>case GL_LUMINANCE_ALPHA_SNORM:
>case GL_LUMINANCE8_ALPHA8_SNORM:
>case GL_LUMINANCE16_ALPHA16_SNORM:
> + if (!_mesa_is_desktop_gl(ctx))
> +break;
>   return GL_LUMINANCE_ALPHA;
>case GL_INTENSITY_SNORM:
>case GL_INTENSITY8_SNORM:
>case GL_INTENSITY16_SNORM:
> + if (!_mesa_is_desktop_gl(ctx))
> +

Re: [Mesa-dev] [PATCH 3/5] mesa/formatquery: expand NUM_SAMPLE_COUNTS OpenGL ES comment

2016-06-01 Thread Ian Romanick
On 05/13/2016 07:57 AM, Alejandro Piñeiro wrote:
> The ES 3.0 spec says that NUM_SAMPLE_COUNTS is always zero for some
> formats, but on ES 3.1 it can be non-zero.
> 
> The current code correctly checks for exactly version 3.0,
> but the comment only mentions the 3.0 spec. Mentioning both is clearer.
> 
> Acked-by: Eduardo Lima 
> Acked-by: Antia Puentes 
> ---
>  src/mesa/main/formatquery.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/mesa/main/formatquery.c b/src/mesa/main/formatquery.c
> index 1f21d17..1dfec5c 100644
> --- a/src/mesa/main/formatquery.c
> +++ b/src/mesa/main/formatquery.c
> @@ -877,6 +877,9 @@ _mesa_GetInternalformativ(GLenum target, GLenum 
> internalformat, GLenum pname,
> * "Since multisampling is not supported for signed and unsigned
> * integer internal formats, the value of NUM_SAMPLE_COUNTS will be
> * zero for such formats.
> +   *
> +   * But that is not true for GL ES 3.1. This is the reason why we are
> +   * checking against exactly version 30, instead of use _mesa_is_gles3.
> */

I think it would be better to say:

Since OpenGL ES 3.1 adds support for multisampled integer formats,
we have to check the version for 30 exactly.

>if (pname == GL_NUM_SAMPLE_COUNTS && ctx->API == API_OPENGLES2 &&
>ctx->Version == 30 && 
> _mesa_is_enum_format_integer(internalformat)) {
> 
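
The distinction under discussion is between an "ES 3.0 or later" test and an
"exactly ES 3.0" test. A small sketch of the two follows, using a hypothetical
struct ctx_model in place of gl_context. The helper names are made up for
illustration and are not Mesa functions:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical stand-in for the parts of gl_context used here. */
struct ctx_model {
   bool is_gles2_api;   /* models ctx->API == API_OPENGLES2 */
   unsigned version;    /* 30 for ES 3.0, 31 for ES 3.1 */
};

/* Models a "3.0 or later" check: also true on ES 3.1. */
static bool is_gles3_or_later(const struct ctx_model *c)
{
   return c->is_gles2_api && c->version >= 30;
}

/* Models the exact check discussed above: true on ES 3.0 only, because
 * ES 3.1 adds multisampled integer formats. */
static bool integer_formats_have_zero_sample_counts(const struct ctx_model *c)
{
   return c->is_gles2_api && c->version == 30;
}
```

On an ES 3.1 context the first helper is true and the second is false, which
is why the "or later" form cannot be used for this query.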



Re: [Mesa-dev] [PATCH 2/5] mesa/formatquery: add a comment to clarify INTERNALFORMAT_PREFERRED

2016-06-01 Thread Ian Romanick
This patch is

Reviewed-by: Ian Romanick 

On 05/13/2016 07:57 AM, Alejandro Piñeiro wrote:
> The comment clarifies that the driver is called only to try to get
> a preferred internalformat, and that it was already checked if the
> format is supported or not.
> 
> Acked-by: Eduardo Lima 
> Acked-by: Antia Puentes 
> ---
>  src/mesa/main/formatquery.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/main/formatquery.c b/src/mesa/main/formatquery.c
> index 215c14f..1f21d17 100644
> --- a/src/mesa/main/formatquery.c
> +++ b/src/mesa/main/formatquery.c
> @@ -902,7 +902,10 @@ _mesa_GetInternalformativ(GLenum target, GLenum 
> internalformat, GLenum pname,
> * format for representing resources of the specified <internalformat> is
> * returned in <params>.
> *
> -   * Therefore, we let the driver answer.
> +   * Therefore, we let the driver answer. Note that if we reach this
> +   * point, it means that the internalformat is supported, so the driver
> +   * is called just to try to get a preferred format. If not supported,
> +   * GL_NONE was already returned and the driver is not called.
> */
>ctx->Driver.QueryInternalFormat(ctx, target, internalformat, pname,
>buffer);
> 



Re: [Mesa-dev] [PATCH 1/5] i965/formatquery: remove INTERNALFORMAT_PREFERRED implementation

2016-06-01 Thread Ian Romanick
This patch is

Reviewed-by: Ian Romanick 

On 05/13/2016 07:57 AM, Alejandro Piñeiro wrote:
> Right now the implementation only checks if the internalformat is
> supported or not. But that implementation is wrong, returning
> unsupported for some internalformats. Additionally, checking if
> the internalformat is supported or not is already done at mesa/main
> before calling the driver hook, so this new check is not needed.
> 
> Acked-by: Eduardo Lima 
> Acked-by: Antia Puentes 
> ---
>  src/mesa/drivers/dri/i965/brw_formatquery.c | 71 
> -
>  1 file changed, 71 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_formatquery.c 
> b/src/mesa/drivers/dri/i965/brw_formatquery.c
> index 210109b..8f7a910 100644
> --- a/src/mesa/drivers/dri/i965/brw_formatquery.c
> +++ b/src/mesa/drivers/dri/i965/brw_formatquery.c
> @@ -65,46 +65,6 @@ brw_query_samples_for_format(struct gl_context *ctx, 
> GLenum target,
> }
>  }
>  
> -/**
> - * Returns a generic GL type from an internal format, so that it can be used
> - * together with the base format to obtain a mesa_format by calling
> - * mesa_format_from_format_and_type().
> - */
> -static GLenum
> -get_generic_type_for_internal_format(GLenum internalFormat)
> -{
> -   if (_mesa_is_color_format(internalFormat)) {
> -  if (_mesa_is_enum_format_unsigned_int(internalFormat))
> - return GL_UNSIGNED_BYTE;
> -  else if (_mesa_is_enum_format_signed_int(internalFormat))
> - return GL_BYTE;
> -   } else {
> -  switch (internalFormat) {
> -  case GL_STENCIL_INDEX:
> -  case GL_STENCIL_INDEX8:
> - return GL_UNSIGNED_BYTE;
> -  case GL_DEPTH_COMPONENT:
> -  case GL_DEPTH_COMPONENT16:
> - return GL_UNSIGNED_SHORT;
> -  case GL_DEPTH_COMPONENT24:
> -  case GL_DEPTH_COMPONENT32:
> - return GL_UNSIGNED_INT;
> -  case GL_DEPTH_COMPONENT32F:
> - return GL_FLOAT;
> -  case GL_DEPTH_STENCIL:
> -  case GL_DEPTH24_STENCIL8:
> - return GL_UNSIGNED_INT_24_8;
> -  case GL_DEPTH32F_STENCIL8:
> - return GL_FLOAT_32_UNSIGNED_INT_24_8_REV;
> -  default:
> - /* fall-through */
> - break;
> -  }
> -   }
> -
> -   return GL_FLOAT;
> -}
> -
>  void
>  brw_query_internal_format(struct gl_context *ctx, GLenum target,
>GLenum internalFormat, GLenum pname, GLint *params)
> @@ -129,37 +89,6 @@ brw_query_internal_format(struct gl_context *ctx, GLenum 
> target,
>break;
> }
>  
> -   case GL_INTERNALFORMAT_PREFERRED: {
> -  params[0] = GL_NONE;
> -
> -  /* We need to resolve an internal format that is compatible with
> -   * the passed internal format, and optimal to the driver. By now,
> -   * we just validate that the passed internal format is supported by
> -   * the driver, and if so return the same internal format, otherwise
> -   * return GL_NONE.
> -   *
> -   * For validating the internal format, we use the
> -   * ctx->TextureFormatSupported map to check that a BRW surface format
> -   * exists, that can be derived from the internal format. But this
> -   * expects a mesa_format, not an internal format. So we need to "come 
> up"
> -   * with a type that is generic enough, to resolve the mesa_format 
> first.
> -   */
> -  GLenum type = get_generic_type_for_internal_format(internalFormat);
> -
> -  /* Get a mesa_format from the internal format and type. */
> -  GLint base_format = _mesa_base_tex_format(ctx, internalFormat);
> -  if (base_format != -1) {
> - mesa_format mesa_format =
> -_mesa_format_from_format_and_type(base_format, type);
> -
> - if (mesa_format < MESA_FORMAT_COUNT &&
> - ctx->TextureFormatSupported[mesa_format]) {
> -params[0] = internalFormat;
> - }
> -  }
> -  break;
> -   }
> -
> default:
>/* By default, we call the driver hook's fallback function from the 
> frontend,
> * which has generic implementation for all pnames.
> 



[Mesa-dev] [PATCH v11] mesa: Add MESA_SHADER_CAPTURE_PATH for writing .shader_test files.

2016-06-01 Thread Kenneth Graunke
This writes linked shader programs to .shader_test files to
$MESA_SHADER_CAPTURE_PATH in the format used by shader-db
(http://cgit.freedesktop.org/mesa/shader-db).

It supports both GLSL shaders and ARB programs.  All stages that
are linked together are written in a single .shader_test file.

This eliminates the need for shader-db's split-to-files.py, as Mesa
produces the desired format directly.  It's much more reliable than
parsing stdout/stderr, as those may contain extraneous messages, or
simply be closed by the application and unavailable.

We have many similar features already, but this is a bit different:
- MESA_GLSL=dump writes to stdout, not files.
- MESA_GLSL=log writes each stage to separate files (rather than
  all linked shaders in one file), at draw time (not link time),
  with uniform data and state flag info.
- Tapani's shader replacement mechanism (MESA_SHADER_DUMP_PATH and
  MESA_SHADER_READ_PATH) also uses separate files per shader stage,
  but allows reading in files to replace an app's shader code.

v2:  Dump ARB programs too, not just GLSL.
v3:  Don't dump bogus 0.shader_test file.
v4:  Add "GL_ARB_separate_shader_objects" to the [require] block.
v5:  Print "GLSL 4.00" instead of "GLSL 4.0" in the [require] block.
v6:  Don't hardcode /tmp/mesa.
v7:  Fix memoization of getenv().
v8:  Also print "SSO ENABLED" (suggested by Timothy).
v9:  Also handle ES shaders (suggested by Ilia).
v10: Guard against MESA_SHADER_CAPTURE_PATH being too long; add
 _mesa_warning calls on error handling (suggested by Ben).
v11: Fix crash when variable is unset introduced in v10.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/main/arbprogram.c | 22 
 src/mesa/main/mtypes.h |  1 -
 src/mesa/main/shaderapi.c  | 52 ++
 src/mesa/main/shaderapi.h  |  3 +++
 4 files changed, 77 insertions(+), 1 deletion(-)

Brown paper bag release. :(

diff --git a/src/mesa/main/arbprogram.c b/src/mesa/main/arbprogram.c
index 3f7acda..c0786d4 100644
--- a/src/mesa/main/arbprogram.c
+++ b/src/mesa/main/arbprogram.c
@@ -36,6 +36,7 @@
 #include "main/macros.h"
 #include "main/mtypes.h"
 #include "main/arbprogram.h"
+#include "main/shaderapi.h"
 #include "program/arbprogparse.h"
 #include "program/program.h"
 #include "program/prog_print.h"
@@ -378,6 +379,27 @@ _mesa_ProgramStringARB(GLenum target, GLenum format, 
GLsizei len,
   }
   fflush(stderr);
}
+
+   /* Capture vp-*.shader_test/fp-*.shader_test files. */
+   const char *capture_path = _mesa_get_shader_capture_path();
+   if (capture_path != NULL) {
+  FILE *file;
+  char filename[PATH_MAX];
+  const char *shader_type =
+ target == GL_FRAGMENT_PROGRAM_ARB ? "fragment" : "vertex";
+
+  _mesa_snprintf(filename, sizeof(filename), "%s/%cp-%u.shader_test",
+ capture_path, shader_type[0], base->Id);
+  file = fopen(filename, "w");
+  if (file) {
+ fprintf(file,
+ "[require]\nGL_ARB_%s_program\n\n[%s program]\n%s\n",
+ shader_type, shader_type, (const char *) string);
+ fclose(file);
+  } else {
+ _mesa_warning(ctx, "Failed to open %s", filename);
+  }
+   }
 }
 
 
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 2233526..2d06755 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2882,7 +2882,6 @@ struct gl_shader_program
 #define GLSL_REPORT_ERRORS 0x100  /**< Print compilation errors */
 #define GLSL_DUMP_ON_ERROR 0x200 /**< Dump shaders to stderr on compile error 
*/
 
-
 /**
  * Context state for GLSL vertex/fragment shaders.
  * Extended to support pipeline object
diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index 167e06f..eb6b1f5 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -96,6 +96,29 @@ _mesa_get_shader_flags(void)
return flags;
 }
 
+/**
+ * Memoized version of getenv("MESA_SHADER_CAPTURE_PATH").
+ */
+const char *
+_mesa_get_shader_capture_path(void)
+{
+   static bool read_env_var = false;
+   static const char *path = NULL;
+
+   if (!read_env_var) {
+  path = getenv("MESA_SHADER_CAPTURE_PATH");
+  read_env_var = true;
+  if (path &&
+  strlen(path) > PATH_MAX - strlen("/fp-4294967295.shader_test")) {
+ GET_CURRENT_CONTEXT(ctx);
+ _mesa_warning(ctx, "MESA_SHADER_CAPTURE_PATH too long; ignoring "
+"request to capture shaders");
+ path = NULL;
+  }
+   }
+
+   return path;
+}
 
 /**
  * Initialize context's shader state.
@@ -1046,6 +1069,35 @@ _mesa_link_program(struct gl_context *ctx, struct 
gl_shader_program *shProg)
 
_mesa_glsl_link_shader(ctx, shProg);
 
+   /* Capture .shader_test files. */
+   const char *capture_path = _mesa_get_shader_capture_path();
+   if (shProg->Name != 0 && capture_path != NULL) {
+  FILE *file;
+  char filename[PATH_MAX];
+
+  

Re: [Mesa-dev] [PATCH] egl: Account for default values of texture target and format

2016-06-01 Thread Anuj Phogat
On Wed, Jun 1, 2016 at 9:31 AM, Plamena Manolova
 wrote:
>
> When validating attributes during surface creation we should account
> for the default values of texture target and format (EGL_NO_TEXTURE)
> since the user is not obligated to explicitly set both via the
> attribute list passed to eglCreatePbufferSurface.
>
> Signed-off-by: Plamena Manolova 
> ---
>  src/egl/main/eglsurface.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/src/egl/main/eglsurface.c b/src/egl/main/eglsurface.c
> index 17d7907..99e24dd 100644
> --- a/src/egl/main/eglsurface.c
> +++ b/src/egl/main/eglsurface.c
> @@ -236,6 +236,12 @@ _eglParseSurfaceAttribList(_EGLSurface *surf, const 
> EGLint *attrib_list)
>}
>
>if (type == EGL_PBUFFER_BIT) {
> + if (tex_target == -1)
> +tex_target = surf->TextureTarget;
> +
> + if (tex_format == -1)
> +tex_format = surf->TextureFormat;
> +
>   if ((tex_target == EGL_NO_TEXTURE && tex_format != EGL_NO_TEXTURE) 
> ||
>   (tex_format == EGL_NO_TEXTURE && tex_target != EGL_NO_TEXTURE)) 
> {
>  err = EGL_BAD_MATCH;
> --
> 2.7.4
>

Looks good to me.

Reviewed-by: Anuj Phogat 
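
For readers skimming the thread, the check in the quoted hunk can be sketched
as below. This is only an illustration under stated assumptions: SKETCH_UNSET
and SKETCH_NO_TEXTURE are stand-ins for the -1 sentinel and EGL_NO_TEXTURE,
and pbuffer_texture_attribs_match is a hypothetical name, not a function in
eglsurface.c.

```c
#include <stdbool.h>

#define SKETCH_UNSET      (-1)  /* attribute absent from the attrib list */
#define SKETCH_NO_TEXTURE 0     /* stand-in for EGL_NO_TEXTURE */

/* Hypothetical helper: returns false where the patch would report
 * EGL_BAD_MATCH. Unset attributes first fall back to the surface
 * defaults, then target and format must agree on "no texture". */
static bool
pbuffer_texture_attribs_match(int tex_target, int tex_format,
                              int default_target, int default_format)
{
   if (tex_target == SKETCH_UNSET)
      tex_target = default_target;
   if (tex_format == SKETCH_UNSET)
      tex_format = default_format;

   /* Either both are "no texture" or neither is. */
   return (tex_target == SKETCH_NO_TEXTURE) ==
          (tex_format == SKETCH_NO_TEXTURE);
}
```

With the fallback in place, an attribute list that sets neither value is
compared using the defaults rather than the -1 sentinel.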


[Mesa-dev] [PATCH 09/10] r600g: don't flush caches when binding shader resources

2016-06-01 Thread Marek Olšák
From: Marek Olšák 

Reviewed-by: Alex Deucher 
---
 src/gallium/drivers/r600/evergreen_state.c   | 25 +++--
 src/gallium/drivers/r600/r600_hw_context.c   |  4 
 src/gallium/drivers/r600/r600_state.c| 25 +++--
 src/gallium/drivers/r600/r600_state_common.c |  3 ---
 4 files changed, 26 insertions(+), 31 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 36d3b4b..1ac8914 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1287,20 +1287,17 @@ static void evergreen_set_framebuffer_state(struct 
pipe_context *ctx,
struct r600_texture *rtex;
uint32_t i, log_samples;
 
-   if (rctx->framebuffer.state.nr_cbufs) {
-   rctx->b.flags |= R600_CONTEXT_WAIT_3D_IDLE | 
R600_CONTEXT_FLUSH_AND_INV;
-   rctx->b.flags |= R600_CONTEXT_FLUSH_AND_INV_CB |
-R600_CONTEXT_FLUSH_AND_INV_CB_META;
-   }
-   if (rctx->framebuffer.state.zsbuf) {
-   rctx->b.flags |= R600_CONTEXT_WAIT_3D_IDLE | 
R600_CONTEXT_FLUSH_AND_INV;
-   rctx->b.flags |= R600_CONTEXT_FLUSH_AND_INV_DB;
-
-   rtex = (struct 
r600_texture*)rctx->framebuffer.state.zsbuf->texture;
-   if (rtex->htile_buffer) {
-   rctx->b.flags |= R600_CONTEXT_FLUSH_AND_INV_DB_META;
-   }
-   }
+   /* Flush TC when changing the framebuffer state, because the only
+* client not using TC that can change textures is the framebuffer.
+* Other places don't typically have to flush TC.
+*/
+   rctx->b.flags |= R600_CONTEXT_WAIT_3D_IDLE |
+R600_CONTEXT_FLUSH_AND_INV |
+R600_CONTEXT_FLUSH_AND_INV_CB |
+R600_CONTEXT_FLUSH_AND_INV_CB_META |
+R600_CONTEXT_FLUSH_AND_INV_DB |
+R600_CONTEXT_FLUSH_AND_INV_DB_META |
+R600_CONTEXT_INV_TEX_CACHE;
 
util_copy_framebuffer_state(&rctx->framebuffer.state, state);
 
diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 04c3cb2..71c2dbb 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -97,6 +97,10 @@ void r600_flush_emit(struct r600_context *rctx)
return;
}
 
+   /* Ensure coherency between streamout and shaders. */
+   if (rctx->b.flags & R600_CONTEXT_STREAMOUT_FLUSH)
+   rctx->b.flags |= r600_get_flush_flags(R600_COHERENCY_SHADER);
+
if (rctx->b.flags & R600_CONTEXT_WAIT_3D_IDLE) {
wait_until |= S_008040_WAIT_3D_IDLE(1);
}
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 9a33ab9..cf7f0b3 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -1099,20 +1099,17 @@ static void r600_set_framebuffer_state(struct 
pipe_context *ctx,
struct r600_texture *rtex;
unsigned i;
 
-   if (rctx->framebuffer.state.nr_cbufs) {
-   rctx->b.flags |= R600_CONTEXT_WAIT_3D_IDLE | 
R600_CONTEXT_FLUSH_AND_INV;
-   rctx->b.flags |= R600_CONTEXT_FLUSH_AND_INV_CB |
-R600_CONTEXT_FLUSH_AND_INV_CB_META;
-   }
-   if (rctx->framebuffer.state.zsbuf) {
-   rctx->b.flags |= R600_CONTEXT_WAIT_3D_IDLE | 
R600_CONTEXT_FLUSH_AND_INV;
-   rctx->b.flags |= R600_CONTEXT_FLUSH_AND_INV_DB;
-
-   rtex = (struct 
r600_texture*)rctx->framebuffer.state.zsbuf->texture;
-   if (rctx->b.chip_class >= R700 && rtex->htile_buffer) {
-   rctx->b.flags |= R600_CONTEXT_FLUSH_AND_INV_DB_META;
-   }
-   }
+   /* Flush TC when changing the framebuffer state, because the only
+* client not using TC that can change textures is the framebuffer.
+* Other places don't typically have to flush TC.
+*/
+   rctx->b.flags |= R600_CONTEXT_WAIT_3D_IDLE |
+R600_CONTEXT_FLUSH_AND_INV |
+R600_CONTEXT_FLUSH_AND_INV_CB |
+R600_CONTEXT_FLUSH_AND_INV_CB_META |
+R600_CONTEXT_FLUSH_AND_INV_DB |
+R600_CONTEXT_FLUSH_AND_INV_DB_META |
+R600_CONTEXT_INV_TEX_CACHE;
 
/* Set the new state. */
util_copy_framebuffer_state(&rctx->framebuffer.state, state);
diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index b3814fb..3f8c7b2 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -533,7 +533,6 @@ static void r600_set_index_buffer(struct 

[Mesa-dev] [PATCH 05/10] r600g: fix CP DMA hazard with index buffer fetches (v3)

2016-06-01 Thread Marek Olšák
From: Marek Olšák 

v3: use PFP_SYNC_ME on EG-CM only when supported by the kernel,
otherwise use MEM_WRITE + WAIT_REG_MEM to emulate that
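
The v3 selection can be sketched as a small predicate. This is only an
illustration: the enum and function names are hypothetical, and the DRM minor
threshold of 46 is the one used in the r600_emit_pfp_sync_me hunk of this
patch.

```c
#include <assert.h>

/* Hypothetical, simplified chip generations in ascending order. */
enum sketch_chip {
   SKETCH_R600,
   SKETCH_R700,
   SKETCH_EVERGREEN,
   SKETCH_CAYMAN
};

enum sketch_sync {
   SKETCH_SYNC_PACKET,    /* emit the PFP_SYNC_ME packet directly */
   SKETCH_SYNC_EMULATED   /* write memory in ME, wait for it in PFP */
};

/* Hypothetical helper: pick the real packet only on Evergreen and later
 * with a new enough kernel; everything else gets the emulation. */
static enum sketch_sync
sketch_choose_pfp_sync(enum sketch_chip chip, unsigned drm_minor)
{
   assert(chip <= SKETCH_CAYMAN);
   if (chip >= SKETCH_EVERGREEN && drm_minor >= 46)
      return SKETCH_SYNC_PACKET;
   return SKETCH_SYNC_EMULATED;
}
```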
---
 src/gallium/drivers/r600/evergreen_hw_context.c | 16 --
 src/gallium/drivers/r600/evergreend.h   |  1 +
 src/gallium/drivers/r600/r600_blit.c|  2 +-
 src/gallium/drivers/r600/r600_hw_context.c  | 69 -
 src/gallium/drivers/r600/r600_pipe.h|  5 +-
 src/gallium/drivers/r600/r600d.h|  5 ++
 src/gallium/drivers/radeonsi/sid.h  |  2 +-
 7 files changed, 93 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
b/src/gallium/drivers/r600/evergreen_hw_context.c
index f456696..2feb801 100644
--- a/src/gallium/drivers/r600/evergreen_hw_context.c
+++ b/src/gallium/drivers/r600/evergreen_hw_context.c
@@ -85,7 +85,8 @@ void evergreen_dma_copy_buffer(struct r600_context *rctx,
 
 void evergreen_cp_dma_clear_buffer(struct r600_context *rctx,
   struct pipe_resource *dst, uint64_t offset,
-  unsigned size, uint32_t clear_value)
+  unsigned size, uint32_t clear_value,
+  enum r600_coherency coher)
 {
struct radeon_winsys_cs *cs = rctx->b.gfx.cs;
 
@@ -117,7 +118,9 @@ void evergreen_cp_dma_clear_buffer(struct r600_context 
*rctx,
unsigned byte_count = MIN2(size, CP_DMA_MAX_BYTE_COUNT);
unsigned reloc;
 
-   r600_need_cs_space(rctx, 10 + (rctx->b.flags ? 
R600_MAX_FLUSH_CS_DWORDS : 0), FALSE);
+   r600_need_cs_space(rctx,
+  10 + (rctx->b.flags ? 
R600_MAX_FLUSH_CS_DWORDS : 0) +
+  R600_MAX_PFP_SYNC_ME_DWORDS, FALSE);
 
/* Flush the caches for the first copy only. */
if (rctx->b.flags) {
@@ -148,9 +151,16 @@ void evergreen_cp_dma_clear_buffer(struct r600_context 
*rctx,
offset += byte_count;
}
 
+   /* CP DMA is executed in ME, but index buffers are read by PFP.
+* This ensures that ME (CP DMA) is idle before PFP starts fetching
+* indices. If we wanted to execute CP DMA in PFP, this packet
+* should precede it.
+*/
+   if (coher == R600_COHERENCY_SHADER)
+   r600_emit_pfp_sync_me(rctx);
+
/* Invalidate the read caches. */
rctx->b.flags |= R600_CONTEXT_INV_CONST_CACHE |
 R600_CONTEXT_INV_VERTEX_CACHE |
 R600_CONTEXT_INV_TEX_CACHE;
 }
-
diff --git a/src/gallium/drivers/r600/evergreend.h 
b/src/gallium/drivers/r600/evergreend.h
index c1c6169..a81b6c5 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -88,6 +88,7 @@
 #define WAIT_REG_MEM_EQUAL  3
 #define PKT3_MEM_WRITE 0x3D
 #define PKT3_INDIRECT_BUFFER   0x32
+#define PKT3_PFP_SYNC_ME  0x42
 #define PKT3_SURFACE_SYNC  0x43
 #define PKT3_ME_INITIALIZE 0x44
 #define PKT3_COND_WRITE 0x45
diff --git a/src/gallium/drivers/r600/r600_blit.c 
b/src/gallium/drivers/r600/r600_blit.c
index 282645f..76c3364 100644
--- a/src/gallium/drivers/r600/r600_blit.c
+++ b/src/gallium/drivers/r600/r600_blit.c
@@ -589,7 +589,7 @@ static void r600_clear_buffer(struct pipe_context *ctx, 
struct pipe_resource *ds
if (rctx->screen->b.has_cp_dma &&
rctx->b.chip_class >= EVERGREEN &&
offset % 4 == 0 && size % 4 == 0) {
-   evergreen_cp_dma_clear_buffer(rctx, dst, offset, size, value);
+   evergreen_cp_dma_clear_buffer(rctx, dst, offset, size, value, 
coher);
} else if (rctx->screen->b.has_streamout && offset % 4 == 0 && size % 4 
== 0) {
union pipe_color_union clear_value;
clear_value.ui[0] = value;
diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index bbfe620..1ae3f04 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -365,6 +365,66 @@ void r600_begin_new_cs(struct r600_context *ctx)
ctx->b.initial_gfx_cs_size = ctx->b.gfx.cs->cdw;
 }
 
+void r600_emit_pfp_sync_me(struct r600_context *rctx)
+{
+   struct radeon_winsys_cs *cs = rctx->b.gfx.cs;
+
+   if (rctx->b.chip_class >= EVERGREEN &&
+   rctx->b.screen->info.drm_minor >= 46) {
+   radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
+   radeon_emit(cs, 0);
+   } else {
+   /* Emulate PFP_SYNC_ME by writing a value to memory in ME and
+* waiting for it in PFP.
+*/
+   struct r600_resource *buf = NULL;
+   unsigned offset, reloc;
+ 

Re: [Mesa-dev] [PATCH 0/5] ARB_internalformat_query2 support for OpenGL ES and other fixes

2016-06-01 Thread Ian Romanick
On 06/01/2016 11:35 AM, Alejandro Piñeiro wrote:
> On 13/05/16 23:20, Alejandro Piñeiro wrote:
>>
>> On 13/05/16 17:06, Ilia Mirkin wrote:
>>> On Fri, May 13, 2016 at 10:57 AM, Alejandro Piñeiro
>>>  wrote:
>>>> Earlier this year, support for ARB_internalformat_query2 landed
>>>> [1][2], initially only for desktop GL.
>>>>
>>>> But looking more carefully at the spec [3], we found the following:
>>>>
>>>> "Dependencies
>>>>
>>>>  OpenGL 2.0 or OpenGL ES 2.0 is required"
>>>>
>>>> Note the *or*. Additionally, the spec lists other GL ES 2.0/3.0
>>>> dependencies. So that means that the extension can also be applied to
>>>> GL ES 2.0/3.0. FWIW, this mistake is common, as it also happens with
>>>> the khronos registry xml (khronos bug created [4]).
>>> Are you sure it's not a mistake the other way? There's no ES extension
>>> number allocated, and no vendor drivers expose this ext on ES, and
>>> this would be the first GL_ARB_* ext to be exposed in ES... normally
>>> these become GL_OES_bla or GL_KHR_bla.
>> Seems that you were right:
>> https://www.khronos.org/bugzilla/show_bug.cgi?id=1498#c1
>>
>> Although then I don't understand why ARB_internalformat_query2 has those
>> dependencies on OpenGL ES 2.0/3.x and OES extensions:
>> https://www.khronos.org/bugzilla/show_bug.cgi?id=1498#c2
> 
> I didn't get an answer to my last questions on the khronos bug. Judging
> by IRC comments, slow responses there are usual. In any case, the
> first answer seems to be clear: ARB_ extensions are not intended to
> be exposed on OpenGL ES, and it seems that ARB_internalformat_query2 is
> not an exception, even if the specification defines the behaviour of the
> extension for OpenGL ES 2.0/3.x.  So at this point, we should just move

I believe the decision was to remove OpenGL ES from the extension.  The
extension was originally targeted as KHR, but, for reasons I don't
recall, that didn't work out.  The bits in the spec about OpenGL ES were
just leftovers from that.

> forward. In my opinion we should just forget about patch 5 of the
> series, the one that exposes the extension, and we should review and
> push the first 4 patches:
>   * patch1 and patch2 affect desktop GL too (they fix some cases for
> INTERNALFORMAT_PREFERRED)
>   * patch3 just expands a comment
>   * patch4 makes _mesa_base_tex_format take OpenGL ES into account
> too: although it was implemented just for the needs of this extension, I
> still think that it makes sense to do that. After all, that function
> already has some OpenGL ES checks; the patch just adds more of them. In
> any case, this one is somewhat more optional.
> 
> Opinions?



[Mesa-dev] [PATCH 07/10] r600g: only do necessary cache flushes in cp_dma_clear_buffer

2016-06-01 Thread Marek Olšák
From: Marek Olšák 

The main impact is that fast color clear doesn't flush TC, CONST, DB.

Reviewed-by: Alex Deucher 
---
 src/gallium/drivers/r600/evergreen_hw_context.c | 15 +--
 src/gallium/drivers/r600/r600_pipe.h| 17 +
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c 
b/src/gallium/drivers/r600/evergreen_hw_context.c
index 2feb801..06f0348 100644
--- a/src/gallium/drivers/r600/evergreen_hw_context.c
+++ b/src/gallium/drivers/r600/evergreen_hw_context.c
@@ -102,15 +102,7 @@ void evergreen_cp_dma_clear_buffer(struct r600_context 
*rctx,
offset += r600_resource(dst)->gpu_address;
 
/* Flush the cache where the resource is bound. */
-   rctx->b.flags |= R600_CONTEXT_INV_CONST_CACHE |
-R600_CONTEXT_INV_VERTEX_CACHE |
-R600_CONTEXT_INV_TEX_CACHE |
-R600_CONTEXT_FLUSH_AND_INV |
-R600_CONTEXT_FLUSH_AND_INV_CB |
-R600_CONTEXT_FLUSH_AND_INV_DB |
-R600_CONTEXT_FLUSH_AND_INV_CB_META |
-R600_CONTEXT_FLUSH_AND_INV_DB_META |
-R600_CONTEXT_STREAMOUT_FLUSH |
+   rctx->b.flags |= r600_get_flush_flags(coher) |
 R600_CONTEXT_WAIT_3D_IDLE;
 
while (size) {
@@ -158,9 +150,4 @@ void evergreen_cp_dma_clear_buffer(struct r600_context 
*rctx,
 */
if (coher == R600_COHERENCY_SHADER)
r600_emit_pfp_sync_me(rctx);
-
-   /* Invalidate the read caches. */
-   rctx->b.flags |= R600_CONTEXT_INV_CONST_CACHE |
-R600_CONTEXT_INV_VERTEX_CACHE |
-R600_CONTEXT_INV_TEX_CACHE;
 }
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index 58ab14c..8ae760f 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -932,6 +932,23 @@ static inline bool r600_can_read_depth(struct r600_texture 
*rtex)
rtex->resource.b.b.format == PIPE_FORMAT_Z32_FLOAT);
 }
 
+static inline unsigned r600_get_flush_flags(enum r600_coherency coher)
+{
+   switch (coher) {
+   default:
+   case R600_COHERENCY_NONE:
+   return 0;
+   case R600_COHERENCY_SHADER:
+   return R600_CONTEXT_INV_CONST_CACHE |
+  R600_CONTEXT_INV_VERTEX_CACHE |
+  R600_CONTEXT_INV_TEX_CACHE |
+  R600_CONTEXT_STREAMOUT_FLUSH;
+   case R600_COHERENCY_CB_META:
+   return R600_CONTEXT_FLUSH_AND_INV_CB |
+  R600_CONTEXT_FLUSH_AND_INV_CB_META;
+   }
+}
+
 #define V_028A6C_OUTPRIM_TYPE_POINTLIST 0
 #define V_028A6C_OUTPRIM_TYPE_LINESTRIP 1
 #define V_028A6C_OUTPRIM_TYPE_TRISTRIP 2
-- 
2.7.4



[Mesa-dev] [PATCH 10/10] gallium/radeon: don't use the DMA ring for pipelined buffer uploads

2016-06-01 Thread Marek Olšák
From: Marek Olšák 

Submitting a DMA IB flushes the GFX IB and all GPU caches.

Reviewed-by: Alex Deucher 
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
b/src/gallium/drivers/radeon/r600_buffer_common.c
index 145cc9f..a47aa78 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -368,9 +368,9 @@ static void *r600_buffer_transfer_map(struct pipe_context 
*ctx,
box->width + (box->x % 
R600_MAP_BUFFER_ALIGNMENT));
if (staging) {
/* Copy the VRAM buffer to the staging buffer. */
-   rctx->dma_copy(ctx, &staging->b.b, 0,
-  box->x % R600_MAP_BUFFER_ALIGNMENT,
-  0, 0, resource, level, box);
+   ctx->resource_copy_region(ctx, &staging->b.b, 0,
+ box->x % 
R600_MAP_BUFFER_ALIGNMENT,
+ 0, 0, resource, level, box);
 
data = r600_buffer_map_sync_with_rings(rctx, staging, 
PIPE_TRANSFER_READ);
if (!data) {
@@ -398,7 +398,6 @@ static void r600_buffer_do_flush_region(struct pipe_context 
*ctx,
struct pipe_transfer *transfer,
const struct pipe_box *box)
 {
-   struct r600_common_context *rctx = (struct r600_common_context*)ctx;
struct r600_transfer *rtransfer = (struct r600_transfer*)transfer;
struct r600_resource *rbuffer = r600_resource(transfer->resource);
 
@@ -414,7 +413,7 @@ static void r600_buffer_do_flush_region(struct pipe_context 
*ctx,
u_box_1d(soffset, box->width, &dma_box);
 
/* Copy the staging buffer into the original one. */
-   rctx->dma_copy(ctx, dst, 0, box->x, 0, 0, src, 0, &dma_box);
+   ctx->resource_copy_region(ctx, dst, 0, box->x, 0, 0, src, 0, 
&dma_box);
}
 
util_range_add(&rbuffer->valid_buffer_range, box->x,
-- 
2.7.4



[Mesa-dev] [PATCH 06/10] r600g: remove a CP DMA workaround that's not needed anymore

2016-06-01 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/r600/r600_blit.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_blit.c 
b/src/gallium/drivers/r600/r600_blit.c
index 76c3364..c9d7823 100644
--- a/src/gallium/drivers/r600/r600_blit.c
+++ b/src/gallium/drivers/r600/r600_blit.c
@@ -519,12 +519,6 @@ static void r600_copy_buffer(struct pipe_context *ctx, 
struct pipe_resource *dst
} else {
util_resource_copy_region(ctx, dst, 0, dstx, 0, 0, src, 0, 
src_box);
}
-
-   /* The index buffer (VGT) doesn't seem to see the result of the copying.
-* Can we somehow flush the index buffer cache? Starting a new IB seems
-* to do the trick. */
-   if (rctx->b.chip_class <= R700)
-   rctx->b.gfx.flush(ctx, RADEON_FLUSH_ASYNC, NULL);
 }
 
 /**
-- 
2.7.4



[Mesa-dev] [PATCH 03/10] r600g: write WAIT_UNTIL in the correct place

2016-06-01 Thread Marek Olšák
From: Marek Olšák 

This has been wrong all along. Fixing this will allow removing useless
cache flushes.

Cc: 11.1 11.2 12.0 
---
 src/gallium/drivers/r600/r600_hw_context.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 1f7bed8..98b5c7c 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -112,11 +112,22 @@ void r600_flush_emit(struct r600_context *rctx)
}
}
 
+   /* Wait packets must be executed first, because SURFACE_SYNC doesn't
+* wait for shaders if it's not flushing CB or DB.
+*/
if (rctx->b.flags & R600_CONTEXT_PS_PARTIAL_FLUSH) {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_PS_PARTIAL_FLUSH) | 
EVENT_INDEX(4));
}
 
+   if (wait_until) {
+   /* Use of WAIT_UNTIL is deprecated on Cayman+ */
+   if (rctx->b.family < CHIP_CAYMAN) {
+   /* wait for things to settle */
+   radeon_set_config_reg(cs, R_008040_WAIT_UNTIL, 
wait_until);
+   }
+   }
+
if (rctx->b.chip_class >= R700 &&
(rctx->b.flags & R600_CONTEXT_FLUSH_AND_INV_CB_META)) {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
@@ -230,14 +241,6 @@ void r600_flush_emit(struct r600_context *rctx)
EVENT_INDEX(0));
}
 
-   if (wait_until) {
-   /* Use of WAIT_UNTIL is deprecated on Cayman+ */
-   if (rctx->b.family < CHIP_CAYMAN) {
-   /* wait for things to settle */
-   radeon_set_config_reg(cs, R_008040_WAIT_UNTIL, 
wait_until);
-   }
-   }
-
/* everything is properly flushed */
rctx->b.flags = 0;
 }
-- 
2.7.4



[Mesa-dev] [PATCH 01/10] gallium/u_suballoc: allow different alignment for each allocation

2016-06-01 Thread Marek Olšák
From: Marek Olšák 

Just move the alignment parameter from u_suballocator_create
to u_suballocator_alloc.
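
To make the new calling convention concrete, here is a minimal sketch of a
bump allocator that takes the alignment per allocation. It is an assumption-
laden stand-in, not u_suballoc itself: struct bump_allocator, align_up and
bump_alloc are hypothetical names, and where the real allocator would create a
fresh buffer when space runs out, this sketch simply fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the suballocator's bookkeeping. */
struct bump_allocator {
   unsigned size;    /* size of the backing buffer, in bytes */
   unsigned offset;  /* first free byte */
};

/* Round value up to the next multiple of a nonzero alignment. */
static unsigned
align_up(unsigned value, unsigned alignment)
{
   assert(alignment != 0);
   return (value + alignment - 1) / alignment * alignment;
}

/* Align the current offset for this request only, then carve out
 * exactly `size` bytes; the size itself is no longer padded. */
static bool
bump_alloc(struct bump_allocator *a, unsigned size, unsigned alignment,
           unsigned *out_offset)
{
   unsigned start = align_up(a->offset, alignment);

   /* Written as subtractions to avoid unsigned overflow. */
   if (start > a->size || size > a->size - start)
      return false;

   *out_offset = start;
   a->offset = start + size;
   return true;
}
```

Because only the start offset is aligned, small allocations with a small
alignment can be packed tightly next to larger, more strictly aligned ones in
the same buffer.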
---
 src/gallium/auxiliary/util/u_suballoc.c   | 22 ++
 src/gallium/auxiliary/util/u_suballoc.h   |  6 +++---
 src/gallium/drivers/r600/r600_asm.c   |  3 ++-
 src/gallium/drivers/r600/r600_pipe.c  |  2 +-
 src/gallium/drivers/radeon/r600_pipe_common.c |  2 +-
 src/gallium/drivers/radeon/r600_streamout.c   |  2 +-
 src/gallium/drivers/radeonsi/si_descriptors.c |  2 +-
 src/gallium/drivers/radeonsi/si_pipe.c|  2 +-
 8 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_suballoc.c 
b/src/gallium/auxiliary/util/u_suballoc.c
index 3f9ede0..5aaddbc 100644
--- a/src/gallium/auxiliary/util/u_suballoc.c
+++ b/src/gallium/auxiliary/util/u_suballoc.c
@@ -41,7 +41,6 @@ struct u_suballocator {
struct pipe_context *pipe;
 
unsigned size;  /* Size of the whole buffer, in bytes. */
-   unsigned alignment; /* Alignment of each sub-allocation. */
unsigned bind;  /* Bitmask of PIPE_BIND_* flags. */
enum pipe_resource_usage usage;
boolean zero_buffer_memory; /* If the buffer contents should be zeroed. */
@@ -58,8 +57,7 @@ struct u_suballocator {
  * cleared to 0 after the allocation.
  */
 struct u_suballocator *
-u_suballocator_create(struct pipe_context *pipe, unsigned size,
-  unsigned alignment, unsigned bind,
+u_suballocator_create(struct pipe_context *pipe, unsigned size, unsigned bind,
   enum pipe_resource_usage usage,
  boolean zero_buffer_memory)
 {
@@ -68,8 +66,7 @@ u_suballocator_create(struct pipe_context *pipe, unsigned 
size,
   return NULL;
 
allocator->pipe = pipe;
-   allocator->size = align(size, alignment);
-   allocator->alignment = alignment;
+   allocator->size = size;
allocator->bind = bind;
allocator->usage = usage;
allocator->zero_buffer_memory = zero_buffer_memory;
@@ -85,17 +82,18 @@ u_suballocator_destroy(struct u_suballocator *allocator)
 
 void
 u_suballocator_alloc(struct u_suballocator *allocator, unsigned size,
- unsigned *out_offset, struct pipe_resource **outbuf)
+ unsigned alignment, unsigned *out_offset,
+ struct pipe_resource **outbuf)
 {
-   unsigned alloc_size = align(size, allocator->alignment);
+   allocator->offset = align(allocator->offset, alignment);
 
/* Don't allow allocations larger than the buffer size. */
-   if (alloc_size > allocator->size)
+   if (size > allocator->size)
   goto fail;
 
/* Make sure we have enough space in the buffer. */
if (!allocator->buffer ||
-   allocator->offset + alloc_size > allocator->size) {
+   allocator->offset + size > allocator->size) {
   /* Allocate a new buffer. */
   pipe_resource_reference(&allocator->buffer, NULL);
   allocator->offset = 0;
@@ -117,15 +115,15 @@ u_suballocator_alloc(struct u_suballocator *allocator, 
unsigned size,
   }
}
 
-   assert(allocator->offset % allocator->alignment == 0);
+   assert(allocator->offset % alignment == 0);
assert(allocator->offset < allocator->buffer->width0);
-   assert(allocator->offset + alloc_size <= allocator->buffer->width0);
+   assert(allocator->offset + size <= allocator->buffer->width0);
 
/* Return the buffer. */
*out_offset = allocator->offset;
pipe_resource_reference(outbuf, allocator->buffer);
 
-   allocator->offset += alloc_size;
+   allocator->offset += size;
return;
 
 fail:
diff --git a/src/gallium/auxiliary/util/u_suballoc.h 
b/src/gallium/auxiliary/util/u_suballoc.h
index 5f9ccde..fb08f16 100644
--- a/src/gallium/auxiliary/util/u_suballoc.h
+++ b/src/gallium/auxiliary/util/u_suballoc.h
@@ -34,8 +34,7 @@
 struct u_suballocator;
 
 struct u_suballocator *
-u_suballocator_create(struct pipe_context *pipe, unsigned size,
-  unsigned alignment, unsigned bind,
+u_suballocator_create(struct pipe_context *pipe, unsigned size, unsigned bind,
   enum pipe_resource_usage usage,
  boolean zero_buffer_memory);
 
@@ -44,6 +43,7 @@ u_suballocator_destroy(struct u_suballocator *allocator);
 
 void
 u_suballocator_alloc(struct u_suballocator *allocator, unsigned size,
- unsigned *out_offset, struct pipe_resource **outbuf);
+ unsigned alignment, unsigned *out_offset,
+ struct pipe_resource **outbuf);
 
 #endif
diff --git a/src/gallium/drivers/r600/r600_asm.c 
b/src/gallium/drivers/r600/r600_asm.c
index c48d758..2141cf2 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -2615,7 +2615,8 @@ void *r600_create_vertex_fetch_shader(struct pipe_context 
*ctx,
return NULL;
}
 
-   u_suballocator_alloc(rctx->allocator_fetch_shader, fs_size, 
&shader->offset,
+   

[Mesa-dev] [PATCH 08/10] r600g: only do necessary cache flushes in cp_dma_copy_buffer

2016-06-01 Thread Marek Olšák
From: Marek Olšák 

The main impact is that {upload, draw, upload, draw, ..} doesn't flush
framebuffer caches before every upload.

Reviewed-by: Alex Deucher 
---
 src/gallium/drivers/r600/r600_hw_context.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 1ae3f04..04c3cb2 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -448,15 +448,7 @@ void r600_cp_dma_copy_buffer(struct r600_context *rctx,
src_offset += r600_resource(src)->gpu_address;
 
/* Flush the caches where the resources are bound. */
-   rctx->b.flags |= R600_CONTEXT_INV_CONST_CACHE |
-R600_CONTEXT_INV_VERTEX_CACHE |
-R600_CONTEXT_INV_TEX_CACHE |
-R600_CONTEXT_FLUSH_AND_INV |
-R600_CONTEXT_FLUSH_AND_INV_CB |
-R600_CONTEXT_FLUSH_AND_INV_DB |
-R600_CONTEXT_FLUSH_AND_INV_CB_META |
-R600_CONTEXT_FLUSH_AND_INV_DB_META |
-R600_CONTEXT_STREAMOUT_FLUSH |
+   rctx->b.flags |= r600_get_flush_flags(R600_COHERENCY_SHADER) |
 R600_CONTEXT_WAIT_3D_IDLE;
 
/* There are differences between R700 and EG in CP DMA,
@@ -514,11 +506,6 @@ void r600_cp_dma_copy_buffer(struct r600_context *rctx,
 * should precede it.
 */
r600_emit_pfp_sync_me(rctx);
-
-   /* Invalidate the read caches. */
-   rctx->b.flags |= R600_CONTEXT_INV_CONST_CACHE |
-R600_CONTEXT_INV_VERTEX_CACHE |
-R600_CONTEXT_INV_TEX_CACHE;
 }
 
 void r600_dma_copy_buffer(struct r600_context *rctx,
-- 
2.7.4



[Mesa-dev] [PATCH 04/10] r600g: properly sync CP with CP DMA on R6xx

2016-06-01 Thread Marek Olšák
From: Marek Olšák 

This will allow removing useless cache & IB flushes.
---
 src/gallium/drivers/r600/r600_hw_context.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c 
b/src/gallium/drivers/r600/r600_hw_context.c
index 98b5c7c..bbfe620 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -406,7 +406,9 @@ void r600_cp_dma_copy_buffer(struct r600_context *rctx,
unsigned byte_count = MIN2(size, CP_DMA_MAX_BYTE_COUNT);
unsigned src_reloc, dst_reloc;
 
-   r600_need_cs_space(rctx, 10 + (rctx->b.flags ? 
R600_MAX_FLUSH_CS_DWORDS : 0), FALSE);
+   r600_need_cs_space(rctx,
+  10 + (rctx->b.flags ? 
R600_MAX_FLUSH_CS_DWORDS : 0) +
+  3, FALSE);
 
/* Flush the caches for the first copy only. */
if (rctx->b.flags) {
@@ -441,6 +443,11 @@ void r600_cp_dma_copy_buffer(struct r600_context *rctx,
dst_offset += byte_count;
}
 
+   /* CP_DMA_CP_SYNC doesn't wait for idle on R6xx, but this does. */
+   if (rctx->b.chip_class == R600)
+   radeon_set_config_reg(cs, R_008040_WAIT_UNTIL,
+ S_008040_WAIT_CP_DMA_IDLE(1));
+
/* Invalidate the read caches. */
rctx->b.flags |= R600_CONTEXT_INV_CONST_CACHE |
 R600_CONTEXT_INV_VERTEX_CACHE |
-- 
2.7.4



[Mesa-dev] [PATCH 02/10] gallium/radeon: rename allocator_so_filled_size -> allocator_zeroed_memory

2016-06-01 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeon/r600_pipe_common.c | 8 
 src/gallium/drivers/radeon/r600_pipe_common.h | 2 +-
 src/gallium/drivers/radeon/r600_streamout.c   | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c 
b/src/gallium/drivers/radeon/r600_pipe_common.c
index 8165e93..1415e72 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -369,10 +369,10 @@ bool r600_common_context_init(struct r600_common_context 
*rctx,
r600_query_init(rctx);
cayman_init_msaa(&rctx->b);
 
-   rctx->allocator_so_filled_size =
+   rctx->allocator_zeroed_memory =
u_suballocator_create(&rctx->b, rscreen->info.gart_page_size,
  0, PIPE_USAGE_DEFAULT, TRUE);
-   if (!rctx->allocator_so_filled_size)
+   if (!rctx->allocator_zeroed_memory)
return false;
 
rctx->uploader = u_upload_create(&rctx->b, 1024 * 1024,
@@ -410,8 +410,8 @@ void r600_common_context_cleanup(struct r600_common_context 
*rctx)
 
util_slab_destroy(&rctx->pool_transfers);
 
-   if (rctx->allocator_so_filled_size) {
-   u_suballocator_destroy(rctx->allocator_so_filled_size);
+   if (rctx->allocator_zeroed_memory) {
+   u_suballocator_destroy(rctx->allocator_zeroed_memory);
}
rctx->ws->fence_reference(&rctx->last_sdma_fence, NULL);
 }
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
b/src/gallium/drivers/radeon/r600_pipe_common.h
index d693004..8072833 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -471,7 +471,7 @@ struct r600_common_context {
unsigned last_dirty_tex_descriptor_counter;
 
struct u_upload_mgr *uploader;
-   struct u_suballocator   *allocator_so_filled_size;
+   struct u_suballocator   *allocator_zeroed_memory;
struct util_slab_mempool pool_transfers;
 
/* Current unaccounted memory usage. */
diff --git a/src/gallium/drivers/radeon/r600_streamout.c 
b/src/gallium/drivers/radeon/r600_streamout.c
index 24216de..eb81846 100644
--- a/src/gallium/drivers/radeon/r600_streamout.c
+++ b/src/gallium/drivers/radeon/r600_streamout.c
@@ -46,7 +46,7 @@ r600_create_so_target(struct pipe_context *ctx,
return NULL;
}
 
-   u_suballocator_alloc(rctx->allocator_so_filled_size, 4, 4,
+   u_suballocator_alloc(rctx->allocator_zeroed_memory, 4, 4,
 &t->buf_filled_size_offset,
 (struct pipe_resource**)&t->buf_filled_size);
if (!t->buf_filled_size) {
-- 
2.7.4



[Mesa-dev] [PATCH 00/10] R600: Cache flush fixes and cleanup v2

2016-06-01 Thread Marek Olšák
Hi,

This is version 2 of the previous series. This time it's been tested!!

Tested cards:
- RV670
- RV730
- EG/REDWOOD
- CAYMAN

This patch series:
- fixes several bugs around making 3D and CP DMA idle with respect to CP.PFP,
  which allows removing a lot of cache flushes (= hacks really) and IB flushes
  around CP DMA
- removes unnecessary cache flushes
- moves other cache flushes to places where their frequency is lower

From the perspective of functions:
- binding shader resources doesn't flush anything (why should it)
- set_framebuffer_state flushes CB, DB, TC
- CP DMA copy_buffer only flushes TC, VC, KC. Never CB or DB.
- CP DMA clear_buffer only flushes TC, VC, KC when shader coherency is
  requested, or CB when CB coherency is requested. Never DB.
- fast color clear no longer flushes TC, VC, KC, DB. (implied by clear_buffer)
- ending streamout newly flushes TC, VC, KC

From the perspective of caches:
- TC is flushed only by set_framebuffer_state, texture_barrier, before
  CP DMA (except fast color clear), and after streamout
- VC & KC are flushed only before CP DMA (except fast color clear) or after
  streamout
- CB is flushed by set_framebuffer_state or by fast color clear
- DB is only flushed by set_framebuffer_state
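
The per-function flush rules listed above can be written down as a small lookup. This is only a sketch of the rules as stated in this cover letter, not code from the series: the flag names, the op enum, and flushes_for() are all hypothetical, and the cache expansions in the comments (texture, vertex, constant, color buffer, depth buffer) are the usual readings of TC, VC, KC, CB, and DB.

```c
#include <assert.h>

/* Hypothetical flag bits, one per cache named in the cover letter. */
enum {
   CACHE_TC = 1u << 0, /* texture cache */
   CACHE_VC = 1u << 1, /* vertex cache */
   CACHE_KC = 1u << 2, /* constant cache */
   CACHE_CB = 1u << 3, /* color buffer cache */
   CACHE_DB = 1u << 4  /* depth buffer cache */
};

/* Hypothetical names for the operations the cover letter lists. */
enum op {
   OP_BIND_SHADER_RESOURCES,
   OP_SET_FRAMEBUFFER_STATE,
   OP_CP_DMA_COPY_BUFFER,
   OP_CP_DMA_CLEAR_BUFFER,
   OP_END_STREAMOUT
};

/* Which kind of coherency a CP DMA clear was asked to provide. */
enum coherency { COHER_NONE, COHER_SHADER, COHER_CB };

/* Return the set of caches flushed by an operation, per the list above. */
static unsigned flushes_for(enum op op, enum coherency coher)
{
   switch (op) {
   case OP_BIND_SHADER_RESOURCES:
      return 0;
   case OP_SET_FRAMEBUFFER_STATE:
      return CACHE_CB | CACHE_DB | CACHE_TC;
   case OP_CP_DMA_COPY_BUFFER:
      return CACHE_TC | CACHE_VC | CACHE_KC;
   case OP_CP_DMA_CLEAR_BUFFER:
      if (coher == COHER_SHADER)
         return CACHE_TC | CACHE_VC | CACHE_KC;
      if (coher == COHER_CB)
         return CACHE_CB;
      return 0;
   case OP_END_STREAMOUT:
      return CACHE_TC | CACHE_VC | CACHE_KC;
   }
   return 0;
}

/* Spot checks: CP DMA never touches DB, and only a CB-coherent clear touches CB. */
static void check_flush_table(void)
{
   assert((flushes_for(OP_CP_DMA_COPY_BUFFER, COHER_NONE) & (CACHE_CB | CACHE_DB)) == 0);
   assert((flushes_for(OP_CP_DMA_CLEAR_BUFFER, COHER_SHADER) & CACHE_DB) == 0);
   assert(flushes_for(OP_CP_DMA_CLEAR_BUFFER, COHER_CB) == CACHE_CB);
   assert(flushes_for(OP_SET_FRAMEBUFFER_STATE, COHER_NONE) & CACHE_DB);
}
```

Reading the table by column instead of by row gives the "perspective of caches" view: DB appears only under set_framebuffer_state.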

More testing may be needed, especially testing on GPUs not listed above.

Also available here:
https://cgit.freedesktop.org/~mareko/mesa/log/?h=r600-opt-flushes

Please review.

Marek


Re: [Mesa-dev] [PATCH] i965/eu: use simd8 when exec_size != EXECUTE_16

2016-06-01 Thread Francisco Jerez
Alejandro Piñeiro  writes:

> Among other things, fix a gpu hang when using INTEL_DEBUG=shader_time
> for any shader.
>
> Signed-off-by: Jason Ekstrand 
> Signed-off-by: Alejandro Piñeiro 
> ---
>
> This is the change suggested by Jason (so I added his Signed-off-by). I just
> ran a full piglit run to check that there aren't any regressions.
>
>  src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index 2538f0d..8218f9c 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -2909,7 +2909,7 @@ brw_set_dp_untyped_atomic_message(struct brw_codegen *p,
>  
> if (devinfo->gen >= 8 || devinfo->is_haswell) {
>if (brw_inst_access_mode(devinfo, p->current) == BRW_ALIGN_1) {
> - if (brw_inst_exec_size(devinfo, p->current) == BRW_EXECUTE_8)
> + if (brw_inst_exec_size(devinfo, p->current) != BRW_EXECUTE_16)
>  msg_control |= 1 << 4; /* SIMD8 mode */
>  
>   brw_inst_set_dp_msg_type(devinfo, insn,
> @@ -2922,7 +2922,7 @@ brw_set_dp_untyped_atomic_message(struct brw_codegen *p,
>brw_inst_set_dp_msg_type(devinfo, insn,
> GEN7_DATAPORT_DC_UNTYPED_ATOMIC_OP);
>  
> -  if (brw_inst_exec_size(devinfo, p->current) == BRW_EXECUTE_8)
> +  if (brw_inst_exec_size(devinfo, p->current) != BRW_EXECUTE_16)

LGTM,

Reviewed-by: Francisco Jerez 

>   msg_control |= 1 << 4; /* SIMD8 mode */
> }
>  
> -- 
> 2.7.4
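
The effect of the changed condition can be seen in isolation with a minimal sketch. The enum below is a hypothetical stand-in for the BRW_EXECUTE_* encodings (the real values live in the i965 headers), and the two helpers only model the single line the patch touches, under the assumption that bit 4 of msg_control selects SIMD8 mode as the patch comment says.

```c
#include <assert.h>

/* Hypothetical stand-ins for the BRW_EXECUTE_* execution sizes. */
enum exec_size { EXEC_1, EXEC_2, EXEC_4, EXEC_8, EXEC_16 };

/* Bit 4 of msg_control selects SIMD8 mode, as in the quoted patch. */
#define SIMD8_MODE_BIT (1u << 4)

/* Before the patch: only an execution size of exactly 8 selected SIMD8. */
static unsigned msg_control_old(enum exec_size size)
{
   return size == EXEC_8 ? SIMD8_MODE_BIT : 0;
}

/* After the patch: everything except an execution size of 16 selects SIMD8. */
static unsigned msg_control_new(enum exec_size size)
{
   return size != EXEC_16 ? SIMD8_MODE_BIT : 0;
}

/* The two checks differ only for execution sizes narrower than 8. */
static void check_msg_control(void)
{
   assert(msg_control_old(EXEC_8) == msg_control_new(EXEC_8));
   assert(msg_control_old(EXEC_16) == msg_control_new(EXEC_16));
   assert(msg_control_old(EXEC_4) == 0);
   assert(msg_control_new(EXEC_4) == SIMD8_MODE_BIT);
}
```

With the old check, an instruction with an execution size of 1, 2, or 4 fell through to the path without the SIMD8 bit; the new check sends it down the SIMD8 path.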




Re: [Mesa-dev] Discussion: C++11 std::future in Mesa

2016-06-01 Thread Matt Turner
On Wed, Jun 1, 2016 at 8:51 AM, ⚛ <0xe2.0x9a.0...@gmail.com> wrote:
> On Wed, Jun 1, 2016 at 3:50 PM, Erik Faye-Lund  wrote:
>> On Wed, Jun 1, 2016 at 3:02 PM, ⚛ <0xe2.0x9a.0...@gmail.com> wrote:
>>> On Wed, Jun 1, 2016 at 2:19 PM, Marek Olšák  wrote:
>>>> I'll let you figure it out by yourself.
>>>
>>> Why would you withhold information if you already have it? Are you a
>>> "bad person" or something?
>>
>> The problem has been described in countless threads on this mailing
>> list, but I'll repeat it:
>
> Strange. If it has already been discussed many times then why isn't
> the problem solved yet?

By that metric world peace should have been solved long ago.


Re: [Mesa-dev] [PATCH 0/5] ARB_internalformat_query2 support for OpenGL ES and other fixes

2016-06-01 Thread Alejandro Piñeiro
On 13/05/16 23:20, Alejandro Piñeiro wrote:
>
> On 13/05/16 17:06, Ilia Mirkin wrote:
>> On Fri, May 13, 2016 at 10:57 AM, Alejandro Piñeiro
>>  wrote:
>>> Earlier this year the support for ARB_internalformat_query2 has landed
>>> [1][2], initially only for desktop GL.
>>>
>>> But looking more carefully to the spec [3], we found the following:
>>>
>>> "Dependencies
>>>
>>>  OpenGL 2.0 or OpenGL ES 2.0 is required"
>>>
>>> Note the *or*. Additionally the spec list other GL ES 2.0/3.0
>>> dependencies. So that means that the extension can be also applied to
>>> GL ES 2.0/3.0. FWIW, this mistake is common, as it also happens with
>>> the khronos registry xml (khronos bug created [4]).
>> Are you sure it's not a mistake the other way? There's no ES extension
>> number allocated, and no vendor drivers expose this ext on ES, and
>> this would be the first GL_ARB_* ext to be exposed in ES... normally
>> these become GL_OES_bla or GL_KHR_bla.
> Seems that you were right:
> https://www.khronos.org/bugzilla/show_bug.cgi?id=1498#c1
>
> Although then I don't understand why ARB_internalformat_query2 has those
> dependencies to OpenGL ES 2.0/3.x and OES extensions:
> https://www.khronos.org/bugzilla/show_bug.cgi?id=1498#c2

I didn't get an answer to my last questions on the Khronos bug. Judging
from IRC comments, slow responses there are usual. In any case, the
first answer seems clear: ARB_ extensions are not intended to be exposed
on OpenGL ES, and ARB_internalformat_query2 does not seem to be an
exception, even though the specification defines the behaviour of the
extension for OpenGL ES 2.0/3.x. So at this point, we should just move
forward. In my opinion we should drop patch 5 of the series, the one
that exposes the extension, and review and push the first 4 patches:
  * patch 1 and patch 2 affect desktop GL too (they fix some cases for
INTERNALFORMAT_PREFERRED)
  * patch 3 just expands a comment
  * patch 4 makes _mesa_base_tex_format take OpenGL ES into account
too: although it was implemented just for the needs of this extension, I
still think that it makes sense to do that. After all, that function
already has some OpenGL ES checks; the patch just adds more of them. In
any case, this one is more optional.

Opinions?









Re: [Mesa-dev] Discussion: C++11 std::future in Mesa

2016-06-01 Thread Marek Olšák
On Wed, Jun 1, 2016 at 7:40 PM, Patrick Baggett
 wrote:
>>
>>
>> No. Shader compilation can only be asynchronous if it's far enough
>> from a draw call and the app doesn't query its status. If it's next to
>> a draw call, multithreading is useless. Completely useless.
>>

I take my statement back. Multithreading can be useful for compiling
VS and FS in parallel when it's next to a draw call.

>
> I don't know a lot about the shader compilation/linking process, so
> I'm just asking this for my own benefit.
>
> I read that the optimizations take a long time. Is it possible to
> create a sort of -O0 version of the shader while the real version is
> generated by some thread pool? Or would there be some shaders that
> would just fail to run unless optimization took place (and the
> developers count on that)?

The latter. Register allocation is usually required for being able to
run a shader.

Marek


Re: [Mesa-dev] [PATCH] automake: avoid fetching unnecessary data for the generation of git_sha1.h

2016-06-01 Thread Tobias Klausmann



On 01.06.2016 19:31, Eric Engestrom wrote:
> On Wed, Jun 01, 2016 at 12:18:57AM +0200, Tobias Klausmann wrote:
>> Signed-off-by: Tobias Klausmann
>> ---
>>  src/Makefile.am | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/Makefile.am b/src/Makefile.am
>> index f5c0773..d0990dc 100644
>> --- a/src/Makefile.am
>> +++ b/src/Makefile.am
>> @@ -22,8 +22,8 @@
>>  git_sha1.h:
>>   @if test -e $(top_srcdir)/.git; then \
>>   if which git > /dev/null; then \
>> - git --git-dir=$(top_srcdir)/.git log -n 1 --oneline | \
>> - sed 's/^\([^ ]*\) .*/#define MESA_GIT_SHA1 "git-\1"/' \
>> + git --git-dir=$(top_srcdir)/.git rev-parse --short HEAD | \
>> + sed 's/^\(.*\)/#define MESA_GIT_SHA1 "git-\1"/' \
>
> We don't need sed when the only output is already what we want. I'd find
> something like this cleaner, but your patch also works:
>
> printf '#define MESA_GIT_SHA1 "git-%s"\n' > git_sha1.h \
> $(git --git-dir=$(top_srcdir)/.git rev-parse --short HEAD)

You are right! Feel free to send a patch if you want! :)
Thanks,
Tobias

> Either way, this patch is:
> Tested-by: Eric Engestrom
>
>>   > git_sha1.h ; \
>>   fi \
>>   fi
>> --
>> 2.8.3




Re: [Mesa-dev] [PATCH 3/4] mesa: Allow relax various desktop-only checks for cube arrays

2016-06-01 Thread Ian Romanick
On 05/31/2016 07:39 AM, Ilia Mirkin wrote:
> This has the unfortunate side-effect of opening up cube map arrays to
> ES 3.0 implementations where the backend driver also supports texture
> cubemaps for desktop GL (I'm thinking the DX10.1 GT21x's for example).
> Perhaps we don't care? Otherwise it may be nice to use the new
> _mesa_has_OES_bla_ext() helpers.

I was also going to make some comments that this extension requires
GL_OES_geometry_shader.  I went digging through the extension spec to
understand the rationale for the requirements.  It basically comes down
to OpenGL ES's strong desire to put suffixes on everything to the
detriment of every application developer:

(3) What should the rules on GLSL suffixing be?

RESOLVED: The new sampler and image types are not reserved keywords in
ESSL 3.00, but they are keywords in GLSL 4.40. ESSL 3.10 updates the
reserved keyword list to include all keywords used or reserved in GLSL
4.40 (but not otherwise used in ES), and thus we can use the image
and sampler keywords directly by moving them from the reserved keywords
section. See bug 11179.

It requires GLES 3.1 because samplerCubeArray and friends weren't
reserved words in GLES 3.0.  I'll try to stifle my rage and simply say
that we should expose this extension in GLES 3.0.

> On Tue, May 31, 2016 at 1:28 AM, Chris Forbes  wrote:
>> Signed-off-by: Chris Forbes 
>> ---
>>  src/mesa/main/get.c  | 2 +-
>>  src/mesa/main/get_hash_params.py | 6 +++---
>>  src/mesa/main/teximage.c | 3 ++-
>>  src/mesa/main/texobj.c   | 2 +-
>>  src/mesa/main/texparam.c | 3 ++-
>>  src/mesa/main/texstorage.c   | 3 ++-
>>  6 files changed, 11 insertions(+), 8 deletions(-)
>>
>> diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
>> index 9f70749..4f46572 100644
>> --- a/src/mesa/main/get.c
>> +++ b/src/mesa/main/get.c
>> @@ -1914,7 +1914,7 @@ tex_binding_to_index(const struct gl_context *ctx, 
>> GLenum binding)
>>_mesa_has_OES_texture_buffer(ctx)) ?
>>   TEXTURE_BUFFER_INDEX : -1;
>> case GL_TEXTURE_BINDING_CUBE_MAP_ARRAY:
>> -  return _mesa_is_desktop_gl(ctx) && 
>> ctx->Extensions.ARB_texture_cube_map_array
>> +  return ctx->Extensions.ARB_texture_cube_map_array
>>   ? TEXTURE_CUBE_ARRAY_INDEX : -1;
>> case GL_TEXTURE_BINDING_2D_MULTISAMPLE:
>>return _mesa_is_desktop_gl(ctx) && 
>> ctx->Extensions.ARB_texture_multisample
>> diff --git a/src/mesa/main/get_hash_params.py 
>> b/src/mesa/main/get_hash_params.py
>> index 2124072..7193296 100644
>> --- a/src/mesa/main/get_hash_params.py
>> +++ b/src/mesa/main/get_hash_params.py
>> @@ -458,6 +458,9 @@ descriptor=[
>>[ "MIN_PROGRAM_TEXTURE_GATHER_OFFSET", 
>> "CONTEXT_INT(Const.MinProgramTextureGatherOffset), 
>> extra_ARB_texture_gather"],
>>[ "MAX_PROGRAM_TEXTURE_GATHER_OFFSET", 
>> "CONTEXT_INT(Const.MaxProgramTextureGatherOffset), 
>> extra_ARB_texture_gather"],
>>
>> +# GL_ARB_texture_cube_map_array / ES3.1 with GL_OES_texture_cube_map_array
>> +  [ "TEXTURE_BINDING_CUBE_MAP_ARRAY_ARB", "LOC_CUSTOM, TYPE_INT, 
>> TEXTURE_CUBE_ARRAY_INDEX, extra_ARB_texture_cube_map_array" ],
>> +
>>  # GL_ARB_compute_shader / GLES 3.1
>>[ "MAX_COMPUTE_WORK_GROUP_INVOCATIONS", 
>> "CONTEXT_INT(Const.MaxComputeWorkGroupInvocations), 
>> extra_ARB_compute_shader_es31" ],
>>[ "MAX_COMPUTE_UNIFORM_BLOCKS", 
>> "CONTEXT_INT(Const.Program[MESA_SHADER_COMPUTE].MaxUniformBlocks), 
>> extra_ARB_compute_shader_es31" ],
>> @@ -851,9 +854,6 @@ descriptor=[
>>  # GL_ARB_map_buffer_alignment
>>[ "MIN_MAP_BUFFER_ALIGNMENT", "CONTEXT_INT(Const.MinMapBufferAlignment), 
>> NO_EXTRA" ],
>>
>> -# GL_ARB_texture_cube_map_array
>> -  [ "TEXTURE_BINDING_CUBE_MAP_ARRAY_ARB", "LOC_CUSTOM, TYPE_INT, 
>> TEXTURE_CUBE_ARRAY_INDEX, extra_ARB_texture_cube_map_array" ],
>> -
>>  # GL_ARB_texture_gather
>>[ "MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB", 
>> "CONTEXT_INT(Const.MaxProgramTextureGatherComponents), 
>> extra_ARB_texture_gather"],
>>
>> diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
>> index 58b7f27..bfe0b18 100644
>> --- a/src/mesa/main/teximage.c
>> +++ b/src/mesa/main/teximage.c
>> @@ -1474,8 +1474,9 @@ legal_teximage_target(struct gl_context *ctx, GLuint 
>> dims, GLenum target)
>>case GL_PROXY_TEXTURE_2D_ARRAY_EXT:
>>   return _mesa_is_desktop_gl(ctx) && 
>> ctx->Extensions.EXT_texture_array;
>>case GL_TEXTURE_CUBE_MAP_ARRAY:
>> -  case GL_PROXY_TEXTURE_CUBE_MAP_ARRAY:
>>   return ctx->Extensions.ARB_texture_cube_map_array;
>> +  case GL_PROXY_TEXTURE_CUBE_MAP_ARRAY:
>> + return _mesa_is_desktop_gl(ctx) && 
>> ctx->Extensions.ARB_texture_cube_map_array;
>>default:
>>   return GL_FALSE;
>>}
>> diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c
>> index ed630bd..2e9d9e3 100644
>> --- a/src/mesa/main/texobj.c
>> +++ 

Re: [Mesa-dev] Discussion: C++11 std::future in Mesa

2016-06-01 Thread Patrick Baggett
>
>
> No. Shader compilation can only be asynchronous if it's far enough
> from a draw call and the app doesn't query its status. If it's next to
> a draw call, multithreading is useless. Completely useless.
>

I don't know a lot about the shader compilation/linking process, so
I'm just asking this for my own benefit.

I read that the optimizations take a long time. Is it possible to
create a sort of -O0 version of the shader while the real version is
generated by some thread pool? Or would there be some shaders that
would just fail to run unless optimization took place (and the
developers count on that)?

> We need to get below 33 ms for all shaders needed to be compiled to
> render a frame. If there are 10 VS and 10 PS, one shader must be
> compiled within 1.65 ms on average. I don't see where your random
> guess meets that goal.
>
> Marek
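
The budget arithmetic in the quoted paragraph (33 ms per frame, 10 VS plus 10 PS) can be checked with a minimal sketch. The function name is hypothetical, and the even split across shaders is the same simplification the quoted text makes.

```c
#include <assert.h>

/*
 * Average time available per shader if every shader needed for a frame
 * must be compiled within the frame budget.
 */
static double per_shader_budget_ms(double frame_budget_ms,
                                   unsigned num_vs, unsigned num_ps)
{
   unsigned total = num_vs + num_ps;

   assert(total > 0);
   return frame_budget_ms / total;
}

/* 33 ms spread over 10 VS + 10 PS gives 1.65 ms per shader on average. */
static void check_budget(void)
{
   double budget = per_shader_budget_ms(33.0, 10, 10);

   assert(budget > 1.649 && budget < 1.651);
}
```

Halving the shader count doubles the per-shader budget, so the figure is only as good as the assumed number of shaders per frame.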


Re: [Mesa-dev] [PATCH] automake: avoid fetching unnecessary data for the generation of git_sha1.h

2016-06-01 Thread Eric Engestrom
On Wed, Jun 01, 2016 at 12:18:57AM +0200, Tobias Klausmann wrote:
> Signed-off-by: Tobias Klausmann 
> ---
>  src/Makefile.am | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/Makefile.am b/src/Makefile.am
> index f5c0773..d0990dc 100644
> --- a/src/Makefile.am
> +++ b/src/Makefile.am
> @@ -22,8 +22,8 @@
>  git_sha1.h:
>   @if test -e $(top_srcdir)/.git; then \
>   if which git > /dev/null; then \
> - git --git-dir=$(top_srcdir)/.git log -n 1 --oneline | \
> - sed 's/^\([^ ]*\) .*/#define MESA_GIT_SHA1 "git-\1"/' \
> + git --git-dir=$(top_srcdir)/.git rev-parse --short HEAD | \
> + sed 's/^\(.*\)/#define MESA_GIT_SHA1 "git-\1"/' \

We don't need sed when the only output is already what we want. I'd find
something like this cleaner, but your patch also works:

printf '#define MESA_GIT_SHA1 "git-%s"\n' > git_sha1.h \
$(git --git-dir=$(top_srcdir)/.git rev-parse --short HEAD)

Either way, this patch is:
Tested-by: Eric Engestrom 

>   > git_sha1.h ; \
>   fi \
>   fi
> -- 
> 2.8.3
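
For reference, the line that both the sed and the printf variants are meant to put into git_sha1.h can be sketched in C with the same format string. The helper name and the example hash are made up for illustration; trimming the trailing newline mirrors what shell command substitution does to the output of git rev-parse.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the git_sha1.h define from the output of
 * "git rev-parse --short HEAD". A trailing newline in the hash, if any,
 * is dropped before formatting.
 */
static int format_git_sha1_define(char *buf, size_t len, const char *short_hash)
{
   size_t n = strcspn(short_hash, "\r\n");

   return snprintf(buf, len, "#define MESA_GIT_SHA1 \"git-%.*s\"\n",
                   (int)n, short_hash);
}

/* "abc1234" is a made-up abbreviated hash. */
static void check_git_sha1_define(void)
{
   char line[64];
   int written = format_git_sha1_define(line, sizeof(line), "abc1234\n");

   assert(written > 0 && (size_t)written < sizeof(line));
   assert(strcmp(line, "#define MESA_GIT_SHA1 \"git-abc1234\"\n") == 0);
}
```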


Re: [Mesa-dev] [PATCH] st/dri: fix winsys handle stride calculation for block formats

2016-06-01 Thread Emil Velikov
[+RobH since he's interested in plumbing similar/same formats into GBM]

Hi Philipp,

On 26 May 2016 at 13:00, Philipp Zabel  wrote:
> Am Donnerstag, den 26.05.2016, 12:43 +0100 schrieb Emil Velikov:
>> Hi gents,
>>
>> On 26 May 2016 at 11:28, Philipp Zabel  wrote:
>> > Hi Michel,
>> >
>> > Am Donnerstag, den 26.05.2016, 17:59 +0900 schrieb Michel Dänzer:
>> >> On 25.05.2016 22:20, Philipp Zabel wrote:
>> >> > This fixes the stride calculation for pipe formats with a block width
>> >> > larger than one.
>> >> >
>> >> > Signed-off-by: Philipp Zabel 
>> >> > ---
>> >> >  src/gallium/state_trackers/dri/dri2.c | 2 +-
>> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/src/gallium/state_trackers/dri/dri2.c 
>> >> > b/src/gallium/state_trackers/dri/dri2.c
>> >> > index 0c84baf..c0b0d21 100644
>> >> > --- a/src/gallium/state_trackers/dri/dri2.c
>> >> > +++ b/src/gallium/state_trackers/dri/dri2.c
>> >> > @@ -804,7 +804,7 @@ dri2_create_image_from_name(__DRIscreen *_screen,
>> >> > if (pf == PIPE_FORMAT_NONE)
>> >> >return NULL;
>> >> >
>> >> > -   whandle.stride = pitch * util_format_get_blocksize(pf);
>> >> > +   whandle.stride = util_format_get_stride(pf, pitch);
>> >> >
>> >> > return dri2_create_image_from_winsys(_screen, width, height, format,
>> >> >  &whandle, loaderPrivate);
>> >> >
>> >>
>> >> Reviewed-by: Michel Dänzer 
>> >>
>> >> Do you need somebody to push this patch for you?
>> >
>> > Yes, thank you.
>> >
>> Can we add a note if this fixes a real world case (on which driver
>> and/or format) ? Is it worth adding this patch in stable releases ?
>
> I encountered this when trying to import YUYV buffers via
> EGL_EXT_image_dma_buf_import into the (still out of tree) etnaviv
> gallium driver. Since I currently still have the following patch
> applied, I don't think this is a stable issue, at least regarding YUYV:
>
> --8<--
> Subject: [PATCH] WIP: st/dri: Allow YUYV import
>
> Unclear whether this is the right way, but this allows to import
> dma-buffers with YUYV pixel format.
>
> Signed-off-by: Philipp Zabel 
>
> diff --git a/src/gallium/state_trackers/dri/dri2.c
> b/src/gallium/state_trackers/dri/dri2.c
> index e07389c..bad1d90 100644
> --- a/src/gallium/state_trackers/dri/dri2.c
> +++ b/src/gallium/state_trackers/dri/dri2.c
> @@ -70,6 +70,10 @@ static int convert_fourcc(int format, int
> *dri_components_p)
>format = __DRI_IMAGE_FORMAT_XBGR;
>dri_components = __DRI_IMAGE_COMPONENTS_RGB;
>break;
> +   case __DRI_IMAGE_FOURCC_YUYV:
> +  format = __DRI_IMAGE_FOURCC_YUYV;
> +  dri_components = __DRI_IMAGE_COMPONENTS_Y_XUXV;
> +  break;
> default:
>return -1;
> }
> @@ -118,6 +122,9 @@ static enum pipe_format dri2_format_to_pipe_format
> (int format)
> case __DRI_IMAGE_FORMAT_ABGR:
>pf = PIPE_FORMAT_RGBA_UNORM;
>break;
> +   case __DRI_IMAGE_FOURCC_YUYV:
> +  pf = PIPE_FORMAT_YUYV;
> +  break;
> default:
>pf = PIPE_FORMAT_NONE;
>break;
> -->8--
>
> While I have your attention, should the above be handled by adding a
> __DRI_IMAGE_FORMAT_YUYV instead?
>
I think it's a good idea to have/add it. Internally drivers can handle
__DRI_IMAGE_{FOURCC,FORMAT}_YUYV in whichever way it seems best.
I'm not the most authoritative person to ask though - I'd imagine
Kristian Høgsberg may want to weigh in. He added support for i965 for
the said fourcc not too long ago.

Is there a branch with your work so far ? I'd imagine that others
might be interested in taking a look.

Thanks
Emil
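
The difference between the two stride computations in the patch quoted above shows up as soon as a block covers more than one pixel. The sketch below uses a hypothetical, simplified format description rather than gallium's real one, and assumes the usual YUYV layout of two pixels per 4-byte block.

```c
#include <assert.h>

/* Hypothetical, simplified format description (not gallium's own struct). */
struct fmt_desc {
   unsigned block_width; /* pixels covered by one block horizontally */
   unsigned block_bytes; /* size of one block in bytes */
};

/* Old computation: one block per pixel, regardless of block width. */
static unsigned stride_old(const struct fmt_desc *fmt, unsigned width)
{
   return width * fmt->block_bytes;
}

/* New computation: count whole blocks per row, rounding up. */
static unsigned stride_new(const struct fmt_desc *fmt, unsigned width)
{
   unsigned nblocksx = (width + fmt->block_width - 1) / fmt->block_width;

   return nblocksx * fmt->block_bytes;
}

/* Both agree for 1-pixel blocks; the old one doubles the stride for YUYV. */
static void check_strides(void)
{
   const struct fmt_desc rgba8 = { 1, 4 };
   const struct fmt_desc yuyv = { 2, 4 };

   assert(stride_old(&rgba8, 640) == stride_new(&rgba8, 640));
   assert(stride_old(&yuyv, 640) == 2560);
   assert(stride_new(&yuyv, 640) == 1280);
}
```

For formats with a block width of one the patch changes nothing, which fits the observation in the thread that the problem only appeared when importing YUYV buffers.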


Re: [Mesa-dev] [PATCH v2] tgsi/scan: add uses_derivatives (v2)

2016-06-01 Thread Roland Scheidegger
Reviewed-by: Roland Scheidegger 

Am 01.06.2016 um 17:32 schrieb Nicolai Hähnle:
> From: Nicolai Hähnle 
> 
> v2:
> - TG4 does not calculate derivatives (Ilia)
> - also handle SAMPLE* instructions (Roland)
> 
> Cc: 12.0 
> Reviewed-by: Marek Olšák  (v1)
> Reviewed-by: Brian Paul  (v1)
> --
> This looks increasingly like something that might better live in the opcode
> info table. Maybe in a separate cleanup so as not to churn stable too much.
> ---
>  src/gallium/auxiliary/tgsi/tgsi_scan.c | 30 ++
>  src/gallium/auxiliary/tgsi/tgsi_scan.h |  1 +
>  2 files changed, 31 insertions(+)
> 
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.c 
> b/src/gallium/auxiliary/tgsi/tgsi_scan.c
> index 1baf031..98d86fc 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_scan.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_scan.c
> @@ -68,6 +68,33 @@ is_texture_inst(unsigned opcode)
> tgsi_get_opcode_info(opcode)->is_tex);
>  }
>  
> +
> +/**
> + * Is the opcode an instruction which computes a derivative explicitly or
> + * implicitly?
> + */
> +static bool
> +computes_derivative(unsigned opcode)
> +{
> +   if (tgsi_get_opcode_info(opcode)->is_tex) {
> +  return opcode != TGSI_OPCODE_TG4 &&
> + opcode != TGSI_OPCODE_TXD &&
> + opcode != TGSI_OPCODE_TXF &&
> + opcode != TGSI_OPCODE_TXL &&
> + opcode != TGSI_OPCODE_TXL2 &&
> + opcode != TGSI_OPCODE_TXQ &&
> + opcode != TGSI_OPCODE_TXQ_LZ &&
> + opcode != TGSI_OPCODE_TXQS;
> +   }
> +
> +   return opcode == TGSI_OPCODE_DDX || opcode == TGSI_OPCODE_DDX_FINE ||
> +  opcode == TGSI_OPCODE_DDY || opcode == TGSI_OPCODE_DDY_FINE ||
> +  opcode == TGSI_OPCODE_SAMPLE ||
> +  opcode == TGSI_OPCODE_SAMPLE_B ||
> +  opcode == TGSI_OPCODE_SAMPLE_C;
> +}
> +
> +
>  static void
>  scan_instruction(struct tgsi_shader_info *info,
>   const struct tgsi_full_instruction *fullinst,
> @@ -263,6 +290,9 @@ scan_instruction(struct tgsi_shader_info *info,
> if (is_mem_inst)
>info->num_memory_instructions++;
>  
> +   if (computes_derivative(fullinst->Instruction.Opcode))
> +  info->uses_derivatives = true;
> +
> info->num_instructions++;
>  }
>   
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.h 
> b/src/gallium/auxiliary/tgsi/tgsi_scan.h
> index 31adce7..f7eefa4 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_scan.h
> +++ b/src/gallium/auxiliary/tgsi/tgsi_scan.h
> @@ -115,6 +115,7 @@ struct tgsi_shader_info
> boolean writes_memory; /**< contains stores or atomics to buffers or 
> images */
> boolean is_msaa_sampler[PIPE_MAX_SAMPLERS];
> boolean uses_doubles; /**< uses any of the double instructions */
> +   boolean uses_derivatives;
> unsigned clipdist_writemask;
> unsigned culldist_writemask;
> unsigned num_written_culldistance;
> 



Re: [Mesa-dev] [android-x86-devel] [PATCH 3/3] isl: add support for Android libmesa_isl static library

2016-06-01 Thread Mauro Rossi
> >>> >  MESA_COMMON_MK := $(MESA_TOP)/Android.common.mk
> >>> >  MESA_PYTHON2 := python
> >>> > +MESA_PYTHON3 := python3
> >>
> >> I've just seen that while a few days ago python3 was necessary to build
> >> gen%_pack.h headers,
> >> now the .py scripts just require python2, so this MESA_PYTHON3
> definition
> >> is not needed anymore.
> >>
> >>>
> >>> > new file mode 100644
> >>> > index 000..e0137d5
> >>> > --- /dev/null
> >>> > +++ b/src/intel/genxml/Android.mk
> >>> > @@ -0,0 +1,82 @@
> >>> > +# Copyright © 2016 Intel Corporation
> >>> > +# Copyright © 2016 Mauro Rossi 
> >>> > +#
> >>> > +# Permission is hereby granted, free of charge, to any person
> >>> > obtaining a
> >>> > +# copy of this software and associated documentation files (the
> >>> > "Software"),
> >>> > +# to deal in the Software without restriction, including without
> >>> > limitation
> >>> > +# the rights to use, copy, modify, merge, publish, distribute,
> >>> > sublicense,
> >>> > +# and/or sell copies of the Software, and to permit persons to whom
> >>> > the
> >>> > +# Software is furnished to do so, subject to the following
> conditions:
> >>> > +#
> >>> > +# The above copyright notice and this permission notice shall be
> >>> > included
> >>> > +# in all copies or substantial portions of the Software.
> >>> > +#
> >>> > +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> >>> > EXPRESS OR
> >>> > +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> >>> > MERCHANTABILITY,
> >>> > +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
> >>> > SHALL
> >>> > +# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
> OR
> >>> > OTHER
> >>> > +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> >>> > ARISING
> >>> > +# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> OTHER
> >>> > +# DEALINGS IN THE SOFTWARE.
> >>> > +
> >>> > +LOCAL_PATH := $(call my-dir)
> >>> > +
> >>> > +# Import variable GENERATED_FILES.
> >>> > +include $(LOCAL_PATH)/Makefile.sources
> >>> > +
> >>> > +include $(CLEAR_VARS)
> >>> > +
> >>> > +LOCAL_MODULE := libmesa_genxml
> >>> > +
> >>> > +LOCAL_MODULE_CLASS := STATIC_LIBRARIES
> >>> > +
> >>> > +intermediates := $(call local-generated-sources-dir)
> >>> > +
> >>> > +# dummy.c source file is generated to meet the build system's rules.
> >>> > +LOCAL_GENERATED_SOURCES += $(intermediates)/dummy.c
> >>> > +
> >>> > +$(intermediates)/dummy.c:
> >>> > +   @mkdir -p $(dir $@)
> >>> > +   @echo "Gen Dummy: $(PRIVATE_MODULE) <= $(notdir $(@))"
> >>> > +   $(hide) touch $@
> >>> > +
> >>> > +# This is the list of auto-generated files headers
> >>> > +LOCAL_GENERATED_SOURCES += $(addprefix $(intermediates)/genxml/,
> >>> > $(GENXML_GENERATED_FILES))
> >>> > +
> >>> > +define header-gen
> >>> > +   @mkdir -p $(dir $@)
> >>> > +   @echo "Gen Header: $(PRIVATE_MODULE) <= $(notdir $(@))"
> >>> > +   $(hide) $(PRIVATE_SCRIPT) $(PRIVATE_XML) > $@
> >>> > +endef
> >>> > +
> >>> > +$(intermediates)/genxml/gen6_pack.h: PRIVATE_SCRIPT :=
> $(MESA_PYTHON3)
> >>> > $(LOCAL_PATH)/gen_pack_header.py
> >>> > +$(intermediates)/genxml/gen6_pack.h: PRIVATE_XML :=
> >>> > $(LOCAL_PATH)/gen6.xml
> >>> > +$(intermediates)/genxml/gen6_pack.h: $(LOCAL_PATH)/gen6.xml
> >>> > $(LOCAL_PATH)/gen_pack_header.py
> >>> > +   $(call header-gen)
> >>> > +
> >>> > +$(intermediates)/genxml/gen7_pack.h: PRIVATE_SCRIPT :=
> $(MESA_PYTHON3)
> >>> > $(LOCAL_PATH)/gen_pack_header.py
> >>> > +$(intermediates)/genxml/gen7_pack.h: PRIVATE_XML :=
> >>> > $(LOCAL_PATH)/gen7.xml
> >>> > +$(intermediates)/genxml/gen7_pack.h: $(LOCAL_PATH)/gen7.xml
> >>> > $(LOCAL_PATH)/gen_pack_header.py
> >>> > +   $(call header-gen)
> >>> > +
> >>> > +$(intermediates)/genxml/gen75_pack.h: PRIVATE_SCRIPT :=
> >>> > $(MESA_PYTHON3) $(LOCAL_PATH)/gen_pack_header.py
> >>> > +$(intermediates)/genxml/gen75_pack.h: PRIVATE_XML :=
> >>> > $(LOCAL_PATH)/gen75.xml
> >>> > +$(intermediates)/genxml/gen75_pack.h: $(LOCAL_PATH)/gen75.xml
> >>> > $(LOCAL_PATH)/gen_pack_header.py
> >>> > +   $(call header-gen)
> >>> > +
> >>> > +$(intermediates)/genxml/gen8_pack.h: PRIVATE_SCRIPT :=
> $(MESA_PYTHON3)
> >>> > $(LOCAL_PATH)/gen_pack_header.py
> >>> > +$(intermediates)/genxml/gen8_pack.h: PRIVATE_XML :=
> >>> > $(LOCAL_PATH)/gen8.xml
> >>> > +$(intermediates)/genxml/gen8_pack.h: $(LOCAL_PATH)/gen8.xml
> >>> > $(LOCAL_PATH)/gen_pack_header.py
> >>> > +   $(call header-gen)
> >>> > +
> >>> > +$(intermediates)/genxml/gen9_pack.h: PRIVATE_SCRIPT :=
> $(MESA_PYTHON3)
> >>> > $(LOCAL_PATH)/gen_pack_header.py
> >>> > +$(intermediates)/genxml/gen9_pack.h: PRIVATE_XML :=
> >>> > $(LOCAL_PATH)/gen9.xml
> >>> > +$(intermediates)/genxml/gen9_pack.h: $(LOCAL_PATH)/gen9.xml
> >>> > $(LOCAL_PATH)/gen_pack_header.py
> >>> > +   $(call header-gen)
> >>
> >>
> > ...and these PRIVATE_SCRIPT definitions will use $(MESA_PYTHON), as
> > $(MESA_PYTHON3) is not needed anymore
> 

[Mesa-dev] Parallel shader compilation (was Re: Discussion: C++11 std::future in Mesa)

2016-06-01 Thread Ian Romanick
On 06/01/2016 05:19 AM, Marek Olšák wrote:
> On Fri, May 27, 2016 at 8:49 PM, ⚛ <0xe2.0x9a.0...@gmail.com> wrote:
>> Hello.
>>
>> http://en.cppreference.com/w/cpp/thread/future
>> http://en.cppreference.com/w/cpp/thread/async
>>
>> Assumption: Shader compilation will need to run on separate thread(s).
>>
>> From a certain perspective, one of the easy ways of removing Mesa shader
>> compilation from the "main" thread would be to use std::future for some
>> fields in struct gl_program (defined in mtypes.h) and in related source
>> code.
> 
> Your perspective is completely wrong. First, you need to understand
> why shader compilation speed is a problem for some apps and when you
> do, you'll realize there is only one solution and this is not it. I'll
> let you figure it out by yourself.

I think this is unnecessarily dismissive.  Parallel shader compilation
is a big deal.  Every other GL driver implements it to one degree or
another.  Khronos has standardized a method to interact with it better
(GL_ARB_parallel_shader_compile).  Valve has put a lot of effort into
teaching other game developers how to not "defeat" it.

Compilation caches are, without any doubt, the 90% solution.  This
coupled with mis-coded applications has generally been our argument
against spending time implementing parallel compilation.  The mis-coded
application problem is gradually fading into history.  Heck, I think
most applications don't even bother to check compile or link status in
release builds!  Either way, there are still a number of important cases
that a shader cache cannot help.

1. First run of an application still takes too long.  I regularly hear
developers complain that if the first run of a game takes too long to
start, many people will give up before even playing.  This is more a
problem for mobile games, but I hear the same thing from other developers.

2. Driver updates generally invalidate a shader cache.  This results in
long load times at somewhat unpredictable times.  It also creates a
negative feedback loop where users don't want to update their driver
because they don't want to wait an extra 10 minutes for a game to load
that they're only going to play for 5 minutes.

3. Similar to #2, application updates generally invalidate a shader
cache because they include new shaders.

4. Similar to #3, application developers generally don't get much
benefit from a shader cache while developing their application.  It is
especially painful for them because they tend to have the beefiest,
highest thread count CPUs... and they know enough to know that those
threads are sitting idly by.

If nothing else, compilation of multiple compilation units is
embarrassingly parallel.  None of us would ever consider using a build
system that didn't use all the available processing power in our system.
The whole notion is just silly.  Even though ccache exists, we still
use 'make -j99', distcc, and icecream.
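
As a rough illustration of the "embarrassingly parallel" point, here is a
minimal C++11 sketch that compiles independent units with std::async. The
names compile_unit and compile_all are hypothetical and are not Mesa
functions; the "compiler" is a placeholder that returns the source length,
so this shows only the fan-out/fan-in shape, not how a driver would do it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for a compiler back end (not a Mesa function):
// it only pretends to compile by returning the length of the source.
static std::size_t compile_unit(const std::string &source)
{
   return source.size();
}

// Start one asynchronous task per compilation unit, then collect the
// results in submission order.
std::vector<std::size_t> compile_all(const std::vector<std::string> &sources)
{
   std::vector<std::future<std::size_t>> pending;
   pending.reserve(sources.size());
   for (const std::string &src : sources) {
      // std::launch::async asks for each task to run on its own thread.
      pending.push_back(
         std::async(std::launch::async, compile_unit, std::cref(src)));
   }

   std::vector<std::size_t> results;
   results.reserve(pending.size());
   for (auto &task : pending)
      results.push_back(task.get());   // blocks until that unit has finished

   assert(results.size() == sources.size());
   return results;
}
```

Because the units share no state, no locking is needed here; a real
implementation would more likely use a bounded thread pool than one thread
per shader.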

It's also tempting to think that having SPIR-V in OpenGL will solve
these problems, but that is unlikely.  There are a lot of applications
in the pipeline that, obviously, don't use SPIR-V in OpenGL.  A SPIR-V
solution for end users would still be years away even if we shipped
SPIR-V support tomorrow.  In addition, whenever we look at a profile of
compilation time, almost all of the time is in the backend doing
register allocation and instruction scheduling.  SPIR-V will do nothing
for that.  A shader cache allows us to avoid the work, and parallel
compilation makes it faster in the cases that cannot be avoided.

While I believe that there are still more important things to work on, I
would not oppose someone else working on parallel shader compilation.  I
don't think there was any opposition when Chia-I Wu started working on
it a couple years ago (patch series starting with
https://patchwork.freedesktop.org/patch/32146/).  Everyone was just too
busy to review his patches, and he moved on to other things.

> tl;dr Understand the problem before you propose a solution.
> 
> Marek



Re: [Mesa-dev] Discussion: C++11 std::future in Mesa

2016-06-01 Thread Marek Olšák
On Wed, Jun 1, 2016 at 6:06 PM, ⚛ <0xe2.0x9a.0...@gmail.com> wrote:
> On Wed, Jun 1, 2016 at 3:53 PM, Marek Olšák  wrote:
>> Because of external factors you can't predict, your driver suddenly
>> receives a bunch of shaders that take 2000 ms to compile right before
>> a draw call. Your budget is 33 ms per frame to get 30 fps, but you
>> can't render the frame if you don't compile those shaders. The problem
>> is how to fit the compilation that takes 2000 ms and is required to
>> render the frame into 33 ms. Can you see where I'm going?
>
> I don't understand why transforming parts of shader compilation into
> asynchronous code isn't the right way to go. Async parts of
> compilation can potentially run on other CPU cores, maybe lowering
> those 2000ms down to 750ms.

No. Shader compilation can only be asynchronous if it's far enough
from a draw call and the app doesn't query its status. If it's next to
a draw call, multithreading is useless. Completely useless.
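
A minimal sketch of that constraint, assuming a std::future-based handle
(PendingShader, start_compile, ready and require are all hypothetical names,
and the int result stands in for a compiled binary): the compile can overlap
with other work only until something needs the result. If require() is
called right after start_compile(), the caller still waits for the whole
compile.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical handle for a shader whose compilation may still be running.
struct PendingShader {
   std::future<int> binary;   // int stands in for a compiled shader binary

   // Non-blocking poll: true once the background compile has finished.
   bool ready() const
   {
      return binary.wait_for(std::chrono::seconds(0)) ==
             std::future_status::ready;
   }

   // What a draw call has to do: block for however much compile time is
   // left. The result can be taken only once.
   int require()
   {
      assert(binary.valid());
      return binary.get();
   }
};

// Kick off a placeholder "compile" on another thread.
PendingShader start_compile(int source_id)
{
   return PendingShader{
      std::async(std::launch::async, [source_id] { return source_id * 2; })};
}
```

The asynchronous version only helps when there is enough work between
start_compile() and require() to hide the compile time.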

We need to get below 33 ms for all shaders needed to be compiled to
render a frame. If there are 10 VS and 10 PS, one shader must be
compiled within 1.65 ms on average. I don't see where your random
guess meets that goal.
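
The arithmetic, spelled out as a small sketch (the function names are made
up for illustration; the 33 ms, 20-shader and 2000 ms figures are the ones
used in this thread):

```cpp
#include <cassert>

// Average time available per shader if every shader must be compiled
// within a single frame budget.
double per_shader_budget_ms(double frame_budget_ms, int shader_count)
{
   assert(shader_count > 0);
   return frame_budget_ms / shader_count;
}

// Factor by which a compile time would have to shrink to fit the budget.
double required_speedup(double compile_ms, double frame_budget_ms)
{
   assert(frame_budget_ms > 0.0);
   return compile_ms / frame_budget_ms;
}
```

per_shader_budget_ms(33.0, 20) gives 1.65 ms, and required_speedup(2000.0,
33.0) is roughly 60x, far more than the 2000 ms to 750 ms improvement
guessed above.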

Marek


Re: [Mesa-dev] Discussion: C++11 std::future in Mesa

2016-06-01 Thread Emil Velikov
On 1 June 2016 at 17:34, ⚛ <0xe2.0x9a.0...@gmail.com> wrote:
> On Wed, Jun 1, 2016 at 4:06 PM, Brian Paul  wrote:
>> I think the issue here is someone comes along with no history in the project
>> and asserts that he has a great solution for a problem
>
> Asynchronous computation is an essential part of the solution to the
> problem (problem == faster compiler response time in Mesa).
>
>> while developers with
>> many years of experience know it's not that simple.
>
> I never wrote it was simple.
>
True, you didn't. Yet the way you presented it implied that it is that
simple. Being more explicit about what you mean will do you more good
than harm.

>> It takes time to
>> explain all this and most of us are very busy with other tasks.
>
> I agree. I don't have time to explain to you in detail why it will work.
>
>> Furthermore, people sometimes interpret no response to their ideas as
>> implicit endorsement when that's not the case at all.  Better for the
>> experienced people to give a quick no/nak than to say nothing and let
>> someone assume that his idea is supported.
>
> Ahh, so you feel superior. ... I am wondering how long is that feeling
> going to last.
>
>> I suspect that's what Marek was doing.
>
> I don't know.
>
>> Finally, I think some of us have a hard time taking people seriously when
>> they don't identify themselves with their real name.
>
> You will have to make an exception in my case.
Dude, I will kindly ask you to calm down. Similar to your account request
a while ago, you're acting immature and provocative.

Brian is one of the most reasonable and helpful people around here. I
believe we all owe him some respect, if not for his skills and
contributions then at least for starting the whole project.

If you find such replies annoying/demanding/etc. I would suggest
taking a short break and replying at a later stage. Nobody is gaining
anything in such discussions.

Thanks
Emil


Re: [Mesa-dev] [android-x86-devel] [PATCH 3/3] isl: add support for Android libmesa_isl static library

2016-06-01 Thread Emil Velikov
On 1 June 2016 at 01:25, Mauro Rossi  wrote:
> 2016-06-01 2:22 GMT+02:00 Mauro Rossi :
>> 2016-05-31 16:33 GMT+02:00 Emil Velikov :
>>>
>>> Hi Mauro,
>>>
>>> A couple of questions, nothing serious imho.
>>>
>>> On 30 May 2016 at 23:20, Mauro Rossi  wrote:
>>> > isl library is needed to build i965, libmesa_isl static library is
>>> > added
>>> > to fix related Android building errors.
>>> >
>>> > Any attempt to build libmesa_genxml as phony package module failed to
>>> > deliver
>>> > gen{7,75,8,9}_pack.h autogenerated headers, needed to build
>>> > libmesa_isl_gen{7,75,8,9}
>>> >
>>> > Due to constraints in the Android Build System, libmesa_genxml is built
>>> > as static library
>>> > and at least one source file needs to be compiled, so dummy.c is
>>> > autogenerated for this scope.
>>> >
>>> > libmesa_isl_gen{7,75,8,9} dependencies on libmesa_genxml are declared
>>> > using LOCAL_WHOLE_STATIC_LIBRARIES,
>>> > in order to avoid building errors due to missing
>>> > genxml/gen{7,75,8,9}_pack.h headers
>>> >
>>> > Cc: 
>>> > ---
>>> >  Android.mk   |   3 +
>>> >  src/intel/genxml/Android.mk  |  82 ++
>>> >  src/intel/isl/Android.mk | 157
>>> > +++
>>> >  src/mesa/drivers/dri/i965/Android.mk |   3 +-
>>> >  4 files changed, 244 insertions(+), 1 deletion(-)
>>> >  create mode 100644 src/intel/genxml/Android.mk
>>> >  create mode 100644 src/intel/isl/Android.mk
>>> >
>>> > diff --git a/Android.mk b/Android.mk
>>> > index 6a5596b..8ab80f3 100644
>>> > --- a/Android.mk
>>> > +++ b/Android.mk
>>> > @@ -48,6 +48,7 @@ MESA_DRI_MODULE_UNSTRIPPED_PATH :=
>>> > $(TARGET_OUT_SHARED_LIBRARIES_UNSTRIPPED)/$(M
>>> >
>>> >  MESA_COMMON_MK := $(MESA_TOP)/Android.common.mk
>>> >  MESA_PYTHON2 := python
>>> > +MESA_PYTHON3 := python3
>>
>>
>> I've just seen that while a few days ago python3 was necessary to build
>> gen%_pack.h headers,
>> now the .py scripts just require python2, so this MESA_PYTHON3 definition
>> is not needed anymore.
>>
>>>
>>> > new file mode 100644
>>> > index 000..e0137d5
>>> > --- /dev/null
>>> > +++ b/src/intel/genxml/Android.mk
>>> > @@ -0,0 +1,82 @@
>>> > +# Copyright © 2016 Intel Corporation
>>> > +# Copyright © 2016 Mauro Rossi 
>>> > +#
>>> > +# Permission is hereby granted, free of charge, to any person
>>> > obtaining a
>>> > +# copy of this software and associated documentation files (the
>>> > "Software"),
>>> > +# to deal in the Software without restriction, including without
>>> > limitation
>>> > +# the rights to use, copy, modify, merge, publish, distribute,
>>> > sublicense,
>>> > +# and/or sell copies of the Software, and to permit persons to whom
>>> > the
>>> > +# Software is furnished to do so, subject to the following conditions:
>>> > +#
>>> > +# The above copyright notice and this permission notice shall be
>>> > included
>>> > +# in all copies or substantial portions of the Software.
>>> > +#
>>> > +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>> > EXPRESS OR
>>> > +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>> > MERCHANTABILITY,
>>> > +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
>>> > SHALL
>>> > +# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
>>> > OTHER
>>> > +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>> > ARISING
>>> > +# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>>> > +# DEALINGS IN THE SOFTWARE.
>>> > +
>>> > +LOCAL_PATH := $(call my-dir)
>>> > +
>>> > +# Import variable GENERATED_FILES.
>>> > +include $(LOCAL_PATH)/Makefile.sources
>>> > +
>>> > +include $(CLEAR_VARS)
>>> > +
>>> > +LOCAL_MODULE := libmesa_genxml
>>> > +
>>> > +LOCAL_MODULE_CLASS := STATIC_LIBRARIES
>>> > +
>>> > +intermediates := $(call local-generated-sources-dir)
>>> > +
>>> > +# dummy.c source file is generated to meet the build system's rules.
>>> > +LOCAL_GENERATED_SOURCES += $(intermediates)/dummy.c
>>> > +
>>> > +$(intermediates)/dummy.c:
>>> > +   @mkdir -p $(dir $@)
>>> > +   @echo "Gen Dummy: $(PRIVATE_MODULE) <= $(notdir $(@))"
>>> > +   $(hide) touch $@
>>> > +
>>> > +# This is the list of auto-generated files headers
>>> > +LOCAL_GENERATED_SOURCES += $(addprefix $(intermediates)/genxml/,
>>> > $(GENXML_GENERATED_FILES))
>>> > +
>>> > +define header-gen
>>> > +   @mkdir -p $(dir $@)
>>> > +   @echo "Gen Header: $(PRIVATE_MODULE) <= $(notdir $(@))"
>>> > +   $(hide) $(PRIVATE_SCRIPT) $(PRIVATE_XML) > $@
>>> > +endef
>>> > +
>>> > +$(intermediates)/genxml/gen6_pack.h: PRIVATE_SCRIPT := $(MESA_PYTHON3)
>>> > $(LOCAL_PATH)/gen_pack_header.py
>>> > +$(intermediates)/genxml/gen6_pack.h: PRIVATE_XML :=
>>> > $(LOCAL_PATH)/gen6.xml
>>> > +$(intermediates)/genxml/gen6_pack.h: 

[Mesa-dev] [PATCH] i965/eu: use simd8 when exec_size != EXECUTE_16

2016-06-01 Thread Alejandro Piñeiro
Among other things, fix a GPU hang when using INTEL_DEBUG=shader_time
for any shader.

Signed-off-by: Jason Ekstrand 
Signed-off-by: Alejandro Piñeiro 
---

This is the change suggested by Jason (so I added his Signed-off-by). I just
did a full piglit run to check that there aren't any regressions.

 src/mesa/drivers/dri/i965/brw_eu_emit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 2538f0d..8218f9c 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -2909,7 +2909,7 @@ brw_set_dp_untyped_atomic_message(struct brw_codegen *p,
 
if (devinfo->gen >= 8 || devinfo->is_haswell) {
   if (brw_inst_access_mode(devinfo, p->current) == BRW_ALIGN_1) {
- if (brw_inst_exec_size(devinfo, p->current) == BRW_EXECUTE_8)
+ if (brw_inst_exec_size(devinfo, p->current) != BRW_EXECUTE_16)
 msg_control |= 1 << 4; /* SIMD8 mode */
 
  brw_inst_set_dp_msg_type(devinfo, insn,
@@ -2922,7 +2922,7 @@ brw_set_dp_untyped_atomic_message(struct brw_codegen *p,
   brw_inst_set_dp_msg_type(devinfo, insn,
GEN7_DATAPORT_DC_UNTYPED_ATOMIC_OP);
 
-  if (brw_inst_exec_size(devinfo, p->current) == BRW_EXECUTE_8)
+  if (brw_inst_exec_size(devinfo, p->current) != BRW_EXECUTE_16)
  msg_control |= 1 << 4; /* SIMD8 mode */
}
 
-- 
2.7.4



Re: [Mesa-dev] [PATCH] mesa: Enable LTO compilation

2016-06-01 Thread Chuck Atkins
>
> > With gcc 5.3.1 I end up with lib{GL,OSMesa}.so @ 44M and
> > libswrAVX{,2}.so @ 70M.  With flto turned on it drops WAY down to
> > lib{GL,OSMesa}.so @ 13M and libswrAVX{,2}.so @ 18M
>
> I assume those numbers are including debugging symbols? How do stripped
> binaries compare?
>

Silly me, I didn't even consider that since I wasn't explicitly doing a
debug build.  That being said, after explicitly stripping the binaries with
"strip -g", the resulting difference is negligible between the two (LTO vs
non-LTO).

