Re: [Mesa-dev] [PATCH] nir: put compact into bitfields in nir_variable_data

2017-09-05 Thread Jason Ekstrand
On Tue, Sep 5, 2017 at 8:19 PM, Dave Airlie  wrote:

> From: Dave Airlie 
>
> Because this is declared bool, it won't get merged with the previous
> bitfields. This seems like an oversight rather than deliberate.
>

Really?  That's silly... This is fine with me.

Reviewed-by: Jason Ekstrand 


> Noticed when running pahole.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/compiler/nir/nir.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 9313b7ac90..8330e6d7ce 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -220,7 +220,7 @@ typedef struct nir_variable {
> * be tightly packed.  In other words, consecutive array elements
> * should be stored one component apart, rather than one slot apart.
> */
> -  bool compact:1;
> +  unsigned compact:1;
>
>/**
> * Whether this is a fragment shader output implicitly initialized
> with
> --
> 2.13.5
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 102496] Frontbuffer rendering corruption on mesa master

2017-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=102496

--- Comment #3 from Tapani Pälli  ---
I'm seeing the 'no animation' on i965 too. Sometimes animation happens but most
of the time not.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


[Mesa-dev] [Bug 102530] [bisected] Kodi crashes when launching a stream - commit bd2662bf

2017-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=102530

--- Comment #14 from Tapani Pälli  ---
(In reply to Alexandre Demers from comment #13)
> Created attachment 133983 [details]
> Kodi segfault with MESA_NO_ERROR=0
> 
> Core dump produced by Kodi when MESA_NO_ERROR=0

Have you checked whether Kodi works with current Mesa? This backtrace was also
made with an old Mesa, so whatever the problem is, it might have been fixed already.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH] glsl: disallow mixed varying types within a location

2017-09-05 Thread Timothy Arceri



On 06/09/17 11:59, Ilia Mirkin wrote:
> On Tue, Sep 5, 2017 at 9:54 PM, Timothy Arceri  wrote:
>>
>> On 06/09/17 11:23, Ilia Mirkin wrote:
>>>
>>> The enhanced layouts spec has all kinds of restrictions about what can
>>> and cannot be mixed in a location. Integer/float(/presumably double)
>>> can't occupy a single location, interpolation has to be the same, as
>>> well as auxiliary storage (including patch!).
>>>
>>> The implication of this is ... don't specify explicit locations/components
>>> if you want better packing, since the auto-packer doesn't care at all
>>> about most of these restrictions. Sad.
>>
>>
>> There are still use cases such as SSO, tessellation shaders and varyings
>> used by interpolateAt (although we just enable the enhanced layout packing
>> rules by default for those anyway) where we cannot use the auto-packer.
>>
>> As far as the patch goes this should really be in link_varyings.cpp rather
>> than linker.cpp, also there is already related validation code in
>> cross_validate_outputs_to_inputs() any reason for not just modifying the
>> code there?
>
> This applies to whole shader stages. So e.g. in SSO, you still
> validate the inputs and the outputs. Similarly, you do this for vertex
> shader inputs.

I'm not following what you are trying to say here.
cross_validate_outputs_to_inputs() does almost the same thing you are
doing here, but also validates the outputs from one stage with the input
from another. You just need to adjust the offsets for patches like you
have done here and add the missing interpolation checks etc, the base
type is already validated there.

> Ideally it'd be done earlier on, but we need to wait for the interface
> types to go away, or else it'd be a disaster.
>
> Most of link_varyings is concerned with inter-stage logic. It could be
> moved there, of course, just didn't really seem to belong.

link_varyings should be used for all things varyings unless it really
doesn't make sense. There is no reason to dump everything in linker.cpp.
The only reason there are still bits in linker.cpp is because Paul left
the project in the middle of re-factoring things and besides a single
patch from me that moved more varying related code here a while ago,
there has been pretty much zero effort in re-factoring any of the GLSL
IR compiler into more sensible pieces.




[Mesa-dev] [PATCH] nir: put compact into bitfields in nir_variable_data

2017-09-05 Thread Dave Airlie
From: Dave Airlie 

Because this is declared bool, it won't get merged with the previous
bitfields. This seems like an oversight rather than deliberate.

Noticed when running pahole.

Signed-off-by: Dave Airlie 
---
 src/compiler/nir/nir.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 9313b7ac90..8330e6d7ce 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -220,7 +220,7 @@ typedef struct nir_variable {
* be tightly packed.  In other words, consecutive array elements
* should be stored one component apart, rather than one slot apart.
*/
-  bool compact:1;
+  unsigned compact:1;
 
   /**
* Whether this is a fragment shader output implicitly initialized with
-- 
2.13.5
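
As a rough illustration of the point in the commit message, the sketch below compares a run of bitfields whose last member is declared bool with the same run declared entirely as unsigned. The struct and field names are hypothetical stand-ins, not the real nir_variable_data layout, and whether the two layouts actually differ in size depends on the compiler and ABI, so the helper may well report zero bytes saved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a flag block: unsigned bitfields followed
 * by a bitfield with a different declared type (bool). */
struct flags_mixed {
   unsigned mode:4;
   unsigned read_only:1;
   bool compact:1;
};

/* Same fields, but every bitfield shares the declared type. */
struct flags_uniform {
   unsigned mode:4;
   unsigned read_only:1;
   unsigned compact:1;
};

/* Returns how many bytes the uniform layout saves. The result is
 * ABI-dependent and may be 0 where the compiler merges both forms. */
static size_t
bitfield_bytes_saved(void)
{
   assert(sizeof(struct flags_mixed) >= sizeof(struct flags_uniform));
   return sizeof(struct flags_mixed) - sizeof(struct flags_uniform);
}
```

Running pahole on the real header, as the patch author did, remains the reliable way to see what a given build actually lays out.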



[Mesa-dev] Vulkan-CPU has been renamed to Kazan

2017-09-05 Thread Jacob Lifshay
The new name means "Volcano" in Japanese.

For those who don't remember, Kazan is a work-in-progress
software-rendering Vulkan implementation.

I moved the source code to https://github.com/kazan-3d/kazan. Additionally,
I registered the domain name kazan-3d.org, which currently redirects to the
source code on GitHub.

I renamed the project because vulkan-cpu infringes the Vulkan trademark.

The source code will still be available at the old URL (
https://github.com/programmerjake/vulkan-cpu) to avoid breaking any links,
however I probably won't keep the old repository up-to-date.

Jacob Lifshay


Re: [Mesa-dev] [PATCH] mesa/mtypes: repack gl_texture_object.

2017-09-05 Thread Dave Airlie
On 6 September 2017 at 03:11, Marek Olšák  wrote:
> On Tue, Sep 5, 2017 at 5:50 PM, Brian Paul  wrote:
>> On 09/04/2017 05:29 AM, Marek Olšák wrote:
>>>
>>> On Sun, Sep 3, 2017 at 1:18 PM, Dave Airlie  wrote:

 From: Dave Airlie 

 reduces size from 1144 to 1128.

 Signed-off-by: Dave Airlie 
 ---
   src/mesa/main/mtypes.h | 10 +-
   1 file changed, 5 insertions(+), 5 deletions(-)

 diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
 index d44897b..3d68a6d 100644
 --- a/src/mesa/main/mtypes.h
 +++ b/src/mesa/main/mtypes.h
 @@ -1012,7 +1012,6 @@ struct gl_texture_object
  struct gl_sampler_object Sampler;

  GLenum DepthMode;   /**< GL_ARB_depth_texture */
>>>
>>>
>>> The patch looks good, but here are some ideas for future improvements:
>>>
>>> GLenum can be uint16_t everywhere, because GL doesn't set higher bits:
>>>
>>> typedef uint16_t GLenum16.
>>> s/GLenum/GLenum16/
>>>
 -   bool StencilSampling;   /**< Should we sample stencil instead of
 depth? */

  GLfloat Priority;   /**< in [0,1] */
  GLint BaseLevel;/**< min mipmap level, OpenGL 1.2 */
 @@ -1033,12 +1032,17 @@ struct gl_texture_object
  GLboolean Immutable;/**< GL_ARB_texture_storage */
  GLboolean _IsFloat; /**< GL_OES_float_texture */
  GLboolean _IsHalfFloat; /**< GL_OES_half_float_texture */
 +   bool StencilSampling;   /**< Should we sample stencil instead of
 depth? */
 +   bool HandleAllocated;   /**< GL_ARB_bindless_texture */
>>>
>>>
>>> All bools can be 1 bit:
>>>
>>> bool x:1;
>>> GLboolean y:1;
>>>
>>> etc.
>>>

  GLuint MinLevel;/**< GL_ARB_texture_view */
  GLuint MinLayer;/**< GL_ARB_texture_view */
  GLuint NumLevels;   /**< GL_ARB_texture_view */
  GLuint NumLayers;   /**< GL_ARB_texture_view */
>>>
>>>
>>> MinLevel, NumLevels can be ubyte (uint8_t). MinLayer, NumLayers can be
>>> ushort (uint16_t)... simply by considering the range of possible
>>> values.
>>
>>
>> There's lots of opportunities along these lines in gl_texture_image. And
>> since we often have many gl_texture_images per gl_texture_object, and we
>> often have many textures, it'll probably have considerable impact.  I've
>> suggested this in the past but never got around to working on it.
>>
>> I recall Eric Anholt mentioning a memory profiling tool that was helpful for
>> finding wasted space in structures, etc.  I don't recall the name right now.
>> Eric?
>
> Dave used pahole for this patch series too. It obviously can't suggest
> what I suggested above (like changing the types and bits).

Yup, this was pahole. Doing what Marek describes can definitely be done,
but it needs a lot more care and attention.

Replacing bool with unsigned :1 fields isn't always a win: each access then
needs a mask/shift, so overall it may end up slowing things down and
increasing the instruction count.

Dave.
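
The kind of repacking discussed in this thread can be sketched with a toy struct. The field names below only echo gl_texture_object and are hypothetical; the real struct is far larger. The idea is simply that one-byte flags placed between four-byte fields each drag padding along, while grouping them lets them share it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: one-byte flags interleaved with
 * four-byte fields, so each flag is typically followed by padding. */
struct tex_before {
   uint32_t depth_mode;
   bool     stencil_sampling;
   float    priority;
   int32_t  base_level;
   bool     immutable;
   uint32_t min_level;
};

/* Hypothetical "after" layout: same fields, flags grouped at the end. */
struct tex_after {
   uint32_t depth_mode;
   float    priority;
   int32_t  base_level;
   uint32_t min_level;
   bool     stencil_sampling;
   bool     immutable;
};

/* Returns the number of bytes saved by grouping the flags. */
static size_t
repack_bytes_saved(void)
{
   assert(sizeof(struct tex_after) <= sizeof(struct tex_before));
   return sizeof(struct tex_before) - sizeof(struct tex_after);
}
```

On a typical ABI with four-byte alignment for the 32-bit fields and a one-byte bool, this should come out as 24 versus 20 bytes, but that is an assumption about the target rather than something the language guarantees. Pure reordering like this does not add the mask/shift cost mentioned above, which only appears once the flags become bitfields.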


Re: [Mesa-dev] [PATCH] glsl: disallow mixed varying types within a location

2017-09-05 Thread Ilia Mirkin
On Tue, Sep 5, 2017 at 9:54 PM, Timothy Arceri  wrote:
>
> On 06/09/17 11:23, Ilia Mirkin wrote:
>>
>> The enhanced layouts spec has all kinds of restrictions about what can
>> and cannot be mixed in a location. Integer/float(/presumably double)
>> can't occupy a single location, interpolation has to be the same, as
>> well as auxiliary storage (including patch!).
>>
>> The implication of this is ... don't specify explicit locations/components
>> if you want better packing, since the auto-packer doesn't care at all
>> about most of these restrictions. Sad.
>
>
> There are still use cases such as SSO, tessellation shaders and varyings
> used by interpolateAt (although we just enable the enhanced layout packing
> rules by default for those anyway) where we cannot use the auto-packer.
>
> As far as the patch goes this should really be in link_varyings.cpp rather
> than linker.cpp, also there is already related validation code in
> cross_validate_outputs_to_inputs() any reason for not just modifying the
> code there?

This applies to whole shader stages. So e.g. in SSO, you still
validate the inputs and the outputs. Similarly, you do this for vertex
shader inputs.

Ideally it'd be done earlier on, but we need to wait for the interface
types to go away, or else it'd be a disaster.

Most of link_varyings is concerned with inter-stage logic. It could be
moved there, of course, just didn't really seem to belong.
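
The per-location restrictions described at the top of this thread can be sketched as a small compatibility check. This is only an illustration loosely modelled on the data_type struct in the patch; the names slot_info and slot_accepts are hypothetical and do not exist in Mesa.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-location record, loosely modelled on the patch. */
enum slot_base_type { SLOT_UNUSED = 0, SLOT_FLOAT, SLOT_INT, SLOT_64BIT };

struct slot_info {
   enum slot_base_type base_type;
   unsigned interpolation;
   bool centroid;
   bool sample;
   bool patch;
};

/* Records `incoming` in `slot` if the slot is still unused. Otherwise
 * returns false when the two disagree on base type, interpolation or
 * auxiliary storage, i.e. the properties that must match in a location. */
static bool
slot_accepts(struct slot_info *slot, const struct slot_info *incoming)
{
   assert(incoming->base_type != SLOT_UNUSED);

   if (slot->base_type == SLOT_UNUSED) {
      *slot = *incoming;
      return true;
   }

   return slot->base_type == incoming->base_type &&
          slot->interpolation == incoming->interpolation &&
          slot->centroid == incoming->centroid &&
          slot->sample == incoming->sample &&
          slot->patch == incoming->patch;
}
```

A linker pass would presumably keep one such record per location, for inputs and outputs separately, and report a link error on the first rejected variable.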


Re: [Mesa-dev] [PATCH] glsl: disallow mixed varying types within a location

2017-09-05 Thread Timothy Arceri


On 06/09/17 11:23, Ilia Mirkin wrote:

The enhanced layouts spec has all kinds of restrictions about what can
and cannot be mixed in a location. Integer/float(/presumably double)
can't occupy a single location, interpolation has to be the same, as
well as auxiliary storage (including patch!).

The implication of this is ... don't specify explicit locations/components
if you want better packing, since the auto-packer doesn't care at all
about most of these restrictions. Sad.


There are still use cases such as SSO, tessellation shaders and varyings 
used by interpolateAt (although we just enable the enhanced layout 
packing rules by default for those anyway) where we cannot use the
auto-packer.


As far as the patch goes this should really be in link_varyings.cpp 
rather than linker.cpp, also there is already related validation code in 
cross_validate_outputs_to_inputs() any reason for not just modifying the 
code there?




This fixes:

KHR-GL45.enhanced_layouts.varying_location_aliasing_with_mixed_types
KHR-GL45.enhanced_layouts.varying_location_aliasing_with_mixed_interpolation
KHR-GL45.enhanced_layouts.varying_location_aliasing_with_mixed_auxiliary_storage

See https://github.com/KhronosGroup/OpenGL-API/issues/13 for some more
info, where I asked about patch vs non-patch locations.

Signed-off-by: Ilia Mirkin 
---
  src/compiler/glsl/link_varyings.cpp |  23 
  src/compiler/glsl/linker.cpp| 115 
  src/compiler/glsl/linker.h  |   4 ++
  3 files changed, 119 insertions(+), 23 deletions(-)

diff --git a/src/compiler/glsl/link_varyings.cpp 
b/src/compiler/glsl/link_varyings.cpp
index 528506fd0eb..20187166203 100644
--- a/src/compiler/glsl/link_varyings.cpp
+++ b/src/compiler/glsl/link_varyings.cpp
@@ -40,29 +40,6 @@
  #include "program.h"
  
  
-/**
- * Get the varying type stripped of the outermost array if we're processing
- * a stage whose varyings are arrays indexed by a vertex number (such as
- * geometry shader inputs).
- */
-static const glsl_type *
-get_varying_type(const ir_variable *var, gl_shader_stage stage)
-{
-   const glsl_type *type = var->type;
-
-   if (!var->data.patch &&
-   ((var->data.mode == ir_var_shader_out &&
- stage == MESA_SHADER_TESS_CTRL) ||
-(var->data.mode == ir_var_shader_in &&
- (stage == MESA_SHADER_TESS_CTRL || stage == MESA_SHADER_TESS_EVAL ||
-  stage == MESA_SHADER_GEOMETRY)))) {
-  assert(type->is_array());
-  type = type->fields.array;
-   }
-
-   return type;
-}
-
  static void
  create_xfb_varying_names(void *mem_ctx, const glsl_type *t, char **name,
   size_t name_length, unsigned *count,
diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
index 5c3f1d12bbc..3afe5b52a91 100644
--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -2155,6 +2155,109 @@ link_cs_input_layout_qualifiers(struct 
gl_shader_program *prog,
 }
  }
  
+/**
+ * Get the varying type stripped of the outermost array if we're processing
+ * a stage whose varyings are arrays indexed by a vertex number (such as
+ * geometry shader inputs).
+ */
+const glsl_type *
+get_varying_type(const ir_variable *var, gl_shader_stage stage)
+{
+   const glsl_type *type = var->type;
+
+   if (!var->data.patch &&
+   ((var->data.mode == ir_var_shader_out &&
+ stage == MESA_SHADER_TESS_CTRL) ||
+(var->data.mode == ir_var_shader_in &&
+ (stage == MESA_SHADER_TESS_CTRL || stage == MESA_SHADER_TESS_EVAL ||
+  stage == MESA_SHADER_GEOMETRY)))) {
+  assert(type->is_array());
+  type = type->fields.array;
+   }
+
+   return type;
+}
+
+static void
+validate_intrastage_location_types(struct gl_shader_program *prog,
+   struct gl_linked_shader *shader)
+{
+   struct data_type {
+  // 0: unused, 1: float, 2: int, 3: 64-bit
+  unsigned var_type:2;
+  unsigned interpolation:2;
+  bool centroid:1;
+  bool sample:1;
+  bool patch:1;
+   } data_types[2][MAX_VARYING] = {};
+
+   foreach_in_list(ir_instruction, node, shader->ir) {
+  ir_variable *var = node->as_variable();
+  if (!var || !var->data.explicit_location)
+ continue;
+
+  if (var->data.mode != ir_var_shader_in &&
+  var->data.mode != ir_var_shader_out)
+ continue;
+
+  bool output = var->data.mode == ir_var_shader_out;
+  int var_slot;
+  if (!output && shader->Stage == MESA_SHADER_VERTEX) {
+ var_slot = var->data.location - VERT_ATTRIB_GENERIC0;
+ if (var_slot >= VERT_ATTRIB_GENERIC_MAX)
+continue;
+  } else if (var->data.patch) {
+ var_slot = var->data.location - VARYING_SLOT_PATCH0;
+ if (var_slot >= MAX_VARYING)
+continue;
+  } else {
+ var_slot = var->data.location - VARYING_SLOT_VAR0;
+ if (var_slot >= MAX_VARYING)
+continue;
+  

[Mesa-dev] [PATCH] glsl: disallow mixed varying types within a location

2017-09-05 Thread Ilia Mirkin
The enhanced layouts spec has all kinds of restrictions about what can
and cannot be mixed in a location. Integer/float(/presumably double)
can't occupy a single location, interpolation has to be the same, as
well as auxiliary storage (including patch!).

The implication of this is ... don't specify explicit locations/components
if you want better packing, since the auto-packer doesn't care at all
about most of these restrictions. Sad.

This fixes:

KHR-GL45.enhanced_layouts.varying_location_aliasing_with_mixed_types
KHR-GL45.enhanced_layouts.varying_location_aliasing_with_mixed_interpolation
KHR-GL45.enhanced_layouts.varying_location_aliasing_with_mixed_auxiliary_storage

See https://github.com/KhronosGroup/OpenGL-API/issues/13 for some more
info, where I asked about patch vs non-patch locations.

Signed-off-by: Ilia Mirkin 
---
 src/compiler/glsl/link_varyings.cpp |  23 
 src/compiler/glsl/linker.cpp| 115 
 src/compiler/glsl/linker.h  |   4 ++
 3 files changed, 119 insertions(+), 23 deletions(-)

diff --git a/src/compiler/glsl/link_varyings.cpp 
b/src/compiler/glsl/link_varyings.cpp
index 528506fd0eb..20187166203 100644
--- a/src/compiler/glsl/link_varyings.cpp
+++ b/src/compiler/glsl/link_varyings.cpp
@@ -40,29 +40,6 @@
 #include "program.h"
 
 
-/**
- * Get the varying type stripped of the outermost array if we're processing
- * a stage whose varyings are arrays indexed by a vertex number (such as
- * geometry shader inputs).
- */
-static const glsl_type *
-get_varying_type(const ir_variable *var, gl_shader_stage stage)
-{
-   const glsl_type *type = var->type;
-
-   if (!var->data.patch &&
-   ((var->data.mode == ir_var_shader_out &&
- stage == MESA_SHADER_TESS_CTRL) ||
-(var->data.mode == ir_var_shader_in &&
- (stage == MESA_SHADER_TESS_CTRL || stage == MESA_SHADER_TESS_EVAL ||
-  stage == MESA_SHADER_GEOMETRY)))) {
-  assert(type->is_array());
-  type = type->fields.array;
-   }
-
-   return type;
-}
-
 static void
 create_xfb_varying_names(void *mem_ctx, const glsl_type *t, char **name,
  size_t name_length, unsigned *count,
diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
index 5c3f1d12bbc..3afe5b52a91 100644
--- a/src/compiler/glsl/linker.cpp
+++ b/src/compiler/glsl/linker.cpp
@@ -2155,6 +2155,109 @@ link_cs_input_layout_qualifiers(struct 
gl_shader_program *prog,
}
 }
 
+/**
+ * Get the varying type stripped of the outermost array if we're processing
+ * a stage whose varyings are arrays indexed by a vertex number (such as
+ * geometry shader inputs).
+ */
+const glsl_type *
+get_varying_type(const ir_variable *var, gl_shader_stage stage)
+{
+   const glsl_type *type = var->type;
+
+   if (!var->data.patch &&
+   ((var->data.mode == ir_var_shader_out &&
+ stage == MESA_SHADER_TESS_CTRL) ||
+(var->data.mode == ir_var_shader_in &&
+ (stage == MESA_SHADER_TESS_CTRL || stage == MESA_SHADER_TESS_EVAL ||
+  stage == MESA_SHADER_GEOMETRY)))) {
+  assert(type->is_array());
+  type = type->fields.array;
+   }
+
+   return type;
+}
+
+static void
+validate_intrastage_location_types(struct gl_shader_program *prog,
+   struct gl_linked_shader *shader)
+{
+   struct data_type {
+  // 0: unused, 1: float, 2: int, 3: 64-bit
+  unsigned var_type:2;
+  unsigned interpolation:2;
+  bool centroid:1;
+  bool sample:1;
+  bool patch:1;
+   } data_types[2][MAX_VARYING] = {};
+
+   foreach_in_list(ir_instruction, node, shader->ir) {
+  ir_variable *var = node->as_variable();
+  if (!var || !var->data.explicit_location)
+ continue;
+
+  if (var->data.mode != ir_var_shader_in &&
+  var->data.mode != ir_var_shader_out)
+ continue;
+
+  bool output = var->data.mode == ir_var_shader_out;
+  int var_slot;
+  if (!output && shader->Stage == MESA_SHADER_VERTEX) {
+ var_slot = var->data.location - VERT_ATTRIB_GENERIC0;
+ if (var_slot >= VERT_ATTRIB_GENERIC_MAX)
+continue;
+  } else if (var->data.patch) {
+ var_slot = var->data.location - VARYING_SLOT_PATCH0;
+ if (var_slot >= MAX_VARYING)
+continue;
+  } else {
+ var_slot = var->data.location - VARYING_SLOT_VAR0;
+ if (var_slot >= MAX_VARYING)
+continue;
+  }
+
+  if (var_slot < 0)
+ continue;
+
+  const glsl_type *type = get_varying_type(var, shader->Stage);
+  const glsl_type *type_without_array = type->without_array();
+  unsigned num_elements = type->count_attribute_slots(false);
+  unsigned var_type;
+  if (glsl_base_type_is_64bit(type_without_array->base_type))
+ var_type = 3;
+  else if (glsl_base_type_is_integer(type_without_array->base_type))
+ var_type = 2;
+  else
+ var_type = 1;
+
+  

Re: [Mesa-dev] [PATCH 4/9] radv: store the shader binary into radv_shader_variant

2017-09-05 Thread Timothy Arceri



On 06/09/17 05:17, Samuel Pitoiset wrote:

> This will allow to dump the active shaders when a hang is

This will allow us to

Otherwise 1-6, 8-9:

Reviewed-by: Timothy Arceri 

I'll let Dave or Bas comment on the other two.


Re: [Mesa-dev] [PATCH 7/9] spirv: Add vtn_fail and vtn_assert helpers

2017-09-05 Thread Jason Ekstrand
On Tue, Sep 5, 2017 at 5:32 PM, Ian Romanick  wrote:

> On 08/17/2017 10:22 AM, Jason Ekstrand wrote:
> > These helpers are much nicer than just using assert because they don't
> > kill your process.  Instead, it longjmps back to spirv_to_nir(), cleans
> > up all the temporary memory, and nicely returns NULL.  While crashing is
> > completely OK in the Vulkan world, it's not considered to be quite so
> > nice in GL.  This should help us to make SPIR-V parsing much more
> > robust.  The one downside here is that vtn_assert is not compiled out in
> > release builds like assert() is so it isn't free.
> > ---
> >  src/compiler/spirv/spirv_to_nir.c | 20 
> >  src/compiler/spirv/vtn_private.h  | 31 +++
> >  2 files changed, 51 insertions(+)
> >
> > diff --git a/src/compiler/spirv/spirv_to_nir.c
> b/src/compiler/spirv/spirv_to_nir.c
> > index e59f2b2..af542e8 100644
> > --- a/src/compiler/spirv/spirv_to_nir.c
> > +++ b/src/compiler/spirv/spirv_to_nir.c
> > @@ -104,6 +104,20 @@ _vtn_warn(struct vtn_builder *b, const char *file,
> unsigned line,
> > va_end(args);
> >  }
> >
> > +void
> > +_vtn_fail(struct vtn_builder *b, const char *file, unsigned line,
> > +  const char *fmt, ...)
> > +{
> > +   va_list args;
> > +
> > +   va_start(args, fmt);
> > +   vtn_log_err(b, NIR_SPIRV_DEBUG_LEVEL_ERROR, "SPIR-V parsing
> FAILED:\n",
> > +   file, line, fmt, args);
> > +   va_end(args);
> > +
> > +   longjmp(b->fail_jump, 1);
> > +}
> > +
> >  struct spec_constant_value {
> > bool is_double;
> > union {
> > @@ -3420,6 +3434,12 @@ spirv_to_nir(const uint32_t *words, size_t
> word_count,
> > b->entry_point_name = entry_point_name;
> > b->ext = ext;
> >
> > +   /* See also _vtn_fail() */
> > +   if (setjmp(b->fail_jump)) {
> > +  ralloc_free(b);
> > +  return NULL;
> > +   }
> > +
> > const uint32_t *word_end = words + word_count;
> >
> > /* Handle the SPIR-V header (first 4 dwords)  */
> > diff --git a/src/compiler/spirv/vtn_private.h b/src/compiler/spirv/vtn_
> private.h
> > index 3eb601d..f640289 100644
> > --- a/src/compiler/spirv/vtn_private.h
> > +++ b/src/compiler/spirv/vtn_private.h
> > @@ -28,6 +28,8 @@
> >  #ifndef _VTN_PRIVATE_H_
> >  #define _VTN_PRIVATE_H_
> >
> > +#include <setjmp.h>
> > +
> >  #include "nir/nir.h"
> >  #include "nir/nir_builder.h"
> >  #include "util/u_dynarray.h"
> > @@ -49,6 +51,32 @@ void _vtn_warn(struct vtn_builder *b, const char
> *file, unsigned line,
> > const char *fmt, ...) PRINTFLIKE(4, 5);
> >  #define vtn_warn(...) _vtn_warn(b, __FILE__, __LINE__, __VA_ARGS__)
> >
> > +/** Fail SPIR-V parsing
> > + *
> > + * This function logs an error and then bails out of the shader compile
> using
> > + * longjmp.  This being safe relies on two things:
> > + *
> > + *  1) We must guarantee that setjmp is called after allocating the
> builder
> > + * and setting up b->debug (so that logging works) but before any
> > + * errors have a chance to occur.
> > + *
> > + *  2) While doing the SPIR-V -> NIR conversion, we need to be careful
> to
> > + * ensure that all heap allocations happen through ralloc and are
> parented
> > + * to the builder.
> > + *
> > + * So long as these two things continue to hold, we can easily longjmp
> back to
> > + * spirv_to_nir(), clean up the builder, and return NULL.
> > + */
> > +void _vtn_fail(struct vtn_builder *b, const char *file, unsigned line,
> > +   const char *fmt, ...) NORETURN PRINTFLIKE(4, 5);
> > +#define vtn_fail(...) _vtn_fail(b, __FILE__, __LINE__, __VA_ARGS__)
> > +
> > +#define vtn_assert(expr) \
> > +   do { \
> > +  if (!likely(expr)) \
> > + vtn_fail("%s", #expr); \
> > +   } while (0)
>
> I'm not a huge fan of this particular detail.  When you see "assert" in
> a name, that carries a bunch of implicit information with it.  In this
> case, that information is, by design, not true.  Primarily, it does
> happen in release builds.  It does still lead to an abrupt failure, but
> a different sort.  Maybe vtn_fail_when() would be better... the downside
> of that is all the conditions would have to be inverted in the next
> patch.  Ugh.
>

Yeah, I understand both of those reservations.  The one that concerns me
the most is that it happens in release builds; that's definitely unexpected.
As far as aborting, it does perform a full stop, it's just not quite the
same.  I'd be ok with switching over to something else.  How about
vtn_fail_if?  Or maybe we could follow the perl pattern and do
vtn_or_fail.  Thoughts?
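
For readers unfamiliar with the pattern under discussion, here is a minimal sketch of a fail-if style check built on setjmp/longjmp. It is not the Mesa code: struct parser, parser_fail, parser_fail_if and parse_word are hypothetical names, and plain malloc/free stands in for ralloc. Only the fail_jump member mirrors the patch.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical parser state; only fail_jump mirrors the patch. */
struct parser {
   jmp_buf fail_jump;
   int *scratch;      /* stands in for allocations owned by the builder */
};

/* Logs the failure and unwinds to the setjmp() in parse_word(). */
static void
parser_fail(struct parser *p, const char *what)
{
   fprintf(stderr, "parse failed: %s\n", what);
   longjmp(p->fail_jump, 1);
}

/* Unlike assert(), this check is kept in release builds. */
#define parser_fail_if(p, cond) \
   do { \
      if (cond) \
         parser_fail((p), #cond); \
   } while (0)

/* Returns 0 on success, -1 if allocation or any check failed. */
static int
parse_word(unsigned word)
{
   struct parser *p = calloc(1, sizeof(*p));
   if (!p)
      return -1;

   /* Set the jump target after allocating the state but before any
    * check can fire; all cleanup happens in this one place. */
   if (setjmp(p->fail_jump)) {
      free(p->scratch);
      free(p);
      return -1;
   }

   p->scratch = malloc(sizeof(int));
   parser_fail_if(p, p->scratch == NULL);
   parser_fail_if(p, word == 0);

   free(p->scratch);
   free(p);
   return 0;
}
```

Naming the macro after the failure condition rather than after assert avoids implying that the check disappears in release builds, which is the concern raised in the review. The condition should still be free of side effects, since a reader may skim past it.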


> For that reason, it makes me uncomfortable when I see things with
> side-effects in a thing called foo_assert() (the SpvOpExtInst in the
> next patch).
>

Yeah, the side-effects are a bit disturbing.  I'm happy to change that.


> Hm... I'm not sure what to suggest, and this series has been out for a
> couple weeks.  What are your thoughts?
>

No worries.  I'm not in too 

Re: [Mesa-dev] [PATCH 0/8] swr: update rasterizer

2017-09-05 Thread Cherniak, Bruce
Reviewed-by: Bruce Cherniak  

> On Sep 5, 2017, at 1:57 PM, Tim Rowley  wrote:
> 
> Highlight is starting to unify the simd/simd16 code, removing lots of
> temporary code duplication.
> 
> No piglit or vtk test regressions.
> 
> Tim Rowley (8):
>  swr/rast: Allow gather of floats from fetch shader with 2-4GB offsets
>  swr: set caps for VB 4-byte alignment
>  swr/rast: Removed some trailing whitespace caught during review
>  swr/rast: FE/Binner - unify SIMD8/16 functions using simdlib types
>  swr/rast: SIMD16 PA - rename Assemble_simd16 to Assemble
>  swr/rast: SIMD16 FE remove templated immediates workaround
>  swr/rast: Remove use of C++14 template variable
>  swr/rast: FE/Clipper - unify SIMD8/16 functions using simdlib types
> 
> .../swr/rasterizer/codegen/gen_llvm_ir_macros.py   |1 +
> .../codegen/templates/gen_ar_eventhandlerfile.hpp  |4 +-
> src/gallium/drivers/swr/rasterizer/core/binner.cpp | 2312 ++--
> src/gallium/drivers/swr/rasterizer/core/binner.h   |  192 +-
> src/gallium/drivers/swr/rasterizer/core/clip.cpp   |   16 +-
> src/gallium/drivers/swr/rasterizer/core/clip.h | 1654 --
> .../drivers/swr/rasterizer/core/conservativeRast.h |1 +
> src/gallium/drivers/swr/rasterizer/core/fifo.hpp   |4 +-
> .../drivers/swr/rasterizer/core/frontend.cpp   |6 +-
> src/gallium/drivers/swr/rasterizer/core/pa.h   |   20 +-
> src/gallium/drivers/swr/rasterizer/core/state.h|7 +
> src/gallium/drivers/swr/rasterizer/core/utils.h|8 +
> .../drivers/swr/rasterizer/jitter/fetch_jit.cpp|7 +-
> src/gallium/drivers/swr/swr_screen.cpp |9 +-
> 14 files changed, 1193 insertions(+), 3048 deletions(-)
> 
> -- 
> 2.7.4
> 


Re: [Mesa-dev] [PATCH] i965: skip varyings without slot

2017-09-05 Thread Timothy Arceri



On 01/09/17 21:15, Juan A. Suarez Romero wrote:

On Thu, 2017-06-29 at 14:43 +1000, Timothy Arceri wrote:

On 27/06/17 21:20, Juan A. Suarez Romero wrote:

On Tue, 2017-06-27 at 09:29 +1000, Timothy Arceri wrote:

On 16/06/17 18:12, Juan A. Suarez Romero wrote:


Commit 00620782c9 (i965: use nir_shader_gather_info() over
do_set_program_inouts()) changed how we compute the outputs written.

In the previous version it was using the IR declared outputs, while in
the new one it uses NIR to parse the instructions that write outputs.

Thus, if the shader has declared some output that is not written later
in the code, like this:

~~~
struct S {
   vec4 a;
   vec4 b;
   vec4 c;
};

layout (xfb_offset = sizeof_type) out S s;

void main()
{

   s.a = vec4(1.0, 0.0, 0.0, 1.0);
   s.c = vec4(0.0, 1.0, 0.0, 1.0);
}
~~~

The former version computed 3 outputs written (s.a, s.b and s.c), while
the new version only counts 2 (s.a and s.c).

This means that with the new version, there could be varyings in the VUE
map that do not have a slot assigned (s.b), which must be skipped.

This fixes KHR-GL45.enhanced_layouts.xfb_capture_struct.
---
src/mesa/drivers/dri/i965/genX_state_upload.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c 
b/src/mesa/drivers/dri/i965/genX_state_upload.c
index a5ad2ca..573f0e3 100644
--- a/src/mesa/drivers/dri/i965/genX_state_upload.c
+++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
@@ -3102,9 +3102,10 @@ genX(upload_3dstate_so_decl_list)(struct brw_context 
*brw,
  const unsigned stream_id = output->StreamId;
  assert(stream_id < MAX_VERTEX_STREAMS);

-  buffer_mask[stream_id] |= 1 << buffer;

+  if (vue_map->varying_to_slot[varying] == -1)
+ continue;

-  assert(vue_map->varying_to_slot[varying] >= 0);

+  buffer_mask[stream_id] |= 1 << buffer;



My feeling is we should try to avoid adding it to the VUE map in the
first place rather than trying to work around it.



It isn't in the VUE map. That's the reason to skip it.

Maybe you mean not adding it in the linked_xfb_info?


oh, right. I had it the wrong way around in my head.

I think the problem is we setup xfb in the glsl linker but then run all
the NIR optimisation before calling nir_shader_gather_info().

However I'm not sure removing the assert is the best idea, as it could
result in real issues being hidden.

Ideally we would run the NIR opts before we do the final linking in GLSL
IR. I've outlined how this can be done in past emails (which I can't
seem to find), but it's a lot of work. Nicolai's spirv might make it
easier to do, but there will still be things like a nir varying packing
pass required which I believe will be outside of what Nicolai needs for
his changes.

For now I believe this issue only impacts debug builds so I'm not sure
removing the assert and silently skipping is a good idea.

I'll let others comment further.




After a couple of months, I didn't get any other feedback.

Should this be R-b?


No.


As said, it is fixing a crash when running a CTS
test.


As I've said above the crash is unfortunate but it's a false positive
that doesn't impact the release build whatsoever (please correct me if
this is wrong). The assert will however catch real issues as well that
your patch would just ignore so I'd rather just leave it as is.


If we really want to fix this then we should run the NIR opts before we 
do the final linking in GLSL IR as described above, however there are 
some significant outstanding tasks that need to be completed before that 
can be done. For example we need a nir varying packing pass.
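
To make the trade-off in this thread concrete, the sketch below shows the skip-on-missing-slot behaviour the patch proposes, with one addition that is purely an assumption of this example: it counts what it skips, so a caller could still log or assert on unexpected cases. All names (ex_xfb_output, ex_build_buffer_masks, the EX_ constants) are hypothetical and not part of i965.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NUM_VARYINGS 8   /* hypothetical table size */
#define EX_NUM_STREAMS  4   /* hypothetical stream count */

struct ex_xfb_output {
   unsigned varying;    /* index into varying_to_slot */
   unsigned stream_id;
   unsigned buffer;     /* assumed to be below 8 in this sketch */
};

/* Builds one buffer mask per stream, ignoring outputs whose varying has
 * no slot (-1). Returns the number of skipped outputs so the caller can
 * still notice them instead of having them silently dropped. */
static unsigned
ex_build_buffer_masks(const int8_t varying_to_slot[EX_NUM_VARYINGS],
                      const struct ex_xfb_output *outputs, unsigned count,
                      uint8_t buffer_mask[EX_NUM_STREAMS])
{
   unsigned skipped = 0;

   for (unsigned i = 0; i < count; i++) {
      assert(outputs[i].varying < EX_NUM_VARYINGS);
      assert(outputs[i].stream_id < EX_NUM_STREAMS);
      assert(outputs[i].buffer < 8);

      if (varying_to_slot[outputs[i].varying] == -1) {
         skipped++;
         continue;
      }

      buffer_mask[outputs[i].stream_id] |= (uint8_t)(1u << outputs[i].buffer);
   }

   return skipped;
}
```

This does not address the objection that the unwritten output should never reach this point; it only shows one way a skip could stay visible rather than hiding real issues.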





J.A.



J.A.




Is it not possible to do that instead?



  /* Mesa doesn't store entries for gl_SkipComponents in the Outputs[]
   * array.  Instead, it simply increments DstOffset for the following









Re: [Mesa-dev] [PATCH 7/9] spirv: Add vtn_fail and vtn_assert helpers

2017-09-05 Thread Ian Romanick
On 08/17/2017 10:22 AM, Jason Ekstrand wrote:
> These helpers are much nicer than just using assert because they don't
> kill your process.  Instead, it longjmps back to spirv_to_nir(), cleans
> up all the temporary memory, and nicely returns NULL.  While crashing is
> completely OK in the Vulkan world, it's not considered to be quite so
> nice in GL.  This should help us to make SPIR-V parsing much more
> robust.  The one downside here is that vtn_assert is not compiled out in
> release builds like assert() is so it isn't free.
> ---
>  src/compiler/spirv/spirv_to_nir.c | 20 
>  src/compiler/spirv/vtn_private.h  | 31 +++
>  2 files changed, 51 insertions(+)
> 
> diff --git a/src/compiler/spirv/spirv_to_nir.c 
> b/src/compiler/spirv/spirv_to_nir.c
> index e59f2b2..af542e8 100644
> --- a/src/compiler/spirv/spirv_to_nir.c
> +++ b/src/compiler/spirv/spirv_to_nir.c
> @@ -104,6 +104,20 @@ _vtn_warn(struct vtn_builder *b, const char *file, 
> unsigned line,
> va_end(args);
>  }
>  
> +void
> +_vtn_fail(struct vtn_builder *b, const char *file, unsigned line,
> +  const char *fmt, ...)
> +{
> +   va_list args;
> +
> +   va_start(args, fmt);
> +   vtn_log_err(b, NIR_SPIRV_DEBUG_LEVEL_ERROR, "SPIR-V parsing FAILED:\n",
> +   file, line, fmt, args);
> +   va_end(args);
> +
> +   longjmp(b->fail_jump, 1);
> +}
> +
>  struct spec_constant_value {
> bool is_double;
> union {
> @@ -3420,6 +3434,12 @@ spirv_to_nir(const uint32_t *words, size_t word_count,
> b->entry_point_name = entry_point_name;
> b->ext = ext;
>  
> +   /* See also _vtn_fail() */
> +   if (setjmp(b->fail_jump)) {
> +  ralloc_free(b);
> +  return NULL;
> +   }
> +
> const uint32_t *word_end = words + word_count;
>  
> /* Handle the SPIR-V header (first 4 dwords)  */
> diff --git a/src/compiler/spirv/vtn_private.h 
> b/src/compiler/spirv/vtn_private.h
> index 3eb601d..f640289 100644
> --- a/src/compiler/spirv/vtn_private.h
> +++ b/src/compiler/spirv/vtn_private.h
> @@ -28,6 +28,8 @@
>  #ifndef _VTN_PRIVATE_H_
>  #define _VTN_PRIVATE_H_
>  
> +#include <setjmp.h>
> +
>  #include "nir/nir.h"
>  #include "nir/nir_builder.h"
>  #include "util/u_dynarray.h"
> @@ -49,6 +51,32 @@ void _vtn_warn(struct vtn_builder *b, const char *file, 
> unsigned line,
> const char *fmt, ...) PRINTFLIKE(4, 5);
>  #define vtn_warn(...) _vtn_warn(b, __FILE__, __LINE__, __VA_ARGS__)
>  
> +/** Fail SPIR-V parsing
> + *
> + * This function logs an error and then bails out of the shader compile using
> + * longjmp.  This being safe relies on two things:
> + *
> + *  1) We must guarantee that setjmp is called after allocating the builder
> + * and setting up b->debug (so that logging works) but before any
> + * errors have a chance to occur.
> + *
> + *  2) While doing the SPIR-V -> NIR conversion, we need to be careful to
> + * ensure that all heap allocations happen through ralloc and are 
> parented
> + * to the builder.
> + *
> + * So long as these two things continue to hold, we can easily longjmp back 
> to
> + * spirv_to_nir(), clean up the builder, and return NULL.
> + */
> +void _vtn_fail(struct vtn_builder *b, const char *file, unsigned line,
> +   const char *fmt, ...) NORETURN PRINTFLIKE(4, 5);
> +#define vtn_fail(...) _vtn_fail(b, __FILE__, __LINE__, __VA_ARGS__)
> +
> +#define vtn_assert(expr) \
> +   do { \
> +  if (!likely(expr)) \
> + vtn_fail("%s", #expr); \
> +   } while (0)

I'm not a huge fan of this particular detail.  When you see "assert" in
a name, that carries a bunch of implicit information with it.  In this
case, that information is, by design, not true.  Primarily, it does
happen in release builds.  It does still lead to an abrupt failure, but
a different sort.  Maybe vtn_fail_when() would be better... the downside
of that is all the conditions would have to be inverted in the next
patch.  Ugh.

For that reason, it makes me uncomfortable when I see things with
side-effects in a thing called foo_assert() (the SpvOpExtInst in the
next patch).

Hm... I'm not sure what to suggest, and this series has been out for a
couple weeks.  What are your thoughts?
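
To make the naming question concrete, here is a minimal standalone sketch of
the fail-and-unwind pattern under discussion. It is not the code from the
patch: toy_builder, toy_fail(), toy_fail_when() and toy_parse() are all
hypothetical names, and the only real-world detail is the SPIR-V magic
number. The check macro fires when its condition is TRUE and is always
compiled in, so nothing about its name suggests assert() semantics.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct vtn_builder. */
struct toy_builder {
   jmp_buf fail_jump;
   const char *last_error;
};

/* Record and log the failure, then unwind to the setjmp() in toy_parse(). */
static void
toy_fail(struct toy_builder *b, const char *msg)
{
   b->last_error = msg;
   fprintf(stderr, "parse FAILED: %s\n", msg);
   longjmp(b->fail_jump, 1);
}

/* Fails when the condition is TRUE.  Unlike assert(), this is present in
 * release builds too.
 */
#define toy_fail_when(b, cond) \
   do { \
      if (cond) \
         toy_fail((b), #cond); \
   } while (0)

/* Returns 0 on success, -1 if any check failed. */
static int
toy_parse(struct toy_builder *b, const unsigned *words, size_t count)
{
   b->last_error = NULL;

   /* Must run before the first check can fire. */
   if (setjmp(b->fail_jump))
      return -1;

   toy_fail_when(b, count < 1);
   toy_fail_when(b, words[0] != 0x07230203u); /* SPIR-V magic number */
   return 0;
}
```

With this spelling every condition reads as "the bad case", which is the
inversion cost mentioned above; the upside is that side effects inside the
condition are no longer hiding behind something that looks like assert().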

> +
>  enum vtn_value_type {
> vtn_value_type_invalid = 0,
> vtn_value_type_undef,
> @@ -474,6 +502,9 @@ struct vtn_decoration {
>  struct vtn_builder {
> nir_builder nb;
>  
> +   /* Used by vtn_fail to jump back to the beginning of SPIR-V compilation */
> +   jmp_buf fail_jump;
> +
> const uint32_t *spirv;
>  
> nir_shader *shader;
> 



Re: [Mesa-dev] [PATCH 1/5] intel: Remove unused Kabylake pci ids

2017-09-05 Thread Matt Turner
The series is

Reviewed-by: Matt Turner 

I think it should be tagged for the stable branch as well. Does anyone
else have an opinion?

I tested a KBL-R system (the 0x5917 PCI ID) with it set as a GT1.5 and
a GT2, and in both cases it passed piglit.

Are you planning to send patches for the kernel and libdrm? If not, I
can handle that.


[Mesa-dev] [PATCH 15/17] i965: Make BLORP properly avoid batch wrapping.

2017-09-05 Thread Kenneth Graunke
We need to set brw->no_batch_wrap to actually avoid flushing in the
middle of our BLORP operation, and instead grow the batchbuffer.
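
As a rough illustration of what the flag selects between, here is a toy
model of the space check. It is a sketch only: toy_batch, toy_require_space()
and TOY_MAX_BATCH_SIZE are hypothetical, and the 1.5x growth step simply
mirrors the MIN2(size + size / 2, MAX_BATCH_SIZE) expression that appears
elsewhere in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_BATCH_SIZE (256 * 1024) /* assumed cap, not the driver's */

/* Hypothetical stand-in for the batch bookkeeping. */
struct toy_batch {
   size_t size;   /* current capacity in bytes */
   size_t used;   /* bytes already emitted */
   bool no_wrap;  /* true while an operation must stay in one batch */
   int flushes;
   int grows;
};

static size_t
toy_min(size_t a, size_t b)
{
   return a < b ? a : b;
}

static void
toy_require_space(struct toy_batch *batch, size_t sz)
{
   if (batch->used + sz < batch->size)
      return;

   if (!batch->no_wrap) {
      /* Safe point: submit what we have and start an empty batch. */
      batch->flushes++;
      batch->used = 0;
   } else {
      /* Mid-operation: keep the contents and grow by 1.5x, up to the cap. */
      batch->size = toy_min(batch->size + batch->size / 2,
                            TOY_MAX_BATCH_SIZE);
      batch->grows++;
      assert(batch->used + sz < batch->size);
   }
}
```

The point of setting the flag around the operation is that the second branch
never discards commands already emitted for it.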
---
 src/mesa/drivers/dri/i965/genX_blorp_exec.c | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/genX_blorp_exec.c 
b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
index feb87923ccb..5bff7eaff59 100644
--- a/src/mesa/drivers/dri/i965/genX_blorp_exec.c
+++ b/src/mesa/drivers/dri/i965/genX_blorp_exec.c
@@ -224,9 +224,7 @@ genX(blorp_exec)(struct blorp_batch *batch,
 retry:
intel_batchbuffer_require_space(brw, estimated_max_batch_usage, 
RENDER_RING);
intel_batchbuffer_save_state(brw);
-   struct brw_bo *saved_bo = brw->batch.bo;
-   uint32_t saved_used = USED_BATCH(brw->batch);
-   uint32_t saved_state_used = brw->batch.state_used;
+   brw->no_batch_wrap = true;
 
 #if GEN_GEN == 6
/* Emit workaround flushes when we switch from drawing to blorping. */
@@ -254,17 +252,7 @@ retry:
 
blorp_exec(batch, params);
 
-   /* Make sure we didn't wrap the batch unintentionally, and make sure we
-* reserved enough space that a wrap will never happen.
-*/
-   assert(brw->batch.bo == saved_bo);
-   assert((USED_BATCH(brw->batch) - saved_used) * 4 +
-  (brw->batch.state_used - saved_state_used) <
-  estimated_max_batch_usage);
-   /* Shut up compiler warnings on release build */
-   (void)saved_bo;
-   (void)saved_used;
-   (void)saved_state_used;
+   brw->no_batch_wrap = false;
 
/* Check if the blorp op we just did would make our batch likely to fail to
 * map all the BOs into the GPU at batch exec time later.  If so, flush the
-- 
2.14.1



[Mesa-dev] [PATCH 12/17] i965: Replace exit(1) with abort() when command submission fails.

2017-09-05 Thread Kenneth Graunke
Calling exit(1) when execbuffer fails is not necessarily safe.
When running Piglit tests with a Mesa that submitted invalid commands
to the GPU, I discovered the following problem:

1. do_flush_locked fails and calls exit(1)...invoking atexit handlers.
2. Piglit tries to clean up after itself, and does eglMakeCurrent to
   release the current context.
3. MakeCurrent calls glFlush (or the internal hook for that)
4. glFlush calls intel_batchbuffer_flush

So we end up trying to flush the batch...in the middle of flushing the
batch...with code that isn't designed to be reentrant.  So it breaks
even worse than before.  In my case, it just outright crashed.

Calling abort() is probably not what we want either, but it at least
bypasses atexit() handlers, so it won't hit this ugly reentrancy
problem.
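
The reentrancy can be modelled without actually terminating the process.
The sketch below is purely illustrative: toy_ctx, toy_flush() and
toy_exit_handler() are hypothetical, and the handler is called directly to
stand in for what exit(1) would trigger through atexit(). The one fact it
relies on is that exit() runs atexit() handlers while abort() does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver context. */
struct toy_ctx {
   bool flushing;         /* true while toy_flush() is on the stack */
   int reentrant_flushes; /* times toy_flush() was entered recursively */
};

static void toy_flush(struct toy_ctx *ctx, bool submit_fails,
                      bool failure_runs_handlers);

/* Models an atexit() handler that releases the current context, which
 * flushes as a side effect.
 */
static void
toy_exit_handler(struct toy_ctx *ctx)
{
   toy_flush(ctx, false, false);
}

static void
toy_flush(struct toy_ctx *ctx, bool submit_fails, bool failure_runs_handlers)
{
   if (ctx->flushing) {
      /* We are already in the middle of a flush: this is the bad case. */
      ctx->reentrant_flushes++;
      return;
   }

   ctx->flushing = true;
   if (submit_fails && failure_runs_handlers)
      toy_exit_handler(ctx); /* exit(1)-like path: handlers run now */
   /* abort()-like path: nothing else runs, so no nested flush happens. */
   ctx->flushing = false;
}
```

In the real driver there is no guard like ctx->flushing, which is why the
nested flush crashed rather than being counted.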
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 491aa12dd63..637705226f0 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -679,7 +679,7 @@ do_flush_locked(struct brw_context *brw, int in_fence_fd, 
int *out_fence_fd)
 
if (ret != 0) {
   fprintf(stderr, "intel_do_flush_locked failed: %s\n", strerror(-ret));
-  exit(1);
+  abort();
}
 
return ret;
-- 
2.14.1



[Mesa-dev] [PATCH 16/17] i965: Delete BATCH_RESERVED handling.

2017-09-05 Thread Kenneth Graunke
Now that we can grow the batchbuffer if we absolutely need the extra
space, we don't need to reserve space for the final do-or-die ending
commands.
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 11 +++
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 26 --
 2 files changed, 3 insertions(+), 34 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 118f75c4d71..0af9101e5f4 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -168,7 +168,6 @@ intel_batchbuffer_reset(struct intel_batchbuffer *batch,
add_exec_bo(batch, batch->bo);
assert(batch->bo->index == 0);
 
-   batch->reserved_space = BATCH_RESERVED;
batch->needs_sol_reset = false;
batch->state_base_address_emitted = false;
 
@@ -318,8 +317,7 @@ intel_batchbuffer_require_space(struct brw_context *brw, 
GLuint sz,
 
/* For now, flush as if the batch and state buffers still shared a BO */
const unsigned batch_used = USED_BATCH(*batch) * 4;
-   if (batch_used + sz >=
-   BATCH_SZ - batch->reserved_space - batch->state_used) {
+   if (batch_used + sz >= BATCH_SZ - batch->state_used) {
   if (!brw->no_batch_wrap) {
  intel_batchbuffer_flush(brw);
   } else {
@@ -327,8 +325,7 @@ intel_batchbuffer_require_space(struct brw_context *brw, 
GLuint sz,
 MIN2(batch->bo->size + batch->bo->size / 2, MAX_BATCH_SIZE);
  grow_buffer(brw, &batch->bo, &batch->map, batch_used, new_size);
  batch->map_next = (void *) batch->map + batch_used;
- assert(batch_used + sz <
-batch->bo->size - batch->reserved_space - batch->state_used);
+ assert(batch_used + sz < batch->bo->size - batch->state_used);
   }
}
 
@@ -831,8 +828,6 @@ _intel_batchbuffer_flush_fence(struct brw_context *brw,
   bytes_for_state, 100.0f * bytes_for_state / STATE_SZ);
}
 
-   brw->batch.reserved_space = 0;
-
brw_finish_batch(brw);
 
/* Mark the end of the buffer. */
@@ -967,7 +962,7 @@ brw_state_batch(struct brw_context *brw,
uint32_t offset = ALIGN(batch->state_used, alignment);
 
/* For now, follow the old flushing behavior. */
-   int batch_space = batch->reserved_space + USED_BATCH(*batch) * 4;
+   int batch_space = USED_BATCH(*batch) * 4;
 
if (offset + size >= STATE_SZ - batch_space) {
   if (!brw->no_batch_wrap) {
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
index 8a2e3cfc9bb..c02cafed521 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h
@@ -10,32 +10,6 @@
 extern "C" {
 #endif
 
-/**
- * Number of bytes to reserve for commands necessary to complete a batch.
- *
- * This includes:
- * - MI_BATCHBUFFER_END (4 bytes)
- * - Optional MI_NOOP for ensuring the batch length is qword aligned (4 bytes)
- * - Any state emitted by vtbl->finish_batch():
- *   - Gen4-5 record ending occlusion query values (4 * 4 = 16 bytes)
- *   - Disabling OA counters on Gen6+ (3 DWords = 12 bytes)
- *   - Ending MI_REPORT_PERF_COUNT on Gen5+, plus associated PIPE_CONTROLs:
- * - Two sets of PIPE_CONTROLs, which become 4 PIPE_CONTROLs each on SNB,
- *   which are 5 DWords each ==> 2 * 4 * 5 * 4 = 160 bytes
- * - 3 DWords for MI_REPORT_PERF_COUNT itself on Gen6+.  ==> 12 bytes.
- *   On Ironlake, it's 6 DWords, but we have some slack due to the lack of
- *   Sandybridge PIPE_CONTROL madness.
- *   - CC_STATE workaround on HSW (17 * 4 = 68 bytes)
- * - 10 dwords for initial mi_flush
- * - 2 dwords for CC state setup
- * - 5 dwords for the required pipe control at the end
- *   - Restoring L3 configuration: (24 dwords = 96 bytes)
- * - 2*6 dwords for two PIPE_CONTROL flushes.
- * - 7 dwords for L3 configuration set-up.
- * - 5 dwords for L3 atomic set-up (on HSW).
- */
-#define BATCH_RESERVED 308
-
 struct intel_batchbuffer;
 
 void intel_batchbuffer_init(struct intel_screen *screen,
-- 
2.14.1



[Mesa-dev] [PATCH 08/17] i965: Move brw_state_batch code to intel_batchbuffer.c

2017-09-05 Thread Kenneth Graunke
The batch buffer and state buffer code is fairly tied together,
and having it in one .c file will make refactoring easier.

Also, drop some commentary above brw_state_batch.  The "aperture
checking performance hacks" are long since gone, so that paragraph
makes little sense at this point.
---
 src/mesa/drivers/dri/i965/Makefile.sources|  1 -
 src/mesa/drivers/dri/i965/brw_state.h |  4 +-
 src/mesa/drivers/dri/i965/brw_state_batch.c   | 93 ---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 46 +
 4 files changed, 47 insertions(+), 97 deletions(-)
 delete mode 100644 src/mesa/drivers/dri/i965/brw_state_batch.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 9687eb957e1..e33dea07128 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -41,7 +41,6 @@ i965_FILES = \
brw_queryobj.c \
brw_reset.c \
brw_sf.c \
-   brw_state_batch.c \
brw_state.h \
brw_state_upload.c \
brw_structs.h \
diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index 1cbddaba786..c8b71e72de5 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -184,9 +184,7 @@ void brw_destroy_caches( struct brw_context *brw );
 
 void brw_print_program_cache(struct brw_context *brw);
 
-/***
- * brw_state_batch.c
- */
+/* intel_batchbuffer.c */
 void *brw_state_batch(struct brw_context *brw,
   int size, int alignment, uint32_t *out_offset);
 uint32_t brw_state_batch_size(struct brw_context *brw, uint32_t offset);
diff --git a/src/mesa/drivers/dri/i965/brw_state_batch.c 
b/src/mesa/drivers/dri/i965/brw_state_batch.c
deleted file mode 100644
index 5b6f3af93d8..000
--- a/src/mesa/drivers/dri/i965/brw_state_batch.c
+++ /dev/null
@@ -1,93 +0,0 @@
-/*
- Copyright (C) Intel Corp.  2006.  All Rights Reserved.
- Intel funded Tungsten Graphics to
- develop this 3D driver.
-
- Permission is hereby granted, free of charge, to any person obtaining
- a copy of this software and associated documentation files (the
- "Software"), to deal in the Software without restriction, including
- without limitation the rights to use, copy, modify, merge, publish,
- distribute, sublicense, and/or sell copies of the Software, and to
- permit persons to whom the Software is furnished to do so, subject to
- the following conditions:
-
- The above copyright notice and this permission notice (including the
- next paragraph) shall be included in all copies or substantial
- portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
- IN NO EVENT SHALL THE COPYRIGHT OWNER(S) AND/OR ITS SUPPLIERS BE
- LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
- OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-
- **/
- /*
-  * Authors:
-  *   Keith Whitwell 
-  */
-
-#include "brw_state.h"
-#include "intel_batchbuffer.h"
-#include "main/imports.h"
-#include "util/hash_table.h"
-#include "util/ralloc.h"
-
-uint32_t
-brw_state_batch_size(struct brw_context *brw, uint32_t offset)
-{
-   struct hash_entry *entry =
-  _mesa_hash_table_search(brw->batch.state_batch_sizes,
-  (void *) (uintptr_t) offset);
-   return entry ? (uintptr_t) entry->data : 0;
-}
-
-/**
- * Allocates a block of space in the batchbuffer for indirect state.
- *
- * We don't want to allocate separate BOs for every bit of indirect
- * state in the driver.  It means overallocating by a significant
- * margin (4096 bytes, even if the object is just a 20-byte surface
- * state), and more buffers to walk and count for aperture size checking.
- *
- * However, due to the restrictions imposed by the aperture size
- * checking performance hacks, we can't have the batch point at a
- * separate indirect state buffer, because once the batch points at
- * it, no more relocations can be added to it.  So, we sneak these
- * buffers in at the top of the batchbuffer.
- */
-void *
-brw_state_batch(struct brw_context *brw,
-int size,
-int alignment,
-uint32_t *out_offset)
-{
-   struct intel_batchbuffer *batch = &brw->batch;
-   uint32_t offset;
-
-   assert(size < batch->bo->size);
-   offset = ROUND_DOWN_TO(batch->state_batch_offset - size, alignment);
-
-   /* If allocating from the top would wrap below the batchbuffer, or
-* if the batch's used space (plus the reserved pad) collides with our
-* space, then flush 

[Mesa-dev] [PATCH 11/17] i965: Prepare INTEL_DEBUG=bat decoding for a separate statebuffer.

2017-09-05 Thread Kenneth Graunke
We'll need to read from both buffers when decoding state.
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 102 +-
 1 file changed, 52 insertions(+), 50 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 4ed76c4c40f..491aa12dd63 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -288,21 +288,23 @@ do_batch_dump(struct brw_context *brw)
if (batch->ring != RENDER_RING)
   return;
 
-   uint32_t *data = brw_bo_map(brw, batch->bo, MAP_READ);
-   if (data == NULL) {
-  fprintf(stderr, "WARNING: failed to map batchbuffer\n");
+   uint32_t *batch_data = brw_bo_map(brw, batch->bo, MAP_READ);
+   uint32_t *state = batch_data;
+   if (batch == NULL || state == NULL) {
+  fprintf(stderr, "WARNING: failed to map batchbuffer/statebuffer\n");
   return;
}
 
-   uint32_t *end = data + USED_BATCH(*batch);
-   uint32_t gtt_offset = batch->bo->gtt_offset;
+   uint32_t *end = batch_data + USED_BATCH(*batch);
+   uint32_t batch_gtt_offset = batch->bo->gtt_offset;
+   uint32_t state_gtt_offset = batch->bo->gtt_offset;
int length;
 
bool color = INTEL_DEBUG & DEBUG_COLOR;
const char *header_color = color ? BLUE_HEADER : "";
const char *reset_color  = color ? NORMAL : "";
 
-   for (uint32_t *p = data; p < end; p += length) {
+   for (uint32_t *p = batch_data; p < end; p += length) {
   struct gen_group *inst = gen_spec_find_instruction(spec, p);
   length = gen_group_get_length(inst, p);
   assert(inst == NULL || length > 0);
@@ -312,7 +314,7 @@ do_batch_dump(struct brw_context *brw)
  continue;
   }
 
-  uint64_t offset = gtt_offset + 4 * (p - data);
+  uint64_t offset = batch_gtt_offset + 4 * (p - batch_data);
 
   fprintf(stderr, "%s0x%08"PRIx64":  0x%08x:  %-80s%s\n", header_color,
   offset, p[0], gen_group_get_name(inst), reset_color);
@@ -322,26 +324,26 @@ do_batch_dump(struct brw_context *brw)
   switch (gen_group_get_opcode(inst) >> 16) {
   case _3DSTATE_PIPELINED_POINTERS:
  /* Note: these Gen4-5 pointers are full relocations rather than
-  * offsets from the start of the batch.  So we need to subtract
-  * gtt_offset (the start of the batch) to obtain an offset we
+  * offsets from the start of the statebuffer.  So we need to subtract
+  * gtt_offset (the start of the statebuffer) to obtain an offset we
   * can add to the map and get at the data.
   */
- decode_struct(brw, spec, "VS_STATE", data, gtt_offset,
-   (p[1] & ~0x1fu) - gtt_offset, color);
+ decode_struct(brw, spec, "VS_STATE", state, state_gtt_offset,
+   (p[1] & ~0x1fu) - state_gtt_offset, color);
  if (p[2] & 1) {
-decode_struct(brw, spec, "GS_STATE", data, gtt_offset,
-  (p[2] & ~0x1fu) - gtt_offset, color);
+decode_struct(brw, spec, "GS_STATE", state, state_gtt_offset,
+  (p[2] & ~0x1fu) - state_gtt_offset, color);
  }
  if (p[3] & 1) {
-decode_struct(brw, spec, "CLIP_STATE", data, gtt_offset,
-  (p[3] & ~0x1fu) - gtt_offset, color);
+decode_struct(brw, spec, "CLIP_STATE", state, state_gtt_offset,
+  (p[3] & ~0x1fu) - state_gtt_offset, color);
  }
- decode_struct(brw, spec, "SF_STATE", data, gtt_offset,
-   (p[4] & ~0x1fu) - gtt_offset, color);
- decode_struct(brw, spec, "WM_STATE", data, gtt_offset,
-   (p[5] & ~0x1fu) - gtt_offset, color);
- decode_struct(brw, spec, "COLOR_CALC_STATE", data, gtt_offset,
-   (p[6] & ~0x3fu) - gtt_offset, color);
+ decode_struct(brw, spec, "SF_STATE", state, state_gtt_offset,
+   (p[4] & ~0x1fu) - state_gtt_offset, color);
+ decode_struct(brw, spec, "WM_STATE", state, state_gtt_offset,
+   (p[5] & ~0x1fu) - state_gtt_offset, color);
+ decode_struct(brw, spec, "COLOR_CALC_STATE", state, state_gtt_offset,
+   (p[6] & ~0x3fu) - state_gtt_offset, color);
  break;
   case _3DSTATE_BINDING_TABLE_POINTERS_VS:
   case _3DSTATE_BINDING_TABLE_POINTERS_HS:
@@ -355,11 +357,11 @@ do_batch_dump(struct brw_context *brw)
 
  uint32_t bt_offset = p[1] & ~0x1fu;
  int bt_entries = brw_state_batch_size(brw, bt_offset) / 4;
- uint32_t *bt_pointers = &data[bt_offset / 4];
+ uint32_t *bt_pointers = &state[bt_offset / 4];
  for (int i = 0; i < bt_entries; i++) {
 fprintf(stderr, "SURFACE_STATE - BTI = %d\n", i);
-gen_print_group(stderr, group, gtt_offset + bt_pointers[i],
-&data[bt_pointers[i] / 4], color);
+gen_print_group(stderr, 

[Mesa-dev] [PATCH 17/17] i965: Disentangle batch and state buffer flushing.

2017-09-05 Thread Kenneth Graunke
We now flush the batch when either the batchbuffer or statebuffer
reaches the original intended batch size, instead of when the sum of
the two reaches a certain size (which makes no sense now that they're
separate buffers).
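
The difference between the two flush rules can be written down as two
predicates. This is only a sketch: the function names are hypothetical and
TOY_BATCH_SZ / TOY_STATE_SZ are made-up limits, not the driver's constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BATCH_SZ (20 * 1024) /* assumed limits, for illustration only */
#define TOY_STATE_SZ (16 * 1024)

/* Old rule: add batch and state usage together, as if both still shared a
 * single buffer of TOY_BATCH_SZ bytes.
 */
static bool
toy_should_flush_combined(size_t batch_used, size_t state_used, size_t sz)
{
   return batch_used + state_used + sz >= TOY_BATCH_SZ;
}

/* New rule: each buffer is checked against its own limit, and hitting
 * either limit triggers the flush.
 */
static bool
toy_should_flush_separate(size_t batch_used, size_t state_used,
                          size_t batch_sz, size_t state_sz)
{
   return batch_used + batch_sz >= TOY_BATCH_SZ ||
          state_used + state_sz >= TOY_STATE_SZ;
}
```

With 12 KB of commands and 10 KB of state queued, the combined rule already
flushes while the separate rule keeps going, which is where the larger
effective batch size comes from.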

With this change, we also need to update our "are we near the end?"
estimate to require separate batch and state buffer space.  I obtained
these estimates by looking at the size of draw calls in the Unreal 4
Elemental Demo (using INTEL_DEBUG=flush and always_flush_batch=true).

This will increase the batch size by perhaps 2-4x, which will almost
certainly have a performance impact, and may impact overall system
responsiveness.

XXX: benchmark, may need a lot of tuning.
---
 src/mesa/drivers/dri/i965/brw_compute.c   | 18 --
 src/mesa/drivers/dri/i965/brw_draw.c  | 18 --
 src/mesa/drivers/dri/i965/brw_state.h |  1 +
 src/mesa/drivers/dri/i965/genX_blorp_exec.c   |  4 ++--
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 25 +
 5 files changed, 28 insertions(+), 38 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_compute.c 
b/src/mesa/drivers/dri/i965/brw_compute.c
index 1bad7ac7a0c..7f0278ac92b 100644
--- a/src/mesa/drivers/dri/i965/brw_compute.c
+++ b/src/mesa/drivers/dri/i965/brw_compute.c
@@ -167,7 +167,6 @@ static void
 brw_dispatch_compute_common(struct gl_context *ctx)
 {
struct brw_context *brw = brw_context(ctx);
-   int estimated_buffer_space_needed;
bool fail_next = false;
 
if (!_mesa_check_conditional_render(ctx))
@@ -180,20 +179,11 @@ brw_dispatch_compute_common(struct gl_context *ctx)
 
brw_predraw_resolve_inputs(brw);
 
-   const int sampler_state_size = 16; /* 16 bytes */
-   estimated_buffer_space_needed = 512; /* batchbuffer commands */
-   estimated_buffer_space_needed += (BRW_MAX_TEX_UNIT *
- (sampler_state_size +
-  sizeof(struct 
gen5_sampler_default_color)));
-   estimated_buffer_space_needed += 1024; /* push constants */
-   estimated_buffer_space_needed += 512; /* misc. pad */
-
-   /* Flush the batch if it's approaching full, so that we don't wrap while
-* we've got validated state that needs to be in the same batch as the
-* primitives.
+   /* Flush the batch if the batch/state buffers are nearly full.  We can
+* grow them if needed, but this is not free, so we'd like to avoid it.
 */
-   intel_batchbuffer_require_space(brw, estimated_buffer_space_needed,
-   RENDER_RING);
+   intel_batchbuffer_require_space(brw, 600, RENDER_RING);
+   brw_require_statebuffer_space(brw, 2500);
intel_batchbuffer_save_state(brw);
 
  retry:
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index d1ec2e3f09d..06c6ed72c98 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -669,26 +669,16 @@ brw_try_draw_prims(struct gl_context *ctx,
brw->ctx.NewDriverState |= BRW_NEW_VERTICES;
 
for (i = 0; i < nr_prims; i++) {
-  int estimated_max_prim_size;
-  const int sampler_state_size = 16;
-
-  estimated_max_prim_size = 512; /* batchbuffer commands */
-  estimated_max_prim_size += BRW_MAX_TEX_UNIT *
- (sampler_state_size + sizeof(struct gen5_sampler_default_color));
-  estimated_max_prim_size += 1024; /* gen6 VS push constants */
-  estimated_max_prim_size += 1024; /* gen6 WM push constants */
-  estimated_max_prim_size += 512; /* misc. pad */
-
   /* Flag BRW_NEW_DRAW_CALL on every draw.  This allows us to have
* atoms that happen on every draw call.
*/
   brw->ctx.NewDriverState |= BRW_NEW_DRAW_CALL;
 
-  /* Flush the batch if it's approaching full, so that we don't wrap while
-   * we've got validated state that needs to be in the same batch as the
-   * primitives.
+  /* Flush the batch if the batch/state buffers are nearly full.  We can
+   * grow them if needed, but this is not free, so we'd like to avoid it.
*/
-  intel_batchbuffer_require_space(brw, estimated_max_prim_size, 
RENDER_RING);
+  intel_batchbuffer_require_space(brw, 1500, RENDER_RING);
+  brw_require_statebuffer_space(brw, 2400);
   intel_batchbuffer_save_state(brw);
 
   if (brw->num_instances != prims[i].num_instances ||
diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index c8b71e72de5..9718739dea9 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -185,6 +185,7 @@ void brw_destroy_caches( struct brw_context *brw );
 void brw_print_program_cache(struct brw_context *brw);
 
 /* intel_batchbuffer.c */
+void brw_require_statebuffer_space(struct brw_context *brw, int size);
 void *brw_state_batch(struct brw_context *brw,
   int size, int alignment, uint32_t *out_offset);
 uint32_t 

[Mesa-dev] [PATCH 09/17] i965: Refactor relocs into a brw_reloc_list structure.

2017-09-05 Thread Kenneth Graunke
I'm planning on splitting batch and state into separate buffers, at
which point we'll need two relocation lists.  In preparation for that,
this patch refactors the relocation stuff into a structure we can
replicate...which looks a lot like anv_reloc_list.
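
For readers who want the shape of such a list without the driver around it,
here is a self-contained sketch. The toy_reloc entry type and all toy_*
functions are hypothetical; the real structure stores
struct drm_i915_gem_relocation_entry, and the doubling on overflow follows
what brw_emit_reloc does in the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified relocation entry. */
struct toy_reloc {
   uint64_t offset;        /* where in the buffer the pointer lives */
   uint64_t target_offset; /* offset into the target buffer */
};

struct toy_reloc_list {
   struct toy_reloc *relocs;
   int reloc_count;
   int reloc_array_size;
};

/* Returns 0 on success, -1 if the allocation failed. */
static int
toy_reloc_list_init(struct toy_reloc_list *rlist, int count)
{
   rlist->reloc_count = 0;
   rlist->reloc_array_size = count;
   rlist->relocs = malloc((size_t) count * sizeof(struct toy_reloc));
   return rlist->relocs ? 0 : -1;
}

/* Appends one entry, doubling the array when it is full. */
static int
toy_reloc_list_add(struct toy_reloc_list *rlist,
                   uint64_t offset, uint64_t target_offset)
{
   if (rlist->reloc_count == rlist->reloc_array_size) {
      int new_size = rlist->reloc_array_size * 2;
      struct toy_reloc *grown =
         realloc(rlist->relocs, (size_t) new_size * sizeof(struct toy_reloc));
      if (grown == NULL)
         return -1;
      rlist->relocs = grown;
      rlist->reloc_array_size = new_size;
   }

   rlist->relocs[rlist->reloc_count].offset = offset;
   rlist->relocs[rlist->reloc_count].target_offset = target_offset;
   rlist->reloc_count++;
   return 0;
}

static void
toy_reloc_list_finish(struct toy_reloc_list *rlist)
{
   free(rlist->relocs);
   rlist->relocs = NULL;
   rlist->reloc_count = 0;
   rlist->reloc_array_size = 0;
}
```

Because all the bookkeeping lives in one struct, a second list for the state
buffer is just a second instance, which is the point of the refactor.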
---
 src/mesa/drivers/dri/i965/brw_context.h   | 12 ++---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 39 ---
 2 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index b3a8fa01aff..09fb66699fc 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -436,6 +436,12 @@ enum brw_gpu_ring {
BLT_RING,
 };
 
+struct brw_reloc_list {
+   struct drm_i915_gem_relocation_entry *relocs;
+   int reloc_count;
+   int reloc_array_size;
+};
+
 struct intel_batchbuffer {
/** Current batchbuffer being queued up. */
struct brw_bo *bo;
@@ -455,9 +461,7 @@ struct intel_batchbuffer {
bool needs_sol_reset;
bool state_base_address_emitted;
 
-   struct drm_i915_gem_relocation_entry *relocs;
-   int reloc_count;
-   int reloc_array_size;
+   struct brw_reloc_list batch_relocs;
unsigned int valid_reloc_flags;
 
/** The validation list */
@@ -471,7 +475,7 @@ struct intel_batchbuffer {
 
struct {
   uint32_t *map_next;
-  int reloc_count;
+  int batch_reloc_count;
   int exec_count;
} saved;
 
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c 
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 1f6a43e406d..8ada0bcdc9b 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -58,6 +58,15 @@ uint_key_hash(const void *key)
return (uintptr_t) key;
 }
 
+static void
+init_reloc_list(struct brw_reloc_list *rlist, int count)
+{
+   rlist->reloc_count = 0;
+   rlist->reloc_array_size = count;
+   rlist->relocs = malloc(rlist->reloc_array_size *
+  sizeof(struct drm_i915_gem_relocation_entry));
+}
+
 void
 intel_batchbuffer_init(struct intel_screen *screen,
struct intel_batchbuffer *batch)
@@ -65,10 +74,8 @@ intel_batchbuffer_init(struct intel_screen *screen,
struct brw_bufmgr *bufmgr = screen->bufmgr;
const struct gen_device_info *devinfo = &screen->devinfo;
 
-   batch->reloc_count = 0;
-   batch->reloc_array_size = 250;
-   batch->relocs = malloc(batch->reloc_array_size *
-  sizeof(struct drm_i915_gem_relocation_entry));
+   init_reloc_list(&batch->batch_relocs, 250);
+
batch->exec_count = 0;
batch->exec_array_size = 100;
batch->exec_bos =
@@ -177,7 +184,7 @@ void
 intel_batchbuffer_save_state(struct brw_context *brw)
 {
brw->batch.saved.map_next = brw->batch.map_next;
-   brw->batch.saved.reloc_count = brw->batch.reloc_count;
+   brw->batch.saved.batch_reloc_count = brw->batch.batch_relocs.reloc_count;
brw->batch.saved.exec_count = brw->batch.exec_count;
 }
 
@@ -188,7 +195,7 @@ intel_batchbuffer_reset_to_saved(struct brw_context *brw)
 i < brw->batch.exec_count; i++) {
   brw_bo_unreference(brw->batch.exec_bos[i]);
}
-   brw->batch.reloc_count = brw->batch.saved.reloc_count;
+   brw->batch.batch_relocs.reloc_count = brw->batch.saved.batch_reloc_count;
brw->batch.exec_count = brw->batch.saved.exec_count;
 
brw->batch.map_next = brw->batch.saved.map_next;
@@ -202,7 +209,7 @@ intel_batchbuffer_free(struct intel_batchbuffer *batch)
for (int i = 0; i < batch->exec_count; i++) {
   brw_bo_unreference(batch->exec_bos[i]);
}
-   free(batch->relocs);
+   free(batch->batch_relocs.relocs);
free(batch->exec_bos);
free(batch->validation_list);
 
@@ -426,7 +433,7 @@ brw_new_batch(struct brw_context *brw)
   brw_bo_unreference(brw->batch.exec_bos[i]);
   brw->batch.exec_bos[i] = NULL;
}
-   brw->batch.reloc_count = 0;
+   brw->batch.batch_relocs.reloc_count = 0;
brw->batch.exec_count = 0;
brw->batch.aperture_space = 0;
 
@@ -640,8 +647,8 @@ do_flush_locked(struct brw_context *brw, int in_fence_fd, 
int *out_fence_fd)
 
   struct drm_i915_gem_exec_object2 *entry = &batch->validation_list[0];
   assert(entry->handle == batch->bo->gem_handle);
-  entry->relocation_count = batch->reloc_count;
-  entry->relocs_ptr = (uintptr_t) batch->relocs;
+  entry->relocation_count = batch->batch_relocs.reloc_count;
+  entry->relocs_ptr = (uintptr_t) batch->batch_relocs.relocs;
 
   if (batch->use_batch_first) {
  flags |= I915_EXEC_BATCH_FIRST | I915_EXEC_HANDLE_LUT;
@@ -766,12 +773,14 @@ brw_emit_reloc(struct intel_batchbuffer *batch, uint32_t 
batch_offset,
struct brw_bo *target, uint32_t target_offset,
unsigned int reloc_flags)
 {
+   struct brw_reloc_list *rlist = &batch->batch_relocs;
+
assert(target != NULL);
 
-   if (batch->reloc_count == batch->reloc_array_size) {
-  batch->reloc_array_size *= 2;
-  

[Mesa-dev] [PATCH 13/17] i965: Use a separate state buffer, but avoid changing flushing behavior.

2017-09-05 Thread Kenneth Graunke
Previously, we emitted GPU commands and indirect state into the same
buffer, using a stack/heap like system where we filled in commands from
the start of the buffer, and state from the end of the buffer.  We then
flushed before the two met in the middle.

Meeting in the middle is fatal, so you have to be certain that you
reserve the correct amount of space before emitting commands or state
for a draw.  Currently, we will assert !no_batch_wrap and die if the
estimate is ever too small.  This has been mercifully obscure, but has
happened on a number of occasions, and could in theory happen to any
application that issues a large draw at just the wrong time.

Estimating the amount of batch space required is painful - it's hard to
get right, and getting it right involves a lot of code that would burn
CPU time, and also be painful to maintain.  Rolling back to a saved
state and retrying is also painful - failing to save/restore all the
required state will break things, and redoing state emission burns a
lot of CPU.  memcpy'ing to a new batch and continuing is painful,
because commands we issue for a draw depend on earlier commands as well
(such as STATE_BASE_ADDRESS, or the GPU being in a particular state).

The best plan is to never run out of space, which is totally doable but
pretty wasteful - a pessimal draw requires a huge amount of space, and
rarely occurs.  Instead, we'd like to grow the batch buffer if we need
more space and can't safely flush.

We can't grow with a meet in the middle approach - we'd have to move the
state to the end, which would mean updating every offset from dynamic
state base address.  Using separate batch and state buffers, where both
fill starting at the beginning, makes it easy to grow either as needed.

This patch separates the two concepts.  We create a separate state
buffer, with a second relocation list, and use that for brw_state_batch.

However, this patch tries to retain the original flushing behavior - it
adds the amount of batch and state space together, as if they were still
co-existing in a single buffer.  The hope is to flush at the same time
as before.  This is necessary to avoid provoking bugs caused by broken
batch wrap handling (which we'll fix shortly).  It also avoids suddenly
increasing the size of the batch (due to state not taking up space),
which could have a significant performance impact.  We'll tune it later.
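
As a rough illustration of the flush check described above (a minimal
sketch, not the driver code: every sketch_ name is hypothetical, and the
size simply mirrors an 8192-DWord batch), the two buffers can be accounted
for as if they still shared one BO:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the batch size (8192 DWords). */
#define SKETCH_BATCH_SZ (8192u * (uint32_t)sizeof(uint32_t))

struct sketch_buffers {
   uint32_t batch_used;  /* command bytes, filled from the start of the batch */
   uint32_t state_used;  /* state bytes, filled from the start of the state buffer */
   uint32_t reserved;    /* bytes held back for end-of-batch commands */
};

/* Returns true if emitting `sz` more command bytes should flush first,
 * pretending both buffers still live in a single BO of SKETCH_BATCH_SZ.
 */
static bool
sketch_needs_flush(const struct sketch_buffers *b, uint32_t sz)
{
   /* Guard against unsigned wraparound in the subtraction below. */
   assert(b->reserved + b->state_used <= SKETCH_BATCH_SZ);

   return b->batch_used + sz >=
          SKETCH_BATCH_SZ - b->reserved - b->state_used;
}
```

Because state bytes are subtracted from the space available to commands,
a state-heavy draw still triggers a flush at roughly the point where the
old shared buffer would have filled up.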
---
 src/mesa/drivers/dri/i965/brw_context.h   |  7 ++-
 src/mesa/drivers/dri/i965/brw_misc_state.c| 26 +-
 src/mesa/drivers/dri/i965/gen4_blorp_exec.h   |  2 +-
 src/mesa/drivers/dri/i965/genX_blorp_exec.c   | 22 +---
 src/mesa/drivers/dri/i965/genX_state_upload.c | 31 +++
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 75 ++-
 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 23 +++-
 7 files changed, 115 insertions(+), 71 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 09fb66699fc..07676dc7b7f 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -447,6 +447,8 @@ struct intel_batchbuffer {
struct brw_bo *bo;
/** Last BO submitted to the hardware.  Used for glFinish(). */
struct brw_bo *last_bo;
+   /** Current statebuffer being queued up. */
+   struct brw_bo *state_bo;
 
 #ifdef DEBUG
uint16_t emit, total;
@@ -454,14 +456,16 @@ struct intel_batchbuffer {
uint16_t reserved_space;
uint32_t *map_next;
uint32_t *map;
+   uint32_t *state_map;
+   uint32_t state_used;
 
-   uint32_t state_batch_offset;
enum brw_gpu_ring ring;
bool use_batch_first;
bool needs_sol_reset;
bool state_base_address_emitted;
 
struct brw_reloc_list batch_relocs;
+   struct brw_reloc_list state_relocs;
unsigned int valid_reloc_flags;
 
/** The validation list */
@@ -476,6 +480,7 @@ struct intel_batchbuffer {
struct {
   uint32_t *map_next;
   int batch_reloc_count;
+  int state_reloc_count;
   int exec_count;
} saved;
 
diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
index 9b8ae70f103..53137cc4524 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -65,15 +65,15 @@ upload_pipelined_state_pointers(struct brw_context *brw)
 
BEGIN_BATCH(7);
OUT_BATCH(_3DSTATE_PIPELINED_POINTERS << 16 | (7 - 2));
-   OUT_RELOC(brw->batch.bo, 0, brw->vs.base.state_offset);
+   OUT_RELOC(brw->batch.state_bo, 0, brw->vs.base.state_offset);
if (brw->ff_gs.prog_active)
-  OUT_RELOC(brw->batch.bo, 0, brw->ff_gs.state_offset | 1);
+  OUT_RELOC(brw->batch.state_bo, 0, brw->ff_gs.state_offset | 1);
else
   OUT_BATCH(0);
-   OUT_RELOC(brw->batch.bo, 0, brw->clip.state_offset | 1);
-   OUT_RELOC(brw->batch.bo, 0, brw->sf.state_offset);
-   OUT_RELOC(brw->batch.bo, 0, brw->wm.base.state_offset);
-   OUT_RELOC(brw->batch.bo, 0, 

[Mesa-dev] [PATCH 10/17] i965: Split brw_emit_reloc into brw_batch_reloc and brw_state_reloc.

2017-09-05 Thread Kenneth Graunke
brw_batch_reloc emits a relocation from the batchbuffer to elsewhere.
brw_state_reloc emits a relocation from the statebuffer to elsewhere.

For now, they do the same thing, but when we actually split the two
buffers, we'll change brw_state_reloc to use the state buffer.
---
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 46 
 src/mesa/drivers/dri/i965/genX_blorp_exec.c  | 10 +++---
 src/mesa/drivers/dri/i965/genX_state_upload.c|  7 ++--
 src/mesa/drivers/dri/i965/intel_batchbuffer.c| 39 ++--
 src/mesa/drivers/dri/i965/intel_batchbuffer.h| 19 ++
 5 files changed, 73 insertions(+), 48 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index 1f89b723544..d110482cc8e 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -176,9 +176,9 @@ brw_emit_surface_state(struct brw_context *brw,
  surf_offset);
 
isl_surf_fill_state(>isl_dev, state, .surf = >surf, .view = ,
-   .address = brw_emit_reloc(&brw->batch,
- *surf_offset + brw->isl_dev.ss.addr_offset,
- mt->bo, offset, reloc_flags),
+   .address = brw_state_reloc(&brw->batch,
+  *surf_offset + brw->isl_dev.ss.addr_offset,
+  mt->bo, offset, reloc_flags),
.aux_surf = aux_surf, .aux_usage = aux_usage,
.aux_address = aux_offset,
.mocs = mocs, .clear_color = clear_color,
@@ -194,11 +194,11 @@ brw_emit_surface_state(struct brw_context *brw,
*/
   assert((aux_offset & 0xfff) == 0);
   uint32_t *aux_addr = state + brw->isl_dev.ss.aux_addr_offset;
-  *aux_addr = brw_emit_reloc(&brw->batch,
- *surf_offset +
- brw->isl_dev.ss.aux_addr_offset,
- aux_bo, *aux_addr,
- reloc_flags);
+  *aux_addr = brw_state_reloc(&brw->batch,
+  *surf_offset +
+  brw->isl_dev.ss.aux_addr_offset,
+  aux_bo, *aux_addr,
+  reloc_flags);
}
 }
 
@@ -607,10 +607,10 @@ brw_emit_buffer_surface_state(struct brw_context *brw,
 
isl_buffer_fill_state(&brw->isl_dev, dw,
  .address = !bo ? buffer_offset :
-brw_emit_reloc(&brw->batch,
-   *out_offset + brw->isl_dev.ss.addr_offset,
-   bo, buffer_offset,
-   reloc_flags),
+brw_state_reloc(&brw->batch,
+*out_offset + brw->isl_dev.ss.addr_offset,
+bo, buffer_offset,
+reloc_flags),
  .size = buffer_size,
  .format = surface_format,
  .stride = pitch,
@@ -777,8 +777,8 @@ brw_update_sol_surface(struct brw_context *brw,
   BRW_SURFACE_MIPMAPLAYOUT_BELOW << BRW_SURFACE_MIPLAYOUT_SHIFT |
   surface_format << BRW_SURFACE_FORMAT_SHIFT |
   BRW_SURFACE_RC_READ_WRITE;
-   surf[1] = brw_emit_reloc(&brw->batch,
-*out_offset + 4, bo, offset_bytes, RELOC_WRITE);
+   surf[1] = brw_state_reloc(&brw->batch,
+ *out_offset + 4, bo, offset_bytes, RELOC_WRITE);
surf[2] = (width << BRW_SURFACE_WIDTH_SHIFT |
  height << BRW_SURFACE_HEIGHT_SHIFT);
surf[3] = (depth << BRW_SURFACE_DEPTH_SHIFT |
@@ -870,9 +870,9 @@ emit_null_surface_state(struct brw_context *brw,
 
surf[0] = (BRW_SURFACE_2D << BRW_SURFACE_TYPE_SHIFT |
  ISL_FORMAT_B8G8R8A8_UNORM << BRW_SURFACE_FORMAT_SHIFT);
-   surf[1] = brw_emit_reloc(&brw->batch, *out_offset + 4,
-brw->wm.multisampled_null_render_target_bo,
-0, RELOC_WRITE);
+   surf[1] = brw_state_reloc(&brw->batch, *out_offset + 4,
+ brw->wm.multisampled_null_render_target_bo,
+ 0, RELOC_WRITE);
 
surf[2] = ((width - 1) << BRW_SURFACE_WIDTH_SHIFT |
   (height - 1) << BRW_SURFACE_HEIGHT_SHIFT);
@@ -940,12 +940,12 @@ gen4_update_renderbuffer_surface(struct brw_context *brw,
 
/* reloc */
assert(mt->offset % mt->cpp == 0);
-   surf[1] = brw_emit_reloc(&brw->batch, offset + 4, mt->bo,
-mt->offset +
-intel_renderbuffer_get_tile_offsets(irb,
- 

[Mesa-dev] [PATCH 01/17] i965: Don't special case the batchbuffer when reference counting.

2017-09-05 Thread Kenneth Graunke
We don't need to special case the batch - when we add the batch to the
validation list, we can simply increase the refcount to 2, and when we
make a new batch, we'll drop it back down to 1 (when unreferencing all
buffers in the validation list).  The final reference is still held by
brw->batch.bo, as it was before.

This removes the special case from a bunch of loops.
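
To make the refcount arithmetic concrete, here is a minimal sketch
(assumed names throughout: sketch_bo, sketch_batch and friends are
hypothetical and only model the counting, not real BOs):

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_EXEC 8

struct sketch_bo {
   int refcount;
};

struct sketch_batch {
   struct sketch_bo *bo;                        /* holds the final reference */
   struct sketch_bo *exec_bos[SKETCH_MAX_EXEC]; /* validation list members */
   int exec_count;
};

static void
sketch_ref(struct sketch_bo *bo)
{
   bo->refcount++;
}

static void
sketch_unref(struct sketch_bo *bo)
{
   assert(bo->refcount > 0);
   bo->refcount--;
}

/* Add a BO to the list, taking a reference for every BO alike. */
static void
sketch_add_exec_bo(struct sketch_batch *batch, struct sketch_bo *bo)
{
   for (int i = 0; i < batch->exec_count; i++) {
      if (batch->exec_bos[i] == bo)
         return;
   }

   assert(batch->exec_count < SKETCH_MAX_EXEC);
   sketch_ref(bo);
   batch->exec_bos[batch->exec_count++] = bo;
}

/* Drop every list reference; no test for "is this the batch?" needed. */
static void
sketch_drop_exec_list(struct sketch_batch *batch)
{
   for (int i = 0; i < batch->exec_count; i++) {
      sketch_unref(batch->exec_bos[i]);
      batch->exec_bos[i] = NULL;
   }
   batch->exec_count = 0;
}
```

With the batch BO starting at one reference (held through the batch's own
pointer), adding it to the list raises the count to 2 and dropping the
list brings it back to 1, the same as for any other buffer.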
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 73cf2528272..08d35ace135 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -115,8 +115,7 @@ add_exec_bo(struct intel_batchbuffer *batch, struct brw_bo *bo)
  return index;
}
 
-   if (bo != batch->bo)
-  brw_bo_reference(bo);
+   brw_bo_reference(bo);
 
if (batch->exec_count == batch->exec_array_size) {
   batch->exec_array_size *= 2;
@@ -199,9 +198,7 @@ intel_batchbuffer_reset_to_saved(struct brw_context *brw)
 {
for (int i = brw->batch.saved.exec_count;
 i < brw->batch.exec_count; i++) {
-  if (brw->batch.exec_bos[i] != brw->batch.bo) {
- brw_bo_unreference(brw->batch.exec_bos[i]);
-  }
+  brw_bo_unreference(brw->batch.exec_bos[i]);
}
brw->batch.reloc_count = brw->batch.saved.reloc_count;
brw->batch.exec_count = brw->batch.saved.exec_count;
@@ -217,9 +214,7 @@ intel_batchbuffer_free(struct intel_batchbuffer *batch)
free(batch->cpu_map);
 
for (int i = 0; i < batch->exec_count; i++) {
-  if (batch->exec_bos[i] != batch->bo) {
- brw_bo_unreference(batch->exec_bos[i]);
-  }
+  brw_bo_unreference(batch->exec_bos[i]);
}
free(batch->relocs);
free(batch->exec_bos);
@@ -449,9 +444,7 @@ brw_new_batch(struct brw_context *brw)
 {
/* Unreference any BOs held by the previous batch, and reset counts. */
for (int i = 0; i < brw->batch.exec_count; i++) {
-  if (brw->batch.exec_bos[i] != brw->batch.bo) {
- brw_bo_unreference(brw->batch.exec_bos[i]);
-  }
+  brw_bo_unreference(brw->batch.exec_bos[i]);
   brw->batch.exec_bos[i] = NULL;
}
brw->batch.reloc_count = 0;
-- 
2.14.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 14/17] i965: Grow the batch/state buffers if we need space and can't flush.

2017-09-05 Thread Kenneth Graunke
Previously, we would just assert fail and die in this case.  The only
safeguard is the "estimated max prim size" checks when starting a draw
(or compute dispatch or BLORP operation)...which are woefully broken.

Growing is fairly straightforward:

1. Allocate a new larger BO.
2. memcpy the existing contents over to the new buffer
3. Set the new BO to the same GTT offset as the old BO.  When emitting
   relocations, we write the presumed GTT offset of the target BO.  If
   we changed it, we'd have to update all the existing values (by
   walking the relocation list and looking at offsets), which is more
   expensive.  With the old BO freed, ideally the kernel could simply
   place the new BO at that offset anyway.
4. Update the validation list to contain the new BO.
5. Update the relocation list to have the GEM handle for the new BO
   (which we can skip if using I915_EXEC_HANDLE_LUT).
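
The five steps above can be sketched with plain heap memory standing in
for BOs.  This is an illustration under stated assumptions, not the
driver code: the sketch_ types, the fixed-size arrays and the
caller-supplied handle are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct sketch_bo {
   uint32_t handle;      /* stand-in for a GEM handle */
   uint64_t gtt_offset;  /* presumed GPU address */
   unsigned index;       /* slot in the validation list */
   size_t size;
   uint32_t *map;
};

struct sketch_reloc {
   uint32_t target_handle;
};

struct sketch_batch {
   struct sketch_bo *exec_bos[4];
   uint32_t validation_handles[4];
   struct sketch_reloc relocs[4];
   int reloc_count;
   bool use_handle_lut;  /* models I915_EXEC_HANDLE_LUT being in use */
};

static struct sketch_bo *
sketch_bo_alloc(uint32_t handle, size_t size)
{
   struct sketch_bo *bo = calloc(1, sizeof(*bo));
   if (bo == NULL)
      return NULL;

   bo->map = calloc(1, size);
   if (bo->map == NULL) {
      free(bo);
      return NULL;
   }
   bo->handle = handle;
   bo->size = size;
   return bo;
}

static void
sketch_bo_free(struct sketch_bo *bo)
{
   free(bo->map);
   free(bo);
}

static struct sketch_bo *
sketch_grow(struct sketch_batch *batch, struct sketch_bo *old_bo,
            size_t existing_bytes, size_t new_size, uint32_t new_handle)
{
   assert(existing_bytes <= old_bo->size && existing_bytes <= new_size);

   /* 1. Allocate a new, larger buffer. */
   struct sketch_bo *new_bo = sketch_bo_alloc(new_handle, new_size);
   if (new_bo == NULL)
      return NULL;

   /* 2. Copy the existing contents over. */
   memcpy(new_bo->map, old_bo->map, existing_bytes);

   /* 3. Keep the presumed offset so already-written addresses stay valid. */
   new_bo->gtt_offset = old_bo->gtt_offset;
   new_bo->index = old_bo->index;

   /* 4. Point the validation list slot at the new buffer. */
   batch->exec_bos[old_bo->index] = new_bo;
   batch->validation_handles[old_bo->index] = new_bo->handle;

   /* 5. Without the handle LUT, relocations name handles directly. */
   if (!batch->use_handle_lut) {
      for (int i = 0; i < batch->reloc_count; i++) {
         if (batch->relocs[i].target_handle == old_bo->handle)
            batch->relocs[i].target_handle = new_bo->handle;
      }
   }

   sketch_bo_free(old_bo);
   return new_bo;
}
```

Step 3 is what keeps the operation cheap: no value already written into
either buffer needs to be revisited.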
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 104 --
 1 file changed, 99 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 909f56f9792..118f75c4d71 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -43,6 +43,9 @@
 #define BATCH_SZ (8192*sizeof(uint32_t))
 #define STATE_SZ (8192*sizeof(uint32_t))
 
+/* Don't exceed this - batchbuffers need to fit in the ring! */
+#define MAX_BATCH_SIZE 65536
+
 static void
 intel_batchbuffer_reset(struct intel_batchbuffer *batch,
 struct brw_bufmgr *bufmgr);
@@ -228,6 +231,78 @@ intel_batchbuffer_free(struct intel_batchbuffer *batch)
   _mesa_hash_table_destroy(batch->state_batch_sizes, NULL);
 }
 
+static void
+replace_bo_in_reloc_list(struct brw_reloc_list *rlist,
+ uint32_t old_handle, uint32_t new_handle)
+{
+   for (int i = 0; i < rlist->reloc_count; i++) {
+  if (rlist->relocs[i].target_handle == old_handle)
+ rlist->relocs[i].target_handle = new_handle;
+   }
+}
+
+static void
+grow_buffer(struct brw_context *brw,
+struct brw_bo **bo_ptr,
+uint32_t **map_ptr,
+unsigned existing_bytes,
+unsigned new_size)
+{
+   struct intel_batchbuffer *batch = &brw->batch;
+   struct brw_bufmgr *bufmgr = brw->bufmgr;
+
+   uint32_t *old_map = *map_ptr;
+   struct brw_bo *old_bo = *bo_ptr;
+
+   struct brw_bo *new_bo = brw_bo_alloc(bufmgr, old_bo->name, new_size, 4096);
+   uint32_t *new_map = brw_bo_map(brw, new_bo, MAP_READ | MAP_WRITE);
+
+   perf_debug("Growing %s - ran out of space\n", old_bo->name);
+
+   /* Copy existing data to the new larger buffer */
+   memcpy(new_map, old_map, existing_bytes);
+
+   /* Try to put the new BO at the same GTT offset as the old BO (which
+* we're throwing away, so it doesn't need to be there).
+*
+* This guarantees that our relocations continue to work: values we've
+* already written into the buffer, values we're going to write into the
+* buffer, and the validation/relocation lists all will match.
+*/
+   new_bo->gtt_offset = old_bo->gtt_offset;
+   new_bo->index = old_bo->index;
+
+   /* Batch/state buffers are per-context, and if we've run out of space,
+* we must have actually used them before, so...they will be in the list.
+*/
+   assert(old_bo->index < batch->exec_count);
+   assert(batch->exec_bos[old_bo->index] == old_bo);
+
+   /* Update the validation list to use the new BO. */
+   batch->exec_bos[old_bo->index] = new_bo;
+   batch->validation_list[old_bo->index].handle = new_bo->gem_handle;
+   brw_bo_reference(new_bo);
+   brw_bo_unreference(old_bo);
+
+   if (!batch->use_batch_first) {
+  /* We're not using I915_EXEC_HANDLE_LUT, which means we need to go
+   * update the relocation list entries to point at the new BO as well.
+   * (With newer kernels, the "handle" is an offset into the validation
+   * list, which remains unchanged, so we can skip this.)
+   */
+  replace_bo_in_reloc_list(&batch->batch_relocs,
+   old_bo->gem_handle, new_bo->gem_handle);
+  replace_bo_in_reloc_list(&batch->state_relocs,
+   old_bo->gem_handle, new_bo->gem_handle);
+   }
+
+   /* Drop the *bo_ptr reference.  This should free the old BO. */
+   brw_bo_unreference(old_bo);
+
+   *bo_ptr = new_bo;
+   *map_ptr = new_map;
+}
+
 void
 intel_batchbuffer_require_space(struct brw_context *brw, GLuint sz,
 enum brw_gpu_ring ring)
@@ -242,9 +317,20 @@ intel_batchbuffer_require_space(struct brw_context *brw, GLuint sz,
}
 
/* For now, flush as if the batch and state buffers still shared a BO */
-   if (USED_BATCH(*batch) * 4 + sz >=
-   BATCH_SZ - batch->reserved_space - batch->state_used)
-  intel_batchbuffer_flush(brw);
+   const unsigned batch_used = USED_BATCH(*batch) * 4;
+   if (batch_used + sz >=
+   BATCH_SZ - 

[Mesa-dev] [PATCH 04/17] i965: Use batch->bo->size in brw_emit_reloc assertion.

2017-09-05 Thread Kenneth Graunke
This makes the assertion safe against batchbuffers growing.
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 5ac34e59299..a7243a27aeb 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -806,7 +806,7 @@ brw_emit_reloc(struct intel_batchbuffer *batch, uint32_t batch_offset,
}
 
/* Check args */
-   assert(batch_offset <= BATCH_SZ - sizeof(uint32_t));
+   assert(batch_offset <= batch->bo->size - sizeof(uint32_t));
 
unsigned int index = add_exec_bo(batch, target);
struct drm_i915_gem_exec_object2 *entry = &batch->validation_list[index];
-- 
2.14.1



[Mesa-dev] [PATCH 06/17] i965: Drop a useless ret == 0 check.

2017-09-05 Thread Kenneth Graunke
Prior to the previous patch, we would pwrite the batchbuffer contents,
and wanted to skip the execbuffer if that failed.  Now, we write things
directly to the map, so we don't need this check.
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 40 ---
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 9b37470f926..df094bb6047 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -199,8 +199,6 @@ intel_batchbuffer_reset_to_saved(struct brw_context *brw)
 void
 intel_batchbuffer_free(struct intel_batchbuffer *batch)
 {
-   free(batch->cpu_map);
-
for (int i = 0; i < batch->exec_count; i++) {
   brw_bo_unreference(batch->exec_bos[i]);
}
@@ -642,31 +640,29 @@ do_flush_locked(struct brw_context *brw, int in_fence_fd, int *out_fence_fd)
   if (batch->needs_sol_reset)
  flags |= I915_EXEC_GEN7_SOL_RESET;
 
-  if (ret == 0) {
- uint32_t hw_ctx = batch->ring == RENDER_RING ? brw->hw_ctx : 0;
-
- struct drm_i915_gem_exec_object2 *entry = &batch->validation_list[0];
- assert(entry->handle == batch->bo->gem_handle);
- entry->relocation_count = batch->reloc_count;
- entry->relocs_ptr = (uintptr_t) batch->relocs;
+  uint32_t hw_ctx = batch->ring == RENDER_RING ? brw->hw_ctx : 0;
 
- if (batch->use_batch_first) {
-flags |= I915_EXEC_BATCH_FIRST | I915_EXEC_HANDLE_LUT;
- } else {
-/* Move the batch to the end of the validation list */
-struct drm_i915_gem_exec_object2 tmp;
-const unsigned index = batch->exec_count - 1;
+  struct drm_i915_gem_exec_object2 *entry = &batch->validation_list[0];
+  assert(entry->handle == batch->bo->gem_handle);
+  entry->relocation_count = batch->reloc_count;
+  entry->relocs_ptr = (uintptr_t) batch->relocs;
 
-tmp = *entry;
-*entry = batch->validation_list[index];
-batch->validation_list[index] = tmp;
- }
+  if (batch->use_batch_first) {
+ flags |= I915_EXEC_BATCH_FIRST | I915_EXEC_HANDLE_LUT;
+  } else {
+ /* Move the batch to the end of the validation list */
+ struct drm_i915_gem_exec_object2 tmp;
+ const unsigned index = batch->exec_count - 1;
 
- ret = execbuffer(dri_screen->fd, batch, hw_ctx,
-  4 * USED_BATCH(*batch),
-  in_fence_fd, out_fence_fd, flags);
+ tmp = *entry;
+ *entry = batch->validation_list[index];
+ batch->validation_list[index] = tmp;
   }
 
+  ret = execbuffer(dri_screen->fd, batch, hw_ctx,
+   4 * USED_BATCH(*batch),
+   in_fence_fd, out_fence_fd, flags);
+
   throttle(brw);
}
 
-- 
2.14.1



[Mesa-dev] [PATCH 07/17] i965: Remove map fallback in INTEL_DEBUG=bat code.

2017-09-05 Thread Kenneth Graunke
This only made sense for the shadow copy of the batch.
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index df094bb6047..7703db92d83 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -281,16 +281,14 @@ do_batch_dump(struct brw_context *brw)
if (batch->ring != RENDER_RING)
   return;
 
-   void *map = brw_bo_map(brw, batch->bo, MAP_READ);
-   if (map == NULL) {
-  fprintf(stderr,
-  "WARNING: failed to map batchbuffer, "
-  "dumping uploaded data instead.\n");
+   uint32_t *data = brw_bo_map(brw, batch->bo, MAP_READ);
+   if (data == NULL) {
+  fprintf(stderr, "WARNING: failed to map batchbuffer\n");
+  return;
}
 
-   uint32_t *data = map ? map : batch->map;
uint32_t *end = data + USED_BATCH(*batch);
-   uint32_t gtt_offset = map ? batch->bo->gtt_offset : 0;
+   uint32_t gtt_offset = batch->bo->gtt_offset;
int length;
 
bool color = INTEL_DEBUG & DEBUG_COLOR;
@@ -411,9 +409,7 @@ do_batch_dump(struct brw_context *brw)
   }
}
 
-   if (map != NULL) {
-  brw_bo_unmap(batch->bo);
-   }
+   brw_bo_unmap(batch->bo);
 }
 #else
 static void do_batch_dump(struct brw_context *brw) { }
-- 
2.14.1



[Mesa-dev] [PATCH 02/17] i965: Add an INTEL_DEBUG=flush option for printing batch statistics.

2017-09-05 Thread Kenneth Graunke
When a batch is flushed, INTEL_DEBUG=bat prints a message indicating
which part of the code triggered the flush, and some statistics about
the batch/state buffer utilization.

It also decodes the batchbuffer in debug builds...which is so much
output that it drowns out the utilization messages, if that's all you
care about.

INTEL_DEBUG=flush now just does the utilization messages.
INTEL_DEBUG=bat continues to do both (as the message is a good indicator
that we're starting decode of a new batch).
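
The resulting flag behaviour can be sketched as two predicates.  This is
a hedged illustration only: the sketch_ helpers are hypothetical, and the
bit chosen for the batch flag here is an assumption (only the flush bit,
1ull << 16, comes from this patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DEBUG_BATCH (1ull << 0)   /* assumed bit position */
#define SKETCH_DEBUG_FLUSH (1ull << 16)  /* matches the bit added here */

/* Utilization statistics are printed for either option. */
static bool
sketch_prints_stats(uint64_t debug)
{
   return (debug & (SKETCH_DEBUG_BATCH | SKETCH_DEBUG_FLUSH)) != 0;
}

/* Full batch decoding remains tied to the batch option alone. */
static bool
sketch_decodes_batch(uint64_t debug)
{
   return (debug & SKETCH_DEBUG_BATCH) != 0;
}
```

Setting only the flush bit therefore yields the short statistics line
without the decoded batch contents.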
---
 docs/envvars.html | 1 +
 src/intel/common/gen_debug.c  | 1 +
 src/intel/common/gen_debug.h  | 2 +-
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 2 +-
 4 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/envvars.html b/docs/envvars.html
index 9e2f8163644..ddd1470327e 100644
--- a/docs/envvars.html
+++ b/docs/envvars.html
@@ -175,6 +175,7 @@ See the Xlib software driver page for details.
do32 - generate compute shader SIMD32 programs even if workgroup size doesn't exceed the SIMD16 limit
dri - emit messages about the DRI interface
fbo - emit messages about framebuffers
+   flush - emit batchbuffer usage statistics
fs - dump shader assembly for fragment shaders
gs - dump shader assembly for geometry shaders
hex - print instruction hex dump with the disassembly
diff --git a/src/intel/common/gen_debug.c b/src/intel/common/gen_debug.c
index b604d56ef86..068a43ecfdb 100644
--- a/src/intel/common/gen_debug.c
+++ b/src/intel/common/gen_debug.c
@@ -57,6 +57,7 @@ static const struct debug_control debug_control[] = {
{ "vert",DEBUG_VERTS },
{ "dri", DEBUG_DRI },
{ "sf",  DEBUG_SF },
+   { "flush",   DEBUG_FLUSH },
{ "wm",  DEBUG_WM },
{ "urb", DEBUG_URB },
{ "vs",  DEBUG_VS },
diff --git a/src/intel/common/gen_debug.h b/src/intel/common/gen_debug.h
index d290303682e..ea34f7247f3 100644
--- a/src/intel/common/gen_debug.h
+++ b/src/intel/common/gen_debug.h
@@ -57,7 +57,7 @@ extern uint64_t INTEL_DEBUG;
 #define DEBUG_VERTS   (1ull << 13)
 #define DEBUG_DRI (1ull << 14)
 #define DEBUG_SF  (1ull << 15)
-/* Hole - feel free to reuse  (1ull << 16) */
+#define DEBUG_FLUSH   (1ull << 16)
 #define DEBUG_WM  (1ull << 17)
 #define DEBUG_URB (1ull << 18)
 #define DEBUG_VS  (1ull << 19)
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 08d35ace135..422e6754d54 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -731,7 +731,7 @@ _intel_batchbuffer_flush_fence(struct brw_context *brw,
   brw_bo_reference(brw->throttle_batch[0]);
}
 
-   if (unlikely(INTEL_DEBUG & DEBUG_BATCH)) {
+   if (unlikely(INTEL_DEBUG & (DEBUG_BATCH | DEBUG_FLUSH))) {
   int bytes_for_commands = 4 * USED_BATCH(brw->batch);
   int bytes_for_state = brw->batch.bo->size - brw->batch.state_batch_offset;
   int total_bytes = bytes_for_commands + bytes_for_state;
-- 
2.14.1



[Mesa-dev] [PATCH 05/17] i965: Drop CPU-side shadow copy of the batchbuffer for non-LLC systems.

2017-09-05 Thread Kenneth Graunke
Now that we have write-combining maps, our writes to the batch should be
reasonably fast.  (In the past, we only had uncached maps, which were
slow...so we kept a CPU-side shadow copy for write combining purposes.)

There are a few places that still read back a DWord or so from the
batch, which will unfortunately now have uncached performance.  We
should eliminate those.

XXX: benchmark, see if this is something we can live with for now, or
if we really need to fix it right away.
---
 src/mesa/drivers/dri/i965/brw_context.h   |  1 -
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 34 +--
 2 files changed, 6 insertions(+), 29 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 92fc16de136..b3a8fa01aff 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -448,7 +448,6 @@ struct intel_batchbuffer {
uint16_t reserved_space;
uint32_t *map_next;
uint32_t *map;
-   uint32_t *cpu_map;
 
uint32_t state_batch_offset;
enum brw_gpu_ring ring;
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index a7243a27aeb..9b37470f926 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -44,8 +44,7 @@
 
 static void
 intel_batchbuffer_reset(struct intel_batchbuffer *batch,
-struct brw_bufmgr *bufmgr,
-bool has_llc);
+struct brw_bufmgr *bufmgr);
 
 static bool
 uint_key_compare(const void *a, const void *b)
@@ -66,12 +65,6 @@ intel_batchbuffer_init(struct intel_screen *screen,
struct brw_bufmgr *bufmgr = screen->bufmgr;
const struct gen_device_info *devinfo = &screen->devinfo;
 
-   if (!devinfo->has_llc) {
-  batch->cpu_map = malloc(BATCH_SZ);
-  batch->map = batch->cpu_map;
-  batch->map_next = batch->cpu_map;
-   }
-
batch->reloc_count = 0;
batch->reloc_array_size = 250;
batch->relocs = malloc(batch->reloc_array_size *
@@ -96,7 +89,7 @@ intel_batchbuffer_init(struct intel_screen *screen,
if (devinfo->gen == 6)
   batch->valid_reloc_flags |= EXEC_OBJECT_NEEDS_GTT;
 
-   intel_batchbuffer_reset(batch, bufmgr, devinfo->has_llc);
+   intel_batchbuffer_reset(batch, bufmgr);
 }
 
 #define READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))
@@ -144,8 +137,7 @@ add_exec_bo(struct intel_batchbuffer *batch, struct brw_bo *bo)
 
 static void
 intel_batchbuffer_reset(struct intel_batchbuffer *batch,
-struct brw_bufmgr *bufmgr,
-bool has_llc)
+struct brw_bufmgr *bufmgr)
 {
if (batch->last_bo != NULL) {
   brw_bo_unreference(batch->last_bo);
@@ -154,9 +146,7 @@ intel_batchbuffer_reset(struct intel_batchbuffer *batch,
batch->last_bo = batch->bo;
 
batch->bo = brw_bo_alloc(bufmgr, "batchbuffer", BATCH_SZ, 4096);
-   if (has_llc) {
-  batch->map = brw_bo_map(NULL, batch->bo, MAP_READ | MAP_WRITE);
-   }
+   batch->map = brw_bo_map(NULL, batch->bo, MAP_READ | MAP_WRITE);
batch->map_next = batch->map;
 
add_exec_bo(batch, batch->bo);
@@ -179,9 +169,7 @@ intel_batchbuffer_reset(struct intel_batchbuffer *batch,
 static void
 intel_batchbuffer_reset_and_clear_render_cache(struct brw_context *brw)
 {
-   const struct gen_device_info *devinfo = &brw->screen->devinfo;
-
-   intel_batchbuffer_reset(&brw->batch, brw->bufmgr, devinfo->has_llc);
+   intel_batchbuffer_reset(&brw->batch, brw->bufmgr);
brw_render_cache_set_clear(brw);
 }
 
@@ -629,17 +617,7 @@ do_flush_locked(struct brw_context *brw, int in_fence_fd, int *out_fence_fd)
struct intel_batchbuffer *batch = &brw->batch;
int ret = 0;
 
-   if (devinfo->has_llc) {
-  brw_bo_unmap(batch->bo);
-   } else {
-  ret = brw_bo_subdata(batch->bo, 0, 4 * USED_BATCH(*batch), batch->map);
-  if (ret == 0 && batch->state_batch_offset != batch->bo->size) {
- ret = brw_bo_subdata(batch->bo,
-  batch->state_batch_offset,
-  batch->bo->size - batch->state_batch_offset,
-  (char *)batch->map + batch->state_batch_offset);
-  }
-   }
+   brw_bo_unmap(batch->bo);
 
if (!brw->screen->no_hw) {
   /* The requirement for using I915_EXEC_NO_RELOC are:
-- 
2.14.1



[Mesa-dev] [PATCH 00/17] i965: Growing the batch buffer, separate state buffers

2017-09-05 Thread Kenneth Graunke
Hello,

This series separates GPU commands and indirect state into two distinct
buffers - the batch buffer and the state buffer.  It then adds support
for growing the batch/state buffers, in case we need more space but are
in a "critical section" where we can't safely "wrap" (flush) the batch.
Growing ends up being fairly cheap, especially on modern kernels where
we have I915_EXEC_HANDLE_LUT and I915_EXEC_BATCH_FIRST.

We then drop the BATCH_RESERVED logic (no one will miss that math) and
most of the estimated max prim arithmetic (we still estimate, but the
consequence is a small amount of overhead rather than certain death).

This fixes a long-standing bug and also opens up the possibility of
having larger batches.

A couple issues remain: the series drops the malloc'd shadow copy of
the batch for non-LLC systems.  I haven't checked how dire this is.
The last patch also dramatically impacts batch sizes, which we'll need
to benchmark.  However, I thought I'd get the code out there for review
before finishing that - I don't expect the mechanics to change much.

Enjoy!



[Mesa-dev] [PATCH 03/17] i965: Delete a batch size assertion that isn't very useful.

2017-09-05 Thread Kenneth Graunke
This assertion prevents you from doing intel_batchbuffer_require_space
with a size so huge it won't fit in the batchbuffer.  This doesn't seem
like a common mistake, and I've never seen the assert to be useful.

Soon, I hope to have batches grow, at which point this won't make sense.
---
 src/mesa/drivers/dri/i965/intel_batchbuffer.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
index 422e6754d54..5ac34e59299 100644
--- a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
+++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
@@ -238,9 +238,6 @@ intel_batchbuffer_require_space(struct brw_context *brw, GLuint sz,
   intel_batchbuffer_flush(brw);
}
 
-#ifdef DEBUG
-   assert(sz < BATCH_SZ - BATCH_RESERVED);
-#endif
if (intel_batchbuffer_space(&brw->batch) < sz)
   intel_batchbuffer_flush(brw);
 
-- 
2.14.1



[Mesa-dev] [PATCH] radv: Fix vkCopyImage with both depth and stencil aspects.

2017-09-05 Thread Bas Nieuwenhuizen
Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"
---
 src/amd/vulkan/radv_meta_blit2d.c | 206 --
 1 file changed, 107 insertions(+), 99 deletions(-)

diff --git a/src/amd/vulkan/radv_meta_blit2d.c b/src/amd/vulkan/radv_meta_blit2d.c
index 6763384cada..05e49fea762 100644
--- a/src/amd/vulkan/radv_meta_blit2d.c
+++ b/src/amd/vulkan/radv_meta_blit2d.c
@@ -37,7 +37,8 @@ enum blit2d_src_type {
 static void
 create_iview(struct radv_cmd_buffer *cmd_buffer,
  struct radv_meta_blit2d_surf *surf,
- struct radv_image_view *iview, VkFormat depth_format)
+ struct radv_image_view *iview, VkFormat depth_format,
+  VkImageAspectFlagBits aspects)
 {
VkFormat format;
 
@@ -53,7 +54,7 @@ create_iview(struct radv_cmd_buffer *cmd_buffer,
 .viewType = VK_IMAGE_VIEW_TYPE_2D,
 .format = format,
 .subresourceRange = {
-.aspectMask = surf->aspect_mask,
+.aspectMask = aspects,
 .baseMipLevel = surf->level,
 .levelCount = 1,
 .baseArrayLayer = surf->layer,
@@ -95,7 +96,8 @@ blit2d_bind_src(struct radv_cmd_buffer *cmd_buffer,
 struct radv_meta_blit2d_surf *src_img,
 struct radv_meta_blit2d_buffer *src_buf,
 struct blit2d_src_temps *tmp,
-enum blit2d_src_type src_type, VkFormat depth_format)
+enum blit2d_src_type src_type, VkFormat depth_format,
+VkImageAspectFlagBits aspects)
 {
struct radv_device *device = cmd_buffer->device;
 
@@ -122,7 +124,7 @@ blit2d_bind_src(struct radv_cmd_buffer *cmd_buffer,
  VK_SHADER_STAGE_FRAGMENT_BIT, 16, 4,
  _buf->pitch);
} else {
-   create_iview(cmd_buffer, src_img, >iview, depth_format);
+   create_iview(cmd_buffer, src_img, >iview, depth_format, 
aspects);
 
radv_meta_push_descriptor_set(cmd_buffer, 
VK_PIPELINE_BIND_POINT_GRAPHICS,
  
device->meta_state.blit2d.p_layouts[src_type],
@@ -159,9 +161,10 @@ blit2d_bind_dst(struct radv_cmd_buffer *cmd_buffer,
 uint32_t width,
 uint32_t height,
VkFormat depth_format,
-struct blit2d_dst_temps *tmp)
+struct blit2d_dst_temps *tmp,
+VkImageAspectFlagBits aspects)
 {
-   create_iview(cmd_buffer, dst, >iview, depth_format);
+   create_iview(cmd_buffer, dst, >iview, depth_format, aspects);
 
radv_CreateFramebuffer(radv_device_to_handle(cmd_buffer->device),
   &(VkFramebufferCreateInfo) {
@@ -234,106 +237,111 @@ radv_meta_blit2d_normal_dst(struct radv_cmd_buffer 
*cmd_buffer,
struct radv_device *device = cmd_buffer->device;
 
for (unsigned r = 0; r < num_rects; ++r) {
-   VkFormat depth_format = 0;
-   if (dst->aspect_mask == VK_IMAGE_ASPECT_STENCIL_BIT)
-   depth_format = 
vk_format_stencil_only(dst->image->vk_format);
-   else if (dst->aspect_mask == VK_IMAGE_ASPECT_DEPTH_BIT)
-   depth_format = 
vk_format_depth_only(dst->image->vk_format);
-   struct blit2d_src_temps src_temps;
-   blit2d_bind_src(cmd_buffer, src_img, src_buf, _temps, 
src_type, depth_format);
-
-   struct blit2d_dst_temps dst_temps;
-   blit2d_bind_dst(cmd_buffer, dst, rects[r].dst_x + 
rects[r].width,
-   rects[r].dst_y + rects[r].height, depth_format, 
_temps);
-
-   float vertex_push_constants[4] = {
-   rects[r].src_x,
-   rects[r].src_y,
-   rects[r].src_x + rects[r].width,
-   rects[r].src_y + rects[r].height,
-   };
+   unsigned i;
+   for_each_bit(i, dst->aspect_mask) {
+   unsigned aspect_mask = 1u << i;
+   VkFormat depth_format = 0;
+   if (aspect_mask == VK_IMAGE_ASPECT_STENCIL_BIT)
+   depth_format = 
vk_format_stencil_only(dst->image->vk_format);
+   else if (aspect_mask == VK_IMAGE_ASPECT_DEPTH_BIT)
+   depth_format = 
vk_format_depth_only(dst->image->vk_format);
+   struct blit2d_src_temps src_temps;
+   blit2d_bind_src(cmd_buffer, src_img, src_buf, 
_temps, src_type, depth_format, aspect_mask);
+
+   struct blit2d_dst_temps 
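
The per-aspect loop introduced by this patch can be illustrated with a small standalone sketch. It is not the radv code: it only shows how an aspect mask with both depth and stencil set is split into single-bit masks, so that each aspect gets its own pass and its own format. The `ASPECT_*` constants are assumed to mirror the Vulkan values of `VK_IMAGE_ASPECT_DEPTH_BIT` and `VK_IMAGE_ASPECT_STENCIL_BIT`; `split_aspects()` is a hypothetical helper, not a Mesa function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match VK_IMAGE_ASPECT_DEPTH_BIT / VK_IMAGE_ASPECT_STENCIL_BIT. */
#define ASPECT_DEPTH_BIT   0x2u
#define ASPECT_STENCIL_BIT 0x4u

/*
 * Hypothetical helper: writes one single-bit mask per set bit of
 * aspect_mask into out[] (lowest bit first) and returns how many
 * masks were written.
 */
static size_t
split_aspects(uint32_t aspect_mask, uint32_t *out, size_t max_out)
{
   size_t n = 0;

   assert(out != NULL);

   while (aspect_mask && n < max_out) {
      /* Isolate the lowest set bit, then clear it from the mask. */
      uint32_t bit = aspect_mask & (~aspect_mask + 1u);
      out[n++] = bit;
      aspect_mask &= ~bit;
   }
   return n;
}
```

A copy with `ASPECT_DEPTH_BIT | ASPECT_STENCIL_BIT` then yields two masks, and the caller can pick a depth-only or stencil-only format for each one, which is what the `for_each_bit` loop in the patch does.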

[Mesa-dev] [Bug 102530] [bisected] Kodi crashes when launching a stream - commit bd2662bf

2017-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=102530

--- Comment #13 from Alexandre Demers  ---
Created attachment 133983
  --> https://bugs.freedesktop.org/attachment.cgi?id=133983=edit
Kodi segfault with MESA_NO_ERROR=0

Core dump produced by Kodi when MESA_NO_ERROR=0



[Mesa-dev] [Bug 102461] [llvmpipe] piglit glean fragprog1 XPD test 1 regression

2017-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=102461

Vinson Lee  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #1 from Vinson Lee  ---
commit 79674066b6f98be96cb63a0332ac421858544a20
Author: Marek Olšák 
Date:   Mon Aug 28 23:28:33 2017 +0200

st/mesa: fix XPD lowering - don't read dst

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102461

Reviewed-by: Brian Paul 



[Mesa-dev] [PATCH 4/5] intel: Remove unused device info for KBL GT1.5

2017-09-05 Thread Anuj Phogat
Signed-off-by: Anuj Phogat 
---
 src/intel/common/gen_device_info.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/src/intel/common/gen_device_info.c 
b/src/intel/common/gen_device_info.c
index c0eb7c3..a9a1399 100644
--- a/src/intel/common/gen_device_info.c
+++ b/src/intel/common/gen_device_info.c
@@ -574,17 +574,6 @@ static const struct gen_device_info 
gen_device_info_kbl_gt1 = {
.l3_banks = 2,
 };
 
-static const struct gen_device_info gen_device_info_kbl_gt1_5 = {
-   GEN9_FEATURES,
-   .is_kabylake = true,
-   .gt = 1,
-
-   .max_cs_threads = 7 * 6,
-   .num_slices = 1,
-   .num_subslices = { 3, },
-   .l3_banks = 4,
-};
-
 static const struct gen_device_info gen_device_info_kbl_gt2 = {
GEN9_FEATURES,
.is_kabylake = true,
-- 
2.9.4



[Mesa-dev] [PATCH 2/5] intel: Fix few KBL brand strings

2017-09-05 Thread Anuj Phogat
Signed-off-by: Anuj Phogat 
---
 include/pci_ids/i965_pci_ids.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
index 5c42712..80f7712 100644
--- a/include/pci_ids/i965_pci_ids.h
+++ b/include/pci_ids/i965_pci_ids.h
@@ -154,8 +154,8 @@ CHIPSET(0x591B, kbl_gt2, "Intel(R) HD Graphics 630 (Kaby 
Lake GT2)")
 CHIPSET(0x591D, kbl_gt2, "Intel(R) HD Graphics P630 (Kaby Lake GT2)")
 CHIPSET(0x591E, kbl_gt2, "Intel(R) HD Graphics 615 (Kaby Lake GT2)")
 CHIPSET(0x5921, kbl_gt2, "Intel(R) Kabylake GT2F")
-CHIPSET(0x5926, kbl_gt3, "Intel(R) Iris Plus Graphics 640 (Kaby Lake GT3)")
-CHIPSET(0x5927, kbl_gt3, "Intel(R) Iris Plus Graphics 650 (Kaby Lake GT3)")
+CHIPSET(0x5926, kbl_gt3, "Intel(R) Iris Plus Graphics 640 (Kaby Lake GT3e)")
+CHIPSET(0x5927, kbl_gt3, "Intel(R) Iris Plus Graphics 650 (Kaby Lake GT3e)")
 CHIPSET(0x3184, glk, "Intel(R) HD Graphics (Geminilake)")
 CHIPSET(0x3185, glk_2x6, "Intel(R) HD Graphics (Geminilake 2x6)")
 CHIPSET(0x3E90, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
-- 
2.9.4



[Mesa-dev] [PATCH 3/5] intel: Change a KBL pci id to GT2 from GT1.5

2017-09-05 Thread Anuj Phogat
Signed-off-by: Anuj Phogat 
---
 include/pci_ids/i965_pci_ids.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
index 80f7712..6f92084 100644
--- a/include/pci_ids/i965_pci_ids.h
+++ b/include/pci_ids/i965_pci_ids.h
@@ -146,7 +146,7 @@ CHIPSET(0x5A85, bxt_2x6, "Intel(R) HD Graphics 500 (Broxton 
2x6)")
 CHIPSET(0x5902, kbl_gt1, "Intel(R) HD Graphics 610 (Kaby Lake GT1)")
 CHIPSET(0x5906, kbl_gt1, "Intel(R) HD Graphics 610 (Kaby Lake GT1)")
 CHIPSET(0x590B, kbl_gt1, "Intel(R) Kabylake GT1")
-CHIPSET(0x5917, kbl_gt1_5, "Intel(R) Kabylake GT1.5")
+CHIPSET(0x5917, kbl_gt2, "Intel(R) Kabylake GT2")
 CHIPSET(0x5912, kbl_gt2, "Intel(R) HD Graphics 630 (Kaby Lake GT2)")
 CHIPSET(0x5916, kbl_gt2, "Intel(R) HD Graphics 620 (Kaby Lake GT2)")
 CHIPSET(0x591A, kbl_gt2, "Intel(R) HD Graphics P630 (Kaby Lake GT2)")
-- 
2.9.4



[Mesa-dev] [PATCH 1/5] intel: Remove unused Kabylake pci ids

2017-09-05 Thread Anuj Phogat
These PCI IDs are not used in any Kabylake SKUs.

Signed-off-by: Anuj Phogat 
---
 include/pci_ids/i965_pci_ids.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
index 57e70b7..5c42712 100644
--- a/include/pci_ids/i965_pci_ids.h
+++ b/include/pci_ids/i965_pci_ids.h
@@ -145,12 +145,7 @@ CHIPSET(0x5A84, bxt, "Intel(R) HD Graphics 505 
(Broxton)")
 CHIPSET(0x5A85, bxt_2x6, "Intel(R) HD Graphics 500 (Broxton 2x6)")
 CHIPSET(0x5902, kbl_gt1, "Intel(R) HD Graphics 610 (Kaby Lake GT1)")
 CHIPSET(0x5906, kbl_gt1, "Intel(R) HD Graphics 610 (Kaby Lake GT1)")
-CHIPSET(0x590A, kbl_gt1, "Intel(R) Kabylake GT1")
-CHIPSET(0x5908, kbl_gt1, "Intel(R) Kabylake GT1")
 CHIPSET(0x590B, kbl_gt1, "Intel(R) Kabylake GT1")
-CHIPSET(0x590E, kbl_gt1, "Intel(R) Kabylake GT1")
-CHIPSET(0x5913, kbl_gt1_5, "Intel(R) Kabylake GT1.5")
-CHIPSET(0x5915, kbl_gt1_5, "Intel(R) Kabylake GT1.5")
 CHIPSET(0x5917, kbl_gt1_5, "Intel(R) Kabylake GT1.5")
 CHIPSET(0x5912, kbl_gt2, "Intel(R) HD Graphics 630 (Kaby Lake GT2)")
 CHIPSET(0x5916, kbl_gt2, "Intel(R) HD Graphics 620 (Kaby Lake GT2)")
@@ -159,10 +154,8 @@ CHIPSET(0x591B, kbl_gt2, "Intel(R) HD Graphics 630 (Kaby 
Lake GT2)")
 CHIPSET(0x591D, kbl_gt2, "Intel(R) HD Graphics P630 (Kaby Lake GT2)")
 CHIPSET(0x591E, kbl_gt2, "Intel(R) HD Graphics 615 (Kaby Lake GT2)")
 CHIPSET(0x5921, kbl_gt2, "Intel(R) Kabylake GT2F")
-CHIPSET(0x5923, kbl_gt3, "Intel(R) Kabylake GT3")
 CHIPSET(0x5926, kbl_gt3, "Intel(R) Iris Plus Graphics 640 (Kaby Lake GT3)")
 CHIPSET(0x5927, kbl_gt3, "Intel(R) Iris Plus Graphics 650 (Kaby Lake GT3)")
-CHIPSET(0x593B, kbl_gt4, "Intel(R) Kabylake GT4")
 CHIPSET(0x3184, glk, "Intel(R) HD Graphics (Geminilake)")
 CHIPSET(0x3185, glk_2x6, "Intel(R) HD Graphics (Geminilake 2x6)")
 CHIPSET(0x3E90, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
-- 
2.9.4



[Mesa-dev] [PATCH 5/5] intel: Add brand string for KBL-R

2017-09-05 Thread Anuj Phogat
Signed-off-by: Anuj Phogat 
---
 include/pci_ids/i965_pci_ids.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
index 6f92084..4a51e44 100644
--- a/include/pci_ids/i965_pci_ids.h
+++ b/include/pci_ids/i965_pci_ids.h
@@ -146,7 +146,7 @@ CHIPSET(0x5A85, bxt_2x6, "Intel(R) HD Graphics 500 (Broxton 
2x6)")
 CHIPSET(0x5902, kbl_gt1, "Intel(R) HD Graphics 610 (Kaby Lake GT1)")
 CHIPSET(0x5906, kbl_gt1, "Intel(R) HD Graphics 610 (Kaby Lake GT1)")
 CHIPSET(0x590B, kbl_gt1, "Intel(R) Kabylake GT1")
-CHIPSET(0x5917, kbl_gt2, "Intel(R) Kabylake GT2")
+CHIPSET(0x5917, kbl_gt2, "Intel(R) UHD Graphics 620 (Kabylake GT2)")
 CHIPSET(0x5912, kbl_gt2, "Intel(R) HD Graphics 630 (Kaby Lake GT2)")
 CHIPSET(0x5916, kbl_gt2, "Intel(R) HD Graphics 620 (Kaby Lake GT2)")
 CHIPSET(0x591A, kbl_gt2, "Intel(R) HD Graphics P630 (Kaby Lake GT2)")
-- 
2.9.4
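
For readers unfamiliar with the `CHIPSET(...)` lines touched by this series: they form an X-macro list, where the including file defines `CHIPSET` before pulling in the header and so decides what each entry expands to. The sketch below is a self-contained illustration of that pattern, not the actual Mesa consumer; `EXAMPLE_PCI_IDS` and `example_brand_string()` are hypothetical names, and the two entries are copied from the patches in this series.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the header: a two-entry excerpt, written
 * as a list macro so that the example stays in a single file.
 */
#define EXAMPLE_PCI_IDS(CHIPSET) \
   CHIPSET(0x5917, kbl_gt2, "Intel(R) UHD Graphics 620 (Kabylake GT2)") \
   CHIPSET(0x5926, kbl_gt3, "Intel(R) Iris Plus Graphics 640 (Kaby Lake GT3e)")

/* Returns the brand string for a PCI ID, or NULL if the ID is not listed. */
static const char *
example_brand_string(int pci_id)
{
   switch (pci_id) {
/* Each list entry expands to one case label returning its name. */
#define AS_CASE(id, family, name) case id: return name;
   EXAMPLE_PCI_IDS(AS_CASE)
#undef AS_CASE
   default:
      return NULL;
   }
}
```

With this shape, removing an unused ID or correcting a brand string is a one-line change to the list, and every consumer picks it up on the next build.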



Re: [Mesa-dev] [PATCH v2] i965/fs: Define new shader opcode to set rounding modes

2017-09-05 Thread Francisco Jerez
Alejandro Piñeiro  writes:

> Although it is possible to emit them directly as AND/OR on brw_fs_nir,
> having a specific opcode makes it easier to remove duplicate settings
> later.
>
> v2: (Curro)
>   - Set thread control to 'switch' when using the control register
>   - Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
> with the rounding mode.
>   - Avoid magic numbers setting rounding mode field at control register.
>
> Signed-off-by:  Alejandro Piñeiro 
> Signed-off-by:  Jose Maria Casanova Crespo 
> ---
>  src/intel/compiler/brw_eu.h |  3 +++
>  src/intel/compiler/brw_eu_defines.h | 17 +
>  src/intel/compiler/brw_eu_emit.c| 34 
> +
>  src/intel/compiler/brw_fs_generator.cpp |  5 +
>  src/intel/compiler/brw_shader.cpp   |  4 
>  5 files changed, 63 insertions(+)
>
> diff --git a/src/intel/compiler/brw_eu.h b/src/intel/compiler/brw_eu.h
> index 8e597b212a6..106bf03530d 100644
> --- a/src/intel/compiler/brw_eu.h
> +++ b/src/intel/compiler/brw_eu.h
> @@ -500,6 +500,9 @@ brw_broadcast(struct brw_codegen *p,
>struct brw_reg src,
>struct brw_reg idx);
>  
> +void
> +brw_rounding_mode(struct brw_codegen *p,
> +  enum brw_rnd_mode mode);

Missing whitespace line.

>  /***
>   * brw_eu_util.c:
>   */
> diff --git a/src/intel/compiler/brw_eu_defines.h 
> b/src/intel/compiler/brw_eu_defines.h
> index da482b73c58..91d88fe8952 100644
> --- a/src/intel/compiler/brw_eu_defines.h
> +++ b/src/intel/compiler/brw_eu_defines.h
> @@ -388,6 +388,9 @@ enum opcode {
> SHADER_OPCODE_TYPED_SURFACE_WRITE,
> SHADER_OPCODE_TYPED_SURFACE_WRITE_LOGICAL,
>  
> +

Redundant whitespace.

> +   SHADER_OPCODE_RND_MODE,
> +
> SHADER_OPCODE_MEMORY_FENCE,
>  
> SHADER_OPCODE_GEN4_SCRATCH_READ,
> @@ -1214,4 +1217,18 @@ enum brw_message_target {
>  /* R0 */
>  # define GEN7_GS_PAYLOAD_INSTANCE_ID_SHIFT   27
>  
> +/* CR0.0[5:4] Floating-Point Rounding Modes
> + *  Skylake PRM, Volume 7 Part 1, "Control Register", page 756
> + */
> +
> +#define BRW_CR0_RND_MODE_MASK 0x30
> +#define BRW_CR0_RND_MODE_SHIFT4
> +
> +enum PACKED brw_rnd_mode {
> +   BRW_RND_MODE_RTNE = 0,  /* Round to Nearest or Even */
> +   BRW_RND_MODE_RU = 1,/* Round Up, toward +inf */
> +   BRW_RND_MODE_RD = 2,/* Round Down, toward -inf */
> +   BRW_RND_MODE_RTZ = 3/* Round Toward Zero */
> +};
> +
>  #endif /* BRW_EU_DEFINES_H */
> diff --git a/src/intel/compiler/brw_eu_emit.c 
> b/src/intel/compiler/brw_eu_emit.c
> index 8c952e7da26..12164653e47 100644
> --- a/src/intel/compiler/brw_eu_emit.c
> +++ b/src/intel/compiler/brw_eu_emit.c
> @@ -3530,3 +3530,37 @@ brw_WAIT(struct brw_codegen *p)
> brw_inst_set_exec_size(devinfo, insn, BRW_EXECUTE_1);
> brw_inst_set_mask_control(devinfo, insn, BRW_MASK_DISABLE);
>  }
> +
> +/**
> + * Changes the floating point rounding mode updating the control register
> + * field defined at cr0.0[5-6] bits. This function supports the changes to
> + * RTNE (00), RU (01), RD (10) and RTZ (11) rounding using bitwise 
> operations.
> + * Only RTNE and RTZ rounding are enabled at nir.
> + */
> +

Redundant whitespace.

> +void
> +brw_rounding_mode(struct brw_codegen *p,
> +  enum brw_rnd_mode mode)
> +{
> +   const unsigned bits  = mode << BRW_CR0_RND_MODE_SHIFT;
> +
> +   if (bits != BRW_CR0_RND_MODE_MASK) {
> +  brw_inst *inst = brw_AND(p, brw_cr0_reg(0), brw_cr0_reg(0),
> +   brw_imm_ud(~BRW_CR0_RND_MODE_MASK));
> +
> +  /* From the Skylake PRM, Volume 7, page 760:
> +   *  "Implementation Restriction on Register Access: When the control
> +   *   register is used as an explicit source and/or destination, 
> hardware
> +   *   does not ensure execution pipeline coherency. Software must set 
> the
> +   *   thread control field to ‘switch’ for an instruction that uses
> +   *   control register as an explicit operand."
> +   */
> +  brw_inst_set_thread_control(p->devinfo, inst, BRW_THREAD_SWITCH);
> +}
> +
> +   if (bits) {
> +  brw_inst *inst = brw_OR(p, brw_cr0_reg(0), brw_cr0_reg(0),
> +  brw_imm_ud(bits));
> +  brw_inst_set_thread_control(p->devinfo, inst, BRW_THREAD_SWITCH);
> +   }
> +}
> diff --git a/src/intel/compiler/brw_fs_generator.cpp 
> b/src/intel/compiler/brw_fs_generator.cpp
> index afaec5c9497..ff9880ebfe8 100644
> --- a/src/intel/compiler/brw_fs_generator.cpp
> +++ b/src/intel/compiler/brw_fs_generator.cpp
> @@ -2144,6 +2144,11 @@ fs_generator::generate_code(const cfg_t *cfg, int 
> dispatch_width)
>   brw_DIM(p, dst, retype(src[0], BRW_REGISTER_TYPE_F));
>   break;
>  
> +  case SHADER_OPCODE_RND_MODE:
> + assert(src[0].file == BRW_IMMEDIATE_VALUE);
> +   
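
The AND/OR sequence in `brw_rounding_mode()` above can be modelled on a plain integer to check which bits end up in the control register. This is only a sketch of the bit arithmetic, assuming the mask and shift values quoted in the patch (0x30 and 4); `model_set_rounding_mode()` is a hypothetical name and nothing here touches hardware or emits instructions.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_RND_MODE_MASK  0x30u  /* same value as BRW_CR0_RND_MODE_MASK */
#define CR0_RND_MODE_SHIFT 4      /* same value as BRW_CR0_RND_MODE_SHIFT */

/*
 * Hypothetical model: returns the value cr0.0 would hold after the
 * AND/OR pair from the patch. mode must be in the range 0..3.
 */
static uint32_t
model_set_rounding_mode(uint32_t cr0, unsigned mode)
{
   assert(mode <= 3);

   const uint32_t bits = (uint32_t)mode << CR0_RND_MODE_SHIFT;

   /* The AND is skipped only when both field bits end up set (RTZ). */
   if (bits != CR0_RND_MODE_MASK)
      cr0 &= ~CR0_RND_MODE_MASK;

   /* The OR is skipped only when both field bits end up clear (RTNE). */
   if (bits)
      cr0 |= bits;

   return cr0;
}
```

RTNE (0) and RTZ (3) therefore cost a single instruction each, while RU and RD need both the AND and the OR; bits outside the field are preserved in every case.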

Re: [Mesa-dev] [PATCH 12/23] intel: Add simple logging façade for Android

2017-09-05 Thread Rob Herring
On Sat, Sep 2, 2017 at 3:17 AM, Chad Versace  wrote:
> I'm bringing up Vulkan in the Android container of Chrome OS (ARC++).
>
> On Android, stdio goes to /dev/null. On Android, remote gdb is even more
> painful than the usual remote gdb. On Android, nothing works like you
> expect and debugging is hell. I need logging.

We do!

You used to be able to use logwrapper, at least for system-level
services, but that is now a pain to get working thanks to SELinux.

> This patch introduces a small, simple logging API that can easily wrap
> Android's API. On non-Android platforms, this logger does nothing fancy.
> It follows the time-honored Unix tradition of spewing everything to
> stderr with minimal fuss.
>
> My goal here is not perfection. My goal is to make a minimal, clean API,
> that people hate merely a little instead of a lot, and that's good
> enough to let me bring up Android Vulkan.  And it needs to be fast,
> which means it must be small. No one wants their game to miss frames
> while aiming a flaming bow into the jaws of an angry robot t-rex, and
> thus become t-rex breakfast, because some fool had too much fun designing
> a bloated, ideal logging API.
>
> If people like it, perhaps we should quickly promote it to src/util.

The only thing I don't like is that it's Intel-specific. There's already a
gallium API (with Android support floating around) as well as ddebug
(which I started Android support for, but haven't gotten that working
yet). Of course, some things still just call fprintf(stderr, ...) or
other C lib functions directly. I've hacked up files with "#define
fprintf() ALOGE()" in places I've needed it.

Rob
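
The kind of façade discussed in this thread might look roughly like the sketch below: one entry point that forwards to the Android log on Android and to a stdio stream elsewhere. This is a minimal illustration under stated assumptions, not the API from the patch; every `example_*` name and the output format are hypothetical. The only platform call used is `__android_log_vprint()` from `<android/log.h>`.

```c
#include <stdarg.h>
#include <stdio.h>

#ifdef __ANDROID__
#include <android/log.h>
#endif

enum example_log_level {
   EXAMPLE_LOG_ERROR,
   EXAMPLE_LOG_WARN,
   EXAMPLE_LOG_INFO,
};

/* Human-readable level name used by the stdio backend. */
static const char *
example_level_name(enum example_log_level level)
{
   switch (level) {
   case EXAMPLE_LOG_ERROR: return "error";
   case EXAMPLE_LOG_WARN:  return "warning";
   default:                return "info";
   }
}

static void
example_logv(FILE *out, enum example_log_level level, const char *tag,
             const char *fmt, va_list ap)
{
#ifdef __ANDROID__
   /* On Android stdio is discarded, so hand the message to logcat. */
   static const int prio[] = {
      ANDROID_LOG_ERROR, ANDROID_LOG_WARN, ANDROID_LOG_INFO,
   };
   (void)out;
   __android_log_vprint(prio[level], tag, fmt, ap);
#else
   /* Elsewhere, write "tag: level: message" followed by a newline. */
   fprintf(out, "%s: %s: ", tag, example_level_name(level));
   vfprintf(out, fmt, ap);
   fputc('\n', out);
#endif
}

static void
example_log(FILE *out, enum example_log_level level, const char *tag,
            const char *fmt, ...)
{
   va_list ap;

   va_start(ap, fmt);
   example_logv(out, level, tag, fmt, ap);
   va_end(ap);
}
```

A caller would write something like `example_log(stderr, EXAMPLE_LOG_WARN, "MESA", "value=%d", 42);`. Keeping the stream as a parameter is a choice made for this sketch so that the non-Android path is easy to exercise; a real façade would more likely hard-code stderr.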


Re: [Mesa-dev] [PATCH 1/3] mesa: don't use %s for PACKAGE_VERSION macro

2017-09-05 Thread Eric Anholt
Emil Velikov  writes:

> From: Emil Velikov 
>
> The macro itself is a well defined string, which cannot cause issues
> with printf or other printf-like functions.
>
> All other places through Mesa already use it directly, so let's update
> the final two instances.
>
> Signed-off-by: Emil Velikov 

sha1 is so much more informative than date/time, and improves
reproducibility.  Patch 1-2 are:

Reviewed-by: Eric Anholt 
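
As an aside on patch 1, the two ways of printing a version macro can be compared side by side. The sketch below assumes a hypothetical value for `PACKAGE_VERSION` (the real one comes from the build system), and the `version_via_*` helpers are illustrative names only.

```c
#include <stdio.h>

/* Hypothetical value; the real macro is provided by the build system. */
#define PACKAGE_VERSION "17.3.0-devel"

/* Passes the macro as an argument to a %s conversion. */
static int
version_via_format(char *buf, size_t size)
{
   return snprintf(buf, size, "Mesa %s", PACKAGE_VERSION);
}

/*
 * Uses the macro directly: adjacent string literals are concatenated
 * at compile time, so the format string becomes "Mesa 17.3.0-devel".
 */
static int
version_via_concat(char *buf, size_t size)
{
   return snprintf(buf, size, "Mesa " PACKAGE_VERSION);
}
```

Both produce the same text. The direct form relies on the macro expanding to a plain string literal with no `%` characters in it, which is the "well defined string" condition the commit message refers to.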




[Mesa-dev] [PATCH 9/9] radv: dump shader stats when a hang is detected

2017-09-05 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_debug.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/amd/vulkan/radv_debug.c b/src/amd/vulkan/radv_debug.c
index f2339dfe71..d9d6b95bb6 100644
--- a/src/amd/vulkan/radv_debug.c
+++ b/src/amd/vulkan/radv_debug.c
@@ -68,6 +68,16 @@ radv_dump_trace(struct radv_device *device, struct 
radeon_winsys_cs *cs)
fclose(f);
 }
 
+static void
+radv_dump_shader(struct radv_device *device,
+struct radv_shader_variant *variant, gl_shader_stage stage)
+{
+   fprintf(stderr, "%s:\n", radv_get_shader_name(variant, stage));
+   fprintf(stderr, "\n%s\n\n", variant->binary.disasm_string);
+
+   radv_shader_dump_stats(device, variant, stage, stderr);
+}
+
 static void
 radv_dump_gfx_shaders(struct radv_pipeline *pipeline)
 {
@@ -81,9 +91,7 @@ radv_dump_gfx_shaders(struct radv_pipeline *pipeline)
variant = pipeline->shaders[stage];
assert(variant);
 
-   fprintf(stderr, "%s:\n%s\n\n",
-   radv_get_shader_name(variant, stage),
-   variant->binary.disasm_string);
+   radv_dump_shader(pipeline->device, variant, stage);
}
 }
 
-- 
2.14.1



[Mesa-dev] [PATCH 7/9] radv: dump the active GFX shaders when a hang is detected

2017-09-05 Thread Samuel Pitoiset
Only the ASM is currently dumped.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_debug.c | 58 +
 1 file changed, 53 insertions(+), 5 deletions(-)

diff --git a/src/amd/vulkan/radv_debug.c b/src/amd/vulkan/radv_debug.c
index a1c0a61997..f2339dfe71 100644
--- a/src/amd/vulkan/radv_debug.c
+++ b/src/amd/vulkan/radv_debug.c
@@ -68,13 +68,49 @@ radv_dump_trace(struct radv_device *device, struct 
radeon_winsys_cs *cs)
fclose(f);
 }
 
+static void
+radv_dump_gfx_shaders(struct radv_pipeline *pipeline)
+{
+   unsigned mask;
+
+   mask = pipeline->active_stages;
+   while (mask) {
+   struct radv_shader_variant *variant;
+   int stage = u_bit_scan();
+
+   variant = pipeline->shaders[stage];
+   assert(variant);
+
+   fprintf(stderr, "%s:\n%s\n\n",
+   radv_get_shader_name(variant, stage),
+   variant->binary.disasm_string);
+   }
+}
+
+static struct radv_pipeline *
+radv_get_bound_pipeline(struct radv_device *device)
+{
+   uint64_t *ptr = (uint64_t *)device->trace_id_ptr;
+
+   return (struct radv_pipeline *)ptr[1];
+}
+
+static void
+radv_dump_gfx_state(struct radv_device *device)
+{
+   struct radv_pipeline *pipeline;
+
+   pipeline = radv_get_bound_pipeline(device);
+   if (!pipeline)
+   return;
+
+   radv_dump_gfx_shaders(pipeline);
+}
+
 static bool
-radv_gpu_hang_occured(struct radv_queue *queue)
+radv_gpu_hang_occured(struct radv_queue *queue, enum ring_type ring)
 {
struct radeon_winsys *ws = queue->device->ws;
-   enum ring_type ring;
-
-   ring = radv_queue_family_to_ring(queue->queue_family_index);
 
if (!ws->ctx_wait_idle(queue->hw_ctx, ring, queue->queue_idx))
return true;
@@ -86,9 +122,12 @@ void
 radv_check_gpu_hangs(struct radv_queue *queue, struct radeon_winsys_cs *cs)
 {
struct radv_device *device = queue->device;
+   enum ring_type ring;
uint64_t addr;
 
-   if (!radv_gpu_hang_occured(queue))
+   ring = radv_queue_family_to_ring(queue->queue_family_index);
+
+   if (!radv_gpu_hang_occured(queue, ring))
return;
 
if (ac_vm_fault_occured(device->physical_device->rad_info.chip_class,
@@ -97,6 +136,15 @@ radv_check_gpu_hangs(struct radv_queue *queue, struct 
radeon_winsys_cs *cs)
fprintf(stderr, "Failing VM page: 0x%08"PRIx64"\n\n", addr);
}
 
+   switch (ring) {
+   case RING_GFX:
+   radv_dump_gfx_state(device);
+   break;
+   default:
+   assert(0);
+   break;
+   }
+
radv_dump_trace(queue->device, cs);
abort();
 }
-- 
2.14.1



[Mesa-dev] [PATCH 8/9] radv: add radv_shader_dump_stats() helper

2017-09-05 Thread Samuel Pitoiset
To dump the shader stats when a hang is detected.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_pipeline.c | 64 +-
 src/amd/vulkan/radv_shader.c   | 70 ++
 src/amd/vulkan/radv_shader.h   |  6 
 3 files changed, 77 insertions(+), 63 deletions(-)

diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index 7347a0d211..efa1113aa8 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -78,75 +78,13 @@ void radv_DestroyPipeline(
 
 static void radv_dump_pipeline_stats(struct radv_device *device, struct 
radv_pipeline *pipeline)
 {
-   unsigned lds_increment = device->physical_device->rad_info.chip_class 
>= CIK ? 512 : 256;
-   struct radv_shader_variant *var;
-   struct ac_shader_config *conf;
int i;
-   FILE *file = stderr;
-   unsigned max_simd_waves;
-   unsigned lds_per_wave = 0;
-
-   switch (device->physical_device->rad_info.family) {
-   /* These always have 8 waves: */
-   case CHIP_POLARIS10:
-   case CHIP_POLARIS11:
-   case CHIP_POLARIS12:
-   max_simd_waves = 8;
-   break;
-   default:
-   max_simd_waves = 10;
-   }
 
for (i = 0; i < MESA_SHADER_STAGES; i++) {
if (!pipeline->shaders[i])
continue;
-   var = pipeline->shaders[i];
-
-   conf = >config;
-
-   if (i == MESA_SHADER_FRAGMENT) {
-   lds_per_wave = conf->lds_size * lds_increment +
-   align(var->info.fs.num_interp * 48, 
lds_increment);
-   }
 
-   if (conf->num_sgprs) {
-   if (device->physical_device->rad_info.chip_class >= VI)
-   max_simd_waves = MIN2(max_simd_waves, 800 / 
conf->num_sgprs);
-   else
-   max_simd_waves = MIN2(max_simd_waves, 512 / 
conf->num_sgprs);
-   }
-
-   if (conf->num_vgprs)
-   max_simd_waves = MIN2(max_simd_waves, 256 / 
conf->num_vgprs);
-
-   /* LDS is 64KB per CU (4 SIMDs), divided into 16KB blocks per 
SIMD
-* that PS can use.
-*/
-   if (lds_per_wave)
-   max_simd_waves = MIN2(max_simd_waves, 16384 / 
lds_per_wave);
-
-   fprintf(file, "\n%s:\n",
-   radv_get_shader_name(var, i));
-   if (i == MESA_SHADER_FRAGMENT) {
-   fprintf(file, "*** SHADER CONFIG ***\n"
-   "SPI_PS_INPUT_ADDR = 0x%04x\n"
-   "SPI_PS_INPUT_ENA  = 0x%04x\n",
-   conf->spi_ps_input_addr, 
conf->spi_ps_input_ena);
-   }
-   fprintf(file, "*** SHADER STATS ***\n"
-   "SGPRS: %d\n"
-   "VGPRS: %d\n"
-   "Spilled SGPRs: %d\n"
-   "Spilled VGPRs: %d\n"
-   "Code Size: %d bytes\n"
-   "LDS: %d blocks\n"
-   "Scratch: %d bytes per wave\n"
-   "Max Waves: %d\n"
-   "\n\n\n",
-   conf->num_sgprs, conf->num_vgprs,
-   conf->spilled_sgprs, conf->spilled_vgprs, 
var->code_size,
-   conf->lds_size, conf->scratch_bytes_per_wave,
-   max_simd_waves);
+   radv_shader_dump_stats(device, pipeline->shaders[i], i, stderr);
}
 }
 
diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index 0596fb7f54..262dbf34cc 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -526,3 +526,73 @@ radv_get_shader_name(struct radv_shader_variant *var, 
gl_shader_stage stage)
};
 }
 
+void
+radv_shader_dump_stats(struct radv_device *device,
+  struct radv_shader_variant *variant,
+  gl_shader_stage stage,
+  FILE *file)
+{
+   unsigned lds_increment = device->physical_device->rad_info.chip_class 
>= CIK ? 512 : 256;
+   struct ac_shader_config *conf;
+   unsigned max_simd_waves;
+   unsigned lds_per_wave = 0;
+
+   switch (device->physical_device->rad_info.family) {
+   /* These always have 8 waves: */
+   case CHIP_POLARIS10:
+   case CHIP_POLARIS11:
+   case CHIP_POLARIS12:
+   max_simd_waves = 8;
+   break;
+   default:
+   max_simd_waves = 10;
+   }
+
+   conf = >config;
+
+   if (stage == MESA_SHADER_FRAGMENT) {
+   lds_per_wave = conf->lds_size * lds_increment +
+  align(variant->info.fs.num_interp * 48,
+
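
The "Max Waves" figure that this helper prints comes from a short chain of integer divisions, visible in the lines removed from radv_pipeline.c above. The standalone sketch below reproduces that arithmetic with the constants taken from the diff (800 or 512 SGPRs, 256 VGPRs, 16384 bytes of LDS per SIMD); `example_max_simd_waves()` and its parameters are hypothetical names, and the chip-family switch is reduced to a `base_waves` argument.

```c
#include <stdbool.h>

static unsigned
min_u(unsigned a, unsigned b)
{
   return a < b ? a : b;
}

/*
 * Hypothetical standalone version of the wave-limit arithmetic.
 * base_waves is 8 for the Polaris parts listed in the patch, else 10.
 */
static unsigned
example_max_simd_waves(unsigned base_waves, bool vi_or_newer,
                       unsigned num_sgprs, unsigned num_vgprs,
                       unsigned lds_per_wave)
{
   unsigned waves = base_waves;

   /* Each register file is shared by all waves resident on a SIMD. */
   if (num_sgprs)
      waves = min_u(waves, (vi_or_newer ? 800u : 512u) / num_sgprs);
   if (num_vgprs)
      waves = min_u(waves, 256u / num_vgprs);

   /* 16KB of LDS per SIMD, as stated in the comment in the diff. */
   if (lds_per_wave)
      waves = min_u(waves, 16384u / lds_per_wave);

   return waves;
}
```

For example, a shader using 48 SGPRs and 64 VGPRs on a VI-class part is limited by VGPRs: 256 / 64 gives 4 waves, well below the base limit of 10.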

[Mesa-dev] [PATCH 6/9] radv: store the bound pipeline pointer into the trace BO

2017-09-05 Thread Samuel Pitoiset
When a GPU hang is detected in radv_gpu_hang_occured() we know
which command buffer is faulty but the bound pipeline might
have been updated during the execution.

The pointer to the radv_pipeline object is emitted just after
the second trace ID, so that it is easy to dump the shaders
that were active at the moment of the hang.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_cmd_buffer.c | 48 ++--
 1 file changed, 41 insertions(+), 7 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index cc11f272e8..774879e0fd 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -332,6 +332,19 @@ radv_cmd_buffer_upload_data(struct radv_cmd_buffer 
*cmd_buffer,
return true;
 }
 
+static void
+radv_emit_write_data_packet(struct radeon_winsys_cs *cs, uint64_t va,
+   unsigned count, uint32_t *data)
+{
+   radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 2 + count, 0));
+   radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
+   S_370_WR_CONFIRM(1) |
+   S_370_ENGINE_SEL(V_370_ME));
+   radeon_emit(cs, va);
+   radeon_emit(cs, va >> 32);
+   radeon_emit_array(cs, data, count);
+}
+
 void radv_cmd_buffer_trace_emit(struct radv_cmd_buffer *cmd_buffer)
 {
struct radv_device *device = cmd_buffer->device;
@@ -349,17 +362,36 @@ void radv_cmd_buffer_trace_emit(struct radv_cmd_buffer 
*cmd_buffer)
 
++cmd_buffer->state.trace_id;
device->ws->cs_add_buffer(cs, device->trace_bo, 8);
-   radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
-   radeon_emit(cs, S_370_DST_SEL(V_370_MEM_ASYNC) |
-   S_370_WR_CONFIRM(1) |
-   S_370_ENGINE_SEL(V_370_ME));
-   radeon_emit(cs, va);
-   radeon_emit(cs, va >> 32);
-   radeon_emit(cs, cmd_buffer->state.trace_id);
+   radv_emit_write_data_packet(cs, va, 1, _buffer->state.trace_id);
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
radeon_emit(cs, AC_ENCODE_TRACE_POINT(cmd_buffer->state.trace_id));
 }
 
+static void
+radv_cmd_buffer_bound_pipeline_emit(struct radv_cmd_buffer *cmd_buffer,
+   struct radv_pipeline *pipeline)
+{
+   struct radv_device *device = cmd_buffer->device;
+   struct radeon_winsys_cs *cs = cmd_buffer->cs;
+   uint32_t data[2];
+   uint64_t va;
+
+   if (!device->trace_bo)
+   return;
+
+   /* The 64-bit pointer is stored after the second trace ID. */
+   va = device->ws->buffer_get_va(device->trace_bo) + 8;
+
+   MAYBE_UNUSED unsigned cdw_max = radeon_check_space(device->ws,
+  cmd_buffer->cs, 6);
+
+   data[0] = (uintptr_t)pipeline;
+   data[1] = (uintptr_t)pipeline >> 32;
+
+   device->ws->cs_add_buffer(cs, device->trace_bo, 8);
+   radv_emit_write_data_packet(cs, va, 2, data);
+}
+
 static void
 radv_emit_graphics_blend_state(struct radv_cmd_buffer *cmd_buffer,
   struct radv_pipeline *pipeline)
@@ -2351,6 +2383,8 @@ void radv_CmdBindPipeline(
assert(!"invalid bind point");
break;
}
+
+   radv_cmd_buffer_bound_pipeline_emit(cmd_buffer, pipeline);
 }
 
 void radv_CmdSetViewport(
-- 
2.14.1
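
The pointer hand-off in this patch amounts to splitting a 64-bit value into two dwords on the write side and reassembling it on the read side (patch 7/9 reads it back as the second `uint64_t` of the trace buffer, i.e. at byte offset 8). The sketch below shows only that round trip in isolation; the `example_*` helpers are hypothetical and no GPU packet is involved. Unlike the patch, it widens to `uint64_t` before shifting, which is an adjustment made here so that the shift is also well defined where `uintptr_t` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Splits a pointer into low and high dwords, low dword first. */
static void
example_pack_pointer(const void *ptr, uint32_t data[2])
{
   assert(data != (uint32_t *)0);

   /* Widen first so that the shift by 32 is always well defined. */
   const uint64_t value = (uint64_t)(uintptr_t)ptr;

   data[0] = (uint32_t)value;
   data[1] = (uint32_t)(value >> 32);
}

/* Reassembles the pointer from the two dwords written above. */
static const void *
example_unpack_pointer(const uint32_t data[2])
{
   const uint64_t value = ((uint64_t)data[1] << 32) | data[0];

   return (const void *)(uintptr_t)value;
}
```

Reading the two dwords back as a single `uint64_t`, as the dump code does, gives the same result only on a little-endian host, where the low dword sits at the lower address.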



[Mesa-dev] [PATCH 5/9] radv: free the disasm string when shaders are destroyed

2017-09-05 Thread Samuel Pitoiset
Keep the disassembly string until the variant is destroyed, so
that the ASM can be dumped when RADV_TRACE_FILE is used and a
hang is detected.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_shader.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index 44a1f64737..0596fb7f54 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -426,7 +426,6 @@ shader_variant_create(struct radv_device *device,
free(binary->rodata);
free(binary->global_symbol_offsets);
free(binary->relocs);
-   free(binary->disasm_string);
variant->ref_count = 1;
return variant;
 }
@@ -471,6 +470,8 @@ void
 radv_shader_variant_destroy(struct radv_device *device,
struct radv_shader_variant *variant)
 {
+   struct ac_shader_binary *binary = >binary;
+
if (!p_atomic_dec_zero(>ref_count))
return;
 
@@ -478,6 +479,7 @@ radv_shader_variant_destroy(struct radv_device *device,
list_del(>slab_list);
mtx_unlock(>shader_slab_mutex);
 
+   free(binary->disasm_string);
free(variant);
 }
 
-- 
2.14.1
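
The ownership change in this patch, where the string is freed when the last reference goes away rather than right after creation, follows a common reference-counting shape. The sketch below illustrates it with hypothetical `example_*` names; it uses a plain counter, whereas the real code decrements atomically with `p_atomic_dec_zero()`.

```c
#include <stdlib.h>
#include <string.h>

struct example_variant {
   unsigned ref_count;
   char *disasm_string;   /* owned by the variant from creation on */
};

/* Creates a variant holding its own copy of the disassembly text. */
static struct example_variant *
example_variant_create(const char *disasm)
{
   struct example_variant *v = calloc(1, sizeof(*v));
   if (!v)
      return NULL;

   size_t len = strlen(disasm) + 1;
   v->disasm_string = malloc(len);
   if (!v->disasm_string) {
      free(v);
      return NULL;
   }
   memcpy(v->disasm_string, disasm, len);

   v->ref_count = 1;
   return v;
}

/* Drops one reference; returns 1 if the variant was actually freed. */
static int
example_variant_unref(struct example_variant *v)
{
   if (--v->ref_count != 0)
      return 0;

   /* The string lives exactly as long as the variant does. */
   free(v->disasm_string);
   free(v);
   return 1;
}
```

As long as any holder keeps a reference, `disasm_string` stays valid, which is what allows a hang handler to print it long after compilation has finished.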



[Mesa-dev] [PATCH 4/9] radv: store the shader binary into radv_shader_variant

2017-09-05 Thread Samuel Pitoiset
This will allow dumping the active shaders when a hang is
detected. Only the ASM will be dumped for now.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_shader.c | 27 ++-
 src/amd/vulkan/radv_shader.h |  1 +
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index de7d9a2752..44a1f64737 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -326,10 +326,10 @@ radv_destroy_shader_slabs(struct radv_device *device)
 static void
 radv_fill_shader_variant(struct radv_device *device,
 struct radv_shader_variant *variant,
-struct ac_shader_binary *binary,
 gl_shader_stage stage)
 {
bool scratch_enabled = variant->config.scratch_bytes_per_wave > 0;
+   struct ac_shader_binary *binary = >binary;
unsigned vgpr_comp_cnt = 0;
 
if (scratch_enabled && !device->llvm_supports_spill)
@@ -387,12 +387,13 @@ shader_variant_create(struct radv_device *device,
bool dump_shaders = device->debug_flags & RADV_DEBUG_DUMP_SHADERS;
enum ac_target_machine_options tm_options = 0;
struct radv_shader_variant *variant;
-   struct ac_shader_binary binary;
+   struct ac_shader_binary *binary;
LLVMTargetMachineRef tm;
 
variant = calloc(1, sizeof(struct radv_shader_variant));
if (!variant)
return NULL;
+   binary = >binary;
 
options->family = chip_family;
options->chip_class = device->physical_device->rad_info.chip_class;
@@ -404,28 +405,28 @@ shader_variant_create(struct radv_device *device,
tm = ac_create_target_machine(chip_family, tm_options);
 
if (gs_copy_shader) {
-   ac_create_gs_copy_shader(tm, shader, , >config,
+   ac_create_gs_copy_shader(tm, shader, binary, >config,
 >info, options, dump_shaders);
} else {
-   ac_compile_nir_shader(tm, , >config,
+   ac_compile_nir_shader(tm, binary, >config,
  >info, shader, options,
  dump_shaders);
}
 
LLVMDisposeTargetMachine(tm);
 
-   radv_fill_shader_variant(device, variant, , stage);
+   radv_fill_shader_variant(device, variant, stage);
 
if (code_out) {
-   *code_out = binary.code;
-   *code_size_out = binary.code_size;
+   *code_out = binary->code;
+   *code_size_out = binary->code_size;
} else
-   free(binary.code);
-   free(binary.config);
-   free(binary.rodata);
-   free(binary.global_symbol_offsets);
-   free(binary.relocs);
-   free(binary.disasm_string);
+   free(binary->code);
+   free(binary->config);
+   free(binary->rodata);
+   free(binary->global_symbol_offsets);
+   free(binary->relocs);
+   free(binary->disasm_string);
variant->ref_count = 1;
return variant;
 }
diff --git a/src/amd/vulkan/radv_shader.h b/src/amd/vulkan/radv_shader.h
index b0bf22eb76..aaf6e49e80 100644
--- a/src/amd/vulkan/radv_shader.h
+++ b/src/amd/vulkan/radv_shader.h
@@ -44,6 +44,7 @@ struct radv_shader_variant {
 
struct radeon_winsys_bo *bo;
uint64_t bo_offset;
+   struct ac_shader_binary binary;
struct ac_shader_config config;
struct ac_shader_variant_info info;
unsigned rsrc1;
-- 
2.14.1
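
For readers skimming the archive, here is a minimal sketch of the idea behind this patch: the compiled binary lives inside the variant instead of on the compile function's stack, so a later hang handler can still reach the disassembly. All type and function names below are hypothetical stand-ins, not the real radv/ac structures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; not the real ac_shader_binary / radv_shader_variant.
struct ShaderBinary {
    std::vector<unsigned char> code;
    std::string disasm;  // textual disassembly kept for debugging
};

struct ShaderVariant {
    ShaderBinary binary;  // embedded, so it outlives the compile call
    unsigned ref_count = 0;
};

// The compile step writes through a pointer into the variant's own binary,
// mirroring `binary = &variant->binary` in the diff.
inline ShaderVariant make_variant(const std::string &disasm)
{
    ShaderVariant v;
    ShaderBinary *binary = &v.binary;
    binary->disasm = disasm;
    binary->code.assign(disasm.begin(), disasm.end());
    v.ref_count = 1;
    return v;
}

// What a hang handler could do later: walk the active variants and collect the ASM.
inline std::string dump_active(const std::vector<ShaderVariant> &active)
{
    std::string out;
    for (const ShaderVariant &v : active)
        out += v.binary.disasm + "\n";
    return out;
}
```

Note that the diff above still calls free(binary->disasm_string) at the end of shader_variant_create(), so presumably a later change has to keep that string alive before a dump can use it.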



[Mesa-dev] [PATCH 3/9] radv: add shader_variant_create() helper function

2017-09-05 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_shader.c | 114 +--
 1 file changed, 56 insertions(+), 58 deletions(-)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index c0fbdd3d49..de7d9a2752 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -374,42 +374,47 @@ radv_fill_shader_variant(struct radv_device *device,
memcpy(ptr, binary->code, binary->code_size);
 }
 
-struct radv_shader_variant *
-radv_shader_variant_create(struct radv_device *device,
-  struct nir_shader *shader,
-  struct radv_pipeline_layout *layout,
-  const struct ac_shader_variant_key *key,
-  void **code_out,
-  unsigned *code_size_out)
+static struct radv_shader_variant *
+shader_variant_create(struct radv_device *device,
+ struct nir_shader *shader,
+ gl_shader_stage stage,
+ struct ac_nir_compiler_options *options,
+ bool gs_copy_shader,
+ void **code_out,
+ unsigned *code_size_out)
 {
-   struct radv_shader_variant *variant = calloc(1, sizeof(struct radv_shader_variant));
enum radeon_family chip_family = device->physical_device->rad_info.family;
+   bool dump_shaders = device->debug_flags & RADV_DEBUG_DUMP_SHADERS;
+   enum ac_target_machine_options tm_options = 0;
+   struct radv_shader_variant *variant;
+   struct ac_shader_binary binary;
LLVMTargetMachineRef tm;
+
+   variant = calloc(1, sizeof(struct radv_shader_variant));
if (!variant)
return NULL;
 
-   struct ac_nir_compiler_options options = {0};
-   options.layout = layout;
-   if (key)
-   options.key = *key;
+   options->family = chip_family;
+   options->chip_class = device->physical_device->rad_info.chip_class;
 
-   struct ac_shader_binary binary;
-   enum ac_target_machine_options tm_options = 0;
-   options.unsafe_math = !!(device->debug_flags & RADV_DEBUG_UNSAFE_MATH);
-   options.family = chip_family;
-   options.chip_class = device->physical_device->rad_info.chip_class;
-   options.supports_spill = device->llvm_supports_spill;
-   if (options.supports_spill)
+   if (options->supports_spill)
tm_options |= AC_TM_SUPPORTS_SPILL;
if (device->instance->perftest_flags & RADV_PERFTEST_SISCHED)
tm_options |= AC_TM_SISCHED;
tm = ac_create_target_machine(chip_family, tm_options);
-   ac_compile_nir_shader(tm, &binary, &variant->config,
- &variant->info, shader, &options,
- device->debug_flags & RADV_DEBUG_DUMP_SHADERS);
+
+   if (gs_copy_shader) {
+   ac_create_gs_copy_shader(tm, shader, &binary, &variant->config,
+&variant->info, options, dump_shaders);
+   } else {
+   ac_compile_nir_shader(tm, &binary, &variant->config,
+ &variant->info, shader, options,
+ dump_shaders);
+   }
+
LLVMDisposeTargetMachine(tm);
 
-   radv_fill_shader_variant(device, variant, &binary, shader->stage);
+   radv_fill_shader_variant(device, variant, &binary, stage);
 
if (code_out) {
*code_out = binary.code;
@@ -426,46 +431,39 @@ radv_shader_variant_create(struct radv_device *device,
 }
 
 struct radv_shader_variant *
-radv_create_gs_copy_shader(struct radv_device *device, struct nir_shader *nir,
-  void **code_out, unsigned *code_size_out,
-  bool multiview)
+radv_shader_variant_create(struct radv_device *device,
+  struct nir_shader *shader,
+  struct radv_pipeline_layout *layout,
+  const struct ac_shader_variant_key *key,
+  void **code_out,
+  unsigned *code_size_out)
 {
-   struct radv_shader_variant *variant = calloc(1, sizeof(struct radv_shader_variant));
-   enum radeon_family chip_family = device->physical_device->rad_info.family;
-   LLVMTargetMachineRef tm;
-   if (!variant)
-   return NULL;
+   struct ac_nir_compiler_options options = {0};
 
+   options.layout = layout;
+   if (key)
+   options.key = *key;
+
+   options.unsafe_math = !!(device->debug_flags & RADV_DEBUG_UNSAFE_MATH);
+   options.supports_spill = device->llvm_supports_spill;
+
+   return shader_variant_create(device, shader, shader->stage,
+&options, false, code_out, code_size_out);
+}
+
+struct radv_shader_variant *
+radv_create_gs_copy_shader(struct radv_device *device,
+  struct nir_shader *shader,
+  void 

[Mesa-dev] [PATCH 1/9] radv: move shaders related code to radv_shader.c

2017-09-05 Thread Samuel Pitoiset
Reduce size of radv_pipeline.c and improve code isolation. More
code can probably be moved but it's a start.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/Makefile.sources  |   1 +
 src/amd/vulkan/radv_cmd_buffer.c |  28 +-
 src/amd/vulkan/radv_debug.c  |   1 +
 src/amd/vulkan/radv_device.c |   1 +
 src/amd/vulkan/radv_meta.h   |   1 +
 src/amd/vulkan/radv_pipeline.c   | 458 +-
 src/amd/vulkan/radv_pipeline_cache.c |   1 +
 src/amd/vulkan/radv_private.h|  43 +--
 src/amd/vulkan/radv_shader.c | 526 +++
 src/amd/vulkan/radv_shader.h | 104 +++
 src/amd/vulkan/si_cmd_buffer.c   |   1 +
 11 files changed, 642 insertions(+), 523 deletions(-)
 create mode 100644 src/amd/vulkan/radv_shader.c
 create mode 100644 src/amd/vulkan/radv_shader.h

diff --git a/src/amd/vulkan/Makefile.sources b/src/amd/vulkan/Makefile.sources
index 96399a246e..9489219f5b 100644
--- a/src/amd/vulkan/Makefile.sources
+++ b/src/amd/vulkan/Makefile.sources
@@ -58,6 +58,7 @@ VULKAN_FILES := \
radv_pipeline_cache.c \
radv_private.h \
radv_radeon_winsys.h \
+   radv_shader.c \
radv_query.c \
radv_util.c \
radv_util.h \
diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index 4766b115dc..cc11f272e8 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -27,6 +27,7 @@
 
 #include "radv_private.h"
 #include "radv_radeon_winsys.h"
+#include "radv_shader.h"
 #include "radv_cs.h"
 #include "sid.h"
 #include "gfx9d.h"
@@ -400,33 +401,6 @@ static unsigned radv_pack_float_12p4(float x)
   x >= 4096 ? 0x : x * 16;
 }
 
-uint32_t
-radv_shader_stage_to_user_data_0(gl_shader_stage stage, bool has_gs, bool has_tess)
-{
-   switch (stage) {
-   case MESA_SHADER_FRAGMENT:
-   return R_00B030_SPI_SHADER_USER_DATA_PS_0;
-   case MESA_SHADER_VERTEX:
-   if (has_tess)
-   return R_00B530_SPI_SHADER_USER_DATA_LS_0;
-   else
-   return has_gs ? R_00B330_SPI_SHADER_USER_DATA_ES_0 : R_00B130_SPI_SHADER_USER_DATA_VS_0;
-   case MESA_SHADER_GEOMETRY:
-   return R_00B230_SPI_SHADER_USER_DATA_GS_0;
-   case MESA_SHADER_COMPUTE:
-   return R_00B900_COMPUTE_USER_DATA_0;
-   case MESA_SHADER_TESS_CTRL:
-   return R_00B430_SPI_SHADER_USER_DATA_HS_0;
-   case MESA_SHADER_TESS_EVAL:
-   if (has_gs)
-   return R_00B330_SPI_SHADER_USER_DATA_ES_0;
-   else
-   return R_00B130_SPI_SHADER_USER_DATA_VS_0;
-   default:
-   unreachable("unknown shader");
-   }
-}
-
 struct ac_userdata_info *
 radv_lookup_user_sgpr(struct radv_pipeline *pipeline,
  gl_shader_stage stage,
diff --git a/src/amd/vulkan/radv_debug.c b/src/amd/vulkan/radv_debug.c
index 949eeea2f3..a1c0a61997 100644
--- a/src/amd/vulkan/radv_debug.c
+++ b/src/amd/vulkan/radv_debug.c
@@ -30,6 +30,7 @@
 
 #include "ac_debug.h"
 #include "radv_debug.h"
+#include "radv_shader.h"
 
 bool
 radv_init_trace(struct radv_device *device)
diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 7c218b1478..d146db43e2 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -31,6 +31,7 @@
 #include 
 #include "radv_debug.h"
 #include "radv_private.h"
+#include "radv_shader.h"
 #include "radv_cs.h"
 #include "util/disk_cache.h"
 #include "util/strtod.h"
diff --git a/src/amd/vulkan/radv_meta.h b/src/amd/vulkan/radv_meta.h
index adc889bf4e..d84d8cb68c 100644
--- a/src/amd/vulkan/radv_meta.h
+++ b/src/amd/vulkan/radv_meta.h
@@ -27,6 +27,7 @@
 #define RADV_META_H
 
 #include "radv_private.h"
+#include "radv_shader.h"
 
 #ifdef __cplusplus
 extern "C" {
diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index f2d1b491b7..7e19af8c3f 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -29,6 +29,7 @@
 #include "util/u_atomic.h"
 #include "radv_debug.h"
 #include "radv_private.h"
+#include "radv_shader.h"
 #include "nir/nir.h"
 #include "nir/nir_builder.h"
 #include "spirv/nir_spirv.h"
@@ -46,73 +47,6 @@
 #include "util/debug.h"
 #include "ac_exp_param.h"
 
-void radv_shader_variant_destroy(struct radv_device *device,
- struct radv_shader_variant *variant);
-
-static const struct nir_shader_compiler_options nir_options = {
-   .vertex_id_zero_based = true,
-   .lower_scmp = true,
-   .lower_flrp32 = true,
-   .lower_fsat = true,
-   .lower_fdiv = true,
-   .lower_sub = true,
-   .lower_pack_snorm_2x16 = true,
-   .lower_pack_snorm_4x8 = true,
-   .lower_pack_unorm_2x16 = true,
-   .lower_pack_unorm_4x8 = true,
-   .lower_unpack_snorm_2x16 = true,
- 

[Mesa-dev] [PATCH 2/9] radv: drop 'dump' parameters from some shader related functions

2017-09-05 Thread Samuel Pitoiset
The device object contains the debug flags.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_pipeline.c | 16 +++-
 src/amd/vulkan/radv_shader.c   | 17 +
 src/amd/vulkan/radv_shader.h   |  8 +++-
 3 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index 7e19af8c3f..7347a0d211 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -166,7 +166,6 @@ radv_pipeline_compile(struct radv_pipeline *pipeline,
nir_shader *nir;
void *code = NULL;
unsigned code_size = 0;
-   bool dump = (pipeline->device->debug_flags & RADV_DEBUG_DUMP_SHADERS);
 
if (module->nir)
_mesa_sha1_compute(module->nir->info.name,
@@ -196,14 +195,14 @@ radv_pipeline_compile(struct radv_pipeline *pipeline,
 
nir = radv_shader_compile_to_nir(pipeline->device,
 module, entrypoint, stage,
-spec_info, dump);
+spec_info);
if (nir == NULL)
return NULL;
 
if (!variant) {
variant = radv_shader_variant_create(pipeline->device, nir,
 layout, key, &code,
-&code_size, dump);
+&code_size);
}
 
if (stage == MESA_SHADER_GEOMETRY && !pipeline->gs_copy_shader) {
@@ -211,7 +210,7 @@ radv_pipeline_compile(struct radv_pipeline *pipeline,
unsigned gs_copy_code_size = 0;
pipeline->gs_copy_shader = radv_create_gs_copy_shader(
pipeline->device, nir, &gs_copy_code,
-   &gs_copy_code_size, dump, key->has_multiview_view_index);
+   &gs_copy_code_size, key->has_multiview_view_index);
 
if (pipeline->gs_copy_shader) {
pipeline->gs_copy_shader =
@@ -276,7 +275,6 @@ radv_tess_pipeline_compile(struct radv_pipeline *pipeline,
unsigned tes_code_size = 0, tcs_code_size = 0;
struct ac_shader_variant_key tes_key;
struct ac_shader_variant_key tcs_key;
-   bool dump = (pipeline->device->debug_flags & RADV_DEBUG_DUMP_SHADERS);
 
tes_key = radv_compute_tes_key(radv_pipeline_has_gs(pipeline),
   
pipeline->shaders[MESA_SHADER_FRAGMENT]->info.fs.prim_id_input);
@@ -314,13 +312,13 @@ radv_tess_pipeline_compile(struct radv_pipeline *pipeline,
 
tes_nir = radv_shader_compile_to_nir(pipeline->device,
 tes_module, tes_entrypoint, 
MESA_SHADER_TESS_EVAL,
-tes_spec_info, dump);
+tes_spec_info);
if (tes_nir == NULL)
return;
 
tcs_nir = radv_shader_compile_to_nir(pipeline->device,
 tcs_module, tcs_entrypoint, 
MESA_SHADER_TESS_CTRL,
-tcs_spec_info, dump);
+tcs_spec_info);
if (tcs_nir == NULL)
return;
 
@@ -329,7 +327,7 @@ radv_tess_pipeline_compile(struct radv_pipeline *pipeline,
 
tes_variant = radv_shader_variant_create(pipeline->device, tes_nir,
 layout, &tes_key, &tes_code,
-&tes_code_size, dump);
+&tes_code_size);
 
tcs_key = radv_compute_tcs_key(tes_nir->info.tess.primitive_mode, 
input_vertices);
if (tcs_module->nir)
@@ -341,7 +339,7 @@ radv_tess_pipeline_compile(struct radv_pipeline *pipeline,
 
tcs_variant = radv_shader_variant_create(pipeline->device, tcs_nir,
 layout, &tcs_key, &tcs_code,
-&tcs_code_size, dump);
+&tcs_code_size);
 
if (!tes_module->nir)
ralloc_free(tes_nir);
diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index 9bb8f1ddf2..c0fbdd3d49 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -150,8 +150,7 @@ radv_shader_compile_to_nir(struct radv_device *device,
   struct radv_shader_module *module,
   const char *entrypoint_name,
   gl_shader_stage stage,
-  const VkSpecializationInfo *spec_info,
-  bool dump)
+  const VkSpecializationInfo *spec_info)
 {
if (strcmp(entrypoint_name, "main") != 0) {
radv_finishme("Multiple shaders per module not really 
supported");
@@ -263,7 +262,7 @@ 
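
As a rough illustration of the refactoring in this patch (the diff is cut off in the archive), the sketch below shows a callee deriving the dump decision from the device it already receives, rather than taking a separate bool. The flag values and function names are assumptions made for the example; the real RADV_DEBUG_* values are not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits, chosen only for this example.
constexpr uint64_t DEBUG_DUMP_SHADERS = 1ull << 0;
constexpr uint64_t DEBUG_UNSAFE_MATH  = 1ull << 1;

struct Device {
    uint64_t debug_flags;
};

// Before: every caller computed `dump` and passed it down.
// After: the callee reads the flag from the device it is given anyway.
inline bool wants_shader_dump(const Device &device)
{
    return (device.debug_flags & DEBUG_DUMP_SHADERS) != 0;
}

// Stand-in for a compile entry point: returns 1 when it would dump, 0 otherwise.
inline int compile_to_nir(const Device &device)
{
    return wants_shader_dump(device) ? 1 : 0;
}
```

The benefit is mostly in the call sites: one fewer argument to thread through, and no risk of two callers computing the flag differently.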

[Mesa-dev] [Bug 102496] Frontbuffer rendering corruption on mesa master

2017-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=102496

--- Comment #2 from Bruce Cherniak  ---
"no animation" may be a better description.  This is exactly what I'm seeing on
llvmpipe, swr, and softpipe.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [PATCH 6/8] swr/rast: SIMD16 FE remove templated immediates workaround

2017-09-05 Thread Tim Rowley
Fixed properly in gcc-compatible fashion.
---
 src/gallium/drivers/swr/rasterizer/core/binner.cpp | 110 -
 1 file changed, 20 insertions(+), 90 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp 
b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index e09ff7a..832c47d 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -404,35 +404,6 @@ void ProcessUserClipDist(PA_STATE& pa, uint32_t primIndex, 
uint8_t clipDistMask,
 }
 }
 
-// WA linux compiler issue with SIMDLIB and shift immediates
-#define SIMD_WA_SXXI_EPI32 1
-
-#if SIMD_WA_SXXI_EPI32
-template
-simdscalari simd_wa_slli_epi32(simdscalari a)
-{
-return SIMD256::slli_epi32(a);
-}
-
-template
-simd16scalari simd_wa_slli_epi32(simd16scalari a)
-{
-return SIMD512::slli_epi32(a);
-}
-
-template
-simdscalari simd_wa_srai_epi32(simdscalari a)
-{
-return SIMD256::srai_epi32(a);
-}
-
-template
-simd16scalari simd_wa_srai_epi32(simd16scalari a)
-{
-return SIMD512::srai_epi32(a);
-}
-
-#endif
 INLINE
 void TransposeVertices(simd4scalar()[8], const simdscalar , const 
simdscalar , const simdscalar )
 {
@@ -804,17 +775,10 @@ endBinTriangles:
 }
 
 // Convert triangle bbox to macrotile units.
-#if SIMD_WA_SXXI_EPI32
-bbox.xmin = 
simd_wa_srai_epi32(bbox.xmin);
-bbox.ymin = 
simd_wa_srai_epi32(bbox.ymin);
-bbox.xmax = 
simd_wa_srai_epi32(bbox.xmax);
-bbox.ymax = 
simd_wa_srai_epi32(bbox.ymax);
-#else
-bbox.xmin = 
SIMD_T::srai_epi32(bbox.xmin);
-bbox.ymin = 
SIMD_T::srai_epi32(bbox.ymin);
-bbox.xmax = 
SIMD_T::srai_epi32(bbox.xmax);
-bbox.ymax = 
SIMD_T::srai_epi32(bbox.ymax);
-#endif
+bbox.xmin = SIMD_T::template 
srai_epi32(bbox.xmin);
+bbox.ymin = SIMD_T::template 
srai_epi32(bbox.ymin);
+bbox.xmax = SIMD_T::template 
srai_epi32(bbox.xmax);
+bbox.ymax = SIMD_T::template 
srai_epi32(bbox.ymax);
 
 OSALIGNSIMD16(uint32_t) aMTLeft[SIMD_WIDTH], aMTRight[SIMD_WIDTH], 
aMTTop[SIMD_WIDTH], aMTBottom[SIMD_WIDTH];
 
@@ -1034,13 +998,8 @@ void BinPostSetupPointsImpl(
 primMask &= ~SIMD_T::movemask_ps(SIMD_T::castsi_ps(vYi));
 
 // compute macro tile coordinates 
-#if SIMD_WA_SXXI_EPI32
-typename SIMD_T::Integer macroX = 
simd_wa_srai_epi32(vXi);
-typename SIMD_T::Integer macroY = 
simd_wa_srai_epi32(vYi);
-#else
-typename SIMD_T::Integer macroX = 
SIMD_T::srai_epi32(vXi);
-typename SIMD_T::Integer macroY = 
SIMD_T::srai_epi32(vYi);
-#endif
+typename SIMD_T::Integer macroX = SIMD_T::template 
srai_epi32(vXi);
+typename SIMD_T::Integer macroY = SIMD_T::template 
srai_epi32(vYi);
 
 OSALIGNSIMD16(uint32_t) aMacroX[SIMD_WIDTH], aMacroY[SIMD_WIDTH];
 
@@ -1048,30 +1007,15 @@ void BinPostSetupPointsImpl(
 SIMD_T::store_si(reinterpret_cast(aMacroY), macroY);
 
 // compute raster tile coordinates
-#if SIMD_WA_SXXI_EPI32
-typename SIMD_T::Integer rasterX = 
simd_wa_srai_epi32(vXi);
-typename SIMD_T::Integer rasterY = 
simd_wa_srai_epi32(vYi);
-#else
-typename SIMD_T::Integer rasterX = 
SIMD_T::srai_epi32(vXi);
-typename SIMD_T::Integer rasterY = 
SIMD_T::srai_epi32(vYi);
-#endif
+typename SIMD_T::Integer rasterX = SIMD_T::template 
srai_epi32(vXi);
+typename SIMD_T::Integer rasterY = SIMD_T::template 
srai_epi32(vYi);
 
 // compute raster tile relative x,y for coverage mask
-#if SIMD_WA_SXXI_EPI32
-typename SIMD_T::Integer tileAlignedX = 
simd_wa_slli_epi32(rasterX);
-typename SIMD_T::Integer tileAlignedY = 
simd_wa_slli_epi32(rasterY);
-#else
-typename SIMD_T::Integer tileAlignedX = 
SIMD_T::slli_epi32(rasterX);
-typename SIMD_T::Integer tileAlignedY = 
SIMD_T::slli_epi32(rasterY);
-#endif
+typename SIMD_T::Integer tileAlignedX = SIMD_T::template 
slli_epi32(rasterX);
+typename SIMD_T::Integer tileAlignedY = SIMD_T::template 
slli_epi32(rasterY);
 
-#if SIMD_WA_SXXI_EPI32
-typename SIMD_T::Integer tileRelativeX = 
SIMD_T::sub_epi32(simd_wa_srai_epi32(vXi), tileAlignedX);
-typename SIMD_T::Integer tileRelativeY = 
SIMD_T::sub_epi32(simd_wa_srai_epi32(vYi), tileAlignedY);
-#else
-typename SIMD_T::Integer tileRelativeX = 
SIMD_T::sub_epi32(SIMD_T::srai_epi32(vXi), tileAlignedX);
-typename SIMD_T::Integer tileRelativeY = 
SIMD_T::sub_epi32(SIMD_T::srai_epi32(vYi), tileAlignedY);
-#endif
+typename SIMD_T::Integer tileRelativeX = 
SIMD_T::sub_epi32(SIMD_T::template srai_epi32(vXi), 
tileAlignedX);
+typename SIMD_T::Integer tileRelativeY = 
SIMD_T::sub_epi32(SIMD_T::template srai_epi32(vYi), 
tileAlignedY);
 
 OSALIGNSIMD16(uint32_t) aTileRelativeX[SIMD_WIDTH];
 OSALIGNSIMD16(uint32_t) aTileRelativeY[SIMD_WIDTH];
@@ -1223,17 +1167,10 @@ void BinPostSetupPointsImpl(
 primMask = primMask & ~maskOutsideScissor;
 
 // Convert 
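
The "gcc-compatible" fix in this patch appears to be the `template` disambiguator on dependent names. Below is a small self-contained sketch of that language rule, using a hypothetical one-lane traits type (ScalarSimd) in place of the real SIMD256/SIMD512 wrappers; the shift amounts are arbitrary since the archive stripped the real template arguments.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for a simdlib traits type.
struct ScalarSimd {
    using Integer = int32_t;

    // Shift right by a compile-time immediate (non-negative inputs only in this sketch).
    template <int ImmT>
    static Integer srai_epi32(Integer a) { return a >> ImmT; }

    // Shift left by a compile-time immediate.
    template <int ImmT>
    static Integer slli_epi32(Integer a)
    {
        return static_cast<Integer>(static_cast<uint32_t>(a) << ImmT);
    }
};

// Inside a template, SIMD_T::srai_epi32 is a dependent name. Without the
// `template` keyword the `<` would be parsed as less-than, which gcc and
// clang reject; some other compilers have been more lenient.
template <typename SIMD_T, int Shift>
typename SIMD_T::Integer to_tile(typename SIMD_T::Integer x)
{
    return SIMD_T::template srai_epi32<Shift>(x);
}

// Round a coordinate down to the start of its tile.
template <typename SIMD_T, int Shift>
typename SIMD_T::Integer tile_aligned(typename SIMD_T::Integer x)
{
    return SIMD_T::template slli_epi32<Shift>(SIMD_T::template srai_epi32<Shift>(x));
}
```

With this in place the simd_wa_* wrapper functions and the #if/#else pairs around each call become unnecessary, which matches the 90 deleted lines in the diffstat.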

[Mesa-dev] [PATCH 7/8] swr/rast: Remove use of C++14 template variable

2017-09-05 Thread Tim Rowley
SWR rasterizer must remain C++11 compliant.
---
 src/gallium/drivers/swr/rasterizer/core/binner.cpp |  6 +++---
 src/gallium/drivers/swr/rasterizer/core/binner.h   | 14 +++---
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.cpp 
b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
index 832c47d..01c2f8f 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.cpp
@@ -502,7 +502,7 @@ void SIMDCALL BinTrianglesImpl(
 }
 
 // Adjust for pixel center location
-typename SIMD_T::Float offset = g_pixelOffsets<SIMD_T>[rastState.pixelLocation];
+typename SIMD_T::Float offset = SwrPixelOffsets<SIMD_T>::GetOffset(rastState.pixelLocation);
 
 tri[0].x = SIMD_T::add_ps(tri[0].x, offset);
 tri[0].y = SIMD_T::add_ps(tri[0].y, offset);
@@ -1332,7 +1332,7 @@ void BinPointsImpl(
 }
 }
 
-typename SIMD_T::Float offset = g_pixelOffsets<SIMD_T>[rastState.pixelLocation];
+typename SIMD_T::Float offset = SwrPixelOffsets<SIMD_T>::GetOffset(rastState.pixelLocation);
 
 prim[0].x = SIMD_T::add_ps(prim[0].x, offset);
 prim[0].y = SIMD_T::add_ps(prim[0].y, offset);
@@ -1666,7 +1666,7 @@ void SIMDCALL BinLinesImpl(
 }
 
 // adjust for pixel center location
-typename SIMD_T::Float offset = g_pixelOffsets<SIMD_T>[rastState.pixelLocation];
+typename SIMD_T::Float offset = SwrPixelOffsets<SIMD_T>::GetOffset(rastState.pixelLocation);
 
 prim[0].x = SIMD_T::add_ps(prim[0].x, offset);
 prim[0].y = SIMD_T::add_ps(prim[0].y, offset);
diff --git a/src/gallium/drivers/swr/rasterizer/core/binner.h 
b/src/gallium/drivers/swr/rasterizer/core/binner.h
index e842aa6..97e113f 100644
--- a/src/gallium/drivers/swr/rasterizer/core/binner.h
+++ b/src/gallium/drivers/swr/rasterizer/core/binner.h
@@ -31,11 +31,19 @@
 //
 /// @brief Offsets added to post-viewport vertex positions based on
 /// raster state.
+///
+/// Can't use templated variable because we must stick with C++11 features.
+/// Template variables were introduced with C++14
 template <typename SIMD_T>
-static const typename SIMD_T::Float g_pixelOffsets[SWR_PIXEL_LOCATION_UL + 1] =
+struct SwrPixelOffsets
 {
-SIMD_T::set1_ps(0.0f),  // SWR_PIXEL_LOCATION_CENTER
-SIMD_T::set1_ps(0.5f),  // SWR_PIXEL_LOCATION_UL
+public:
+INLINE static typename SIMD_T::Float GetOffset(uint32_t loc)
+{
+SWR_ASSERT(loc <= 1);
+
+return SIMD_T::set1_ps(loc ? 0.5f : 0.0f);
+}
 };
 
 //
-- 
2.7.4
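
A compact sketch of the pattern this patch uses, for anyone who has not run into it: C++14 variable templates are replaced with a class template that exposes a static member function, which is valid C++11. ScalarTraits is a hypothetical one-lane stand-in for the real SIMD traits types.

```cpp
#include <cassert>

// Hypothetical stand-in for SIMD256/SIMD512: one float instead of a vector.
struct ScalarTraits {
    using Float = float;
    static Float set1_ps(float v) { return v; }
};

// C++14 only (variable template):
//   template <typename SIMD_T>
//   static const typename SIMD_T::Float g_offsets[2] = { ... };
//
// C++11-compatible equivalent: a class template with a static function.
template <typename SIMD_T>
struct PixelOffsets {
    static typename SIMD_T::Float GetOffset(unsigned loc)
    {
        assert(loc <= 1);  // only two locations exist in this sketch
        return SIMD_T::set1_ps(loc ? 0.5f : 0.0f);
    }
};
```

Call sites change from indexing a table to calling PixelOffsets<SIMD_T>::GetOffset(loc). The value is now built on each call rather than read from static storage, which for a broadcast of a constant is likely negligible.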



[Mesa-dev] [PATCH 0/8] swr: update rasterizer

2017-09-05 Thread Tim Rowley
Highlight is starting to unify the simd/simd16 code, removing lots of
temporary code duplication.

No piglit or vtk test regressions.

Tim Rowley (8):
  swr/rast: Allow gather of floats from fetch shader with 2-4GB offsets
  swr: set caps for VB 4-byte alignment
  swr/rast: Removed some trailing whitespace caught during review
  swr/rast: FE/Binner - unify SIMD8/16 functions using simdlib types
  swr/rast: SIMD16 PA - rename Assemble_simd16 to Assemble
  swr/rast: SIMD16 FE remove templated immediates workaround
  swr/rast: Remove use of C++14 template variable
  swr/rast: FE/Clipper - unify SIMD8/16 functions using simdlib types

 .../swr/rasterizer/codegen/gen_llvm_ir_macros.py   |1 +
 .../codegen/templates/gen_ar_eventhandlerfile.hpp  |4 +-
 src/gallium/drivers/swr/rasterizer/core/binner.cpp | 2312 ++--
 src/gallium/drivers/swr/rasterizer/core/binner.h   |  192 +-
 src/gallium/drivers/swr/rasterizer/core/clip.cpp   |   16 +-
 src/gallium/drivers/swr/rasterizer/core/clip.h | 1654 --
 .../drivers/swr/rasterizer/core/conservativeRast.h |1 +
 src/gallium/drivers/swr/rasterizer/core/fifo.hpp   |4 +-
 .../drivers/swr/rasterizer/core/frontend.cpp   |6 +-
 src/gallium/drivers/swr/rasterizer/core/pa.h   |   20 +-
 src/gallium/drivers/swr/rasterizer/core/state.h|7 +
 src/gallium/drivers/swr/rasterizer/core/utils.h|8 +
 .../drivers/swr/rasterizer/jitter/fetch_jit.cpp|7 +-
 src/gallium/drivers/swr/swr_screen.cpp |9 +-
 14 files changed, 1193 insertions(+), 3048 deletions(-)

-- 
2.7.4



[Mesa-dev] [PATCH 5/8] swr/rast: SIMD16 PA - rename Assemble_simd16 to Assemble

2017-09-05 Thread Tim Rowley
For consistency and to support overloading.
---
 src/gallium/drivers/swr/rasterizer/core/clip.h | 18 +-
 .../drivers/swr/rasterizer/core/frontend.cpp   |  6 +++---
 src/gallium/drivers/swr/rasterizer/core/pa.h   | 22 +++---
 3 files changed, 15 insertions(+), 31 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.h 
b/src/gallium/drivers/swr/rasterizer/core/clip.h
index ffc69c4..5238284 100644
--- a/src/gallium/drivers/swr/rasterizer/core/clip.h
+++ b/src/gallium/drivers/swr/rasterizer/core/clip.h
@@ -399,8 +399,8 @@ public:
 simd16vector vClipCullDistLo[3];
 simd16vector vClipCullDistHi[3];
 
-pa.Assemble_simd16(VERTEX_CLIPCULL_DIST_LO_SLOT, vClipCullDistLo);
-pa.Assemble_simd16(VERTEX_CLIPCULL_DIST_HI_SLOT, vClipCullDistHi);
+pa.Assemble(VERTEX_CLIPCULL_DIST_LO_SLOT, vClipCullDistLo);
+pa.Assemble(VERTEX_CLIPCULL_DIST_HI_SLOT, vClipCullDistHi);
 
 DWORD index;
 while (_BitScanForward(&index, cullMask))
@@ -680,7 +680,7 @@ public:
 {
 #if USE_SIMD16_FRONTEND
 simd16vector attrib_simd16[NumVertsPerPrim];
-bool assemble = 
clipPa.Assemble_simd16(VERTEX_POSITION_SLOT, attrib_simd16);
+bool assemble = clipPa.Assemble(VERTEX_POSITION_SLOT, 
attrib_simd16);
 
 if (assemble)
 {
@@ -731,7 +731,7 @@ public:
 
 // assemble pos
 simd16vector tmpVector[NumVertsPerPrim];
-pa.Assemble_simd16(VERTEX_POSITION_SLOT, tmpVector);
+pa.Assemble(VERTEX_POSITION_SLOT, tmpVector);
 for (uint32_t i = 0; i < NumVertsPerPrim; ++i)
 {
 vertices[i].attrib[VERTEX_POSITION_SLOT] = tmpVector[i];
@@ -748,7 +748,7 @@ public:
 maxSlot = std::max(maxSlot, mapSlot);
 uint32_t inputSlot = backendState.vertexAttribOffset + mapSlot;
 
-pa.Assemble_simd16(inputSlot, tmpVector);
+pa.Assemble(inputSlot, tmpVector);
 
 // if constant interpolation enabled for this attribute, assign 
the provoking
 // vertex values to all edges
@@ -771,7 +771,7 @@ public:
 // assemble user clip distances if enabled
 if (this->state.rastState.clipDistanceMask & 0xf)
 {
-pa.Assemble_simd16(VERTEX_CLIPCULL_DIST_LO_SLOT, tmpVector);
+pa.Assemble(VERTEX_CLIPCULL_DIST_LO_SLOT, tmpVector);
 for (uint32_t i = 0; i < NumVertsPerPrim; ++i)
 {
 vertices[i].attrib[VERTEX_CLIPCULL_DIST_LO_SLOT] = 
tmpVector[i];
@@ -780,7 +780,7 @@ public:
 
 if (this->state.rastState.clipDistanceMask & 0xf0)
 {
-pa.Assemble_simd16(VERTEX_CLIPCULL_DIST_HI_SLOT, tmpVector);
+pa.Assemble(VERTEX_CLIPCULL_DIST_HI_SLOT, tmpVector);
 for (uint32_t i = 0; i < NumVertsPerPrim; ++i)
 {
 vertices[i].attrib[VERTEX_CLIPCULL_DIST_HI_SLOT] = 
tmpVector[i];
@@ -919,7 +919,7 @@ public:
 do
 {
 simd16vector attrib[NumVertsPerPrim];
-bool assemble = 
clipPa.Assemble_simd16(VERTEX_POSITION_SLOT, attrib);
+bool assemble = clipPa.Assemble(VERTEX_POSITION_SLOT, 
attrib);
 
 if (assemble)
 {
@@ -1060,7 +1060,7 @@ public:
 if (state.backendState.readViewportArrayIndex)
 {
 simd16vector vpiAttrib[NumVertsPerPrim];
-pa.Assemble_simd16(VERTEX_SGV_SLOT, vpiAttrib);
+pa.Assemble(VERTEX_SGV_SLOT, vpiAttrib);
 
 // OOB indices => forced to zero.
 simd16scalari vpai = 
_simd16_castps_si(vpiAttrib[0][VERTEX_SGV_VAI_COMP]);
diff --git a/src/gallium/drivers/swr/rasterizer/core/frontend.cpp 
b/src/gallium/drivers/swr/rasterizer/core/frontend.cpp
index 406a0e0..f882869 100644
--- a/src/gallium/drivers/swr/rasterizer/core/frontend.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/frontend.cpp
@@ -929,7 +929,7 @@ static void GeometryShaderStage(
 #if USE_SIMD16_FRONTEND
 simd16vector attrib_simd16[3];
 
-bool assemble = 
gsPa.Assemble_simd16(VERTEX_POSITION_SLOT, attrib_simd16);
+bool assemble = gsPa.Assemble(VERTEX_POSITION_SLOT, 
attrib_simd16);
 
 #else
 bool assemble = gsPa.Assemble(VERTEX_POSITION_SLOT, 
attrib);
@@ -1297,7 +1297,7 @@ static void TessellationStages(
 AR_BEGIN(FEPAAssemble, pDC->drawId);
 bool assemble =
 #if USE_SIMD16_FRONTEND
-tessPa.Assemble_simd16(VERTEX_POSITION_SLOT, 
prim_simd16);
+tessPa.Assemble(VERTEX_POSITION_SLOT, prim_simd16);
 #else
 tessPa.Assemble(VERTEX_POSITION_SLOT, prim);
 #endif
@@ -1646,7 +1646,7 @@ void ProcessDraw(
 simd16vector 
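
To illustrate the "support overloading" part of the commit message, here is a minimal sketch with hypothetical types: once both widths share the name Assemble, overload resolution picks the right one from the argument type, so code templated on the vector type can call it without knowing the width.

```cpp
#include <cassert>

// Hypothetical stand-ins for simdvector / simd16vector.
struct simd8vec  { float v[8];  };
struct simd16vec { float v[16]; };

struct Assembler {
    // Same name, two widths; each returns its lane count in this sketch.
    int Assemble(unsigned /*slot*/, simd8vec *)  { return 8;  }
    int Assemble(unsigned /*slot*/, simd16vec *) { return 16; }
};

// Generic caller: the vector type alone selects the overload.
template <typename VecT>
int assemble_width(Assembler &pa, unsigned slot)
{
    VecT verts[3];
    return pa.Assemble(slot, verts);
}
```

This is presumably groundwork for the later patches in the series that unify the SIMD8 and SIMD16 code paths behind a single template.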

[Mesa-dev] [PATCH 8/8] swr/rast: FE/Clipper - unify SIMD8/16 functions using simdlib types

2017-09-05 Thread Tim Rowley
---
 src/gallium/drivers/swr/rasterizer/core/clip.cpp |   16 +-
 src/gallium/drivers/swr/rasterizer/core/clip.h   | 1650 ++
 src/gallium/drivers/swr/rasterizer/core/state.h  |7 +
 3 files changed, 465 insertions(+), 1208 deletions(-)

diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.cpp 
b/src/gallium/drivers/swr/rasterizer/core/clip.cpp
index 4b5512c..a40f077 100644
--- a/src/gallium/drivers/swr/rasterizer/core/clip.cpp
+++ b/src/gallium/drivers/swr/rasterizer/core/clip.cpp
@@ -32,9 +32,9 @@
 #include "core/clip.h"
 
 // Temp storage used by the clipper
-THREAD simdvertex tlsTempVertices[7];
+THREAD SIMDVERTEX_T tlsTempVertices[7];
 #if USE_SIMD16_FRONTEND
-THREAD simd16vertex tlsTempVertices_simd16[7];
+THREAD SIMDVERTEX_T tlsTempVertices_simd16[7];
 #endif
 
 float ComputeInterpFactor(float boundaryCoord0, float boundaryCoord1)
@@ -164,7 +164,7 @@ void ClipTriangles(DRAW_CONTEXT *pDC, PA_STATE& pa, 
uint32_t workerId, simdvecto
 {
 SWR_CONTEXT *pContext = pDC->pContext;
 AR_BEGIN(FEClipTriangles, pDC->drawId);
-Clipper<3> clipper(workerId, pDC);
+Clipper clipper(workerId, pDC);
 clipper.ExecuteStage(pa, prims, primMask, primId);
 AR_END(FEClipTriangles, 1);
 }
@@ -173,7 +173,7 @@ void ClipLines(DRAW_CONTEXT *pDC, PA_STATE& pa, uint32_t 
workerId, simdvector pr
 {
 SWR_CONTEXT *pContext = pDC->pContext;
 AR_BEGIN(FEClipLines, pDC->drawId);
-Clipper<2> clipper(workerId, pDC);
+Clipper clipper(workerId, pDC);
 clipper.ExecuteStage(pa, prims, primMask, primId);
 AR_END(FEClipLines, 1);
 }
@@ -182,7 +182,7 @@ void ClipPoints(DRAW_CONTEXT *pDC, PA_STATE& pa, uint32_t 
workerId, simdvector p
 {
 SWR_CONTEXT *pContext = pDC->pContext;
 AR_BEGIN(FEClipPoints, pDC->drawId);
-Clipper<1> clipper(workerId, pDC);
+Clipper clipper(workerId, pDC);
 clipper.ExecuteStage(pa, prims, primMask, primId);
 AR_END(FEClipPoints, 1);
 }
@@ -195,7 +195,7 @@ void SIMDCALL ClipTriangles_simd16(DRAW_CONTEXT *pDC, 
PA_STATE& pa, uint32_t wor
 
 enum { VERTS_PER_PRIM = 3 };
 
-Clipper clipper(workerId, pDC);
+Clipper clipper(workerId, pDC);
 
 pa.useAlternateOffset = false;
 clipper.ExecuteStage(pa, prims, primMask, primId);
@@ -210,7 +210,7 @@ void SIMDCALL ClipLines_simd16(DRAW_CONTEXT *pDC, PA_STATE& 
pa, uint32_t workerI
 
 enum { VERTS_PER_PRIM = 2 };
 
-Clipper clipper(workerId, pDC);
+Clipper clipper(workerId, pDC);
 
 pa.useAlternateOffset = false;
 clipper.ExecuteStage(pa, prims, primMask, primId);
@@ -225,7 +225,7 @@ void SIMDCALL ClipPoints_simd16(DRAW_CONTEXT *pDC, 
PA_STATE& pa, uint32_t worker
 
 enum { VERTS_PER_PRIM = 1 };
 
-Clipper clipper(workerId, pDC);
+Clipper clipper(workerId, pDC);
 
 pa.useAlternateOffset = false;
 clipper.ExecuteStage(pa, prims, primMask, primId);
diff --git a/src/gallium/drivers/swr/rasterizer/core/clip.h 
b/src/gallium/drivers/swr/rasterizer/core/clip.h
index 5238284..d7b559b 100644
--- a/src/gallium/drivers/swr/rasterizer/core/clip.h
+++ b/src/gallium/drivers/swr/rasterizer/core/clip.h
@@ -33,9 +33,9 @@
 #include "rdtsc_core.h"
 
 // Temp storage used by the clipper
-extern THREAD simdvertex tlsTempVertices[7];
+extern THREAD SIMDVERTEX_T tlsTempVertices[7];
 #if USE_SIMD16_FRONTEND
-extern THREAD simd16vertex tlsTempVertices_simd16[7];
+extern THREAD SIMDVERTEX_T tlsTempVertices_simd16[7];
 #endif
 
 enum SWR_CLIPCODES
@@ -61,29 +61,29 @@ enum SWR_CLIPCODES
 
 #define GUARDBAND_CLIP_MASK 
(FRUSTUM_NEAR|FRUSTUM_FAR|GUARDBAND_LEFT|GUARDBAND_TOP|GUARDBAND_RIGHT|GUARDBAND_BOTTOM|NEGW)
 
-INLINE
-void ComputeClipCodes(const API_STATE& state, const simdvector& vertex, 
simdscalar& clipCodes, simdscalari const )
+template
+void ComputeClipCodes(const API_STATE , const typename SIMD_T::Vec4 
, typename SIMD_T::Float , typename SIMD_T::Integer const 
)
 {
-clipCodes = _simd_setzero_ps();
+clipCodes = SIMD_T::setzero_ps();
 
 // -w
-simdscalar vNegW = _simd_mul_ps(vertex.w, _simd_set1_ps(-1.0f));
+typename SIMD_T::Float vNegW = 
SIMD_T::mul_ps(vertex.w,SIMD_T::set1_ps(-1.0f));
 
 // FRUSTUM_LEFT
-simdscalar vRes = _simd_cmplt_ps(vertex.x, vNegW);
-clipCodes = _simd_and_ps(vRes, 
_simd_castsi_ps(_simd_set1_epi32(FRUSTUM_LEFT)));
+typename SIMD_T::Float vRes = SIMD_T::cmplt_ps(vertex.x, vNegW);
+clipCodes = SIMD_T::and_ps(vRes, 
SIMD_T::castsi_ps(SIMD_T::set1_epi32(FRUSTUM_LEFT)));
 
 // FRUSTUM_TOP
-vRes = _simd_cmplt_ps(vertex.y, vNegW);
-clipCodes = _simd_or_ps(clipCodes, _simd_and_ps(vRes, 
_simd_castsi_ps(_simd_set1_epi32(FRUSTUM_TOP))));
+vRes = SIMD_T::cmplt_ps(vertex.y, vNegW);
+clipCodes = SIMD_T::or_ps(clipCodes, SIMD_T::and_ps(vRes, 
SIMD_T::castsi_ps(SIMD_T::set1_epi32(FRUSTUM_TOP))));
 
 // FRUSTUM_RIGHT
-vRes = _simd_cmpgt_ps(vertex.x, vertex.w);
- 
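
The diff is truncated here, so as a rough guide to what the unification looks like, below is a scalar sketch of a clip-code routine written once against a traits type. Scalar1, the bit values, and the function name are all assumptions for illustration; only the three comparisons visible in the diff (x < -w, y < -w, x > w) are reproduced.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit values; the real SWR_CLIPCODES enum is not shown in full.
constexpr uint32_t CLIP_LEFT  = 0x01;
constexpr uint32_t CLIP_TOP   = 0x02;
constexpr uint32_t CLIP_RIGHT = 0x04;

// Hypothetical one-lane "SIMD" traits type.
struct Scalar1 {
    using Float = float;
    using Integer = uint32_t;
    struct Vec4 { Float x, y, z, w; };
    // Comparisons return an all-ones or all-zeros mask, like the SIMD versions.
    static Integer cmplt(Float a, Float b) { return a < b ? ~0u : 0u; }
    static Integer cmpgt(Float a, Float b) { return a > b ? ~0u : 0u; }
};

// Written once; instantiating with a wider traits type would give the
// SIMD8 or SIMD16 version without duplicating the body.
template <typename SIMD_T>
typename SIMD_T::Integer clip_codes(const typename SIMD_T::Vec4 &v)
{
    typename SIMD_T::Float negW = -v.w;
    typename SIMD_T::Integer codes = 0;
    codes |= SIMD_T::cmplt(v.x, negW) & CLIP_LEFT;
    codes |= SIMD_T::cmplt(v.y, negW) & CLIP_TOP;
    codes |= SIMD_T::cmpgt(v.x, v.w) & CLIP_RIGHT;
    return codes;
}
```

The diffstat for clip.h (465 insertions, 1208 deletions) gives a sense of how much duplicated code this kind of templating removes.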

[Mesa-dev] [PATCH 2/8] swr: set caps for VB 4-byte alignment

2017-09-05 Thread Tim Rowley
Needed to compensate for change to fetch jit requiring
alignment.

Fixes regressions in piglit: vertex-buffer-offsets and about
another hundred of the vs-input*byte* tests.
---
 src/gallium/drivers/swr/swr_screen.cpp | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index cc8d995..85bf765 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -263,6 +263,12 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
case PIPE_CAP_FAKE_SW_MSAA:
   return (swr_screen(screen)->msaa_max_count > 1) ? 0 : 1;
 
+   /* fetch jit change for 2-4GB buffers requires alignment */
+   case PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY:
+   case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
+   case PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY:
+  return 1;
+
   /* unsupported features */
case PIPE_CAP_ANISOTROPIC_FILTER:
case PIPE_CAP_TEXTURE_BORDER_COLOR_QUIRK:
@@ -274,9 +280,6 @@ swr_get_param(struct pipe_screen *screen, enum pipe_cap 
param)
case PIPE_CAP_COMPUTE:
case PIPE_CAP_TGSI_VS_LAYER_VIEWPORT:
case PIPE_CAP_TGSI_CAN_COMPACT_CONSTANTS:
-   case PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY:
-   case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
-   case PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY:
case PIPE_CAP_TGSI_TEXCOORD:
case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
-- 
2.7.4
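
For readers unfamiliar with how these caps are used, here is a minimal sketch, not the swr or gallium code. The enum, the function names and the layout check are hypothetical stand-ins; only the idea (the driver answers 1 for the three alignment caps, so callers must keep offsets and strides 4-byte aligned) comes from the patch above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three PIPE_CAP_*_4BYTE_ALIGNED_ONLY caps
 * plus one unrelated, unsupported cap. */
enum demo_cap {
    DEMO_CAP_VB_OFFSET_4BYTE_ALIGNED_ONLY,
    DEMO_CAP_VB_STRIDE_4BYTE_ALIGNED_ONLY,
    DEMO_CAP_VE_SRC_OFFSET_4BYTE_ALIGNED_ONLY,
    DEMO_CAP_ANISOTROPIC_FILTER
};

/* Mirrors the shape of the patched switch: the alignment caps return 1,
 * everything else in this toy enum is reported as unsupported. */
static int demo_get_param(enum demo_cap cap)
{
    switch (cap) {
    case DEMO_CAP_VB_OFFSET_4BYTE_ALIGNED_ONLY:
    case DEMO_CAP_VB_STRIDE_4BYTE_ALIGNED_ONLY:
    case DEMO_CAP_VE_SRC_OFFSET_4BYTE_ALIGNED_ONLY:
        return 1;
    default:
        return 0;
    }
}

static bool demo_is_4byte_aligned(uint32_t value)
{
    return (value & 3u) == 0;
}

/* A caller-side check: each value only has to be aligned when the
 * corresponding cap says the driver requires it. */
static bool demo_layout_ok(uint32_t buffer_offset, uint32_t stride,
                           uint32_t src_offset)
{
    if (demo_get_param(DEMO_CAP_VB_OFFSET_4BYTE_ALIGNED_ONLY) &&
        !demo_is_4byte_aligned(buffer_offset))
        return false;
    if (demo_get_param(DEMO_CAP_VB_STRIDE_4BYTE_ALIGNED_ONLY) &&
        !demo_is_4byte_aligned(stride))
        return false;
    if (demo_get_param(DEMO_CAP_VE_SRC_OFFSET_4BYTE_ALIGNED_ONLY) &&
        !demo_is_4byte_aligned(src_offset))
        return false;
    return true;
}
```

In this sketch a layout that fails the check would have to be repacked by the caller before it reaches the driver; how the real state tracker does that is outside what the patch shows.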



[Mesa-dev] [PATCH 1/8] swr/rast: Allow gather of floats from fetch shader with 2-4GB offsets

2017-09-05 Thread Tim Rowley
---
 src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py | 1 +
 src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp  | 7 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py 
b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
index 2ed2b2f..025d38a 100644
--- a/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
+++ b/src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_ir_macros.py
@@ -45,6 +45,7 @@ intrinsics = [
 ['VGATHERPD', 'x86_avx2_gather_d_pd_256', ['src', 'pBase', 'indices', 
'mask', 'scale']],
 ['VGATHERPS', 'x86_avx2_gather_d_ps_256', ['src', 'pBase', 'indices', 
'mask', 'scale']],
 ['VGATHERDD', 'x86_avx2_gather_d_d_256', ['src', 'pBase', 'indices', 
'mask', 'scale']],
+['VPSRLI', 'x86_avx2_psrli_d', ['src', 'imm']],
 ['VSQRTPS', 'x86_avx_sqrt_ps_256', ['a']],
 ['VRSQRTPS', 'x86_avx_rsqrt_ps_256', ['a']],
 ['VRCPPS', 'x86_avx_rcp_ps_256', ['a']],
diff --git a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp 
b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
index dcfe897..761c58c 100644
--- a/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
+++ b/src/gallium/drivers/swr/rasterizer/jitter/fetch_jit.cpp
@@ -1005,7 +1005,12 @@ void FetchJit::JitGatherVertices(const 
FETCH_COMPILE_STATE ,
 Value *vMask = vGatherMask;
 
 // Gather a SIMD of vertices
-vVertexElements[currentVertexElement++] = 
GATHERPS(gatherSrc, pStreamBase, vOffsets, vMask, C((char)1));
+// APIs allow a 4GB range for offsets
+// However, GATHERPS uses signed 32-bit 
offsets, so only a 2GB range :(
+// But, we know that elements must be aligned 
for FETCH. :)
+// Right shift the offset by a bit and then 
scale by 2 to remove the sign extension.
+Value* vShiftedOffsets = VPSRLI(vOffsets, 
C(1));
+vVertexElements[currentVertexElement++] = 
GATHERPS(gatherSrc, pStreamBase, vShiftedOffsets, vMask, C((char)2));
 }
 else
 {
-- 
2.7.4
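
The shift-and-rescale trick in the comment above can be checked with plain integer arithmetic. The following is a scalar sketch under stated assumptions, not the JIT code: the helper names are made up, and one gather lane is modelled as base + sign_extend(index) * scale.

```c
#include <stdint.h>

/* Sign-extend a 32-bit pattern to 64 bits, which is how a gather with
 * signed 32-bit indices interprets each index. */
static int64_t demo_sext32(uint32_t bits)
{
    return (bits & 0x80000000u) ? (int64_t)bits - 4294967296LL
                                : (int64_t)bits;
}

/* Effective address of one modelled gather lane. */
static uint64_t demo_lane_address(uint64_t base, uint32_t index,
                                  uint32_t scale)
{
    return base + (uint64_t)(demo_sext32(index) * (int64_t)scale);
}

/* Old behaviour: offset used directly with scale 1. Offsets of 2 GiB and
 * above are treated as negative. */
static uint64_t demo_naive_address(uint64_t base, uint32_t offset)
{
    return demo_lane_address(base, offset, 1);
}

/* Patched behaviour: logical shift right by one bit, then scale by 2.
 * The shifted index is always below 2^31, so it is never negative.
 * This is only exact for even offsets, which the alignment
 * requirement guarantees. */
static uint64_t demo_shifted_address(uint64_t base, uint32_t offset)
{
    return demo_lane_address(base, offset >> 1, 2);
}
```

With a base of 0x100000000 and an offset of 0xC0000000 (3 GiB), the naive form lands 1 GiB below the base, while the shifted form lands at base + 3 GiB as intended. For small aligned offsets both forms agree.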



[Mesa-dev] [PATCH 3/8] swr/rast: Removed some trailing whitespace caught during review

2017-09-05 Thread Tim Rowley
---
 .../rasterizer/codegen/templates/gen_ar_eventhandlerfile.hpp |  4 ++--
 src/gallium/drivers/swr/rasterizer/core/fifo.hpp |  4 ++--
 src/gallium/drivers/swr/rasterizer/core/pa.h | 12 ++--
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git 
a/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_ar_eventhandlerfile.hpp
 
b/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_ar_eventhandlerfile.hpp
index 0ca9a78..d1852b3 100644
--- 
a/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_ar_eventhandlerfile.hpp
+++ 
b/src/gallium/drivers/swr/rasterizer/codegen/templates/gen_ar_eventhandlerfile.hpp
@@ -23,7 +23,7 @@
 * @file ${filename}
 *
 * @brief Event handler interface.  auto-generated file
-* 
+*
 * DO NOT EDIT
 *
 * Generation Command Line:
@@ -57,7 +57,7 @@ namespace ArchRast
 std::stringstream outDir;
 outDir << KNOB_DEBUG_OUTPUT_DIR << pBaseName << "_" << pid << 
std::ends;
 CreateDirectory(outDir.str().c_str(), NULL);
-
+
 // There could be multiple threads creating thread pools. We
 // want to make sure they are uniquly identified by adding in
 // the creator's thread id into the filename.
diff --git a/src/gallium/drivers/swr/rasterizer/core/fifo.hpp 
b/src/gallium/drivers/swr/rasterizer/core/fifo.hpp
index 3be72f3..43d3a83 100644
--- a/src/gallium/drivers/swr/rasterizer/core/fifo.hpp
+++ b/src/gallium/drivers/swr/rasterizer/core/fifo.hpp
@@ -79,7 +79,7 @@ struct QUEUE
 long initial = InterlockedCompareExchange(, 1, 0);
 return (initial == 0);
 }
-
+
 void unlock()
 {
 mLock = 0;
@@ -112,7 +112,7 @@ struct QUEUE
 __m256 vSrc = _mm256_load_ps(pSrc + i*KNOB_SIMD_WIDTH);
 _mm256_stream_ps(pDst + i*KNOB_SIMD_WIDTH, vSrc);
 };
-
+
 const uint32_t numSimdLines = sizeof(T) / (KNOB_SIMD_WIDTH*4);
 static_assert(numSimdLines * KNOB_SIMD_WIDTH * 4 == sizeof(T),
 "FIFO element size should be multiple of SIMD width.");
diff --git a/src/gallium/drivers/swr/rasterizer/core/pa.h 
b/src/gallium/drivers/swr/rasterizer/core/pa.h
index cb3470f..87dba22 100644
--- a/src/gallium/drivers/swr/rasterizer/core/pa.h
+++ b/src/gallium/drivers/swr/rasterizer/core/pa.h
@@ -162,7 +162,7 @@ struct PA_STATE_OPT : public PA_STATE
 bool   isStreaming{ false };
 
 SIMDMASK   junkIndices  { 0 };  // temporary index store 
for unused virtual function
-
+
 PA_STATE_OPT() {}
 PA_STATE_OPT(DRAW_CONTEXT* pDC, uint32_t numPrims, uint8_t* pStream, 
uint32_t streamSizeInVerts,
 uint32_t vertexStride, bool in_isStreaming, PRIMITIVE_TOPOLOGY topo = 
TOP_UNKNOWN);
@@ -412,7 +412,7 @@ struct PA_STATE_CUT : public PA_STATE
 uint32_t vertsPerPrim{ 0 };
 bool processCutVerts{ false };   // vertex indices with cuts should be 
processed as normal, otherwise they
  // are ignored.  Fetch shader sends 
invalid verts on cuts that should be ignored
- // while the GS sends valid verts for 
every index 
+ // while the GS sends valid verts for 
every index
 
 simdvector  junkVector;  // junk simdvector for unimplemented 
API
 #if ENABLE_AVX512_SIMD16
@@ -575,7 +575,7 @@ struct PA_STATE_CUT : public PA_STATE
 return CheckBit(this->pCutIndices[vertexIndex], vertexOffset);
 }
 
-// iterates across the unprocessed verts until we hit the end or we 
+// iterates across the unprocessed verts until we hit the end or we
 // have assembled SIMD prims
 void ProcessVerts()
 {
@@ -583,7 +583,7 @@ struct PA_STATE_CUT : public PA_STATE
 this->numRemainingVerts > 0 &&
 this->curVertex != this->headVertex)
 {
-// if cut index, restart topology 
+// if cut index, restart topology
 if (IsCutIndex(this->curVertex))
 {
 if (this->processCutVerts)
@@ -923,7 +923,7 @@ struct PA_STATE_CUT : public PA_STATE
 case 6:
 SWR_ASSERT(this->adjExtraVert != -1, "Algorith failure!");
 AssembleTriStripAdj();
-
+
 uint32_t nextTri[6];
 if (this->reverseWinding)
 {
@@ -939,7 +939,7 @@ struct PA_STATE_CUT : public PA_STATE
 nextTri[1] = this->adjExtraVert;
 nextTri[2] = this->vert[3];
 nextTri[4] = this->vert[4];
-nextTri[5] = this->vert[0]; 
+nextTri[5] = this->vert[0];
 }
 for (uint32_t i = 0; i < 6; ++i)
 {
-- 
2.7.4



[Mesa-dev] [Bug 95346] Stellaris - Black/super dark planets

2017-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=95346

--- Comment #27 from Kai  ---
(In reply to Kai from comment #26)
> [...]
> 
> The full stack used (fully updated Debian testing as a base) was:
> GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
> Mesa: Git:master/39a69f0692
> libdrm: 2.4.82-1
> LLVM: SVN:trunk/r312520 (6.0 devel)
> X.Org: 2:1.19.3-2
> Linux: 4.12.10

This line should have read "Linux: 4.13.0". Forgot I already rebooted.

> Firmware (firmware-amd-graphics): 20170823-1
> libclc: Git:master/7331b0a1fa
> DDX (xserver-xorg-video-amdgpu): 1.3.0-1



[Mesa-dev] [Bug 95346] Stellaris - Black/super dark planets

2017-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=95346

Kai  changed:

   What|Removed |Added

 Attachment #130866|0   |1
is obsolete||

--- Comment #26 from Kai  ---
Created attachment 133974
  --> https://bugs.freedesktop.org/attachment.cgi?id=133974&action=edit
Planets look good now

I can confirm that the rendering of the planets has improved significantly
with the stack detailed below (see attached screenshot), but it's still off
from attachment 133972. Though the version I'm now uploading here might
actually be how it is now intended to be rendered. A quick YouTube search for
gameplay from this game version showed the blueish highlights on the dark side
of the planet in almost all instances I've found. (See eg.
 for some footage from an official
channel.) Therefore I think this bug report should be closed.

Game version: 1.6.2 (d7ec)

The full stack used (fully updated Debian testing as a base) was:
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/39a69f0692
libdrm: 2.4.82-1
LLVM: SVN:trunk/r312520 (6.0 devel)
X.Org: 2:1.19.3-2
Linux: 4.12.10
Firmware (firmware-amd-graphics): 20170823-1
libclc: Git:master/7331b0a1fa
DDX (xserver-xorg-video-amdgpu): 1.3.0-1



Re: [Mesa-dev] [PATCH] radeon/uvd: fix the assertion check for YUYV format

2017-09-05 Thread Christian König

Am 05.09.2017 um 19:37 schrieb Leo Liu:

Fixes: 7319ff87 ("radeon/uvd: add YUYV format support for target buffer")

Signed-off-by: Leo Liu 


Reviewed-by: Christian König 


---
  src/gallium/drivers/radeon/radeon_uvd.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
b/src/gallium/drivers/radeon/radeon_uvd.c
index 00d6267018..5330b03872 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -1588,9 +1588,11 @@ void ruvd_set_dt_surfaces(struct ruvd_msg *msg, struct 
radeon_surf *luma,
msg->body.decode.dt_chroma_bottom_offset = 
msg->body.decode.dt_chroma_top_offset;
}
  
-   assert(luma->u.legacy.bankw == chroma->u.legacy.bankw);
-   assert(luma->u.legacy.bankh == chroma->u.legacy.bankh);
-   assert(luma->u.legacy.mtilea == chroma->u.legacy.mtilea);
+   if (chroma) {
+       assert(luma->u.legacy.bankw == chroma->u.legacy.bankw);
+       assert(luma->u.legacy.bankh == chroma->u.legacy.bankh);
+       assert(luma->u.legacy.mtilea == chroma->u.legacy.mtilea);
+   }
  
  		msg->body.decode.dt_surf_tile_config |= RUVD_BANK_WIDTH(bank_wh(luma->u.legacy.bankw));

msg->body.decode.dt_surf_tile_config |= 
RUVD_BANK_HEIGHT(bank_wh(luma->u.legacy.bankh));





Re: [Mesa-dev] [PATCH v2 3/4] i965/screen: Report the correct number of image planes

2017-09-05 Thread Jason Ekstrand
On Tue, Sep 5, 2017 at 10:25 AM, Emil Velikov 
wrote:

> Hi Jason,
>
> On 5 September 2017 at 16:48, Jason Ekstrand  wrote:
> > For non-CCS images, we were reporting just one plane even though they
> > may have multiple in the case of YUV.
> >
> > Reviewed-by: Ben Widawsky 
> I think we want this for stable, right?
>

Maybe?  Ben and I were debating it.  I don't think it would hurt to send it
to stable but I also doubt it benefits us given that everything seems to be
working.  I'm happy to add the tag if you'd like.


> The series looks good, FWIW
> Reviewed-by: Emil Velikov 
>
> -Emil
>


Re: [Mesa-dev] [PATCH v2 1/4] dri/image: Add a format modifier attributes query

2017-09-05 Thread Jason Ekstrand
On Tue, Sep 5, 2017 at 10:14 AM, Emil Velikov 
wrote:

> Hi Jason,
>
> On 5 September 2017 at 16:48, Jason Ekstrand  wrote:
>
> > +   GLboolean (*queryDmaBufFormatModifierAttribs)(__DRIscreen *screen,
> > + uint32_t fourcc,
> We seem to be using "int fourcc" throughout the file. Worth staying
> consistent and doing the same there?
>

I did, and then Daniel told me to make it uint32_t.
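
The thread does not say why uint32_t was preferred. One plausible reason is that a fourcc is four bytes packed into exactly 32 bits, the same packing as the fourcc_code() macro in drm_fourcc.h. A small sketch with a hypothetical helper name:

```c
#include <stdint.h>

/* Pack four characters into a 32-bit fourcc, least significant byte
 * first. Doing the shifts on uint32_t keeps the "<< 24" well defined
 * for any byte value; shifting a signed int into its sign bit would
 * not be. */
static uint32_t demo_fourcc(char a, char b, char c, char d)
{
    return (uint32_t)(unsigned char)a |
           ((uint32_t)(unsigned char)b << 8) |
           ((uint32_t)(unsigned char)c << 16) |
           ((uint32_t)(unsigned char)d << 24);
}
```

For example, demo_fourcc('X', 'R', '2', '4') yields 0x34325258, the value of DRM_FORMAT_XRGB8888.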


Re: [Mesa-dev] [PATCH v2] radeonsi/gfx9: proper workaround for LS/HS VGPR initialization bug

2017-09-05 Thread Marek Olšák
On Tue, Sep 5, 2017 at 9:56 AM, Nicolai Hähnle  wrote:
> From: Nicolai Hähnle 
>
> When the HS wave is empty, the hardware writes the LS VGPRs starting at
> v0 instead of v2. Workaround by shifting them back into place when
> necessary. For simplicity, this is always done in the LS prolog.
>
> According to the hardware team, this will be fixed in future chips,
> so take that into account already.
>
> Note that this is not a bug fix, as the bug was already worked
> around by commit 166823bfd26 ("radeonsi/gfx9: add a temporary workaround
> for a tessellation driver bug"). This change merely replaces the
> workaround by one that should be better.
>
> v2: add workaround code to shader only when necessary
> ---
>  src/gallium/drivers/radeonsi/si_pipe.h  |  1 +
>  src/gallium/drivers/radeonsi/si_shader.c| 71 
> ++---
>  src/gallium/drivers/radeonsi/si_shader.h|  1 +
>  src/gallium/drivers/radeonsi/si_state_draw.c| 27 --
>  src/gallium/drivers/radeonsi/si_state_shaders.c |  8 +++
>  5 files changed, 84 insertions(+), 24 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
> b/src/gallium/drivers/radeonsi/si_pipe.h
> index 3ae06584427..9832fd19ff6 100644
> --- a/src/gallium/drivers/radeonsi/si_pipe.h
> +++ b/src/gallium/drivers/radeonsi/si_pipe.h
> @@ -382,20 +382,21 @@ struct si_context {
> booldb_flush_stencil_inplace:1;
> booldb_depth_clear:1;
> booldb_depth_disable_expclear:1;
> booldb_stencil_clear:1;
> booldb_stencil_disable_expclear:1;
> boolocclusion_queries_disabled:1;
> boolgenerate_mipmap_for_depth:1;
>
> /* Emitted draw state. */
> boolgs_tri_strip_adj_fix:1;
> +   boolls_vgpr_fix:1;
> int last_index_size;
> int last_base_vertex;
> int last_start_instance;
> int last_drawid;
> int last_sh_base_reg;
> int last_primitive_restart_en;
> int last_restart_index;
> int last_gs_out_prim;
> int last_prim;
> int last_multi_vgt_param;
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
> b/src/gallium/drivers/radeonsi/si_shader.c
> index 2cddfe97aa5..bae9b8384dd 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -5536,20 +5536,22 @@ si_generate_gs_copy_shader(struct si_screen *sscreen,
>  }
>
>  static void si_dump_shader_key_vs(const struct si_shader_key *key,
>   const struct si_vs_prolog_bits *prolog,
>   const char *prefix, FILE *f)
>  {
> fprintf(f, "  %s.instance_divisor_is_one = %u\n",
> prefix, prolog->instance_divisor_is_one);
> fprintf(f, "  %s.instance_divisor_is_fetched = %u\n",
> prefix, prolog->instance_divisor_is_fetched);
> +   fprintf(f, "  %s.ls_vgpr_fix = %u\n",
> +   prefix, prolog->ls_vgpr_fix);
>
> fprintf(f, "  mono.vs.fix_fetch = {");
> for (int i = 0; i < SI_MAX_ATTRIBS; i++)
> fprintf(f, !i ? "%u" : ", %u", key->mono.vs_fix_fetch[i]);
> fprintf(f, "}\n");
>  }
>
>  static void si_dump_shader_key(unsigned processor, const struct si_shader 
> *shader,
>FILE *f)
>  {
> @@ -5728,20 +5730,28 @@ static void si_init_exec_from_input(struct 
> si_shader_context *ctx,
>  {
> LLVMValueRef args[] = {
> LLVMGetParam(ctx->main_fn, param),
> LLVMConstInt(ctx->i32, bitoffset, 0),
> };
> lp_build_intrinsic(ctx->gallivm.builder,
>"llvm.amdgcn.init.exec.from.input",
>ctx->voidt, args, 2, LP_FUNC_ATTR_CONVERGENT);
>  }
>
> +static bool si_vs_needs_prolog(const struct si_shader_selector *sel,
> +  const struct si_vs_prolog_bits *key)
> +{
> +   /* VGPR initialization fixup for Vega10 and Raven is always done in 
> the
> +* VS prolog. */
> +   return sel->vs_needs_prolog || key->ls_vgpr_fix;
> +}
> +
>  static bool si_compile_tgsi_main(struct si_shader_context *ctx,
>  bool is_monolithic)
>  {
> struct si_shader *shader = ctx->shader;
> struct si_shader_selector *sel = shader->selector;
> struct lp_build_tgsi_context *bld_base = >bld_base;
>
> // TODO clean all this up!
> switch (ctx->type) {
> case PIPE_SHADER_VERTEX:
> @@ -5804,21 +5814,21 @@ static bool 

[Mesa-dev] [Bug 95346] Stellaris - Black/super dark planets

2017-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=95346

--- Comment #25 from Gert Wollny  ---
Created attachment 133972
  --> https://bugs.freedesktop.org/attachment.cgi?id=133972&action=edit
screenshot on r600g with mesa-git43e8808b8

I've tested the trace on r600g and also in software mode on mesa git 43e8808b8
and I'd say that the planet looks quite okay (see partial screenshot) and very
similar to the Windows screenshot (a bit more glossy though).



Re: [Mesa-dev] [PATCH 2/3] mesa: replace date/time macros with MESA_GIT_SHA1

2017-09-05 Thread Emil Velikov
On 5 September 2017 at 15:51, Rob Herring  wrote:
> On Tue, Sep 5, 2017 at 9:23 AM, Emil Velikov  wrote:
>> From: Emil Velikov 
>>
>> Former is non-deterministic and compilers throw a warning about it.
>>
>> Cc: Rob Herring 
>> Signed-off-by: Emil Velikov 
>> ---
>> I think the patch is a good idea, although kind of split about it.
>> Any arguments for/against would be appreciated.
>
> I guess if I had to pick, I'd rather have the git sha1. But really
> both are useful. The time is useful if you want to verify you are
> running your last build. I've heard people sometimes forget to copy
> things over, but that's never happened to me. ;P
>
Precisely my train of thought as well. Although I always double-check
which libraries I'm using - I've been bitten a couple of times in the
past.

>> ---
>>  src/mesa/main/context.c | 10 +++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
>> index be3f3610117..cc37a0dc4dc 100644
>> --- a/src/mesa/main/context.c
>> +++ b/src/mesa/main/context.c
>> @@ -138,6 +138,7 @@
>>  #include "math/m_matrix.h"
>>  #include "main/dispatch.h" /* for _gloffset_COUNT */
>>  #include "macros.h"
>> +#include "git_sha1.h"
>>
>>  #ifdef USE_SPARC_ASM
>>  #include "sparc/sparc.h"
>> @@ -398,10 +399,13 @@ one_time_init( struct gl_context *ctx )
>>
>>atexit(one_time_fini);
>>
>> -#if defined(DEBUG) && defined(__DATE__) && defined(__TIME__)
>> +#if defined(DEBUG)
>>if (MESA_VERBOSE != 0) {
>> - _mesa_debug(ctx, "Mesa " PACKAGE_VERSION " DEBUG build %s %s\n",
>> - __DATE__, __TIME__);
>> + _mesa_debug(ctx, "Mesa " PACKAGE_VERSION " DEBUG build"
>> +#ifdef MESA_GIT_SHA1
>
> Would be nice to have a default so you can avoid the ifdef.
>
Good point - silly copy/paste from other parts in Mesa.
Can I fix this everywhere as a follow-up?

-Emil
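
Rob's suggestion of a default could look roughly like the sketch below. This is an assumption about the shape of such a follow-up, not the actual change: the empty-string fallback, DEMO_PACKAGE_VERSION and demo_build_banner() are all hypothetical.

```c
/* Fallback so users of the macro need no #ifdef of their own. When the
 * build system provides MESA_GIT_SHA1 (normally via git_sha1.h), this
 * block is skipped. */
#ifndef MESA_GIT_SHA1
#define MESA_GIT_SHA1 ""
#endif

/* Hypothetical placeholder for PACKAGE_VERSION. */
#define DEMO_PACKAGE_VERSION "x.y.z"

/* Adjacent string literals are concatenated at compile time, so the
 * banner is a single constant string with or without a git revision. */
static const char *demo_build_banner(void)
{
    return "Mesa " DEMO_PACKAGE_VERSION " DEBUG build" MESA_GIT_SHA1;
}
```

The design point is that the fallback lives in one place, so every call site can use the macro unconditionally.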


[Mesa-dev] [PATCH v2 3/3] docs/release-calendar: update and extend

2017-09-05 Thread Emil Velikov
From: Emil Velikov 

v2: Use correct 17.1.10 version, adjust some names.

Cc: Juan A. Suárez 
Cc: Andres Gomez 
Signed-off-by: Emil Velikov 
Reviewed-by: Eric Engestrom 
---
 docs/release-calendar.html | 33 -
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/docs/release-calendar.html b/docs/release-calendar.html
index 554eb6a540f..56564b52ea8 100644
--- a/docs/release-calendar.html
+++ b/docs/release-calendar.html
@@ -39,59 +39,58 @@ if you'd like to nominate a patch in the next stable 
release.
 Notes
 
 
-17.1
+17.1
 2017-09-08
 17.1.9
 Andres Gomez
-Final planned release for the 17.1 series
+
 
-
-17.2
-2017-08-25
-17.2.0-rc6
-Emil Velikov
-May be promoted to 17.2.0 final
+2017-09-22
+17.1.10
+Juan A. Suarez Romero
+Final planned release for the 17.1 series
 
 
-2017-09-08
+17.2
+2017-09-15
 17.2.1
 Emil Velikov
 
 
 
-2017-09-22
+2017-09-29
 17.2.2
 Juan A. Suarez Romero
 
 
 
-2017-10-06
+2017-10-13
 17.2.3
 Emil Velikov
 
 
 
-2017-10-20
+2017-10-27
 17.2.4
-Juan A. Suarez Romero
+Andres Gomez
 
 
 
-2017-11-03
+2017-11-10
 17.2.5
 Andres Gomez
 
 
 
-2017-11-17
+2017-11-24
 17.2.6
 Andres Gomez
 
 
 
-2017-12-01
+2017-12-08
 17.2.7
-Andres Gomez
+Emil Velikov
 Final planned release for the 17.2 series
 
 
-- 
2.14.1



[Mesa-dev] [PATCH] radeon/uvd: fix the assertion check for YUYV format

2017-09-05 Thread Leo Liu
Fixes: 7319ff87 ("radeon/uvd: add YUYV format support for target buffer")

Signed-off-by: Leo Liu 
---
 src/gallium/drivers/radeon/radeon_uvd.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
b/src/gallium/drivers/radeon/radeon_uvd.c
index 00d6267018..5330b03872 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -1588,9 +1588,11 @@ void ruvd_set_dt_surfaces(struct ruvd_msg *msg, struct 
radeon_surf *luma,
msg->body.decode.dt_chroma_bottom_offset = 
msg->body.decode.dt_chroma_top_offset;
}
 
-   assert(luma->u.legacy.bankw == chroma->u.legacy.bankw);
-   assert(luma->u.legacy.bankh == chroma->u.legacy.bankh);
-   assert(luma->u.legacy.mtilea == chroma->u.legacy.mtilea);
+   if (chroma) {
+       assert(luma->u.legacy.bankw == chroma->u.legacy.bankw);
+       assert(luma->u.legacy.bankh == chroma->u.legacy.bankh);
+       assert(luma->u.legacy.mtilea == chroma->u.legacy.mtilea);
+   }
 
msg->body.decode.dt_surf_tile_config |= 
RUVD_BANK_WIDTH(bank_wh(luma->u.legacy.bankw));
msg->body.decode.dt_surf_tile_config |= 
RUVD_BANK_HEIGHT(bank_wh(luma->u.legacy.bankh));
-- 
2.11.0
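
The pattern behind this fix, written as a standalone sketch: for a packed format such as YUYV there is presumably no separate chroma surface, so the comparison must be skipped when the pointer is NULL. The struct and function below are hypothetical simplifications, not the radeon_surf layout.

```c
#include <stddef.h>

/* Hypothetical, reduced surface description. */
struct demo_surf {
    unsigned bankw;
    unsigned bankh;
    unsigned mtilea;
};

/* Returns 1 when the tiling parameters are usable together. A NULL
 * chroma pointer means "no separate chroma plane", in which case there
 * is nothing to compare and nothing to dereference. */
static int demo_tiling_compatible(const struct demo_surf *luma,
                                  const struct demo_surf *chroma)
{
    if (!chroma)
        return 1;

    return luma->bankw == chroma->bankw &&
           luma->bankh == chroma->bankh &&
           luma->mtilea == chroma->mtilea;
}
```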



Re: [Mesa-dev] [PATCH v2 3/4] i965/screen: Report the correct number of image planes

2017-09-05 Thread Emil Velikov
Hi Jason,

On 5 September 2017 at 16:48, Jason Ekstrand  wrote:
> For non-CCS images, we were reporting just one plane even though they
> may have multiple in the case of YUV.
>
> Reviewed-by: Ben Widawsky 
I think we want this for stable, right?

The series looks good, FWIW
Reviewed-by: Emil Velikov 

-Emil


[Mesa-dev] [Bug 102502] [bisected] Kodi crashes since commit 707d2e8b - gallium: fold u_trim_pipe_prim call from st/mesa to drivers

2017-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=102502

--- Comment #2 from Alexandre Demers  ---
The other segfault has been reported as bug 102530. However, they seem
unrelated. Also, bug 102530 may already have been fixed, but I can't confirm
until the current bug is fixed or the patch is reverted (I'll try the latter
when I get back home).



Re: [Mesa-dev] [PATCH v2 1/4] dri/image: Add a format modifier attributes query

2017-09-05 Thread Emil Velikov
Hi Jason,

On 5 September 2017 at 16:48, Jason Ekstrand  wrote:

> +   GLboolean (*queryDmaBufFormatModifierAttribs)(__DRIscreen *screen,
> + uint32_t fourcc,
We seem to be using "int fourcc" throughout the file. Worth staying
consistent and doing the same there?

-Emil


Re: [Mesa-dev] [PATCH] mesa/mtypes: repack gl_texture_object.

2017-09-05 Thread Marek Olšák
On Tue, Sep 5, 2017 at 5:50 PM, Brian Paul  wrote:
> On 09/04/2017 05:29 AM, Marek Olšák wrote:
>>
>> On Sun, Sep 3, 2017 at 1:18 PM, Dave Airlie  wrote:
>>>
>>> From: Dave Airlie 
>>>
>>> reduces size from 1144 to 1128.
>>>
>>> Signed-off-by: Dave Airlie 
>>> ---
>>>   src/mesa/main/mtypes.h | 10 +-
>>>   1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
>>> index d44897b..3d68a6d 100644
>>> --- a/src/mesa/main/mtypes.h
>>> +++ b/src/mesa/main/mtypes.h
>>> @@ -1012,7 +1012,6 @@ struct gl_texture_object
>>>  struct gl_sampler_object Sampler;
>>>
>>>  GLenum DepthMode;   /**< GL_ARB_depth_texture */
>>
>>
>> The patch looks good, but here are some ideas for future improvements:
>>
>> GLenum can be uint16_t everywhere, because GL doesn't set higher bits:
>>
>> typedef uint16_t GLenum16.
>> s/GLenum/GLenum16/
>>
>>> -   bool StencilSampling;   /**< Should we sample stencil instead of
>>> depth? */
>>>
>>>  GLfloat Priority;   /**< in [0,1] */
>>>  GLint BaseLevel;/**< min mipmap level, OpenGL 1.2 */
>>> @@ -1033,12 +1032,17 @@ struct gl_texture_object
>>>  GLboolean Immutable;/**< GL_ARB_texture_storage */
>>>  GLboolean _IsFloat; /**< GL_OES_float_texture */
>>>  GLboolean _IsHalfFloat; /**< GL_OES_half_float_texture */
>>> +   bool StencilSampling;   /**< Should we sample stencil instead of
>>> depth? */
>>> +   bool HandleAllocated;   /**< GL_ARB_bindless_texture */
>>
>>
>> All bools can be 1 bit:
>>
>> bool x:1;
>> GLboolean y:1;
>>
>> etc.
>>
>>>
>>>  GLuint MinLevel;/**< GL_ARB_texture_view */
>>>  GLuint MinLayer;/**< GL_ARB_texture_view */
>>>  GLuint NumLevels;   /**< GL_ARB_texture_view */
>>>  GLuint NumLayers;   /**< GL_ARB_texture_view */
>>
>>
>> MinLevel, NumLevels can be ubyte (uint8_t). MinLayer, NumLayers can be
>> ushort (uint16_t)... simply by considering the range of possible
>> values.
>
>
> There's lots of opportunities along these lines in gl_texture_image. And
> since we often have many gl_texture_images per gl_texture_object, and we
> often have many textures, it'll probably have considerable impact.  I've
> suggested this in the past but never got around to working on it.
>
> I recall Eric Anholt mentioning a memory profiling tool that was helpful for
> finding wasted space in structures, etc.  I don't recall the name right now.
> Eric?

Dave used pahole for this patch series too. Obviously, it can't suggest
the kind of changes I proposed above (like changing the types and bit widths).

Marek
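
To make the packing discussion concrete, here is a small sketch with hypothetical structs (not gl_texture_object). It compares a layout with bools scattered between 4-byte members, the same members with the bools grouped at the end, and a variant using 1-bit fields. Exact sizes depend on the ABI, so the helpers only return sizeof and no numbers are promised.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bools sit between 4-byte members, so each one typically drags
 * padding along with it. */
struct demo_scattered {
    uint32_t depth_mode;
    bool     stencil_sampling;
    float    priority;
    int32_t  base_level;
    bool     handle_allocated;
    uint32_t min_level;
};

/* Same members, bools grouped at the end so they can share padding. */
struct demo_grouped {
    uint32_t depth_mode;
    float    priority;
    int32_t  base_level;
    uint32_t min_level;
    bool     stencil_sampling;
    bool     handle_allocated;
};

/* Flags as adjacent 1-bit fields of the same declared type, so the
 * compiler can place them in one storage unit. */
struct demo_bitfields {
    uint32_t depth_mode;
    float    priority;
    int32_t  base_level;
    uint32_t min_level;
    unsigned stencil_sampling:1;
    unsigned handle_allocated:1;
};

static size_t demo_size_scattered(void) { return sizeof(struct demo_scattered); }
static size_t demo_size_grouped(void)   { return sizeof(struct demo_grouped); }
static size_t demo_size_bitfields(void) { return sizeof(struct demo_bitfields); }
```

Printing the three sizes on a given platform, or running pahole on the compiled object, shows where the padding goes. On common ABIs the grouped and bit-field layouts are no larger than the scattered one.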


Re: [Mesa-dev] [PATCH] egl: remove unused 'Screens' array from _egl_display

2017-09-05 Thread Emil Velikov
On 5 September 2017 at 12:48, Tapani Pälli  wrote:
> This was used by EGL_MESA_screen_surface that has been removed
> in commit 7a58262e58d8edac3308777def0950032628edee.
>
> Signed-off-by: Tapani Pälli 
Reviewed-by: Emil Velikov 

-Emil


[Mesa-dev] [Bug 102038] assertion failure in update_framebuffer_size

2017-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=102038

Brad King  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #20 from Brad King  ---
The VTK test suite passes again since these two patches were merged.  Thanks!



[Mesa-dev] [Bug 102038] assertion failure in update_framebuffer_size

2017-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=102038

--- Comment #19 from Bruce Cherniak  ---
The swr driver patch has been committed, as well.



Re: [Mesa-dev] [PATCH] mesa/mtypes: repack gl_texture_object.

2017-09-05 Thread Emil Velikov
On 5 September 2017 at 16:50, Brian Paul  wrote:

>
>
> There's lots of opportunities along these lines in gl_texture_image. And
> since we often have many gl_texture_images per gl_texture_object, and we
> often have many textures, it'll probably have considerable impact.  I've
> suggested this in the past but never got around to working on it.
>
> I recall Eric Anholt mentioning a memory profiling tool that was helpful for
> finding wasted space in structures, etc.  I don't recall the name right now.
> Eric?
>
There's pahole (I suspect Dave used it for the series) and
PVS-Studio (a proprietary tool).
Apparently the latter has a "free" version [1], or one can request
that a project be scanned [2].

-Emil

[1] https://www.viva64.com/en/b/0457/
[2] https://www.viva64.com/en/b/0473/


Re: [Mesa-dev] [PATCH] mesa/mtypes: repack gl_texture_object.

2017-09-05 Thread Christian Gmeiner
2017-09-05 17:50 GMT+02:00 Brian Paul :
> On 09/04/2017 05:29 AM, Marek Olšák wrote:
>>
>> On Sun, Sep 3, 2017 at 1:18 PM, Dave Airlie  wrote:
>>>
>>> From: Dave Airlie 
>>>
>>> reduces size from 1144 to 1128.
>>>
>>> Signed-off-by: Dave Airlie 
>>> ---
>>>   src/mesa/main/mtypes.h | 10 +-
>>>   1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
>>> index d44897b..3d68a6d 100644
>>> --- a/src/mesa/main/mtypes.h
>>> +++ b/src/mesa/main/mtypes.h
>>> @@ -1012,7 +1012,6 @@ struct gl_texture_object
>>>  struct gl_sampler_object Sampler;
>>>
>>>  GLenum DepthMode;   /**< GL_ARB_depth_texture */
>>
>>
>> The patch looks good, but here are some ideas for future improvements:
>>
>> GLenum can be uint16_t everywhere, because GL doesn't set higher bits:
>>
>> typedef uint16_t GLenum16.
>> s/GLenum/GLenum16/
>>
>>> -   bool StencilSampling;   /**< Should we sample stencil instead of
>>> depth? */
>>>
>>>  GLfloat Priority;   /**< in [0,1] */
>>>  GLint BaseLevel;/**< min mipmap level, OpenGL 1.2 */
>>> @@ -1033,12 +1032,17 @@ struct gl_texture_object
>>>  GLboolean Immutable;/**< GL_ARB_texture_storage */
>>>  GLboolean _IsFloat; /**< GL_OES_float_texture */
>>>  GLboolean _IsHalfFloat; /**< GL_OES_half_float_texture */
>>> +   bool StencilSampling;   /**< Should we sample stencil instead of
>>> depth? */
>>> +   bool HandleAllocated;   /**< GL_ARB_bindless_texture */
>>
>>
>> All bools can be 1 bit:
>>
>> bool x:1;
>> GLboolean y:1;
>>
>> etc.
>>
>>>
>>>  GLuint MinLevel;/**< GL_ARB_texture_view */
>>>  GLuint MinLayer;/**< GL_ARB_texture_view */
>>>  GLuint NumLevels;   /**< GL_ARB_texture_view */
>>>  GLuint NumLayers;   /**< GL_ARB_texture_view */
>>
>>
>> MinLevel, NumLevels can be ubyte (uint8_t). MinLayer, NumLayers can be
>> ushort (uint16_t)... simply by considering the range of possible
>> values.
>
>
> There's lots of opportunities along these lines in gl_texture_image. And
> since we often have many gl_texture_images per gl_texture_object, and we
> often have many textures, it'll probably have considerable impact.  I've
> suggested this in the past but never got around to working on it.
>
> I recall Eric Anholt mentioning a memory profiling tool that was helpful for
> finding wasted space in structures, etc.  I don't recall the name right now.
> Eric?
>

Maybe you were thinking of pahole - https://linux.die.net/man/1/pahole
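
For anyone following along, here is a minimal sketch in C of the kind of waste pahole reports and of the repacking ideas quoted above (narrower integer types, 1-bit flags). The struct and field names are hypothetical and only loosely modelled on the gl_texture_object fields in the patch. Exact sizes depend on the ABI, so the code compares sizes rather than assuming specific numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: each lone bool sits between 4-byte
 * fields, so a typical ABI pads it out to a full 4 bytes. */
struct tex_loose {
   uint32_t depth_mode;
   bool     stencil_sampling;
   float    priority;
   bool     immutable;
   uint32_t min_level;
   uint32_t min_layer;
   uint32_t num_levels;
   uint32_t num_layers;
};

/* Hypothetical "after" layout: integer widths chosen from the range of
 * possible values, and the flags merged into a single bitfield unit. */
struct tex_tight {
   float    priority;
   uint16_t depth_mode;
   uint16_t min_layer;
   uint16_t num_layers;
   uint8_t  min_level;
   uint8_t  num_levels;
   unsigned stencil_sampling:1;
   unsigned immutable:1;
};

/* Bytes saved per object by the tighter layout. */
size_t
tex_bytes_saved(void)
{
   return sizeof(struct tex_loose) - sizeof(struct tex_tight);
}
```

pahole reads the debug info of a compiled object and prints each struct with its holes and padding, so it finds cases like tex_loose without any manual counting.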

greets
--
Christian Gmeiner, MSc

https://christian-gmeiner.info


[Mesa-dev] [Bug 102038] assertion failure in update_framebuffer_size

2017-09-05 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=102038

--- Comment #18 from Brian Paul  ---
The state tracker patch has been committed.
I'll leave the swr driver patch to Bruce.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH] llvmpipe, tgsi: hook up dx10 gather4 opcode

2017-09-05 Thread Jose Fonseca

On 05/09/17 17:01, srol...@vmware.com wrote:

From: Roland Scheidegger 

Trivial. We already support tg4 for legacy tex opcodes, so the actual
texture sampling code already handles it.
(Just like TG4, we don't handle additional capabilities and always sample
red channel.)
---
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 28 ++---
  src/gallium/auxiliary/tgsi/tgsi_exec.c  |  5 -
  2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index b7f1140..f16c579 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -2232,6 +2232,7 @@ emit_sample(struct lp_build_tgsi_soa_context *bld,
  const struct tgsi_full_instruction *inst,
  enum lp_build_tex_modifier modifier,
  boolean compare,
+enum lp_sampler_op_type sample_type,
  LLVMValueRef *texel)
  {
 struct gallivm_state *gallivm = bld->bld_base.base.gallivm;
@@ -2245,7 +2246,7 @@ emit_sample(struct lp_build_tgsi_soa_context *bld,
  
 unsigned num_offsets, num_derivs, i;

 unsigned layer_coord = 0;
-   unsigned sample_key = LP_SAMPLER_OP_TEXTURE << LP_SAMPLER_OP_TYPE_SHIFT;
+   unsigned sample_key = sample_type << LP_SAMPLER_OP_TYPE_SHIFT;
  
 memset(&params, 0, sizeof(params));
  
@@ -3186,7 +3187,7 @@ sample_emit(

 struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
  
 emit_sample(bld, emit_data->inst, LP_BLD_TEX_MODIFIER_NONE,

-   FALSE, emit_data->output);
+   FALSE, LP_SAMPLER_OP_TEXTURE, emit_data->output);
  }
  
  static void

@@ -3198,7 +3199,7 @@ sample_b_emit(
 struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
  
 emit_sample(bld, emit_data->inst, LP_BLD_TEX_MODIFIER_LOD_BIAS,

-   FALSE, emit_data->output);
+   FALSE, LP_SAMPLER_OP_TEXTURE, emit_data->output);
  }
  
  static void

@@ -3210,7 +3211,7 @@ sample_c_emit(
 struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
  
 emit_sample(bld, emit_data->inst, LP_BLD_TEX_MODIFIER_NONE,

-   TRUE, emit_data->output);
+   TRUE, LP_SAMPLER_OP_TEXTURE, emit_data->output);
  }
  
  static void

@@ -3222,7 +3223,7 @@ sample_c_lz_emit(
 struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
  
 emit_sample(bld, emit_data->inst, LP_BLD_TEX_MODIFIER_LOD_ZERO,

-   TRUE, emit_data->output);
+   TRUE, LP_SAMPLER_OP_TEXTURE, emit_data->output);
  }
  
  static void

@@ -3234,7 +3235,7 @@ sample_d_emit(
 struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
  
 emit_sample(bld, emit_data->inst, LP_BLD_TEX_MODIFIER_EXPLICIT_DERIV,

-   FALSE, emit_data->output);
+   FALSE, LP_SAMPLER_OP_TEXTURE, emit_data->output);
  }
  
  static void

@@ -3246,7 +3247,19 @@ sample_l_emit(
 struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
  
 emit_sample(bld, emit_data->inst, LP_BLD_TEX_MODIFIER_EXPLICIT_LOD,

-   FALSE, emit_data->output);
+   FALSE, LP_SAMPLER_OP_TEXTURE, emit_data->output);
+}
+
+static void
+gather4_emit(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data)
+{
+   struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
+
+   emit_sample(bld, emit_data->inst, LP_BLD_TEX_MODIFIER_NONE,
+   FALSE, LP_SAMPLER_OP_GATHER, emit_data->output);
  }
  
  static void

@@ -3871,6 +3884,7 @@ lp_build_tgsi_soa(struct gallivm_state *gallivm,
 bld.bld_base.op_actions[TGSI_OPCODE_SAMPLE_I].emit = sample_i_emit;
 bld.bld_base.op_actions[TGSI_OPCODE_SAMPLE_I_MS].emit = sample_i_emit;
 bld.bld_base.op_actions[TGSI_OPCODE_SAMPLE_L].emit = sample_l_emit;
+   bld.bld_base.op_actions[TGSI_OPCODE_GATHER4].emit = gather4_emit;
 bld.bld_base.op_actions[TGSI_OPCODE_SVIEWINFO].emit = sviewinfo_emit;
  
 if (gs_iface) {

diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index c58ea6a..1264df0 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -2631,6 +2631,9 @@ exec_sample(struct tgsi_exec_machine *mach,
   lod = 
   control = TGSI_SAMPLER_LOD_EXPLICIT;
}
+  else if (modifier == TEX_MODIFIER_GATHER) {
+ control = TGSI_SAMPLER_GATHER;
+  }
else {
   assert(modifier == TEX_MODIFIER_LEVEL_ZERO);
   control = TGSI_SAMPLER_LOD_ZERO;
@@ -5687,7 +5690,7 @@ exec_instruction(
break;
  
 case TGSI_OPCODE_GATHER4:

-  assert(0);
+  exec_sample(mach, inst, TEX_MODIFIER_GATHER, FALSE);
break;
  
 case TGSI_OPCODE_SVIEWINFO:




LGTM.

Reviewed-by: Jose Fonseca 

[Mesa-dev] [PATCH] llvmpipe, tgsi: hook up dx10 gather4 opcode

2017-09-05 Thread sroland
From: Roland Scheidegger 

Trivial. We already support tg4 for legacy tex opcodes, so the actual
texture sampling code already handles it.
(Just like TG4, we don't handle additional capabilities and always sample
red channel.)
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 28 ++---
 src/gallium/auxiliary/tgsi/tgsi_exec.c  |  5 -
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index b7f1140..f16c579 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -2232,6 +2232,7 @@ emit_sample(struct lp_build_tgsi_soa_context *bld,
 const struct tgsi_full_instruction *inst,
 enum lp_build_tex_modifier modifier,
 boolean compare,
+enum lp_sampler_op_type sample_type,
 LLVMValueRef *texel)
 {
struct gallivm_state *gallivm = bld->bld_base.base.gallivm;
@@ -2245,7 +2246,7 @@ emit_sample(struct lp_build_tgsi_soa_context *bld,
 
unsigned num_offsets, num_derivs, i;
unsigned layer_coord = 0;
-   unsigned sample_key = LP_SAMPLER_OP_TEXTURE << LP_SAMPLER_OP_TYPE_SHIFT;
+   unsigned sample_key = sample_type << LP_SAMPLER_OP_TYPE_SHIFT;
 
memset(&params, 0, sizeof(params));
 
@@ -3186,7 +3187,7 @@ sample_emit(
struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
 
emit_sample(bld, emit_data->inst, LP_BLD_TEX_MODIFIER_NONE,
-   FALSE, emit_data->output);
+   FALSE, LP_SAMPLER_OP_TEXTURE, emit_data->output);
 }
 
 static void
@@ -3198,7 +3199,7 @@ sample_b_emit(
struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
 
emit_sample(bld, emit_data->inst, LP_BLD_TEX_MODIFIER_LOD_BIAS,
-   FALSE, emit_data->output);
+   FALSE, LP_SAMPLER_OP_TEXTURE, emit_data->output);
 }
 
 static void
@@ -3210,7 +3211,7 @@ sample_c_emit(
struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
 
emit_sample(bld, emit_data->inst, LP_BLD_TEX_MODIFIER_NONE,
-   TRUE, emit_data->output);
+   TRUE, LP_SAMPLER_OP_TEXTURE, emit_data->output);
 }
 
 static void
@@ -3222,7 +3223,7 @@ sample_c_lz_emit(
struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
 
emit_sample(bld, emit_data->inst, LP_BLD_TEX_MODIFIER_LOD_ZERO,
-   TRUE, emit_data->output);
+   TRUE, LP_SAMPLER_OP_TEXTURE, emit_data->output);
 }
 
 static void
@@ -3234,7 +3235,7 @@ sample_d_emit(
struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
 
emit_sample(bld, emit_data->inst, LP_BLD_TEX_MODIFIER_EXPLICIT_DERIV,
-   FALSE, emit_data->output);
+   FALSE, LP_SAMPLER_OP_TEXTURE, emit_data->output);
 }
 
 static void
@@ -3246,7 +3247,19 @@ sample_l_emit(
struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
 
emit_sample(bld, emit_data->inst, LP_BLD_TEX_MODIFIER_EXPLICIT_LOD,
-   FALSE, emit_data->output);
+   FALSE, LP_SAMPLER_OP_TEXTURE, emit_data->output);
+}
+
+static void
+gather4_emit(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data)
+{
+   struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
+
+   emit_sample(bld, emit_data->inst, LP_BLD_TEX_MODIFIER_NONE,
+   FALSE, LP_SAMPLER_OP_GATHER, emit_data->output);
 }
 
 static void
@@ -3871,6 +3884,7 @@ lp_build_tgsi_soa(struct gallivm_state *gallivm,
bld.bld_base.op_actions[TGSI_OPCODE_SAMPLE_I].emit = sample_i_emit;
bld.bld_base.op_actions[TGSI_OPCODE_SAMPLE_I_MS].emit = sample_i_emit;
bld.bld_base.op_actions[TGSI_OPCODE_SAMPLE_L].emit = sample_l_emit;
+   bld.bld_base.op_actions[TGSI_OPCODE_GATHER4].emit = gather4_emit;
bld.bld_base.op_actions[TGSI_OPCODE_SVIEWINFO].emit = sviewinfo_emit;
 
if (gs_iface) {
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index c58ea6a..1264df0 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -2631,6 +2631,9 @@ exec_sample(struct tgsi_exec_machine *mach,
  lod = 
  control = TGSI_SAMPLER_LOD_EXPLICIT;
   }
+  else if (modifier == TEX_MODIFIER_GATHER) {
+ control = TGSI_SAMPLER_GATHER;
+  }
   else {
  assert(modifier == TEX_MODIFIER_LEVEL_ZERO);
  control = TGSI_SAMPLER_LOD_ZERO;
@@ -5687,7 +5690,7 @@ exec_instruction(
   break;
 
case TGSI_OPCODE_GATHER4:
-  assert(0);
+  exec_sample(mach, inst, TEX_MODIFIER_GATHER, FALSE);
   break;
 
case TGSI_OPCODE_SVIEWINFO:
-- 
2.7.4
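
The core of the gallivm change above is that emit_sample() now takes the sampler op type from its caller and shifts it into the sample key, instead of hard-coding the texture op. A minimal sketch of that key-packing pattern follows. The enum values, shift, mask and compare bit are hypothetical stand-ins, not the real LP_SAMPLER_* definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the LP_SAMPLER_OP_* values; the real
 * numeric values and shift live in the gallivm headers. */
enum op_type {
   OP_TEXTURE = 0,
   OP_FETCH   = 1,
   OP_GATHER  = 2,
};

#define OP_TYPE_SHIFT 1u
#define OP_TYPE_MASK  (3u << OP_TYPE_SHIFT)
#define KEY_COMPARE   (1u << 0)

/* Build a key from the op type passed in by the caller, rather than
 * always using OP_TEXTURE. */
unsigned
make_sample_key(enum op_type type, int compare)
{
   unsigned key = (unsigned)type << OP_TYPE_SHIFT;
   if (compare)
      key |= KEY_COMPARE;
   return key;
}

/* Recover the op type from a key. */
enum op_type
key_op_type(unsigned key)
{
   return (enum op_type)((key & OP_TYPE_MASK) >> OP_TYPE_SHIFT);
}
```

With this shape, a gather entry point only differs from the plain sample entry point in the op type it passes, which is what gather4_emit does in the patch.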



Re: [Mesa-dev] [PATCH 42/44] spirv: Rework barriers

2017-09-05 Thread Jason Ekstrand
On Tue, Sep 5, 2017 at 8:33 AM, Connor Abbott  wrote:

> As a quick drive-by, yeah, I noticed this too, and it's going to
> require fixes to radv to not break things since none of the other NIR
> opcodes are hooked up (this will be needed for the NIR path in
> radeonsi too, since GLSL-to-NIR already uses those opcodes).
>

Thanks for pointing that out!  Dave, Bas, would either of you mind getting
those properly hooked up?  As a side-note, this patch also made me consider
re-working the NIR barrier intrinsics to be a bit more SPIR-V like.

--Jason


> On Tue, Sep 5, 2017 at 11:13 AM, Jason Ekstrand 
> wrote:
> > Our previous handling of barriers always used the big hammer and didn't
> > correctly emit memory barriers when specified along with a control
> > barrier.  This commit completely reworks the way we emit barriers to
> > make things both more precise and more correct.
> > ---
> >  src/compiler/spirv/spirv_to_nir.c | 132 ++
> ++--
> >  1 file changed, 114 insertions(+), 18 deletions(-)
> >
> > diff --git a/src/compiler/spirv/spirv_to_nir.c
> b/src/compiler/spirv/spirv_to_nir.c
> > index 8653685..6fb27cb 100644
> > --- a/src/compiler/spirv/spirv_to_nir.c
> > +++ b/src/compiler/spirv/spirv_to_nir.c
> > @@ -2570,36 +2570,132 @@ vtn_handle_composite(struct vtn_builder *b,
> SpvOp opcode,
> >  }
> >
> >  static void
> > +vtn_emit_barrier(struct vtn_builder *b, nir_intrinsic_op op)
> > +{
> > +   nir_intrinsic_instr *intrin = nir_intrinsic_instr_create(b->shader,
> op);
> > +   nir_builder_instr_insert(&b->nb, &intrin->instr);
> > +}
> > +
> > +static void
> > +vtn_emit_memory_barrier(struct vtn_builder *b, SpvScope scope,
> > +SpvMemorySemanticsMask semantics)
> > +{
> > +   static const SpvMemorySemanticsMask all_memory_semantics =
> > +  SpvMemorySemanticsUniformMemoryMask |
> > +  SpvMemorySemanticsWorkgroupMemoryMask |
> > +  SpvMemorySemanticsAtomicCounterMemoryMask |
> > +  SpvMemorySemanticsImageMemoryMask;
> > +
> > +   /* If we're not actually doing a memory barrier, bail */
> > +   if (!(semantics & all_memory_semantics))
> > +  return;
> > +
> > +   /* GL and Vulkan don't have these */
> > +   assert(scope != SpvScopeCrossDevice);
> > +
> > +   if (scope == SpvScopeSubgroup)
> > +  return; /* Nothing to do here */
> > +
> > +   if (scope == SpvScopeWorkgroup) {
> > +  vtn_emit_barrier(b, nir_intrinsic_group_memory_barrier);
> > +  return;
> > +   }
> > +
> > +   /* There's only two scopes thing left */
> > +   assert(scope == SpvScopeInvocation || scope == SpvScopeDevice);
> > +
> > +   if ((semantics & all_memory_semantics) == all_memory_semantics) {
> > +  vtn_emit_barrier(b, nir_intrinsic_memory_barrier);
> > +  return;
> > +   }
> > +
> > +   /* Issue a bunch of more specific barriers */
> > +   uint32_t bits = semantics;
> > +   while (bits) {
> > +  SpvMemorySemanticsMask semantic = 1 << u_bit_scan(&bits);
> > +  switch (semantic) {
> > +  case SpvMemorySemanticsUniformMemoryMask:
> > + vtn_emit_barrier(b, nir_intrinsic_memory_barrier_buffer);
> > + break;
> > +  case SpvMemorySemanticsWorkgroupMemoryMask:
> > + vtn_emit_barrier(b, nir_intrinsic_memory_barrier_shared);
> > + break;
> > +  case SpvMemorySemanticsAtomicCounterMemoryMask:
> > + vtn_emit_barrier(b, nir_intrinsic_memory_barrier_
> atomic_counter);
> > + break;
> > +  case SpvMemorySemanticsImageMemoryMask:
> > + vtn_emit_barrier(b, nir_intrinsic_memory_barrier_image);
> > + break;
> > +  default:
> > + break;;
> > +  }
> > +   }
> > +}
> > +
> > +static void
> >  vtn_handle_barrier(struct vtn_builder *b, SpvOp opcode,
> > const uint32_t *w, unsigned count)
> >  {
> > -   nir_intrinsic_op intrinsic_op;
> > switch (opcode) {
> > case SpvOpEmitVertex:
> > case SpvOpEmitStreamVertex:
> > -  intrinsic_op = nir_intrinsic_emit_vertex;
> > -  break;
> > case SpvOpEndPrimitive:
> > -   case SpvOpEndStreamPrimitive:
> > -  intrinsic_op = nir_intrinsic_end_primitive;
> > -  break;
> > -   case SpvOpMemoryBarrier:
> > -  intrinsic_op = nir_intrinsic_memory_barrier;
> > -  break;
> > -   case SpvOpControlBarrier:
> > -  intrinsic_op = nir_intrinsic_barrier;
> > +   case SpvOpEndStreamPrimitive: {
> > +  nir_intrinsic_op intrinsic_op;
> > +  switch (opcode) {
> > +  case SpvOpEmitVertex:
> > +  case SpvOpEmitStreamVertex:
> > + intrinsic_op = nir_intrinsic_emit_vertex;
> > + break;
> > +  case SpvOpEndPrimitive:
> > +  case SpvOpEndStreamPrimitive:
> > + intrinsic_op = nir_intrinsic_end_primitive;
> > + break;
> > +  default:
> > + unreachable("Invalid opcode");
> > +  }
> > +
> > +  nir_intrinsic_instr *intrin =
> > + nir_intrinsic_instr_create(b->shader, intrinsic_op);
> 
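
The decision logic in the quoted vtn_emit_memory_barrier() can be sketched in isolation: emit nothing when no memory bits are set, one full barrier when all of them are set, and otherwise one specific barrier per set bit. The sketch below only counts the barriers that scheme would emit and leaves out the scope handling. The SEM_* bits and the bit_scan() helper are hypothetical stand-ins, not the SPIR-V header values or Mesa's u_bit_scan().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical semantics bits; the real SpvMemorySemantics*Mask values
 * come from the SPIR-V headers. */
enum {
   SEM_UNIFORM        = 1u << 0,
   SEM_WORKGROUP      = 1u << 1,
   SEM_ATOMIC_COUNTER = 1u << 2,
   SEM_IMAGE          = 1u << 3,
   SEM_ALL = SEM_UNIFORM | SEM_WORKGROUP | SEM_ATOMIC_COUNTER | SEM_IMAGE,
};

/* Stand-in for a bit-scan helper: return the index of the lowest set
 * bit and clear it. The mask must be non-zero. */
int
bit_scan(uint32_t *mask)
{
   int i = 0;
   while (!(*mask & (1u << i)))
      i++;
   *mask &= ~(1u << i);
   return i;
}

/* Number of barriers emitted for a given semantics mask. */
unsigned
count_barriers(uint32_t semantics)
{
   /* Not a memory barrier at all. */
   if (!(semantics & SEM_ALL))
      return 0;

   /* Everything requested: one full memory barrier covers it. */
   if ((semantics & SEM_ALL) == SEM_ALL)
      return 1;

   /* Otherwise one specific barrier per recognized bit; unknown bits
    * are skipped, like the default case in the patch. */
   unsigned n = 0;
   uint32_t bits = semantics;
   while (bits) {
      uint32_t semantic = 1u << bit_scan(&bits);
      if (semantic & SEM_ALL)
         n++;
   }
   return n;
}
```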

Re: [Mesa-dev] [PATCH] mesa/mtypes: repack gl_texture_object.

2017-09-05 Thread Brian Paul

On 09/04/2017 05:29 AM, Marek Olšák wrote:

On Sun, Sep 3, 2017 at 1:18 PM, Dave Airlie  wrote:

From: Dave Airlie 

reduces size from 1144 to 1128.

Signed-off-by: Dave Airlie 
---
  src/mesa/main/mtypes.h | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index d44897b..3d68a6d 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -1012,7 +1012,6 @@ struct gl_texture_object
 struct gl_sampler_object Sampler;

 GLenum DepthMode;   /**< GL_ARB_depth_texture */


The patch looks good, but here are some ideas for future improvements:

GLenum can be uint16_t everywhere, because GL doesn't set higher bits:

typedef uint16_t GLenum16.
s/GLenum/GLenum16/


-   bool StencilSampling;   /**< Should we sample stencil instead of depth? 
*/

 GLfloat Priority;   /**< in [0,1] */
 GLint BaseLevel;/**< min mipmap level, OpenGL 1.2 */
@@ -1033,12 +1032,17 @@ struct gl_texture_object
 GLboolean Immutable;/**< GL_ARB_texture_storage */
 GLboolean _IsFloat; /**< GL_OES_float_texture */
 GLboolean _IsHalfFloat; /**< GL_OES_half_float_texture */
+   bool StencilSampling;   /**< Should we sample stencil instead of depth? 
*/
+   bool HandleAllocated;   /**< GL_ARB_bindless_texture */


All bools can be 1 bit:

bool x:1;
GLboolean y:1;

etc.



 GLuint MinLevel;/**< GL_ARB_texture_view */
 GLuint MinLayer;/**< GL_ARB_texture_view */
 GLuint NumLevels;   /**< GL_ARB_texture_view */
 GLuint NumLayers;   /**< GL_ARB_texture_view */


MinLevel, NumLevels can be ubyte (uint8_t). MinLayer, NumLayers can be
ushort (uint16_t)... simply by considering the range of possible
values.


There's lots of opportunities along these lines in gl_texture_image. 
And since we often have many gl_texture_images per gl_texture_object, 
and we often have many textures, it'll probably have considerable 
impact.  I've suggested this in the past but never got around to working 
on it.


I recall Eric Anholt mentioning a memory profiling tool that was helpful 
for finding wasted space in structures, etc.  I don't recall the name 
right now.  Eric?


-Brian



[Mesa-dev] [PATCH v2 3/4] i965/screen: Report the correct number of image planes

2017-09-05 Thread Jason Ekstrand
For non-CCS images, we were reporting just one plane even though they
may have multiple in the case of YUV.

Reviewed-by: Ben Widawsky 
---
 src/mesa/drivers/dri/i965/intel_screen.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index d39509b..ace244f 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -799,7 +799,14 @@ intel_query_image(__DRIimage *image, int attrib, int 
*value)
case __DRI_IMAGE_ATTRIB_FOURCC:
   return intel_lookup_fourcc(image->dri_format, value);
case __DRI_IMAGE_ATTRIB_NUM_PLANES:
-  *value = isl_drm_modifier_has_aux(image->modifier) ? 2 : 1;
+  if (isl_drm_modifier_has_aux(image->modifier)) {
+ assert(!image->planar_format || image->planar_format->nplanes == 1);
+ *value = 2;
+  } else if (image->planar_format) {
+ *value = image->planar_format->nplanes;
+  } else {
+ *value = 1;
+  }
   return true;
case __DRI_IMAGE_ATTRIB_OFFSET:
   *value = image->offset;
-- 
2.5.0.400.gff86faf
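
The plane-count rule in the hunk above can be read on its own as a three-way decision. Below is a minimal sketch with hypothetical, trimmed-down types standing in for __DRIimage and the planar format table. Only the decision itself mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the driver types. */
struct planar_format {
   int nplanes;
};

struct image {
   bool has_aux;                       /* modifier carries an aux (CCS) plane */
   const struct planar_format *planar; /* NULL for non-planar images */
};

/* Number of planes to report for an image. */
int
image_num_planes(const struct image *img)
{
   if (img->has_aux) {
      /* Aux modifiers are only expected on single-plane formats. */
      assert(!img->planar || img->planar->nplanes == 1);
      return 2; /* main surface plus aux surface */
   }
   if (img->planar)
      return img->planar->nplanes;
   return 1;
}
```

The old code only distinguished the first case from everything else, which is why a multi-plane YUV image without CCS was reported as having one plane.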



[Mesa-dev] [PATCH v2 4/4] i965/screen: Implement queryDmaBufFormatModifierAttirbs

2017-09-05 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/intel_screen.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index ace244f..3217fee 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -823,6 +823,27 @@ intel_query_image(__DRIimage *image, int attrib, int 
*value)
}
 }
 
+static GLboolean
+intel_query_format_modifier_attribs(__DRIscreen *dri_screen,
+uint32_t fourcc, uint64_t modifier,
+int attrib, uint64_t *value)
+{
+   struct intel_screen *screen = dri_screen->driverPrivate;
+   const struct intel_image_format *f = intel_image_format_lookup(fourcc);
+
+   if (!modifier_is_supported(&screen->devinfo, f, 0, modifier))
+  return false;
+
+   switch (attrib) {
+   case __DRI_IMAGE_FORMAT_MODIFIER_ATTRIB_PLANE_COUNT:
+  *value = isl_drm_modifier_has_aux(modifier) ? 2 : f->nplanes;
+  return true;
+
+   default:
+  return false;
+   }
+}
+
 static __DRIimage *
 intel_dup_image(__DRIimage *orig_image, void *loaderPrivate)
 {
@@ -1267,7 +1288,7 @@ intel_from_planar(__DRIimage *parent, int plane, void 
*loaderPrivate)
 }
 
 static const __DRIimageExtension intelImageExtension = {
-.base = { __DRI_IMAGE, 15 },
+.base = { __DRI_IMAGE, 16 },
 
 .createImageFromName= intel_create_image_from_name,
 .createImageFromRenderbuffer= intel_create_image_from_renderbuffer,
@@ -1289,6 +1310,7 @@ static const __DRIimageExtension intelImageExtension = {
 .createImageFromDmaBufs2= intel_create_image_from_dma_bufs2,
 .queryDmaBufFormats = intel_query_dma_buf_formats,
 .queryDmaBufModifiers   = intel_query_dma_buf_modifiers,
+.queryDmaBufFormatModifierAttribs   = intel_query_format_modifier_attribs,
 };
 
 static uint64_t
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH v2 2/4] gbm: Add a gbm_device_get_format_modifier_plane_count function

2017-09-05 Thread Jason Ekstrand
This allows the user to query the number of planes required by a given
format+modifier combination without having to create a bo or surface.
---
 src/gbm/backends/dri/gbm_dri.c | 26 ++
 src/gbm/main/gbm.c | 14 ++
 src/gbm/main/gbm.h |  5 +
 src/gbm/main/gbmint.h  |  3 +++
 4 files changed, 48 insertions(+)

diff --git a/src/gbm/backends/dri/gbm_dri.c b/src/gbm/backends/dri/gbm_dri.c
index 1b2cc4c..1361645 100644
--- a/src/gbm/backends/dri/gbm_dri.c
+++ b/src/gbm/backends/dri/gbm_dri.c
@@ -639,6 +639,30 @@ gbm_dri_is_format_supported(struct gbm_device *gbm,
 }
 
 static int
+gbm_dri_get_format_modifier_plane_count(struct gbm_device *gbm,
+uint32_t format,
+uint64_t modifier)
+{
+   struct gbm_dri_device *dri = gbm_dri_device(gbm);
+   uint64_t plane_count;
+
+   if (dri->image->base.version < 16 ||
+   !dri->image->queryDmaBufFormatModifierAttribs)
+  return -1;
+
+   format = gbm_format_canonicalize(format);
+   if (gbm_format_to_dri_format(format) == 0)
+  return -1;
+
+   if (!dri->image->queryDmaBufFormatModifierAttribs(
+ dri->screen, format, modifier,
+ __DRI_IMAGE_FORMAT_MODIFIER_ATTRIB_PLANE_COUNT, &plane_count))
+  return -1;
+
+   return plane_count;
+}
+
+static int
 gbm_dri_bo_write(struct gbm_bo *_bo, const void *buf, size_t count)
 {
struct gbm_dri_bo *bo = gbm_dri_bo(_bo);
@@ -1348,6 +1372,8 @@ dri_device_create(int fd)
dri->base.bo_map = gbm_dri_bo_map;
dri->base.bo_unmap = gbm_dri_bo_unmap;
dri->base.is_format_supported = gbm_dri_is_format_supported;
+   dri->base.get_format_modifier_plane_count =
+  gbm_dri_get_format_modifier_plane_count;
dri->base.bo_write = gbm_dri_bo_write;
dri->base.bo_get_fd = gbm_dri_bo_get_fd;
dri->base.bo_get_planes = gbm_dri_bo_get_planes;
diff --git a/src/gbm/main/gbm.c b/src/gbm/main/gbm.c
index df61ff6..a1c1e8f 100644
--- a/src/gbm/main/gbm.c
+++ b/src/gbm/main/gbm.c
@@ -85,6 +85,20 @@ gbm_device_is_format_supported(struct gbm_device *gbm,
return gbm->is_format_supported(gbm, format, usage);
 }
 
+/** Get the number of planes that are required for a given format+modifier
+ *
+ * \param gbm The gbm device returned from gbm_create_device()
+ * \param format The format to query
+ * \param modifier The modifier to query
+ */
+int
+gbm_device_get_format_modifier_plane_count(struct gbm_device *gbm,
+   uint32_t format,
+   uint64_t modifier)
+{
+   return gbm->get_format_modifier_plane_count(gbm, format, modifier);
+}
+
 /** Destroy the gbm device and free all resources associated with it.
  *
  * \param gbm The device created using gbm_create_device()
diff --git a/src/gbm/main/gbm.h b/src/gbm/main/gbm.h
index aed26a0..7710e61b 100644
--- a/src/gbm/main/gbm.h
+++ b/src/gbm/main/gbm.h
@@ -238,6 +238,11 @@ int
 gbm_device_is_format_supported(struct gbm_device *gbm,
uint32_t format, uint32_t usage);
 
+int
+gbm_device_get_format_modifier_plane_count(struct gbm_device *gbm,
+   uint32_t format,
+   uint64_t modifier);
+
 void
 gbm_device_destroy(struct gbm_device *gbm);
 
diff --git a/src/gbm/main/gbmint.h b/src/gbm/main/gbmint.h
index c27a7a5..9220a4a 100644
--- a/src/gbm/main/gbmint.h
+++ b/src/gbm/main/gbmint.h
@@ -61,6 +61,9 @@ struct gbm_device {
int (*is_format_supported)(struct gbm_device *gbm,
   uint32_t format,
   uint32_t usage);
+   int (*get_format_modifier_plane_count)(struct gbm_device *device,
+  uint32_t format,
+  uint64_t modifier);
 
struct gbm_bo *(*bo_create)(struct gbm_device *gbm,
uint32_t width, uint32_t height,
-- 
2.5.0.400.gff86faf
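
The gbm_dri side of this patch follows a common pattern: an optional hook in a versioned extension table, with -1 returned whenever the hook cannot be used. A minimal self-contained sketch of that pattern is below. The image_ext struct and the toy_query() hook are hypothetical and much smaller than the real __DRIimageExtension. Only the version-16 gate and the -1 fallback are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of a versioned extension table. */
struct image_ext {
   int version;
   int (*query_plane_count)(uint32_t format, uint64_t modifier,
                            uint64_t *value);
};

/* Returns the plane count, or -1 when the extension is too old, the
 * hook is missing, or the driver rejects the format+modifier pair. */
int
get_plane_count(const struct image_ext *ext, uint32_t format,
                uint64_t modifier)
{
   uint64_t count;

   if (ext->version < 16 || !ext->query_plane_count)
      return -1;

   if (!ext->query_plane_count(format, modifier, &count))
      return -1;

   return (int)count;
}

/* Toy driver hook: modifier 0 is "linear" with one plane, modifier 1
 * adds an aux plane, anything else is unsupported. */
int
toy_query(uint32_t format, uint64_t modifier, uint64_t *value)
{
   (void)format;
   if (modifier > 1)
      return 0;
   *value = modifier == 1 ? 2 : 1;
   return 1;
}
```

A caller such as a compositor would treat -1 as "unknown" and fall back to whatever it did before the query existed.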



[Mesa-dev] [PATCH v2 0/4] gbm: Add a modifier_plane_count query

2017-09-05 Thread Jason Ekstrand
This is mostly just a re-send of the original patch series I sent out only
with a couple of reviews and fixes applied.  I'm happy with it and I think
Daniel can confirm that it fixes the problem we're having in modesetting
when trying to enable CCS.  Anyone opposed?

Jason Ekstrand (4):
  dri/image: Add a format modifier attributes query
  gbm: Add a gbm_device_get_format_modifier_plane_count function
  i965/screen: Report the correct number of image planes
  i965/screen: Implement queryDmaBufFormatModifierAttirbs

 include/GL/internal/dri_interface.h  | 27 +-
 src/gbm/backends/dri/gbm_dri.c   | 26 +
 src/gbm/main/gbm.c   | 14 ++
 src/gbm/main/gbm.h   |  5 +
 src/gbm/main/gbmint.h|  3 +++
 src/mesa/drivers/dri/i965/intel_screen.c | 33 ++--
 6 files changed, 105 insertions(+), 3 deletions(-)

-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH v2 1/4] dri/image: Add a format modifier attributes query

2017-09-05 Thread Jason Ekstrand
---
 include/GL/internal/dri_interface.h | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/GL/internal/dri_interface.h 
b/include/GL/internal/dri_interface.h
index 1c91bde..783ff1c 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -1180,7 +1180,7 @@ struct __DRIdri2ExtensionRec {
  * extensions.
  */
 #define __DRI_IMAGE "DRI_IMAGE"
-#define __DRI_IMAGE_VERSION 15
+#define __DRI_IMAGE_VERSION 16
 
 /**
  * These formats correspond to the similarly named MESA_FORMAT_*
@@ -1360,6 +1360,13 @@ enum __DRIChromaSiting {
 #define __BLIT_FLAG_FLUSH  0x0001
 #define __BLIT_FLAG_FINISH 0x0002
 
+/**
+ * queryDmaBufFormatModifierAttribs attributes
+ */
+
+/* Available in version 16 */
+#define __DRI_IMAGE_FORMAT_MODIFIER_ATTRIB_PLANE_COUNT   0x0001
+
 typedef struct __DRIimageRec  __DRIimage;
 typedef struct __DRIimageExtensionRec __DRIimageExtension;
 struct __DRIimageExtensionRec {
@@ -1600,6 +1607,24 @@ struct __DRIimageExtensionRec {
  int max, uint64_t *modifiers,
  unsigned int *external_only,
  int *count);
+
+   /**
+* dmabuf format modifier attribute query for a given format and modifier.
+*
+* \param fourccThe format to query. If this format is not supported by
+*  the driver, return false.
+* \param modifier  The modifier to query. If this format+modifier is not
+*  supported by the driver, return false.
+* \param attribThe __DRI_IMAGE_FORMAT_MODIFIER_ATTRIB to query.
+* \param value A pointer to where to store the result of the query.
+*
+* Returns true upon success.
+*
+* \since 16
+*/
+   GLboolean (*queryDmaBufFormatModifierAttribs)(__DRIscreen *screen,
+ uint32_t fourcc, uint64_t 
modifier,
+ int attrib, uint64_t *value);
 };
 
 
-- 
2.5.0.400.gff86faf



Re: [Mesa-dev] [PATCH 42/44] spirv: Rework barriers

2017-09-05 Thread Connor Abbott
As a quick drive-by, yeah, I noticed this too, and it's going to
require fixes to radv to not break things since none of the other NIR
opcodes are hooked up (this will be needed for the NIR path in
radeonsi too, since GLSL-to-NIR already uses those opcodes).

On Tue, Sep 5, 2017 at 11:13 AM, Jason Ekstrand  wrote:
> Our previous handling of barriers always used the big hammer and didn't
> correctly emit memory barriers when specified along with a control
> barrier.  This commit completely reworks the way we emit barriers to
> make things both more precise and more correct.
> ---
>  src/compiler/spirv/spirv_to_nir.c | 132 
> --
>  1 file changed, 114 insertions(+), 18 deletions(-)
>
> diff --git a/src/compiler/spirv/spirv_to_nir.c 
> b/src/compiler/spirv/spirv_to_nir.c
> index 8653685..6fb27cb 100644
> --- a/src/compiler/spirv/spirv_to_nir.c
> +++ b/src/compiler/spirv/spirv_to_nir.c
> @@ -2570,36 +2570,132 @@ vtn_handle_composite(struct vtn_builder *b, SpvOp 
> opcode,
>  }
>
>  static void
> +vtn_emit_barrier(struct vtn_builder *b, nir_intrinsic_op op)
> +{
> +   nir_intrinsic_instr *intrin = nir_intrinsic_instr_create(b->shader, op);
> +   nir_builder_instr_insert(&b->nb, &intrin->instr);
> +}
> +
> +static void
> +vtn_emit_memory_barrier(struct vtn_builder *b, SpvScope scope,
> +SpvMemorySemanticsMask semantics)
> +{
> +   static const SpvMemorySemanticsMask all_memory_semantics =
> +  SpvMemorySemanticsUniformMemoryMask |
> +  SpvMemorySemanticsWorkgroupMemoryMask |
> +  SpvMemorySemanticsAtomicCounterMemoryMask |
> +  SpvMemorySemanticsImageMemoryMask;
> +
> +   /* If we're not actually doing a memory barrier, bail */
> +   if (!(semantics & all_memory_semantics))
> +  return;
> +
> +   /* GL and Vulkan don't have these */
> +   assert(scope != SpvScopeCrossDevice);
> +
> +   if (scope == SpvScopeSubgroup)
> +  return; /* Nothing to do here */
> +
> +   if (scope == SpvScopeWorkgroup) {
> +  vtn_emit_barrier(b, nir_intrinsic_group_memory_barrier);
> +  return;
> +   }
> +
> +   /* There's only two scopes thing left */
> +   assert(scope == SpvScopeInvocation || scope == SpvScopeDevice);
> +
> +   if ((semantics & all_memory_semantics) == all_memory_semantics) {
> +  vtn_emit_barrier(b, nir_intrinsic_memory_barrier);
> +  return;
> +   }
> +
> +   /* Issue a bunch of more specific barriers */
> +   uint32_t bits = semantics;
> +   while (bits) {
> +  SpvMemorySemanticsMask semantic = 1 << u_bit_scan(&bits);
> +  switch (semantic) {
> +  case SpvMemorySemanticsUniformMemoryMask:
> + vtn_emit_barrier(b, nir_intrinsic_memory_barrier_buffer);
> + break;
> +  case SpvMemorySemanticsWorkgroupMemoryMask:
> + vtn_emit_barrier(b, nir_intrinsic_memory_barrier_shared);
> + break;
> +  case SpvMemorySemanticsAtomicCounterMemoryMask:
> + vtn_emit_barrier(b, nir_intrinsic_memory_barrier_atomic_counter);
> + break;
> +  case SpvMemorySemanticsImageMemoryMask:
> + vtn_emit_barrier(b, nir_intrinsic_memory_barrier_image);
> + break;
> +  default:
> + break;;
> +  }
> +   }
> +}
> +
> +static void
>  vtn_handle_barrier(struct vtn_builder *b, SpvOp opcode,
> const uint32_t *w, unsigned count)
>  {
> -   nir_intrinsic_op intrinsic_op;
> switch (opcode) {
> case SpvOpEmitVertex:
> case SpvOpEmitStreamVertex:
> -  intrinsic_op = nir_intrinsic_emit_vertex;
> -  break;
> case SpvOpEndPrimitive:
> -   case SpvOpEndStreamPrimitive:
> -  intrinsic_op = nir_intrinsic_end_primitive;
> -  break;
> -   case SpvOpMemoryBarrier:
> -  intrinsic_op = nir_intrinsic_memory_barrier;
> -  break;
> -   case SpvOpControlBarrier:
> -  intrinsic_op = nir_intrinsic_barrier;
> +   case SpvOpEndStreamPrimitive: {
> +  nir_intrinsic_op intrinsic_op;
> +  switch (opcode) {
> +  case SpvOpEmitVertex:
> +  case SpvOpEmitStreamVertex:
> + intrinsic_op = nir_intrinsic_emit_vertex;
> + break;
> +  case SpvOpEndPrimitive:
> +  case SpvOpEndStreamPrimitive:
> + intrinsic_op = nir_intrinsic_end_primitive;
> + break;
> +  default:
> + unreachable("Invalid opcode");
> +  }
> +
> +  nir_intrinsic_instr *intrin =
> + nir_intrinsic_instr_create(b->shader, intrinsic_op);
> +
> +  switch (opcode) {
> +  case SpvOpEmitStreamVertex:
> +  case SpvOpEndStreamPrimitive:
> + nir_intrinsic_set_stream_id(intrin, w[1]);
> + break;
> +  default:
> + break;
> +  }
> +
> +  nir_builder_instr_insert(&b->nb, &intrin->instr);
>break;
> -   default:
> -  unreachable("unknown barrier instruction");
> }
>
> -   nir_intrinsic_instr *intrin =
> -  nir_intrinsic_instr_create(b->shader, intrinsic_op);
> +   case SpvOpMemoryBarrier: {
> +  SpvScope scope = 

Re: [Mesa-dev] [PATCH 3/3] docs/release-calendar: update and extend

2017-09-05 Thread Andres Gomez
On Tue, 2017-09-05 at 15:21 +0100, Emil Velikov wrote:
> From: Emil Velikov 
> 
> Cc: Andres Gomez 
> Signed-off-by: Emil Velikov 
> ---
>  docs/release-calendar.html | 27 +--
>  1 file changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/docs/release-calendar.html b/docs/release-calendar.html
> index 554eb6a540f..f95ef9f939a 100644
> --- a/docs/release-calendar.html
> +++ b/docs/release-calendar.html
> @@ -39,57 +39,56 @@ if you'd like to nominate a patch in the next stable release.
>  Notes
>  
>  
> -17.1
> +17.1
>  2017-09-08
>  17.1.9
>  Andres Gomez
> -Final planned release for the 17.1 series
> +
>  
> -
> -17.2
> -2017-08-25
> -17.2.0-rc6
> +2017-09-22
> +17.1.9
   ^^
17.1.10

>  Emil Velikov

I saw you will be attending XDC so maybe you'd prefer Juan to take
charge of this one?

> -May be promoted to 17.2.0 final
> +Final planned release for the 17.1 series
>  
>  
> -2017-09-08
> +17.2
> +2017-09-15
>  17.2.1
>  Emil Velikov
>  
>  
>  
> -2017-09-22
> +2017-09-29
>  17.2.2
>  Juan A. Suarez Romero
>  
>  
>  
> -2017-10-06
> +2017-10-13
>  17.2.3
>  Emil Velikov
>  
>  
>  
> -2017-10-20
> +2017-10-27
>  17.2.4
>  Juan A. Suarez Romero

With the change of date, I'll take care of this one, not Juan.

>  
>  
>  
> -2017-11-03
> +2017-11-10
>  17.2.5
>  Andres Gomez
>  
>  
>  
> -2017-11-17
> +2017-11-24
>  17.2.6
>  Andres Gomez
>  
>  
>  
> -2017-12-01
> +2017-12-08
>  17.2.7
>  Andres Gomez

With the change of date, it is not a very good week for me nor for
Juan. Could you take care of this one?

>  Final planned release for the 17.2 series
-- 
Br,

Andres


[Mesa-dev] [PATCH 44/44] compiler/nir_types: Handle vectors in glsl_get_array_element

2017-09-05 Thread Jason Ekstrand
Most of NIR doesn't allow doing array indexing on a vector (though it
does on a matrix).  However, nir_lower_io handles it just fine and this
behavior is needed for shared variables in Vulkan.  This commit makes
glsl_get_array_element do something sensible for vector types and makes
nir_validate happy with them.
---
 src/compiler/nir_types.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/compiler/nir_types.cpp b/src/compiler/nir_types.cpp
index 5583bc0..978f7d7 100644
--- a/src/compiler/nir_types.cpp
+++ b/src/compiler/nir_types.cpp
@@ -39,6 +39,8 @@ glsl_get_array_element(const glsl_type* type)
 {
if (type->is_matrix())
   return type->column_type();
+   else if (type->is_vector())
+  return type->get_scalar_type();
return type->fields.array;
 }
 
-- 
2.5.0.400.gff86faf
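
The element lookup described in this patch can be sketched outside Mesa roughly as follows. This is a toy model under stated assumptions, not Mesa code: `toy_type`, `toy_kind` and `toy_get_array_element` are hypothetical names, and the real `glsl_type` is a C++ class with a different layout. Only the three-way branch (matrix, then vector, then array) is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for glsl_type. */
enum toy_kind { TOY_SCALAR, TOY_VECTOR, TOY_MATRIX, TOY_ARRAY };

struct toy_type {
   enum toy_kind kind;
   const struct toy_type *column;     /* matrices: type of one column */
   const struct toy_type *scalar;     /* vectors: type of one component */
   const struct toy_type *array_elem; /* arrays: type of one element */
};

/* A few sample types: float, vec4, mat4 and vec4[]. */
static const struct toy_type toy_float = { TOY_SCALAR, NULL, NULL, NULL };
static const struct toy_type toy_vec4  = { TOY_VECTOR, NULL, &toy_float, NULL };
static const struct toy_type toy_mat4  = { TOY_MATRIX, &toy_vec4, NULL, NULL };
static const struct toy_type toy_vec4_array = { TOY_ARRAY, NULL, NULL, &toy_vec4 };

/* Mirrors the branch order in the patch: matrix, then vector, then array. */
static const struct toy_type *
toy_get_array_element(const struct toy_type *type)
{
   assert(type != NULL);
   if (type->kind == TOY_MATRIX)
      return type->column;
   else if (type->kind == TOY_VECTOR)
      return type->scalar;   /* the case this patch adds */
   return type->array_elem;  /* NULL for scalars in this toy model */
}
```

In this model, indexing a vec4 yields the float type, indexing a mat4 yields vec4, and indexing vec4[] yields vec4.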



[Mesa-dev] [PATCH 43/44] nir: Validate base types on array dereferences

2017-09-05 Thread Jason Ekstrand
We were already validating that the parent type goes along with the
child type but we weren't actually validating that the parent type is
reasonable.  This fixes that.
---
 src/compiler/nir/nir_validate.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/compiler/nir/nir_validate.c b/src/compiler/nir/nir_validate.c
index cdbe6a6..fc74dea 100644
--- a/src/compiler/nir/nir_validate.c
+++ b/src/compiler/nir/nir_validate.c
@@ -397,7 +397,8 @@ validate_alu_instr(nir_alu_instr *instr, validate_state *state)
 }
 
 static void
-validate_deref_chain(nir_deref *deref, validate_state *state)
+validate_deref_chain(nir_deref *deref, nir_variable_mode mode,
+ validate_state *state)
 {
validate_assert(state, deref->child == NULL || ralloc_parent(deref->child) == deref);
 
@@ -405,6 +406,19 @@ validate_deref_chain(nir_deref *deref, validate_state *state)
while (deref != NULL) {
   switch (deref->deref_type) {
   case nir_deref_type_array:
+ if (mode == nir_var_shared) {
+/* Shared variables have a bit more relaxed rules because we need
+ * to be able to handle array derefs on vectors.  Fortunately,
+ * nir_lower_io handles these just fine.
+ */
+validate_assert(state, glsl_type_is_array(parent->type) ||
+   glsl_type_is_matrix(parent->type) ||
+   glsl_type_is_vector(parent->type));
+ } else {
+/* Most of NIR cannot handle array derefs on vectors */
+validate_assert(state, glsl_type_is_array(parent->type) ||
+   glsl_type_is_matrix(parent->type));
+ }
  validate_assert(state, deref->type == glsl_get_array_element(parent->type));
  if (nir_deref_as_array(deref)->deref_array_type ==
  nir_deref_array_type_indirect)
@@ -451,7 +465,7 @@ validate_deref_var(void *parent_mem_ctx, nir_deref_var *deref, validate_state *s
 
validate_var_use(deref->var, state);
 
-   validate_deref_chain(&deref->deref, state);
+   validate_deref_chain(&deref->deref, deref->var->data.mode, state);
 }
 
 static void
-- 
2.5.0.400.gff86faf
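
The rule this patch adds to the validator can be summarised as a small predicate. The sketch below is an illustration only: `toy_mode`, `toy_kind` and `toy_array_deref_parent_ok` are hypothetical names standing in for `nir_variable_mode`, the `glsl_type_is_*` queries and the new assertions. It assumes nothing beyond what the diff shows: arrays and matrices are always valid parents of an array deref, and vectors are valid parents only for shared variables.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for nir_variable_mode and the glsl_type kind. */
enum toy_mode { TOY_MODE_LOCAL, TOY_MODE_SHARED };
enum toy_kind { TOY_SCALAR, TOY_VECTOR, TOY_MATRIX, TOY_ARRAY };

/* Returns true when `parent` may be the parent type of an array deref
 * on a variable in the given mode.
 */
static bool
toy_array_deref_parent_ok(enum toy_mode mode, enum toy_kind parent)
{
   /* Arrays and matrices can be indexed in every mode. */
   if (parent == TOY_ARRAY || parent == TOY_MATRIX)
      return true;

   /* Vectors are accepted only for shared variables (relaxed rule). */
   return mode == TOY_MODE_SHARED && parent == TOY_VECTOR;
}
```

A scalar parent is rejected in every mode, which is the "parent type is reasonable" check that the commit message says was missing.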



Re: [Mesa-dev] [PATCH 2/3] docs/releasing: polish LLVM_CONFIG wording/handling

2017-09-05 Thread Andres Gomez
On Tue, 2017-09-05 at 15:21 +0100, Emil Velikov wrote:
> From: Emil Velikov 
> 
> Use concistent way to manage "non-default" llvm installations, clearly
  ^^
consistent

> documenting it.
> 
> AKA, use LLVM_CONFIG throughout and unset for the Windows/mingw builds.
> 
> Cc: Andres Gomez 
> Signed-off-by: Emil Velikov 
> ---
>  docs/releasing.html | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/docs/releasing.html b/docs/releasing.html
> index d74411532c8..15c7185949c 100644
> --- a/docs/releasing.html
> +++ b/docs/releasing.html
> @@ -437,8 +437,11 @@ Here is one solution that I've been using.
>   chmod 755 -fR $__build_root; rm -rf $__build_root
>   mkdir -p $__build_root && cd $__build_root
>  
> - # For the distcheck, you may want to specify which LLVM to use:
> + # For the native builds - such as distcheck, scons, sanity test, you
> + # may want to specify which LLVM to use:
>   # export LLVM_CONFIG=/usr/lib/llvm-3.9/bin/llvm-config
> +
> + # Do a full distcheck
>   $__mesa_root/autogen.sh && make distcheck
>  
>   # Build check the tarballs (scons, linux)
> @@ -447,22 +450,22 @@ Here is one solution that I've been using.
>   cd .. && rm -rf mesa-$__version
>  
>   # Build check the tarballs (scons, windows/mingw)
> - # You may need to unset LLVM if you set it before:
> - # unset LLVM_CONFIG
> + # Temporarily drop LLVM_CONFIG, unless you have a Windows/mingw one.
> + # save_LLVM_CONFIG=`echo $LLVM_CONFIG`; unset LLVM_CONFIG
>   tar -xaf mesa-$__version.tar.xz && cd mesa-$__version
>   scons platform=windows toolchain=crossmingw
>   cd .. && rm -rf mesa-$__version
>  
>   # Test the automake binaries
>   tar -xaf mesa-$__version.tar.xz && cd mesa-$__version
> - # You may want to specify which LLVM to use:
> + # Restore LLVM_CONFIG, if applicable:
> + # export LLVM_CONFIG=`echo $save_LLVM_CONFIG`

I would also add "; unset save_LLVM_CONFIG" at the end of the previous
line.

Other than that, this is:

Reviewed-by: Andres Gomez 

>   ./configure \
>   --with-dri-drivers=i965,swrast \
>   --with-gallium-drivers=swrast \
>   --with-vulkan-drivers=intel \
>   --enable-llvm-shared-libs \
>   --enable-llvm \
> - --with-llvm-prefix=/usr/lib/llvm-3.9 \
>   --enable-glx-tls \
>   --enable-gbm \
>   --enable-egl \
-- 
Br,

Andres


[Mesa-dev] [PATCH 42/44] spirv: Rework barriers

2017-09-05 Thread Jason Ekstrand
Our previous handling of barriers always used the big hammer and didn't
correctly emit memory barriers when specified along with a control
barrier.  This commit completely reworks the way we emit barriers to
make things both more precise and more correct.
---
 src/compiler/spirv/spirv_to_nir.c | 132 --
 1 file changed, 114 insertions(+), 18 deletions(-)

diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index 8653685..6fb27cb 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -2570,36 +2570,132 @@ vtn_handle_composite(struct vtn_builder *b, SpvOp opcode,
 }
 
 static void
+vtn_emit_barrier(struct vtn_builder *b, nir_intrinsic_op op)
+{
+   nir_intrinsic_instr *intrin = nir_intrinsic_instr_create(b->shader, op);
+   nir_builder_instr_insert(&b->nb, &intrin->instr);
+}
+
+static void
+vtn_emit_memory_barrier(struct vtn_builder *b, SpvScope scope,
+SpvMemorySemanticsMask semantics)
+{
+   static const SpvMemorySemanticsMask all_memory_semantics =
+  SpvMemorySemanticsUniformMemoryMask |
+  SpvMemorySemanticsWorkgroupMemoryMask |
+  SpvMemorySemanticsAtomicCounterMemoryMask |
+  SpvMemorySemanticsImageMemoryMask;
+
+   /* If we're not actually doing a memory barrier, bail */
+   if (!(semantics & all_memory_semantics))
+  return;
+
+   /* GL and Vulkan don't have these */
+   assert(scope != SpvScopeCrossDevice);
+
+   if (scope == SpvScopeSubgroup)
+  return; /* Nothing to do here */
+
+   if (scope == SpvScopeWorkgroup) {
+  vtn_emit_barrier(b, nir_intrinsic_group_memory_barrier);
+  return;
+   }
+
+   /* There are only two scopes left */
+   assert(scope == SpvScopeInvocation || scope == SpvScopeDevice);
+
+   if ((semantics & all_memory_semantics) == all_memory_semantics) {
+  vtn_emit_barrier(b, nir_intrinsic_memory_barrier);
+  return;
+   }
+
+   /* Issue a bunch of more specific barriers */
+   uint32_t bits = semantics;
+   while (bits) {
+  SpvMemorySemanticsMask semantic = 1 << u_bit_scan(&bits);
+  switch (semantic) {
+  case SpvMemorySemanticsUniformMemoryMask:
+ vtn_emit_barrier(b, nir_intrinsic_memory_barrier_buffer);
+ break;
+  case SpvMemorySemanticsWorkgroupMemoryMask:
+ vtn_emit_barrier(b, nir_intrinsic_memory_barrier_shared);
+ break;
+  case SpvMemorySemanticsAtomicCounterMemoryMask:
+ vtn_emit_barrier(b, nir_intrinsic_memory_barrier_atomic_counter);
+ break;
+  case SpvMemorySemanticsImageMemoryMask:
+ vtn_emit_barrier(b, nir_intrinsic_memory_barrier_image);
+ break;
+  default:
+ break;;
+  }
+   }
+}
+
+static void
 vtn_handle_barrier(struct vtn_builder *b, SpvOp opcode,
const uint32_t *w, unsigned count)
 {
-   nir_intrinsic_op intrinsic_op;
switch (opcode) {
case SpvOpEmitVertex:
case SpvOpEmitStreamVertex:
-  intrinsic_op = nir_intrinsic_emit_vertex;
-  break;
case SpvOpEndPrimitive:
-   case SpvOpEndStreamPrimitive:
-  intrinsic_op = nir_intrinsic_end_primitive;
-  break;
-   case SpvOpMemoryBarrier:
-  intrinsic_op = nir_intrinsic_memory_barrier;
-  break;
-   case SpvOpControlBarrier:
-  intrinsic_op = nir_intrinsic_barrier;
+   case SpvOpEndStreamPrimitive: {
+  nir_intrinsic_op intrinsic_op;
+  switch (opcode) {
+  case SpvOpEmitVertex:
+  case SpvOpEmitStreamVertex:
+ intrinsic_op = nir_intrinsic_emit_vertex;
+ break;
+  case SpvOpEndPrimitive:
+  case SpvOpEndStreamPrimitive:
+ intrinsic_op = nir_intrinsic_end_primitive;
+ break;
+  default:
+ unreachable("Invalid opcode");
+  }
+
+  nir_intrinsic_instr *intrin =
+ nir_intrinsic_instr_create(b->shader, intrinsic_op);
+
+  switch (opcode) {
+  case SpvOpEmitStreamVertex:
+  case SpvOpEndStreamPrimitive:
+ nir_intrinsic_set_stream_id(intrin, w[1]);
+ break;
+  default:
+ break;
+  }
+
+  nir_builder_instr_insert(&b->nb, &intrin->instr);
   break;
-   default:
-  unreachable("unknown barrier instruction");
}
 
-   nir_intrinsic_instr *intrin =
-  nir_intrinsic_instr_create(b->shader, intrinsic_op);
+   case SpvOpMemoryBarrier: {
+  SpvScope scope = vtn_constant_value(b, w[1])->values[0].u32[0];
+  SpvMemorySemanticsMask semantics =
+ vtn_constant_value(b, w[2])->values[0].u32[0];
+  vtn_emit_memory_barrier(b, scope, semantics);
+  return;
+   }
+
+   case SpvOpControlBarrier: {
+  SpvScope execution_scope =
+ vtn_constant_value(b, w[1])->values[0].u32[0];
+  if (execution_scope == SpvScopeWorkgroup)
+ vtn_emit_barrier(b, nir_intrinsic_barrier);
 
-   if (opcode == SpvOpEmitStreamVertex || opcode == SpvOpEndStreamPrimitive)
-  nir_intrinsic_set_stream_id(intrin, w[1]);
+  SpvScope memory_scope =
+ 
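
The decision logic in vtn_emit_memory_barrier can be modelled as a standalone function that records which barriers would be emitted instead of building NIR instructions. This is a sketch under stated assumptions rather than the patch's code: every `toy_*` and `TOY_*` name is hypothetical, the numeric semantics values are meant to match the SPIR-V headers but should be treated as assumptions, and `toy_bit_scan` is a portable stand-in for Mesa's `u_bit_scan` (return the index of the lowest set bit and clear it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of SpvScope and the SpvMemorySemantics*Mask bits. */
enum toy_scope {
   TOY_SCOPE_CROSS_DEVICE, TOY_SCOPE_DEVICE, TOY_SCOPE_WORKGROUP,
   TOY_SCOPE_SUBGROUP, TOY_SCOPE_INVOCATION
};

enum {
   TOY_SEM_UNIFORM        = 0x040,
   TOY_SEM_WORKGROUP      = 0x100,
   TOY_SEM_ATOMIC_COUNTER = 0x400,
   TOY_SEM_IMAGE          = 0x800,
   TOY_SEM_ALL = TOY_SEM_UNIFORM | TOY_SEM_WORKGROUP |
                 TOY_SEM_ATOMIC_COUNTER | TOY_SEM_IMAGE
};

/* Stand-ins for the NIR barrier intrinsics named in the patch. */
enum toy_barrier {
   TOY_BARRIER_GROUP_MEMORY, TOY_BARRIER_MEMORY, TOY_BARRIER_BUFFER,
   TOY_BARRIER_SHARED, TOY_BARRIER_ATOMIC_COUNTER, TOY_BARRIER_IMAGE
};

/* Return the index of the lowest set bit in *mask and clear that bit. */
static int
toy_bit_scan(uint32_t *mask)
{
   assert(*mask != 0);
   int i = 0;
   while (!(*mask & (UINT32_C(1) << i)))
      i++;
   *mask &= ~(UINT32_C(1) << i);
   return i;
}

/* Writes the selected barriers to out (room for 4) and returns the count. */
static size_t
toy_select_barriers(enum toy_scope scope, uint32_t semantics,
                    enum toy_barrier *out)
{
   size_t n = 0;

   if (!(semantics & TOY_SEM_ALL))
      return 0; /* not a memory barrier at all */

   assert(scope != TOY_SCOPE_CROSS_DEVICE);

   if (scope == TOY_SCOPE_SUBGROUP)
      return 0; /* nothing to do */

   if (scope == TOY_SCOPE_WORKGROUP) {
      out[n++] = TOY_BARRIER_GROUP_MEMORY;
      return n;
   }

   assert(scope == TOY_SCOPE_INVOCATION || scope == TOY_SCOPE_DEVICE);

   if ((semantics & TOY_SEM_ALL) == TOY_SEM_ALL) {
      out[n++] = TOY_BARRIER_MEMORY; /* one full barrier covers everything */
      return n;
   }

   /* Otherwise walk the set bits and emit one specific barrier per bit. */
   uint32_t bits = semantics;
   while (bits) {
      uint32_t semantic = UINT32_C(1) << toy_bit_scan(&bits);
      switch (semantic) {
      case TOY_SEM_UNIFORM:        out[n++] = TOY_BARRIER_BUFFER; break;
      case TOY_SEM_WORKGROUP:      out[n++] = TOY_BARRIER_SHARED; break;
      case TOY_SEM_ATOMIC_COUNTER: out[n++] = TOY_BARRIER_ATOMIC_COUNTER; break;
      case TOY_SEM_IMAGE:          out[n++] = TOY_BARRIER_IMAGE; break;
      default: break; /* ordering bits such as acquire/release */
      }
   }
   return n;
}
```

Because the scan clears each bit as it goes, the loop terminates after at most 32 iterations, and bits that do not name a memory class fall through the default case without emitting anything.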

[Mesa-dev] [PATCH 41/44] spirv: Add a vtn_constant_value helper

2017-09-05 Thread Jason Ekstrand
---
 src/compiler/spirv/vtn_private.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/compiler/spirv/vtn_private.h b/src/compiler/spirv/vtn_private.h
index 8458462..e7a7c36 100644
--- a/src/compiler/spirv/vtn_private.h
+++ b/src/compiler/spirv/vtn_private.h
@@ -557,6 +557,12 @@ vtn_value(struct vtn_builder *b, uint32_t value_id,
return val;
 }
 
+static inline nir_constant *
+vtn_constant_value(struct vtn_builder *b, uint32_t value_id)
+{
+   return vtn_value(b, value_id, vtn_value_type_constant)->constant;
+}
+
 void _vtn_warn(const char *file, int line, const char *msg, ...);
 #define vtn_warn(...) _vtn_warn(__FILE__, __LINE__, __VA_ARGS__)
 
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 33/44] intel/compiler: Call nir_lower_system_values in brw_preprocess_nir

2017-09-05 Thread Jason Ekstrand
---
 src/intel/compiler/brw_nir.c| 2 ++
 src/intel/vulkan/anv_pipeline.c | 1 -
 src/mesa/drivers/dri/i965/brw_program.c | 2 --
 3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index ce21c01..88e42d1 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -603,6 +603,8 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir)
 
const bool is_scalar = compiler->scalar_stage[nir->stage];
 
+   OPT(nir_lower_system_values);
+
if (nir->stage == MESA_SHADER_GEOMETRY)
   OPT(nir_lower_gs_intrinsics);
 
diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index a3a1bcf..582654e 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -186,7 +186,6 @@ anv_shader_compile_to_nir(struct anv_pipeline *pipeline,
NIR_PASS_V(nir, nir_propagate_invariant);
NIR_PASS_V(nir, nir_lower_io_to_temporaries,
   entry_point->impl, true, false);
-   NIR_PASS_V(nir, nir_lower_system_values);
 
/* Vulkan uses the separate-shader linking model */
nir->info.separate_shader = true;
diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
index be76947..bc831c0 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -90,8 +90,6 @@ brw_create_nir(struct brw_context *brw,
 
(void)progress;
 
-   NIR_PASS(progress, nir, nir_lower_system_values);
-
nir = brw_preprocess_nir(brw->screen->compiler, nir);
 
if (stage == MESA_SHADER_FRAGMENT) {
-- 
2.5.0.400.gff86faf


