Re: [Mesa-dev] [glsl] indvar in ir_loop

2013-10-25 Thread Liu Xin

Ian,

I am sure I comes across trouble for the following code.

(function main
  (signature void
(parameters
)
(
  (loop ((declare () int i@0x8d19434)) ((constant int (0)) ) 
((constant int (32)) ) ((constant int (1)) ) (
(call foo ((var_ref sampler2d@0x8eef134) (var_ref 
myTexCoord@0x8eef05c) ))


  ))

the loop is generated by hand, using the following code.

ir_loop * loop = new (ctx)ir_loop();

ir_variable * indvar = new (ctx) 
ir_variable(glsl_type::int_type, i,ir_var_auto);
ir_dereference * idx = new (ctx) 
ir_dereference_variable(indvar);


loop-from = new(ctx) ir_constant(0);
loop-to = new(ctx) ir_constant(32);
loop-increment =  new (ctx) ir_constant(1);
loop-cmp = ir_binop_less;
loop-counter = indvar;

loop-body_instructions = sig-body;
sig-body.make_empty();

call_link_visitor(link_function.cpp) can not see the variable 
i@0x8d19434. it's because call_link_visitor extends from 
ir_hierachical_visitor. ir_loop::accept(ir_hierarchical_visitor 
*v)**doesn't look at ir_loop::counter .


that is to say, it assumes the indvar is out of loop contruct, right?  
perhaps my usage is wrong.


my changeset makes it like breeze.

index be8b36a..4e4dd4c 100644*
*--- a/src/glsl/ir_hv_accept.cpp
+++ b/src/glsl/ir_hv_accept.cpp
@@ -71,6 +71,7 @@ ir_loop::accept(ir_hierarchical_visitor *v)
if (s != visit_continue)
   return (s == visit_continue_with_parent) ? visit_continue : s;

+   if (this-counter) s = this-counter-accept(v);
s = visit_list_elements(v, this-body_instructions);
if (s == visit_stop)
   return s;
*
*

thanks,
--lx

On 10/12/2013 05:39 AM, Ian Romanick wrote:

On 10/10/2013 11:14 PM, Liu Xin wrote:

Hi, Mesa developers,

According to glsl v1.0, we have loop construct:
for (for-init-statement; condition(opt); expression)
statement-no-new-scope

Variables declared in for-init-statement or condition are only in scope
until the end of the
statement-no-new-scope of the for loop.

let's assume I have a fragment shader:

~/testbed$ cat indvar.frag
void main(void)
{
 vec4 a[32];
 for(int i=0; i10; ++i) {
 if (i == 9)
 gl_FragColor = a[i];
}
}


I found current glsl compiler emits HIR like this:

The HIR loses all notions of scope.


(function main
   (signature void
 (parameters
 )
 (
   (declare () int i@0x988eb84)
   (declare () (array vec4 32) a@0x988ec5c)
   (declare (temporary ) int assignment_tmp@0x988eaac)
   (assign (constant bool (1)) (x) (var_ref
assignment_tmp@0x988eaac)  (constant int (0)) )
   (assign (constant bool (1)) (x) (var_ref i@0x988eb84)  (var_ref
assignment_tmp@0x988eaac) )
   (loop () () () () (
 (if (expression bool ! (expression bool  (var_ref i@0x988eb84)
(constant int (10)) ) ) (
   break
 )
 ())

 (if (expression bool all_equal (var_ref i@0x988eb84) (constant
int (9)) ) (
   (declare (temporary ) vec4 assignment_tmp@0x987cee4)
   (assign (constant bool (1)) (xyzw) (var_ref
assignment_tmp@0x987cee4)  (array_ref (var_ref a@0x988ec5c) (var_ref
i@0x988eb84) ) )
   (assign (constant bool (1)) (xyzw) (var_ref
gl_FragColor@0x96d8fc4)  (var_ref assignment_tmp@0x987cee4) )
 )
 ())

 (declare (temporary ) int assignment_tmp@0x987cb84)
 (assign (constant bool (1)) (x) (var_ref
assignment_tmp@0x987cb84)  (expression int + (var_ref i@0x988eb84)
(constant int (1)) ) )
 (assign (constant bool (1)) (x) (var_ref i@0x988eb84)  (var_ref
assignment_tmp@0x987cb84) )
   ))

 ))

)

I think glsl compiler translates AST like this

int i = 0;
for (;;) {
 if (i  10) break;
 if (i == 9) gl_FragColor = a [ i ] ;
 i = i + 1;
}

Is it correct?

I believe this block is implicitly surrounded by { and }.  I'm pretty
sure that we have test cases for the situation you're describing, but
I'd have to go dig around.


Another question, for class ir_loop, why is ir_loop::counter ir_variable
while from/to/increment are all ir_rvalue? I create an ir_variable for
ir_loop counter, but hierarchical visitor won't access it. I don't think
ir_loop::accept(ir_hierarchical_visitor *v)  will visit ir_loop::counter
at all.

ir_loop::counter is the variable that hold the loop counter.
ir_loop::from is the initial value of the counter, ir_loop::to is the
end value, and ir_loop::increment is the value that ::counter is
modified by on each iteration.


thanks,
--lx
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org

Re: [Mesa-dev] [PATCH 00/18] Implement GLX_MESA_query_renderer

2013-10-25 Thread Dave Airlie

 Do either of you guys plan to implement support for this extension?  The
 value to developers is obviously increased if more drivers support the
 extension.  This extension was born from feedback that I received from
 people at FOSDEM and from various game developers at Game Developer
 Conference and elsewhere.

 I'd like to land this extension, and I haven't received any review.  I
 know you guys are both pretty busy, so I don't expect detailed reviews.
  I would really appreciate a quick skim of the extension spec (patch 15)
 and an Acked-by or two.

Is there a test app or piglit set for this? I might try and fit in
looking at this,

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] R600: Make sure OQAP defs and uses happen in the same clause

2013-10-25 Thread Vincent Lejeune
This patch should work when checking than no OQAP is used before beeing queued, 
assuming that a value in OQAP is consumed
and cannot be read twice. However I'm not sure I cover all LDS instructions 
that queues a value, I only use LDS_RET_READ in switch case.

Vincent



- Mail original -
 De : Tom Stellard t...@stellard.net
 À : Vincent Lejeune v...@ovi.com
 Cc : llvm-comm...@cs.uiuc.edu llvm-comm...@cs.uiuc.edu; 
 mesa-dev@lists.freedesktop.org mesa-dev@lists.freedesktop.org; Tom 
 Stellard thomas.stell...@amd.com
 Envoyé le : Mardi 22 octobre 2013 23h20
 Objet : Re: [PATCH] R600: Make sure OQAP defs and uses happen in the same 
 clause
 
 Hi Vincent,
 
 Here is an updated patch.  I wasn't sure where to put the assertion to
 check that UnscheduledNoLiveOut{Defs,Uses} is empty when switching to a
 new clause.  I tried adding it to R600SchedStartegy::schedNode() behind
 the if (NextInstKind != CurInstKind) condition, but it always failed.
 Any suggestions on where I should but it?
 
 -Tom
 
 
 On Mon, Oct 21, 2013 at 12:40:28PM -0700, Vincent Lejeune wrote:
 
 
 
 
  - Mail original -
   De : Tom Stellard t...@stellard.net
   À : llvm-comm...@cs.uiuc.edu
   Cc : mesa-dev@lists.freedesktop.org; Tom Stellard 
 thomas.stell...@amd.com
   Envoyé le : Vendredi 11 octobre 2013 20h10
   Objet : [PATCH] R600: Make sure OQAP defs and uses happen in the same 
 clause
   
   From: Tom Stellard thomas.stell...@amd.com
   
   Reading the special OQAP register pops the top value off the LDS
   input queue and returns it to the instruction.  This queue is
   invalidated at the end of an ALU clause and leaving values in the 
 queue
   can lead to GPU hangs.  This means that if we load a value into the 
 queue,
   we must use it before the end of the clause.
   
   This fixes some hangs in the OpenCV test suite.
   ---
   lib/Target/R600/R600MachineScheduler.cpp | 25 
 +
   lib/Target/R600/R600MachineScheduler.h   |  4 ++--
   test/CodeGen/R600/lds-input-queue.ll     | 26 
 ++
   3 files changed, 41 insertions(+), 14 deletions(-)
   create mode 100644 test/CodeGen/R600/lds-input-queue.ll
   
   diff --git a/lib/Target/R600/R600MachineScheduler.cpp 
   b/lib/Target/R600/R600MachineScheduler.cpp
   index 6c26d9e..611b7f4 100644
   --- a/lib/Target/R600/R600MachineScheduler.cpp
   +++ b/lib/Target/R600/R600MachineScheduler.cpp
   @@ -93,11 +93,12 @@ SUnit* R600SchedStrategy::pickNode(bool 
 IsTopNode) 
   {
      }
   
   
   -  // We want to scheduled AR defs as soon as possible to make sure 
 they 
   aren't
   -  // put in a different ALU clause from their uses.
   -  if (!SU  !UnscheduledARDefs.empty()) {
   -      SU = UnscheduledARDefs[0];
   -      UnscheduledARDefs.erase(UnscheduledARDefs.begin());
   +  // We want to scheduled defs that cannot be live outside of this 
 clause 
   +  // as soon as possible to make sure they aren't put in a 
 different
   +  // ALU clause from their uses.
   +  if (!SU  !UnscheduledNoLiveOutDefs.empty()) {
   +      SU = UnscheduledNoLiveOutDefs[0];
   +      
 UnscheduledNoLiveOutDefs.erase(UnscheduledNoLiveOutDefs.begin());
          NextInstKind = IDAlu;
      }
   
   @@ -132,9 +133,9 @@ SUnit* R600SchedStrategy::pickNode(bool 
 IsTopNode) 
   {
   
      // We want to schedule the AR uses as late as possible to make sure 
 that
      // the AR defs have been released.
   -  if (!SU  !UnscheduledARUses.empty()) {
   -      SU = UnscheduledARUses[0];
   -      UnscheduledARUses.erase(UnscheduledARUses.begin());
   +  if (!SU  !UnscheduledNoLiveOutUses.empty()) {
   +      SU = UnscheduledNoLiveOutUses[0];
   +      
 UnscheduledNoLiveOutUses.erase(UnscheduledNoLiveOutUses.begin());
 
  Can we use std::queueSUnit* instead of a std::vector for 
 UnscheduledNoLiveOutUses ?
  I had to use a vector because I needed to be able to pop non topmost SUnit 
 in some case
  (to fit Instruction Group const read limitation) but I would rather avoid 
 erase(iterator) call
  when possible.
 
 
          NextInstKind = IDAlu;
      }
   
   @@ -217,15 +218,15 @@ void R600SchedStrategy::releaseBottomNode(SUnit 
 *SU) 
   {
   
      int IK = getInstKind(SU);
   
   -  // Check for AR register defines
   +  // Check for registers that do not live across ALU clauses.
      for (MachineInstr::const_mop_iterator I = 
   SU-getInstr()-operands_begin(),
                                            E = 
   SU-getInstr()-operands_end();
                                            I != E; ++I) {
   -    if (I-isReg()  I-getReg() == AMDGPU::AR_X) 
 {
   +    if (I-isReg()  (I-getReg() == AMDGPU::AR_X || 
   I-getReg() == AMDGPU::OQAP)) {
          if (I-isDef()) {
   -        UnscheduledARDefs.push_back(SU);
   +        UnscheduledNoLiveOutDefs.push_back(SU);
          } else {
   -        UnscheduledARUses.push_back(SU);
   +        UnscheduledNoLiveOutUses.push_back(SU);
          }
          return;
        }
   diff --git 

Re: [Mesa-dev] [PATCH] i965: Make fs gl_PrimitiveID input work even when there's no gs.

2013-10-25 Thread Paul Berry
On 23 October 2013 10:51, Eric Anholt e...@anholt.net wrote:

 Paul Berry stereotype...@gmail.com writes:

  When a geometry shader is present, the fragment shader gl_PrimitiveID
  input acts like an ordinary varying, receiving data from the gs
  gl_PrimitiveID output.  When there's no geometry shader, we have to
  ask the fixed function SF hardware to provide the primitive ID to the
  fragment shader instead.
 
  Previously, the SF setup code would handle this situation by
  recognizing that the FS gl_PrimitiveID input didn't match to any VS
  output; since normally an FS input with no corresponding VS output
  leads to undefined data, the SF setup code used to just arbitrarily
  assign it to receive data from attribute 0.
 
  This patch changes the SF setup code so that instead of arbitrarily
  using attribute 0, it assigns the unmatched FS input to receive
  gl_PrimitiveID.  In the case where the FS input really is
  gl_PrimitiveID, this produces the intended result.  In all other
  cases, no harm is done since GL specifies that the behaviour is
  undefined.
 
  Fixes piglit test primitive-id-no-gs.

 Reviewed-by: Eric Anholt e...@anholt.net


I was about to push this when I realized that it regressed point sprite
functionality.  It seems that if an attribute has its component override
bots set *and* its point sprite texture coordinate enable bit set, the
component override takes predence (this isn't documented; I found it out by
running piglit tests).  As a result, this patch was causing gl_PointCoord
to get overridden with gl_PrimitiveID.

I'll follow up shortly with a corrected patch.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] i965: Make fs gl_PrimitiveID input work even when there's no gs.

2013-10-25 Thread Paul Berry
When a geometry shader is present, the fragment shader gl_PrimitiveID
input acts like an ordinary varying, receiving data from the gs
gl_PrimitiveID output.  When there's no geometry shader, we have to
ask the fixed function SF hardware to provide the primitive ID to the
fragment shader instead.

Previously, the SF setup code would handle this situation by
recognizing that the FS gl_PrimitiveID input didn't match to any VS
output; since normally an FS input with no corresponding VS output
leads to undefined data, the SF setup code used to just arbitrarily
assign it to receive data from attribute 0.

This patch changes the SF setup code so that instead of arbitrarily
using attribute 0, it assigns the unmatched FS input to receive
gl_PrimitiveID.  In the case where the FS input really is
gl_PrimitiveID, this produces the intended result.  In all other
cases, no harm is done since GL specifies that the behaviour is
undefined.

Fixes piglit test primitive-id-no-gs.

Reviewed-by: Eric Anholt e...@anholt.net

v2: If an attribute is already being overridden with point
coordinates, don't try to also override it with gl_PrimitiveID.  This
is necessary to avoid regressing piglit tests such as
shaders/glsl-fs-pointcoord.
---
 src/mesa/drivers/dri/i965/brw_defines.h   |  4 
 src/mesa/drivers/dri/i965/gen6_sf_state.c | 27 ++-
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 5ba9d45..b661194 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1508,6 +1508,10 @@ enum brw_message_target {
 # define ATTRIBUTE_0_OVERRIDE_Y(1  13)
 # define ATTRIBUTE_0_OVERRIDE_X(1  12)
 # define ATTRIBUTE_0_CONST_SOURCE_SHIFT9
+#  define ATTRIBUTE_CONST_ 0
+#  define ATTRIBUTE_CONST_0001_FLOAT   1
+#  define ATTRIBUTE_CONST__FLOAT   2
+#  define ATTRIBUTE_CONST_PRIM_ID  3
 # define ATTRIBUTE_0_SWIZZLE_SHIFT 6
 # define ATTRIBUTE_0_SOURCE_SHIFT  0
 
diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c 
b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index 6a9fa60..47d76e9 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -80,10 +80,23 @@ get_attr_override(const struct brw_vue_map *vue_map, int 
urb_entry_read_offset,
* the vertex shader, so its value is undefined.  Therefore the
* attribute override we supply doesn't matter.
*
-   * In either case the attribute override we supply doesn't matter, so
-   * just reference the first available attribute.
+   * (c) This attribute is gl_PrimitiveID, and it wasn't written by the
+   * previous shader stage.
+   *
+   * Note that we don't have to worry about the cases where the attribute
+   * is gl_PointCoord or is undergoing point sprite coordinate
+   * replacement, because in those cases, this function isn't called.
+   *
+   * In case (c), we need to program the attribute overrides so that the
+   * primitive ID will be stored in this slot.  In every other case, the
+   * attribute override we supply doesn't matter.  So just go ahead and
+   * program primitive ID in every case.
*/
-  return 0;
+  return (ATTRIBUTE_0_OVERRIDE_W |
+  ATTRIBUTE_0_OVERRIDE_Z |
+  ATTRIBUTE_0_OVERRIDE_Y |
+  ATTRIBUTE_0_OVERRIDE_X |
+  (ATTRIBUTE_CONST_PRIM_ID  ATTRIBUTE_0_CONST_SOURCE_SHIFT));
}
 
/* Compute the location of the attribute relative to urb_entry_read_offset.
@@ -149,13 +162,17 @@ calculate_attr_overrides(const struct brw_context *brw,
 continue;
 
   /* _NEW_POINT */
+  bool point_sprite = false;
   if (brw-ctx.Point.PointSprite 
  (attr = VARYING_SLOT_TEX0  attr = VARYING_SLOT_TEX7) 
  brw-ctx.Point.CoordReplace[attr - VARYING_SLOT_TEX0]) {
-*point_sprite_enables |= (1  input_index);
+ point_sprite = true;
   }
 
   if (attr == VARYING_SLOT_PNTC)
+ point_sprite = true;
+
+  if (point_sprite)
 *point_sprite_enables |= (1  input_index);
 
   /* flat shading */
@@ -165,7 +182,7 @@ calculate_attr_overrides(const struct brw_context *brw,
  *flat_enables |= (1  input_index);
 
   /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_LIGHT | _NEW_PROGRAM */
-  uint16_t attr_override =
+  uint16_t attr_override = point_sprite ? 0 :
  get_attr_override(brw-vue_map_geom_out,
   urb_entry_read_offset, attr,
brw-ctx.VertexProgram._TwoSideEnabled,
-- 
1.8.4.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org

[Mesa-dev] [Bug 34495] Selecting objects in Blender 2.56 slow due the software gl_select mode

2013-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=34495

Alex Deucher ag...@yahoo.com changed:

   What|Removed |Added

   Assignee|dri-devel@lists.freedesktop |mesa-dev@lists.freedesktop.
   |.org|org
Summary|Selecting objects in|Selecting objects in
   |Blender 2.56 slow with  |Blender 2.56 slow due the
   |gallium r600 driver |software gl_select mode
  Component|Drivers/Gallium/r600|Mesa core

--- Comment #73 from Alex Deucher ag...@yahoo.com ---
(In reply to comment #71)
 (In reply to comment #70)
  I wonder if the fix has been committed? I am using Debian testing with the
  mesa 9.1-7 package provided in the testing repository, and the selections
  with Blender are still very slow.
 
 It has been merged to master in mesa 9.2.

It hasn't been merged yet.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] graw: add a test rendering a huge triangle

2013-10-25 Thread Jose Fonseca
Looks good.

A future improvement could be querying PIPE_CAP_MAX_TEXTURE_2D_LEVELS instead 
of using a constant width/height.

Jose


- Original Message -
 Used to test rasterization, because we often breakdown on
 subdivision of triangles with long edges.
 
 Signed-off-by: Zack Rusin za...@vmware.com
 ---
  src/gallium/tests/graw/SConscript  |   1 +
  src/gallium/tests/graw/tri-large.c | 173
  +
  2 files changed, 174 insertions(+)
  create mode 100644 src/gallium/tests/graw/tri-large.c
 
 diff --git a/src/gallium/tests/graw/SConscript
 b/src/gallium/tests/graw/SConscript
 index 8740ff3..8723807 100644
 --- a/src/gallium/tests/graw/SConscript
 +++ b/src/gallium/tests/graw/SConscript
 @@ -29,6 +29,7 @@ progs = [
  'tex-srgb',
  'tex-swizzle',
  'tri',
 +'tri-large',
  'tri-gs',
  'tri-instanced',
  'vs-test',
 diff --git a/src/gallium/tests/graw/tri-large.c
 b/src/gallium/tests/graw/tri-large.c
 new file mode 100644
 index 000..3fbbfb3
 --- /dev/null
 +++ b/src/gallium/tests/graw/tri-large.c
 @@ -0,0 +1,173 @@
 +/* Display a cleared blue window.  This demo has no dependencies on
 + * any utility code, just the graw interface and gallium.
 + */
 +
 +#include graw_util.h
 +#include util/u_debug.h
 +
 +#include stdio.h
 +
 +static struct graw_info info;
 +
 +static const int WIDTH = 4*2048;
 +static const int HEIGHT = 4*2048;
 +
 +
 +struct vertex {
 +   float position[4];
 +   float color[4];
 +};
 +
 +static boolean FlatShade = FALSE;
 +
 +
 +static struct vertex vertices[3] =
 +{
 +   {
 +  { -1.0f, -1.0f, 0.0f, 1.0f },
 +  { 1.0f, 0.0f, 0.0f, 1.0f }
 +   },
 +   {
 +  { -1.0f, 1.0f, 0.0f, 1.0f },
 +  { 0.0f, 1.0f, 0.0f, 1.0f }
 +   },
 +   {
 +  { 1.0f, 1.0f, 0.0f, 1.0f },
 +  { 0.0f, 0.0f, 1.0f, 1.0f }
 +   }
 +};
 +
 +
 +static void set_vertices( void )
 +{
 +   struct pipe_vertex_element ve[2];
 +   struct pipe_vertex_buffer vbuf;
 +   void *handle;
 +
 +   memset(ve, 0, sizeof ve);
 +
 +   ve[0].src_offset = Offset(struct vertex, position);
 +   ve[0].src_format = PIPE_FORMAT_R32G32B32A32_FLOAT;
 +   ve[1].src_offset = Offset(struct vertex, color);
 +   ve[1].src_format = PIPE_FORMAT_R32G32B32A32_FLOAT;
 +
 +   handle = info.ctx-create_vertex_elements_state(info.ctx, 2, ve);
 +   info.ctx-bind_vertex_elements_state(info.ctx, handle);
 +
 +   memset(vbuf, 0, sizeof vbuf);
 +
 +   vbuf.stride = sizeof( struct vertex );
 +   vbuf.buffer_offset = 0;
 +   vbuf.buffer = pipe_buffer_create_with_data(info.ctx,
 +  PIPE_BIND_VERTEX_BUFFER,
 +  PIPE_USAGE_STATIC,
 +  sizeof(vertices),
 +  vertices);
 +
 +   info.ctx-set_vertex_buffers(info.ctx, 0, 1, vbuf);
 +}
 +
 +
 +static void set_vertex_shader( void )
 +{
 +   void *handle;
 +   const char *text =
 +  VERT\n
 +  DCL IN[0]\n
 +  DCL IN[1]\n
 +  DCL OUT[0], POSITION\n
 +  DCL OUT[1], COLOR\n
 +0: MOV OUT[1], IN[1]\n
 +1: MOV OUT[0], IN[0]\n
 +2: END\n;
 +
 +   handle = graw_parse_vertex_shader(info.ctx, text);
 +   info.ctx-bind_vs_state(info.ctx, handle);
 +}
 +
 +
 +static void set_fragment_shader( void )
 +{
 +   void *handle;
 +   const char *text =
 +  FRAG\n
 +  DCL IN[0], COLOR, LINEAR\n
 +  DCL OUT[0], COLOR\n
 +0: MOV OUT[0], IN[0]\n
 +1: END\n;
 +
 +   handle = graw_parse_fragment_shader(info.ctx, text);
 +   info.ctx-bind_fs_state(info.ctx, handle);
 +}
 +
 +
 +static void draw( void )
 +{
 +   union pipe_color_union clear_color = { {1,0,1,1} };
 +
 +   info.ctx-clear(info.ctx, PIPE_CLEAR_COLOR, clear_color, 0, 0);
 +   util_draw_arrays(info.ctx, PIPE_PRIM_TRIANGLES, 0, 3);
 +   info.ctx-flush(info.ctx, NULL, 0);
 +
 +   graw_save_surface_to_file(info.ctx, info.color_surf[0], NULL);
 +
 +   graw_util_flush_front(info);
 +}
 +
 +
 +static void init( void )
 +{
 +   if (!graw_util_create_window(info, WIDTH, HEIGHT, 1, FALSE))
 +  exit(1);
 +
 +   graw_util_default_state(info, FALSE);
 +
 +   {
 +  struct pipe_rasterizer_state rasterizer;
 +  void *handle;
 +  memset(rasterizer, 0, sizeof rasterizer);
 +  rasterizer.cull_face = PIPE_FACE_NONE;
 +  rasterizer.half_pixel_center = 1;
 +  rasterizer.bottom_edge_rule = 1;
 +  rasterizer.flatshade = FlatShade;
 +  rasterizer.depth_clip = 1;
 +  handle = info.ctx-create_rasterizer_state(info.ctx, rasterizer);
 +  info.ctx-bind_rasterizer_state(info.ctx, handle);
 +   }
 +
 +
 +   graw_util_viewport(info, 0, 0, WIDTH, HEIGHT, 30, 1000);
 +
 +   set_vertices();
 +   set_vertex_shader();
 +   set_fragment_shader();
 +}
 +
 +static void args(int argc, char *argv[])
 +{
 +   int i;
 +
 +   for (i = 1; i  argc; ) {
 +  if (graw_parse_args(i, argc, argv)) {
 + /* ok */
 +  }
 +  else if (strcmp(argv[i], -f) 

Re: [Mesa-dev] [PATCH] mesa: Update MESA_INFO to eliminate error

2013-10-25 Thread Brian Paul

On 10/24/2013 01:13 PM, Courtney Goeltzenleuchter wrote:

If a user set MESA_INFO and the OpenGL application uses a
3.0 or later context then the MESA_INFO debug output will have
an error when it queries for extensions using the deprecated
enum GL_EXTENSIONS. Passing context argument allows code
to return extension list directly regardless of profile.
Commit title updated as recommended by Kenneth Graunke.
---


Reviewed-by: Brian Paul bri...@vmware.com

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 70864] New: classic drivers needlessly link to libdrm_intel / libdrm_nouveau / libdrm_radeon

2013-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70864

  Priority: medium
Bug ID: 70864
  Assignee: mesa-dev@lists.freedesktop.org
   Summary: classic drivers needlessly link to libdrm_intel /
libdrm_nouveau / libdrm_radeon
  Severity: normal
Classification: Unclassified
OS: Linux (All)
  Reporter: fabio@libero.it
  Hardware: x86 (IA32)
Status: NEW
   Version: git
 Component: Mesa core
   Product: Mesa

I noticed (using ldd, incidentally while testing the new mesa_dri_drivers.so)
the classic drivers (radeon, r200, i915, i965, nouveau_vieux) links to all
three of libdrm_intel / libdrm_nouveau / libdrm_radeon, while only the matching
one should be needed.

Gallium drivers (i915, r300, r600, radeonsi, nouveau) are OK, linking only to
their libdrm.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 70864] classic drivers needlessly link to libdrm_intel / libdrm_nouveau / libdrm_radeon

2013-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70864

--- Comment #1 from Fabio Pedretti fabio@libero.it ---
It looks every classic driver (including swrast) now includes all the classic
drivers, they are more or less a copy of mesa_dri_drivers.so.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 69437] Composite Bypass no longer works

2013-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=69437

U. Artie Eoff ullysses.a.e...@intel.com changed:

   What|Removed |Added

 Status|RESOLVED|VERIFIED

--- Comment #8 from U. Artie Eoff ullysses.a.e...@intel.com ---
Verified fixed on both master and 9.2 branches... Thanks!

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 70864] classic drivers needlessly link to libdrm_intel / libdrm_nouveau / libdrm_radeon

2013-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70864

Matt Turner matts...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |NOTABUG

--- Comment #2 from Matt Turner matts...@gmail.com ---
(In reply to comment #1)
 It looks every classic driver (including swrast) now includes all the
 classic drivers, they are more or less a copy of mesa_dri_drivers.so.

They are in fact exact copies -- they're hardlinks.

The point of mega drivers was to link all of the (classic) drivers into a
single file.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 70864] classic drivers needlessly link to libdrm_intel / libdrm_nouveau / libdrm_radeon

2013-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70864

--- Comment #3 from Fabio Pedretti fabio@libero.it ---
Thanks but it doesn't look they are hardlinks anyway, the md5sum all differ,
also their size is slightly different.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 34495] Selecting objects in Blender 2.56 slow due the software gl_select mode

2013-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=34495

--- Comment #74 from hapoofesg...@goingon.ir ---
S(In reply to comment #73)
 (In reply to comment #71)
  (In reply to comment #70)
   I wonder if the fix has been committed? I am using Debian testing with the
   mesa 9.1-7 package provided in the testing repository, and the selections
   with Blender are still very slow.
  
  It has been merged to master in mesa 9.2.
 
 It hasn't been merged yet.

So why i don't have any selection problems?
I'm currently running fedora 19  mesa 9.2.1 and i don't see those slow
selections anymore.
BTW sorry if i was/am wrong.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 70864] classic drivers needlessly link to libdrm_intel / libdrm_nouveau / libdrm_radeon

2013-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70864

--- Comment #4 from Johannes Obermayr johannesoberm...@gmx.de ---
(In reply to comment #2)
 The point of mega drivers was to link all of the (classic) drivers into a
 single file.

... to waste memory on runtime if you make use of packages provided by
distributions which contain all classic drivers ...

I bet the solution to make only required symbols PUBLIC in former libdricore
and libgallium isn't that much worse like you propagate.
I just want to mention that there was never a comparision to my patchset which
closes a lot of symbols for libdricore's replacement:
http://lists.freedesktop.org/archives/mesa-dev/2013-September/044593.html

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] llvmpipe: fix bogus layer clamping in setup

2013-10-25 Thread sroland
From: Roland Scheidegger srol...@vmware.com

The layer coming from GS needs to be clamped (not sure if that's actually
the correct error behavior but we need something) as the number can be higher
than the amount of layers in the fb. However, this code was using the layer
calculation from the scene, and this was actually calculated in
lp_scene_begin_rasterization() hence too late (so setup was using the value
from the _previous_ scene or just zero if it was the first scene).
Since the value is used in both rasterization and setup, move calculation up
to lp_scene_begin_binning() though it's a bit more inconvenient to calculate
there. (Theoretically could move _all_ code which was in
lp_scene_begin_rasterization() to there, because ever since we got rid of
swizzled render/depth buffers our map functions preparing the fb data for
render don't actually change the data in there at all, but it feels like
it would be a hack.)
---
 src/gallium/drivers/llvmpipe/lp_scene.c |   25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_scene.c 
b/src/gallium/drivers/llvmpipe/lp_scene.c
index 2abbd25..483bfa5 100644
--- a/src/gallium/drivers/llvmpipe/lp_scene.c
+++ b/src/gallium/drivers/llvmpipe/lp_scene.c
@@ -151,7 +151,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
 {
const struct pipe_framebuffer_state *fb = scene-fb;
int i;
-   unsigned max_layer = ~0;
 
//LP_DBG(DEBUG_RAST, %s\n, __FUNCTION__);
 
@@ -162,7 +161,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
cbuf-u.tex.level);
  scene-cbufs[i].layer_stride = llvmpipe_layer_stride(cbuf-texture,
   
cbuf-u.tex.level);
- max_layer = MIN2(max_layer, cbuf-u.tex.last_layer - 
cbuf-u.tex.first_layer);
 
  scene-cbufs[i].map = llvmpipe_resource_map(cbuf-texture,
  cbuf-u.tex.level,
@@ -173,7 +171,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
  struct llvmpipe_resource *lpr = llvmpipe_resource(cbuf-texture);
  unsigned pixstride = util_format_get_blocksize(cbuf-format);
  scene-cbufs[i].stride = cbuf-texture-width0;
- max_layer = 0;
 
  scene-cbufs[i].map = lpr-data;
  scene-cbufs[i].map += cbuf-u.buf.first_element * pixstride;
@@ -184,15 +181,12 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
   struct pipe_surface *zsbuf = scene-fb.zsbuf;
   scene-zsbuf.stride = llvmpipe_resource_stride(zsbuf-texture, 
zsbuf-u.tex.level);
   scene-zsbuf.layer_stride = llvmpipe_layer_stride(zsbuf-texture, 
zsbuf-u.tex.level);
-  max_layer = MIN2(max_layer, zsbuf-u.tex.last_layer - 
zsbuf-u.tex.first_layer);
 
   scene-zsbuf.map = llvmpipe_resource_map(zsbuf-texture,
zsbuf-u.tex.level,
zsbuf-u.tex.first_layer,
LP_TEX_USAGE_READ_WRITE);
}
-
-   scene-fb_max_layer = max_layer;
 }
 
 
@@ -506,6 +500,9 @@ end:
 void lp_scene_begin_binning( struct lp_scene *scene,
  struct pipe_framebuffer_state *fb, boolean 
discard )
 {
+   int i;
+   unsigned max_layer = ~0;
+
assert(lp_scene_is_empty(scene));
 
scene-discard = discard;
@@ -513,9 +510,23 @@ void lp_scene_begin_binning( struct lp_scene *scene,
 
scene-tiles_x = align(fb-width, TILE_SIZE) / TILE_SIZE;
scene-tiles_y = align(fb-height, TILE_SIZE) / TILE_SIZE;
-
assert(scene-tiles_x = TILES_X);
assert(scene-tiles_y = TILES_Y);
+
+   for (i = 0; i  scene-fb.nr_cbufs; i++) {
+  struct pipe_surface *cbuf = scene-fb.cbufs[i];
+  if (llvmpipe_resource_is_texture(cbuf-texture)) {
+ max_layer = MIN2(max_layer, cbuf-u.tex.last_layer - 
cbuf-u.tex.first_layer);
+  }
+  else {
+ max_layer = 0;
+  }
+   }
+   if (fb-zsbuf) {
+  struct pipe_surface *zsbuf = scene-fb.zsbuf;
+  max_layer = MIN2(max_layer, zsbuf-u.tex.last_layer - 
zsbuf-u.tex.first_layer);
+   }
+   scene-fb_max_layer = max_layer;
 }
 
 
-- 
1.7.9.5
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 70864] classic drivers needlessly link to libdrm_intel / libdrm_nouveau / libdrm_radeon

2013-10-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=70864

--- Comment #5 from Matt Turner matts...@gmail.com ---
(In reply to comment #3)
 Thanks but it doesn't look they are hardlinks anyway, the md5sum all differ,
 also their size is slightly different.

Strange, that's not what I see on my system:

mattst88@work-Thinkpad mesa % md5sum $(find -name '*_dri.so')   
e918941fd19d4afaeb5834e90c5f88a4  ./lib/swrast_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./lib/r200_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./lib/i915_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./lib/i965_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./lib/radeon_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./install/usr/local/lib/dri/swrast_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./install/usr/local/lib/dri/r200_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./install/usr/local/lib/dri/i915_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./install/usr/local/lib/dri/i965_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./install/usr/local/lib/dri/radeon_dri.so

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 06/14] glsl: Add built-in functions and constants required for ARB_shader_atomic_counters.

2013-10-25 Thread Ian Romanick
Reviewed-by: Ian Romanick ian.d.roman...@intel.com

On 10/01/2013 07:15 PM, Francisco Jerez wrote:
 v2: Represent atomics as GLSL intrinsics.
 ---
  src/glsl/builtin_functions.cpp  | 58 
 +
  src/glsl/builtin_variables.cpp  | 15 +++
  src/glsl/glcpp/glcpp-parse.y|  3 +++
  src/glsl/glsl_parser_extras.cpp |  6 +
  src/glsl/glsl_parser_extras.h   |  7 +
  5 files changed, 89 insertions(+)
 
 diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
 index 03efb6d..d704b84 100644
 --- a/src/glsl/builtin_functions.cpp
 +++ b/src/glsl/builtin_functions.cpp
 @@ -300,6 +300,13 @@ tex3d_lod(const _mesa_glsl_parse_state *state)
  {
 return tex3d(state)  lod_exists_in_stage(state);
  }
 +
 +static bool
 +shader_atomic_counters(const _mesa_glsl_parse_state *state)
 +{
 +   return state-ARB_shader_atomic_counters_enable;
 +}
 +
  /** @} */
  
  
 /**/
 @@ -515,6 +522,11 @@ private:
 B1(fma)
 B2(ldexp)
 B2(frexp)
 +
 +   ir_function_signature *_atomic_intrinsic(builtin_available_predicate 
 avail);
 +   ir_function_signature *_atomic_op(const char *intrinsic,
 + builtin_available_predicate avail);
 +
  #undef B0
  #undef B1
  #undef B2
 @@ -621,6 +633,15 @@ builtin_builder::create_shader()
  void
  builtin_builder::create_intrinsics()
  {
 +   add_function(__intrinsic_atomic_read,
 +_atomic_intrinsic(shader_atomic_counters),
 +NULL);
 +   add_function(__intrinsic_atomic_increment,
 +_atomic_intrinsic(shader_atomic_counters),
 +NULL);
 +   add_function(__intrinsic_atomic_predecrement,
 +_atomic_intrinsic(shader_atomic_counters),
 +NULL);
  }
  
  /**
 @@ -1856,6 +1877,20 @@ builtin_builder::create_builtins()
  _frexp(glsl_type::vec3_type,  glsl_type::ivec3_type),
  _frexp(glsl_type::vec4_type,  glsl_type::ivec4_type),
  NULL);
 +
 +   add_function(atomicCounter,
 +_atomic_op(__intrinsic_atomic_read,
 +   shader_atomic_counters),
 +NULL);
 +   add_function(atomicCounterIncrement,
 +_atomic_op(__intrinsic_atomic_increment,
 +   shader_atomic_counters),
 +NULL);
 +   add_function(atomicCounterDecrement,
 +_atomic_op(__intrinsic_atomic_predecrement,
 +   shader_atomic_counters),
 +NULL);
 +
  #undef F
  #undef FI
  #undef FIU
 @@ -3606,6 +3641,29 @@ builtin_builder::_frexp(const glsl_type *x_type, const 
 glsl_type *exp_type)
  
 return sig;
  }
 +
 +ir_function_signature *
 +builtin_builder::_atomic_intrinsic(builtin_available_predicate avail)
 +{
 +   ir_variable *counter = in_var(glsl_type::atomic_uint_type, counter);
 +   MAKE_INTRINSIC(glsl_type::uint_type, avail, 1, counter);
 +   return sig;
 +}
 +
 +ir_function_signature *
 +builtin_builder::_atomic_op(const char *intrinsic,
 +builtin_available_predicate avail)
 +{
 +   ir_variable *counter = in_var(glsl_type::atomic_uint_type, 
 atomic_counter);
 +   MAKE_SIG(glsl_type::uint_type, avail, 1, counter);
 +
 +   ir_variable *retval = body.make_temp(glsl_type::uint_type, 
 atomic_retval);
 +   body.emit(call(shader-symbols-get_function(intrinsic), retval, 1,
 +  operand(counter)));
 +   body.emit(ret(retval));
 +   return sig;
 +}
 +
  /** @} */
  
  
 /**/
 diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp
 index 6a808c0..49f0f42 100644
 --- a/src/glsl/builtin_variables.cpp
 +++ b/src/glsl/builtin_variables.cpp
 @@ -555,6 +555,21 @@ builtin_variable_generator::generate_constants()
 */
add_const(gl_MaxTextureCoords, state-Const.MaxTextureCoords);
 }
 +
 +   if (state-ARB_shader_atomic_counters_enable) {
 +  add_const(gl_MaxVertexAtomicCounters,
 +state-Const.MaxVertexAtomicCounters);
 +  add_const(gl_MaxGeometryAtomicCounters,
 +state-Const.MaxGeometryAtomicCounters);
 +  add_const(gl_MaxFragmentAtomicCounters,
 +state-Const.MaxFragmentAtomicCounters);
 +  add_const(gl_MaxCombinedAtomicCounters,
 +state-Const.MaxCombinedAtomicCounters);
 +  add_const(gl_MaxAtomicCounterBindings,
 +state-Const.MaxAtomicBufferBindings);
 +  add_const(gl_MaxTessControlAtomicCounters, 0);
 +  add_const(gl_MaxTessEvaluationAtomicCounters, 0);
 +   }
  }
  
  
 diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y
 index 6eaa5f9..2b4e988 100644
 --- a/src/glsl/glcpp/glcpp-parse.y
 +++ b/src/glsl/glcpp/glcpp-parse.y
 @@ -1248,6 +1248,9 @@ glcpp_parser_create (const struct gl_extensions 
 

Re: [Mesa-dev] [PATCH 02/14] glsl: Add type predicate to check whether a type contains any opaque types.

2013-10-25 Thread Ian Romanick
Reviewed-by: Ian Romanick ian.d.roman...@intel.com

On 10/01/2013 07:15 PM, Francisco Jerez wrote:
 And use it to forbid comparisons of opaque operands.  According to the
 GL 4.2 specification:
 
 Except for array indexing, structure member selection, and
 parentheses, opaque variables are not allowed to be operands in
 expressions.
 ---
  src/glsl/ast_to_hir.cpp |  4 
  src/glsl/glsl_types.cpp | 18 ++
  src/glsl/glsl_types.h   |  5 +
  3 files changed, 27 insertions(+)
 
 diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
 index 99159dc..db59d0a 100644
 --- a/src/glsl/ast_to_hir.cpp
 +++ b/src/glsl/ast_to_hir.cpp
 @@ -1197,6 +1197,10 @@ ast_expression::hir(exec_list *instructions,
   !state-check_version(120, 300, loc,
 array comparisons forbidden)) {
error_emitted = true;
 +  } else if ((op[0]-type-contains_opaque() ||
 +  op[1]-type-contains_opaque())) {
 + _mesa_glsl_error(loc, state, opaque type comparisons forbidden);
 + error_emitted = true;
}
  
if (error_emitted) {
 diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
 index e1fe153..a9b7eb3 100644
 --- a/src/glsl/glsl_types.cpp
 +++ b/src/glsl/glsl_types.cpp
 @@ -162,6 +162,24 @@ glsl_type::contains_integer() const
 }
  }
  
 +bool
 +glsl_type::contains_opaque() const {
 +   switch (base_type) {
 +   case GLSL_TYPE_SAMPLER:
 +   case GLSL_TYPE_ATOMIC_UINT:
 +  return true;
 +   case GLSL_TYPE_ARRAY:
 +  return element_type()-contains_opaque();
 +   case GLSL_TYPE_STRUCT:
 +  for (unsigned int i = 0; i  length; i++) {
 + if (fields.structure[i].type-contains_opaque())
 +return true;
 +  }
 +  return false;
 +   default:
 +  return false;
 +   }
 +}
  
  gl_texture_index
  glsl_type::sampler_index() const
 diff --git a/src/glsl/glsl_types.h b/src/glsl/glsl_types.h
 index d00b9e7..133b0af 100644
 --- a/src/glsl/glsl_types.h
 +++ b/src/glsl/glsl_types.h
 @@ -463,6 +463,11 @@ struct glsl_type {
 }
  
 /**
 +* Return whether a type contains any opaque types.
 +*/
 +   bool contains_opaque() const;
 +
 +   /**
  * Query the full type of a matrix row
  *
  * \return
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/18] Implement GLX_MESA_query_renderer

2013-10-25 Thread Ian Romanick
On 10/25/2013 03:20 AM, Dave Airlie wrote:

 Do either of you guys plan to implement support for this extension?  The
 value to developers is obviously increased if more drivers support the
 extension.  This extension was born from feedback that I received from
 people at FOSDEM and from various game developers at Game Developer
 Conference and elsewhere.

 I'd like to land this extension, and I haven't received any review.  I
 know you guys are both pretty busy, so I don't expect detailed reviews.
  I would really appreciate a quick skim of the extension spec (patch 15)
 and an Acked-by or two.
 
 Is there a test app or piglit set for this? I might try and fit in
 looking at this,

There is a piglit test... that I just sent to the list about a minute
ago. :) I was also planning to update glxinfo, but I haven't gotten
around to it / I forgot.

 Dave.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Mesa (master): glx: Propagate failures from SendMakeCurrentRequest where possible

2013-10-25 Thread Ian Romanick
On 10/08/2013 10:24 AM, Adam Jackson wrote:
 Module: Mesa
 Branch: master
 Commit: d101204c23ba2f593881ede357309f3924cd
 URL:
 http://cgit.freedesktop.org/mesa/mesa/commit/?id=d101204c23ba2f593881ede357309f3924cd
 
 Author: Adam Jackson a...@redhat.com
 Date:   Fri Oct  4 09:25:51 2013 -0400
 
 glx: Propagate failures from SendMakeCurrentRequest where possible
 
 Reviewed-by: Brian Paul bri...@vmware.com
 Signed-off-by: Adam Jackson a...@redhat.com
 
 ---
 
  src/glx/indirect_glx.c |7 ---
  1 files changed, 4 insertions(+), 3 deletions(-)
 
 diff --git a/src/glx/indirect_glx.c b/src/glx/indirect_glx.c
 index d0457fe..d27b019 100644
 --- a/src/glx/indirect_glx.c
 +++ b/src/glx/indirect_glx.c
 @@ -132,6 +132,7 @@ indirect_bind_context(struct glx_context *gc, struct 
 glx_context *old,
 __GLXattribute *state;
 Display *dpy = gc-psc-dpy;
 int opcode = __glXSetupForCommand(dpy);
 +   Bool ret;
  
 if (old != dummyContext  !old-isDirect  old-psc-dpy == dpy) {
tag = old-currentContextTag;
 @@ -140,8 +141,8 @@ indirect_bind_context(struct glx_context *gc, struct 
 glx_context *old,
tag = 0;
 }
  
 -   SendMakeCurrentRequest(dpy, opcode, gc-xid, tag, draw, read,
 -  gc-currentContextTag);
 +   ret = SendMakeCurrentRequest(dpy, opcode, gc-xid, tag, draw, read,
 +gc-currentContextTag);
  
 if (!IndirectAPI)
IndirectAPI = __glXNewIndirectAPI();
 @@ -154,7 +155,7 @@ indirect_bind_context(struct glx_context *gc, struct 
 glx_context *old,
__glXInitVertexArrayState(gc);
 }
  
 -   return Success;
 +   return ret;

This is completely wrong.  SendMakeCurrentRequest returns the value from
_XReply.  _XReply returns True on success, and False on failure.
However, Success is 0.  So now indirect_bind_context returns 1 (True)
every time it is successful, and the caller interprets that to mean
failure (non-Success).

This is the source of https://bugs.freedesktop.org/show_bug.cgi?id=70486

  }
  
  static void
 
 ___
 mesa-commit mailing list
 mesa-com...@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-commit

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nv50: implement multisample textures

2013-10-25 Thread Emil Velikov
On 21/10/13 23:23, Bryan Cain wrote:
 This is a port of 4da54c91d24da (nvc0: implement multisample textures) to
 nv50.
 
 When coupled with the patch to only report 16 texture samplers (to fix
 crashes), all of the Piglit tests in spec/arb_texture_multisample pass.
 
Hello Bryan,

Big thanks for your work. As promised here is a quick piglit summary on
my nv96

pass/fail/crash
69/32/27

* dmesg does not spit anything nouveau related during the tests
* any geometry shader related tests were skipped
(piglit: info: Failed to create GL 3.2 core context)
* all the crashes are due to the following assert
codegen/nv50_ir_emit_nv50.cpp:1393:emitTEX: Assertion `argc = 4' failed.

PASSarb_texture_multisample-*
PASSfb-completeness/*
FAILsample-position/*
FAILtexelFetch fs sampler2DMS 4*
CRASH   texelFetch fs sampler2DMSArray 4*
FAILtexelFetch/*-*s-isampler2DMS
CRASH   texelFetch/*-*s-isampler2DMSArray
PASStextureSize/*


Hope you find this useful :)
No real world apps that use multisample textures were tested, yet.

Cheers
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glx: Fix return value from indirect_bind_context

2013-10-25 Thread Adam Jackson
_XReply returns 1 on success, but indirect_bind_context returns 0 on
success.

Signed-off-by: Adam Jackson a...@redhat.com
---
 src/glx/indirect_glx.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/glx/indirect_glx.c b/src/glx/indirect_glx.c
index d27b019..28b8cd0 100644
--- a/src/glx/indirect_glx.c
+++ b/src/glx/indirect_glx.c
@@ -132,7 +132,7 @@ indirect_bind_context(struct glx_context *gc, struct 
glx_context *old,
__GLXattribute *state;
Display *dpy = gc-psc-dpy;
int opcode = __glXSetupForCommand(dpy);
-   Bool ret;
+   Bool sent;
 
if (old != dummyContext  !old-isDirect  old-psc-dpy == dpy) {
   tag = old-currentContextTag;
@@ -141,8 +141,8 @@ indirect_bind_context(struct glx_context *gc, struct 
glx_context *old,
   tag = 0;
}
 
-   ret = SendMakeCurrentRequest(dpy, opcode, gc-xid, tag, draw, read,
-gc-currentContextTag);
+   sent = SendMakeCurrentRequest(dpy, opcode, gc-xid, tag, draw, read,
+gc-currentContextTag);
 
if (!IndirectAPI)
   IndirectAPI = __glXNewIndirectAPI();
@@ -155,7 +155,7 @@ indirect_bind_context(struct glx_context *gc, struct 
glx_context *old,
   __glXInitVertexArrayState(gc);
}
 
-   return ret;
+   return !sent;
 }
 
 static void
-- 
1.8.3.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] gallium/auxiliary/indices: add start param

2013-10-25 Thread Rob Clark
From: Rob Clark robcl...@freedesktop.org

Add 'start' parameter to generator/translator.

Signed-off-by: Rob Clark robcl...@freedesktop.org
---
 src/gallium/auxiliary/indices/u_indices.c  |  6 --
 src/gallium/auxiliary/indices/u_indices.h  |  4 +++-
 src/gallium/auxiliary/indices/u_indices_gen.py | 21 +++--
 src/gallium/auxiliary/indices/u_unfilled_gen.py| 13 +++--
 src/gallium/auxiliary/indices/u_unfilled_indices.c | 19 ---
 src/gallium/drivers/svga/svga_draw_arrays.c|  3 ++-
 src/gallium/drivers/svga/svga_draw_elements.c  |  1 +
 7 files changed, 40 insertions(+), 27 deletions(-)

diff --git a/src/gallium/auxiliary/indices/u_indices.c 
b/src/gallium/auxiliary/indices/u_indices.c
index 72c46f7..30b54b9 100644
--- a/src/gallium/auxiliary/indices/u_indices.c
+++ b/src/gallium/auxiliary/indices/u_indices.c
@@ -26,17 +26,19 @@
 #include u_indices_priv.h
 
 static void translate_memcpy_ushort( const void *in,
+ unsigned start,
  unsigned nr,
  void *out )
 {
-   memcpy(out, in, nr*sizeof(short));
+   memcpy(out, ((short *)in)[start], nr*sizeof(short));
 }
   
 static void translate_memcpy_uint( const void *in,
+   unsigned start,
unsigned nr,
void *out )
 {
-   memcpy(out, in, nr*sizeof(int));
+   memcpy(out, ((int *)in)[start], nr*sizeof(int));
 }
   
 
diff --git a/src/gallium/auxiliary/indices/u_indices.h 
b/src/gallium/auxiliary/indices/u_indices.h
index be522c6..922bfe6 100644
--- a/src/gallium/auxiliary/indices/u_indices.h
+++ b/src/gallium/auxiliary/indices/u_indices.h
@@ -32,10 +32,12 @@
 #define PV_COUNT  2
 
 typedef void (*u_translate_func)( const void *in,
+  unsigned start,
   unsigned nr,
   void *out );
 
-typedef void (*u_generate_func)( unsigned nr,
+typedef void (*u_generate_func)( unsigned start,
+ unsigned nr,
  void *out );
 
 
diff --git a/src/gallium/auxiliary/indices/u_indices_gen.py 
b/src/gallium/auxiliary/indices/u_indices_gen.py
index af63d09..2714df8 100644
--- a/src/gallium/auxiliary/indices/u_indices_gen.py
+++ b/src/gallium/auxiliary/indices/u_indices_gen.py
@@ -153,6 +153,7 @@ def preamble(intype, outtype, inpv, outpv, prim):
 print 'static void ' + name( intype, outtype, inpv, outpv, prim ) + '('
 if intype != GENERATE:
 print 'const void * _in,'
+print 'unsigned start,'
 print 'unsigned nr,'
 print 'void *_out )'
 print '{'
@@ -168,28 +169,28 @@ def postamble():
 
 def points(intype, outtype, inpv, outpv):
 preamble(intype, outtype, inpv, outpv, prim='points')
-print '  for (i = 0; i  nr; i++) { '
+print '  for (i = start; i  (nr+start); i++) { '
 do_point( intype, outtype, 'out+i',  'i' );
 print '   }'
 postamble()
 
 def lines(intype, outtype, inpv, outpv):
 preamble(intype, outtype, inpv, outpv, prim='lines')
-print '  for (i = 0; i  nr; i+=2) { '
+print '  for (i = start; i  (nr+start); i+=2) { '
 do_line( intype, outtype, 'out+i',  'i', 'i+1', inpv, outpv );
 print '   }'
 postamble()
 
 def linestrip(intype, outtype, inpv, outpv):
 preamble(intype, outtype, inpv, outpv, prim='linestrip')
-print '  for (j = i = 0; j  nr; j+=2, i++) { '
+print '  for (i = start, j = 0; j  nr; j+=2, i++) { '
 do_line( intype, outtype, 'out+j',  'i', 'i+1', inpv, outpv );
 print '   }'
 postamble()
 
 def lineloop(intype, outtype, inpv, outpv):
 preamble(intype, outtype, inpv, outpv, prim='lineloop')
-print '  for (j = i = 0; j  nr - 2; j+=2, i++) { '
+print '  for (i = start, j = 0; j  nr - 2; j+=2, i++) { '
 do_line( intype, outtype, 'out+j',  'i', 'i+1', inpv, outpv );
 print '   }'
 do_line( intype, outtype, 'out+j',  'i', '0', inpv, outpv );
@@ -197,7 +198,7 @@ def lineloop(intype, outtype, inpv, outpv):
 
 def tris(intype, outtype, inpv, outpv):
 preamble(intype, outtype, inpv, outpv, prim='tris')
-print '  for (i = 0; i  nr; i+=3) { '
+print '  for (i = start; i  (nr+start); i+=3) { '
 do_tri( intype, outtype, 'out+i',  'i', 'i+1', 'i+2', inpv, outpv );
 print '   }'
 postamble()
@@ -205,7 +206,7 @@ def tris(intype, outtype, inpv, outpv):
 
 def tristrip(intype, outtype, inpv, outpv):
 preamble(intype, outtype, inpv, outpv, prim='tristrip')
-print '  for (j = i = 0; j  nr; j+=3, i++) { '
+print '  for (i = start, j = 0; j  nr; j+=3, i++) { '
 if inpv == FIRST:
 do_tri( intype, outtype, 'out+j',  'i', 'i+1+(i1)', 'i+2-(i1)', 
inpv, outpv );
 else:
@@ -216,7 +217,7 @@ def tristrip(intype, 

[Mesa-dev] [PATCH 0/3] Add u_primconvert front-end to u_indices

2013-10-25 Thread Rob Clark
From: Rob Clark robcl...@freedesktop.org

This patchset (compared to RFC I sent previously) changes u_primconvert
to just be a front-end to the u_indices stuff.  It handles binding/
restoring new index buffer state, etc.  So driver using it just has
to put this at the top of their pipe-draw_vbo():

  if (prim_needs_emulating) {
 util_primconvert_save_index_buffer(ctx-primconvert, ctx-indexbuf);
 util_primconvert_save_rasterizer_state(ctx-primconvert, ctx-rasterizer);
 util_primconvert_draw_vbo(ctx-primconvert, info);
 return;
  }

It does not yet handle changing provoking vertex (since I didn't need
this), but that looks like it should be easy to add.

I considered first just using u_indices directly from freedreno, like
svga does.  But it is at least more complex than it needs to be and it
seemed like eventually more code could be shared with this approach.

I suspect some of the index buffer caching done in svga could be moved
into u_primconvert and shared between svga and freedreno (and any other
future drivers for GLES hw which might need the same thing).

The last patch converts freedreno over to use this.  It depends on
another patch with regenerated envytools headers (updated to take into
account differences between a20x/a22x/a3xx) for the draw initiator.

Rob Clark (3):
  gallium/auxiliary/indices: add start param
  gallium/auxiliary/indices: add u_primconvert
  freedreno: emulated unsupported primitive types

 src/gallium/auxiliary/Makefile.sources |   1 +
 src/gallium/auxiliary/indices/u_indices.c  |   6 +-
 src/gallium/auxiliary/indices/u_indices.h  |   4 +-
 src/gallium/auxiliary/indices/u_indices_gen.py |  21 +--
 src/gallium/auxiliary/indices/u_primconvert.c  | 171 +
 src/gallium/auxiliary/indices/u_primconvert.h  |  46 ++
 src/gallium/auxiliary/indices/u_unfilled_gen.py|  13 +-
 src/gallium/auxiliary/indices/u_unfilled_indices.c |  19 ++-
 src/gallium/drivers/freedreno/a2xx/fd2_context.c   |  24 ++-
 src/gallium/drivers/freedreno/a3xx/fd3_context.c   |  12 +-
 src/gallium/drivers/freedreno/freedreno_context.c  |  16 +-
 src/gallium/drivers/freedreno/freedreno_context.h  |  18 ++-
 src/gallium/drivers/freedreno/freedreno_draw.c |  29 ++--
 src/gallium/drivers/svga/svga_draw_arrays.c|   3 +-
 src/gallium/drivers/svga/svga_draw_elements.c  |   1 +
 15 files changed, 332 insertions(+), 52 deletions(-)
 create mode 100644 src/gallium/auxiliary/indices/u_primconvert.c
 create mode 100644 src/gallium/auxiliary/indices/u_primconvert.h

-- 
1.8.3.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] gallium/auxiliary/indices: add u_primconvert

2013-10-25 Thread Rob Clark
From: Rob Clark robcl...@freedesktop.org

A convenient front end to indices generate/translate code, for emulating
primitives which are not supported natively by the driver.

This handles saving/restoring index buffer state, etc.

Signed-off-by: Rob Clark robcl...@freedesktop.org
---
 src/gallium/auxiliary/Makefile.sources|   1 +
 src/gallium/auxiliary/indices/u_primconvert.c | 171 ++
 src/gallium/auxiliary/indices/u_primconvert.h |  46 +++
 3 files changed, 218 insertions(+)
 create mode 100644 src/gallium/auxiliary/indices/u_primconvert.c
 create mode 100644 src/gallium/auxiliary/indices/u_primconvert.h

diff --git a/src/gallium/auxiliary/Makefile.sources 
b/src/gallium/auxiliary/Makefile.sources
index acbcef7..c89cbdd 100644
--- a/src/gallium/auxiliary/Makefile.sources
+++ b/src/gallium/auxiliary/Makefile.sources
@@ -43,6 +43,7 @@ C_SOURCES := \
hud/hud_cpu.c \
hud/hud_fps.c \
 hud/hud_driver_query.c \
+   indices/u_primconvert.c \
os/os_misc.c \
os/os_process.c \
os/os_time.c \
diff --git a/src/gallium/auxiliary/indices/u_primconvert.c 
b/src/gallium/auxiliary/indices/u_primconvert.c
new file mode 100644
index 000..f7cf349
--- /dev/null
+++ b/src/gallium/auxiliary/indices/u_primconvert.c
@@ -0,0 +1,171 @@
+/* -*- mode: C; c-file-style: kr; tab-width 4; indent-tabs-mode: t; -*- */
+
+/*
+ * Copyright (C) 2013 Rob Clark robcl...@freedesktop.org
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the Software),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
THE
+ * SOFTWARE.
+ *
+ * Authors:
+ *Rob Clark robcl...@freedesktop.org
+ */
+
+/**
+ * This module converts provides a more convenient front-end to u_indices,
+ * etc, utils to convert primitive types supported not supported by the
+ * hardware.  It handles binding new index buffer state, and restoring
+ * previous state after.  To use, put something like this at the front of
+ * drivers pipe-draw_vbo():
+ *
+ *// emulate unsupported primitives:
+ *if (info-mode needs emulating) {
+ *   util_primconvert_save_index_buffer(ctx-primconvert, ctx-indexbuf);
+ *   util_primconvert_save_rasterizer_state(ctx-primconvert, 
ctx-rasterizer);
+ *   util_primconvert_draw_vbo(ctx-primconvert, info);
+ *   return;
+ *}
+ *
+ */
+
+#include pipe/p_state.h
+#include util/u_memory.h
+#include util/u_inlines.h
+
+#include indices/u_primconvert.h
+#include indices/u_indices.h
+
+struct primconvert_context {
+   struct pipe_context *pipe;
+   struct pipe_index_buffer saved_ib;
+   uint32_t primtypes_mask;
+   unsigned api_pv;
+   // TODO we could cache/recycle the indexbuf created to translate prims..
+};
+
+
+struct primconvert_context *util_primconvert_create(struct pipe_context *pipe,
+   uint32_t primtypes_mask)
+{
+   struct primconvert_context *pc = CALLOC_STRUCT(primconvert_context);
+   if (!pc)
+   return NULL;
+   pc-pipe = pipe;
+   pc-primtypes_mask = primtypes_mask;
+   return pc;
+}
+
+void util_primconvert_destroy(struct primconvert_context *pc)
+{
+   util_primconvert_save_index_buffer(pc, NULL);
+   free(pc);
+}
+
+void util_primconvert_save_index_buffer(struct primconvert_context *pc,
+   const struct pipe_index_buffer *ib)
+{
+   if (ib) {
+   pipe_resource_reference(pc-saved_ib.buffer, ib-buffer);
+   pc-saved_ib.index_size = ib-index_size;
+   pc-saved_ib.offset = ib-offset;
+   pc-saved_ib.user_buffer = ib-user_buffer;
+   } else {
+   pipe_resource_reference(pc-saved_ib.buffer, NULL);
+   }
+}
+
+void util_primconvert_save_rasterizer_state(struct primconvert_context *pc,
+   const struct pipe_rasterizer_state *rast)
+{
+   /* if we actually translated the provoking vertex for the buffer,
+* we would actually need to save/restore rasterizer state.  As
+* it is, we 

[Mesa-dev] [PATCH 3/3] freedreno: emulated unsupported primitive types

2013-10-25 Thread Rob Clark
From: Rob Clark robcl...@freedesktop.org

Use u_primconvert to convert unsupported primitives into supported
primitive plus index buffer.

Signed-off-by: Rob Clark robcl...@freedesktop.org
---
 src/gallium/drivers/freedreno/a2xx/fd2_context.c  | 24 ++-
 src/gallium/drivers/freedreno/a3xx/fd3_context.c  | 12 +-
 src/gallium/drivers/freedreno/freedreno_context.c | 16 +++--
 src/gallium/drivers/freedreno/freedreno_context.h | 18 +-
 src/gallium/drivers/freedreno/freedreno_draw.c| 29 +++
 5 files changed, 74 insertions(+), 25 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_context.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_context.c
index a319275..ec9eaf6 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_context.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_context.c
@@ -67,9 +67,29 @@ create_solid_vertexbuf(struct pipe_context *pctx)
return prsc;
 }
 
+static const uint8_t a22x_primtypes[PIPE_PRIM_MAX] = {
+   [PIPE_PRIM_POINTS] = DI_PT_POINTLIST_A2XX,
+   [PIPE_PRIM_LINES]  = DI_PT_LINELIST,
+   [PIPE_PRIM_LINE_STRIP] = DI_PT_LINESTRIP,
+   [PIPE_PRIM_LINE_LOOP]  = DI_PT_LINELOOP,
+   [PIPE_PRIM_TRIANGLES]  = DI_PT_TRILIST,
+   [PIPE_PRIM_TRIANGLE_STRIP] = DI_PT_TRISTRIP,
+   [PIPE_PRIM_TRIANGLE_FAN]   = DI_PT_TRIFAN,
+};
+
+static const uint8_t a20x_primtypes[PIPE_PRIM_MAX] = {
+   [PIPE_PRIM_POINTS] = DI_PT_POINTLIST_A2XX,
+   [PIPE_PRIM_LINES]  = DI_PT_LINELIST,
+   [PIPE_PRIM_LINE_STRIP] = DI_PT_LINESTRIP,
+   [PIPE_PRIM_TRIANGLES]  = DI_PT_TRILIST,
+   [PIPE_PRIM_TRIANGLE_STRIP] = DI_PT_TRISTRIP,
+   [PIPE_PRIM_TRIANGLE_FAN]   = DI_PT_TRIFAN,
+};
+
 struct pipe_context *
 fd2_context_create(struct pipe_screen *pscreen, void *priv)
 {
+   struct fd_screen *screen = fd_screen(pscreen);
struct fd2_context *fd2_ctx = CALLOC_STRUCT(fd2_context);
struct pipe_context *pctx;
 
@@ -88,7 +108,9 @@ fd2_context_create(struct pipe_screen *pscreen, void *priv)
fd2_texture_init(pctx);
fd2_prog_init(pctx);
 
-   pctx = fd_context_init(fd2_ctx-base, pscreen, priv);
+   pctx = fd_context_init(fd2_ctx-base, pscreen,
+   (screen-gpu_id = 220) ? a22x_primtypes : 
a20x_primtypes,
+   priv);
if (!pctx)
return NULL;
 
diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_context.c 
b/src/gallium/drivers/freedreno/a3xx/fd3_context.c
index 589aeed..13f91e9 100644
--- a/src/gallium/drivers/freedreno/a3xx/fd3_context.c
+++ b/src/gallium/drivers/freedreno/a3xx/fd3_context.c
@@ -82,6 +82,16 @@ create_blit_texcoord_vertexbuf(struct pipe_context *pctx)
return prsc;
 }
 
+static const uint8_t primtypes[PIPE_PRIM_MAX] = {
+   [PIPE_PRIM_POINTS] = DI_PT_POINTLIST_A3XX,
+   [PIPE_PRIM_LINES]  = DI_PT_LINELIST,
+   [PIPE_PRIM_LINE_STRIP] = DI_PT_LINESTRIP,
+   [PIPE_PRIM_LINE_LOOP]  = DI_PT_LINELOOP,
+   [PIPE_PRIM_TRIANGLES]  = DI_PT_TRILIST,
+   [PIPE_PRIM_TRIANGLE_STRIP] = DI_PT_TRISTRIP,
+   [PIPE_PRIM_TRIANGLE_FAN]   = DI_PT_TRIFAN,
+};
+
 struct pipe_context *
 fd3_context_create(struct pipe_screen *pscreen, void *priv)
 {
@@ -106,7 +116,7 @@ fd3_context_create(struct pipe_screen *pscreen, void *priv)
fd3_texture_init(pctx);
fd3_prog_init(pctx);
 
-   pctx = fd_context_init(fd3_ctx-base, pscreen, priv);
+   pctx = fd_context_init(fd3_ctx-base, pscreen, primtypes, priv);
if (!pctx)
return NULL;
 
diff --git a/src/gallium/drivers/freedreno/freedreno_context.c 
b/src/gallium/drivers/freedreno/freedreno_context.c
index 96e1ef6..ddb8a0b 100644
--- a/src/gallium/drivers/freedreno/freedreno_context.c
+++ b/src/gallium/drivers/freedreno/freedreno_context.c
@@ -123,6 +123,9 @@ fd_context_destroy(struct pipe_context *pctx)
if (ctx-blitter)
util_blitter_destroy(ctx-blitter);
 
+   if (ctx-primconvert)
+   util_primconvert_destroy(ctx-primconvert);
+
fd_ringmarker_del(ctx-draw_start);
fd_ringmarker_del(ctx-draw_end);
fd_ringbuffer_del(ctx-ring);
@@ -131,8 +134,8 @@ fd_context_destroy(struct pipe_context *pctx)
 }
 
 struct pipe_context *
-fd_context_init(struct fd_context *ctx,
-   struct pipe_screen *pscreen, void *priv)
+fd_context_init(struct fd_context *ctx, struct pipe_screen *pscreen,
+   const uint8_t *primtypes, void *priv)
 {
struct fd_screen *screen = fd_screen(pscreen);
struct pipe_context *pctx;
@@ -140,6 +143,12 @@ fd_context_init(struct fd_context *ctx,
 
ctx-screen = screen;
 
+   ctx-primtypes = primtypes;
+   ctx-primtype_mask = 0;
+   

Re: [Mesa-dev] [PATCH] glx: Fix return value from indirect_bind_context

2013-10-25 Thread Ian Romanick
On 10/25/2013 12:14 PM, Adam Jackson wrote:
 _XReply returns 1 on success, but indirect_bind_context returns 0 on
 success.
 
 Signed-off-by: Adam Jackson a...@redhat.com

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70486
Reviewed-and-tested-by: Ian Romanick ian.d.roman...@intel.com

The other way to fix this would be to make glx_context_vtable::bind
return Bool instead of int.  That would be more work, but it may be a
tiny bit cleaner in the end.  *shrug*

 ---
  src/glx/indirect_glx.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/src/glx/indirect_glx.c b/src/glx/indirect_glx.c
 index d27b019..28b8cd0 100644
 --- a/src/glx/indirect_glx.c
 +++ b/src/glx/indirect_glx.c
 @@ -132,7 +132,7 @@ indirect_bind_context(struct glx_context *gc, struct 
 glx_context *old,
 __GLXattribute *state;
 Display *dpy = gc-psc-dpy;
 int opcode = __glXSetupForCommand(dpy);
 -   Bool ret;
 +   Bool sent;
  
 if (old != dummyContext  !old-isDirect  old-psc-dpy == dpy) {
tag = old-currentContextTag;
 @@ -141,8 +141,8 @@ indirect_bind_context(struct glx_context *gc, struct 
 glx_context *old,
tag = 0;
 }
  
 -   ret = SendMakeCurrentRequest(dpy, opcode, gc-xid, tag, draw, read,
 -gc-currentContextTag);
 +   sent = SendMakeCurrentRequest(dpy, opcode, gc-xid, tag, draw, read,
 +  gc-currentContextTag);
  
 if (!IndirectAPI)
IndirectAPI = __glXNewIndirectAPI();
 @@ -155,7 +155,7 @@ indirect_bind_context(struct glx_context *gc, struct 
 glx_context *old,
__glXInitVertexArrayState(gc);
 }
  
 -   return ret;
 +   return !sent;
  }
  
  static void
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/9] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.

2013-10-25 Thread Kenneth Graunke
On 10/21/2013 05:55 PM, Eric Anholt wrote:
[snip]
 This interface means synchronizing with the GPU, which sucks when we
 have the ability to actually do DTFB in the hardware pipeline (Indirect
 Parameter Enable of 3DPRIMITIVE).

It's not that simple.

The 3DPRIMITIVE indirect registers require you to specify a vertex count
(which should be the number of vertices actually written to the SO
buffer, which may be less than you asked for due to overflow).

As far as I can tell, the Gen7 SOL stage has no mechanism to give you
the number of vertices written to the SOL buffer.  There is
SO_NUM_PRIMS_WRITTEN(0-3), which gives you the number of primitives
actually written.

For POINTS, this works since each primitive is a single vertex.  But for
LINES and TRIANGLES, you need to multiply this count by 2 or 3 vertices
per primitive.

Haswell has an MI_MATH command which might be usable for this.  But on
Ivybridge, I don't know how to do this other than writing a shader
program that reads from the buffer, does the multiplication, and writes
it back out (and draw a single point).  Then MI_LOAD_REGISTER_MEM it
into the indirect vertex count register.  That might work, but is it better?

The other complexity is PauseTransformFeedback and switching.  The
vertex count is the # of verts actually written between Begin/End on a
single object.  If you have two objects, you might do:

Begin A, draw, Pause A, Begin B, draw, End B, Resume A, draw, End A.

But there is only one SO_NUM_PRIMS_WRITTEN register, which is intended
to be free running.  If you leave it free running, you need to take
snapshots at Begin/End/Pause/Resume and subtract deltas to get the
actual number of primitives written...then do the multiplication above.

We could violate the free running assumption and set
SO_NUM_PRIMS_WRITTEN to 0 on Begin, and save/restore it on Pause/Resume.
 Then the value at End would be the final value, and we wouldn't have to
deal with deltas, which would be simpler.  I'm open to trying that if
people would prefer it.

Maybe I am fundamentally missing something here, but it seems far from
obvious to me how to use draw indirect to do this properly.  On
Ivybridge doing it on the GPU sounds very complex and heavyweight.
Haswell could probably do it if we adopt the save/restore approach.

 We could mostly use the hw pipelined
 version only, as long as we had core contexts (meaning that we don't
 need vertex start/count to figure out how much user vertex array data to
 upload).

Right, so we'd need this for that case, anyway.

 But, given that we have sw primitive restart on some lame hardware that
 we want to support this on, we've got to have this path anyway.

Where by lame hardware you mean Ivybridge.

--Ken
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/9] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.

2013-10-25 Thread Kenneth Graunke
On 10/22/2013 04:30 AM, Marek Olšák wrote:
 On Fri, Oct 18, 2013 at 8:09 AM, Kenneth Graunke kenn...@whitecape.org 
 wrote:
 DrawTransformFeedback() needs to obtain the number of vertices written
 to a particular stream during the last Begin/EndTransformFeedback block.
 The new driver hook returns exactly that information.

 Gallium drivers already implement this functionality by passing the
 transform feedback object to the drawing function.  I prefer to avoid
 this for two reasons:

 1. Complexity:

 Normally, the drawing function takes an array of _mesa_prim objects,
 each of which specifies a vertex count.  If tfb_vertcount != NULL,
 however, there will only be one _mesa_prim object with an invalid
 vertex count (of 1), so it needs to be ignored.

 Since the _mesa_prim pointers are const, you can't even override it to
 the proper value; you need to pass around extra ignore that, here's
 the real count parameters.

 The drawing function is already terribly complicated, so I don't want to
 make it even more complicated.
 
 I don't understand this. Are you saying that the software emulation of
 the feature is always better because of complexity the real
 hardware-accelerated solution would have?

On Ivybridge hardware, I think that a GPU-only implementation of
DrawTransformFeedback would be very complicated, and probably less
efficient than this (extremely simple) software solution.  It might be
possible to do a reasonable GPU-only implementation on Haswell, but I
haven't looked into the details yet.  (See my reply to Eric.)

At least for Ivybridge, I think I want this software path 100% of the
time.  We may want to remove the stall on Haswell as a later optimization.

It sounds like for Gallium, you already have a decent GPU-only solution.
 I tried to follow that code to understand how it works, and got lost
after jumping through around 5 files...which is probably just my poor
understanding of the Gallium architecture.

[snip]
 diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
 index 1670409..11bb76a 100644
 --- a/src/mesa/vbo/vbo_exec_array.c
 +++ b/src/mesa/vbo/vbo_exec_array.c
 @@ -1464,6 +1464,12 @@ vbo_draw_transform_feedback(struct gl_context *ctx, 
 GLenum mode,
return;
 }

 +   if (ctx-Driver.GetTransformFeedbackVertexCount) {
 +  GLsizei n = ctx-Driver.GetTransformFeedbackVertexCount(ctx, obj, 
 stream);
 +  vbo_draw_arrays(ctx, mode, 0, n, numInstances, 0);
 +  return;
 +   }
 
 As you mentioned, the only issue is with primitive restart, so why is
 this done even if primitive restart is disabled? Drivers which will
 have to implement this just to make e.g. non-VBO vertex uploads work
 will suffer from the CPU-GPU synchronization this code forces.
 
 Marek

I hadn't thought about non-VBO vertex uploads.  What does Gallium do in
that case?  Has it just been broken this whole time?

I guess I figured drivers would either implement this hook, or do the
tfb_vertcount approach, but not both.  Maybe that's a bad assumption.

--Ken
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] llvmpipe: fix bogus layer clamping in setup

2013-10-25 Thread Brian Paul

On 10/25/2013 10:14 AM, srol...@vmware.com wrote:

From: Roland Scheidegger srol...@vmware.com

The layer coming from GS needs to be clamped (not sure if that's actually
the correct error behavior but we need something) as the number can be higher
than the amount of layers in the fb. However, this code was using the layer
calculation from the scene, and this was actually calculated in
lp_scene_begin_rasterization() hence too late (so setup was using the value
from the _previous_ scene or just zero if it was the first scene).
Since the value is used in both rasterization and setup, move calculation up
to lp_scene_begin_binning() though it's a bit more inconvenient to calculate
there. (Theoretically could move _all_ code which was in
lp_scene_begin_rasterization() to there, because ever since we got rid of
swizzled render/depth buffers our map functions preparing the fb data for
render don't actually change the data in there at all, but it feels like
it would be a hack.)
---
  src/gallium/drivers/llvmpipe/lp_scene.c |   25 ++---
  1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_scene.c 
b/src/gallium/drivers/llvmpipe/lp_scene.c
index 2abbd25..483bfa5 100644
--- a/src/gallium/drivers/llvmpipe/lp_scene.c
+++ b/src/gallium/drivers/llvmpipe/lp_scene.c
@@ -151,7 +151,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
  {
 const struct pipe_framebuffer_state *fb = scene-fb;
 int i;
-   unsigned max_layer = ~0;

 //LP_DBG(DEBUG_RAST, %s\n, __FUNCTION__);

@@ -162,7 +161,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
 cbuf-u.tex.level);
   scene-cbufs[i].layer_stride = llvmpipe_layer_stride(cbuf-texture,

cbuf-u.tex.level);
- max_layer = MIN2(max_layer, cbuf-u.tex.last_layer - 
cbuf-u.tex.first_layer);

   scene-cbufs[i].map = llvmpipe_resource_map(cbuf-texture,
   cbuf-u.tex.level,
@@ -173,7 +171,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
   struct llvmpipe_resource *lpr = llvmpipe_resource(cbuf-texture);
   unsigned pixstride = util_format_get_blocksize(cbuf-format);
   scene-cbufs[i].stride = cbuf-texture-width0;
- max_layer = 0;

   scene-cbufs[i].map = lpr-data;
   scene-cbufs[i].map += cbuf-u.buf.first_element * pixstride;
@@ -184,15 +181,12 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
struct pipe_surface *zsbuf = scene-fb.zsbuf;
scene-zsbuf.stride = llvmpipe_resource_stride(zsbuf-texture, 
zsbuf-u.tex.level);
scene-zsbuf.layer_stride = llvmpipe_layer_stride(zsbuf-texture, 
zsbuf-u.tex.level);
-  max_layer = MIN2(max_layer, zsbuf-u.tex.last_layer - 
zsbuf-u.tex.first_layer);

scene-zsbuf.map = llvmpipe_resource_map(zsbuf-texture,
 zsbuf-u.tex.level,
 zsbuf-u.tex.first_layer,
 LP_TEX_USAGE_READ_WRITE);
 }
-
-   scene-fb_max_layer = max_layer;
  }


@@ -506,6 +500,9 @@ end:
  void lp_scene_begin_binning( struct lp_scene *scene,
   struct pipe_framebuffer_state *fb, boolean 
discard )
  {
+   int i;
+   unsigned max_layer = ~0;
+
 assert(lp_scene_is_empty(scene));

 scene-discard = discard;
@@ -513,9 +510,23 @@ void lp_scene_begin_binning( struct lp_scene *scene,

 scene-tiles_x = align(fb-width, TILE_SIZE) / TILE_SIZE;
 scene-tiles_y = align(fb-height, TILE_SIZE) / TILE_SIZE;
-
 assert(scene-tiles_x = TILES_X);
 assert(scene-tiles_y = TILES_Y);
+


Maybe add a comment here indicating what we're doing.


+   for (i = 0; i  scene-fb.nr_cbufs; i++) {
+  struct pipe_surface *cbuf = scene-fb.cbufs[i];
+  if (llvmpipe_resource_is_texture(cbuf-texture)) {
+ max_layer = MIN2(max_layer, cbuf-u.tex.last_layer - 
cbuf-u.tex.first_layer);
+  }
+  else {
+ max_layer = 0;
+  }
+   }
+   if (fb-zsbuf) {
+  struct pipe_surface *zsbuf = scene-fb.zsbuf;
+  max_layer = MIN2(max_layer, zsbuf-u.tex.last_layer - 
zsbuf-u.tex.first_layer);
+   }
+   scene-fb_max_layer = max_layer;


Suppose we have a layered color buffer and layered Z/S buffer, but the 
number of layers differs.  Are you sure we shouldn't be using separate 
max_layers for color vs. Z/S?


For the time being though I'm fine with the code as-is though.

Reviewed-by: Brian Paul bri...@vmware.com


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glx: Fix return value from indirect_bind_context

2013-10-25 Thread Adam Jackson
On Fri, 2013-10-25 at 12:59 -0700, Ian Romanick wrote:
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70486
 Reviewed-and-tested-by: Ian Romanick ian.d.roman...@intel.com

Pushed, thanks and sorry.

- ajax

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Drop no-op shifts by 0.

2013-10-25 Thread Eric Anholt
Erik Faye-Lund kusmab...@gmail.com writes:

 Why is this tagged as i965/fs, when everything seems to happen in the
 glsl-optimizer?

 On Thu, Oct 24, 2013 at 5:53 PM, Eric Anholt e...@anholt.net wrote:
 I noticed this in a shader in Unigine Heaven that was spilling.  While it
 doesn't really reduce register pressure, it shaves a few instructions
 anyway (7955 - 7882).
 ---
  src/glsl/opt_algebraic.cpp | 8 
  1 file changed, 8 insertions(+)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 37b2f02..ff06cfc 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -387,6 +387,14 @@ ir_algebraic_visitor::handle_expression(ir_expression 
 *ir)
}
break;

 +   case ir_binop_rshift:
 +   case ir_binop_lshift:
 +  if (is_vec_zero(op_const[0]))
 + return ir-operands[1];
 +  else if (is_vec_zero(op_const[1]))
 + return ir-operands[0];
 +  break;
 +

 Maybe update progress inside the conditionals also?

 But wait a minute. x shifted by 0 is x, so the latter part looks
 correct. But the first conditional seems to assume that 0 sifted by x
 is x, but it's really 0, no? Shouldn't both cases return
 ir-operands[0]? What am I missing?

You're not missing anything -- it was just copy-and-paste fail.

New patch series incoming that should fix that, plus should reduce other
copy and paste fail in this code.


pgpzBfLcZxNDT.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] llvmpipe: fix bogus layer clamping in setup

2013-10-25 Thread Roland Scheidegger
Am 25.10.2013 22:33, schrieb Brian Paul:
 On 10/25/2013 10:14 AM, srol...@vmware.com wrote:
 From: Roland Scheidegger srol...@vmware.com

 The layer coming from GS needs to be clamped (not sure if that's actually
 the correct error behavior but we need something) as the number can be
 higher
 than the amount of layers in the fb. However, this code was using the
 layer
 calculation from the scene, and this was actually calculated in
 lp_scene_begin_rasterization() hence too late (so setup was using the
 value
 from the _previous_ scene or just zero if it was the first scene).
 Since the value is used in both rasterization and setup, move
 calculation up
 to lp_scene_begin_binning() though it's a bit more inconvenient to
 calculate
 there. (Theoretically could move _all_ code which was in
 lp_scene_begin_rasterization() to there, because ever since we got rid of
 swizzled render/depth buffers our map functions preparing the fb
 data for
 render don't actually change the data in there at all, but it feels like
 it would be a hack.)
 ---
   src/gallium/drivers/llvmpipe/lp_scene.c |   25
 ++---
   1 file changed, 18 insertions(+), 7 deletions(-)

 diff --git a/src/gallium/drivers/llvmpipe/lp_scene.c
 b/src/gallium/drivers/llvmpipe/lp_scene.c
 index 2abbd25..483bfa5 100644
 --- a/src/gallium/drivers/llvmpipe/lp_scene.c
 +++ b/src/gallium/drivers/llvmpipe/lp_scene.c
 @@ -151,7 +151,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
   {
  const struct pipe_framebuffer_state *fb = scene-fb;
  int i;
 -   unsigned max_layer = ~0;

  //LP_DBG(DEBUG_RAST, %s\n, __FUNCTION__);

 @@ -162,7 +161,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
 
 cbuf-u.tex.level);
scene-cbufs[i].layer_stride =
 llvmpipe_layer_stride(cbuf-texture,

 cbuf-u.tex.level);
 - max_layer = MIN2(max_layer, cbuf-u.tex.last_layer -
 cbuf-u.tex.first_layer);

scene-cbufs[i].map = llvmpipe_resource_map(cbuf-texture,
cbuf-u.tex.level,
 @@ -173,7 +171,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
struct llvmpipe_resource *lpr =
 llvmpipe_resource(cbuf-texture);
unsigned pixstride = util_format_get_blocksize(cbuf-format);
scene-cbufs[i].stride = cbuf-texture-width0;
 - max_layer = 0;

scene-cbufs[i].map = lpr-data;
scene-cbufs[i].map += cbuf-u.buf.first_element * pixstride;
 @@ -184,15 +181,12 @@ lp_scene_begin_rasterization(struct lp_scene
 *scene)
 struct pipe_surface *zsbuf = scene-fb.zsbuf;
 scene-zsbuf.stride = llvmpipe_resource_stride(zsbuf-texture,
 zsbuf-u.tex.level);
 scene-zsbuf.layer_stride =
 llvmpipe_layer_stride(zsbuf-texture, zsbuf-u.tex.level);
 -  max_layer = MIN2(max_layer, zsbuf-u.tex.last_layer -
 zsbuf-u.tex.first_layer);

 scene-zsbuf.map = llvmpipe_resource_map(zsbuf-texture,
  zsbuf-u.tex.level,
 
 zsbuf-u.tex.first_layer,
 
 LP_TEX_USAGE_READ_WRITE);
  }
 -
 -   scene-fb_max_layer = max_layer;
   }


 @@ -506,6 +500,9 @@ end:
   void lp_scene_begin_binning( struct lp_scene *scene,
struct pipe_framebuffer_state *fb,
 boolean discard )
   {
 +   int i;
 +   unsigned max_layer = ~0;
 +
  assert(lp_scene_is_empty(scene));

  scene-discard = discard;
 @@ -513,9 +510,23 @@ void lp_scene_begin_binning( struct lp_scene *scene,

  scene-tiles_x = align(fb-width, TILE_SIZE) / TILE_SIZE;
  scene-tiles_y = align(fb-height, TILE_SIZE) / TILE_SIZE;
 -
  assert(scene-tiles_x = TILES_X);
  assert(scene-tiles_y = TILES_Y);
 +
 
 Maybe add a comment here indicating what we're doing.
Ok.

 
 +   for (i = 0; i  scene-fb.nr_cbufs; i++) {
 +  struct pipe_surface *cbuf = scene-fb.cbufs[i];
 +  if (llvmpipe_resource_is_texture(cbuf-texture)) {
 + max_layer = MIN2(max_layer, cbuf-u.tex.last_layer -
 cbuf-u.tex.first_layer);
 +  }
 +  else {
 + max_layer = 0;
 +  }
 +   }
 +   if (fb-zsbuf) {
 +  struct pipe_surface *zsbuf = scene-fb.zsbuf;
 +  max_layer = MIN2(max_layer, zsbuf-u.tex.last_layer -
 zsbuf-u.tex.first_layer);
 +   }
 +   scene-fb_max_layer = max_layer;
 
 Suppose we have a layered color buffer and layered Z/S buffer, but the
 number of layers differs.  Are you sure we shouldn't be using separate
 max_layers for color vs. Z/S?
I believe this should be fine (such a setup is illegal in d3d10 fwiw).
I've put
a comment already at some point to the fb_max_layer variable in
lp_scene.h which reads:
/* OpenGL permits different amount of layers per rt, but rendering
limited to minimum */
It is legal in OpenGL not only to have different 

Re: [Mesa-dev] [PATCH] nv50: implement multisample textures

2013-10-25 Thread Christoph Bumiller
On 25.10.2013 20:35, Emil Velikov wrote:
 On 21/10/13 23:23, Bryan Cain wrote:
 This is a port of 4da54c91d24da (nvc0: implement multisample textures) to
 nv50.

 When coupled with the patch to only report 16 texture samplers (to fix
 crashes), all of the Piglit tests in spec/arb_texture_multisample pass.

 Hello Bryan,

 Big thanks for your work. As promised here is a quick piglit summary on
 my nv96

 pass/fail/crash
 69/32/27

 * dmesg does not spit anything nouveau related during the tests
 * any geometry shader related tests were skipped
 (piglit: info: Failed to create GL 3.2 core context)
 * all the crashes are due to the following assert
 codegen/nv50_ir_emit_nv50.cpp:1393:emitTEX: Assertion `argc = 4' failed.

I'm not sure how you'd get  4 arguments there (x y layer sample ?).
There's no mip maps for multisample textures.

But either way you're probably going to have to do things by hand:
E.g. MS8 textures contain contiguous 4x2 rectangles of samples for each
pixel, so you multiply x by 4 and y by 2 to arrive at the sub-rectangle
and then add the correct offsets for the sample id as seen in
get_sample_position (store the info in a constant buffer, that has to be
updated when texture changes).

You might want to use a lookup table like in nve4 compute (look for MS
sample coordinate offsets) to map sample id to coordinate offset, that
one works for any sample count as long as you don't use the ALT modes
(nve4 doesn't need to for textures, but for images/surfaces/UAVs/RATs
where the whole VM address calculation is done by hand).

 PASSarb_texture_multisample-*
 PASSfb-completeness/*
 FAILsample-position/*
 FAILtexelFetch fs sampler2DMS 4*
 CRASH   texelFetch fs sampler2DMSArray 4*
 FAILtexelFetch/*-*s-isampler2DMS
 CRASH   texelFetch/*-*s-isampler2DMSArray
 PASStextureSize/*


 Hope you find this useful :)
 No real world apps that use multisample textures were tested, yet.

 Cheers
 Emil
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] gallium: add PIPE_CAP_MIXED_FRAMEBUFFER_SIZES

2013-10-25 Thread Ilia Mirkin
Thanks, Marek. Could someone with commit access pick this up? Let me
know if you'd like me to reformat/resend/create a git tree/whatever.

  -ilia

On Sun, Oct 13, 2013 at 9:16 AM, Marek Olšák mar...@gmail.com wrote:
 For the series:

 Reviewed-by: Marek Olšák marek.ol...@amd.com

 Marek

 On Sun, Oct 13, 2013 at 3:43 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 ping

 On Fri, Oct 4, 2013 at 4:32 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 This CAP will determine whether ARB_framebuffer_object can be enabled.
 The nv30 driver does not allow mixing swizzled and linear zsbuf/cbuf
 textures.

 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 ---
  src/gallium/docs/source/screen.rst   | 3 +++
  src/gallium/drivers/freedreno/freedreno_screen.c | 1 +
  src/gallium/drivers/i915/i915_screen.c   | 1 +
  src/gallium/drivers/ilo/ilo_screen.c | 1 +
  src/gallium/drivers/llvmpipe/lp_screen.c | 1 +
  src/gallium/drivers/nouveau/nv30/nv30_screen.c   | 1 +
  src/gallium/drivers/nouveau/nv50/nv50_screen.c   | 1 +
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   | 1 +
  src/gallium/drivers/r300/r300_screen.c   | 1 +
  src/gallium/drivers/r600/r600_pipe.c | 1 +
  src/gallium/drivers/radeonsi/radeonsi_pipe.c | 1 +
  src/gallium/drivers/softpipe/sp_screen.c | 1 +
  src/gallium/drivers/svga/svga_screen.c   | 1 +
  src/gallium/include/pipe/p_defines.h | 3 ++-
  14 files changed, 17 insertions(+), 1 deletion(-)

 diff --git a/src/gallium/docs/source/screen.rst 
 b/src/gallium/docs/source/screen.rst
 index d19cd1a..a01f548 100644
 --- a/src/gallium/docs/source/screen.rst
 +++ b/src/gallium/docs/source/screen.rst
 @@ -173,6 +173,9 @@ The integer capabilities:
viewport/scissor combination.
  * ''PIPE_CAP_ENDIANNESS``:: The endianness of the device.  Either
PIPE_ENDIAN_BIG or PIPE_ENDIAN_LITTLE.
 +* ``PIPE_CAP_MIXED_FRAMEBUFFER_SIZES``: Whether it is allowed to have
 +  different sizes for fb color/zs attachments. This controls whether
 +  ARB_framebuffer_object is provided.


  .. _pipe_capf:
 diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
 b/src/gallium/drivers/freedreno/freedreno_screen.c
 index a038a77..7d0fb3b 100644
 --- a/src/gallium/drivers/freedreno/freedreno_screen.c
 +++ b/src/gallium/drivers/freedreno/freedreno_screen.c
 @@ -140,6 +140,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum 
 pipe_cap param)
 switch (param) {
 /* Supported features (boolean caps). */
 case PIPE_CAP_NPOT_TEXTURES:
 +   case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
 case PIPE_CAP_TWO_SIDED_STENCIL:
 case PIPE_CAP_ANISOTROPIC_FILTER:
 case PIPE_CAP_POINT_SPRITE:
 diff --git a/src/gallium/drivers/i915/i915_screen.c 
 b/src/gallium/drivers/i915/i915_screen.c
 index 556dda8..77607d0 100644
 --- a/src/gallium/drivers/i915/i915_screen.c
 +++ b/src/gallium/drivers/i915/i915_screen.c
 @@ -172,6 +172,7 @@ i915_get_param(struct pipe_screen *screen, enum 
 pipe_cap cap)
 /* Supported features (boolean caps). */
 case PIPE_CAP_ANISOTROPIC_FILTER:
 case PIPE_CAP_NPOT_TEXTURES:
 +   case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
 case PIPE_CAP_POINT_SPRITE:
 case PIPE_CAP_PRIMITIVE_RESTART: /* draw module */
 case PIPE_CAP_TEXTURE_SHADOW_MAP:
 diff --git a/src/gallium/drivers/ilo/ilo_screen.c 
 b/src/gallium/drivers/ilo/ilo_screen.c
 index 3f8d431..ddf11ff 100644
 --- a/src/gallium/drivers/ilo/ilo_screen.c
 +++ b/src/gallium/drivers/ilo/ilo_screen.c
 @@ -286,6 +286,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap 
 param)

 switch (param) {
 case PIPE_CAP_NPOT_TEXTURES:
 +   case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
 case PIPE_CAP_TWO_SIDED_STENCIL:
return true;
 case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS:
 diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c 
 b/src/gallium/drivers/llvmpipe/lp_screen.c
 index b3cd77f..2bbc2c9 100644
 --- a/src/gallium/drivers/llvmpipe/lp_screen.c
 +++ b/src/gallium/drivers/llvmpipe/lp_screen.c
 @@ -109,6 +109,7 @@ llvmpipe_get_param(struct pipe_screen *screen, enum 
 pipe_cap param)
 case PIPE_CAP_MAX_COMBINED_SAMPLERS:
return 2 * PIPE_MAX_SAMPLERS;  /* VS + FS samplers */
 case PIPE_CAP_NPOT_TEXTURES:
 +   case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
return 1;
 case PIPE_CAP_TWO_SIDED_STENCIL:
return 1;
 diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c 
 b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
 index 50ddfec..807100e 100644
 --- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
 +++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
 @@ -125,6 +125,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum 
 pipe_cap param)
 case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
 case PIPE_CAP_TEXTURE_BORDER_COLOR_QUIRK:
 case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
 +   case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
return 0;
 case 

Re: [Mesa-dev] [PATCH] nv50: implement multisample textures

2013-10-25 Thread Bryan Cain
On 10/25/2013 01:35 PM, Emil Velikov wrote:
 On 21/10/13 23:23, Bryan Cain wrote:
 This is a port of 4da54c91d24da (nvc0: implement multisample textures) to
 nv50.

 When coupled with the patch to only report 16 texture samplers (to fix
 crashes), all of the Piglit tests in spec/arb_texture_multisample pass.

 Hello Bryan,

 Big thanks for your work. As promised here is a quick piglit summary on
 my nv96

 pass/fail/crash
 69/32/27

 * dmesg does not spit anything nouveau related during the tests
 * any geometry shader related tests were skipped
 (piglit: info: Failed to create GL 3.2 core context)
 * all the crashes are due to the following assert
 codegen/nv50_ir_emit_nv50.cpp:1393:emitTEX: Assertion `argc = 4' failed.

 PASSarb_texture_multisample-*
 PASSfb-completeness/*
 FAILsample-position/*
 FAILtexelFetch fs sampler2DMS 4*
 CRASH   texelFetch fs sampler2DMSArray 4*
 FAILtexelFetch/*-*s-isampler2DMS
 CRASH   texelFetch/*-*s-isampler2DMSArray
 PASStextureSize/*


 Hope you find this useful :)
 No real world apps that use multisample textures were tested, yet.

 Cheers
 Emil

Hi Emil,

Thanks for testing on nv96.  It seems, though, that I messed up my
piglit-run command and didn't include all of the relevant tests as a
result.  Now that I've fixed that, I'm seeing the same failures and
crashes on my nva5.

It seems that multisampling is broken with texelFetch (both the
texelFetch and sample-position tests use it) but works otherwise, unless
it turns out not to produce the right results in real world applications
for pre-nva3 cards.

I'm going to take some time this weekend to see what's going on with
multisampling and texelFetch.

Thanks again,
Bryan

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/9] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.

2013-10-25 Thread Eric Anholt
Kenneth Graunke kenn...@whitecape.org writes:

 On 10/21/2013 05:55 PM, Eric Anholt wrote:
 [snip]
 This interface means synchronizing with the GPU, which sucks when we
 have the ability to actually do DTFB in the hardware pipeline (Indirect
 Parameter Enable of 3DPRIMITIVE).

 It's not that simple.

 The 3DPRIMITIVE indirect registers require you to specify a vertex count
 (which should be the number of vertices actually written to the SO
 buffer, which may be less than you asked for due to overflow).

 As far as I can tell, the Gen7 SOL stage has no mechanism to give you
 the number of vertices written to the SOL buffer.  There is
 SO_NUM_PRIMS_WRITTEN(0-3), which gives you the number of primitives
 actually written.

*headdesk*

OK, so it looks like our hardware is just really not cut out for this
job, and the SW fallback's the way to go.


pgpeCg3NO3nk2.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2] i965: Make fs gl_PrimitiveID input work even when there's no gs.

2013-10-25 Thread Eric Anholt
Paul Berry stereotype...@gmail.com writes:

 When a geometry shader is present, the fragment shader gl_PrimitiveID
 input acts like an ordinary varying, receiving data from the gs
 gl_PrimitiveID output.  When there's no geometry shader, we have to
 ask the fixed function SF hardware to provide the primitive ID to the
 fragment shader instead.

 Previously, the SF setup code would handle this situation by
 recognizing that the FS gl_PrimitiveID input didn't match to any VS
 output; since normally an FS input with no corresponding VS output
 leads to undefined data, the SF setup code used to just arbitrarily
 assign it to receive data from attribute 0.

 This patch changes the SF setup code so that instead of arbitrarily
 using attribute 0, it assigns the unmatched FS input to receive
 gl_PrimitiveID.  In the case where the FS input really is
 gl_PrimitiveID, this produces the intended result.  In all other
 cases, no harm is done since GL specifies that the behaviour is
 undefined.

 Fixes piglit test primitive-id-no-gs.

 Reviewed-by: Eric Anholt e...@anholt.net

Looks good still.


pgpAjFd63YBBs.pgp
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/3] i965/fs: Drop no-op shifts involving 0.

2013-10-25 Thread Eric Anholt
I noticed this in a shader in Unigine Heaven that was spilling.  While it
doesn't really reduce register pressure, it shaves a few instructions
anyway (7955 - 7882).

v2: Fix turning 0  x into x instead of 0 (caught by Erik
Faye-Lund).
---
 src/glsl/opt_algebraic.cpp | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
index 2e33dfe..a07e153 100644
--- a/src/glsl/opt_algebraic.cpp
+++ b/src/glsl/opt_algebraic.cpp
@@ -346,6 +346,16 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
   }
   break;
 
+   case ir_binop_rshift:
+   case ir_binop_lshift:
+  /* 0  x == 0 */
+  if (is_vec_zero(op_const[0]))
+ return ir-operands[0];
+  /* x  0 == x */
+  if (is_vec_zero(op_const[1]))
+ return ir-operands[0];
+  break;
+
case ir_binop_logic_and:
   /* FINISHME: Also simplify (a  a) to (a). */
   if (is_vec_one(op_const[0])) {
-- 
1.8.4.rc3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] glsl: Use ir_builder more in opt_algebraic.

2013-10-25 Thread Eric Anholt
While ir_builder is slightly less efficient, we're only increasing the
work when there's actual optimization being done, and it's way more
readable code.
---
 src/glsl/opt_algebraic.cpp | 40 ++--
 1 file changed, 10 insertions(+), 30 deletions(-)

diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
index 8d02cad..2e33dfe 100644
--- a/src/glsl/opt_algebraic.cpp
+++ b/src/glsl/opt_algebraic.cpp
@@ -219,10 +219,7 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
   switch (op_expr[0]-operation) {
   case ir_unop_abs:
   case ir_unop_neg:
- return new(mem_ctx) ir_expression(ir_unop_abs,
-   ir-type,
-   op_expr[0]-operands[0],
-   NULL);
+ return abs(op_expr[0]-operands[0]);
   default:
  break;
   }
@@ -285,12 +282,8 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
   break;
 
case ir_binop_sub:
-  if (is_vec_zero(op_const[0])) {
-return new(mem_ctx) ir_expression(ir_unop_neg,
-  ir-operands[1]-type,
-  ir-operands[1],
-  NULL);
-  }
+  if (is_vec_zero(op_const[0]))
+return neg(ir-operands[1]);
   if (is_vec_zero(op_const[1]))
 return ir-operands[0];
   break;
@@ -304,18 +297,10 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
   if (is_vec_zero(op_const[0]) || is_vec_zero(op_const[1]))
 return ir_constant::zero(ir, ir-type);
 
-  if (is_vec_negative_one(op_const[0])) {
- return new(mem_ctx) ir_expression(ir_unop_neg,
-   ir-operands[1]-type,
-   ir-operands[1],
-   NULL);
-  }
-  if (is_vec_negative_one(op_const[1])) {
- return new(mem_ctx) ir_expression(ir_unop_neg,
-   ir-operands[0]-type,
-   ir-operands[0],
-   NULL);
-  }
+  if (is_vec_negative_one(op_const[0]))
+ return neg(ir-operands[1]);
+  if (is_vec_negative_one(op_const[1]))
+ return neg(ir-operands[0]);
 
 
   /* Reassociate multiplication of constants so that we can do
@@ -386,11 +371,9 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
   } else if (is_vec_zero(op_const[1])) {
 return ir-operands[0];
   } else if (is_vec_one(op_const[0])) {
-return new(mem_ctx) ir_expression(ir_unop_logic_not, ir-type,
-  ir-operands[1], NULL);
+return logic_not(ir-operands[1]);
   } else if (is_vec_one(op_const[1])) {
-return new(mem_ctx) ir_expression(ir_unop_logic_not, ir-type,
-  ir-operands[0], NULL);
+return logic_not(ir-operands[0]);
   }
   break;
 
@@ -428,10 +411,7 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
 
   /* As far as we know, all backends are OK with rsq. */
   if (op_expr[0]  op_expr[0]-operation == ir_unop_sqrt) {
-return new(mem_ctx) ir_expression(ir_unop_rsq,
-  op_expr[0]-operands[0]-type,
-  op_expr[0]-operands[0],
-  NULL);
+return rsq(op_expr[0]-operands[0]);
   }
 
   break;
-- 
1.8.4.rc3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] glsl: Move common code out of opt_algebraic's handle_expression().

2013-10-25 Thread Eric Anholt
Matt and I had each screwed up these common required patterns recently, in
ways that wouldn't have been noticed for a long time if not for code
review.  Just enforce it in the caller so that we don't rely on code
review catching these bugs.
---
 src/glsl/opt_algebraic.cpp | 117 +++--
 1 file changed, 39 insertions(+), 78 deletions(-)

diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
index 1351904..8d02cad 100644
--- a/src/glsl/opt_algebraic.cpp
+++ b/src/glsl/opt_algebraic.cpp
@@ -197,7 +197,6 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
 {
ir_constant *op_const[4] = {NULL, NULL, NULL, NULL};
ir_expression *op_expr[4] = {NULL, NULL, NULL, NULL};
-   ir_expression *temp;
unsigned int i;
 
assert(ir-get_num_operands() = 4);
@@ -220,12 +219,10 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
   switch (op_expr[0]-operation) {
   case ir_unop_abs:
   case ir_unop_neg:
- this-progress = true;
- temp = new(mem_ctx) ir_expression(ir_unop_abs,
+ return new(mem_ctx) ir_expression(ir_unop_abs,
ir-type,
op_expr[0]-operands[0],
NULL);
- return swizzle_if_required(ir, temp);
   default:
  break;
   }
@@ -236,8 +233,7 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
 break;
 
   if (op_expr[0]-operation == ir_unop_neg) {
- this-progress = true;
- return swizzle_if_required(ir, op_expr[0]-operands[0]);
+ return op_expr[0]-operands[0];
   }
   break;
 
@@ -264,7 +260,6 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
   }
 
   if (new_op != ir_unop_logic_not) {
-this-progress = true;
 return new(mem_ctx) ir_expression(new_op,
   ir-type,
   op_expr[0]-operands[0],
@@ -275,14 +270,10 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
}
 
case ir_binop_add:
-  if (is_vec_zero(op_const[0])) {
-this-progress = true;
-return swizzle_if_required(ir, ir-operands[1]);
-  }
-  if (is_vec_zero(op_const[1])) {
-this-progress = true;
-return swizzle_if_required(ir, ir-operands[0]);
-  }
+  if (is_vec_zero(op_const[0]))
+return ir-operands[1];
+  if (is_vec_zero(op_const[1]))
+return ir-operands[0];
 
   /* Reassociate addition of constants so that we can do constant
* folding.
@@ -295,48 +286,35 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
 
case ir_binop_sub:
   if (is_vec_zero(op_const[0])) {
-this-progress = true;
-temp = new(mem_ctx) ir_expression(ir_unop_neg,
+return new(mem_ctx) ir_expression(ir_unop_neg,
   ir-operands[1]-type,
   ir-operands[1],
   NULL);
-return swizzle_if_required(ir, temp);
-  }
-  if (is_vec_zero(op_const[1])) {
-this-progress = true;
-return swizzle_if_required(ir, ir-operands[0]);
   }
+  if (is_vec_zero(op_const[1]))
+return ir-operands[0];
   break;
 
case ir_binop_mul:
-  if (is_vec_one(op_const[0])) {
-this-progress = true;
-return swizzle_if_required(ir, ir-operands[1]);
-  }
-  if (is_vec_one(op_const[1])) {
-this-progress = true;
-return swizzle_if_required(ir, ir-operands[0]);
-  }
+  if (is_vec_one(op_const[0]))
+return ir-operands[1];
+  if (is_vec_one(op_const[1]))
+return ir-operands[0];
 
-  if (is_vec_zero(op_const[0]) || is_vec_zero(op_const[1])) {
-this-progress = true;
+  if (is_vec_zero(op_const[0]) || is_vec_zero(op_const[1]))
 return ir_constant::zero(ir, ir-type);
-  }
+
   if (is_vec_negative_one(op_const[0])) {
- this-progress = true;
- temp = new(mem_ctx) ir_expression(ir_unop_neg,
+ return new(mem_ctx) ir_expression(ir_unop_neg,
ir-operands[1]-type,
ir-operands[1],
NULL);
- return swizzle_if_required(ir, temp);
   }
   if (is_vec_negative_one(op_const[1])) {
- this-progress = true;
- temp = new(mem_ctx) ir_expression(ir_unop_neg,
+ return new(mem_ctx) ir_expression(ir_unop_neg,
ir-operands[0]-type,
ir-operands[0],
NULL);
- return swizzle_if_required(ir, temp);
   }
 
 
@@ -352,26 +330,20 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
 

Re: [Mesa-dev] [PATCH] nv50: implement multisample textures

2013-10-25 Thread Bryan Cain
On 10/25/2013 04:11 PM, Christoph Bumiller wrote:
 On 25.10.2013 20:35, Emil Velikov wrote:
 On 21/10/13 23:23, Bryan Cain wrote:
 This is a port of 4da54c91d24da (nvc0: implement multisample textures) to
 nv50.

 When coupled with the patch to only report 16 texture samplers (to fix
 crashes), all of the Piglit tests in spec/arb_texture_multisample pass.

 Hello Bryan,

 Big thanks for your work. As promised here is a quick piglit summary on
 my nv96

 pass/fail/crash
 69/32/27

 * dmesg does not spit anything nouveau related during the tests
 * any geometry shader related tests were skipped
 (piglit: info: Failed to create GL 3.2 core context)
 * all the crashes are due to the following assert
 codegen/nv50_ir_emit_nv50.cpp:1393:emitTEX: Assertion `argc = 4' failed.
 I'm not sure how you'd get  4 arguments there (x y layer sample ?).
 There's no mip maps for multisample textures.

 But either way you're probably going to have to do things by hand:
 E.g. MS8 textures contain contiguous 4x2 rectangles of samples for each
 pixel, so you multiply x by 4 and y by 2 to arrive at the sub-rectangle
 and then add the correct offsets for the sample id as seen in
 get_sample_position (store the info in a constant buffer, that has to be
 updated when texture changes).

 You might want to use a lookup table like in nve4 compute (look for MS
 sample coordinate offsets) to map sample id to coordinate offset, that
 one works for any sample count as long as you don't use the ALT modes
 (nve4 doesn't need to for textures, but for images/surfaces/UAVs/RATs
 where the whole VM address calculation is done by hand).

You're probably right.  I don't know why MSAA appears to work for me,
but there's probably something wrong with the output that I haven't
noticed.  I'll work on implementing it properly this weekend.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: Move error message inside validation check reducing duplicate message handling

2013-10-25 Thread Paul Berry
On 17 October 2013 04:42, Timothy Arceri t_arc...@yahoo.com.au wrote:

 ---
  src/glsl/ast_to_hir.cpp | 27 ++-
  1 file changed, 14 insertions(+), 13 deletions(-)

 diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
 index dfa32d9..f96ed53 100644
 --- a/src/glsl/ast_to_hir.cpp
 +++ b/src/glsl/ast_to_hir.cpp
 @@ -637,8 +637,8 @@ shift_result_type(const struct glsl_type *type_a,
   */
  ir_rvalue *
  validate_assignment(struct _mesa_glsl_parse_state *state,
 -   const glsl_type *lhs_type, ir_rvalue *rhs,
 -   bool is_initializer)
 +YYLTYPE loc, const glsl_type *lhs_type,
 +ir_rvalue *rhs, bool is_initializer)
  {
 /* If there is already some error in the RHS, just return it.  Anything
  * else will lead to an avalanche of error message back to the user.
 @@ -670,6 +670,12 @@ validate_assignment(struct _mesa_glsl_parse_state
 *state,
  return rhs;
 }

 +   _mesa_glsl_error(loc, state,
 +is_initializer ? initializer : value
 + of type %s cannot be assigned to 
 +variable of type %s,
 +rhs-type-name, lhs_type-name);
 +


This doesn't produce the output you want.  String concatenation happens at
compile time and takes precedence over everything else, so this is being
interpreted as:

_mesa_glsl_error(loc, state, is_initializer ? initializer : value of
type %s cannot be assigned to variable of type %s, rhs-type-name,
lhs_type-name);

Adding parenthesis doesn't help because string concatenation only works on
string literals.  I believe what you actually want is:

   _mesa_glsl_error(loc, state,
%s of type %s cannot be assigned to 
variable of type %s,
is_initializer ? initializer : value,
rhs-type-name, lhs_type-name);

With that change, this patch is:

Reviewed-by: Paul Berry stereotype...@gmail.com

Do you have push access?  I can push the patch for you (with this change)
if you'd like.


 return NULL;
  }

 @@ -700,10 +706,10 @@ do_assignment(exec_list *instructions, struct
 _mesa_glsl_parse_state *state,

if (unlikely(expr-operation == ir_binop_vector_extract)) {
   ir_rvalue *new_rhs =
 -validate_assignment(state, lhs-type, rhs, is_initializer);
 +validate_assignment(state, lhs_loc, lhs-type,
 +rhs, is_initializer);

   if (new_rhs == NULL) {
 -_mesa_glsl_error( lhs_loc, state, type mismatch);
  return lhs;
   } else {
  rhs = new(ctx) ir_expression(ir_triop_vector_insert,
 @@ -752,10 +758,8 @@ do_assignment(exec_list *instructions, struct
 _mesa_glsl_parse_state *state,
 }

 ir_rvalue *new_rhs =
 -  validate_assignment(state, lhs-type, rhs, is_initializer);
 -   if (new_rhs == NULL) {
 -  _mesa_glsl_error( lhs_loc, state, type mismatch);
 -   } else {
 +  validate_assignment(state, lhs_loc, lhs-type, rhs, is_initializer);
 +   if (new_rhs != NULL) {
rhs = new_rhs;

/* If the LHS array was not declared with a size, it takes it size
 from
 @@ -2495,7 +2499,8 @@ process_initializer(ir_variable *var,
 ast_declaration *decl,
  */
 if (type-qualifier.flags.q.constant
 || type-qualifier.flags.q.uniform) {
 -  ir_rvalue *new_rhs = validate_assignment(state, var-type, rhs,
 true);
 +  ir_rvalue *new_rhs = validate_assignment(state, initializer_loc,
 +   var-type, rhs, true);
if (new_rhs != NULL) {
  rhs = new_rhs;

 @@ -2524,10 +2529,6 @@ process_initializer(ir_variable *var,
 ast_declaration *decl,
 var-constant_value = constant_value;
  }
} else {
 -_mesa_glsl_error(initializer_loc, state,
 - initializer of type %s cannot be assigned to 
 - variable of type %s,
 - rhs-type-name, var-type-name);
  if (var-type-is_numeric()) {
 /* Reduce cascading errors. */
 var-constant_value = ir_constant::zero(state, var-type);
 --
 1.8.3.1

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nv50: implement multisample textures

2013-10-25 Thread Christoph Bumiller
On 25.10.2013 23:51, Bryan Cain wrote:
 On 10/25/2013 04:11 PM, Christoph Bumiller wrote:
 On 25.10.2013 20:35, Emil Velikov wrote:
 On 21/10/13 23:23, Bryan Cain wrote:
 This is a port of 4da54c91d24da (nvc0: implement multisample textures) to
 nv50.

 When coupled with the patch to only report 16 texture samplers (to fix
 crashes), all of the Piglit tests in spec/arb_texture_multisample pass.

 Hello Bryan,

 Big thanks for your work. As promised here is a quick piglit summary on
 my nv96

 pass/fail/crash
 69/32/27

 * dmesg does not spit anything nouveau related during the tests
 * any geometry shader related tests were skipped
 (piglit: info: Failed to create GL 3.2 core context)
 * all the crashes are due to the following assert
 codegen/nv50_ir_emit_nv50.cpp:1393:emitTEX: Assertion `argc = 4' failed.
 I'm not sure how you'd get  4 arguments there (x y layer sample ?).
 There's no mip maps for multisample textures.

 But either way you're probably going to have to do things by hand:
 E.g. MS8 textures contain contiguous 4x2 rectangles of samples for each
 pixel, so you multiply x by 4 and y by 2 to arrive at the sub-rectangle
 and then add the correct offsets for the sample id as seen in
 get_sample_position (store the info in a constant buffer, that has to be
 updated when texture changes).

 You might want to use a lookup table like in nve4 compute (look for MS
 sample coordinate offsets) to map sample id to coordinate offset, that
 one works for any sample count as long as you don't use the ALT modes
 (nve4 doesn't need to for textures, but for images/surfaces/UAVs/RATs
 where the whole VM address calculation is done by hand).
 You're probably right.  I don't know why MSAA appears to work for me,
 but there's probably something wrong with the output that I haven't
 noticed.  I'll work on implementing it properly this weekend.

MSAA itself (rendering and resolving) has been working before, the only
thing that ARB_texture_multisample adds is texelFetch from MS resources.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nv50: implement multisample textures

2013-10-25 Thread Bryan Cain
On 10/25/2013 05:05 PM, Christoph Bumiller wrote:
 On 25.10.2013 23:51, Bryan Cain wrote:
 On 10/25/2013 04:11 PM, Christoph Bumiller wrote:
 On 25.10.2013 20:35, Emil Velikov wrote:
 On 21/10/13 23:23, Bryan Cain wrote:
 This is a port of 4da54c91d24da (nvc0: implement multisample textures) 
 to
 nv50.

 When coupled with the patch to only report 16 texture samplers (to fix
 crashes), all of the Piglit tests in spec/arb_texture_multisample pass.

 Hello Bryan,

 Big thanks for your work. As promised here is a quick piglit summary on
 my nv96

 pass/fail/crash
 69/32/27

 * dmesg does not spit anything nouveau related during the tests
 * any geometry shader related tests were skipped
 (piglit: info: Failed to create GL 3.2 core context)
 * all the crashes are due to the following assert
 codegen/nv50_ir_emit_nv50.cpp:1393:emitTEX: Assertion `argc = 4' failed.
 I'm not sure how you'd get  4 arguments there (x y layer sample ?).
 There's no mip maps for multisample textures.

 But either way you're probably going to have to do things by hand:
 E.g. MS8 textures contain contiguous 4x2 rectangles of samples for each
 pixel, so you multiply x by 4 and y by 2 to arrive at the sub-rectangle
 and then add the correct offsets for the sample id as seen in
 get_sample_position (store the info in a constant buffer, that has to be
 updated when texture changes).

 You might want to use a lookup table like in nve4 compute (look for MS
 sample coordinate offsets) to map sample id to coordinate offset, that
 one works for any sample count as long as you don't use the ALT modes
 (nve4 doesn't need to for textures, but for images/surfaces/UAVs/RATs
 where the whole VM address calculation is done by hand).
 You're probably right.  I don't know why MSAA appears to work for me,
 but there's probably something wrong with the output that I haven't
 noticed.  I'll work on implementing it properly this weekend.
 MSAA itself (rendering and resolving) has been working before, the only
 thing that ARB_texture_multisample adds is texelFetch from MS resources.

I really should read an extension's spec carefully before trying to
implement it so that I don't waste other people's time.  Sorry.

Bryan
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/9] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.

2013-10-25 Thread Marek Olšák
On Fri, Oct 25, 2013 at 10:28 PM, Kenneth Graunke kenn...@whitecape.org wrote:
 On 10/22/2013 04:30 AM, Marek Olšák wrote:
 On Fri, Oct 18, 2013 at 8:09 AM, Kenneth Graunke kenn...@whitecape.org 
 wrote:
 DrawTransformFeedback() needs to obtain the number of vertices written
 to a particular stream during the last Begin/EndTransformFeedback block.
 The new driver hook returns exactly that information.

 Gallium drivers already implement this functionality by passing the
 transform feedback object to the drawing function.  I prefer to avoid
 this for two reasons:

 1. Complexity:

 Normally, the drawing function takes an array of _mesa_prim objects,
 each of which specifies a vertex count.  If tfb_vertcount != NULL,
 however, there will only be one _mesa_prim object with an invalid
 vertex count (of 1), so it needs to be ignored.

 Since the _mesa_prim pointers are const, you can't even override it to
 the proper value; you need to pass around extra ignore that, here's
 the real count parameters.

 The drawing function is already terribly complicated, so I don't want to
 make it even more complicated.

 I don't understand this. Are you saying that the software emulation of
 the feature is always better because of complexity the real
 hardware-accelerated solution would have?

 On Ivybridge hardware, I think that a GPU-only implementation of
 DrawTransformFeedback would be very complicated, and probably less
 efficient than this (extremely simple) software solution.  It might be
 possible to do a reasonable GPU-only implementation on Haswell, but I
 haven't looked into the details yet.  (See my reply to Eric.)

 At least for Ivybridge, I think I want this software path 100% of the
 time.  We may want to remove the stall on Haswell as a later optimization.

I'd like to have a dedicated flag for this fallback like we have
Const.PrimitiveRestartInSoftware, in case we need to implement the
query for something else.


 It sounds like for Gallium, you already have a decent GPU-only solution.
  I tried to follow that code to understand how it works, and got lost
 after jumping through around 5 files...which is probably just my poor
 understanding of the Gallium architecture.

Gallium doesn't do anything, the interface is pretty much the same as
the vbo one.

On the hardware side, there are 4 counters containing the number of
bytes written to each TFB buffer. If TFB is started, the counters are
set to 0. Everytime TFB is ended or paused, the counters are stored
for each buffer in memory. When resuming TFB, the counters are simply
loaded from memory.

When we have to do DrawTransformFeedback, we copy the value of the
counter from memory to a special draw register. Since the value is in
bytes, we also have to set the TFB buffer stride to another special
draw register. That's all. The hardware then calculates count =
bytes/stride before drawing.


 [snip]
 diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
 index 1670409..11bb76a 100644
 --- a/src/mesa/vbo/vbo_exec_array.c
 +++ b/src/mesa/vbo/vbo_exec_array.c
 @@ -1464,6 +1464,12 @@ vbo_draw_transform_feedback(struct gl_context *ctx, 
 GLenum mode,
return;
 }

 +   if (ctx-Driver.GetTransformFeedbackVertexCount) {
 +  GLsizei n = ctx-Driver.GetTransformFeedbackVertexCount(ctx, obj, 
 stream);
 +  vbo_draw_arrays(ctx, mode, 0, n, numInstances, 0);
 +  return;
 +   }

 As you mentioned, the only issue is with primitive restart, so why is
 this done even if primitive restart is disabled? Drivers which will
 have to implement this just to make e.g. non-VBO vertex uploads work
 will suffer from the CPU-GPU synchronization this code forces.

 Marek

 I hadn't thought about non-VBO vertex uploads.  What does Gallium do in
 that case?  Has it just been broken this whole time?

Yes, it has, I completely forgot about it. :(


 I guess I figured drivers would either implement this hook, or do the
 tfb_vertcount approach, but not both.  Maybe that's a bad assumption.

For vertex uploads and vertex fetch fallbacks (where we translate and
align vertex buffers to what a gallium driver supports -
util/u_vbuf.c), we can use a query like the one you want to add.
However, gallium drivers should use the tfb_vertcount approach (AKA
pipe_draw_info::count_from_stream_output) whenever they see it's not
NULL. Since most Gallium hardware drivers will never see non-VBO
vertex data or an unsupported vertex format, it's the only approach
they have to implement.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] gallium: add PIPE_CAP_MIXED_FRAMEBUFFER_SIZES

2013-10-25 Thread Marek Olšák
I'll do it in a moment.

Marek

On Fri, Oct 25, 2013 at 11:25 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
 Thanks, Marek. Could someone with commit access pick this up? Let me
 know if you'd like me to reformat/resend/create a git tree/whatever.

   -ilia

 On Sun, Oct 13, 2013 at 9:16 AM, Marek Olšák mar...@gmail.com wrote:
 For the series:

 Reviewed-by: Marek Olšák marek.ol...@amd.com

 Marek

 On Sun, Oct 13, 2013 at 3:43 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 ping

 On Fri, Oct 4, 2013 at 4:32 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
 This CAP will determine whether ARB_framebuffer_object can be enabled.
 The nv30 driver does not allow mixing swizzled and linear zsbuf/cbuf
 textures.

 Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
 ---
  src/gallium/docs/source/screen.rst   | 3 +++
  src/gallium/drivers/freedreno/freedreno_screen.c | 1 +
  src/gallium/drivers/i915/i915_screen.c   | 1 +
  src/gallium/drivers/ilo/ilo_screen.c | 1 +
  src/gallium/drivers/llvmpipe/lp_screen.c | 1 +
  src/gallium/drivers/nouveau/nv30/nv30_screen.c   | 1 +
  src/gallium/drivers/nouveau/nv50/nv50_screen.c   | 1 +
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   | 1 +
  src/gallium/drivers/r300/r300_screen.c   | 1 +
  src/gallium/drivers/r600/r600_pipe.c | 1 +
  src/gallium/drivers/radeonsi/radeonsi_pipe.c | 1 +
  src/gallium/drivers/softpipe/sp_screen.c | 1 +
  src/gallium/drivers/svga/svga_screen.c   | 1 +
  src/gallium/include/pipe/p_defines.h | 3 ++-
  14 files changed, 17 insertions(+), 1 deletion(-)

 diff --git a/src/gallium/docs/source/screen.rst 
 b/src/gallium/docs/source/screen.rst
 index d19cd1a..a01f548 100644
 --- a/src/gallium/docs/source/screen.rst
 +++ b/src/gallium/docs/source/screen.rst
 @@ -173,6 +173,9 @@ The integer capabilities:
viewport/scissor combination.
  * ''PIPE_CAP_ENDIANNESS``:: The endianness of the device.  Either
PIPE_ENDIAN_BIG or PIPE_ENDIAN_LITTLE.
 +* ``PIPE_CAP_MIXED_FRAMEBUFFER_SIZES``: Whether it is allowed to have
 +  different sizes for fb color/zs attachments. This controls whether
 +  ARB_framebuffer_object is provided.


  .. _pipe_capf:
 diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c 
 b/src/gallium/drivers/freedreno/freedreno_screen.c
 index a038a77..7d0fb3b 100644
 --- a/src/gallium/drivers/freedreno/freedreno_screen.c
 +++ b/src/gallium/drivers/freedreno/freedreno_screen.c
 @@ -140,6 +140,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum 
 pipe_cap param)
 switch (param) {
 /* Supported features (boolean caps). */
 case PIPE_CAP_NPOT_TEXTURES:
 +   case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
 case PIPE_CAP_TWO_SIDED_STENCIL:
 case PIPE_CAP_ANISOTROPIC_FILTER:
 case PIPE_CAP_POINT_SPRITE:
 diff --git a/src/gallium/drivers/i915/i915_screen.c 
 b/src/gallium/drivers/i915/i915_screen.c
 index 556dda8..77607d0 100644
 --- a/src/gallium/drivers/i915/i915_screen.c
 +++ b/src/gallium/drivers/i915/i915_screen.c
 @@ -172,6 +172,7 @@ i915_get_param(struct pipe_screen *screen, enum 
 pipe_cap cap)
 /* Supported features (boolean caps). */
 case PIPE_CAP_ANISOTROPIC_FILTER:
 case PIPE_CAP_NPOT_TEXTURES:
 +   case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
 case PIPE_CAP_POINT_SPRITE:
 case PIPE_CAP_PRIMITIVE_RESTART: /* draw module */
 case PIPE_CAP_TEXTURE_SHADOW_MAP:
 diff --git a/src/gallium/drivers/ilo/ilo_screen.c 
 b/src/gallium/drivers/ilo/ilo_screen.c
 index 3f8d431..ddf11ff 100644
 --- a/src/gallium/drivers/ilo/ilo_screen.c
 +++ b/src/gallium/drivers/ilo/ilo_screen.c
 @@ -286,6 +286,7 @@ ilo_get_param(struct pipe_screen *screen, enum 
 pipe_cap param)

 switch (param) {
 case PIPE_CAP_NPOT_TEXTURES:
 +   case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
 case PIPE_CAP_TWO_SIDED_STENCIL:
return true;
 case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS:
 diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c 
 b/src/gallium/drivers/llvmpipe/lp_screen.c
 index b3cd77f..2bbc2c9 100644
 --- a/src/gallium/drivers/llvmpipe/lp_screen.c
 +++ b/src/gallium/drivers/llvmpipe/lp_screen.c
 @@ -109,6 +109,7 @@ llvmpipe_get_param(struct pipe_screen *screen, enum 
 pipe_cap param)
 case PIPE_CAP_MAX_COMBINED_SAMPLERS:
return 2 * PIPE_MAX_SAMPLERS;  /* VS + FS samplers */
 case PIPE_CAP_NPOT_TEXTURES:
 +   case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES:
return 1;
 case PIPE_CAP_TWO_SIDED_STENCIL:
return 1;
 diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c 
 b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
 index 50ddfec..807100e 100644
 --- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
 +++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
 @@ -125,6 +125,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, 
 enum pipe_cap param)
 case PIPE_CAP_QUERY_PIPELINE_STATISTICS:
 case PIPE_CAP_TEXTURE_BORDER_COLOR_QUIRK:
 case 

Re: [Mesa-dev] [PATCH] R600: Expand vector FSQRT ops

2013-10-25 Thread Aaron Watry
Reviewed-by: Aaron Watry awa...@gmail.com

I have tested this on a Radeon 5400 (Cedar), and I just sent a few
generated tests to the piglit list.

--Aaron

On Wed, Oct 23, 2013 at 6:28 PM, Tom Stellard t...@stellard.net wrote:
 From: Tom Stellard thomas.stell...@amd.com

 ---
  lib/Target/R600/AMDGPUISelLowering.cpp |  1 +
  test/CodeGen/R600/llvm.sqrt.ll | 54 
 ++
  2 files changed, 55 insertions(+)
  create mode 100644 test/CodeGen/R600/llvm.sqrt.ll

 diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp 
 b/lib/Target/R600/AMDGPUISelLowering.cpp
 index 91d85d3..52dd010 100644
 --- a/lib/Target/R600/AMDGPUISelLowering.cpp
 +++ b/lib/Target/R600/AMDGPUISelLowering.cpp
 @@ -181,6 +181,7 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine 
 TM) :
  setOperationAction(ISD::FFLOOR, VT, Expand);
  setOperationAction(ISD::FMUL, VT, Expand);
  setOperationAction(ISD::FRINT, VT, Expand);
 +setOperationAction(ISD::FSQRT, VT, Expand);
  setOperationAction(ISD::FSUB, VT, Expand);
}
  }
 diff --git a/test/CodeGen/R600/llvm.sqrt.ll b/test/CodeGen/R600/llvm.sqrt.ll
 new file mode 100644
 index 000..0d0d186
 --- /dev/null
 +++ b/test/CodeGen/R600/llvm.sqrt.ll
 @@ -0,0 +1,54 @@
 +; RUN: llc  %s -march=r600 --mcpu=redwood | FileCheck %s 
 --check-prefix=R600-CHECK
 +; RUN: llc  %s -march=r600 --mcpu=SI | FileCheck %s --check-prefix=SI-CHECK
 +
 +; R600-CHECK-LABEL: @sqrt_f32
 +; R600-CHECK: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[2].Z
 +; R600-CHECK: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[2].Z, PS
 +; SI-CHECK-LABEL: @sqrt_f32
 +; SI-CHECK: V_SQRT_F32_e32
 +define void @sqrt_f32(float addrspace(1)* %out, float %in) {
 +entry:
 +  %0 = call float @llvm.sqrt.f32(float %in)
 +  store float %0, float addrspace(1)* %out
 +  ret void
 +}
 +
 +; R600-CHECK-LABEL: @sqrt_v2f32
 +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[2].W
 +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[2].W, PS
 +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[3].X
 +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[3].X, PS
 +; SI-CHECK-LABEL: @sqrt_v2f32
 +; SI-CHECK: V_SQRT_F32_e32
 +; SI-CHECK: V_SQRT_F32_e32
 +define void @sqrt_v2f32(2 x float addrspace(1)* %out, 2 x float %in) {
 +entry:
 +  %0 = call 2 x float @llvm.sqrt.v2f32(2 x float %in)
 +  store 2 x float %0, 2 x float addrspace(1)* %out
 +  ret void
 +}
 +
 +; R600-CHECK-LABEL: @sqrt_v4f32
 +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[3].Y
 +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[3].Y, PS
 +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[3].Z
 +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[3].Z, PS
 +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[3].W
 +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[3].W, PS
 +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[4].X
 +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[4].X, PS
 +; SI-CHECK-LABEL: @sqrt_v4f32
 +; SI-CHECK: V_SQRT_F32_e32
 +; SI-CHECK: V_SQRT_F32_e32
 +; SI-CHECK: V_SQRT_F32_e32
 +; SI-CHECK: V_SQRT_F32_e32
 +define void @sqrt_v4f32(4 x float addrspace(1)* %out, 4 x float %in) {
 +entry:
 +  %0 = call 4 x float @llvm.sqrt.v4f32(4 x float %in)
 +  store 4 x float %0, 4 x float addrspace(1)* %out
 +  ret void
 +}
 +
 +declare float @llvm.sqrt.f32(float %in)
 +declare 2 x float @llvm.sqrt.v2f32(2 x float %in)
 +declare 4 x float @llvm.sqrt.v4f32(4 x float %in)
 --
 1.7.11.4

 ___
 llvm-commits mailing list
 llvm-comm...@cs.uiuc.edu
 http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Ivybridge support for ARB_transform_feedback2

2013-10-25 Thread Ian Romanick
On 10/17/2013 11:09 PM, Kenneth Graunke wrote:
 Here's my implementation of ARB_transform_feedback2.  I believe it's
 complete; it passes all of our Piglit tests and a lot of Intel's
 oglconform tests.
 
 This should work out of the box on Ivybridge and Baytrail.  It won't
 work on Haswell at the moment, due to restrictions on register writes
 (to be solved in a future kernel version).  Patch 9 will need to be
 replaced with something that detects whether or not we can write
 registers from userspace batchbuffers.
 
 In the meantime, I figured I'd send out the rest for review.
 
 Porting this back to Sandybridge is probably doable, but annoying.
 Sandybridge doesn't have the MI_LOAD_REGISTER_MEM command, so we'd have
 to map the buffers and use MI_LOAD_REGISTER_IMM.  Seems pretty gross.
 Plus, transform feedback is done very differently pre-Ivybridge.  I'm
 not sure it's worth it, seeing as it's a GL 4.0 feature.

Patches 5, 7, 8, and 9 (with Eric's suggested change) are all

Reviewed-by: Ian Romanick ian.d.roman...@intel.com

I share Eric's concern about patch 4.

It sounds like you, Eric, and Marek are trending towards a solution for
patch 6, so I'll stay out of it. :)

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] implement NV_vdpau_interop v3

2013-10-25 Thread Marek Olšák
On Sun, Oct 20, 2013 at 11:57 AM, Christian König
deathsim...@vodafone.de wrote:
 Hi Marek,

 I've just send out a v6 of the patch, please take a second look. Most things
 are fixed now, but there are still a couple of open issues:


 3) There should also probably be some checking for
 GL_ARB_texture_non_power_of_two, but the spec doesn't say what we
 should do (probably return GL_INVALID_OPERATION).


 Actually I thing VDPAU hold the answer to this. The specification there
 states that the different surfaces creation function should round up the
 width/height to supported values (which can then be queried later by the
 application). So we always will end up with correct values independent of
 GL_ARB_texture_non_power_of_two.


 6) Registered and mapped VDPAU textures are not allowed to be
 re-specified by TexImage, TexSubImage, TexImage*Multisample,
 CopyTexImage, CopyTexSubImage, TexStorage, TexStorage*Multisample, and
 similar functions. This should be properly handled in those functions
 and GL errors should be returned.


 I would rather like to avoid touching those functions, cause they are not
 directly related to the spec and I don't want to risk breaking anything
 there.

 Would it valid so set/clear the immutable flag instead (honestly I don't
 have the slightest idea how the frontend handling works in this code)?

Yes, it seems to be sufficient.



 7) The extension spec says that all VDPAU textures should be
 y-inverted. Is that actually the case here?


 Uhm, no idea? It does seems to work, but where is that information stored?

It means that a VDPAU surface is upside-down when it's used as an
OpenGL texture. I don't remember whether we need to a blit or whether
OpenGL textures are y-inverted by default (then we don't have to do
anything). If we do the same thing as NVIDIA, it's probably okay.


Please review and squash the attached patch with your version 6, and
feel free to push it.

Marek
From 1ca52d1ae40fd81276f56e8a61fbed3ad819eb41 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= marek.ol...@amd.com
Date: Sat, 26 Oct 2013 00:39:52 +0200
Subject: [PATCH] squash this with the vdpau patch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Marek Olšák marek.ol...@amd.com
---
 src/mesa/main/vdpau.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/vdpau.c b/src/mesa/main/vdpau.c
index 157df47..7792b4b 100644
--- a/src/mesa/main/vdpau.c
+++ b/src/mesa/main/vdpau.c
@@ -131,7 +131,7 @@ register_surface(struct gl_context *ctx, GLboolean isOutput,
   return (GLintptr)NULL;
}
 
-   surf = MALLOC_STRUCT( vdp_surface );
+   surf = CALLOC_STRUCT( vdp_surface );
surf-vdpSurface = vdpSurface;
surf-target = target;
surf-access = GL_READ_WRITE;
@@ -144,6 +144,7 @@ register_surface(struct gl_context *ctx, GLboolean isOutput,
   _mesa_lock_texture(ctx, tex);
 
   if (tex-Immutable) {
+_mesa_unlock_texture(ctx, tex);
  FREE(surf);
  _mesa_error(ctx, GL_INVALID_OPERATION,
  VDPAURegisterSurfaceNV(texture is immutable));
@@ -153,15 +154,18 @@ register_surface(struct gl_context *ctx, GLboolean isOutput,
   if (tex-Target == 0)
  tex-Target = target;
   else if (tex-Target != target) {
+_mesa_unlock_texture(ctx, tex);
  FREE(surf);
  _mesa_error(ctx, GL_INVALID_OPERATION,
  VDPAURegisterSurfaceNV(target mismatch));
  return (GLintptr)NULL;
   }
 
+  /* This will disallow respecifying the storage. */
+  tex-Immutable = GL_TRUE;
   _mesa_unlock_texture(ctx, tex);
 
-  surf-textures[i] = tex;
+  _mesa_reference_texobj(surf-textures[i], tex);
}
 
_mesa_set_add(ctx-vdpSurfaces, _mesa_hash_pointer(surf), surf);
@@ -223,6 +227,7 @@ _mesa_VDPAUUnregisterSurfaceNV(GLintptr surface)
 {
struct vdp_surface *surf = (struct vdp_surface *)surface;
struct set_entry *entry;
+   int i;
GET_CURRENT_CONTEXT(ctx);
 
if (!ctx-vdpDevice || !ctx-vdpGetProcAddress || !ctx-vdpSurfaces) {
@@ -240,6 +245,13 @@ _mesa_VDPAUUnregisterSurfaceNV(GLintptr surface)
   return;
}
 
+   for (i = 0; i  MAX_TEXTURES; i++) {
+  if (surf-textures[i]) {
+ surf-textures[i]-Immutable = GL_FALSE;
+ _mesa_reference_texobj(surf-textures[i], NULL);
+  }
+   }
+
_mesa_set_remove(ctx-vdpSurfaces, entry);
FREE(surf);
 }
-- 
1.8.1.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH V2 00/12] Implement GL_ARB_sample_shading on Intel hardware

2013-10-25 Thread Anuj Phogat
Patches listed below implement the GL_ARB_sample_shading extension
on Intel hardware = gen6. I verified the implementation with a
number of piglit tests, currently under review on piglit mailing
list. Observed no piglit, gles3 CTS regressions with these patches
on SNB, IVB  HSW.
These patches can also be found at my github branch:
https://github.com/aphogat/mesa.git branch: sample-shading-8

This is the V2 of the series I posted earlier. [PATCH 5/8] i965:
Implement FS backend for ARB_sample_shading in my original series
is split in to 3 patches here. Changes in individual patches are
listed in commit message. Following patches in this series need a
'reviewed-by'.
4/12, 6/12, 7/12, 8/12, 9/12, 10/12, 11,12

Anuj Phogat (12):
  mesa: Add infrastructure for GL_ARB_sample_shading
  mesa: Add new functions and enums required by GL_ARB_sample_shading
  mesa: Pass number of samples as a program state variable
  mesa: Add a helper function _mesa_get_min_invocations_per_fragment()
  glsl: Add new builtins required by GL_ARB_sample_shading
  i965: Don't do vector splitting for ir_var_system_value
  i965: Add FS backend for builtin gl_SamplePosition
  i965: Add FS backend for builtin gl_SampleID
  i965: Add FS backend for builtin gl_SampleMask[]
  i965/gen6: Enable the features required for GL_ARB_sample_shading
  i965/gen7: Enable the features required for GL_ARB_sample_shading
  i965: Enable ARB_sample_shading on intel hardware = gen6

 src/glsl/builtin_variables.cpp |  18 
 src/glsl/glcpp/glcpp-parse.y   |   3 +
 src/glsl/glsl_parser_extras.cpp|   1 +
 src/glsl/glsl_parser_extras.h  |   2 +
 src/glsl/standalone_scaffolding.cpp|   1 +
 src/mapi/glapi/gen/ARB_sample_shading.xml  |  19 
 src/mapi/glapi/gen/GL4x.xml|  21 
 src/mapi/glapi/gen/Makefile.am |   4 +-
 src/mapi/glapi/gen/gl_API.xml  |   3 +-
 src/mesa/drivers/dri/i965/brw_context.h|   2 +
 src/mesa/drivers/dri/i965/brw_defines.h|   2 +
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 114 +
 src/mesa/drivers/dri/i965/brw_fs.h |  14 +++
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  50 +
 .../drivers/dri/i965/brw_fs_vector_splitting.cpp   |   1 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  22 
 src/mesa/drivers/dri/i965/brw_wm.c |  12 +++
 src/mesa/drivers/dri/i965/brw_wm.h |   3 +
 src/mesa/drivers/dri/i965/gen6_wm_state.c  |  52 +-
 src/mesa/drivers/dri/i965/gen7_wm_state.c  |  53 +-
 src/mesa/drivers/dri/i965/intel_extensions.c   |   1 +
 src/mesa/main/enable.c |  16 +++
 src/mesa/main/extensions.c |   1 +
 src/mesa/main/get.c|   8 ++
 src/mesa/main/get_hash_params.py   |   3 +
 src/mesa/main/mtypes.h |  13 ++-
 src/mesa/main/multisample.c|  18 
 src/mesa/main/multisample.h|   2 +
 src/mesa/main/tests/dispatch_sanity.cpp|   2 +-
 src/mesa/program/prog_print.c  |   1 +
 src/mesa/program/prog_statevars.c  |  11 ++
 src/mesa/program/prog_statevars.h  |   2 +
 src/mesa/program/program.c |  31 ++
 src/mesa/program/program.h |   3 +
 34 files changed, 499 insertions(+), 10 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_sample_shading.xml
 create mode 100644 src/mapi/glapi/gen/GL4x.xml

-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH V2 01/12] mesa: Add infrastructure for GL_ARB_sample_shading

2013-10-25 Thread Anuj Phogat
This patch implements the common support code required for the
GL_ARB_sample_shading extension.

V2: Move GL_ARB_sample_shading to ARB extension list.

Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
Reviewed-by: Ian Romanick i...@freedesktop.org
Reviewed-by: Ken Graunke kenn...@whitecape.org
---
 src/glsl/glcpp/glcpp-parse.y| 3 +++
 src/glsl/glsl_parser_extras.cpp | 1 +
 src/glsl/glsl_parser_extras.h   | 2 ++
 src/glsl/standalone_scaffolding.cpp | 1 +
 src/mesa/main/extensions.c  | 1 +
 src/mesa/main/mtypes.h  | 1 +
 6 files changed, 9 insertions(+)

diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y
index 02100ab..5141bdd 100644
--- a/src/glsl/glcpp/glcpp-parse.y
+++ b/src/glsl/glcpp/glcpp-parse.y
@@ -1249,6 +1249,9 @@ glcpp_parser_create (const struct gl_extensions 
*extensions, int api)
  if (extensions-ARB_shading_language_420pack)
 add_builtin_define(parser, GL_ARB_shading_language_420pack, 
1);
 
+ if (extensions-ARB_sample_shading)
+add_builtin_define(parser, GL_ARB_sample_shading, 1);
+
  if (extensions-EXT_shader_integer_mix)
 add_builtin_define(parser, GL_EXT_shader_integer_mix, 1);
 
diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
index be17109..669f531 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -533,6 +533,7 @@ static const _mesa_glsl_extension 
_mesa_glsl_supported_extensions[] = {
EXT(AMD_vertex_shader_layer,true,  false, 
AMD_vertex_shader_layer),
EXT(EXT_shader_integer_mix, true,  true,  
EXT_shader_integer_mix),
EXT(ARB_texture_gather, true,  false, ARB_texture_gather),
+   EXT(ARB_sample_shading, true,  false, ARB_sample_shading),
 };
 
 #undef EXT
diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
index a674384..872dd80 100644
--- a/src/glsl/glsl_parser_extras.h
+++ b/src/glsl/glsl_parser_extras.h
@@ -323,6 +323,8 @@ struct _mesa_glsl_parse_state {
bool AMD_vertex_shader_layer_warn;
bool ARB_shading_language_420pack_enable;
bool ARB_shading_language_420pack_warn;
+   bool ARB_sample_shading_enable;
+   bool ARB_sample_shading_warn;
bool EXT_shader_integer_mix_enable;
bool EXT_shader_integer_mix_warn;
/*@}*/
diff --git a/src/glsl/standalone_scaffolding.cpp 
b/src/glsl/standalone_scaffolding.cpp
index 7a1cf68..cbff6d1 100644
--- a/src/glsl/standalone_scaffolding.cpp
+++ b/src/glsl/standalone_scaffolding.cpp
@@ -97,6 +97,7 @@ void initialize_context_to_defaults(struct gl_context *ctx, 
gl_api api)
ctx-Extensions.ARB_explicit_attrib_location = true;
ctx-Extensions.ARB_fragment_coord_conventions = true;
ctx-Extensions.ARB_gpu_shader5 = true;
+   ctx-Extensions.ARB_sample_shading = true;
ctx-Extensions.ARB_shader_bit_encoding = true;
ctx-Extensions.ARB_shader_stencil_export = true;
ctx-Extensions.ARB_shader_texture_lod = true;
diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index e8e0a20..f3300e3 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -118,6 +118,7 @@ static const struct extension extension_table[] = {
{ GL_ARB_point_sprite,o(ARB_point_sprite),
GL, 2003 },
{ GL_ARB_provoking_vertex,o(EXT_provoking_vertex),
GL, 2009 },
{ GL_ARB_robustness,  o(dummy_true),  
GL, 2010 },
+   { GL_ARB_sample_shading,  o(ARB_sample_shading),  
GL, 2009 },
{ GL_ARB_sampler_objects, o(dummy_true),  
GL, 2009 },
{ GL_ARB_seamless_cube_map,   o(ARB_seamless_cube_map),   
GL, 2009 },
{ GL_ARB_shader_bit_encoding, o(ARB_shader_bit_encoding), 
GL, 2010 },
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 6374e8c..67f1bf6 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3198,6 +3198,7 @@ struct gl_extensions
GLboolean ARB_occlusion_query;
GLboolean ARB_occlusion_query2;
GLboolean ARB_point_sprite;
+   GLboolean ARB_sample_shading;
GLboolean ARB_seamless_cube_map;
GLboolean ARB_shader_bit_encoding;
GLboolean ARB_shader_stencil_export;
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH V2 02/12] mesa: Add new functions and enums required by GL_ARB_sample_shading

2013-10-25 Thread Anuj Phogat
New functions added by GL_ARB_sample_shading:
glMinSampleShadingARB()

New enums:
GL_SAMPLE_SHADING_ARB
GL_MIN_SAMPLE_SHADING_VALUE_ARB

V2: Update comments.
Create new GL4x.xml.
Remove redundant code in get.c.
Update the API_XML list in Makefile.am.
Add extra_gl40_ARB_sample_shading predicate to get.c.

Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
Reviewed-by: Ken Graunke kenn...@whitecape.org
---
 src/mapi/glapi/gen/ARB_sample_shading.xml | 19 +++
 src/mapi/glapi/gen/GL4x.xml   | 21 +
 src/mapi/glapi/gen/Makefile.am|  4 +++-
 src/mapi/glapi/gen/gl_API.xml |  3 ++-
 src/mesa/main/enable.c| 16 
 src/mesa/main/get.c   |  8 
 src/mesa/main/get_hash_params.py  |  3 +++
 src/mesa/main/mtypes.h|  2 ++
 src/mesa/main/multisample.c   | 18 ++
 src/mesa/main/multisample.h   |  2 ++
 src/mesa/main/tests/dispatch_sanity.cpp   |  2 +-
 11 files changed, 95 insertions(+), 3 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_sample_shading.xml
 create mode 100644 src/mapi/glapi/gen/GL4x.xml

diff --git a/src/mapi/glapi/gen/ARB_sample_shading.xml 
b/src/mapi/glapi/gen/ARB_sample_shading.xml
new file mode 100644
index 000..a87a517
--- /dev/null
+++ b/src/mapi/glapi/gen/ARB_sample_shading.xml
@@ -0,0 +1,19 @@
+?xml version=1.0?
+!DOCTYPE OpenGLAPI SYSTEM gl_API.dtd
+
+!-- Note: no GLX protocol info yet. --
+
+OpenGLAPI
+
+category name=GL_ARB_sample_shading number=70
+
+   enum name=SAMPLE_SHADING_ARB  value=0x8C36/
+   enum name=MIN_SAMPLE_SHADING_VALUE_ARBvalue=0x8C37/
+
+   function name=MinSampleShadingARB alias=MinSampleShading
+  param name=value type=GLclampf/
+   /function
+
+/category
+
+/OpenGLAPI
diff --git a/src/mapi/glapi/gen/GL4x.xml b/src/mapi/glapi/gen/GL4x.xml
new file mode 100644
index 000..367741f
--- /dev/null
+++ b/src/mapi/glapi/gen/GL4x.xml
@@ -0,0 +1,21 @@
+?xml version=1.0?
+!DOCTYPE OpenGLAPI SYSTEM gl_API.dtd
+
+!-- Note: no GLX protocol info yet. --
+
+OpenGLAPI
+
+category name=4.0
+  enum name=SAMPLE_SHADING  value=0x8C36/
+  enum name=MIN_SAMPLE_SHADING_VALUEvalue=0x8C37/
+
+  function name=MinSampleShading offset=assign
+param name=value type=GLclampf/
+  /function
+/category
+
+category name=4.3
+
+/category
+
+/OpenGLAPI
diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am
index d71d5d2..b8d280c 100644
--- a/src/mapi/glapi/gen/Makefile.am
+++ b/src/mapi/glapi/gen/Makefile.am
@@ -108,6 +108,7 @@ API_XML = \
ARB_invalidate_subdata.xml \
ARB_map_buffer_range.xml \
ARB_robustness.xml \
+   ARB_sample_shading.xml \
ARB_sampler_objects.xml \
ARB_seamless_cube_map.xml \
ARB_sync.xml \
@@ -142,7 +143,8 @@ API_XML = \
NV_primitive_restart.xml \
NV_texture_barrier.xml \
OES_EGL_image.xml \
-   GL3x.xml
+   GL3x.xml \
+   GL4x.xml
 
 
 COMMON = $(API_XML) \
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index 48fce36..f0eea9b 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8187,7 +8187,7 @@
 xi:include href=ARB_draw_buffers_blend.xml 
xmlns:xi=http://www.w3.org/2001/XInclude/
 xi:include href=AMD_draw_buffers_blend.xml 
xmlns:xi=http://www.w3.org/2001/XInclude/
 
-!-- 70. GL_ARB_sample_shading --
+xi:include href=ARB_sample_shading.xml 
xmlns:xi=http://www.w3.org/2001/XInclude/
 xi:include href=ARB_texture_cube_map_array.xml 
xmlns:xi=http://www.w3.org/2001/XInclude/
 xi:include href=ARB_texture_gather.xml 
xmlns:xi=http://www.w3.org/2001/XInclude/
 !-- 73. GL_ARB_texture_query_lod --
@@ -13150,4 +13150,5 @@
 
 xi:include href=EXT_transform_feedback.xml 
xmlns:xi=http://www.w3.org/2001/XInclude/
 
+xi:include href=GL4x.xml xmlns:xi=http://www.w3.org/2001/XInclude/
 /OpenGLAPI
diff --git a/src/mesa/main/enable.c b/src/mesa/main/enable.c
index 5e2fd80..c9ccfd2 100644
--- a/src/mesa/main/enable.c
+++ b/src/mesa/main/enable.c
@@ -802,6 +802,15 @@ _mesa_set_enable(struct gl_context *ctx, GLenum cap, 
GLboolean state)
  ctx-Multisample.SampleCoverageInvert = state;
  break;
 
+  /* GL_ARB_sample_shading */
+  case GL_SAMPLE_SHADING:
+ CHECK_EXTENSION(ARB_sample_shading, cap);
+ if (ctx-Multisample.SampleShading == state)
+return;
+ FLUSH_VERTICES(ctx, _NEW_MULTISAMPLE);
+ ctx-Multisample.SampleShading = state;
+ break;
+
   /* GL_IBM_rasterpos_clip */
   case GL_RASTER_POSITION_UNCLIPPED_IBM:
  if (ctx-API != API_OPENGL_COMPAT)
@@ -1594,6 +1603,13 @@ _mesa_IsEnabled( GLenum cap )
  CHECK_EXTENSION(ARB_texture_multisample);
  return ctx-Multisample.SampleMask;
 
+  /* ARB_sample_shading */
+  case 

[Mesa-dev] [PATCH V2 03/12] mesa: Pass number of samples as a program state variable

2013-10-25 Thread Anuj Phogat
Number of samples will be required in fragment shader program by new
GLSL builtin uniform gl_NumSamples.

V2: Use state.numsamples in place of state.num.samples
Use _NEW_BUFFERS flag in place of _NEW_MULTISAMPLE

Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
Reviewed-by: Ian Romanick i...@freedesktop.org
Reviewed-by: Ken Graunke kenn...@whitecape.org
Reviewed-by: Paul Berry stereotype...@gmail.com
---
 src/mesa/program/prog_statevars.c | 11 +++
 src/mesa/program/prog_statevars.h |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/src/mesa/program/prog_statevars.c 
b/src/mesa/program/prog_statevars.c
index 145c07c..f6fd535 100644
--- a/src/mesa/program/prog_statevars.c
+++ b/src/mesa/program/prog_statevars.c
@@ -349,6 +349,9 @@ _mesa_fetch_state(struct gl_context *ctx, const 
gl_state_index state[],
  }
   }
   return;
+   case STATE_NUM_SAMPLES:
+  ((int *)value)[0] = ctx-DrawBuffer-Visual.samples;
+  return;
case STATE_DEPTH_RANGE:
   value[0] = ctx-Viewport.Near; /* near   */
   value[1] = ctx-Viewport.Far;  /* far*/
@@ -665,6 +668,9 @@ _mesa_program_state_flags(const gl_state_index 
state[STATE_LENGTH])
case STATE_PROGRAM_MATRIX:
   return _NEW_TRACK_MATRIX;
 
+   case STATE_NUM_SAMPLES:
+  return _NEW_BUFFERS;
+
case STATE_DEPTH_RANGE:
   return _NEW_VIEWPORT;
 
@@ -852,6 +858,9 @@ append_token(char *dst, gl_state_index k)
case STATE_TEXENV_COLOR:
   append(dst, texenv);
   break;
+   case STATE_NUM_SAMPLES:
+  append(dst, numsamples);
+  break;
case STATE_DEPTH_RANGE:
   append(dst, depth.range);
   break;
@@ -1027,6 +1036,8 @@ _mesa_program_state_string(const gl_state_index 
state[STATE_LENGTH])
   break;
case STATE_FOG_COLOR:
   break;
+   case STATE_NUM_SAMPLES:
+  break;
case STATE_DEPTH_RANGE:
   break;
case STATE_FRAGMENT_PROGRAM:
diff --git a/src/mesa/program/prog_statevars.h 
b/src/mesa/program/prog_statevars.h
index ec22b73..c3081c4 100644
--- a/src/mesa/program/prog_statevars.h
+++ b/src/mesa/program/prog_statevars.h
@@ -103,6 +103,8 @@ typedef enum gl_state_index_ {
 
STATE_TEXENV_COLOR,
 
+   STATE_NUM_SAMPLES,
+
STATE_DEPTH_RANGE,
 
STATE_VERTEX_PROGRAM,
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 04/12] mesa: Add a helper function _mesa_get_min_invocations_per_fragment()

2013-10-25 Thread Anuj Phogat
Thsi function is used to test if we need to do per sample shading or
per fragment shading.

Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
---
 src/mesa/program/program.c | 31 +++
 src/mesa/program/program.h |  3 +++
 2 files changed, 34 insertions(+)

diff --git a/src/mesa/program/program.c b/src/mesa/program/program.c
index 093d372..e12e6ec 100644
--- a/src/mesa/program/program.c
+++ b/src/mesa/program/program.c
@@ -1024,3 +1024,34 @@ _mesa_postprocess_program(struct gl_context *ctx, struct 
gl_program *prog)
 
}
 }
+
+/* Gets the minimum number of shader invocations per fragment.
+ * This function is useful to determine if we need to do per
+ * sample shading or per fragment shading.
+ */
+GLint
+_mesa_get_min_invocations_per_fragment(struct gl_context *ctx,
+   const struct gl_fragment_program *prog)
+{
+   /* From ARB_sample_shading specification:
+* Using gl_SampleID in a fragment shader causes the entire shader
+*  to be evaluated per-sample.
+*
+* Using gl_SamplePosition in a fragment shader causes the entire
+*  shader to be evaluated per-sample.
+*
+* If MULTISAMPLE or SAMPLE_SHADING_ARB is disabled, sample shading
+*  has no effect.
+*/
+   if (ctx-Multisample.Enabled) {
+  if (prog-Base.SystemValuesRead  SYSTEM_BIT_SAMPLE_ID ||
+  prog-Base.SystemValuesRead  SYSTEM_BIT_SAMPLE_POS)
+ return ctx-DrawBuffer-Visual.samples;
+  else if (ctx-Multisample.SampleShading)
+ return ceil(ctx-Multisample.MinSampleShadingValue *
+ ctx-DrawBuffer-Visual.samples);
+  else
+ return 1;
+   }
+   return 1;
+}
diff --git a/src/mesa/program/program.h b/src/mesa/program/program.h
index 34965ab..353ccab 100644
--- a/src/mesa/program/program.h
+++ b/src/mesa/program/program.h
@@ -187,6 +187,9 @@ _mesa_valid_register_index(const struct gl_context *ctx,
 extern void
 _mesa_postprocess_program(struct gl_context *ctx, struct gl_program *prog);
 
+extern GLint
+_mesa_get_min_invocations_per_fragment(struct gl_context *ctx,
+   const struct gl_fragment_program *prog);
 
 static inline GLuint
 _mesa_program_target_to_index(GLenum v)
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH V2 05/12] glsl: Add new builtins required by GL_ARB_sample_shading

2013-10-25 Thread Anuj Phogat
New builtins added by GL_ARB_sample_shading:
in vec2 gl_SamplePosition
in int gl_SampleID
in int gl_NumSamples
out int gl_SampleMask[]

V2: - Use SWIZZLE_ for STATE_NUM_SAMPLES.
- Use result.samplemask in arb_output_attrib_string.
- Add comment to explain the size of gl_SampleMask[] array.
- Make gl_SampleID and gl_SamplePosition system values.
Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
Reviewed-by: Paul Berry stereotype...@gmail.com
---
 src/glsl/builtin_variables.cpp | 18 ++
 src/mesa/main/mtypes.h | 10 +-
 src/mesa/program/prog_print.c  |  1 +
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp
index 64f3406..bb3d057 100644
--- a/src/glsl/builtin_variables.cpp
+++ b/src/glsl/builtin_variables.cpp
@@ -30,6 +30,9 @@
 #include program/prog_statevars.h
 #include program/prog_instruction.h
 
+static struct gl_builtin_uniform_element gl_NumSamples_elements[] = {
+   {NULL, {STATE_NUM_SAMPLES, 0, 0}, SWIZZLE_}
+};
 
 static struct gl_builtin_uniform_element gl_DepthRange_elements[] = {
{near, {STATE_DEPTH_RANGE, 0, 0}, SWIZZLE_},
@@ -236,6 +239,7 @@ static struct gl_builtin_uniform_element 
gl_NormalMatrix_elements[] = {
 #define STATEVAR(name) {#name, name ## _elements, Elements(name ## _elements)}
 
 static const struct gl_builtin_uniform_desc _mesa_builtin_uniform_desc[] = {
+   STATEVAR(gl_NumSamples),
STATEVAR(gl_DepthRange),
STATEVAR(gl_ClipPlane),
STATEVAR(gl_Point),
@@ -645,6 +649,7 @@ builtin_variable_generator::generate_constants()
 void
 builtin_variable_generator::generate_uniforms()
 {
+   add_uniform(int_t, gl_NumSamples);
add_uniform(type(gl_DepthRangeParameters), gl_DepthRange);
add_uniform(array(vec4_t, VERT_ATTRIB_MAX), gl_CurrentAttribVertMESA);
add_uniform(array(vec4_t, VARYING_SLOT_MAX), gl_CurrentAttribFragMESA);
@@ -821,6 +826,19 @@ builtin_variable_generator::generate_fs_special_vars()
   if (state-AMD_shader_stencil_export_warn)
  var-warn_extension = GL_AMD_shader_stencil_export;
}
+
+   if (state-ARB_sample_shading_enable) {
+  add_system_value(SYSTEM_VALUE_SAMPLE_ID, int_t, gl_SampleID);
+  add_system_value(SYSTEM_VALUE_SAMPLE_POS, vec2_t, gl_SamplePosition);
+  /* From the ARB_sample_shading specification:
+   * The number of elements in the array is ceil(s/32), where s
+   *  is the maximum number of color samples supported by the
+   *  implementation.
+   *  Since no drivers expose more than 32x MSAA, we can simply set
+   *  the array size to 1 rather than computing it.
+   */
+  add_output(FRAG_RESULT_SAMPLE_MASK, array(int_t, 1), gl_SampleMask);
+   }
 }
 
 
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 8306969..869470e 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -274,6 +274,11 @@ typedef enum
 #define VARYING_BIT_VAR(V) BITFIELD64_BIT(VARYING_SLOT_VAR0 + (V))
 /*@}*/
 
+/**
+ * Bitflags for system values.
+ */
+#define SYSTEM_BIT_SAMPLE_ID BITFIELD64_BIT(SYSTEM_VALUE_SAMPLE_ID)
+#define SYSTEM_BIT_SAMPLE_POS BITFIELD64_BIT(SYSTEM_VALUE_SAMPLE_POS)
 
 /**
  * Determine if the given gl_varying_slot appears in the fragment shader.
@@ -306,12 +311,13 @@ typedef enum
 * register is written.  No FRAG_RESULT_DATAn will be written.
 */
FRAG_RESULT_COLOR = 2,
+   FRAG_RESULT_SAMPLE_MASK = 3,
 
/* FRAG_RESULT_DATAn are the per-render-target (GLSL gl_FragData[n]
 * or ARB_fragment_program fragment.color[n]) color results.  If
 * any are written, FRAG_RESULT_COLOR will not be written.
 */
-   FRAG_RESULT_DATA0 = 3,
+   FRAG_RESULT_DATA0 = 4,
FRAG_RESULT_MAX = (FRAG_RESULT_DATA0 + MAX_DRAW_BUFFERS)
 } gl_frag_result;
 
@@ -1904,6 +1910,8 @@ typedef enum
SYSTEM_VALUE_FRONT_FACE,  /** Fragment shader only (not done yet) */
SYSTEM_VALUE_VERTEX_ID,   /** Vertex shader only */
SYSTEM_VALUE_INSTANCE_ID, /** Vertex shader only */
+   SYSTEM_VALUE_SAMPLE_ID, /** Fragment shader only */
+   SYSTEM_VALUE_SAMPLE_POS, /** Fragment shader only */
SYSTEM_VALUE_MAX  /** Number of values */
 } gl_system_value;
 
diff --git a/src/mesa/program/prog_print.c b/src/mesa/program/prog_print.c
index cf85213..fa9063f 100644
--- a/src/mesa/program/prog_print.c
+++ b/src/mesa/program/prog_print.c
@@ -311,6 +311,7 @@ arb_output_attrib_string(GLint index, GLenum progType)
   result.depth, /* FRAG_RESULT_DEPTH */
   result.(one), /* FRAG_RESULT_STENCIL */
   result.color, /* FRAG_RESULT_COLOR */
+  result.samplemask, /* FRAG_RESULT_SAMPLE_MASK */
   result.color[0], /* FRAG_RESULT_DATA0 (named for GLSL's gl_FragData) */
   result.color[1],
   result.color[2],
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 06/12] i965: Don't do vector splitting for ir_var_system_value

2013-10-25 Thread Anuj Phogat
This is required while adding builtin system value vec{2, 3, 4}
variables. For example:
(declare (sys) vec2 gl_SamplePosition)

Without this patch above glsl ir splits in to:
(declare (temporary) float gl_SamplePosition_x)
(declare (temporary) float gl_SamplePosition_y)

Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
---
 src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp
index eb7851b..6284b59 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp
@@ -111,6 +111,7 @@ ir_vector_reference_visitor::get_variable_entry(ir_variable 
*var)
case ir_var_uniform:
case ir_var_shader_in:
case ir_var_shader_out:
+   case ir_var_system_value:
case ir_var_function_in:
case ir_var_function_out:
case ir_var_function_inout:
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH V2 07/12] i965: Add FS backend for builtin gl_SamplePosition

2013-10-25 Thread Anuj Phogat
V2:
   - Update comments.
   - Make changes to support simd16 mode.
   - Add compute_pos_offset variable in brw_wm_prog_key.
   - Add variable uses_omask in brw_wm_prog_data.

Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
---
 src/mesa/drivers/dri/i965/brw_context.h  |  1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp | 65 
 src/mesa/drivers/dri/i965/brw_fs.h   |  2 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  5 +++
 src/mesa/drivers/dri/i965/brw_wm.c   |  6 +++
 src/mesa/drivers/dri/i965/brw_wm.h   |  2 +
 6 files changed, 81 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 3b95922..d16f257 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -380,6 +380,7 @@ struct brw_wm_prog_data {
GLuint nr_params;   /** number of float params/constants */
GLuint nr_pull_params;
bool dual_src_blend;
+   bool uses_pos_offset;
uint32_t prog_offset_16;
 
/**
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 65a4b66..0f8213e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1118,6 +1118,64 @@ fs_visitor::emit_frontfacing_interpolation(ir_variable 
*ir)
return reg;
 }
 
+void
+fs_visitor::compute_sample_position(fs_reg dst, fs_reg int_sample_pos)
+{
+   assert(dst.type == BRW_REGISTER_TYPE_F);
+
+   if (c-key.compute_pos_offset) {
+  /* Convert int_sample_pos to floating point */
+  emit(MOV(dst, int_sample_pos));
+  /* Scale to the range [0, 1] */
+  emit(MUL(dst, dst, fs_reg(1 / 16.0f)));
+   }
+   else {
+  /* From ARB_sample_shading specification:
+   * When rendering to a non-multisample buffer, or if multisample
+   *  rasterization is disabled, gl_SamplePosition will always be
+   *  (0.5, 0.5).
+   */
+  emit(MOV(dst, fs_reg(0.5f)));
+   }
+}
+
+fs_reg *
+fs_visitor::emit_samplepos_setup(ir_variable *ir)
+{
+   assert(brw-gen = 6);
+   assert(ir-type == glsl_type::vec2_type);
+
+   this-current_annotation = compute sample position;
+   fs_reg *reg = new(this-mem_ctx) fs_reg(this, ir-type);
+   fs_reg pos = *reg;
+   fs_reg int_sample_x = fs_reg(this, glsl_type::int_type);
+   fs_reg int_sample_y = fs_reg(this, glsl_type::int_type);
+
+   /* WM will be run in MSDISPMODE_PERSAMPLE. So, only one of SIMD8 or SIMD16
+* mode will be enabled.
+*
+* From the Ivy Bridge PRM, volume 2 part 1, page 344:
+* R31.1:0 Position Offset X/Y for Slot[3:0]
+* R31.3:2 Position Offset X/Y for Slot[7:4]
+* .
+*
+* The X, Y sample positions come in as bytes in  thread payload. So, read
+* the positions using vstride=16, width=8, hstride=2.
+*/
+   struct brw_reg sample_pos_reg =
+  stride(retype(brw_vec1_grf(c-sample_pos_reg, 0),
+BRW_REGISTER_TYPE_B), 16, 8, 2);
+
+   emit(MOV(int_sample_x, fs_reg(sample_pos_reg)));
+   /* Compute gl_SamplePosition.x */
+   compute_sample_position(pos, int_sample_x);
+   pos.reg_offset += dispatch_width / 8;
+   emit(MOV(int_sample_y, fs_reg(suboffset(sample_pos_reg, 1;
+   /* Compute gl_SamplePosition.y */
+   compute_sample_position(pos, int_sample_y);
+   return reg;
+}
+
 fs_reg
 fs_visitor::fix_math_operand(fs_reg src)
 {
@@ -2985,7 +3043,14 @@ fs_visitor::setup_payload_gen6()
  c-nr_payload_regs++;
   }
}
+
+   c-prog_data.uses_pos_offset = c-key.compute_pos_offset;
/* R31: MSAA position offsets. */
+   if (c-prog_data.uses_pos_offset) {
+  c-sample_pos_reg = c-nr_payload_regs;
+  c-nr_payload_regs++;
+   }
+
/* R32-: bary for 32-pixel. */
/* R58-59: interp W for 32-pixel. */
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index b5aed23..db5df39 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -333,9 +333,11 @@ public:
  glsl_interp_qualifier interpolation_mode,
  bool is_centroid);
fs_reg *emit_frontfacing_interpolation(ir_variable *ir);
+   fs_reg *emit_samplepos_setup(ir_variable *ir);
fs_reg *emit_general_interpolation(ir_variable *ir);
void emit_interpolation_setup_gen4();
void emit_interpolation_setup_gen6();
+   void compute_sample_position(fs_reg dst, fs_reg int_sample_pos);
fs_reg rescale_texcoord(ir_texture *ir, fs_reg coordinate,
bool is_rect, int sampler, int texunit);
fs_inst *emit_texture_gen4(ir_texture *ir, fs_reg dst, fs_reg coordinate,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 9f37013..51972fe 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -125,6 +125,11 @@ fs_visitor::visit(ir_variable *ir)
 
   reg = 

[Mesa-dev] [PATCH V2 08/12] i965: Add FS backend for builtin gl_SampleID

2013-10-25 Thread Anuj Phogat
V2:
   - Update comments
   - Make changes to support simd16 mode.
   - Add compute_sample_id variables in brw_wm_prog_key
   - Add a special backend instruction to compute sample_id.

Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
---
 src/mesa/drivers/dri/i965/brw_defines.h|  1 +
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 49 ++
 src/mesa/drivers/dri/i965/brw_fs.h |  7 
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27 ++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  2 ++
 src/mesa/drivers/dri/i965/brw_wm.c |  6 
 src/mesa/drivers/dri/i965/brw_wm.h |  1 +
 7 files changed, 93 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 5ba9d45..f3c994b 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -788,6 +788,7 @@ enum opcode {
FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7,
FS_OPCODE_MOV_DISPATCH_TO_FLAGS,
FS_OPCODE_DISCARD_JUMP,
+   FS_OPCODE_SET_SAMPLE_ID,
FS_OPCODE_SET_SIMD4X2_OFFSET,
FS_OPCODE_PACK_HALF_2x16_SPLIT,
FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X,
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 0f8213e..5773c6f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1176,6 +1176,55 @@ fs_visitor::emit_samplepos_setup(ir_variable *ir)
return reg;
 }
 
+fs_reg *
+fs_visitor::emit_sampleid_setup(ir_variable *ir)
+{
+   assert(brw-gen = 6);
+
+   this-current_annotation = compute sample id;
+   fs_reg *reg = new(this-mem_ctx) fs_reg(this, ir-type);
+
+   if (c-key.compute_sample_id) {
+  fs_reg t1 = fs_reg(this, glsl_type::int_type);
+  fs_reg t2 = fs_reg(this, glsl_type::int_type);
+  t2.type = BRW_REGISTER_TYPE_UW;
+
+  /* The PS will be run in MSDISPMODE_PERSAMPLE. For example with
+   * 8x multisampling, subspan 0 will represent sample N (where N
+   * is 0, 2, 4 or 6), subspan 1 will represent sample 1, 3, 5 or
+   * 7. We can find the value of N by looking at R0.0 bits 7:6
+   * (Starting Sample Pair Index (SSPI)) and multiplying by two
+   * (since samples are always delivered in pairs). That is, we
+   * compute 2*((R0.0  0xc0)  6) == (R0.0  0xc0)  5. Then
+   * we need to add N to the sequence (0, 0, 0, 0, 1, 1, 1, 1) in
+   * case of SIMD8 and sequence (0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2,
+   * 2, 3, 3, 3, 3) in case of SIMD16. We compute this sequence by
+   * populating a temporary variable with the sequence (0, 1, 2, 3),
+   * and then reading from it using vstride=1, width=4, hstride=0.
+   * These computations hold good for 4x multisampling as well.
+   */
+  emit(BRW_OPCODE_AND, t1,
+   fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)),
+   fs_reg(brw_imm_d(0xc0)));
+  emit(BRW_OPCODE_SHR, t1, t1, fs_reg(5));
+  /* This works for both SIMD8 and SIMD16 */
+  emit(MOV(t2, brw_imm_v(0x3210)));
+  /* This special instruction takes care of setting vstride=1,
+   * width=4, hstride=0 of t2 during an ADD instruction.
+   */
+  emit(FS_OPCODE_SET_SAMPLE_ID, *reg, t1, t2);
+   }
+   else {
+  /* As per GL_ARB_sample_shading specification:
+   * When rendering to a non-multisample buffer, or if multisample
+   *  rasterization is disabled, gl_SampleID will always be zero.
+   */
+  emit(BRW_OPCODE_MOV, *reg, fs_reg(0));
+   }
+
+   return reg;
+}
+
 fs_reg
 fs_visitor::fix_math_operand(fs_reg src)
 {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index db5df39..8a1a414 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -334,6 +334,7 @@ public:
  bool is_centroid);
fs_reg *emit_frontfacing_interpolation(ir_variable *ir);
fs_reg *emit_samplepos_setup(ir_variable *ir);
+   fs_reg *emit_sampleid_setup(ir_variable *ir);
fs_reg *emit_general_interpolation(ir_variable *ir);
void emit_interpolation_setup_gen4();
void emit_interpolation_setup_gen6();
@@ -538,6 +539,12 @@ private:
  struct brw_reg index,
  struct brw_reg offset);
void generate_mov_dispatch_to_flags(fs_inst *inst);
+
+   void generate_set_sample_id(fs_inst *inst,
+   struct brw_reg dst,
+   struct brw_reg src0,
+   struct brw_reg src1);
+
void generate_set_simd4x2_offset(fs_inst *inst,
 struct brw_reg dst,
 struct brw_reg offset);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index fa15f7b..4f15ed7 100644
--- 

[Mesa-dev] [PATCH V2 09/12] i965: Add FS backend for builtin gl_SampleMask[]

2013-10-25 Thread Anuj Phogat
V2:
   - Update comments
   - Use fs_reg(0x) in AND instruction to get the 16 bit
 sample_mask.
   - Add a special backend instructions to compute sample_mask.
   - Add a new variable uses_omask in brw_wm_prog_data.

Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
---
 src/mesa/drivers/dri/i965/brw_context.h|  1 +
 src/mesa/drivers/dri/i965/brw_defines.h|  1 +
 src/mesa/drivers/dri/i965/brw_fs.h |  5 +
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 23 +++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 15 +++
 5 files changed, 45 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index d16f257..d623368 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -381,6 +381,7 @@ struct brw_wm_prog_data {
GLuint nr_pull_params;
bool dual_src_blend;
bool uses_pos_offset;
+   bool uses_omask;
uint32_t prog_offset_16;
 
/**
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index f3c994b..f9556a5 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -788,6 +788,7 @@ enum opcode {
FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7,
FS_OPCODE_MOV_DISPATCH_TO_FLAGS,
FS_OPCODE_DISCARD_JUMP,
+   FS_OPCODE_SET_OMASK,
FS_OPCODE_SET_SAMPLE_ID,
FS_OPCODE_SET_SIMD4X2_OFFSET,
FS_OPCODE_PACK_HALF_2x16_SPLIT,
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 8a1a414..c9bcc4e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -436,6 +436,7 @@ public:
 
struct hash_table *variable_ht;
fs_reg frag_depth;
+   fs_reg sample_mask;
fs_reg outputs[BRW_MAX_DRAW_BUFFERS];
unsigned output_components[BRW_MAX_DRAW_BUFFERS];
fs_reg dual_src_output;
@@ -540,6 +541,10 @@ private:
  struct brw_reg offset);
void generate_mov_dispatch_to_flags(fs_inst *inst);
 
+   void generate_set_omask(fs_inst *inst,
+   struct brw_reg dst,
+   struct brw_reg sample_mask);
+
void generate_set_sample_id(fs_inst *inst,
struct brw_reg dst,
struct brw_reg src0,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 4f15ed7..fc8e0bd 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1024,6 +1024,25 @@ fs_generator::generate_set_simd4x2_offset(fs_inst *inst,
brw_pop_insn_state(p);
 }
 
+/* Sets vstride=16, width=8, hstride=2 of register mask before
+ * moving to register dst.
+ */
+void
+fs_generator::generate_set_omask(fs_inst *inst,
+ struct brw_reg dst,
+ struct brw_reg mask)
+{
+   assert(dst.type == BRW_REGISTER_TYPE_UW);
+   if (dispatch_width == 16)
+  dst = vec16(dst);
+   brw_push_insn_state(p);
+   brw_set_compression_control(p, BRW_COMPRESSION_NONE);
+   brw_set_mask_control(p, BRW_MASK_DISABLE);
+   brw_MOV(p, dst, stride(retype(brw_vec1_reg(mask.file, mask.nr, 0),
+ dst.type), 16, 8, 2));
+   brw_pop_insn_state(p);
+}
+
 /* Sets vstride=1, width=4, hstride=0 of register src1 during
  * the ADD instruction.
  */
@@ -1576,6 +1595,10 @@ fs_generator::generate_code(exec_list *instructions)
  generate_set_simd4x2_offset(inst, dst, src[0]);
  break;
 
+  case FS_OPCODE_SET_OMASK:
+ generate_set_omask(inst, dst, src[0]);
+ break;
+
   case FS_OPCODE_SET_SAMPLE_ID:
  generate_set_sample_id(inst, dst, src[0], src[1]);
  break;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 7a6a0b5..b9eb5b8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -82,6 +82,8 @@ fs_visitor::visit(ir_variable *ir)
 }
   } else if (ir-location == FRAG_RESULT_DEPTH) {
 this-frag_depth = *reg;
+  } else if (ir-location == FRAG_RESULT_SAMPLE_MASK) {
+ this-sample_mask = *reg;
   } else {
 /* gl_FragData or a user-defined FS output */
 assert(ir-location = FRAG_RESULT_DATA0 
@@ -2510,6 +2512,19 @@ fs_visitor::emit_fb_writes()
   pop_force_uncompressed();
}
 
+   c-prog_data.uses_omask =
+  fp-Base.OutputsWritten  BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK);
+   if(c-prog_data.uses_omask) {
+  this-current_annotation = FB write oMask;
+  assert(this-sample_mask.file != BAD_FILE);
+  fs_reg reg = fs_reg(this, glsl_type::int_type);
+
+  /* Hand over gl_SampleMask. Only lower 16 bits are relevant. */
+  emit(AND(reg, this-sample_mask, 

[Mesa-dev] [PATCH V2 10/12] i965/gen6: Enable the features required for GL_ARB_sample_shading

2013-10-25 Thread Anuj Phogat
- Enable GEN6_WM_MSDISPMODE_PERSAMPLE, GEN6_WM_POSOFFSET_SAMPLE,
  GEN6_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
- Only enable one of GEN6_WM_8_DISPATCH_ENABLE or GEN6_WM_16_DISPATCH_ENABLE
  when GEN6_WM_MSDISPMODE_PERSAMPLE is enabled.
  Refer SNB PRM Vol. 2, Part 1, Page 279 for details.

V2: - Add a comment explaining why only SIMD8 mode is enabled with
  MSDISPMODE_PERSAMPLE.
- Use shared function _mesa_get_min_invocations_per_fragment().
- Use brw_wm_prog_data variables: uses_pos_offset, uses_omask.

Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
---
 src/mesa/drivers/dri/i965/gen6_wm_state.c | 52 +--
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c 
b/src/mesa/drivers/dri/i965/gen6_wm_state.c
index e3395ce..25ecc11 100644
--- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
@@ -30,6 +30,7 @@
 #include brw_defines.h
 #include brw_util.h
 #include brw_wm.h
+#include program/program.h
 #include program/prog_parameter.h
 #include program/prog_statevars.h
 #include intel_batchbuffer.h
@@ -153,8 +154,20 @@ upload_wm_state(struct brw_context *brw)
dw5 |= (brw-max_wm_threads - 1)  GEN6_WM_MAX_THREADS_SHIFT;
 
/* CACHE_NEW_WM_PROG */
+
+   /* In case of non 1x (i.e 4x, 8x) multisampling with MSDISPMODE_PERSAMPLE,
+* only one of SIMD8 and SIMD16 should be enabled. So, we have two options
+* in above mentioned case:
+* 'SIMD8 only' dispatch:   allowed on gen6.
+* 'SIMD16 only' dispatch:  not allowed on gen6.
+*
+* So, we enable 'SIMD8 only' dispatch in above case.
+*/
dw5 |= GEN6_WM_8_DISPATCH_ENABLE;
-   if (brw-wm.prog_data-prog_offset_16)
+
+ if (brw-wm.prog_data-prog_offset_16 
+ !(_mesa_get_min_invocations_per_fragment(ctx,
+  brw-fragment_program)  1))
   dw5 |= GEN6_WM_16_DISPATCH_ENABLE;
 
/* CACHE_NEW_WM_PROG | _NEW_COLOR */
@@ -183,7 +196,8 @@ upload_wm_state(struct brw_context *brw)
 
/* _NEW_COLOR, _NEW_MULTISAMPLE */
if (fp-program.UsesKill || ctx-Color.AlphaEnabled ||
-   ctx-Multisample.SampleAlphaToCoverage)
+   ctx-Multisample.SampleAlphaToCoverage ||
+   brw-wm.prog_data-uses_omask)
   dw5 |= GEN6_WM_KILL_ENABLE;
 
if (brw_color_buffer_write_enabled(brw) ||
@@ -191,6 +205,16 @@ upload_wm_state(struct brw_context *brw)
   dw5 |= GEN6_WM_DISPATCH_ENABLE;
}
 
+   /* From the SNB PRM, volume 2 part 1, page 278:
+* This bit is inserted in the PS payload header and made available to
+* the DataPort (either via the message header or via header bypass) to
+* indicate that oMask data (one or two phases) is included in Render
+* Target Write messages. If present, the oMask data is used to mask off
+* samples.
+*/
+if(brw-wm.prog_data-uses_omask)
+  dw5 |= GEN6_WM_OMASK_TO_RENDER_TARGET;
+
/* CACHE_NEW_WM_PROG */
dw6 |= brw-wm.prog_data-num_varying_inputs 
   GEN6_WM_NUM_SF_OUTPUTS_SHIFT;
@@ -200,12 +224,34 @@ upload_wm_state(struct brw_context *brw)
  dw6 |= GEN6_WM_MSRAST_ON_PATTERN;
   else
  dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;
-  dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;
+
+  if (_mesa_get_min_invocations_per_fragment(ctx, brw-fragment_program)  
1)
+ dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;
+  else
+ dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;
} else {
   dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;
   dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;
}
 
+   /* _NEW_BUFFERS, _NEW_MULTISAMPLE */
+   /* From the SNB PRM, volume 2 part 1, page 281:
+* If the PS kernel does not need the Position XY Offsets
+* to compute a Position XY value, then this field should be
+* programmed to POSOFFSET_NONE.
+*
+* SW Recommendation: If the PS kernel needs the Position Offsets
+* to compute a Position XY value, this field should match Position
+* ZW Interpolation Mode to ensure a consistent position.xyzw
+* computation.
+* We only require XY sample offsets. So, this recommendation doesn't
+* look useful at the moment. We might need this in future.
+*/
+   if (brw-wm.prog_data-uses_pos_offset)
+  dw6 |= GEN6_WM_POSOFFSET_SAMPLE;
+   else
+  dw6 |= GEN6_WM_POSOFFSET_NONE;
+
BEGIN_BATCH(9);
OUT_BATCH(_3DSTATE_WM  16 | (9 - 2));
OUT_BATCH(brw-wm.base.prog_offset);
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH V2 11/12] i965/gen7: Enable the features required for GL_ARB_sample_shading

2013-10-25 Thread Anuj Phogat
- Enable GEN7_WM_MSDISPMODE_PERSAMPLE, GEN7_WM_POSOFFSET_SAMPLE,
  GEN7_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
- Only enable one of GEN7_WM_8_DISPATCH_ENABLE or GEN7_WM_16_DISPATCH_ENABLE
  when GEN7_WM_MSDISPMODE_PERSAMPLE is enabled. Refer IVB PRM Vol. 2, Part 1,
  Page 288 for details.

V2: - Add a comment explaining why only SIMD8 mode is enabled with
  MSDISPMODE_PERSAMPLE.
- Use shared function _mesa_get_min_invocations_per_fragment().
- Use brw_wm_prog_data variables: uses_pos_offset, uses_omask.
Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
---
 src/mesa/drivers/dri/i965/gen7_wm_state.c | 53 +--
 1 file changed, 50 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c 
b/src/mesa/drivers/dri/i965/gen7_wm_state.c
index a2046c3..493b107 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
@@ -27,6 +27,7 @@
 #include brw_defines.h
 #include brw_util.h
 #include brw_wm.h
+#include program/program.h
 #include program/prog_parameter.h
 #include program/prog_statevars.h
 #include intel_batchbuffer.h
@@ -82,9 +83,13 @@ upload_wm_state(struct brw_context *brw)
   GEN7_WM_BARYCENTRIC_INTERPOLATION_MODE_SHIFT;
 
/* _NEW_COLOR, _NEW_MULTISAMPLE */
+   /* Enable if the pixel shader kernel generates and outputs oMask.
+*/
if (fp-program.UsesKill || ctx-Color.AlphaEnabled ||
-   ctx-Multisample.SampleAlphaToCoverage)
+   ctx-Multisample.SampleAlphaToCoverage ||
+   brw-wm.prog_data-uses_omask) {
   dw1 |= GEN7_WM_KILL_ENABLE;
+   }
 
/* _NEW_BUFFERS */
if (brw_color_buffer_write_enabled(brw) || writes_depth ||
@@ -97,7 +102,11 @@ upload_wm_state(struct brw_context *brw)
  dw1 |= GEN7_WM_MSRAST_ON_PATTERN;
   else
  dw1 |= GEN7_WM_MSRAST_OFF_PIXEL;
-  dw2 |= GEN7_WM_MSDISPMODE_PERPIXEL;
+
+  if (_mesa_get_min_invocations_per_fragment(ctx, brw-fragment_program)  
1)
+ dw2 |= GEN7_WM_MSDISPMODE_PERSAMPLE;
+  else
+ dw2 |= GEN7_WM_MSDISPMODE_PERPIXEL;
} else {
   dw1 |= GEN7_WM_MSRAST_OFF_PIXEL;
   dw2 |= GEN7_WM_MSDISPMODE_PERSAMPLE;
@@ -169,6 +178,32 @@ upload_ps_state(struct brw_context *brw)
if (brw-wm.prog_data-nr_params  0)
   dw4 |= GEN7_PS_PUSH_CONSTANT_ENABLE;
 
+   /* From the IVB PRM, volume 2 part 1, page 287:
+* This bit is inserted in the PS payload header and made available to
+* the DataPort (either via the message header or via header bypass) to
+* indicate that oMask data (one or two phases) is included in Render
+* Target Write messages. If present, the oMask data is used to mask off
+* samples.
+*/
+   if (brw-wm.prog_data-uses_omask)
+  dw4 |= GEN7_PS_OMASK_TO_RENDER_TARGET;
+
+   /* From the IVB PRM, volume 2 part 1, page 287:
+* If the PS kernel does not need the Position XY Offsets to
+* compute a Position Value, then this field should be programmed
+* to POSOFFSET_NONE.
+* SW Recommendation: If the PS kernel needs the Position Offsets
+* to compute a Position XY value, this field should match Position
+* ZW Interpolation Mode to ensure a consistent position.xyzw
+* computation.
+* We only require XY sample offsets. So, this recommendation doesn't
+* look useful at the moment. We might need this in future.
+*/
+   if (brw-wm.prog_data-uses_pos_offset)
+  dw4 |= GEN7_PS_POSOFFSET_SAMPLE;
+   else
+  dw4 |= GEN7_PS_POSOFFSET_NONE;
+
/* CACHE_NEW_WM_PROG | _NEW_COLOR
 *
 * The hardware wedges if you have this bit set but don't turn on any dual
@@ -184,8 +219,20 @@ upload_ps_state(struct brw_context *brw)
if (brw-wm.prog_data-num_varying_inputs != 0)
   dw4 |= GEN7_PS_ATTRIBUTE_ENABLE;
 
+   /* In case of non 1x (i.e 4x, 8x) multisampling with MSDISPMODE_PERSAMPLE,
+* only one of SIMD8 and SIMD16 should be enabled. So, we have two options
+* in that case:
+* 'SIMD8 only' dispatch:   allowed on gen7.
+* 'SIMD16 only' dispatch:  allowed on gen7 except when in PERSAMPLE mode
+*  with number of multisamples = 8.
+* TODO: Currently we enable 'SIMD8 only' dispatch in above mentioned case.
+*   Make changes to allow 'SIMD16 only' dispatch for multisamples  8.
+*/
dw4 |= GEN7_PS_8_DISPATCH_ENABLE;
-   if (brw-wm.prog_data-prog_offset_16)
+
+   if (brw-wm.prog_data-prog_offset_16 
+   !(_mesa_get_min_invocations_per_fragment(ctx,
+brw-fragment_program )  1))
   dw4 |= GEN7_PS_16_DISPATCH_ENABLE;
 
dw5 |= (brw-wm.prog_data-first_curbe_grf 
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 12/12] i965: Enable ARB_sample_shading on intel hardware = gen6

2013-10-25 Thread Anuj Phogat
Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
Reviewed-by: Paul Berry stereotype...@gmail.com
Reviewed-by: Ken Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 803d090..88201bd 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -150,6 +150,7 @@ intelInitExtensions(struct gl_context *ctx)
   ctx-Extensions.OES_depth_texture_cube_map = true;
   ctx-Extensions.ARB_shading_language_packing = true;
   ctx-Extensions.ARB_texture_multisample = true;
+  ctx-Extensions.ARB_sample_shading = true;
 
   /* Test if the kernel has the ioctl. */
   if (drm_intel_reg_read(brw-bufmgr, TIMESTAMP, dummy) == 0)
-- 
1.8.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/9] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.

2013-10-25 Thread Kenneth Graunke
On 10/25/2013 03:26 PM, Marek Olšák wrote:
[snip]
 At least for Ivybridge, I think I want this software path 100% of the
 time.  We may want to remove the stall on Haswell as a later optimization.
 
 I'd like to have a dedicated flag for this fallback like we have
 Const.PrimitiveRestartInSoftware, in case we need to implement the
 query for something else.

Sure, that seems reasonable.  I'll send out a proposed patch and CC you.

 It sounds like for Gallium, you already have a decent GPU-only solution.
  I tried to follow that code to understand how it works, and got lost
 after jumping through around 5 files...which is probably just my poor
 understanding of the Gallium architecture.
 
 Gallium doesn't do anything, the interface is pretty much the same as
 the vbo one.
 
 On the hardware side, there are 4 counters containing the number of
 bytes written to each TFB buffer. If TFB is started, the counters are
 set to 0. Everytime TFB is ended or paused, the counters are stored
 for each buffer in memory. When resuming TFB, the counters are simply
 loaded from memory.
 
 When we have to do DrawTransformFeedback, we copy the value of the
 counter from memory to a special draw register. Since the value is in
 bytes, we also have to set the TFB buffer stride to another special
 draw register. That's all. The hardware then calculates count =
 bytes/stride before drawing.

Oh, interesting!  I would have expected it to count in vertices, but
bytes - that's pretty clever.  If the units were the same on i965, I
would've definitely done it that way...it makes a lot of sense.

[snip]
 I hadn't thought about non-VBO vertex uploads.  What does Gallium do in
 that case?  Has it just been broken this whole time?
 
 Yes, it has, I completely forgot about it. :(
 

 I guess I figured drivers would either implement this hook, or do the
 tfb_vertcount approach, but not both.  Maybe that's a bad assumption.
 
 For vertex uploads and vertex fetch fallbacks (where we translate and
 align vertex buffers to what a gallium driver supports -
 util/u_vbuf.c), we can use a query like the one you want to add.
 However, gallium drivers should use the tfb_vertcount approach (AKA
 pipe_draw_info::count_from_stream_output) whenever they see it's not
 NULL. Since most Gallium hardware drivers will never see non-VBO
 vertex data or an unsupported vertex format, it's the only approach
 they have to implement.
 
 Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/4] GL_OES_get_program_binary extension

2013-10-25 Thread Matt Turner
On Thu, Oct 24, 2013 at 1:28 AM, Tapani Pälli tapani.pa...@intel.com wrote:
 Hello;

 These patches introduce GL_OES_get_program_binary extension support for Mesa.
 There are already stub functions for this extension, patches add the missing
 functionality part. This is based on the 'more automatic' shader cache work
 I've been implementing. I wanted to implement this first as this is a standard
 for applications to use and the automatic cache can be built separately based
 on these same enablers.

 As well as code review I would also appreciate any testing efforts with this.
 I've tested this with my own test apps but as you can imagine the coverage
 ain't that big. I'm also thinking of building piglit test cases to exercise
 cache shader but that is still on planning stage.

Are the implementations for serializing and unserializing cache
shaders mostly shared between the automatic shader cache and this
extension's implementation?

I worry that the only way to get sufficient testing of that is via the
automatic shader cache, and that only once it's stable can this
extension proceed.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/3] i965/fs: Drop no-op shifts involving 0.

2013-10-25 Thread Matt Turner
On Fri, Oct 25, 2013 at 2:49 PM, Eric Anholt e...@anholt.net wrote:
 I noticed this in a shader in Unigine Heaven that was spilling.  While it
 doesn't really reduce register pressure, it shaves a few instructions
 anyway (7955 - 7882).

 v2: Fix turning 0  x into x instead of 0 (caught by Erik
 Faye-Lund).
 ---
  src/glsl/opt_algebraic.cpp | 10 ++
  1 file changed, 10 insertions(+)

 diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
 index 2e33dfe..a07e153 100644
 --- a/src/glsl/opt_algebraic.cpp
 +++ b/src/glsl/opt_algebraic.cpp
 @@ -346,6 +346,16 @@ ir_algebraic_visitor::handle_expression(ir_expression 
 *ir)
}
break;

 +   case ir_binop_rshift:
 +   case ir_binop_lshift:
 +  /* 0  x == 0 */
 +  if (is_vec_zero(op_const[0]))
 + return ir-operands[0];

Any value to writing ir_constant::zero(ir, ir-type) here instead?

Either way, this series is

Reviewed-by: Matt Turner matts...@gmail.com

... for whatever that's worth these days. :)

I do think these clean ups make the code a lot clearer. Nice.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/3] glsl: Move common code out of opt_algebraic's handle_expression().

2013-10-25 Thread Kenneth Graunke
On 10/25/2013 02:49 PM, Eric Anholt wrote:
 Matt and I had each screwed up these common required patterns recently, in
 ways that wouldn't have been noticed for a long time if not for code
 review.  Just enforce it in the caller so that we don't rely on code
 review catching these bugs.
 ---
  src/glsl/opt_algebraic.cpp | 117 
 +++--
  1 file changed, 39 insertions(+), 78 deletions(-)

Yes, thank you!  These first two patches are great.  Much less likely to
botch things after the first patch, and I've been meaning to convert a
bunch of our code to IR builder, but never got to it.  Thanks for
beating me to it :)

All three are:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 1/8] mesa: Separate transform feedback object initialization from allocation.

2013-10-25 Thread Kenneth Graunke
Both Gallium and i965 subclass gl_transform_feedback_object, which
requires implementing a custom NewTransformFeedback hook.  Creating a
helper function to initialize the fields avoids code duplication and
divergence.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Cc: Eric Anholt e...@anholt.net
Cc: Marek Olšák mar...@gmail.com
---
 src/mesa/main/transformfeedback.c | 20 +++-
 src/mesa/main/transformfeedback.h |  3 +++
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/transformfeedback.c 
b/src/mesa/main/transformfeedback.c
index bc9b52a..76d213b 100644
--- a/src/mesa/main/transformfeedback.c
+++ b/src/mesa/main/transformfeedback.c
@@ -205,17 +205,27 @@ _mesa_free_transform_feedback(struct gl_context *ctx)
 }
 
 
+/** Initialize the fields of a gl_transform_feedback_object. */
+void
+_mesa_init_transform_feedback_object(struct gl_transform_feedback_object *obj,
+ GLuint name)
+{
+   if (!obj)
+  return;
+
+   obj-Name = name;
+   obj-RefCount = 1;
+   obj-EverBound = GL_FALSE;
+}
+
+
 /** Default fallback for ctx-Driver.NewTransformFeedback() */
 static struct gl_transform_feedback_object *
 new_transform_feedback(struct gl_context *ctx, GLuint name)
 {
struct gl_transform_feedback_object *obj;
obj = CALLOC_STRUCT(gl_transform_feedback_object);
-   if (obj) {
-  obj-Name = name;
-  obj-RefCount = 1;
-  obj-EverBound = GL_FALSE;
-   }
+   _mesa_init_transform_feedback_object(obj, name);
return obj;
 }
 
diff --git a/src/mesa/main/transformfeedback.h 
b/src/mesa/main/transformfeedback.h
index 0ffaab5..7aecd66 100644
--- a/src/mesa/main/transformfeedback.h
+++ b/src/mesa/main/transformfeedback.h
@@ -91,6 +91,9 @@ _mesa_GetTransformFeedbackVarying(GLuint program, GLuint 
index,
 
 
 /*** GL_ARB_transform_feedback2 ***/
+extern void
+_mesa_init_transform_feedback_object(struct gl_transform_feedback_object *obj,
+ GLuint name);
 
 struct gl_transform_feedback_object *
 _mesa_lookup_transform_feedback_object(struct gl_context *ctx, GLuint name);
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 2/8] st/mesa: Use the new _mesa_init_transform_feedback_object() helper.

2013-10-25 Thread Kenneth Graunke
This picks up a missing obj-EverBound = GL_FALSE line, and will catch
any new fields that get added in the future.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Cc: Marek Olšák mar...@gmail.com
---
 src/mesa/state_tracker/st_cb_xformfb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Compile tested with --with-gallium-drivers=swrast.  Not tested beyond that.

diff --git a/src/mesa/state_tracker/st_cb_xformfb.c 
b/src/mesa/state_tracker/st_cb_xformfb.c
index e1a7a88..a1c643b 100644
--- a/src/mesa/state_tracker/st_cb_xformfb.c
+++ b/src/mesa/state_tracker/st_cb_xformfb.c
@@ -74,8 +74,8 @@ st_new_transform_feedback(struct gl_context *ctx, GLuint name)
if (!obj)
   return NULL;
 
-   obj-base.Name = name;
-   obj-base.RefCount = 1;
+   _mesa_init_transform_feedback_object(obj, name);
+
return obj-base;
 }
 
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 3/8] i965: Create a new brw_transform_feedback_object subclass.

2013-10-25 Thread Kenneth Graunke
This adds the basic driver hooks to allocate/free the brw variant.
It doesn't contain any additional information yet, but it will soon.

v2: Use the new _mesa_init_transform_feedback_object helper function
(requested by Eric and Ian).

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_context.c |  2 ++
 src/mesa/drivers/dri/i965/brw_context.h |  9 +
 src/mesa/drivers/dri/i965/gen6_sol.c| 29 +
 3 files changed, 40 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 8420c65..2df12ed 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -250,6 +250,8 @@ brw_init_driver_functions(struct brw_context *brw,
 
functions-QuerySamplesForFormat = brw_query_samples_for_format;
 
+   functions-NewTransformFeedback = brw_new_transform_feedback;
+   functions-DeleteTransformFeedback = brw_delete_transform_feedback;
if (brw-gen = 7) {
   functions-BeginTransformFeedback = gen7_begin_transform_feedback;
   functions-EndTransformFeedback = gen7_end_transform_feedback;
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 4bff63e..54ad929 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -880,6 +880,10 @@ struct intel_batchbuffer {
} saved;
 };
 
+struct brw_transform_feedback_object {
+   struct gl_transform_feedback_object base;
+};
+
 /**
  * Data shared between each programmable stage in the pipeline (vs, gs, and
  * wm).
@@ -1556,6 +1560,11 @@ extern int intel_translate_logic_op(GLenum opcode);
 void intel_init_syncobj_functions(struct dd_function_table *functions);
 
 /* gen6_sol.c */
+struct gl_transform_feedback_object *
+brw_new_transform_feedback(struct gl_context *ctx, GLuint name);
+void
+brw_delete_transform_feedback(struct gl_context *ctx,
+  struct gl_transform_feedback_object *obj);
 void
 brw_begin_transform_feedback(struct gl_context *ctx, GLenum mode,
 struct gl_transform_feedback_object *obj);
diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c 
b/src/mesa/drivers/dri/i965/gen6_sol.c
index 21da444..0d84cee 100644
--- a/src/mesa/drivers/dri/i965/gen6_sol.c
+++ b/src/mesa/drivers/dri/i965/gen6_sol.c
@@ -26,6 +26,7 @@
  * Code to initialize the binding table entries used by transform feedback.
  */
 
+#include main/bufferobj.h
 #include main/macros.h
 #include brw_context.h
 #include intel_batchbuffer.h
@@ -132,6 +133,34 @@ const struct brw_tracked_state gen6_gs_binding_table = {
.emit = brw_gs_upload_binding_table,
 };
 
+struct gl_transform_feedback_object *
+brw_new_transform_feedback(struct gl_context *ctx, GLuint name)
+{
+   struct brw_context *brw = brw_context(ctx);
+   struct brw_transform_feedback_object *brw_obj =
+  CALLOC_STRUCT(brw_transform_feedback_object);
+   if (!brw_obj)
+  return NULL;
+
+   _mesa_init_transform_feedback_object(brw_obj-base, name);
+
+   return brw_obj-base;
+}
+
+void
+brw_delete_transform_feedback(struct gl_context *ctx,
+  struct gl_transform_feedback_object *obj)
+{
+   struct brw_transform_feedback_object *brw_obj =
+  (struct brw_transform_feedback_object *) obj;
+
+   for (unsigned i = 0; i  Elements(obj-Buffers); i++) {
+  _mesa_reference_buffer_object(ctx, obj-Buffers[i], NULL);
+   }
+
+   free(brw_obj);
+}
+
 void
 brw_begin_transform_feedback(struct gl_context *ctx, GLenum mode,
 struct gl_transform_feedback_object *obj)
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 4/8] i965: Implement Pause/ResumeTransformfeedback driver hooks on Gen7+.

2013-10-25 Thread Kenneth Graunke
The ARB_transform_feedback2 extension introduces the ability to pause
and resume transform feedback sessions.  Although only one can be active
at a time, it's possible to switch between multiple transform feedback
objects while paused.

In order to facilitate this, we need to save/restore the SO_WRITE_OFFSET
registers so that after resuming, the GPU continues writing where it
left off.

This functionality also exists in ES 3.0, but somehow we completely
forgot to implement it.

v2: Reduce alignment from 4096 to 64 (it seemed excessive).

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Reviewed-by: Ian Romanick ian.d.roman...@intel.com
Reviewed-by: Eric Anholt e...@anholt.net
---
 src/mesa/drivers/dri/i965/brw_context.c|  2 ++
 src/mesa/drivers/dri/i965/brw_context.h|  9 +++
 src/mesa/drivers/dri/i965/gen6_sol.c   |  5 
 src/mesa/drivers/dri/i965/gen7_sol_state.c | 40 ++
 4 files changed, 56 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 2df12ed..90d9be4 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -255,6 +255,8 @@ brw_init_driver_functions(struct brw_context *brw,
if (brw-gen = 7) {
   functions-BeginTransformFeedback = gen7_begin_transform_feedback;
   functions-EndTransformFeedback = gen7_end_transform_feedback;
+  functions-PauseTransformFeedback = gen7_pause_transform_feedback;
+  functions-ResumeTransformFeedback = gen7_resume_transform_feedback;
} else {
   functions-BeginTransformFeedback = brw_begin_transform_feedback;
   functions-EndTransformFeedback = brw_end_transform_feedback;
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 54ad929..48aa4c1 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -882,6 +882,9 @@ struct intel_batchbuffer {
 
 struct brw_transform_feedback_object {
struct gl_transform_feedback_object base;
+
+   /** A buffer to hold SO_WRITE_OFFSET(n) values while paused. */
+   drm_intel_bo *offset_bo;
 };
 
 /**
@@ -1579,6 +1582,12 @@ gen7_begin_transform_feedback(struct gl_context *ctx, 
GLenum mode,
 void
 gen7_end_transform_feedback(struct gl_context *ctx,
struct gl_transform_feedback_object *obj);
+void
+gen7_pause_transform_feedback(struct gl_context *ctx,
+  struct gl_transform_feedback_object *obj);
+void
+gen7_resume_transform_feedback(struct gl_context *ctx,
+   struct gl_transform_feedback_object *obj);
 
 /* brw_blorp_blit.cpp */
 GLbitfield
diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c 
b/src/mesa/drivers/dri/i965/gen6_sol.c
index 0d84cee..2e6c86a 100644
--- a/src/mesa/drivers/dri/i965/gen6_sol.c
+++ b/src/mesa/drivers/dri/i965/gen6_sol.c
@@ -144,6 +144,9 @@ brw_new_transform_feedback(struct gl_context *ctx, GLuint 
name)
 
_mesa_init_transform_feedback_object(brw_obj-base, name);
 
+   brw_obj-offset_bo =
+  drm_intel_bo_alloc(brw-bufmgr, transform feedback offsets, 16, 64);
+
return brw_obj-base;
 }
 
@@ -158,6 +161,8 @@ brw_delete_transform_feedback(struct gl_context *ctx,
   _mesa_reference_buffer_object(ctx, obj-Buffers[i], NULL);
}
 
+   drm_intel_bo_unreference(brw_obj-offset_bo);
+
free(brw_obj);
 }
 
diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c 
b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index abfe0a0..27421da 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -273,3 +273,43 @@ gen7_end_transform_feedback(struct gl_context *ctx,
 
intel_batchbuffer_emit_mi_flush(brw);
 }
+
+void
+gen7_pause_transform_feedback(struct gl_context *ctx,
+  struct gl_transform_feedback_object *obj)
+{
+   struct brw_context *brw = brw_context(ctx);
+   struct brw_transform_feedback_object *brw_obj =
+  (struct brw_transform_feedback_object *) obj;
+
+   /* Save the SOL buffer offset register values. */
+   for (int i = 0; i  4; i++) {
+  BEGIN_BATCH(3);
+  OUT_BATCH(MI_STORE_REGISTER_MEM | (3 - 2));
+  OUT_BATCH(GEN7_SO_WRITE_OFFSET(i));
+  OUT_RELOC(brw_obj-offset_bo,
+I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER,
+i * sizeof(uint32_t));
+  ADVANCE_BATCH();
+   }
+}
+
+void
+gen7_resume_transform_feedback(struct gl_context *ctx,
+   struct gl_transform_feedback_object *obj)
+{
+   struct brw_context *brw = brw_context(ctx);
+   struct brw_transform_feedback_object *brw_obj =
+  (struct brw_transform_feedback_object *) obj;
+
+   /* Reload the SOL buffer offset registers. */
+   for (int i = 0; i  4; i++) {
+  BEGIN_BATCH(3);
+  OUT_BATCH(GEN7_MI_LOAD_REGISTER_MEM | (3 - 2));
+  OUT_BATCH(GEN7_SO_WRITE_OFFSET(i));
+  OUT_RELOC(brw_obj-offset_bo,
+   

[Mesa-dev] [PATCH v2 5/8] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.

2013-10-25 Thread Kenneth Graunke
DrawTransformFeedback() needs to obtain the number of vertices written
to a particular stream during the last Begin/EndTransformFeedback block.
The new driver hook returns exactly that information.

Gallium drivers already implement this by passing the transform feedback
object to the drawing function, counting the number of vertices written
on the GPU, and using draw indirect.  This is efficient, but doesn't
always work:

If vertex data comes from user arrays, then the VBO module needs to
know how many vertices to upload, so we need to synchronously count.
Gallium drivers are currently broken in this case.

It also doesn't work if primitive restart is done in software.  For
normal drawing, vbo_draw_arrays() performs software primitive restart,
splitting the draw call in two.  vbo_draw_transform_feedback() currently
doesn't because it has no idea how many vertices need to be drawn.

The new driver hook gives it that information, allowing us to reuse
the existing vbo_draw_arrays() code to do everything right.

On Intel hardware (at least Ivybridge), using the draw indirect approach
is difficult since the hardware counts primitives, rather than vertices,
which requires doing some simple math.  So we always use this hook.

Gallium drivers will likely want to use this hook in some cases, but
want to use the existing draw indirect approach where possible.  Hence,
I've added a flag to allow drivers to opt-in to this call.

v2: Make it possible to implement this hook but only use this path
when necessary (suggested by Marek).

Cc: Marek Olšák mar...@gmail.com
Cc: Eric Anholt e...@anholt.net
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_context.c |  2 ++
 src/mesa/main/dd.h  |  8 
 src/mesa/main/mtypes.h  |  6 ++
 src/mesa/vbo/vbo_exec_array.c   | 10 ++
 4 files changed, 26 insertions(+)

Marek,

Does this look like what you wanted?  I feel a bit silly adding all of this
seeing as the later conditions are totally untested - i965 sets the always
use this hook flag, so it short-circuits them, and Gallium drivers don't
yet implement the hook, so they don't hit it either. :)

But I think this is probably roughly what you're going to want...

Eric, does this look reasonable?

Thanks for everything!

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 90d9be4..623273c 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -329,6 +329,8 @@ brw_initialize_context_constants(struct brw_context *brw)
ctx-Const.MaxTransformFeedbackSeparateComponents =
   BRW_MAX_SOL_BINDINGS / BRW_MAX_SOL_BUFFERS;
 
+   ctx-Const.AlwaysUseGetTransformFeedbackVertexCount = true;
+
if (brw-gen == 6) {
   ctx-Const.MaxSamples = 4;
   ctx-Const.MaxColorTextureSamples = 4;
diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index 29469ce..11d5a9e 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -843,6 +843,14 @@ struct dd_function_table {
struct gl_transform_feedback_object *obj);
 
/**
+* Return the number of vertices written to a stream during the last
+* Begin/EndTransformFeedback block.
+*/
+   GLsizei (*GetTransformFeedbackVertexCount)(struct gl_context *ctx,
+  struct 
gl_transform_feedback_object *obj,
+  GLuint stream);
+
+   /**
 * \name GL_NV_texture_barrier interface
 */
void (*TextureBarrier)(struct gl_context *ctx);
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 97ed1bd..f5e1f01 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3131,6 +3131,12 @@ struct gl_constants
 */
GLboolean PrimitiveRestartInSoftware;
 
+   /**
+* Always use the GetTransformFeedbackVertexCount() driver hook, rather
+* than passing the transform feedback object to the drawing function.
+*/
+   GLboolean AlwaysUseGetTransformFeedbackVertexCount;
+
/** GL_ARB_map_buffer_alignment */
GLuint MinMapBufferAlignment;
 
diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
index 1670409..f25a9de 100644
--- a/src/mesa/vbo/vbo_exec_array.c
+++ b/src/mesa/vbo/vbo_exec_array.c
@@ -1464,6 +1464,16 @@ vbo_draw_transform_feedback(struct gl_context *ctx, 
GLenum mode,
   return;
}
 
+   if (ctx-Driver.GetTransformFeedbackVertexCount 
+   (ctx-Const.AlwaysUseGetTransformFeedbackVertexCount ||
+(ctx-Const.PrimitiveRestartInSoftware 
+ ctx-Array._PrimitiveRestart) ||
+!vbo_all_varyings_in_vbos(exec-array.inputs))) {
+  GLsizei n = ctx-Driver.GetTransformFeedbackVertexCount(ctx, obj, 
stream);
+  vbo_draw_arrays(ctx, mode, 0, n, numInstances, 0);
+  return;
+   }
+
vbo_bind_arrays(ctx);
 
/* init most fields to zero */
-- 
1.8.3.2


[Mesa-dev] [PATCH v2 6/8] i965: Mark brw_draw_prims tfb_vertcount parameter as unused.

2013-10-25 Thread Kenneth Graunke
Renaming it makes it obvious that it isn't used, and the assertion
verifies that the VBO module never passes us such an object.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Reviewed-by: Ian Romanick ian.d.roman...@intel.com
Reviewed-by: Eric Anholt e...@anholt.net
---
 src/mesa/drivers/dri/i965/brw_draw.c | 4 +++-
 src/mesa/drivers/dri/i965/brw_draw.h | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c 
b/src/mesa/drivers/dri/i965/brw_draw.c
index 0acd089..7b33b76 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -463,11 +463,13 @@ void brw_draw_prims( struct gl_context *ctx,
 GLboolean index_bounds_valid,
 GLuint min_index,
 GLuint max_index,
-struct gl_transform_feedback_object *tfb_vertcount )
+struct gl_transform_feedback_object *unused_tfb_object)
 {
struct brw_context *brw = brw_context(ctx);
const struct gl_client_array **arrays = ctx-Array._DrawArrays;
 
+   assert(unused_tfb_object == NULL);
+
if (!_mesa_check_conditional_render(ctx))
   return;
 
diff --git a/src/mesa/drivers/dri/i965/brw_draw.h 
b/src/mesa/drivers/dri/i965/brw_draw.h
index aac375f..fb96813 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.h
+++ b/src/mesa/drivers/dri/i965/brw_draw.h
@@ -41,7 +41,7 @@ void brw_draw_prims( struct gl_context *ctx,
 GLboolean index_bounds_valid,
 GLuint min_index,
 GLuint max_index,
-struct gl_transform_feedback_object *tfb_vertcount );
+struct gl_transform_feedback_object *unused_tfb_object);
 
 void brw_draw_init( struct brw_context *brw );
 void brw_draw_destroy( struct brw_context *brw );
-- 
1.8.3.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 7/8] i965: Implement glDrawTransformFeedback().

2013-10-25 Thread Kenneth Graunke
Implementing the GetTransformFeedbackVertexCount() driver hook allows
the VBO module to call us with the right number of vertices.

The hardware doesn't directly count the number of vertices written by
SOL, so we instead use the SO_NUM_PRIMS_WRITTEN(n) counters and multiply
by the number of vertices per primitive.

Unfortunately, counting the number of primitives generated is tricky:
a program might pause a transform feedback operation, start a second one
with a different object, then switch back and resume.  Both transform
feedback operations share the SO_NUM_PRIMS_WRITTEN counters.

To work around this, we save the counter values at Begin, Pause, Resume,
and End.  This bookends each section where transform feedback is
active for the current object.  Adding up differences of pairs gives
us the number of primitives generated.  (This is similar to what we
do for occlusion queries on platforms without hardware contexts.)

v2: Fix missing parenthesis in assertion (caught by Eric Anholt).

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Reviewed-by: Ian Romanick ian.d.roman...@intel.com
Reviewed-by: Eric Anholt e...@anholt.net
---
 src/mesa/drivers/dri/i965/brw_context.c|   2 +
 src/mesa/drivers/dri/i965/brw_context.h|  26 
 src/mesa/drivers/dri/i965/gen6_sol.c   |   1 +
 src/mesa/drivers/dri/i965/gen7_sol_state.c | 190 -
 4 files changed, 218 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index 623273c..f4e04b6 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -252,6 +252,8 @@ brw_init_driver_functions(struct brw_context *brw,
 
functions-NewTransformFeedback = brw_new_transform_feedback;
functions-DeleteTransformFeedback = brw_delete_transform_feedback;
+   functions-GetTransformFeedbackVertexCount =
+  brw_get_transform_feedback_vertex_count;
if (brw-gen = 7) {
   functions-BeginTransformFeedback = gen7_begin_transform_feedback;
   functions-EndTransformFeedback = gen7_end_transform_feedback;
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 48aa4c1..c72bad1 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -880,11 +880,33 @@ struct intel_batchbuffer {
} saved;
 };
 
+#define BRW_MAX_XFB_STREAMS 4
+
 struct brw_transform_feedback_object {
struct gl_transform_feedback_object base;
 
/** A buffer to hold SO_WRITE_OFFSET(n) values while paused. */
drm_intel_bo *offset_bo;
+
+   /** The most recent primitive mode (GL_TRIANGLES/GL_POINTS/GL_LINES). */
+   GLenum primitive_mode;
+
+   /**
+* Count of primitives generated during this transform feedback operation.
+*  @{
+*/
+   uint64_t prims_generated[BRW_MAX_XFB_STREAMS];
+   drm_intel_bo *prim_count_bo;
+   unsigned prim_count_buffer_index; /** in number of uint64_t units */
+   /** @} */
+
+   /**
+* Number of vertices written between last Begin/EndTransformFeedback().
+*
+* Used to implement DrawTransformFeedback().
+*/
+   uint64_t vertices_written[BRW_MAX_XFB_STREAMS];
+   bool vertices_written_valid;
 };
 
 /**
@@ -1574,6 +1596,10 @@ brw_begin_transform_feedback(struct gl_context *ctx, 
GLenum mode,
 void
 brw_end_transform_feedback(struct gl_context *ctx,
struct gl_transform_feedback_object *obj);
+GLsizei
+brw_get_transform_feedback_vertex_count(struct gl_context *ctx,
+struct gl_transform_feedback_object 
*obj,
+GLuint stream);
 
 /* gen7_sol_state.c */
 void
diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c 
b/src/mesa/drivers/dri/i965/gen6_sol.c
index 2e6c86a..af5bed9 100644
--- a/src/mesa/drivers/dri/i965/gen6_sol.c
+++ b/src/mesa/drivers/dri/i965/gen6_sol.c
@@ -162,6 +162,7 @@ brw_delete_transform_feedback(struct gl_context *ctx,
}
 
drm_intel_bo_unreference(brw_obj-offset_bo);
+   drm_intel_bo_unreference(brw_obj-prim_count_bo);
 
free(brw_obj);
 }
diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c 
b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index 27421da..7cac8fe 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -249,14 +249,179 @@ const struct brw_tracked_state gen7_sol_state = {
.emit = upload_sol_state,
 };
 
+/**
+ * Tally the number of primitives generated so far.
+ *
+ * The buffer contains a series of pairs:
+ * (start0, start1, start2, start3, end0, end1, end2, end3) ;
+ * (start0, start1, start2, start3, end0, end1, end2, end3) ;
+ *
+ * For each stream, we subtract the pair of values (end - start) to get the
+ * number of primitives generated during one section.  We accumulate these
+ * values, adding them up to get the total number of primitives generated.
+ */
+static void