Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-24 Thread Zack Rusin

On Wed, Jan 24, 2024 at 6:57 PM Marek Olšák wrote:
>
> Gallium looks like it was just a copy of DX10, and likely many things were 
> known from DX10 in advance before anything started. Vulkanium doesn't have 
> anything to draw inspiration from. It's a completely unexplored idea.

I'm not sure if I follow this. GNU/Linux didn't have a unified driver
interface to implement GL, but Windows did have a standardized
interface to implement D3D10 which we drew inspiration from. The same
is still true if you s/GL/Vulkan/ and s/D3D10/D3D12/. It's just that
more features of modern APIs are tied to kernel features (i.e. WDDM
versions) than in the past, but with gpuvm, drm scheduler and syncobj
that's also going to be Vulkan's path.
Now, you might say that this time we're not going to use any lessons
from Windows and this interface will be completely unlike what Windows
does for D3D12, which is fine but I still wouldn't call the idea of
standardizing an interface for a low level graphics API a completely
unexplored idea given that it works on Windows on an API that's a lot
more like Vulkan than D3D10 was like GL.

z

Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-24 Thread Zack Rusin

On Wed, Jan 24, 2024 at 10:27 AM Faith Ekstrand wrote:
>
> Jose,
>
> Thanks for your thoughts!
>
> On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca wrote:
> >
> > I don't know much about the current Vulkan driver internals to have or 
> > provide an informed opinion on the path forward, but I'd like to share my 
> > backwards looking perspective.
> >
> > Looking back, Gallium was two things effectively:
> > (1) an abstraction layer, that's watertight (as in upper layers shouldn't 
> > reach through to lower layers)
> > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
> >
> > (1) was of course important -- and the discipline it imposed is what 
> > enabled great simplifications -- but it also became a straitjacket, 
> > as GPUs didn't stand still, and sooner or later the 
> > see-every-hardware-as-the-same lenses stop reflecting reality.
> >
> > If I had to pick one, I'd say that (2) is far more useful and practical.
> > Take components like gallium's draw and other util modules. A driver can 
> > choose to use them or not.  One could fork them within Mesa source tree, 
> > and only the drivers that opt into the fork would need to be 
> > tested/adapted/etc
> >
> > On the flip side, Vulkan API is already a pretty low level HW abstraction.  
> > It's also very flexible and extensible, so it's hard to provide a 
> > watertight abstraction underneath it without either taking the lowest 
> > common denominator, or having lots of optional bits of functionality 
> > governed by a myriad of caps like you alluded to.
>
> There is a third thing that isn't really recognized in your description:
>
> (3) A common "language" to talk about GPUs and data structures that
> represent that language
>
> This is precisely what the Vulkan runtime today doesn't have. Classic
> meta sucked because we were trying to implement GL in GL. u_blitter,
> on the other hand, is pretty fantastic because Gallium provides a much
> more sane interface to write those common components in terms of.
>
> So far, we've been trying to build those components in terms of the
> Vulkan API itself with calls jumping back into the dispatch table to
> try and get inside the driver. This is working but it's getting more
> and more fragile the more tools we add to that box. A lot of what I
> want to do with gallium2 or whatever we're calling it is to fix our
> layering problems so that calls go in one direction and we can
> untangle the jumble. I'm still not sure what I want that to look like
> but I think I want it to look a lot like Vulkan, just with a handier
> interface.

Yes, that makes sense. When we were writing the initial components for
gallium (draw and cso) I really liked the general concept and thought
about trying to reuse them in the old, non-gallium Mesa drivers but
the obstacle was that there was no common interface to lay them on.
Using GL to implement GL was silly and using Vulkan to implement
Vulkan is not much better.

Having said that my general thoughts on GPU abstractions largely match
what Jose has said. To me it's a question of whether a clean
abstraction:
- on top of which you can build an entire GPU driver toolkit (i.e. all
the components and helpers)
- that makes it trivial to figure out what needs to be done to write a
new driver and makes bootstrapping a new driver a lot simpler
- that makes it easier to reason about cross hardware concepts (it's a
lot easier to understand the entirety of the ecosystem if every driver
is not doing something unique to implement similar functionality)
is worth more than almost exponentially increasing the difficulty of:
- advancing the ecosystem (i.e. it might be easier to understand but
it's way harder to create clean abstractions across such different
hardware).
- driver maintenance (i.e. there will be a constant stream of
regressions hitting your driver as a result of other people working on
their drivers)
- general development (i.e. bug fixes/new features being held back
because they break some other driver)

Some of those can certainly be tilted one way or the other, e.g. the
driver maintenance can be somewhat eased by requiring that every
driver working on top of the new abstraction has to have a stable
Mesa-CI setup (be it lava or ci-tron, or whatever) but all of those
things need to be reasoned about. In my experience abstractions never
have uniform support because some people will value cons of them more
than they value the pros. So the entire process requires some very
steadfast individuals to keep going despite hearing that the effort is
dumb, at least until the benefits of the new approach are impossible
to deny. So you know... "how much do you believe in this approach
because some days will suck and you can't give up" ;) is probably the
question.

z

Re: [Mesa-dev] [PATCH] draw: fix clipvertex trouble if position comes from gs

2014-08-06 Thread Zack Rusin

On Aug 5, 2014, at 9:40 PM, srol...@vmware.com wrote:
 From: Roland Scheidegger srol...@vmware.com
 
 If the vertex shader has no position but the gs has, the clipvertex output
 was -1 (because it's the same as vs position in this case if there's no
 explicit clipvertex output). This caused crashes (or assertion failures) in
 clipping since in the end position (which came from gs) was different from
 cv (-1) and we then tried to use the bogus cv input.
 Rather than just test for -1 cv value in clipping, make it explicitly return
 the position output of the gs instead which seems cleaner (since we really
 don't want to use the clipvertex value from the vs (it could be a valid value
 in the (unsupported) case of vs writing clipvertex but still using a gs).
 This fixes piglit shader_runner clip-distance-out-values.shader_test.

Great. Well done! Both of those look good. 

Reviewed-by: Zack Rusin za...@vmware.com

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 4/8] draw/gs: fix segfault in glsl-1.50-gs-mismatch-prim-type triangles_adjacency

2014-06-10 Thread Zack Rusin

That looks wrong.  The total number of verts per buffer is the maximum number 
of verts that can be output per invocation (primitive_boundary) times number of 
invocations of geometry shader (num_in_primitives).

It's not maximum number of verts that can be output per invocation 
(primitive_boundary) times maximum number of primitives output by geometry 
shader (max_out_prims).

z

- Original Message -
 From: Dave Airlie airl...@redhat.com
 
 This crashes on softpipe due to a lack of output memory allocated,
 
 it appears we allocate memory for enough primitives, but not vertices
 so convert to number of vertices.
 
 Signed-off-by: Dave Airlie airl...@redhat.com
 ---
  src/gallium/auxiliary/draw/draw_gs.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/src/gallium/auxiliary/draw/draw_gs.c
 b/src/gallium/auxiliary/draw/draw_gs.c
 index fc4f697..0a9bf81 100644
 --- a/src/gallium/auxiliary/draw/draw_gs.c
 +++ b/src/gallium/auxiliary/draw/draw_gs.c
 @@ -555,7 +555,7 @@ int draw_geometry_shader_run(struct draw_geometry_shader
 *shader,
 /* we allocate exactly one extra vertex per primitive to allow the GS to
 emit
  * overflown vertices into some area where they won't harm anyone */
 unsigned total_verts_per_buffer = shader->primitive_boundary *
 -  num_in_primitives;
 +  max_out_prims * u_vertices_per_prim(shader->output_primitive);
  
 //Assume at least one primitive
 max_out_prims = MAX2(max_out_prims, 1);
 --
 1.9.3
 

Re: [Mesa-dev] [PATCH 4/8] draw/gs: fix segfault in glsl-1.50-gs-mismatch-prim-type triangles_adjacency

2014-06-10 Thread Zack Rusin

I think the code is already correct and something else goes wrong. The tgsi 
geometry shader code was never done properly so it's more than likely that 
tgsi_exec is doing something wonky.

Geometry shaders specify the maximum number of vertices that they can emit. 
That's what draw_geometry_shader::max_output_vertices is. If a geometry shader 
emits more than that, the verts will be ignored. So our primitive_boundary is 
max_output_vertices + 1  because we want to make sure that in SoA we have a 
scratch space where we can keep writing the overflowed vertices. 

So the worst case scenario for our output buffer is: (max_output_vertices + 1) 
* geometry shader invocations. That's what we have there now and that's 
correct. I don't remember what tgsi_exec does, I think I never even implemented 
proper SoA for gs in tgsi_exec, so if there's anything wrong I'd look for the 
bug there.
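
A minimal sketch of the sizing rule described above. It is only an illustration, not the actual draw_gs.c code: `gs_output_buffer_verts` is a hypothetical helper, and the overflow assert is an added assumption. Only `max_output_vertices`, `primitive_boundary` and the invocation count come from the discussion.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not part of Mesa: worst-case number of vertex
 * slots needed in the GS output buffer.
 *
 * Each invocation may emit at most max_output_vertices vertices. One
 * extra slot per invocation serves as scratch space for overflowed
 * vertices, so primitive_boundary = max_output_vertices + 1. */
static unsigned
gs_output_buffer_verts(unsigned max_output_vertices,
                       unsigned num_invocations)
{
   unsigned primitive_boundary = max_output_vertices + 1;

   /* Guard against unsigned overflow in the multiplication below. */
   assert(num_invocations == 0 ||
          primitive_boundary <= UINT_MAX / num_invocations);

   /* The declared output primitive type does not enter the formula. */
   return primitive_boundary * num_invocations;
}
```

For example, a shader declaring 3 max output vertices that runs for 2 input primitives needs (3 + 1) * 2 = 8 vertex slots under this rule.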

z

- Original Message -
 On 11 June 2014 00:02, Zack Rusin za...@vmware.com wrote:
  That looks wrong.  The total number of verts per buffer is the maximum
  number of verts that can be output per invocation (primitive_boundary)
  times number of invocations of geometry shader (num_in_primitives).
 
  It's not maximum number of verts that can be output per invocation
  (primitive_boundary) times maximum number of primitives output by geometry
  shader (max_out_prims).
 
 
 Okay so just adding * u_vertices_per_prim(shader->output_primitive);
 would suffice?
 
 Dave
 

Re: [Mesa-dev] [PATCH 4/8] draw/gs: fix segfault in glsl-1.50-gs-mismatch-prim-type triangles_adjacency

2014-06-10 Thread Zack Rusin

 I'll revisit it today and see if I can spot something else wrong, it
 fails for triangle adj because there are 6 vertices per primitive and
 we have only malloced space for 4.

It has to be something else because that's impossible, in fact it's 2x 
impossible ;)

1) It's illegal and impossible for geometry shader to emit adjacency 
primitives. Only points, lines and triangles can be emitted from gs.

2) The output primitive is irrelevant for the size of the buffer. If a geometry 
shader claims that the max output vertices is four, then it can, at most, emit 
4 points, 2 lines or 1 triangle (incomplete primitives are discarded from 
geometry shader so the extra 4th vertex will be discarded). If a geometry 
shader claims to max emit 4 vertices and you try to emit 100 points, you will 
still get only 4 points (96 will be counted as overflowed but they won't be 
emitted).
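
The clamping behaviour in point 2 can be sketched as below. This is a hedged illustration with hypothetical names (`gs_emit_result`, `gs_clamp_emission`); it assumes the shader ends the primitive after every `verts_per_prim` vertices, so no longer strips are formed.

```c
#include <assert.h>

/* Hypothetical bookkeeping for one GS invocation; names are illustrative. */
struct gs_emit_result {
   unsigned emitted;     /* vertices actually written to the output */
   unsigned overflowed;  /* vertices counted as overflow and dropped */
   unsigned primitives;  /* complete primitives formed from emitted verts */
};

/* verts_per_prim: 1 for points, 2 for lines, 3 for triangles.
 * Assumption: the primitive is cut after every verts_per_prim vertices. */
static struct gs_emit_result
gs_clamp_emission(unsigned requested, unsigned max_output_vertices,
                  unsigned verts_per_prim)
{
   struct gs_emit_result r;

   assert(verts_per_prim >= 1 && verts_per_prim <= 3);

   /* Never emit more than the shader declared as its maximum. */
   r.emitted = requested < max_output_vertices ? requested
                                               : max_output_vertices;
   r.overflowed = requested - r.emitted;

   /* Integer division drops a trailing incomplete primitive. */
   r.primitives = r.emitted / verts_per_prim;
   return r;
}
```

With a declared maximum of 4, requesting 100 points gives 4 emitted and 96 overflowed, and 4 vertices form 2 lines or 1 triangle, matching the numbers in the text.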

My advice would be to check what's in the output buffer with llvmpipe. If 
tgsi_exec doesn't match llvmpipe then there's a bug in tgsi_exec.

z

Re: [Mesa-dev] [PATCH] draw: avoid buffer overflows with bad geometry programs.

2014-06-10 Thread Zack Rusin

To be honest I still don't like it. While the tgsi_exec specific paths in 
draw_gs don't matter to me and can be as ugly as they need to be, they can't be 
polluting the draw_pt_emit code, in other words the primitive_lengths can't be 
bogus at that point - prim_info can't lie about the amount of data that it's 
holding.
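
One way to read the requirement above is as an invariant that the primitive lengths never describe more vertices than are actually stored. The following is a sketch only: `prim_lengths_info` is a simplified, hypothetical stand-in for draw's prim info structure, and the check is not taken from Mesa.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical stand-in for draw's prim info; only the
 * fields needed for the check are modelled. */
struct prim_lengths_info {
   unsigned count;                     /* vertices actually stored */
   unsigned primitive_count;           /* entries in primitive_lengths */
   const unsigned *primitive_lengths;  /* vertices per primitive */
};

/* Returns true if the lengths never claim more vertices than count. */
static bool
prim_lengths_consistent(const struct prim_lengths_info *info)
{
   unsigned total = 0;

   for (unsigned i = 0; i < info->primitive_count; i++) {
      /* total <= count holds here, so the subtraction cannot wrap. */
      if (info->primitive_lengths[i] > info->count - total)
         return false;
      total += info->primitive_lengths[i];
   }
   return true;
}
```

A check like this would belong on the producer side (the tgsi_exec-specific path), so that the emit code can keep trusting the lengths it is handed.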

z

- Original Message -
 From: Dave Airlie airl...@redhat.com
 
 One of the mismatched tests has a max output vertices of 3,
 but emits 6 vertices, this means the output buffer is undersized
 and causes problems down the line, so limit things later if we
 have a number of vertices lower than the number required to execute
 a primitive.
 
 Signed-off-by: Dave Airlie airl...@redhat.com
 ---
  src/gallium/auxiliary/draw/draw_gs.c  | 4 ++--
  src/gallium/auxiliary/draw/draw_pt_emit.c | 8 +++-
  2 files changed, 9 insertions(+), 3 deletions(-)
 
 diff --git a/src/gallium/auxiliary/draw/draw_gs.c
 b/src/gallium/auxiliary/draw/draw_gs.c
 index fc4f697..d07e88f 100644
 --- a/src/gallium/auxiliary/draw/draw_gs.c
 +++ b/src/gallium/auxiliary/draw/draw_gs.c
 @@ -92,8 +92,8 @@ tgsi_fetch_gs_outputs(struct draw_geometry_shader *shader,
unsigned num_verts_per_prim = machine->Primitives[prim_idx];
shader->primitive_lengths[prim_idx +   shader->emitted_primitives] =
   machine->Primitives[prim_idx];
 -  shader->emitted_vertices += num_verts_per_prim;
 -  for (j = 0; j < num_verts_per_prim; j++, current_idx++) {
 +  shader->emitted_vertices += MIN2(num_verts_per_prim,
 shader->max_output_vertices);
 +  for (j = 0; j < MIN2(num_verts_per_prim, shader->max_output_vertices);
 j++, current_idx++) {
   int idx = current_idx * shader->info.num_outputs;
  #ifdef DEBUG_OUTPUTS
   debug_printf("%d) Output vert:\n", idx / shader->info.num_outputs);
 diff --git a/src/gallium/auxiliary/draw/draw_pt_emit.c
 b/src/gallium/auxiliary/draw/draw_pt_emit.c
 index 011efe7..d8e2809 100644
 --- a/src/gallium/auxiliary/draw/draw_pt_emit.c
 +++ b/src/gallium/auxiliary/draw/draw_pt_emit.c
 @@ -26,6 +26,7 @@
   **/
  
  #include "util/u_memory.h"
 +#include "util/u_math.h"
  #include "draw/draw_context.h"
  #include "draw/draw_private.h"
  #include "draw/draw_vbuf.h"
 @@ -255,9 +256,14 @@ draw_pt_emit_linear(struct pt_emit *emit,
  i < prim_info->primitive_count;
  start += prim_info->primitive_lengths[i], i++)
 {
 +  int len;
 +  if (start > count)
 + continue;
 +  len = MIN2(prim_info->primitive_lengths[i], count);
render->draw_arrays(render,
start,
 -  prim_info->primitive_lengths[i]);
 +  len);
 +
 }
 
 render->release_vertices(render);
 --
 1.9.3
 

Re: [Mesa-dev] [PATCH] tgsi/gs: bound max output vertices in shader

2014-06-10 Thread Zack Rusin

Looks great. If I was into diffs I'd make sweet and passionate love to this one.

Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
 From: Dave Airlie airl...@redhat.com
 
 This limits the number of emitted vertices to the shader's max output
 vertices, and avoids us writing things into memory that isn't big
 enough for it.
 
 Signed-off-by: Dave Airlie airl...@redhat.com
 ---
  src/gallium/auxiliary/tgsi/tgsi_exec.c | 8 
  src/gallium/auxiliary/tgsi/tgsi_exec.h | 1 +
  2 files changed, 9 insertions(+)
 
 diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c
 b/src/gallium/auxiliary/tgsi/tgsi_exec.c
 index 69d98fd..d848348 100644
 --- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
 +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
 @@ -789,6 +789,11 @@ tgsi_exec_machine_bind_shader(
   break;
  
case TGSI_TOKEN_TYPE_PROPERTY:
 + if (mach->Processor == TGSI_PROCESSOR_GEOMETRY) {
 +if (parse.FullToken.FullProperty.Property.PropertyName ==
 TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES) {
 +   mach->MaxOutputVertices =
 parse.FullToken.FullProperty.u[0].Data;
 +}
 + }
   break;
  
default:
 @@ -1621,6 +1626,9 @@ emit_vertex(struct tgsi_exec_machine *mach)
   if ((mach->ExecMask & (1 << i)))
 */
 if (mach->ExecMask) {
 +  if
 (mach->Primitives[mach->Temps[TEMP_PRIMITIVE_I].xyzw[TEMP_PRIMITIVE_C].u[0]]
 >= mach->MaxOutputVertices)
 + return;
 +
mach->Temps[TEMP_OUTPUT_I].xyzw[TEMP_OUTPUT_C].u[0] +=
mach->NumOutputs;

 mach->Primitives[mach->Temps[TEMP_PRIMITIVE_I].xyzw[TEMP_PRIMITIVE_C].u[0]]++;
 }
 diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.h
 b/src/gallium/auxiliary/tgsi/tgsi_exec.h
 index 7a82f69..d53c4ba 100644
 --- a/src/gallium/auxiliary/tgsi/tgsi_exec.h
 +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.h
 @@ -297,6 +297,7 @@ struct tgsi_exec_machine
 unsigned  *Primitives;
 unsigned   NumOutputs;
 unsigned   MaxGeometryShaderOutputs;
 +   unsigned   MaxOutputVertices;
  
 /* FRAGMENT processor only. */
 const struct tgsi_interp_coef *InterpCoefs;
 --
 1.9.3
 

Re: [Mesa-dev] [PATCH 7/8] gallium: create TGSI_PROPERTY to disable viewport and clipping

2014-05-20 Thread Zack Rusin

It's not relevant to anything we have. The last I looked st/nine wasn't even a 
UMD. Everything that's needed for d3d9 (and d3d10) UMDs has already been 
added to gallium; we don't have any patches against core gallium that we've 
been keeping from the community. All we could do is review the patch for 
code quality, but so can everyone else.

z

- Original Message -
 Hi,
 
 Could somebody from VMWare please review this patch? It's for st/nine
 (open d3d9 state tracker).
 
 Thanks,
 
 Marek
 
 On Sat, May 17, 2014 at 1:20 AM, Automated rebase
 david.heidelber...@ixit.cz wrote:
  From: Christoph Bumiller e0425...@student.tuwien.ac.at
 
  ---
   src/gallium/auxiliary/tgsi/tgsi_strings.c  |  3 ++-
   src/gallium/auxiliary/tgsi/tgsi_ureg.c | 16 
   src/gallium/auxiliary/tgsi/tgsi_ureg.h |  4 
   src/gallium/docs/source/tgsi.rst   |  9 +
   src/gallium/include/pipe/p_shader_tokens.h |  3 ++-
   5 files changed, 33 insertions(+), 2 deletions(-)
 
  diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c
  b/src/gallium/auxiliary/tgsi/tgsi_strings.c
  index 5b6e47f..c3e7118 100644
  --- a/src/gallium/auxiliary/tgsi/tgsi_strings.c
  +++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c
  @@ -120,7 +120,8 @@ const char *tgsi_property_names[TGSI_PROPERTY_COUNT] =
  "FS_COORD_PIXEL_CENTER",
  "FS_COLOR0_WRITES_ALL_CBUFS",
  "FS_DEPTH_LAYOUT",
  -   "VS_PROHIBIT_UCPS"
  +   "VS_PROHIBIT_UCPS",
  +   "VS_POSITION_WINDOW_SPACE"
   };
 
   const char *tgsi_type_names[5] =
  diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
  b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
  index 2bf93ee..bd0a3f7 100644
  --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
  +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
  @@ -173,6 +173,7 @@ struct ureg_program
  unsigned char property_fs_coord_pixel_center; /* =
  TGSI_FS_COORD_PIXEL_CENTER_* */
  unsigned char property_fs_color0_writes_all_cbufs; /* =
  TGSI_FS_COLOR0_WRITES_ALL_CBUFS * */
  unsigned char property_fs_depth_layout; /* TGSI_FS_DEPTH_LAYOUT */
  +   boolean property_vs_window_space_position; /*
  TGSI_VS_WINDOW_SPACE_POSITION */
 
  unsigned nr_addrs;
  unsigned nr_preds;
  @@ -331,6 +332,13 @@ ureg_property_fs_depth_layout(struct ureg_program
  *ureg,
  ureg->property_fs_depth_layout = fs_depth_layout;
   }
 
  +void
  +ureg_property_vs_window_space_position(struct ureg_program *ureg,
  +   boolean vs_window_space_position)
  +{
  +   ureg->property_vs_window_space_position = vs_window_space_position;
  +}
  +
   struct ureg_src
   ureg_DECL_fs_input_cyl_centroid(struct ureg_program *ureg,
  unsigned semantic_name,
  @@ -1508,6 +1516,14 @@ static void emit_decls( struct ureg_program *ureg )
   ureg->property_fs_depth_layout);
  }
 
  +   if (ureg->property_vs_window_space_position) {
  +  assert(ureg->processor == TGSI_PROCESSOR_VERTEX);
  +
  +  emit_property(ureg,
  +TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION,
  +ureg->property_vs_window_space_position);
  +   }
  +
  if (ureg->processor == TGSI_PROCESSOR_VERTEX) {
 for (i = 0; i < UREG_MAX_INPUT; i++) {
if (ureg->vs_inputs[i/32] & (1 << (i%32))) {
  diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h
  b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
  index a0a50b7..28edea6 100644
  --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h
  +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
  @@ -184,6 +184,10 @@ void
   ureg_property_fs_depth_layout(struct ureg_program *ureg,
 unsigned fs_depth_layout);
 
  +void
  +ureg_property_vs_window_space_position(struct ureg_program *ureg,
  +   boolean vs_window_space_position);
  +
 
   /***
* Build shader declarations:
  diff --git a/src/gallium/docs/source/tgsi.rst
  b/src/gallium/docs/source/tgsi.rst
  index 9500b9d..2ca3c3b 100644
  --- a/src/gallium/docs/source/tgsi.rst
  +++ b/src/gallium/docs/source/tgsi.rst
  @@ -2848,6 +2848,15 @@ input primitive. Each invocation will have a
  different
   TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to
   be 1.
 
  +VS_WINDOW_SPACE_POSITION
  +
  +If this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION
  output
  +is assumed to contain window space coordinates.
  +Division of X,Y,Z by W and the viewport transformation are disabled, and
  1/W is
  +directly taken from the 4-th component of the shader output.
  +Naturally, clipping is not performed on window coordinates either.
  +The effect of this property is undefined if a geometry or tessellation
  shader
  +are in use.
 
   Texture Sampling and Texture Formats
   
  diff --git a/src/gallium/include/pipe/p_shader_tokens.h
  b/src/gallium/include/pipe/p_shader_tokens.h
  index

[Mesa-dev] [PATCH] draw/llvm: reduce memory usage

2014-04-23 Thread Zack Rusin

Let's make the draw_get_option_use_llvm function available unconditionally
and use it to avoid useless allocations when LLVM paths are active.
TGSI machine is never used when we're using LLVM.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_context.c |  6 ++
 src/gallium/auxiliary/draw/draw_context.h |  2 --
 src/gallium/auxiliary/draw/draw_gs.c  | 26 --
 src/gallium/auxiliary/draw/draw_vs.c  | 11 +++
 src/gallium/auxiliary/draw/draw_vs_exec.c |  2 ++
 5 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_context.c 
b/src/gallium/auxiliary/draw/draw_context.c
index 0a67879..ddc305b 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -68,6 +68,12 @@ draw_get_option_use_llvm(void)
}
return value;
 }
+#else
+boolean
+draw_get_option_use_llvm(void)
+{
+   return FALSE;
+}
 #endif
 
 
diff --git a/src/gallium/auxiliary/draw/draw_context.h 
b/src/gallium/auxiliary/draw/draw_context.h
index f114f50..48549fe 100644
--- a/src/gallium/auxiliary/draw/draw_context.h
+++ b/src/gallium/auxiliary/draw/draw_context.h
@@ -288,9 +288,7 @@ draw_get_shader_param(unsigned shader, enum pipe_shader_cap 
param);
 int
 draw_get_shader_param_no_llvm(unsigned shader, enum pipe_shader_cap param);
 
-#ifdef HAVE_LLVM
 boolean
 draw_get_option_use_llvm(void);
-#endif
 
 #endif /* DRAW_CONTEXT_H */
diff --git a/src/gallium/auxiliary/draw/draw_gs.c 
b/src/gallium/auxiliary/draw/draw_gs.c
index 7de5e03..5e503ff 100644
--- a/src/gallium/auxiliary/draw/draw_gs.c
+++ b/src/gallium/auxiliary/draw/draw_gs.c
@@ -674,11 +674,7 @@ int draw_geometry_shader_run(struct draw_geometry_shader 
*shader,
 void draw_geometry_shader_prepare(struct draw_geometry_shader *shader,
   struct draw_context *draw)
 {
-#ifdef HAVE_LLVM
boolean use_llvm = draw_get_option_use_llvm();
-#else
-   boolean use_llvm = FALSE;
-#endif
if (!use_llvm && shader && shader->machine->Tokens != shader->state.tokens) 
{
   tgsi_exec_machine_bind_shader(shader->machine,
 shader->state.tokens,
@@ -690,16 +686,18 @@ void draw_geometry_shader_prepare(struct 
draw_geometry_shader *shader,
 boolean
 draw_gs_init( struct draw_context *draw )
 {
-   draw->gs.tgsi.machine = tgsi_exec_machine_create();
-   if (!draw->gs.tgsi.machine)
-  return FALSE;
-
-   draw->gs.tgsi.machine->Primitives = align_malloc(
-  MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector), 16);
-   if (!draw->gs.tgsi.machine->Primitives)
-  return FALSE;
-   memset(draw->gs.tgsi.machine->Primitives, 0,
-  MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector));
+   if (!draw_get_option_use_llvm()) {
+  draw->gs.tgsi.machine = tgsi_exec_machine_create();
+  if (!draw->gs.tgsi.machine)
+ return FALSE;
+
+  draw->gs.tgsi.machine->Primitives = align_malloc(
+ MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector), 16);
+  if (!draw->gs.tgsi.machine->Primitives)
+ return FALSE;
+  memset(draw->gs.tgsi.machine->Primitives, 0,
+ MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector));
+   }
 
return TRUE;
 }
diff --git a/src/gallium/auxiliary/draw/draw_vs.c 
b/src/gallium/auxiliary/draw/draw_vs.c
index 55cbeb9..8bb9a7f 100644
--- a/src/gallium/auxiliary/draw/draw_vs.c
+++ b/src/gallium/auxiliary/draw/draw_vs.c
@@ -149,9 +149,11 @@ draw_vs_init( struct draw_context *draw )
 {
draw->dump_vs = debug_get_option_gallium_dump_vs();
 
-   draw->vs.tgsi.machine = tgsi_exec_machine_create();
-   if (!draw->vs.tgsi.machine)
-  return FALSE;
+   if (!draw_get_option_use_llvm()) {
+  draw->vs.tgsi.machine = tgsi_exec_machine_create();
+  if (!draw->vs.tgsi.machine)
+ return FALSE;
+   }
 
draw->vs.emit_cache = translate_cache_create();
if (!draw->vs.emit_cache) 
@@ -173,7 +175,8 @@ draw_vs_destroy( struct draw_context *draw )
if (draw->vs.emit_cache)
   translate_cache_destroy(draw->vs.emit_cache);
 
-   tgsi_exec_machine_destroy(draw->vs.tgsi.machine);
+   if (draw_get_option_use_llvm())
+  tgsi_exec_machine_destroy(draw->vs.tgsi.machine);
 }
 
 
diff --git a/src/gallium/auxiliary/draw/draw_vs_exec.c 
b/src/gallium/auxiliary/draw/draw_vs_exec.c
index 133b116..6a18d8c 100644
--- a/src/gallium/auxiliary/draw/draw_vs_exec.c
+++ b/src/gallium/auxiliary/draw/draw_vs_exec.c
@@ -63,6 +63,7 @@ vs_exec_prepare( struct draw_vertex_shader *shader,
 {
struct exec_vertex_shader *evs = exec_vertex_shader(shader);
 
+   debug_assert(!draw_get_option_use_llvm());
/* Specify the vertex program to interpret/execute.
 * Avoid rebinding when possible.
 */
@@ -96,6 +97,7 @@ vs_exec_run_linear( struct draw_vertex_shader *shader,
unsigned slot;
boolean clamp_vertex_color = shader->draw->rasterizer->clamp_vertex_color;
 
+   debug_assert(!draw_get_option_use_llvm());
tgsi_exec_set_constant_buffers(machine

Re: [Mesa-dev] [PATCH] draw/llvm: reduce memory usage

2014-04-23 Thread Zack Rusin

  
 -   tgsi_exec_machine_destroy(draw->vs.tgsi.machine);
 +   if (draw_get_option_use_llvm())
 +      tgsi_exec_machine_destroy(draw->vs.tgsi.machine);

That part should have used !draw_get_option_use_llvm() 
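
A small sketch of why the negation matters: the destroy path has to use the same condition as the create path, otherwise the interpreter machine is leaked on the non-LLVM path. The types and functions below (`interp_machine`, `shader_stage`, `stage_init`, `stage_destroy`) are hypothetical stand-ins, not the draw module's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the TGSI interpreter machine. */
struct interp_machine { int unused; };

struct shader_stage {
   bool use_llvm;
   struct interp_machine *machine;  /* only allocated when !use_llvm */
};

static bool
stage_init(struct shader_stage *s, bool use_llvm)
{
   s->use_llvm = use_llvm;
   s->machine = NULL;

   /* The interpreter is never used on the LLVM path, so skip it there. */
   if (!use_llvm) {
      s->machine = calloc(1, sizeof(*s->machine));
      if (!s->machine)
         return false;
   }
   return true;
}

static void
stage_destroy(struct shader_stage *s)
{
   /* Same condition as in stage_init: the machine exists only when
    * LLVM is not in use. */
   if (!s->use_llvm) {
      free(s->machine);
      s->machine = NULL;
   }
}
```

Keeping the pointer NULL on the LLVM path also means an unconditional destroy would be harmless in this sketch, since free(NULL) is a no-op; whether that holds for the real destroy function is not something this example shows.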

Re: [Mesa-dev] [PATCH 0/4] XA composite and perf improvements

2014-04-01 Thread Zack Rusin

- Original Message -
 From: Rob Clark robcl...@freedesktop.org
 
 While still more of a stop-gap solution (until glamor) for freedreno,
 with these few relatively simple changes I get a pretty big performance
 boost (~40%) for xf86-video-freedreno.

That looks great to me. Nice work.
But to be honest the only thing I remember about this code is that it has been 
written in C and I'm probably like 40% certain of that.

z

Re: [Mesa-dev] [PATCH 2/2] llvmpipe: Simplify vertex and geometry shaders.

2014-03-26 Thread Zack Rusin

Actually Jose I think we'll need to revert this. That's because draw always 
assumed that if geometry shader is present it means that the geometry shader is 
present, but that is not true anymore. That's because d3d10 creates a null 
geometry shader to pass around the stream output. Before the patch the draw 
geometry shader was only created if tokens weren't null, with your change it 
always is and it's causing lots and lots of crashes because various parts are 
trying to execute a null geometry shader. Although I agree it'd be nice if we 
could handle it, I don't see a trivial way of fixing it.

z

- Original Message -
 The series looks great to me.
 
 - Original Message -
  From: José Fonseca jfons...@vmware.com
  
  Eliminate lp_vertex_shader, as it added nothing over draw_vertex_shader.
  
  Simplify lp_geometry_shader, as most of the incoming state is unneeded.
  (We could also just use draw_geometry_shader if we were willing to peek
  inside the structure.)
  ---
   src/gallium/drivers/llvmpipe/lp_context.h |  4 +--
   src/gallium/drivers/llvmpipe/lp_draw_arrays.c |  8 ++---
   src/gallium/drivers/llvmpipe/lp_state.h   | 13 ++--
   src/gallium/drivers/llvmpipe/lp_state_gs.c| 32 +++
   src/gallium/drivers/llvmpipe/lp_state_vs.c| 46
   +++
   5 files changed, 33 insertions(+), 70 deletions(-)
  
  diff --git a/src/gallium/drivers/llvmpipe/lp_context.h
  b/src/gallium/drivers/llvmpipe/lp_context.h
  index 05cdfe5..ee8033c 100644
  --- a/src/gallium/drivers/llvmpipe/lp_context.h
  +++ b/src/gallium/drivers/llvmpipe/lp_context.h
  @@ -46,8 +46,8 @@
   struct llvmpipe_vbuf_render;
   struct draw_context;
   struct draw_stage;
  +struct draw_vertex_shader;
   struct lp_fragment_shader;
  -struct lp_vertex_shader;
   struct lp_blend_state;
   struct lp_setup_context;
   struct lp_setup_variant;
  @@ -63,7 +63,7 @@ struct llvmpipe_context {
  const struct pipe_depth_stencil_alpha_state *depth_stencil;
  const struct pipe_rasterizer_state *rasterizer;
  struct lp_fragment_shader *fs;
  -   const struct lp_vertex_shader *vs;
  +   struct draw_vertex_shader *vs;
  const struct lp_geometry_shader *gs;
  const struct lp_velems_state *velems;
  const struct lp_so_state *so;
  diff --git a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
  b/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
  index 3df0a5c..99e6d19 100644
  --- a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
  +++ b/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
  @@ -112,11 +112,11 @@ llvmpipe_draw_vbo(struct pipe_context *pipe, const
  struct pipe_draw_info *info)
  llvmpipe_prepare_geometry_sampling(lp,
 
  lp->num_sampler_views[PIPE_SHADER_GEOMETRY],
 
  lp->sampler_views[PIPE_SHADER_GEOMETRY]);
  -   if (lp->gs && !lp->gs->shader.tokens) {
  +   if (lp->gs && lp->gs->no_tokens) {
 /* we have an empty geometry shader with stream output, so
attach the stream output info to the current vertex shader */
 if (lp->vs) {
  - draw_vs_attach_so(lp->vs->draw_data,
  lp->gs->shader.stream_output);
  + draw_vs_attach_so(lp->vs, lp->gs->stream_output);
 }
  }
  draw_collect_pipeline_statistics(draw,
  @@ -136,11 +136,11 @@ llvmpipe_draw_vbo(struct pipe_context *pipe, const
  struct pipe_draw_info *info)
  }
  draw_set_mapped_so_targets(draw, 0, NULL);
   
  -   if (lp->gs && !lp->gs->shader.tokens) {
  +   if (lp->gs && lp->gs->no_tokens) {
 /* we have attached stream output to the vs for rendering,
now lets reset it */
 if (lp->vs) {
  - draw_vs_reset_so(lp->vs->draw_data);
  + draw_vs_reset_so(lp->vs);
 }
  }
  
  diff --git a/src/gallium/drivers/llvmpipe/lp_state.h
  b/src/gallium/drivers/llvmpipe/lp_state.h
  index 8635cf1..2da6caa 100644
  --- a/src/gallium/drivers/llvmpipe/lp_state.h
  +++ b/src/gallium/drivers/llvmpipe/lp_state.h
  @@ -65,17 +65,10 @@ struct llvmpipe_context;
   
   
   
  -/** Subclass of pipe_shader_state */
  -struct lp_vertex_shader
  -{
  -   struct pipe_shader_state shader;
  -   struct draw_vertex_shader *draw_data;
  -};
  -
  -/** Subclass of pipe_shader_state */
   struct lp_geometry_shader {
  -   struct pipe_shader_state shader;
  -   struct draw_geometry_shader *draw_data;
  +   boolean no_tokens;
  +   struct pipe_stream_output_info stream_output;
  +   struct draw_geometry_shader *dgs;
   };
   
   /** Vertex element state */
  diff --git a/src/gallium/drivers/llvmpipe/lp_state_gs.c
  b/src/gallium/drivers/llvmpipe/lp_state_gs.c
  index 74cf992..c94afed 100644
  --- a/src/gallium/drivers/llvmpipe/lp_state_gs.c
  +++ b/src/gallium/drivers/llvmpipe/lp_state_gs.c
  @@ -48,7 +48,7 @@ llvmpipe_create_gs_state(struct pipe_context *pipe,
   
  state = CALLOC_STRUCT(lp_geometry_shader);
  if (state == NULL )
  -  goto fail;
  +  goto no_state;

Re: [Mesa-dev] [PATCH 2/2] llvmpipe: Simplify vertex and geometry shaders.

2014-03-26 Thread Zack Rusin

> I see the crashes you're referring to.
>
> I don't quite understand why though: concerning the geometry shader, other
> than cosmetic changes, in theory I should just have replaced a null/non-null
> `tokens` pointer with a boolean `no_tokens`, though obviously I missed
> something.

Yea, you missed the entire draw pipeline because you replaced:
if (templ->tokens) {
...
state->draw_data = draw_create_geometry_shader(llvmpipe->draw, templ);
}

with unconditional:
state->dgs = draw_create_geometry_shader(llvmpipe->draw, templ);

i.e. draw gs is /always/ created whether tokens are there or not. So the
draw_bind_geometry_shader will always bind gs's with null tokens. And that's
what draw can't handle. I think that replacing that with:
if (!state->no_tokens) {
  state->dgs = draw_create_geometry_shader(...);
  ...
}

should work.

> I should also have broken this in two separate changes: vs portion, and gs
> portion.

vs's are fine because they're never created with null tokens.

z
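
The guard suggested above can be modelled in isolation. This is only a minimal sketch: `struct gs_state`, `create_draw_gs()` and `create_gs_state()` are hypothetical stand-ins, not the real llvmpipe or draw entry points, and the assumption (taken from the reply) is that the draw module must never be handed a geometry shader with null tokens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the gallium/draw types discussed above. */
struct tokens { int dummy; };
struct shader_template { const struct tokens *tokens; };
struct draw_gs { int dummy; };

struct gs_state {
   bool no_tokens;
   struct draw_gs *dgs;   /* stays NULL for a token-less, stream-output-only GS */
};

/* Stand-in for draw_create_geometry_shader(): refuses null tokens. */
static struct draw_gs *create_draw_gs(const struct shader_template *templ)
{
   if (!templ->tokens)
      return NULL;
   return calloc(1, sizeof(struct draw_gs));
}

/* Create the driver-side state; only build the draw GS when tokens exist. */
static struct gs_state *create_gs_state(const struct shader_template *templ)
{
   struct gs_state *state;

   assert(templ);
   state = calloc(1, sizeof *state);
   if (!state)
      return NULL;

   state->no_tokens = !templ->tokens;
   if (!state->no_tokens) {
      state->dgs = create_draw_gs(templ);
      if (!state->dgs) {
         free(state);
         return NULL;
      }
   }
   return state;
}
```

With this shape, a template without tokens yields a state whose `dgs` is NULL, so nothing token-less is ever bound into draw.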
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/2] draw/gs: reduce the size of the gs output buffer

2014-03-26 Thread Zack Rusin

We used to overallocate the output buffer, sometimes running out
of memory with applications rendering large geometries. The actual
maximum number of vertices out is simply the maximum number of
primitives in (number of gs invocations) multiplied by the maximum
number of output vertices per gs input primitive (i.e. gs invocation).

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_gs.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_gs.c 
b/src/gallium/auxiliary/draw/draw_gs.c
index 97e8a90..7de5e03 100644
--- a/src/gallium/auxiliary/draw/draw_gs.c
+++ b/src/gallium/auxiliary/draw/draw_gs.c
@@ -552,6 +552,10 @@ int draw_geometry_shader_run(struct draw_geometry_shader *shader,
       u_decomposed_prims_for_vertices(shader->output_primitive,
                                       shader->max_output_vertices)
       * num_in_primitives;
+   /* we allocate exactly one extra vertex per primitive to allow the GS to emit
+    * overflown vertices into some area where they won't harm anyone */
+   unsigned total_verts_per_buffer = shader->primitive_boundary *
+      num_in_primitives;
 
    //Assume at least one primitive
    max_out_prims = MAX2(max_out_prims, 1);
@@ -559,23 +563,25 @@ int draw_geometry_shader_run(struct draw_geometry_shader *shader,
 
    output_verts->vertex_size = vertex_size;
    output_verts->stride = output_verts->vertex_size;
-   /* we allocate exactly one extra vertex per primitive to allow the GS to emit
-    * overflown vertices into some area where they won't harm anyone */
    output_verts->verts =
       (struct vertex_header *)MALLOC(output_verts->vertex_size *
-                                     max_out_prims *
-                                     shader->primitive_boundary);
+                                     total_verts_per_buffer);
+   debug_assert(output_verts->verts);
 
 #if 0
    debug_printf("%s count = %d (in prims # = %d)\n",
                 __FUNCTION__, num_input_verts, num_in_primitives);
    debug_printf("\tlinear = %d, prim_info->count = %d\n",
                 input_prim->linear, input_prim->count);
-   debug_printf("\tprim pipe = %s, shader in = %s, shader out = %s, max out = %d\n",
+   debug_printf("\tprim pipe = %s, shader in = %s, shader out = %s\n",
                 u_prim_name(input_prim->prim),
                 u_prim_name(shader->input_primitive),
-                u_prim_name(shader->output_primitive),
-                shader->max_output_vertices);
+                u_prim_name(shader->output_primitive));
+   debug_printf("\tmaxv  = %d, maxp = %d, primitive_boundary = %d, "
+                "vertex_size = %d, tverts = %d\n",
+                shader->max_output_vertices, max_out_prims,
+                shader->primitive_boundary, output_verts->vertex_size,
+                total_verts_per_buffer);
 #endif
 
    shader->emitted_vertices = 0;
-- 
1.8.3.2
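
The sizing rule from the commit message can be checked with a little arithmetic. The sketch below is not the draw module's code: `decomposed_prims()` is a simplified stand-in for `u_decomposed_prims_for_vertices()` that only handles points and triangle strips, and it assumes `primitive_boundary` equals `max_output_vertices + 1` (the "one extra vertex per primitive" mentioned in the patch comment).

```c
#include <assert.h>
#include <stddef.h>

enum out_prim { OUT_POINTS, OUT_TRIANGLE_STRIP };

/* Simplified stand-in for u_decomposed_prims_for_vertices(). */
static unsigned decomposed_prims(enum out_prim prim, unsigned verts)
{
   switch (prim) {
   case OUT_POINTS:
      return verts;
   case OUT_TRIANGLE_STRIP:
      return verts >= 3 ? verts - 2 : 0;
   }
   return 0;
}

/* Old sizing: decomposed output prims over all invocations, each
 * multiplied by a full primitive_boundary worth of vertices. */
static size_t old_vertex_count(enum out_prim prim,
                               unsigned max_output_vertices,
                               unsigned num_in_primitives)
{
   unsigned primitive_boundary = max_output_vertices + 1; /* assumption */
   unsigned max_out_prims =
      decomposed_prims(prim, max_output_vertices) * num_in_primitives;

   if (max_out_prims < 1)
      max_out_prims = 1;
   return (size_t)max_out_prims * primitive_boundary;
}

/* New sizing: one primitive_boundary worth of vertices per invocation. */
static size_t new_vertex_count(unsigned max_output_vertices,
                               unsigned num_in_primitives)
{
   unsigned primitive_boundary = max_output_vertices + 1; /* assumption */

   assert(num_in_primitives > 0);
   return (size_t)primitive_boundary * num_in_primitives;
}
```

For a GS that emits up to 6 vertices as a triangle strip and runs on 1000 input primitives, the old rule reserves 28000 vertices where the new one reserves 7000.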

[Mesa-dev] [PATCH 2/2] draw/llvm: improve debugging output a bit

2014-03-26 Thread Zack Rusin

It's useful to know what the LLVMBuildStore arguments are going to
be before executing it, because it can crash. Also make sure to
print out the inputs only if we're not generating a gs, because
a gs fetches inputs differently.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_llvm.c  | 2 +-
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index 53d13f3..b9f8bb9 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -939,11 +939,11 @@ store_aos_array(struct gallivm_state *gallivm,
       LLVMValueRef id_ptr = draw_jit_header_id(gallivm, io_ptrs[i]);
       val = LLVMBuildExtractElement(builder, cliptmp, linear_inds[i], "");
       val = adjust_mask(gallivm, val);
-      LLVMBuildStore(builder, val, id_ptr);
 #if DEBUG_STORE
       lp_build_printf(gallivm, "io = %p, index %d, clipmask = %x\n",
                       io_ptrs[i], inds[i], val);
 #endif
+      LLVMBuildStore(builder, val, id_ptr);
    }
 }
 
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index d2cb0a0..8791168 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -3569,7 +3569,8 @@ static void emit_prologue(struct lp_build_tgsi_context * bld_base)
    if (DEBUG_EXECUTION) {
       lp_build_printf(gallivm, "\n");
       emit_dump_file(bld, TGSI_FILE_CONSTANT);
-      emit_dump_file(bld, TGSI_FILE_INPUT);
+      if (!bld->gs_iface)
+         emit_dump_file(bld, TGSI_FILE_INPUT);
    }
 }
 
-- 
1.8.3.2

Re: [Mesa-dev] [PATCH 2/2] llvmpipe: Simplify vertex and geometry shaders.

2014-03-24 Thread Zack Rusin

The series looks great to me.

- Original Message -
 From: José Fonseca jfons...@vmware.com
 
 Eliminate lp_vertex_shader, as it added nothing over draw_vertex_shader.
 
 Simplify lp_geometry_shader, as most of the incoming state is unneeded.
 (We could also just use draw_geometry_shader if we were willing to peek
 inside the structure.)
 ---
  src/gallium/drivers/llvmpipe/lp_context.h |  4 +--
  src/gallium/drivers/llvmpipe/lp_draw_arrays.c |  8 ++---
  src/gallium/drivers/llvmpipe/lp_state.h   | 13 ++--
  src/gallium/drivers/llvmpipe/lp_state_gs.c| 32 +++
  src/gallium/drivers/llvmpipe/lp_state_vs.c| 46
  +++
  5 files changed, 33 insertions(+), 70 deletions(-)
 
 diff --git a/src/gallium/drivers/llvmpipe/lp_context.h
 b/src/gallium/drivers/llvmpipe/lp_context.h
 index 05cdfe5..ee8033c 100644
 --- a/src/gallium/drivers/llvmpipe/lp_context.h
 +++ b/src/gallium/drivers/llvmpipe/lp_context.h
 @@ -46,8 +46,8 @@
  struct llvmpipe_vbuf_render;
  struct draw_context;
  struct draw_stage;
 +struct draw_vertex_shader;
  struct lp_fragment_shader;
 -struct lp_vertex_shader;
  struct lp_blend_state;
  struct lp_setup_context;
  struct lp_setup_variant;
 @@ -63,7 +63,7 @@ struct llvmpipe_context {
 const struct pipe_depth_stencil_alpha_state *depth_stencil;
 const struct pipe_rasterizer_state *rasterizer;
 struct lp_fragment_shader *fs;
 -   const struct lp_vertex_shader *vs;
 +   struct draw_vertex_shader *vs;
 const struct lp_geometry_shader *gs;
 const struct lp_velems_state *velems;
 const struct lp_so_state *so;
 diff --git a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
 b/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
 index 3df0a5c..99e6d19 100644
 --- a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
 +++ b/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
 @@ -112,11 +112,11 @@ llvmpipe_draw_vbo(struct pipe_context *pipe, const
 struct pipe_draw_info *info)
 llvmpipe_prepare_geometry_sampling(lp,
 lp->num_sampler_views[PIPE_SHADER_GEOMETRY],
 lp->sampler_views[PIPE_SHADER_GEOMETRY]);
 -   if (lp->gs && !lp->gs->shader.tokens) {
 +   if (lp->gs && lp->gs->no_tokens) {
       /* we have an empty geometry shader with stream output, so
          attach the stream output info to the current vertex shader */
       if (lp->vs) {
 - draw_vs_attach_so(lp->vs->draw_data, &lp->gs->shader.stream_output);
 + draw_vs_attach_so(lp->vs, &lp->gs->stream_output);
       }
 }
 draw_collect_pipeline_statistics(draw,
 @@ -136,11 +136,11 @@ llvmpipe_draw_vbo(struct pipe_context *pipe, const
 struct pipe_draw_info *info)
 }
 draw_set_mapped_so_targets(draw, 0, NULL);
  
 -   if (lp->gs && !lp->gs->shader.tokens) {
 +   if (lp->gs && lp->gs->no_tokens) {
       /* we have attached stream output to the vs for rendering,
          now lets reset it */
       if (lp->vs) {
 - draw_vs_reset_so(lp->vs->draw_data);
 + draw_vs_reset_so(lp->vs);
       }
 }
 
 diff --git a/src/gallium/drivers/llvmpipe/lp_state.h
 b/src/gallium/drivers/llvmpipe/lp_state.h
 index 8635cf1..2da6caa 100644
 --- a/src/gallium/drivers/llvmpipe/lp_state.h
 +++ b/src/gallium/drivers/llvmpipe/lp_state.h
 @@ -65,17 +65,10 @@ struct llvmpipe_context;
  
  
  
 -/** Subclass of pipe_shader_state */
 -struct lp_vertex_shader
 -{
 -   struct pipe_shader_state shader;
 -   struct draw_vertex_shader *draw_data;
 -};
 -
 -/** Subclass of pipe_shader_state */
  struct lp_geometry_shader {
 -   struct pipe_shader_state shader;
 -   struct draw_geometry_shader *draw_data;
 +   boolean no_tokens;
 +   struct pipe_stream_output_info stream_output;
 +   struct draw_geometry_shader *dgs;
  };
  
  /** Vertex element state */
 diff --git a/src/gallium/drivers/llvmpipe/lp_state_gs.c
 b/src/gallium/drivers/llvmpipe/lp_state_gs.c
 index 74cf992..c94afed 100644
 --- a/src/gallium/drivers/llvmpipe/lp_state_gs.c
 +++ b/src/gallium/drivers/llvmpipe/lp_state_gs.c
 @@ -48,7 +48,7 @@ llvmpipe_create_gs_state(struct pipe_context *pipe,
  
 state = CALLOC_STRUCT(lp_geometry_shader);
 if (state == NULL )
 -  goto fail;
 +  goto no_state;
  
 /* debug */
 if (LP_DEBUG & DEBUG_TGSI) {
 @@ -57,26 +57,19 @@ llvmpipe_create_gs_state(struct pipe_context *pipe,
 }
  
 /* copy stream output info */
 -   state->shader = *templ;
 -   if (templ->tokens) {
 -  /* copy shader tokens, the ones passed in will go away. */
 -  state->shader.tokens = tgsi_dup_tokens(templ->tokens);
 -  if (state->shader.tokens == NULL)
 - goto fail;
 -
 -  state->draw_data = draw_create_geometry_shader(llvmpipe->draw, templ);
 -  if (state->draw_data == NULL)
 - goto fail;
 +   state->no_tokens = !templ->tokens;
 +   memcpy(&state->stream_output, &templ->stream_output, sizeof state->stream_output);
 +
 +   state->dgs = draw_create_geometry_shader(llvmpipe->draw, templ);
 +   if

Re: [Mesa-dev] [PATCH] gallivm: fix no-op n:n lp_build_resize()

2014-03-24 Thread Zack Rusin

Looks good to me.

z

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 This can get called in some circumstances if both src type and dst type
 have same width (seen with float32->unorm32). While this particular case
 was bogus anyway let's just fix that as it can work trivially (due to the
 way it was called it actually worked anyway apart from the assert).
 ---
  src/gallium/auxiliary/gallivm/lp_bld_pack.c |   12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)
 
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
 b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
 index 22a4f5a8..2b0a1fb 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
 @@ -710,9 +710,6 @@ lp_build_resize(struct gallivm_state *gallivm,
 /* We must not loose or gain channels. Only precision */
 assert(src_type.length * num_srcs == dst_type.length * num_dsts);
  
 -   /* We don't support M:N conversion, only 1:N, M:1, or 1:1 */
 -   assert(num_srcs == 1 || num_dsts == 1);
 -
 assert(src_type.length <= LP_MAX_VECTOR_LENGTH);
 assert(dst_type.length <= LP_MAX_VECTOR_LENGTH);
 assert(num_srcs <= LP_MAX_VECTOR_LENGTH);
 @@ -723,6 +720,7 @@ lp_build_resize(struct gallivm_state *gallivm,
 * Truncate bit width.
 */
  
 +  /* Conversion must be M:1 */
assert(num_dsts == 1);
  
if (src_type.width * src_type.length == dst_type.width *
dst_type.length) {
 @@ -775,6 +773,7 @@ lp_build_resize(struct gallivm_state *gallivm,
 * Expand bit width.
 */
  
 +  /* Conversion must be 1:N */
assert(num_srcs == 1);
  
if (src_type.width * src_type.length == dst_type.width *
dst_type.length) {
 @@ -813,10 +812,11 @@ lp_build_resize(struct gallivm_state *gallivm,
 * No-op
 */
  
 -  assert(num_srcs == 1);
 -  assert(num_dsts == 1);
 +  /* Conversion must be N:N */
 +  assert(num_srcs == num_dsts);
  
 -  tmp[0] = src[0];
 +  for(i = 0; i < num_dsts; ++i)
 + tmp[i] = src[i];
 }
  
 for(i = 0; i < num_dsts; ++i)
 --
 1.7.9.5
 

[Mesa-dev] [PATCH] gallium: allow setting of the internal stream output offset

2014-03-07 Thread Zack Rusin

(This version includes comments from Roland.)

D3D10 allows setting of the internal offset of a buffer, which is
in general only incremented via actual stream output writes. By
allowing setting of the internal offset, draw_auto is capable
of rendering from buffers which have not actually been streamed
out to. Our interface didn't allow this. Functionally, this change
shouldn't make any difference to OpenGL: instead of an
append_bitmask you just get a real array where -1 means append
(like in D3D) and 0 means do not append.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/cso_cache/cso_context.c | 13 -
 src/gallium/auxiliary/cso_cache/cso_context.h |  2 +-
 src/gallium/auxiliary/draw/draw_context.h |  3 +--
 src/gallium/auxiliary/draw/draw_pt.c  |  8 +---
 src/gallium/auxiliary/draw/draw_pt_so_emit.c  |  3 +--
 src/gallium/auxiliary/hud/hud_context.c   |  2 +-
 src/gallium/auxiliary/postprocess/pp_run.c|  2 +-
 src/gallium/auxiliary/util/u_blit.c   |  2 +-
 src/gallium/auxiliary/util/u_blitter.c| 13 +
 src/gallium/auxiliary/util/u_gen_mipmap.c |  2 +-
 src/gallium/docs/source/context.rst   |  9 +
 src/gallium/drivers/galahad/glhd_context.c|  4 ++--
 src/gallium/drivers/ilo/ilo_state.c   |  8 ++--
 src/gallium/drivers/llvmpipe/lp_state_so.c| 12 ++--
 src/gallium/drivers/noop/noop_state.c |  2 +-
 src/gallium/drivers/nouveau/nv50/nv50_state.c |  7 ---
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c |  7 ---
 src/gallium/drivers/radeon/r600_pipe_common.h |  2 +-
 src/gallium/drivers/radeon/r600_streamout.c   |  5 -
 src/gallium/drivers/radeonsi/si_descriptors.c |  4 ++--
 src/gallium/drivers/softpipe/sp_state_so.c|  2 +-
 src/gallium/drivers/trace/tr_context.c|  6 +++---
 src/gallium/include/pipe/p_context.h  |  2 +-
 src/gallium/tools/trace/dump_state.py |  4 ++--
 src/mesa/state_tracker/st_cb_bitmap.c |  2 +-
 src/mesa/state_tracker/st_cb_clear.c  |  2 +-
 src/mesa/state_tracker/st_cb_drawpixels.c |  2 +-
 src/mesa/state_tracker/st_cb_drawtex.c|  2 +-
 src/mesa/state_tracker/st_cb_xformfb.c| 20 +---
 29 files changed, 88 insertions(+), 64 deletions(-)

diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c 
b/src/gallium/auxiliary/cso_cache/cso_context.c
index 2dcf01d..9146684 100644
--- a/src/gallium/auxiliary/cso_cache/cso_context.c
+++ b/src/gallium/auxiliary/cso_cache/cso_context.c
@@ -332,7 +332,7 @@ void cso_release_all( struct cso_context *ctx )
       ctx->pipe->bind_vertex_elements_state( ctx->pipe, NULL );
 
       if (ctx->pipe->set_stream_output_targets)
-         ctx->pipe->set_stream_output_targets(ctx->pipe, 0, NULL, 0);
+         ctx->pipe->set_stream_output_targets(ctx->pipe, 0, NULL, NULL);
    }
 
    /* free fragment sampler views */
@@ -1241,7 +1241,7 @@ void
 cso_set_stream_outputs(struct cso_context *ctx,
                        unsigned num_targets,
                        struct pipe_stream_output_target **targets,
-                       unsigned append_bitmask)
+                       const unsigned *offsets)
 {
    struct pipe_context *pipe = ctx->pipe;
    uint i;
@@ -1266,7 +1266,7 @@ cso_set_stream_outputs(struct cso_context *ctx,
    }
 
    pipe->set_stream_output_targets(pipe, num_targets, targets,
-                                   append_bitmask);
+                                   offsets);
    ctx->nr_so_targets = num_targets;
 }
 
@@ -1292,6 +1292,7 @@ cso_restore_stream_outputs(struct cso_context *ctx)
 {
    struct pipe_context *pipe = ctx->pipe;
    uint i;
+   unsigned offset[PIPE_MAX_SO_BUFFERS];
 
    if (!ctx->has_streamout) {
       return;
@@ -1302,19 +1303,21 @@ cso_restore_stream_outputs(struct cso_context *ctx)
       return;
    }
 
+   assert(ctx->nr_so_targets_saved <= PIPE_MAX_SO_BUFFERS);
    for (i = 0; i < ctx->nr_so_targets_saved; i++) {
       pipe_so_target_reference(&ctx->so_targets[i], NULL);
       /* move the reference from one pointer to another */
       ctx->so_targets[i] = ctx->so_targets_saved[i];
       ctx->so_targets_saved[i] = NULL;
+      /* -1 means append */
+      offset[i] = (unsigned)-1;
    }
    for (; i < ctx->nr_so_targets; i++) {
       pipe_so_target_reference(&ctx->so_targets[i], NULL);
    }
 
-   /* ~0 means append */
    pipe->set_stream_output_targets(pipe, ctx->nr_so_targets_saved,
-                                   ctx->so_targets, ~0);
+                                   ctx->so_targets, offset);
 
    ctx->nr_so_targets = ctx->nr_so_targets_saved;
    ctx->nr_so_targets_saved = 0;
diff --git a/src/gallium/auxiliary/cso_cache/cso_context.h 
b/src/gallium/auxiliary/cso_cache/cso_context.h
index 822e2df..1aa9998 100644
--- a/src/gallium/auxiliary/cso_cache/cso_context.h
+++ b/src/gallium/auxiliary/cso_cache/cso_context.h
@@ -115,7 +115,7 @@ unsigned cso_get_aux_vertex_buffer_slot(struct cso_context 
*ctx);
 void
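
The per-buffer offset convention described in the commit message above (an array where -1 means append) can be illustrated with a small model. This is a sketch under assumptions, not a driver implementation: `struct so_target`, `SO_OFFSET_APPEND` and `set_so_targets()` are hypothetical names, and the only behaviour taken from the patch is that `(unsigned)-1` leaves the internal offset alone while any other value replaces it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel mirroring the "(unsigned)-1 means append" rule. */
#define SO_OFFSET_APPEND ((unsigned)-1)

/* Hypothetical stand-in for a stream output target. */
struct so_target {
   unsigned internal_offset;   /* where the next stream output write lands */
};

/* Bind targets; reset the internal offset unless the caller asked to append. */
static void set_so_targets(struct so_target **targets, unsigned num_targets,
                           const unsigned *offsets)
{
   unsigned i;

   assert(offsets != NULL || num_targets == 0);
   for (i = 0; i < num_targets; i++) {
      if (!targets[i])
         continue;
      if (offsets[i] != SO_OFFSET_APPEND)
         targets[i]->internal_offset = offsets[i];
   }
}
```

Compared with a single append bitmask, the array lets a caller seed each buffer with an arbitrary offset, which is what a draw_auto from a buffer that was never streamed out to needs.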

[Mesa-dev] [PATCH] gallium: allow setting of the internal stream output offset

2014-03-06 Thread Zack Rusin

D3D10 allows setting of the internal offset of a buffer, which is
in general only incremented via actual stream output writes. By
allowing setting of the internal offset, draw_auto is capable
of rendering from buffers which have not actually been streamed
out to. Our interface didn't allow this. Functionally, this change
shouldn't make any difference to OpenGL: instead of an
append_bitmask you just get a real array where -1 means append
(like in D3D) and 0 means do not append.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/cso_cache/cso_context.c | 13 -
 src/gallium/auxiliary/cso_cache/cso_context.h |  2 +-
 src/gallium/auxiliary/draw/draw_pt.c  |  8 +---
 src/gallium/auxiliary/hud/hud_context.c   |  2 +-
 src/gallium/auxiliary/postprocess/pp_run.c|  2 +-
 src/gallium/auxiliary/util/u_blit.c   |  2 +-
 src/gallium/auxiliary/util/u_blitter.c| 13 +
 src/gallium/auxiliary/util/u_gen_mipmap.c |  2 +-
 src/gallium/docs/source/context.rst   |  9 +
 src/gallium/drivers/galahad/glhd_context.c|  4 ++--
 src/gallium/drivers/ilo/ilo_state.c   |  8 ++--
 src/gallium/drivers/llvmpipe/lp_state_so.c|  7 ---
 src/gallium/drivers/noop/noop_state.c |  2 +-
 src/gallium/drivers/nouveau/nv50/nv50_state.c |  7 ---
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c |  7 ---
 src/gallium/drivers/radeon/r600_pipe_common.h |  2 +-
 src/gallium/drivers/radeon/r600_streamout.c   |  5 -
 src/gallium/drivers/radeonsi/si_descriptors.c |  4 ++--
 src/gallium/drivers/softpipe/sp_state_so.c|  2 +-
 src/gallium/drivers/trace/tr_context.c|  6 +++---
 src/gallium/include/pipe/p_context.h  |  2 +-
 src/gallium/tools/trace/dump_state.py |  4 ++--
 src/mesa/state_tracker/st_cb_bitmap.c |  2 +-
 src/mesa/state_tracker/st_cb_clear.c  |  2 +-
 src/mesa/state_tracker/st_cb_drawpixels.c |  2 +-
 src/mesa/state_tracker/st_cb_drawtex.c|  2 +-
 src/mesa/state_tracker/st_cb_xformfb.c| 20 +---
 27 files changed, 84 insertions(+), 57 deletions(-)

diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c 
b/src/gallium/auxiliary/cso_cache/cso_context.c
index 2dcf01d..9146684 100644
--- a/src/gallium/auxiliary/cso_cache/cso_context.c
+++ b/src/gallium/auxiliary/cso_cache/cso_context.c
@@ -332,7 +332,7 @@ void cso_release_all( struct cso_context *ctx )
       ctx->pipe->bind_vertex_elements_state( ctx->pipe, NULL );
 
       if (ctx->pipe->set_stream_output_targets)
-         ctx->pipe->set_stream_output_targets(ctx->pipe, 0, NULL, 0);
+         ctx->pipe->set_stream_output_targets(ctx->pipe, 0, NULL, NULL);
    }
 
    /* free fragment sampler views */
@@ -1241,7 +1241,7 @@ void
 cso_set_stream_outputs(struct cso_context *ctx,
                        unsigned num_targets,
                        struct pipe_stream_output_target **targets,
-                       unsigned append_bitmask)
+                       const unsigned *offsets)
 {
    struct pipe_context *pipe = ctx->pipe;
    uint i;
@@ -1266,7 +1266,7 @@ cso_set_stream_outputs(struct cso_context *ctx,
    }
 
    pipe->set_stream_output_targets(pipe, num_targets, targets,
-                                   append_bitmask);
+                                   offsets);
    ctx->nr_so_targets = num_targets;
 }
 
@@ -1292,6 +1292,7 @@ cso_restore_stream_outputs(struct cso_context *ctx)
 {
    struct pipe_context *pipe = ctx->pipe;
    uint i;
+   unsigned offset[PIPE_MAX_SO_BUFFERS];
 
    if (!ctx->has_streamout) {
       return;
@@ -1302,19 +1303,21 @@ cso_restore_stream_outputs(struct cso_context *ctx)
       return;
    }
 
+   assert(ctx->nr_so_targets_saved <= PIPE_MAX_SO_BUFFERS);
    for (i = 0; i < ctx->nr_so_targets_saved; i++) {
       pipe_so_target_reference(&ctx->so_targets[i], NULL);
       /* move the reference from one pointer to another */
       ctx->so_targets[i] = ctx->so_targets_saved[i];
       ctx->so_targets_saved[i] = NULL;
+      /* -1 means append */
+      offset[i] = (unsigned)-1;
    }
    for (; i < ctx->nr_so_targets; i++) {
       pipe_so_target_reference(&ctx->so_targets[i], NULL);
    }
 
-   /* ~0 means append */
    pipe->set_stream_output_targets(pipe, ctx->nr_so_targets_saved,
-                                   ctx->so_targets, ~0);
+                                   ctx->so_targets, offset);
 
    ctx->nr_so_targets = ctx->nr_so_targets_saved;
    ctx->nr_so_targets_saved = 0;
diff --git a/src/gallium/auxiliary/cso_cache/cso_context.h 
b/src/gallium/auxiliary/cso_cache/cso_context.h
index 822e2df..1aa9998 100644
--- a/src/gallium/auxiliary/cso_cache/cso_context.h
+++ b/src/gallium/auxiliary/cso_cache/cso_context.h
@@ -115,7 +115,7 @@ unsigned cso_get_aux_vertex_buffer_slot(struct cso_context 
*ctx);
 void cso_set_stream_outputs(struct cso_context *ctx,
 unsigned num_targets,
 struct pipe_stream_output_target **targets

[Mesa-dev] [PATCH] draw/llvm: fix generation of the VS with GS present

2014-03-03 Thread Zack Rusin

draw_current_shader_* functions return the final output when considering
both the geometry shader and the vertex shader. But when generating code
for the vertex shader we cannot use output slots from the geometry shader
because, obviously, those can be completely different. This fixes a
number of very non-obvious crashes.
A side-effect of this bug was that sometimes the vertex shading code
could save some random outputs as position/clip when the geometry
shader was writing them and the vertex shader had different outputs at
those slots (sometimes writing garbage and sometimes something correct).
---
 src/gallium/auxiliary/draw/draw_llvm.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index 0bbb680..53d13f3 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1104,7 +1104,7 @@ generate_viewport(struct draw_llvm_variant *variant,
    int i;
    struct gallivm_state *gallivm = variant->gallivm;
    struct lp_type f32_type = vs_type;
-   const unsigned pos = draw_current_shader_position_output(variant->llvm->draw);
+   const unsigned pos = variant->llvm->draw->vs.position_output;
    LLVMTypeRef vs_type_llvm = lp_build_vec_type(gallivm, vs_type);
    LLVMValueRef out3 = LLVMBuildLoad(builder, outputs[pos][3], ""); /*w0 w1 .. wn*/
    LLVMValueRef const1 = lp_build_const_vec(gallivm, f32_type, 1.0);       /*1.0 1.0 1.0 1.0*/
@@ -1173,14 +1173,14 @@ generate_clipmask(struct draw_llvm *llvm,
    LLVMValueRef plane1, planes, plane_ptr, sum;
    struct lp_type f32_type = vs_type;
    struct lp_type i32_type = lp_int_type(vs_type);
-   const unsigned pos = draw_current_shader_position_output(llvm->draw);
-   const unsigned cv = draw_current_shader_clipvertex_output(llvm->draw);
+   const unsigned pos = llvm->draw->vs.position_output;
+   const unsigned cv = llvm->draw->vs.clipvertex_output;
    int num_written_clipdistance = llvm->draw->vs.vertex_shader->info.num_written_clipdistance;
    bool have_cd = false;
    unsigned cd[2];
 
-   cd[0] = draw_current_shader_clipdistance_output(llvm->draw, 0);
-   cd[1] = draw_current_shader_clipdistance_output(llvm->draw, 1);
+   cd[0] = llvm->draw->vs.clipdistance_output[0];
+   cd[1] = llvm->draw->vs.clipdistance_output[1];
 
    if (cd[0] != pos || cd[1] != pos)
       have_cd = true;
@@ -1551,8 +1551,8 @@ draw_llvm_generate(struct draw_llvm *llvm, struct draw_llvm_variant *variant,
                              key->clip_z  ||
                              key->clip_user);
    LLVMValueRef variant_func;
-   const unsigned pos = draw_current_shader_position_output(llvm->draw);
-   const unsigned cv = draw_current_shader_clipvertex_output(llvm->draw);
+   const unsigned pos = llvm->draw->vs.position_output;
+   const unsigned cv = llvm->draw->vs.clipvertex_output;
    boolean have_clipdist = FALSE;
    struct lp_bld_tgsi_system_values system_values;
 
-- 
1.9.0
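
The distinction the commit message draws can be shown with a toy model. The types and function names below are hypothetical simplifications, not the draw module's API: the "current shader" query reports the slot of the last vertex-processing stage (the GS when one is bound), while VS code generation has to use the VS's own slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down view of the per-stage output bookkeeping. */
struct stage_info {
   unsigned position_output;   /* output slot holding the position */
};

struct draw_model {
   struct stage_info vs;
   const struct stage_info *gs;   /* NULL when no geometry shader is bound */
};

/* Models a draw_current_shader_*-style query: last stage wins. */
static unsigned current_shader_position_output(const struct draw_model *draw)
{
   assert(draw);
   return draw->gs ? draw->gs->position_output : draw->vs.position_output;
}

/* What VS code generation should use: always the VS's own slot. */
static unsigned vs_codegen_position_output(const struct draw_model *draw)
{
   assert(draw);
   return draw->vs.position_output;
}
```

When the VS writes position to slot 0 and the GS writes it to slot 3, the first query returns 3, which would make generated VS code treat an unrelated VS output as the position.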


[Mesa-dev] [PATCH] translate: fix buffer overflows

2014-03-03 Thread Zack Rusin

Because draw always injects position at slot 0, whenever a fragment
shader took the maximum number of inputs (32) we had
PIPE_MAX_ATTRIBS + 1 slots to translate, which meant that we were
crashing with fragment shaders that took the maximum number of
attributes as inputs. The actual maximum number of attributes we
need to translate is thus PIPE_MAX_ATTRIBS + 1.
---
 src/gallium/auxiliary/translate/translate_generic.c | 2 +-
 src/gallium/auxiliary/translate/translate_sse.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/auxiliary/translate/translate_generic.c 
b/src/gallium/auxiliary/translate/translate_generic.c
index 5ffce32..82b4d00 100644
--- a/src/gallium/auxiliary/translate/translate_generic.c
+++ b/src/gallium/auxiliary/translate/translate_generic.c
@@ -73,7 +73,7 @@ struct translate_generic {
*/
   int copy_size;
 
-   } attrib[PIPE_MAX_ATTRIBS];
+   } attrib[PIPE_MAX_ATTRIBS + 1];
 
unsigned nr_attrib;
 };
diff --git a/src/gallium/auxiliary/translate/translate_sse.c 
b/src/gallium/auxiliary/translate/translate_sse.c
index b6bc222..1833d8a 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -104,15 +104,15 @@ struct translate_sse
int8_t reg_to_const[16];
int8_t const_to_reg[NUM_CONSTS];
 
-   struct translate_buffer buffer[PIPE_MAX_ATTRIBS];
+   struct translate_buffer buffer[PIPE_MAX_ATTRIBS + 1];
unsigned nr_buffers;
 
/* Multiple buffer variants can map to a single buffer. */
-   struct translate_buffer_variant buffer_variant[PIPE_MAX_ATTRIBS];
+   struct translate_buffer_variant buffer_variant[PIPE_MAX_ATTRIBS + 1];
unsigned nr_buffer_variants;
 
/* Multiple elements can map to a single buffer variant. */
-   unsigned element_to_buffer_variant[PIPE_MAX_ATTRIBS];
+   unsigned element_to_buffer_variant[PIPE_MAX_ATTRIBS + 1];
 
boolean use_instancing;
unsigned instance_id;
-- 
1.9.0
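
The off-by-one behind this fix can be reproduced in miniature. The sketch below uses hypothetical names (`MAX_ATTRIBS` standing in for PIPE_MAX_ATTRIBS with the value 32 quoted in the commit message, `struct translate_sketch`, `fill_slots()`); it only models the counting, not the translate module itself.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ATTRIBS 32   /* stand-in for PIPE_MAX_ATTRIBS */

/* Position is always injected at slot 0, in addition to the FS inputs. */
static unsigned slots_to_translate(unsigned num_fs_inputs)
{
   return num_fs_inputs + 1;
}

struct translate_sketch {
   unsigned nr_attrib;
   int attrib[MAX_ATTRIBS + 1];   /* sized for the injected position too */
};

/* Returns 1 on success, 0 if the request cannot fit in the table. */
static int fill_slots(struct translate_sketch *t, unsigned num_fs_inputs)
{
   unsigned n = slots_to_translate(num_fs_inputs);
   unsigned i;

   assert(t);
   if (n > MAX_ATTRIBS + 1)
      return 0;
   for (i = 0; i < n; i++)
      t->attrib[i] = (int)i;
   t->nr_attrib = n;
   return 1;
}
```

With an array of only `MAX_ATTRIBS` entries, the 33rd write for a shader using all 32 inputs would land one element past the end.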


Re: [Mesa-dev] [PATCH] gallivm: fix F2U opcode

2014-02-04 Thread Zack Rusin

Looks good.

Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 Previously, we were really doing F2I. And also move it to generic section.
 (Note that for llvmpipe the code generated is definitely bad, due to lack
 of unsigned conversions with sse. I think though what llvm does (using scalar
 conversions to 64bit signed either with x87 fpu (32bit) or sse (64bit),
 including lots of domain changes) is quite suboptimal, could do something like
 is_large = arg >= 2^31
 half_arg = 0.5 * arg
 small_c = fptoint(arg)
 large_c = fptoint(half_arg) << 1
 res = select(is_large, large_c, small_c)
 which should be far fewer instructions but that's something llvm should do
 itself.)
 
 This fixes piglit fs/vs-float-uint-conversion.shader_test (maybe more, needs
 GL 3.0 version override to run.)
 ---
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c |   42
  ++--
  1 file changed, 22 insertions(+), 20 deletions(-)
 
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
 b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
 index caaeb01..b9546db 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
 @@ -720,10 +720,23 @@ sub_emit(
 struct lp_build_tgsi_context * bld_base,
 struct lp_build_emit_data * emit_data)
  {
 - emit_data->output[emit_data->chan] = LLVMBuildFSub(
 - bld_base->base.gallivm->builder,
 - emit_data->args[0],
 - emit_data->args[1], "");
 +   emit_data->output[emit_data->chan] =
 +  LLVMBuildFSub(bld_base->base.gallivm->builder,
 +emit_data->args[0],
 +emit_data->args[1], "");
 +}
 +
 +/* TGSI_OPCODE_F2U */
 +static void
 +f2u_emit(
 +   const struct lp_build_tgsi_action * action,
 +   struct lp_build_tgsi_context * bld_base,
 +   struct lp_build_emit_data * emit_data)
 +{
 +   emit_data->output[emit_data->chan] =
 +  LLVMBuildFPToUI(bld_base->base.gallivm->builder,
 +  emit_data->args[0],
 +  bld_base->base.int_vec_type, "");
  }
  
  /* TGSI_OPCODE_U2F */
 @@ -733,9 +746,10 @@ u2f_emit(
 struct lp_build_tgsi_context * bld_base,
 struct lp_build_emit_data * emit_data)
  {
 -   emit_data->output[emit_data->chan] = LLVMBuildUIToFP(bld_base->base.gallivm->builder,
 - emit_data->args[0],
 - bld_base->base.vec_type, "");
 +   emit_data->output[emit_data->chan] =
 +  LLVMBuildUIToFP(bld_base->base.gallivm->builder,
 +  emit_data->args[0],
 +  bld_base->base.vec_type, "");
  }
  
  static void
 @@ -949,6 +963,7 @@ lp_set_default_actions(struct lp_build_tgsi_context * bld_base)
 bld_base->op_actions[TGSI_OPCODE_SUB].emit = sub_emit;
  
 bld_base->op_actions[TGSI_OPCODE_UARL].emit = mov_emit;
 +   bld_base->op_actions[TGSI_OPCODE_F2U].emit = f2u_emit;
 bld_base->op_actions[TGSI_OPCODE_U2F].emit = u2f_emit;
 bld_base->op_actions[TGSI_OPCODE_UMAD].emit = umad_emit;
 bld_base->op_actions[TGSI_OPCODE_UMUL].emit = umul_emit;
 @@ -1128,18 +1143,6 @@ f2i_emit_cpu(
  emit_data-args[0]);
  }
  
 -/* TGSI_OPCODE_F2U (CPU Only) */
 -static void
 -f2u_emit_cpu(
 -   const struct lp_build_tgsi_action * action,
 -   struct lp_build_tgsi_context * bld_base,
 -   struct lp_build_emit_data * emit_data)
 -{
 -   /* FIXME: implement and use lp_build_utrunc() */
 -   emit_data->output[emit_data->chan] = lp_build_itrunc(&bld_base->base,
 -emit_data->args[0]);
 -}
 -
  /* TGSI_OPCODE_FSET Helper (CPU Only) */
  static void
  fset_emit_cpu(
 @@ -1832,7 +1835,6 @@ lp_set_default_actions_cpu(
 bld_base->op_actions[TGSI_OPCODE_DIV].emit = div_emit_cpu;
 bld_base->op_actions[TGSI_OPCODE_EX2].emit = ex2_emit_cpu;
 bld_base->op_actions[TGSI_OPCODE_F2I].emit = f2i_emit_cpu;
 -   bld_base->op_actions[TGSI_OPCODE_F2U].emit = f2u_emit_cpu;
 bld_base->op_actions[TGSI_OPCODE_FLR].emit = flr_emit_cpu;
 bld_base->op_actions[TGSI_OPCODE_FSEQ].emit = fseq_emit_cpu;
 bld_base->op_actions[TGSI_OPCODE_FSGE].emit = fsge_emit_cpu;
 --
 1.7.9.5
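
The select-based sequence Roland sketches in the commit message can be tried out in scalar C. This is only an illustration of that pseudocode, not what llvm or gallivm emits: `f2u_via_signed()` is a hypothetical name, the input is assumed to lie in [0, 2^32), and the small path is fed 0 when the value is large because an out-of-range float-to-int cast is undefined in C (a SIMD version would compute both lanes and select).

```c
#include <assert.h>
#include <stdint.h>

/* Convert a float in [0, 2^32) to uint32_t using only signed conversions. */
static uint32_t f2u_via_signed(float arg)
{
   int is_large;
   float half_arg;
   int32_t small_c;
   uint32_t large_c;

   assert(arg >= 0.0f && arg < 4294967296.0f);

   is_large = arg >= 2147483648.0f;              /* arg >= 2^31 */
   half_arg = 0.5f * arg;                        /* exact: only the exponent changes */
   small_c = (int32_t)(is_large ? 0.0f : arg);   /* signed conversion, small values */
   large_c = (uint32_t)(int32_t)half_arg << 1;   /* signed conversion of half, doubled */

   return is_large ? large_c : (uint32_t)small_c;
}
```

The doubling loses nothing for large inputs: a float at or above 2^31 is a multiple of 256, so its half is still an integer and the shifted result is exact.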
 

[Mesa-dev] [PATCH 1/3] gallivm: handle huge number of immediates

2014-02-04 Thread Zack Rusin

We only supported up to 256 immediates, which isn't enough. We had
code which stored immediates in a dynamically allocated array, but it
was always used alongside a statically backed array for performance
reasons. This commit adds code to skip that performance optimization
and always use just the dynamically allocated immediates if the
number of them is too great.

Signed-off-by: Zack Rusin za...@vmware.com
---
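Note for readers of the archive: the idea in the commit message can be sketched in plain C as below. This is only an illustration under stated assumptions, not the gallivm code: `imm_store`, its functions and `MAX_STATIC_IMMS` are hypothetical names, and the real patch emits LLVM IR rather than touching floats directly. The one detail taken from the diff is the `index * 4 + swizzle` addressing of the dynamically allocated array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical limit standing in for the size of the static array. */
#define MAX_STATIC_IMMS 256

struct imm_store {
   float fixed[MAX_STATIC_IMMS][4]; /* fast path: statically sized storage */
   float *array;                    /* fallback: 4 floats per immediate */
   unsigned capacity;
   unsigned num;
   bool use_array;                  /* plays the role of use_immediates_array */
};

/* Pick the storage once, based on how many immediates are expected. */
static bool
imm_store_init(struct imm_store *s, unsigned expected)
{
   s->num = 0;
   s->capacity = expected;
   s->array = NULL;
   s->use_array = expected > MAX_STATIC_IMMS;
   if (s->use_array) {
      s->array = calloc((size_t)expected * 4, sizeof(float));
      return s->array != NULL;
   }
   return true;
}

/* Append one immediate; returns false once the declared capacity is used up. */
static bool
imm_store_add(struct imm_store *s, const float v[4])
{
   if (s->num >= s->capacity)
      return false;
   for (unsigned i = 0; i < 4; ++i) {
      if (s->use_array)
         s->array[(size_t)s->num * 4 + i] = v[i];
      else
         s->fixed[s->num][i] = v[i];
   }
   ++s->num;
   return true;
}

/* Same "index * 4 + swizzle" addressing as the non-indirect array path. */
static float
imm_store_fetch(const struct imm_store *s, unsigned index, unsigned swizzle)
{
   if (s->use_array)
      return s->array[(size_t)index * 4 + swizzle];
   return s->fixed[index][swizzle];
}

static void
imm_store_free(struct imm_store *s)
{
   free(s->array);
   s->array = NULL;
}
```

The storage choice is made once up front, so the common small-shader case keeps using the static array and only large shaders pay for the extra indirection.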
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h |   2 +-
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 112 
 2 files changed, 77 insertions(+), 37 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
index 1a93951..46f7d77 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
@@ -482,7 +482,7 @@ struct lp_build_tgsi_soa_context
struct lp_exec_mask exec_mask;
 
uint num_immediates;
-
+   boolean use_immediates_array;
 };
 
 void
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 7c5de21..067e6af 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -1295,33 +1295,42 @@ emit_fetch_immediate(
LLVMBuilderRef builder = gallivm->builder;
LLVMValueRef res = NULL;
 
-   if (reg->Register.Indirect) {
-  LLVMValueRef indirect_index;
-  LLVMValueRef index_vec;  /* index into the immediate register array */
+   if (bld->use_immediates_array || reg->Register.Indirect) {
   LLVMValueRef imms_array;
   LLVMTypeRef fptr_type;
 
-  indirect_index = get_indirect_index(bld,
-  reg->Register.File,
-  reg->Register.Index,
-  &reg->Indirect);
-  /*
-   * Unlike for other reg classes, adding pixel offsets is unnecessary -
-   * immediates are stored as full vectors (FIXME??? - might be better
-   * to store them the same as constants) but all elements are the same
-   * in any case.
-   */
-  index_vec = get_soa_array_offsets(&bld_base->uint_bld,
-indirect_index,
-swizzle,
-FALSE);
-
   /* cast imms_array pointer to float* */
   fptr_type = LLVMPointerType(LLVMFloatTypeInContext(gallivm->context), 0);
   imms_array = LLVMBuildBitCast(builder, bld->imms_array, fptr_type, "");
 
-  /* Gather values from the immediate register array */
-  res = build_gather(&bld_base->base, imms_array, index_vec, NULL);
+  if (reg->Register.Indirect) {
+ LLVMValueRef indirect_index;
+ LLVMValueRef index_vec;  /* index into the immediate register array */
+
+ indirect_index = get_indirect_index(bld,
+ reg->Register.File,
+ reg->Register.Index,
+ &reg->Indirect);
+ /*
+  * Unlike for other reg classes, adding pixel offsets is unnecessary -
+  * immediates are stored as full vectors (FIXME??? - might be better
+  * to store them the same as constants) but all elements are the same
+  * in any case.
+  */
+ index_vec = get_soa_array_offsets(&bld_base->uint_bld,
+   indirect_index,
+   swizzle,
+   FALSE);
+
+ /* Gather values from the immediate register array */
+ res = build_gather(&bld_base->base, imms_array, index_vec, NULL);
+  } else {
+ LLVMValueRef lindex = lp_build_const_int32(gallivm,
+reg->Register.Index * 4 + swizzle);
+ LLVMValueRef imms_ptr =  LLVMBuildGEP(builder,
+bld->imms_array, &lindex, 1, "");
+ res = LLVMBuildLoad(builder, imms_ptr, "");
+  }
}
else {
   res = bld->immediates[reg->Register.Index][swizzle];
@@ -2728,51 +2737,71 @@ void lp_emit_immediate_soa(
 {
struct lp_build_tgsi_soa_context *bld = lp_soa_context(bld_base);
struct gallivm_state * gallivm = bld_base->base.gallivm;
-
-   /* simply copy the immediate values into the next immediates[] slot */
+   LLVMValueRef imms[4];
unsigned i;
const uint size = imm->Immediate.NrTokens - 1;
assert(size <= 4);
-   assert(bld->num_immediates < LP_MAX_TGSI_IMMEDIATES);
switch (imm->Immediate.DataType) {
case TGSI_IMM_FLOAT32:
   for( i = 0; i < size; ++i )
- bld->immediates[bld->num_immediates][i] =
-lp_build_const_vec(gallivm, bld_base->base.type, imm->u[i].Float);
+ imms[i] =
+   lp_build_const_vec(gallivm, bld_base->base.type, imm->u[i].Float);
 
   break;
case TGSI_IMM_UINT32

[Mesa-dev] [PATCH 2/3] gallivm: make sure analysis works with large number of immediates

2014-02-04 Thread Zack Rusin

We need to handle a lot more immediates and in order to do that
we also switch from allocating this structure on the stack to
allocating it on the heap.

Signed-off-by: Zack Rusin za...@vmware.com
---
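Note for readers of the archive: with 4096 entries the immediate table alone takes 64 KiB (assuming 4-byte floats), which is why the context moves off the stack. The sketch below shows the same pattern in standalone C. `analysis_ctx` and `analyse_immediates` are hypothetical names, and unlike the hunk as posted the sketch checks the allocation result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_IMMS 4096

struct analysis_ctx {
   unsigned num_imms;
   float imm[MAX_IMMS][4];   /* 4096 * 4 * 4 bytes = 64 KiB with 4-byte floats */
};

/*
 * Record up to MAX_IMMS immediates and report whether any channel lies
 * outside [0, 1]. Returns the number recorded, or -1 if allocation fails.
 */
static int
analyse_immediates(const float (*values)[4], unsigned count, bool *unclamped)
{
   struct analysis_ctx *ctx = calloc(1, sizeof *ctx);
   int recorded;

   if (!ctx)
      return -1;

   *unclamped = false;
   for (unsigned n = 0; n < count; ++n) {
      if (ctx->num_imms >= MAX_IMMS)
         break;
      for (unsigned chan = 0; chan < 4; ++chan) {
         float value = values[n][chan];
         ctx->imm[ctx->num_imms][chan] = value;
         if (value < 0.0f || value > 1.0f)
            *unclamped = true;
      }
      ++ctx->num_imms;
   }

   recorded = (int)ctx->num_imms;
   free(ctx);   /* single exit path, so the heap copy is always released */
   return recorded;
}
```

Keeping a single exit path after the allocation mirrors the `finished:` label in the patch, where the one `FREE(ctx)` covers every way out of the parse loop.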
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c
index 184790b..ce0598d 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c
@@ -47,7 +47,7 @@ struct analysis_context
struct lp_tgsi_info *info;
 
unsigned num_imms;
-   float imm[128][4];
+   float imm[4096][4];
 
struct lp_tgsi_channel_info temp[32][4];
 };
@@ -487,7 +487,7 @@ lp_build_tgsi_info(const struct tgsi_token *tokens,
struct lp_tgsi_info *info)
 {
struct tgsi_parse_context parse;
-   struct analysis_context ctx;
+   struct analysis_context *ctx;
unsigned index;
unsigned chan;
 
@@ -495,8 +495,8 @@ lp_build_tgsi_info(const struct tgsi_token *tokens,
 
tgsi_scan_shader(tokens, &info->base);
 
-   memset(&ctx, 0, sizeof ctx);
-   ctx.info = info;
+   ctx = CALLOC(1, sizeof(struct analysis_context));
+   ctx->info = info;
 
tgsi_parse_init(&parse, tokens);
 
@@ -518,7 +518,7 @@ lp_build_tgsi_info(const struct tgsi_token *tokens,
goto finished;
 }
 
-analyse_instruction(&ctx, inst);
+analyse_instruction(ctx, inst);
  }
  break;
 
@@ -527,16 +527,16 @@ lp_build_tgsi_info(const struct tgsi_token *tokens,
 const unsigned size =
   parse.FullToken.FullImmediate.Immediate.NrTokens - 1;
 assert(size <= 4);
-if (ctx.num_imms < Elements(ctx.imm)) {
+if (ctx->num_imms < Elements(ctx->imm)) {
for (chan = 0; chan < size; ++chan) {
   float value = parse.FullToken.FullImmediate.u[chan].Float;
-  ctx.imm[ctx.num_imms][chan] = value;
+  ctx->imm[ctx->num_imms][chan] = value;
 
   if (value < 0.0f || value > 1.0f) {
  info->unclamped_immediates = TRUE;
   }
}
-   ++ctx.num_imms;
+   ++ctx->num_imms;
 }
  }
  break;
@@ -551,6 +551,7 @@ lp_build_tgsi_info(const struct tgsi_token *tokens,
 finished:
 
tgsi_parse_free(&parse);
+   FREE(ctx);
 
 
/*
-- 
1.8.3.2

[Mesa-dev] [PATCH 3/3] tgsi/ureg: increase the number of immediates

2014-02-04 Thread Zack Rusin

ureg_program is allocated on the heap so we can just bump the
number of immediates that it can handle. It's needed for d3d10.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/tgsi/tgsi_ureg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c 
b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
index f06858e..f928f57 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
@@ -77,7 +77,7 @@ struct ureg_tokens {
 #define UREG_MAX_SYSTEM_VALUE PIPE_MAX_ATTRIBS
 #define UREG_MAX_OUTPUT PIPE_MAX_SHADER_OUTPUTS
 #define UREG_MAX_CONSTANT_RANGE 32
-#define UREG_MAX_IMMEDIATE 256
+#define UREG_MAX_IMMEDIATE 4096
 #define UREG_MAX_ADDR 2
 #define UREG_MAX_PRED 1
 #define UREG_MAX_ARRAY_TEMPS 256
-- 
1.8.3.2

Re: [Mesa-dev] [PATCH 3/3] tgsi/ureg: increase the number of immediates

2014-02-04 Thread Zack Rusin

Yes, they simply always behave as if they were accessed indirectly from our 
code, but llvm seems to be pretty good at moving all of those accesses to 
registers (aka. eliminating alloca's) if they're not actually indirectly 
indexed, so it all ends up pretty.

z

- Original Message -
 Am 05.02.2014 01:34, schrieb Zack Rusin:
  ureg_program is allocated on the heap so we can just bump the
  number of immediates that it can handle. It's needed for d3d10.
  
  Signed-off-by: Zack Rusin za...@vmware.com
  ---
   src/gallium/auxiliary/tgsi/tgsi_ureg.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
  b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
  index f06858e..f928f57 100644
  --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
  +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
  @@ -77,7 +77,7 @@ struct ureg_tokens {
   #define UREG_MAX_SYSTEM_VALUE PIPE_MAX_ATTRIBS
   #define UREG_MAX_OUTPUT PIPE_MAX_SHADER_OUTPUTS
   #define UREG_MAX_CONSTANT_RANGE 32
  -#define UREG_MAX_IMMEDIATE 256
  +#define UREG_MAX_IMMEDIATE 4096
   #define UREG_MAX_ADDR 2
   #define UREG_MAX_PRED 1
   #define UREG_MAX_ARRAY_TEMPS 256
  
 
 Series looks good to me. llvm can still perform all optimizations on
 such immediates right?
 
 Roland
 

Re: [Mesa-dev] [PATCH 1/3] gallivm: handle huge number of immediates

2014-02-04 Thread Zack Rusin

  reasons. This commit adds code to skip that performance optimization
  and always use just the dynamically allocated immediates if the
  number of them is too great.
 
 So is there any limit on the number of immediates now?

Technically, no. In practice, other parts of the code will max out and assert 
at anything greater than 4096, which is what sm4 defines as the maximum for 
temps. So, at least in theory, the gallivm code will just work if that limit 
is increased elsewhere.

z

[Mesa-dev] [PATCH] gallivm: allow large numbers of temporaries

2014-02-03 Thread Zack Rusin

The number of allowed temporaries increases with almost every
iteration of an API. We used to support 128, then we started
increasing the limit, and the newer APIs support 4096+. So if we notice
that the number of temporaries is larger than our statically
allocated storage allows, we just treat them as indexable
temporaries and allocate them as an array from the start.

Signed-off-by: Zack Rusin za...@vmware.com
---
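Note for readers of the archive: the decision described above comes down to one comparison and one bit in a file mask. The sketch below isolates it. `FILE_TEMPORARY` and `MAX_STATIC_TEMPS` are hypothetical stand-ins for `TGSI_FILE_TEMPORARY` and `LP_MAX_TGSI_TEMPS`, and their values are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define FILE_TEMPORARY   4    /* hypothetical register file number */
#define MAX_STATIC_TEMPS 128  /* hypothetical size of the static storage */

/*
 * temp_file_max is the highest temporary index the shader uses, so an
 * index equal to MAX_STATIC_TEMPS already falls outside the static array.
 */
static unsigned
mark_large_temps(unsigned indirect_files, int temp_file_max)
{
   if (temp_file_max >= MAX_STATIC_TEMPS)
      indirect_files |= 1u << FILE_TEMPORARY;
   return indirect_files;
}

/* True when temporaries must live in one array instead of per-register slots. */
static bool
temps_are_indirect(unsigned indirect_files)
{
   return (indirect_files & (1u << FILE_TEMPORARY)) != 0;
}
```

Reusing the existing indirect-file bit means every later code path that already handles indirectly addressed temporaries picks up the large case without further changes.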
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 9db41a9..7c5de21 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -2672,8 +2672,8 @@ lp_emit_declaration_soa(
   assert(last <= bld->bld_base.info->file_max[decl->Declaration.File]);
   switch (decl->Declaration.File) {
   case TGSI_FILE_TEMPORARY:
- assert(idx < LP_MAX_TGSI_TEMPS);
  if (!(bld->indirect_files & (1 << TGSI_FILE_TEMPORARY))) {
+assert(idx < LP_MAX_TGSI_TEMPS);
 for (i = 0; i < TGSI_NUM_CHANNELS; i++)
bld->temps[idx][i] = lp_build_alloca(gallivm, vec_type, "temp");
  }
@@ -3621,6 +3621,15 @@ lp_build_tgsi_soa(struct gallivm_state *gallivm,
bld.bld_base.info = info;
bld.indirect_files = info->indirect_files;
 
+   /*
+* If the number of temporaries is rather large then we just
+* allocate them as an array right from the start and treat
+* like indirect temporaries.
+*/
+   if (info->file_max[TGSI_FILE_TEMPORARY] >= LP_MAX_TGSI_TEMPS) {
+  bld.indirect_files |= (1 << TGSI_FILE_TEMPORARY);
+   }
+
bld.bld_base.soa = TRUE;
bld.bld_base.emit_debug = emit_debug;
bld.bld_base.emit_fetch_funcs[TGSI_FILE_CONSTANT] = emit_fetch_constant;
-- 
1.8.3.2

[Mesa-dev] [PATCH 1/3] d3d10: allow indexable temporaries as relative registers

2014-02-03 Thread Zack Rusin

Indexable temporaries are 2D (the index of the array and the index
within the array) and can be used as outputs, inputs and relative
addressing registers. This fixes the parsing of indexable temporaries,
both in general and when they appear in relative addressing.

Signed-off-by: Zack Rusin za...@vmware.com
---
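Note for readers of the archive: the translation in this patch turns the 2D index into a flat temporary slot by adding the element index to a per-array offset. A minimal sketch of that bookkeeping follows. `temp_layout` and both functions are hypothetical, as are the limits. Only the `offsets[array] + element` arithmetic comes from the diff.

```c
#include <assert.h>

#define MAX_TEMP_ARRAYS 8     /* hypothetical limits for the sketch */
#define MAX_FLAT_TEMPS  64

struct temp_layout {
   unsigned offsets[MAX_TEMP_ARRAYS]; /* first flat slot of each array */
   unsigned sizes[MAX_TEMP_ARRAYS];
   unsigned num_arrays;
   unsigned total;                    /* flat slots handed out so far */
};

/* Reserve a contiguous flat range; returns the array id or -1 if full. */
static int
declare_indexable_temp(struct temp_layout *l, unsigned size)
{
   if (l->num_arrays >= MAX_TEMP_ARRAYS || size > MAX_FLAT_TEMPS - l->total)
      return -1;
   l->offsets[l->num_arrays] = l->total;
   l->sizes[l->num_arrays] = size;
   l->total += size;
   return (int)l->num_arrays++;
}

/* Map (array, element) to a flat slot, or -1 when out of range. */
static int
flat_temp_index(const struct temp_layout *l, unsigned array, unsigned elem)
{
   if (array >= l->num_arrays || elem >= l->sizes[array])
      return -1;
   return (int)(l->offsets[array] + elem);
}
```

With arrays of size 4 and 10 declared in that order, element 3 of the second array lands in flat slot 7.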
 src/gallium/state_trackers/d3d10/ShaderParse.c | 14 ++
 src/gallium/state_trackers/d3d10/ShaderParse.h |  2 +-
 src/gallium/state_trackers/d3d10/ShaderTGSI.c  |  8 +++-
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/src/gallium/state_trackers/d3d10/ShaderParse.c 
b/src/gallium/state_trackers/d3d10/ShaderParse.c
index 38ec2fe..7cec385 100644
--- a/src/gallium/state_trackers/d3d10/ShaderParse.c
+++ b/src/gallium/state_trackers/d3d10/ShaderParse.c
@@ -207,13 +207,19 @@ parse_relative_operand(const unsigned **curr,
assert(operand->type != D3D10_SB_OPERAND_TYPE_IMMEDIATE32);
 
/* Index dimension. */
-   assert(DECODE_D3D10_SB_OPERAND_INDEX_DIMENSION(**curr) == 
D3D10_SB_OPERAND_INDEX_1D);
assert(DECODE_D3D10_SB_OPERAND_INDEX_REPRESENTATION(0, **curr) == 
D3D10_SB_OPERAND_INDEX_IMMEDIATE32);
 
-   (*curr)++;
-
-   operand->index[0].imm = **curr;
+   if (DECODE_D3D10_SB_OPERAND_INDEX_DIMENSION(**curr) == 
D3D10_SB_OPERAND_INDEX_1D) {
+  (*curr)++;
+  operand->index[0].imm = **curr;
+   } else {
+  assert(DECODE_D3D10_SB_OPERAND_INDEX_DIMENSION(**curr) == 
D3D10_SB_OPERAND_INDEX_2D);
+  (*curr)++;
+  operand->index[0].imm = **curr;
+  (*curr)++;
+  operand->index[1].imm = **curr;
 
+   }
(*curr)++;
 }
 
diff --git a/src/gallium/state_trackers/d3d10/ShaderParse.h 
b/src/gallium/state_trackers/d3d10/ShaderParse.h
index 64f177c..5971864 100644
--- a/src/gallium/state_trackers/d3d10/ShaderParse.h
+++ b/src/gallium/state_trackers/d3d10/ShaderParse.h
@@ -54,7 +54,7 @@ struct Shader_relative_index {
 
 struct Shader_relative_operand {
D3D10_SB_OPERAND_TYPE type;
-   struct Shader_relative_index index[1];
+   struct Shader_relative_index index[2];
D3D10_SB_4_COMPONENT_NAME comp;
 };
 
diff --git a/src/gallium/state_trackers/d3d10/ShaderTGSI.c 
b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
index 9fb6b1d..2e42b8b 100644
--- a/src/gallium/state_trackers/d3d10/ShaderTGSI.c
+++ b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
@@ -637,9 +637,15 @@ translate_relative_operand(struct Shader_xlate *sx,
   reg = sx->prim_id;
   break;
 
+   case D3D10_SB_OPERAND_TYPE_INDEXABLE_TEMP:
+  assert(operand->index[1].imm < SHADER_MAX_TEMPS);
+
+  reg = 
ureg_src(sx->temps[sx->indexable_temp_offsets[operand->index[0].imm] +
+operand->index[1].imm]);
+  break;
+
case D3D10_SB_OPERAND_TYPE_INPUT:
case D3D10_SB_OPERAND_TYPE_OUTPUT:
-   case D3D10_SB_OPERAND_TYPE_INDEXABLE_TEMP:
case D3D10_SB_OPERAND_TYPE_IMMEDIATE32:
case D3D10_SB_OPERAND_TYPE_IMMEDIATE64:
case D3D10_SB_OPERAND_TYPE_SAMPLER:
-- 
1.8.3.2

[Mesa-dev] [PATCH 2/3] d3d10: allow indirect addressing on outputs

2014-02-03 Thread Zack Rusin

Outputs can have relative addressing. This adds basic support for it.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/state_trackers/d3d10/ShaderTGSI.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/src/gallium/state_trackers/d3d10/ShaderTGSI.c 
b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
index 2e42b8b..1cf9e0e 100644
--- a/src/gallium/state_trackers/d3d10/ShaderTGSI.c
+++ b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
@@ -687,20 +687,26 @@ translate_operand(struct Shader_xlate *sx,
 
case D3D10_SB_OPERAND_TYPE_OUTPUT:
   assert(operand->index_dim == 1);
-  assert(operand->index[0].index_rep == 
D3D10_SB_OPERAND_INDEX_IMMEDIATE32);
   assert(operand->index[0].imm < SHADER_MAX_OUTPUTS);
 
-  if (!writemask) {
- reg = sx->outputs[operand->index[0].imm].reg[0];
-  } else {
- unsigned i;
- for (i = 0; i < 4; ++i) {
-unsigned mask = 1 << i;
-if ((writemask & mask)) {
-   reg = sx->outputs[operand->index[0].imm].reg[i];
-   break;
+  if (operand->index[0].index_rep == D3D10_SB_OPERAND_INDEX_IMMEDIATE32) {
+ if (!writemask) {
+reg = sx->outputs[operand->index[0].imm].reg[0];
+ } else {
+unsigned i;
+for (i = 0; i < 4; ++i) {
+   unsigned mask = 1 << i;
+   if ((writemask & mask)) {
+  reg = sx->outputs[operand->index[0].imm].reg[i];
+  break;
+   }
 }
  }
+  } else {
+ struct ureg_src addr =
+translate_relative_operand(sx, &operand->index[0].rel);
+ assert(operand->index[0].index_rep == 
D3D10_SB_OPERAND_INDEX_IMMEDIATE32_PLUS_RELATIVE);
+ reg = ureg_dst_indirect(sx->outputs[operand->index[0].imm].reg[0], 
addr);
   }
   break;
 
-- 
1.8.3.2

[Mesa-dev] [PATCH 3/3] d3d10: support 1d indirect addressing on inputs

2014-02-03 Thread Zack Rusin

We supported 2D indirect addressing (GS tests were using it) but
not 1D indirect addressing (which can be used in VS and PS). This
adds support for 1D indirect addressing.

Signed-off-by: Zack Rusin za...@vmware.com
---
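Note for readers of the archive: the patch distinguishes three index representations for a 1D input operand. The sketch below evaluates them directly on the CPU to make the arithmetic visible. The real code instead emits a TGSI indirect source and leaves the addition to the shader, so this is an illustration only. All names here are hypothetical, and the bounds check at the end is an addition of the sketch.

```c
#include <assert.h>

enum index_rep {
   INDEX_IMMEDIATE32,               /* fixed register number */
   INDEX_RELATIVE,                  /* register number read from an address reg */
   INDEX_IMMEDIATE32_PLUS_RELATIVE, /* fixed base plus address reg */
   INDEX_OTHER                      /* anything the sketch does not handle */
};

struct operand_index {
   enum index_rep rep;
   unsigned imm;      /* immediate part, if any */
   unsigned rel_reg;  /* which address register supplies the relative part */
};

/* Returns the effective input register, or -1 if unsupported or out of range. */
static int
resolve_input_index(const struct operand_index *idx,
                    const unsigned *addr_regs, unsigned num_inputs)
{
   unsigned result;

   switch (idx->rep) {
   case INDEX_IMMEDIATE32:
      result = idx->imm;
      break;
   case INDEX_RELATIVE:
      result = addr_regs[idx->rel_reg];
      break;
   case INDEX_IMMEDIATE32_PLUS_RELATIVE:
      result = idx->imm + addr_regs[idx->rel_reg];
      break;
   default:
      return -1;
   }
   return result < num_inputs ? (int)result : -1;
}
```

The purely relative case behaves like the combined case with a base of zero, which matches the patch using `inputs[0]` as the base register there.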
 src/gallium/state_trackers/d3d10/ShaderTGSI.c | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/gallium/state_trackers/d3d10/ShaderTGSI.c 
b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
index 1cf9e0e..76126c5 100644
--- a/src/gallium/state_trackers/d3d10/ShaderTGSI.c
+++ b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
@@ -828,11 +828,29 @@ translate_src_operand(struct Shader_xlate *sx,
switch (operand->base.type) {
case D3D10_SB_OPERAND_TYPE_INPUT:
   if (operand->base.index_dim == 1) {
- assert(operand->base.index[0].index_rep ==
-D3D10_SB_OPERAND_INDEX_IMMEDIATE32);
- assert(operand->base.index[0].imm < SHADER_MAX_INPUTS);
+ switch (operand->base.index[0].index_rep) {
+ case D3D10_SB_OPERAND_INDEX_IMMEDIATE32:
+assert(operand->base.index[0].imm < SHADER_MAX_INPUTS);
+reg = sx->inputs[operand->base.index[0].imm].reg;
+break;
+ case D3D10_SB_OPERAND_INDEX_RELATIVE: {
+struct ureg_src tmp =
+   translate_relative_operand(sx, &operand->base.index[0].rel);
+reg = ureg_src_indirect(sx->inputs[0].reg, tmp);
+ }
+break;
+ case D3D10_SB_OPERAND_INDEX_IMMEDIATE32_PLUS_RELATIVE: {
+struct ureg_src tmp =
+   translate_relative_operand(sx, &operand->base.index[0].rel);
+reg = 
ureg_src_indirect(sx->inputs[operand->base.index[0].imm].reg, tmp);
+ }
+break;
+ default:
+/* XXX: Other index representations.
+ */
+LOG_UNSUPPORTED(TRUE);
 
- reg = sx->inputs[operand->base.index[0].imm].reg;
+ }
   } else {
  assert(operand->base.index_dim == 2);
  assert(operand->base.index[1].imm < SHADER_MAX_INPUTS);
-- 
1.8.3.2

Re: [Mesa-dev] [PATCH] gallivm: fix opcode and function nesting

2014-01-29 Thread Zack Rusin

- Original Message -
 Am 28.01.2014 23:08, schrieb Zack Rusin:
  gallivm soa code supported only a single level of nesting for
  control flow opcodes (if, switch, loops...) but the d3d10 spec
  clearly states that those are nested within functions. To support
  nesting of conditionals inside functions we need to store the
  nesting data inside function contexts and keep a stack of those.
  Furthermore we make sure that if nesting for subroutines is deeper
  than 32 then we simply ignore all subsequent 'call' invocations.
 Hmm I thought nesting worked just fine, except for the fact that when
 using just one stack we'd have needed a much larger one. Wasn't that
 true? (Not arguing about using per-function stacks, just curious.)

The issue is that the d3d10 spec is very specific about the nesting requirement 
being per-subroutine. So just increasing those nesting levels wouldn't really 
work, because 63 levels in one subroutine and 65 in another isn't necessarily 
equivalent to a code path that has 64 levels in each of two subroutines (even 
though both technically add up to the same total) - we get away with it for 
conditionals but not for function calls.
It's also worth noting that the patch handles overflows, which whck explicitly 
tests for and which we were just crashing on. The overflow behavior is 
undefined only for conditionals; subroutine calls above level 32 /have/ 
to be ignored (again, whck tests for it).
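
Note for readers of the archive: the per-subroutine rule above can be sketched as a stack of function contexts, each carrying its own nesting counter, with calls beyond the limit dropped. This is a simplified model, not the gallivm code: `exec_state`, `func_ctx` and the `exec_*` functions are hypothetical, `MAX_NESTING` is an assumed value, and only the limit of 33 contexts (32 nested subroutines plus main) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NESTING   64  /* hypothetical per-subroutine control-flow depth */
#define MAX_NUM_FUNCS 33  /* 32 nested subroutines plus main, as in the patch */

struct func_ctx {
   int cond_depth;        /* nesting of conditionals inside this function */
};

struct exec_state {
   struct func_ctx stack[MAX_NUM_FUNCS];
   int size;
};

static void
exec_init(struct exec_state *s)
{
   s->size = 1;           /* main occupies the bottom slot */
   s->stack[0].cond_depth = 0;
}

/* Returns false when the call is ignored because the stack is full. */
static bool
exec_call(struct exec_state *s)
{
   if (s->size >= MAX_NUM_FUNCS)
      return false;
   s->stack[s->size++].cond_depth = 0;  /* callee starts with fresh nesting */
   return true;
}

/* Pop the current function; main itself is never popped. */
static void
exec_ret(struct exec_state *s)
{
   if (s->size > 1)
      --s->size;
}

/* Returns false on overflow of the current function's own nesting budget. */
static bool
exec_if(struct exec_state *s)
{
   struct func_ctx *ctx = &s->stack[s->size - 1];
   if (ctx->cond_depth >= MAX_NESTING)
      return false;
   ++ctx->cond_depth;
   return true;
}

static void
exec_endif(struct exec_state *s)
{
   struct func_ctx *ctx = &s->stack[s->size - 1];
   if (ctx->cond_depth > 0)
      --ctx->cond_depth;
}
```

A caller that has exhausted its own conditional budget can still call a subroutine, and that subroutine starts counting from zero, which is the difference from a single shared stack.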

  +   ctx->loop_stack[ctx->loop_stack_size].loop_block = ctx->loop_block;
  +   ctx->loop_stack[ctx->loop_stack_size].cont_mask = mask->cont_mask;
  +   ctx->loop_stack[ctx->loop_stack_size].break_mask = mask->break_mask;
 I am confused why some assignments use the variables from ctx and some
 from mask here.

The masks are in general global (the 'call' opcode could have been inside 
switches, loops and/or conditionals), so the function contexts push/pop the 
global masks and need to operate on those. Things that are not masks are in 
general per-function-context, which means that we can just store them in 
function contexts.

 As mentioned inline, I don't quite get when the values from mask or ctx
 are used. This might well be correct as this is tricky stuff and the
 diff is difficult to understand.
 Otherwise this looks good to me. Would that also help when we'd switch
 to not always inline all functions?

Yea, I think it would, but then we would need a lot of other changes (storing 
those masks in some struct inside some global object that each function can 
reference). But yea, this code isn't the cleanest, but function calls, 
conditionals, loops and switches are inherently difficult in SoA mode, so 
there's not a lot we can do. We need to store the nesting data inside something 
that resembles a function context because d3d is very clear that that's what it 
wants, so anything else will be a hack where we just try to imitate that 
behavior, and that's going to be uglier than this code.

z

[Mesa-dev] [PATCH] gallivm: fix opcode and function nesting

2014-01-28 Thread Zack Rusin

gallivm soa code supported only a single level of nesting for
control flow opcodes (if, switch, loops...) but the d3d10 spec
clearly states that those are nested within functions. To support
nesting of conditionals inside functions we need to store the
nesting data inside function contexts and keep a stack of those.
Furthermore we make sure that if nesting for subroutines is deeper
than 32 then we simply ignore all subsequent 'call' invocations.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h |  72 ++---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 377 
 2 files changed, 292 insertions(+), 157 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
index 4f988b8..839ab85 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
@@ -260,49 +260,51 @@ struct lp_exec_mask {
 
LLVMTypeRef int_vec_type;
 
-   LLVMValueRef cond_stack[LP_MAX_TGSI_NESTING];
-   int cond_stack_size;
-   LLVMValueRef cond_mask;
-
-   /* keep track if break belongs to switch or loop */
-   enum lp_exec_mask_break_type break_type_stack[LP_MAX_TGSI_NESTING];
-   enum lp_exec_mask_break_type break_type;
+   LLVMValueRef exec_mask;
 
-   struct {
-  LLVMValueRef switch_val;
-  LLVMValueRef switch_mask;
-  LLVMValueRef switch_mask_default;
-  boolean switch_in_default;
-  unsigned switch_pc;
-   } switch_stack[LP_MAX_TGSI_NESTING];
-   int switch_stack_size;
-   LLVMValueRef switch_val;
+   LLVMValueRef ret_mask;
+   LLVMValueRef cond_mask;
LLVMValueRef switch_mask; /* current switch exec mask */
-   LLVMValueRef switch_mask_default; /* reverse of switch mask used for 
default */
-   boolean switch_in_default;/* if switch exec is currently in default 
*/
-   unsigned switch_pc;   /* when used points to default or 
endswitch-1 */
-
-   LLVMBasicBlockRef loop_block;
LLVMValueRef cont_mask;
LLVMValueRef break_mask;
-   LLVMValueRef break_var;
-   struct {
-  LLVMBasicBlockRef loop_block;
-  LLVMValueRef cont_mask;
-  LLVMValueRef break_mask;
-  LLVMValueRef break_var;
-   } loop_stack[LP_MAX_TGSI_NESTING];
-   int loop_stack_size;
 
-   LLVMValueRef ret_mask;
-   struct {
+   struct function_ctx {
   int pc;
   LLVMValueRef ret_mask;
-   } call_stack[LP_MAX_TGSI_NESTING];
-   int call_stack_size;
 
-   LLVMValueRef exec_mask;
-   LLVMValueRef loop_limiter;
+  LLVMValueRef cond_stack[LP_MAX_TGSI_NESTING];
+  int cond_stack_size;
+
+  /* keep track if break belongs to switch or loop */
+  enum lp_exec_mask_break_type break_type_stack[LP_MAX_TGSI_NESTING];
+  enum lp_exec_mask_break_type break_type;
+
+  struct {
+ LLVMValueRef switch_val;
+ LLVMValueRef switch_mask;
+ LLVMValueRef switch_mask_default;
+ boolean switch_in_default;
+ unsigned switch_pc;
+  } switch_stack[LP_MAX_TGSI_NESTING];
+  int switch_stack_size;
+  LLVMValueRef switch_val;
+  LLVMValueRef switch_mask_default; /* reverse of switch mask used for 
default */
+  boolean switch_in_default;/* if switch exec is currently in 
default */
+  unsigned switch_pc;   /* when used points to default or 
endswitch-1 */
+
+  LLVMValueRef loop_limiter;
+  LLVMBasicBlockRef loop_block;
+  LLVMValueRef break_var;
+  struct {
+ LLVMBasicBlockRef loop_block;
+ LLVMValueRef cont_mask;
+ LLVMValueRef break_mask;
+ LLVMValueRef break_var;
+  } loop_stack[LP_MAX_TGSI_NESTING];
+  int loop_stack_size;
+
+   } *function_stack;
+   int function_stack_size;
 };
 
 struct lp_build_tgsi_inst_list
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index f01b50c..52e1b51 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -66,6 +66,10 @@
 #include "lp_bld_sample.h"
 #include "lp_bld_struct.h"
 
+/* SM 4.0 says that subroutines can nest 32 deep and 
+ * we need one more for our main function */
+#define LP_MAX_NUM_FUNCS 33
+
 #define DUMP_GS_EMITS 0
 
 /*
@@ -98,38 +102,108 @@ emit_dump_reg(struct gallivm_state *gallivm,
lp_build_print_value(gallivm, buf, value);
 }
 
+static INLINE struct function_ctx *
+func_ctx(struct lp_exec_mask *mask)
+{
+   assert(mask->function_stack_size > 0);
+   assert(mask->function_stack_size <= LP_MAX_NUM_FUNCS);
+   return &mask->function_stack[mask->function_stack_size - 1];
+}
 
-static void lp_exec_mask_init(struct lp_exec_mask *mask, struct 
lp_build_context *bld)
+static INLINE boolean
+mask_has_loop(struct lp_exec_mask *mask)
 {
-   LLVMTypeRef int_type = LLVMInt32TypeInContext(bld->gallivm->context);
-   LLVMBuilderRef builder = bld->gallivm->builder;
+   int i;
+   for (i = mask->function_stack_size - 1; i >= 0; --i

[Mesa-dev] [PATCH] llvmpipe: fix possible constant buffer overflow

2014-01-15 Thread Zack Rusin

It's possible to bind a smaller buffer as a constant buffer, than
what the shader actually uses/requires. This could cause nasty
crashes. This patch adds the architecture to pass the maximum
allowable constant buffer index to the jit to let it make
sure that the constant buffer indices are always within bounds.
The behavior follows the d3d10 spec, which says the overflow
should always return all zeros, and overflow is only defined
as access beyond the size of the currently bound buffer. Accesses
beyond the declared shader constant register size are not
considered an overflow and expected to return garbage but consistent
garbage (we follow the behavior which some wlk tests expect which
is to return the actual values from the bound buffer).

Signed-off-by: Zack Rusin za...@vmware.com
---
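Note for readers of the archive: the behavior described above (reads past the end of the bound buffer return zero) amounts to a per-lane bounds check followed by a select. The sketch below shows it in scalar C. The patch itself generates the equivalent LLVM IR in the JIT, and `fetch_constants`, `LANES` and the choice to count `num_consts` in float elements are assumptions of this example.

```c
#include <assert.h>

#define LANES 4   /* hypothetical SoA vector width */

/*
 * Fetch one constant per lane. Lanes whose index is beyond the bound
 * buffer get 0.0f instead of reading out of bounds.
 */
static void
fetch_constants(const float *buf, unsigned num_consts,
                const unsigned index[LANES], float out[LANES])
{
   for (unsigned lane = 0; lane < LANES; ++lane) {
      /* per-lane overflow test followed by a select, in scalar form */
      int in_bounds = index[lane] < num_consts;
      out[lane] = in_bounds ? buf[index[lane]] : 0.0f;
   }
}
```

Because the check is against the size of the bound buffer and not the size declared in the shader, an index that is legal for the bound buffer still returns the real contents even if the shader declared fewer constants.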
 src/gallium/auxiliary/draw/draw_llvm.c | 42 ++
 src/gallium/auxiliary/draw/draw_llvm.h | 32 +---
 .../draw/draw_pt_fetch_shade_pipeline_llvm.c   |  6 ++
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h|  2 +
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c| 89 ++
 src/gallium/drivers/llvmpipe/lp_jit.c  |  7 +-
 src/gallium/drivers/llvmpipe/lp_jit.h  |  5 ++
 src/gallium/drivers/llvmpipe/lp_setup.c|  7 +-
 src/gallium/drivers/llvmpipe/lp_state_fs.c |  5 +-
 9 files changed, 152 insertions(+), 43 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index 331039a..0bbb680 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -242,17 +242,20 @@ create_jit_context_type(struct gallivm_state *gallivm,
 {
LLVMTargetDataRef target = gallivm->target;
LLVMTypeRef float_type = LLVMFloatTypeInContext(gallivm->context);
+   LLVMTypeRef int_type = LLVMInt32TypeInContext(gallivm->context);
LLVMTypeRef elem_types[DRAW_JIT_CTX_NUM_FIELDS];
LLVMTypeRef context_type;
 
elem_types[0] = LLVMArrayType(LLVMPointerType(float_type, 0), /* 
vs_constants */
  LP_MAX_TGSI_CONST_BUFFERS);
-   elem_types[1] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
+   elem_types[1] = LLVMArrayType(int_type, /* num_vs_constants */
+ LP_MAX_TGSI_CONST_BUFFERS);
+   elem_types[2] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
  DRAW_TOTAL_CLIP_PLANES), 0);
-   elem_types[2] = LLVMPointerType(float_type, 0); /* viewport */
-   elem_types[3] = LLVMArrayType(texture_type,
+   elem_types[3] = LLVMPointerType(float_type, 0); /* viewport */
+   elem_types[4] = LLVMArrayType(texture_type,
  PIPE_MAX_SHADER_SAMPLER_VIEWS); /* textures */
-   elem_types[4] = LLVMArrayType(sampler_type,
+   elem_types[5] = LLVMArrayType(sampler_type,
  PIPE_MAX_SAMPLERS); /* samplers */
context_type = LLVMStructTypeInContext(gallivm->context, elem_types,
   Elements(elem_types), 0);
@@ -264,6 +267,8 @@ create_jit_context_type(struct gallivm_state *gallivm,
 
LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, vs_constants,
   target, context_type, DRAW_JIT_CTX_CONSTANTS);
+   LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, num_vs_constants,
+  target, context_type, DRAW_JIT_CTX_NUM_CONSTANTS);
LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, planes,
   target, context_type, DRAW_JIT_CTX_PLANES);
LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, viewport,
@@ -298,20 +303,22 @@ create_gs_jit_context_type(struct gallivm_state *gallivm,
 
elem_types[0] = LLVMArrayType(LLVMPointerType(float_type, 0), /* constants 
*/
  LP_MAX_TGSI_CONST_BUFFERS);
-   elem_types[1] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
+   elem_types[1] = LLVMArrayType(int_type, /* num_constants */
+ LP_MAX_TGSI_CONST_BUFFERS);
+   elem_types[2] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
  DRAW_TOTAL_CLIP_PLANES), 0);
-   elem_types[2] = LLVMPointerType(float_type, 0); /* viewport */
+   elem_types[3] = LLVMPointerType(float_type, 0); /* viewport */
 
-   elem_types[3] = LLVMArrayType(texture_type,
+   elem_types[4] = LLVMArrayType(texture_type,
  PIPE_MAX_SHADER_SAMPLER_VIEWS); /* textures */
-   elem_types[4] = LLVMArrayType(sampler_type,
+   elem_types[5] = LLVMArrayType(sampler_type,
  PIPE_MAX_SAMPLERS); /* samplers */

-   elem_types[5] = LLVMPointerType(LLVMPointerType(int_type, 0), 0);
-   elem_types[6] = LLVMPointerType(LLVMVectorType(int_type,
-  vector_length), 0);
+   elem_types[6] = LLVMPointerType

Re: [Mesa-dev] [PATCH] llvmpipe: fix primitive input to geom shaders

2014-01-07 Thread Zack Rusin

Yea, this sucks. Geometry shaders can take primitive id (system value) for 
passed in primitives and generate one (semantic) for primitives generated in 
the geometry shader. TBH, I thought we already handled it... Maybe wlk doesn't 
test it, we'll see if it regresses.

z

- Original Message -
 Well we were using a system value for prim id in gs, hence this was not
 necessary. I'm always confused though about system value / normal
 semantic usage though, Zack might know better.
 
 Roland
 
 Am 07.01.2014 09:55, schrieb Dave Airlie:
  Not sure this is 100% the correct way to do this, since it may be a change
  at the glsl-tgsi level that is required, either way open discussions!
  
  fixes piglit
  tests/spec/glsl-1.50/execution/geometry/primitive-id-in.shader_test
  with llvmpipe with fake MSAA
  
  Signed-off-by: Dave Airlie airl...@redhat.com
  ---
   src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 5 +
   src/gallium/auxiliary/tgsi/tgsi_scan.c  | 3 +++
   2 files changed, 8 insertions(+)
  
  diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
  b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
  index 6d8dc8c..de2c64f 100644
  --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
  +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
  @@ -1173,6 +1173,10 @@ emit_fetch_gs_input(
  LLVMValueRef swizzle_index = lp_build_const_int32(gallivm, swizzle);
  LLVMValueRef res;
   
  +   if (bld_base->info->input_semantic_name[reg->Register.Index] ==
  TGSI_SEMANTIC_PRIMID) {
  +  res = bld->system_values.prim_id;
  +  goto out;
  +   }
  if (reg->Register.Indirect) {
 attrib_index = get_indirect_index(bld,
   reg->Register.File,
  @@ -1200,6 +1204,7 @@ emit_fetch_gs_input(
   
  assert(res);
   
  + out:
  if (stype == TGSI_TYPE_UNSIGNED) {
 res = LLVMBuildBitCast(builder, res, bld_base->uint_bld.vec_type,
 "");
  } else if (stype == TGSI_TYPE_SIGNED) {
  diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.c
  b/src/gallium/auxiliary/tgsi/tgsi_scan.c
  index 0f10556..ce1f7b6 100644
  --- a/src/gallium/auxiliary/tgsi/tgsi_scan.c
  +++ b/src/gallium/auxiliary/tgsi/tgsi_scan.c
  @@ -198,6 +198,9 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
   info->uses_primid = TRUE;
else if (semName == TGSI_SEMANTIC_FACE)
   info->uses_frontface = TRUE;
  +  } else if (procType == TGSI_PROCESSOR_GEOMETRY) {
  + if (semName == TGSI_SEMANTIC_PRIMID)
  +info->uses_primid = TRUE;
 }
  }
  else if (file == TGSI_FILE_SYSTEM_VALUE) {
  
 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] llvmpipe: fix possible constant buffer overflow

2013-12-18 Thread Zack Rusin

It's possible to bind a buffer as a constant buffer that is smaller than
what the shader actually uses/requires. This could cause nasty
crashes. This patch adds the infrastructure to pass the maximum
allowable constant buffer index to the JIT so that it can make
sure that the constant buffer indices are always within bounds.
Currently this is only used for indirect addressing.
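
A minimal sketch of the idea, not the code from this patch: each constant
buffer carries a maximum valid index next to its data pointer, and an
indirect fetch checks the index before reading. The struct and function
names below are hypothetical, and returning 0.0f for an out-of-range index
is an assumption of the sketch; the patch performs the check in the
JIT-generated code rather than in plain C.

```c
#include <stddef.h>

/* Hypothetical stand-in for one bound constant buffer: the data pointer
 * plus the highest float index that may legally be read from it. */
struct const_buffer {
   const float *data;
   int max_index;   /* -1 when nothing is bound */
};

/* Fetch element `index`, returning 0.0f instead of reading out of bounds.
 * The zero fallback is an assumption of this sketch. */
static float
fetch_constant(const struct const_buffer *buf, int index)
{
   if (buf == NULL || buf->data == NULL ||
       index < 0 || index > buf->max_index)
      return 0.0f;
   return buf->data[index];
}
```

With a four-float buffer and max_index set to 3, indices 0..3 read the
buffer and anything else yields 0.0f instead of touching unrelated memory.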

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_llvm.c | 42 +++---
 src/gallium/auxiliary/draw/draw_llvm.h | 32 +++--
 .../draw/draw_pt_fetch_shade_pipeline_llvm.c   |  6 
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h|  2 ++
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c| 14 +++-
 src/gallium/drivers/llvmpipe/lp_jit.c  |  7 +++-
 src/gallium/drivers/llvmpipe/lp_jit.h  |  5 +++
 src/gallium/drivers/llvmpipe/lp_setup.c|  7 +++-
 src/gallium/drivers/llvmpipe/lp_state_fs.c |  6 ++--
 9 files changed, 92 insertions(+), 29 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index 71cc45f..e5a3842 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -242,17 +242,20 @@ create_jit_context_type(struct gallivm_state *gallivm,
 {
LLVMTargetDataRef target = gallivm->target;
LLVMTypeRef float_type = LLVMFloatTypeInContext(gallivm->context);
+   LLVMTypeRef int_type = LLVMInt32TypeInContext(gallivm->context);
LLVMTypeRef elem_types[DRAW_JIT_CTX_NUM_FIELDS];
LLVMTypeRef context_type;
 
elem_types[0] = LLVMArrayType(LLVMPointerType(float_type, 0), /* 
vs_constants */
  LP_MAX_TGSI_CONST_BUFFERS);
-   elem_types[1] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
+   elem_types[1] = LLVMArrayType(int_type, /* vs_constants_max_index */
+ LP_MAX_TGSI_CONST_BUFFERS);
+   elem_types[2] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
  DRAW_TOTAL_CLIP_PLANES), 0);
-   elem_types[2] = LLVMPointerType(float_type, 0); /* viewport */
-   elem_types[3] = LLVMArrayType(texture_type,
+   elem_types[3] = LLVMPointerType(float_type, 0); /* viewport */
+   elem_types[4] = LLVMArrayType(texture_type,
  PIPE_MAX_SHADER_SAMPLER_VIEWS); /* textures */
-   elem_types[4] = LLVMArrayType(sampler_type,
+   elem_types[5] = LLVMArrayType(sampler_type,
  PIPE_MAX_SAMPLERS); /* samplers */
context_type = LLVMStructTypeInContext(gallivm->context, elem_types,
   Elements(elem_types), 0);
@@ -264,6 +267,8 @@ create_jit_context_type(struct gallivm_state *gallivm,
 
LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, vs_constants,
   target, context_type, DRAW_JIT_CTX_CONSTANTS);
+   LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, vs_constants_max_index,
+  target, context_type, 
DRAW_JIT_CTX_CONSTANTS_MAX_INDEX);
LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, planes,
   target, context_type, DRAW_JIT_CTX_PLANES);
LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, viewport,
@@ -298,20 +303,22 @@ create_gs_jit_context_type(struct gallivm_state *gallivm,
 
elem_types[0] = LLVMArrayType(LLVMPointerType(float_type, 0), /* constants 
*/
  LP_MAX_TGSI_CONST_BUFFERS);
-   elem_types[1] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
+   elem_types[1] = LLVMArrayType(int_type, /* constants_max_index */
+ LP_MAX_TGSI_CONST_BUFFERS);
+   elem_types[2] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
  DRAW_TOTAL_CLIP_PLANES), 0);
-   elem_types[2] = LLVMPointerType(float_type, 0); /* viewport */
+   elem_types[3] = LLVMPointerType(float_type, 0); /* viewport */
 
-   elem_types[3] = LLVMArrayType(texture_type,
+   elem_types[4] = LLVMArrayType(texture_type,
  PIPE_MAX_SHADER_SAMPLER_VIEWS); /* textures */
-   elem_types[4] = LLVMArrayType(sampler_type,
+   elem_types[5] = LLVMArrayType(sampler_type,
  PIPE_MAX_SAMPLERS); /* samplers */

-   elem_types[5] = LLVMPointerType(LLVMPointerType(int_type, 0), 0);
-   elem_types[6] = LLVMPointerType(LLVMVectorType(int_type,
-  vector_length), 0);
+   elem_types[6] = LLVMPointerType(LLVMPointerType(int_type, 0), 0);
elem_types[7] = LLVMPointerType(LLVMVectorType(int_type,
   vector_length), 0);
+   elem_types[8] = LLVMPointerType(LLVMVectorType(int_type,
+  vector_length), 0);
 
context_type = LLVMStructTypeInContext(gallivm

Re: [Mesa-dev] [PATCH] gallivm: fix pointer type for stmxcsr/ldmxcsr

2013-12-13 Thread Zack Rusin

Looks good. Thanks Roland!

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 The argument is an i8 pointer, not an i32 pointer (even though the value
 actually stored/loaded IS i32). Older LLVM versions didn't care, but 3.2 and
 newer do, leading to crashes.
 ---
  src/gallium/auxiliary/gallivm/lp_bld_arit.c |9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)
 
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
 b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
 index 440dd0b..e516ae8 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
 @@ -3510,10 +3510,12 @@ lp_build_fpstate_get(struct gallivm_state *gallivm)
   gallivm,
   LLVMInt32TypeInContext(gallivm->context),
   "mxcsr_ptr");
 +  LLVMValueRef mxcsr_ptr8 = LLVMBuildPointerCast(builder, mxcsr_ptr,
 +  LLVMPointerType(LLVMInt8TypeInContext(gallivm->context), 0), "");
lp_build_intrinsic(builder,
   "llvm.x86.sse.stmxcsr",
   LLVMVoidTypeInContext(gallivm->context),
 - &mxcsr_ptr, 1);
 + &mxcsr_ptr8, 1);
return mxcsr_ptr;
 }
 return 0;
 @@ -3554,7 +3556,10 @@ lp_build_fpstate_set(struct gallivm_state *gallivm,
   LLVMValueRef mxcsr_ptr)
  {
 if (util_cpu_caps.has_sse) {
 -  lp_build_intrinsic(gallivm->builder,
 +  LLVMBuilderRef builder = gallivm->builder;
 +  mxcsr_ptr = LLVMBuildPointerCast(builder, mxcsr_ptr,
 +
 LLVMPointerType(LLVMInt8TypeInContext(gallivm->context),
 0), "");
 +  lp_build_intrinsic(builder,
   "llvm.x86.sse.ldmxcsr",
   LLVMVoidTypeInContext(gallivm->context),
   &mxcsr_ptr, 1);
 --
 1.7.9.5
 

Re: [Mesa-dev] [PATCH] llvmpipe: (trivial) get rid of triangle subdivision code

2013-12-12 Thread Zack Rusin

Ah, good stuff, very sensual and does not need more cowbell.

Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 This code was always problematic, and with 64bit rasterization we no longer
 need it at all.
 ---
  src/gallium/drivers/llvmpipe/lp_setup.c |8 +-
  src/gallium/drivers/llvmpipe/lp_setup_context.h |1 -
  src/gallium/drivers/llvmpipe/lp_setup_tri.c |  174
  ---
  3 files changed, 1 insertion(+), 182 deletions(-)
 
 diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c
 b/src/gallium/drivers/llvmpipe/lp_setup.c
 index 49962af..2fad469 100644
 --- a/src/gallium/drivers/llvmpipe/lp_setup.c
 +++ b/src/gallium/drivers/llvmpipe/lp_setup.c
 @@ -1081,14 +1081,8 @@ try_update_scene_state( struct lp_setup_context *setup
 )
   setup-draw_regions[i]);
   }
}
 -  /*
 -   * Subdivide triangles if the framebuffer is larger than the
 -   * MAX_FIXED_LENGTH.
 -   */
 -  setup->subdivide_large_triangles = (setup->fb.width > MAX_FIXED_LENGTH
 ||
 -  setup->fb.height >
 MAX_FIXED_LENGTH);
 }
 -
 +
 setup-dirty = 0;
  
 assert(setup-fs.stored);
 diff --git a/src/gallium/drivers/llvmpipe/lp_setup_context.h
 b/src/gallium/drivers/llvmpipe/lp_setup_context.h
 index 8bb95c1..b3fb24a 100644
 --- a/src/gallium/drivers/llvmpipe/lp_setup_context.h
 +++ b/src/gallium/drivers/llvmpipe/lp_setup_context.h
 @@ -93,7 +93,6 @@ struct lp_setup_context
 struct llvmpipe_query *active_queries[LP_MAX_ACTIVE_BINNED_QUERIES];
 unsigned active_binned_queries;
  
 -   boolean subdivide_large_triangles;
 boolean flatshade_first;
 boolean ccw_is_frontface;
 boolean scissor_test;
 diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
 b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
 index e22f14c..ce3a0a7 100644
 --- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
 +++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
 @@ -921,168 +921,6 @@ rotate_fixed_position_12( struct fixed_position*
 position )
  }
  
  
 -typedef void (*triangle_func_t)(struct lp_setup_context *setup,
 -const float (*v0)[4],
 -const float (*v1)[4],
 -const float (*v2)[4]);
 -
 -
 -/**
 - * Subdivide this triangle by bisecting edge (v0, v1).
 - * \param pv  the provoking vertex (must = v0 or v1 or v2)
 - * TODO: should probably think about non-overflowing arithmetic elsewhere.
 - * This will definitely screw with pipeline counters for instance.
 - */
 -static void
 -subdiv_tri(struct lp_setup_context *setup,
 -   const float (*v0)[4],
 -   const float (*v1)[4],
 -   const float (*v2)[4],
 -   const float (*pv)[4],
 -   triangle_func_t tri)
 -{
 -   unsigned n = setup->fs.current.variant->shader->info.base.num_inputs + 1;
 -   const struct lp_shader_input *inputs =
 -  setup->fs.current.variant->shader->inputs;
 -   PIPE_ALIGN_VAR(LP_MIN_VECTOR_ALIGN) float vmid[PIPE_MAX_ATTRIBS][4];
 -   const float (*vm)[4] = (const float (*)[4]) vmid;
 -   unsigned i;
 -   float w0, w1, wm;
 -   boolean flatshade = setup->fs.current.variant->key.flatshade;
 -
 -   /* find position midpoint (attrib[0] = position) */
 -   vmid[0][0] = 0.5f * (v1[0][0] + v0[0][0]);
 -   vmid[0][1] = 0.5f * (v1[0][1] + v0[0][1]);
 -   vmid[0][2] = 0.5f * (v1[0][2] + v0[0][2]);
 -   vmid[0][3] = 0.5f * (v1[0][3] + v0[0][3]);
 -
 -   w0 = v0[0][3];
 -   w1 = v1[0][3];
 -   wm = vmid[0][3];
 -
 -   /* interpolate other attributes */
 -   for (i = 1; i < n; i++) {
 -  if ((inputs[i - 1].interp == LP_INTERP_COLOR && flatshade) ||
 -  inputs[i - 1].interp == LP_INTERP_CONSTANT) {
 - /* copy the provoking vertex's attribute */
 - vmid[i][0] = pv[i][0];
 - vmid[i][1] = pv[i][1];
 - vmid[i][2] = pv[i][2];
 - vmid[i][3] = pv[i][3];
 -  }
 -  else {
 - /* interpolate with perspective correction (for linear too) */
 - vmid[i][0] = 0.5f * (v1[i][0] * w1 + v0[i][0] * w0) / wm;
 - vmid[i][1] = 0.5f * (v1[i][1] * w1 + v0[i][1] * w0) / wm;
 - vmid[i][2] = 0.5f * (v1[i][2] * w1 + v0[i][2] * w0) / wm;
 - vmid[i][3] = 0.5f * (v1[i][3] * w1 + v0[i][3] * w0) / wm;
 -  }
 -   }
 -
 -   /* handling flat shading and first vs. last provoking vertex is a
 -* little tricky...
 -*/
 -   if (pv == v0) {
 -  if (setup->flatshade_first) {
 - /* first vertex must be v0 or vm */
 - tri(setup, v0, vm, v2);
 - tri(setup, vm, v1, v2);
 -  }
 -  else {
 - /* last vertex must be v0 or vm */
 - tri(setup, vm, v2, v0);
 - tri(setup, v1, v2, vm);
 -  }
 -   }
 -   else if (pv == v1) {
 -  if (setup->flatshade_first) {
 - tri(setup, vm, v2, v0);
 - tri(setup, v1, v2, vm

[Mesa-dev] [PATCH] llvmpipe: fix blending with half-float formats

2013-12-09 Thread Zack Rusin

The fact that we flush denorms to zero breaks our half-float
conversion and blending. This patch enables denorms for
blending. It's a little tricky due to the LLVM bug that makes
it incorrectly reorder the mxcsr intrinsics:
http://llvm.org/bugs/show_bug.cgi?id=6393
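
For reference, the two MXCSR bits involved can be manipulated with plain
integer arithmetic. The sketch below is illustrative only: the helper and
macro names are hypothetical, and it works on a copy of the register value
rather than on the real MXCSR. FTZ is bit 15 (0x8000) and DAZ is bit 6
(0x0040), so setting both ORs in 0x8040 (32832).

```c
#include <stdint.h>

#define MXCSR_DAZ 0x0040u  /* denormals-are-zero, bit 6  */
#define MXCSR_FTZ 0x8000u  /* flush-to-zero,      bit 15 */

/* Return `mxcsr` with flush-to-zero behaviour switched on (zero != 0) or
 * off (zero == 0). DAZ is only touched when the CPU supports it. */
static uint32_t
mxcsr_set_denorms_zero(uint32_t mxcsr, int zero, int has_daz)
{
   uint32_t mask = MXCSR_FTZ;
   if (has_daz)
      mask |= MXCSR_DAZ;
   return zero ? (mxcsr | mask) : (mxcsr & ~mask);
}
```

Starting from the usual power-on value 0x1F80, enabling both bits gives
0x9FC0, and clearing them again restores 0x1F80.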

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/gallivm/lp_bld_arit.c | 67 +
 src/gallium/auxiliary/gallivm/lp_bld_arit.h | 11 +
 src/gallium/drivers/llvmpipe/lp_state_fs.c  | 31 ++---
 3 files changed, 104 insertions(+), 5 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c 
b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
index 70929e7..47e778c 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
@@ -64,6 +64,13 @@
 #include "lp_bld_arit.h"
 #include "lp_bld_flow.h"
 
+#if defined(PIPE_ARCH_SSE)
+#include <xmmintrin.h>
+/* This is defined in pmmintrin.h, but it can only be included when -msse3 is
+ * used, so just define it here to avoid further. */
+#define _MM_DENORMALS_ZERO_MASK 0x0040
+#endif
+
 
 #define EXP_POLY_DEGREE 5
 
@@ -3489,3 +3496,63 @@ lp_build_is_inf_or_nan(struct gallivm_state *gallivm,
return ret;
 }
 
+
+LLVMValueRef
+lp_build_fpstate_get(struct gallivm_state *gallivm)
+{
+   if (util_cpu_caps.has_sse) {
+  LLVMBuilderRef builder = gallivm->builder;
+  LLVMValueRef mxcsr_ptr = lp_build_alloca(
+ gallivm,
+ LLVMInt32TypeInContext(gallivm->context),
+ "mxcsr_ptr");
+  lp_build_intrinsic(builder,
+ "llvm.x86.sse.stmxcsr",
+ LLVMVoidTypeInContext(gallivm->context),
+ &mxcsr_ptr, 1);
+  return mxcsr_ptr;
+   }
+   return 0;
+}
+
+void
+lp_build_fpstate_set_denorms_zero(struct gallivm_state *gallivm,
+  boolean zero)
+{
+   if (util_cpu_caps.has_sse) {
+  /* turn on DAZ (64) | FTZ (32768) = 32832 if available */
+  int daz_ftz = _MM_FLUSH_ZERO_MASK;
+
+  LLVMBuilderRef builder = gallivm->builder;
+  LLVMValueRef mxcsr_ptr = lp_build_fpstate_get(gallivm);
+  LLVMValueRef mxcsr =
+ LLVMBuildLoad(builder, mxcsr_ptr, "mxcsr");
+
+  if (util_cpu_caps.has_daz) {
+ /* Enable denormals are zero mode */
+ daz_ftz |= _MM_DENORMALS_ZERO_MASK;
+  }
+  if (zero) {
+ mxcsr = LLVMBuildOr(builder, mxcsr,
+ LLVMConstInt(LLVMTypeOf(mxcsr), daz_ftz, 0), "");
+  } else {
+ mxcsr = LLVMBuildAnd(builder, mxcsr,
+  LLVMConstInt(LLVMTypeOf(mxcsr), ~daz_ftz, 0), 
"");
+  }
+
+  LLVMBuildStore(builder, mxcsr, mxcsr_ptr);
+  lp_build_fpstate_set(gallivm, mxcsr_ptr);
+   }
+}
+
+void
+lp_build_fpstate_set(struct gallivm_state *gallivm,
+ LLVMValueRef mxcsr_ptr)
+{
+   if (util_cpu_caps.has_sse) {
+  lp_build_intrinsic(gallivm->builder,
+ "llvm.x86.sse.ldmxcsr",
+ LLVMVoidTypeInContext(gallivm->context),
+ &mxcsr_ptr, 1);
+   }
+}
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.h 
b/src/gallium/auxiliary/gallivm/lp_bld_arit.h
index 75bf89e..9d29093 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_arit.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.h
@@ -358,4 +358,15 @@ lp_build_is_inf_or_nan(struct gallivm_state *gallivm,
const struct lp_type type,
LLVMValueRef x);
 
+
+LLVMValueRef
+lp_build_fpstate_get(struct gallivm_state *gallivm);
+
+void
+lp_build_fpstate_set_denorms_zero(struct gallivm_state *gallivm,
+  boolean zero);
+void
+lp_build_fpstate_set(struct gallivm_state *gallivm,
+ LLVMValueRef mxcsr);
+
 #endif /* !LP_BLD_ARIT_H */
diff --git a/src/gallium/drivers/llvmpipe/lp_state_fs.c 
b/src/gallium/drivers/llvmpipe/lp_state_fs.c
index b5816e0..d0fdc80 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_fs.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_fs.c
@@ -1490,6 +1490,28 @@ generate_unswizzled_blend(struct gallivm_state *gallivm,
 
const boolean is_1d = variant-key.resource_1d;
unsigned num_fullblock_fs = is_1d ? 2 * num_fs : num_fs;
+   LLVMValueRef fpstate = 0;
+
+   /* Get type from output format */
+   lp_blend_type_from_format_desc(out_format_desc, &row_type);
+   lp_mem_type_from_format_desc(out_format_desc, &dst_type);
+
+   /*
+* Technically this code should go into lp_build_smallfloat_to_float
+* and lp_build_float_to_smallfloat but due to the
+* http://llvm.org/bugs/show_bug.cgi?id=6393
+* llvm reorders the mxcsr intrinsics in a way that breaks the code.
+* So the ordering is important here and there shouldn't be any
+* llvm ir instructions in this function before
+* this, otherwise half-float format conversions won't work
+* (again due to llvm bug #6393).
+*/
+   if (dst_type.floating

[Mesa-dev] [PATCH 2/2] llvmpipe: add a very useful (disabled) debugging output

2013-12-09 Thread Zack Rusin

Disabled by default, but it's very useful when needed.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_setup_point.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/src/gallium/drivers/llvmpipe/lp_setup_point.c 
b/src/gallium/drivers/llvmpipe/lp_setup_point.c
index 4b31495..c42646e 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_point.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_point.c
@@ -302,6 +302,23 @@ subpixel_snap(float a)
return util_iround(FIXED_ONE * a);
 }
 
+/**
+ * Print point vertex attribs (for debug).
+ */
+static void
+print_point(struct lp_setup_context *setup,
+const float (*v0)[4])
+{
+   const struct lp_setup_variant_key *key = &setup->setup.variant->key;
+   uint i;
+
+   debug_printf("llvmpipe point\n");
+   for (i = 0; i < 1 + key->num_inputs; i++) {
+  debug_printf("  v0[%d]:  %f %f %f %f\n", i,
+   v0[i][0], v0[i][1], v0[i][2], v0[i][3]);
+   }
+}
+
 
 static boolean
 try_setup_point( struct lp_setup_context *setup,
@@ -342,6 +359,9 @@ try_setup_point( struct lp_setup_context *setup,
   layer = MIN2(layer, scene->fb_max_layer);
}
 
+   if (0)
+  print_point(setup, v0);
+
/* Bounding rectangle (in pixels) */
{
   /* Yes this is necessary to accurately calculate bounding boxes
-- 
1.8.3.2

[Mesa-dev] [PATCH 1/2] draw: fix vbuf caching of vertices with inject front face

2013-12-09 Thread Zack Rusin

Caching in the vbuf module meant that once a vertex had been
emitted it was cached, but it's possible for a vertex at the
same location to be emitted again, this time with a different
front-face semantic. Caching was causing the first version of the
vertex to be emitted, which resulted in the renderer getting
incorrect front-face attributes. By resetting the vertex_id (which
is used for caching) we make sure that once front-face info
has been injected the vertex will end up getting emitted.
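
The caching behaviour described above can be sketched in isolation. This
is not the draw module's code: the structs and function names are
hypothetical simplifications, and the sentinel value 0xffff is an
assumption of the sketch. It only illustrates why resetting the id forces
a second emission.

```c
#include <stdint.h>

#define UNDEFINED_VERTEX_ID 0xffff  /* sentinel: "not emitted yet" (assumed value) */

struct vertex {
   uint16_t vertex_id;   /* slot in the output buffer, or the sentinel */
   float front_face;
};

struct emitter {
   unsigned emitted;     /* how many vertices were actually written out */
   uint16_t next_id;     /* next free output slot */
};

/* Emit `v` unless it already carries a valid id from an earlier emission. */
static uint16_t
emit_vertex(struct emitter *em, struct vertex *v)
{
   if (v->vertex_id == UNDEFINED_VERTEX_ID) {
      v->vertex_id = em->next_id++;
      em->emitted++;
   }
   return v->vertex_id;
}

/* Changing an attribute invalidates the cached id; without the reset the
 * stale, already-emitted copy would be reused. */
static void
inject_front_face(struct vertex *v, float is_front_face)
{
   v->front_face = is_front_face;
   v->vertex_id = UNDEFINED_VERTEX_ID;
}
```

Emitting the same vertex twice writes it once; emitting it again after
inject_front_face() writes a fresh copy with the updated attribute.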

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_pipe_unfilled.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c 
b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
index 8cba07c..4f0326b 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
@@ -81,6 +81,7 @@ inject_front_face_info(struct draw_stage *stage,
   v->data[slot][1] = is_front_face;
   v->data[slot][2] = is_front_face;
   v->data[slot][3] = is_front_face;
+  v->vertex_id = UNDEFINED_VERTEX_ID;
}
 }
 
-- 
1.8.3.2

Re: [Mesa-dev] [PATCH 3/3] gallium/cso: fix sampler / sampler_view counts

2013-11-25 Thread Zack Rusin

The entire series looks good to me.

 Now that it is possible to query drivers for the max sampler view it should
 be safe to increase this without crashing.
 Not entirely convinced this really works correctly though if state trackers
 using non-linked sampler / sampler_views use this.

I'm not sure if I get this. What would be the problem in that case?

Re: [Mesa-dev] [PATCH] llvmpipe: support 8bit subpixel precision

2013-11-21 Thread Zack Rusin

 For me too, other than the fixed_position members, looks good.  Thanks for
 your perseverance on this Zack!

Thanks! OK, attached is a version that makes position and dx/dy 32-bit again; it 
seems to work great. I have a question for you guys: if you run the piglit test
./bin/triangle-rasterization-overdraw -max_size -seed 0xA8402F24 -count 1 -auto
on master, does it fail for you? It fails for me on master, with and without the 
patch. I'm not sure what to make of it; I might have been looking at 
rasterization for too long. Looking at the rendering, it looks correct.

z

From 55c9a288c7ebc37b32bc75526e6de71a838ccaef Mon Sep 17 00:00:00 2001
From: Zack Rusin za...@vmware.com
Date: Thu, 24 Oct 2013 22:05:22 -0400
Subject: [PATCH] llvmpipe: support 8bit subpixel precision

8-bit subpixel precision is required by d3d10 but unfortunately
requires a 64-bit rasterizer. This commit implements
64-bit rasterization with full support for 8-bit subpixel
precision. It's a combination of all individual commits
from the llvmpipe-rast-64 branch.
---
 src/gallium/drivers/llvmpipe/lp_rast.c |  11 ++
 src/gallium/drivers/llvmpipe/lp_rast.h |  47 +--
 src/gallium/drivers/llvmpipe/lp_rast_debug.c   |   6 +-
 src/gallium/drivers/llvmpipe/lp_rast_priv.h|  27 
 src/gallium/drivers/llvmpipe/lp_rast_tri.c | 173 
 src/gallium/drivers/llvmpipe/lp_rast_tri_tmp.h |  56 
 src/gallium/drivers/llvmpipe/lp_setup_line.c   |   2 +-
 src/gallium/drivers/llvmpipe/lp_setup_tri.c| 147 +
 src/gallium/tests/graw/SConscript  |   1 +
 src/gallium/tests/graw/tri-large.c | 174 +
 10 files changed, 496 insertions(+), 148 deletions(-)
 create mode 100644 src/gallium/tests/graw/tri-large.c

diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c b/src/gallium/drivers/llvmpipe/lp_rast.c
index af661e9..0cd62c2 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.c
+++ b/src/gallium/drivers/llvmpipe/lp_rast.c
@@ -589,6 +589,17 @@ static lp_rast_cmd_func dispatch[LP_RAST_OP_MAX] =
lp_rast_begin_query,
lp_rast_end_query,
lp_rast_set_state,
+   lp_rast_triangle_32_1,
+   lp_rast_triangle_32_2,
+   lp_rast_triangle_32_3,
+   lp_rast_triangle_32_4,
+   lp_rast_triangle_32_5,
+   lp_rast_triangle_32_6,
+   lp_rast_triangle_32_7,
+   lp_rast_triangle_32_8,
+   lp_rast_triangle_32_3_4,
+   lp_rast_triangle_32_3_16,
+   lp_rast_triangle_32_4_16
 };
 
 
diff --git a/src/gallium/drivers/llvmpipe/lp_rast.h b/src/gallium/drivers/llvmpipe/lp_rast.h
index 43c598d..b81d94f 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.h
+++ b/src/gallium/drivers/llvmpipe/lp_rast.h
@@ -46,10 +46,11 @@ struct lp_scene;
 struct lp_fence;
 struct cmd_bin;
 
-#define FIXED_TYPE_WIDTH 32
+#define FIXED_TYPE_WIDTH 64
 /** For sub-pixel positioning */
-#define FIXED_ORDER 4
+#define FIXED_ORDER 8
 #define FIXED_ONE (1<<FIXED_ORDER)
+#define FIXED_SHIFT (FIXED_TYPE_WIDTH - 1)
 /** Maximum length of an edge in a primitive in pixels.
  *  If the framebuffer is large we have to think about fixed-point
  *  integer overflow. Coordinates need ((FIXED_TYPE_WIDTH/2) - 1) bits
@@ -59,11 +60,14 @@ struct cmd_bin;
  */
 #define MAX_FIXED_LENGTH (1 << (((FIXED_TYPE_WIDTH/2) - 1) - FIXED_ORDER))
 
+#define MAX_FIXED_LENGTH32 (1 << (((32/2) - 1) - FIXED_ORDER))
+
 /* Rasterizer output size going to jit fs, width/height */
 #define LP_RASTER_BLOCK_SIZE 4
 
 #define LP_MAX_ACTIVE_BINNED_QUERIES 16
 
+#define IMUL64(a, b) (((int64_t)(a)) * ((int64_t)(b)))
 
 struct lp_rasterizer_task;
 
@@ -102,18 +106,15 @@ struct lp_rast_shader_inputs {
/* followed by a0, dadx, dady and planes[] */
 };
 
-/* Note: the order of these values is important as they are loaded by
- * sse code in rasterization:
- */
 struct lp_rast_plane {
/* edge function values at minx,miny ?? */
-   int c;
+   int64_t c;
 
-   int dcdx;
-   int dcdy;
+   int32_t dcdx;
+   int32_t dcdy;
 
/* one-pixel sized trivial reject offsets for each plane */
-   int eo;
+   int64_t eo;
 };
 
 /**
@@ -277,8 +278,19 @@ lp_rast_arg_null( void )
 #define LP_RAST_OP_BEGIN_QUERY   0xf
 #define LP_RAST_OP_END_QUERY 0x10
 #define LP_RAST_OP_SET_STATE 0x11
-
-#define LP_RAST_OP_MAX   0x12
+#define LP_RAST_OP_TRIANGLE_32_1 0x12
+#define LP_RAST_OP_TRIANGLE_32_2 0x13
+#define LP_RAST_OP_TRIANGLE_32_3 0x14
+#define LP_RAST_OP_TRIANGLE_32_4 0x15
+#define LP_RAST_OP_TRIANGLE_32_5 0x16
+#define LP_RAST_OP_TRIANGLE_32_6 0x17
+#define LP_RAST_OP_TRIANGLE_32_7 0x18
+#define LP_RAST_OP_TRIANGLE_32_8 0x19
+#define LP_RAST_OP_TRIANGLE_32_3_4   0x1a
+#define LP_RAST_OP_TRIANGLE_32_3_16  0x1b
+#define LP_RAST_OP_TRIANGLE_32_4_16  0x1c
+
+#define LP_RAST_OP_MAX   0x1d
 #define LP_RAST_OP_MASK  0xff
 
 void
@@ -289,4 +301,17 @@ void
 lp_debug_draw_bins_by_coverage( struct lp_scene *scene );
 
 
+#ifdef PIPE_ARCH_SSE
+#include <emmintrin.h>
+#include "util/u_sse.h"

[Mesa-dev] [PATCH] llvmpipe: support 8bit subpixel precision

2013-11-20 Thread Zack Rusin

8-bit subpixel precision is required by d3d10 but unfortunately
requires a 64-bit rasterizer. This commit implements
64-bit rasterization with full support for 8-bit subpixel
precision. It's a combination of all individual commits
from the llvmpipe-rast-64 branch.
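
Why 8 subpixel bits push the rasterizer to 64 bits can be read off the
MAX_FIXED_LENGTH formula in lp_rast.h. The sketch below simply evaluates
that formula for a few combinations; the helper names are hypothetical,
and imul64() mirrors the IMUL64 macro the patch adds.

```c
#include <stdint.h>

/* Largest primitive edge length, in pixels, for a given fixed-point type
 * width and number of subpixel bits (the MAX_FIXED_LENGTH formula). */
static int64_t
max_fixed_length(int type_width, int fixed_order)
{
   return (int64_t)1 << (((type_width / 2) - 1) - fixed_order);
}

/* Widen both operands before multiplying so the product of two 32-bit
 * fixed-point values cannot overflow. */
static int64_t
imul64(int32_t a, int32_t b)
{
   return (int64_t)a * (int64_t)b;
}
```

With a 32-bit type, 4 subpixel bits allow edges of 2048 pixels, but 8
subpixel bits leave only 128 pixels, which is too small for a real
framebuffer. A 64-bit type with 8 subpixel bits allows 8388608 pixels.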

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_rast.c |  11 ++
 src/gallium/drivers/llvmpipe/lp_rast.h |  47 +--
 src/gallium/drivers/llvmpipe/lp_rast_debug.c   |   6 +-
 src/gallium/drivers/llvmpipe/lp_rast_priv.h|  27 
 src/gallium/drivers/llvmpipe/lp_rast_tri.c | 173 +
 src/gallium/drivers/llvmpipe/lp_rast_tri_tmp.h |  56 
 src/gallium/drivers/llvmpipe/lp_setup_line.c   |   2 +-
 src/gallium/drivers/llvmpipe/lp_setup_tri.c| 155 ++
 src/gallium/tests/graw/SConscript  |   1 +
 src/gallium/tests/graw/tri-large.c | 173 +
 10 files changed, 500 insertions(+), 151 deletions(-)
 create mode 100644 src/gallium/tests/graw/tri-large.c

diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c 
b/src/gallium/drivers/llvmpipe/lp_rast.c
index af661e9..0cd62c2 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.c
+++ b/src/gallium/drivers/llvmpipe/lp_rast.c
@@ -589,6 +589,17 @@ static lp_rast_cmd_func dispatch[LP_RAST_OP_MAX] =
lp_rast_begin_query,
lp_rast_end_query,
lp_rast_set_state,
+   lp_rast_triangle_32_1,
+   lp_rast_triangle_32_2,
+   lp_rast_triangle_32_3,
+   lp_rast_triangle_32_4,
+   lp_rast_triangle_32_5,
+   lp_rast_triangle_32_6,
+   lp_rast_triangle_32_7,
+   lp_rast_triangle_32_8,
+   lp_rast_triangle_32_3_4,
+   lp_rast_triangle_32_3_16,
+   lp_rast_triangle_32_4_16
 };
 
 
diff --git a/src/gallium/drivers/llvmpipe/lp_rast.h 
b/src/gallium/drivers/llvmpipe/lp_rast.h
index 43c598d..b81d94f 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.h
+++ b/src/gallium/drivers/llvmpipe/lp_rast.h
@@ -46,10 +46,11 @@ struct lp_scene;
 struct lp_fence;
 struct cmd_bin;
 
-#define FIXED_TYPE_WIDTH 32
+#define FIXED_TYPE_WIDTH 64
 /** For sub-pixel positioning */
-#define FIXED_ORDER 4
+#define FIXED_ORDER 8
 #define FIXED_ONE (1<<FIXED_ORDER)
+#define FIXED_SHIFT (FIXED_TYPE_WIDTH - 1)
 /** Maximum length of an edge in a primitive in pixels.
  *  If the framebuffer is large we have to think about fixed-point
  *  integer overflow. Coordinates need ((FIXED_TYPE_WIDTH/2) - 1) bits
@@ -59,11 +60,14 @@ struct cmd_bin;
  */
 #define MAX_FIXED_LENGTH (1 << (((FIXED_TYPE_WIDTH/2) - 1) - FIXED_ORDER))
 
+#define MAX_FIXED_LENGTH32 (1 << (((32/2) - 1) - FIXED_ORDER))
+
 /* Rasterizer output size going to jit fs, width/height */
 #define LP_RASTER_BLOCK_SIZE 4
 
 #define LP_MAX_ACTIVE_BINNED_QUERIES 16
 
+#define IMUL64(a, b) (((int64_t)(a)) * ((int64_t)(b)))
 
 struct lp_rasterizer_task;
 
@@ -102,18 +106,15 @@ struct lp_rast_shader_inputs {
/* followed by a0, dadx, dady and planes[] */
 };
 
-/* Note: the order of these values is important as they are loaded by
- * sse code in rasterization:
- */
 struct lp_rast_plane {
/* edge function values at minx,miny ?? */
-   int c;
+   int64_t c;
 
-   int dcdx;
-   int dcdy;
+   int32_t dcdx;
+   int32_t dcdy;
 
/* one-pixel sized trivial reject offsets for each plane */
-   int eo;
+   int64_t eo;
 };
 
 /**
@@ -277,8 +278,19 @@ lp_rast_arg_null( void )
 #define LP_RAST_OP_BEGIN_QUERY   0xf
 #define LP_RAST_OP_END_QUERY 0x10
 #define LP_RAST_OP_SET_STATE 0x11
-
-#define LP_RAST_OP_MAX   0x12
+#define LP_RAST_OP_TRIANGLE_32_1 0x12
+#define LP_RAST_OP_TRIANGLE_32_2 0x13
+#define LP_RAST_OP_TRIANGLE_32_3 0x14
+#define LP_RAST_OP_TRIANGLE_32_4 0x15
+#define LP_RAST_OP_TRIANGLE_32_5 0x16
+#define LP_RAST_OP_TRIANGLE_32_6 0x17
+#define LP_RAST_OP_TRIANGLE_32_7 0x18
+#define LP_RAST_OP_TRIANGLE_32_8 0x19
+#define LP_RAST_OP_TRIANGLE_32_3_4   0x1a
+#define LP_RAST_OP_TRIANGLE_32_3_16  0x1b
+#define LP_RAST_OP_TRIANGLE_32_4_16  0x1c
+
+#define LP_RAST_OP_MAX   0x1d
 #define LP_RAST_OP_MASK  0xff
 
 void
@@ -289,4 +301,17 @@ void
 lp_debug_draw_bins_by_coverage( struct lp_scene *scene );
 
 
+#ifdef PIPE_ARCH_SSE
+#include <emmintrin.h>
+#include "util/u_sse.h"
+
+static INLINE __m128i
+lp_plane_to_m128i(const struct lp_rast_plane *plane)
+{
+   return _mm_setr_epi32((int32_t)plane->c, (int32_t)plane->dcdx,
+ (int32_t)plane->dcdy, (int32_t)plane->eo);
+}
+
+#endif
+
 #endif
diff --git a/src/gallium/drivers/llvmpipe/lp_rast_debug.c 
b/src/gallium/drivers/llvmpipe/lp_rast_debug.c
index 3bc75aa..587c793 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast_debug.c
+++ b/src/gallium/drivers/llvmpipe/lp_rast_debug.c
@@ -195,8 +195,8 @@ debug_triangle(int tilex, int tiley,
while (plane_mask) {
   plane[nr_planes] = tri_plane[u_bit_scan(&plane_mask)];
   plane[nr_planes].c = (plane[nr_planes].c +
-plane[nr_planes].dcdy * tiley

Re: [Mesa-dev] [PATCH] gallivm: Compile flag to debug TGSI execution through printfs.

2013-11-13 Thread Zack Rusin

That's very nice Jose! Looks good to me.


- Original Message -
 From: José Fonseca jfons...@vmware.com
 
 It is similar to tgsi_exec.c's DEBUG_EXECUTION compile flag.
 
 I had prototyped this for a while while debugging an issue, but finally
 cleaned this up and added a few more bells and whistles.
 
 Here is a sample output.
 
 CONST[0]:
   X: 0.006250 0.006250 0.006250 0.006250
   Y: -0.007143 -0.007143 -0.007143 -0.007143
   Z: -1.00 -1.00 -1.00 -1.00
   W: 1.00 1.00 1.00 1.00
 IN[0]:
   X: 143.50 175.50 175.50 143.50
   Y: 123.50 123.50 155.50 155.50
   Z: 0.00 0.00 0.00 0.00
   W: 1.00 1.00 1.00 1.00
1: RCP TEMP[0].w, IN[0].
 TEMP[0].w =  1 1 1 1
2: MAD TEMP[0].xy, IN[0], CONST[0], CONST[0].zwzw
 TEMP[0].x =  -0.103124976 0.0968750715 0.0968750715 -0.103124976
 TEMP[0].y =  0.117857158 0.117857158 -0.110714316 -0.110714316
3: MUL OUT[0].xy, TEMP[0], TEMP[0].
 OUT[0].x =  -0.103124976 0.0968750715 0.0968750715 -0.103124976
 OUT[0].y =  0.117857158 0.117857158 -0.110714316 -0.110714316
4: MUL OUT[0].z, IN[0]., TEMP[0].
 OUT[0].z =  0 0 0 0
5: MOV OUT[0].w, TEMP[0]
 OUT[0].w =  1 1 1 1
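
The MAD line in the sample output above can be checked by hand. A small
sketch, assuming TGSI's MAD computes dst = src0 * src1 + src2 per channel
and that the .zwzw swizzle selects CONST[0].z for the x channel and
CONST[0].w for the y channel. The function name is hypothetical and only
the first lane of the dump is evaluated.

```c
/* Per-channel multiply-add: dst = a * b + c. */
static float
tgsi_mad_channel(float a, float b, float c)
{
   return a * b + c;
}
```

For lane 0, x is 143.5 * 0.00625 + (-1.0), about -0.103125, and y is
123.5 * (-0.007143) + 1.0, about 0.11784, both in line with the printed
TEMP[0] values up to the rounding of the dumped constants.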
 ---
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 158
  +++-
  src/gallium/auxiliary/tgsi/tgsi_dump.c  |  23 
  src/gallium/auxiliary/tgsi/tgsi_dump.h  |   7 ++
  3 files changed, 159 insertions(+), 29 deletions(-)
 
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 index 5f81066..917826d 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 @@ -47,6 +47,7 @@
  #include "tgsi/tgsi_parse.h"
  #include "tgsi/tgsi_util.h"
  #include "tgsi/tgsi_scan.h"
 +#include "tgsi/tgsi_strings.h"
  #include "lp_bld_tgsi_action.h"
  #include "lp_bld_type.h"
  #include "lp_bld_const.h"
 @@ -67,6 +68,17 @@
  
  #define DUMP_GS_EMITS 0
  
 +/*
 + * If non-zero, the generated LLVM IR will print intermediate results on
 every TGSI
 + * instruction.
 + *
 + * TODO:
 + * - take execution masks in consideration
 + * - debug control-flow instructions
 + */
 +#define DEBUG_EXECUTION 0
 +
 +
  static void lp_exec_mask_init(struct lp_exec_mask *mask, struct
  lp_build_context *bld)
  {
 LLVMTypeRef int_type = LLVMInt32TypeInContext(bld->gallivm->context);
 @@ -664,6 +676,43 @@ static void lp_exec_mask_endsub(struct lp_exec_mask
 *mask, int *pc)
  }
  
  
 +static LLVMValueRef
 +get_file_ptr(struct lp_build_tgsi_soa_context *bld,
 + unsigned file,
 + unsigned index,
 + unsigned chan)
 +{
 +   LLVMBuilderRef builder = bld->bld_base.base.gallivm->builder;
 +   LLVMValueRef (*array_of_vars)[TGSI_NUM_CHANNELS];
 +   LLVMValueRef var_of_array;
 +
 +   switch (file) {
 +   case TGSI_FILE_TEMPORARY:
 +  array_of_vars = bld->temps;
 +  var_of_array = bld->temps_array;
 +  break;
 +   case TGSI_FILE_OUTPUT:
 +  array_of_vars = bld->outputs;
 +  var_of_array = bld->outputs_array;
 +  break;
 +   default:
 +  assert(0);
 +  return NULL;
 +   }
 +
 +   assert(chan  4);
 +
 +   if (bld-indirect_files  (1  file)) {
 +  LLVMValueRef lindex = lp_build_const_int32(bld-bld_base.base.gallivm,
 index * 4 + chan);
 +  return LLVMBuildGEP(builder, var_of_array, lindex, 1, );
 +   }
 +   else {
 +  assert(index = bld-bld_base.info-file_max[file]);
 +  return array_of_vars[index][chan];
 +   }
 +}
 +
 +
  /**
   * Return pointer to a temporary register channel (src or dest).
   * Note that indirect addressing cannot be handled here.
 @@ -675,15 +724,7 @@ lp_get_temp_ptr_soa(struct lp_build_tgsi_soa_context *bld,
                       unsigned index,
                       unsigned chan)
  {
 -   LLVMBuilderRef builder = bld->bld_base.base.gallivm->builder;
 -   assert(chan < 4);
 -   if (bld->indirect_files & (1 << TGSI_FILE_TEMPORARY)) {
 -      LLVMValueRef lindex = lp_build_const_int32(bld->bld_base.base.gallivm, index * 4 + chan);
 -      return LLVMBuildGEP(builder, bld->temps_array, &lindex, 1, "");
 -   }
 -   else {
 -      return bld->temps[index][chan];
 -   }
 +   return get_file_ptr(bld, TGSI_FILE_TEMPORARY, index, chan);
  }
  
  /**
 @@ -697,16 +738,7 @@ lp_get_output_ptr(struct lp_build_tgsi_soa_context *bld,
 unsigned index,
 unsigned chan)
  {
 -   LLVMBuilderRef builder = bld->bld_base.base.gallivm->builder;
 -   assert(chan < 4);
 -   if (bld->indirect_files & (1 << TGSI_FILE_OUTPUT)) {
 -      LLVMValueRef lindex = lp_build_const_int32(bld->bld_base.base.gallivm,
 -                                                 index * 4 + chan);
 -      return LLVMBuildGEP(builder, bld->outputs_array, &lindex, 1, "");
 -   }
 -   else {
 -      return bld->outputs[index][chan];
 -   }
 +   return get_file_ptr(bld, TGSI_FILE_OUTPUT, index, chan);
  }
  
  /*
 @@ -1415,6

Re: [Mesa-dev] [PATCH] gallivm: deduplicate some indirect register address code

2013-11-06 Thread Zack Rusin

Looks good.

Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 There's only one minor functional change, for immediates the pixel offsets
 are no longer added since the values are all the same for all elements in
 any case (it might be better if those weren't stored as soa vectors in the
 first place maybe).
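
 The index arithmetic that the patch factors out (see the comment in
 get_soa_array_offsets) can be modelled per lane in plain scalar C. This is
 only a sketch under assumptions: soa_element_index and its parameters are
 hypothetical names for illustration, not gallivm functions, and the real code
 builds the same expression as LLVM IR over whole vectors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of the SoA index computation; not Mesa code.
 *   reg    - register index after indirect addressing is resolved
 *   chan   - channel within the register, 0..3 (x, y, z, w)
 *   length - number of lanes in the SoA vector type
 *   lane   - lane (pixel/vertex) within the vector
 * When need_perelement_offset is zero the lane offset is skipped, which is
 * what the patch does for immediates (all lanes hold the same value). */
static size_t
soa_element_index(size_t reg, unsigned chan, unsigned length,
                  unsigned lane, int need_perelement_offset)
{
   size_t index;

   assert(chan < 4);
   assert(lane < length);

   /* index = (reg * 4 + chan) * length + lane */
   index = (reg * 4 + chan) * length;
   if (need_perelement_offset)
      index += lane;
   return index;
}
```

 With 4 lanes, register 2, channel y (1) and lane 3 this gives
 (2 * 4 + 1) * 4 + 3 = 39; without the per-element offset it gives 36.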
 ---
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c |  253
  +--
  1 file changed, 96 insertions(+), 157 deletions(-)
 
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 index 75f6def..5f81066 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 @@ -898,6 +898,39 @@ stype_to_fetch(struct lp_build_tgsi_context * bld_base,
  }
  
  static LLVMValueRef
 +get_soa_array_offsets(struct lp_build_context *uint_bld,
 +                      LLVMValueRef indirect_index,
 +                      unsigned chan_index,
 +                      boolean need_perelement_offset)
 +{
 +   struct gallivm_state *gallivm = uint_bld->gallivm;
 +   LLVMValueRef chan_vec =
 +      lp_build_const_int_vec(uint_bld->gallivm, uint_bld->type, chan_index);
 +   LLVMValueRef length_vec =
 +      lp_build_const_int_vec(gallivm, uint_bld->type, uint_bld->type.length);
 +   LLVMValueRef index_vec;
 +
 +   /* index_vec = (indirect_index * 4 + chan_index) * length + offsets */
 +   index_vec = lp_build_shl_imm(uint_bld, indirect_index, 2);
 +   index_vec = lp_build_add(uint_bld, index_vec, chan_vec);
 +   index_vec = lp_build_mul(uint_bld, index_vec, length_vec);
 +
 +   if (need_perelement_offset) {
 +      LLVMValueRef pixel_offsets;
 +      int i;
 +     /* build pixel offset vector: {0, 1, 2, 3, ...} */
 +      pixel_offsets = uint_bld->undef;
 +      for (i = 0; i < uint_bld->type.length; i++) {
 +         LLVMValueRef ii = lp_build_const_int32(gallivm, i);
 +         pixel_offsets = LLVMBuildInsertElement(gallivm->builder, pixel_offsets,
 +                                                ii, ii, "");
 +      }
 +      index_vec = lp_build_add(uint_bld, index_vec, pixel_offsets);
 +   }
 +   return index_vec;
 +}
 +
 +static LLVMValueRef
  emit_fetch_constant(
     struct lp_build_tgsi_context * bld_base,
     const struct tgsi_full_src_register * reg,
 @@ -908,7 +941,6 @@ emit_fetch_constant(
     struct gallivm_state *gallivm = bld_base->base.gallivm;
     LLVMBuilderRef builder = gallivm->builder;
     struct lp_build_context *uint_bld = &bld_base->uint_bld;
 -   LLVMValueRef indirect_index = NULL;
     unsigned dimension = 0;
     LLVMValueRef dimension_index;
     LLVMValueRef consts_ptr;
 @@ -927,16 +959,15 @@ emit_fetch_constant(
     consts_ptr = lp_build_array_get(gallivm, bld->consts_ptr, dimension_index);
  
     if (reg->Register.Indirect) {
 +      LLVMValueRef indirect_index;
 +      LLVMValueRef swizzle_vec =
 +         lp_build_const_int_vec(gallivm, uint_bld->type, swizzle);
 +      LLVMValueRef index_vec;  /* index into the const buffer */
 +
        indirect_index = get_indirect_index(bld,
                                            reg->Register.File,
                                            reg->Register.Index,
                                            &reg->Indirect);
 -   }
 -
 -   if (reg->Register.Indirect) {
 -      LLVMValueRef swizzle_vec =
 -         lp_build_const_int_vec(bld->bld_base.base.gallivm, uint_bld->type, swizzle);
 -      LLVMValueRef index_vec;  /* index into the const buffer */
  
        /* index_vec = indirect_index * 4 + swizzle */
        index_vec = lp_build_shl_imm(uint_bld, indirect_index, 2);
 @@ -949,7 +980,7 @@ emit_fetch_constant(
        LLVMValueRef index;  /* index into the const buffer */
        LLVMValueRef scalar, scalar_ptr;
  
 -      index = lp_build_const_int32(gallivm, reg->Register.Index*4 + swizzle);
 +      index = lp_build_const_int32(gallivm, reg->Register.Index * 4 + swizzle);
  
        scalar_ptr = LLVMBuildGEP(builder, consts_ptr,
                                  &index, 1, "");
 @@ -974,49 +1005,32 @@ emit_fetch_immediate(
     struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
     struct gallivm_state *gallivm = bld->bld_base.base.gallivm;
     LLVMBuilderRef builder = gallivm->builder;
 -   struct lp_build_context *uint_bld = &bld_base->uint_bld;
 -   struct lp_build_context *float_bld = &bld_base->base;
     LLVMValueRef res = NULL;
 -   LLVMValueRef indirect_index = NULL;
  
     if (reg->Register.Indirect) {
 +      LLVMValueRef indirect_index;
 +      LLVMValueRef index_vec;  /* index into the immediate register array */
 +      LLVMValueRef imms_array;
 +      LLVMTypeRef fptr_type;
 +
        indirect_index = get_indirect_index(bld,
                                            reg->Register.File,
                                            reg->Register.Index,
                                            &reg->Indirect);
 -   }
 -
 -   if (reg

[Mesa-dev] [PATCH] graw: add a test rendering a huge triangle

2013-10-24 Thread Zack Rusin

Used to test rasterization, because we often break down on
subdivision of triangles with long edges.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/tests/graw/SConscript  |   1 +
 src/gallium/tests/graw/tri-large.c | 173 +
 2 files changed, 174 insertions(+)
 create mode 100644 src/gallium/tests/graw/tri-large.c

diff --git a/src/gallium/tests/graw/SConscript 
b/src/gallium/tests/graw/SConscript
index 8740ff3..8723807 100644
--- a/src/gallium/tests/graw/SConscript
+++ b/src/gallium/tests/graw/SConscript
@@ -29,6 +29,7 @@ progs = [
 'tex-srgb',
 'tex-swizzle',
 'tri',
+'tri-large',
 'tri-gs',
 'tri-instanced',
 'vs-test',
diff --git a/src/gallium/tests/graw/tri-large.c 
b/src/gallium/tests/graw/tri-large.c
new file mode 100644
index 000..3fbbfb3
--- /dev/null
+++ b/src/gallium/tests/graw/tri-large.c
@@ -0,0 +1,173 @@
+/* Display a cleared blue window.  This demo has no dependencies on
+ * any utility code, just the graw interface and gallium.
+ */
+
+#include "graw_util.h"
+#include "util/u_debug.h"
+
+#include <stdio.h>
+
+static struct graw_info info;
+
+static const int WIDTH = 4*2048;
+static const int HEIGHT = 4*2048;
+
+
+struct vertex {
+   float position[4];
+   float color[4];
+};
+
+static boolean FlatShade = FALSE;
+
+
+static struct vertex vertices[3] =
+{
+   {
+  { -1.0f, -1.0f, 0.0f, 1.0f },
+  { 1.0f, 0.0f, 0.0f, 1.0f }
+   },
+   {
+  { -1.0f, 1.0f, 0.0f, 1.0f },
+  { 0.0f, 1.0f, 0.0f, 1.0f }
+   },
+   {
+  { 1.0f, 1.0f, 0.0f, 1.0f },
+  { 0.0f, 0.0f, 1.0f, 1.0f }
+   }
+};
+
+
+static void set_vertices( void )
+{
+   struct pipe_vertex_element ve[2];
+   struct pipe_vertex_buffer vbuf;
+   void *handle;
+
+   memset(ve, 0, sizeof ve);
+
+   ve[0].src_offset = Offset(struct vertex, position);
+   ve[0].src_format = PIPE_FORMAT_R32G32B32A32_FLOAT;
+   ve[1].src_offset = Offset(struct vertex, color);
+   ve[1].src_format = PIPE_FORMAT_R32G32B32A32_FLOAT;
+
+   handle = info.ctx->create_vertex_elements_state(info.ctx, 2, ve);
+   info.ctx->bind_vertex_elements_state(info.ctx, handle);
+
+   memset(&vbuf, 0, sizeof vbuf);
+
+   vbuf.stride = sizeof( struct vertex );
+   vbuf.buffer_offset = 0;
+   vbuf.buffer = pipe_buffer_create_with_data(info.ctx,
+                                              PIPE_BIND_VERTEX_BUFFER,
+                                              PIPE_USAGE_STATIC,
+                                              sizeof(vertices),
+                                              vertices);
+
+   info.ctx->set_vertex_buffers(info.ctx, 0, 1, &vbuf);
+}
+
+
+static void set_vertex_shader( void )
+{
+   void *handle;
+   const char *text =
+      "VERT\n"
+      "DCL IN[0]\n"
+      "DCL IN[1]\n"
+      "DCL OUT[0], POSITION\n"
+      "DCL OUT[1], COLOR\n"
+      "  0: MOV OUT[1], IN[1]\n"
+      "  1: MOV OUT[0], IN[0]\n"
+      "  2: END\n";
+
+   handle = graw_parse_vertex_shader(info.ctx, text);
+   info.ctx->bind_vs_state(info.ctx, handle);
+}
+
+
+static void set_fragment_shader( void )
+{
+   void *handle;
+   const char *text =
+      "FRAG\n"
+      "DCL IN[0], COLOR, LINEAR\n"
+      "DCL OUT[0], COLOR\n"
+      "  0: MOV OUT[0], IN[0]\n"
+      "  1: END\n";
+
+   handle = graw_parse_fragment_shader(info.ctx, text);
+   info.ctx->bind_fs_state(info.ctx, handle);
+}
+
+
+static void draw( void )
+{
+   union pipe_color_union clear_color = { {1,0,1,1} };
+
+   info.ctx->clear(info.ctx, PIPE_CLEAR_COLOR, &clear_color, 0, 0);
+   util_draw_arrays(info.ctx, PIPE_PRIM_TRIANGLES, 0, 3);
+   info.ctx->flush(info.ctx, NULL, 0);
+
+   graw_save_surface_to_file(info.ctx, info.color_surf[0], NULL);
+
+   graw_util_flush_front(&info);
+}
+
+
+static void init( void )
+{
+   if (!graw_util_create_window(&info, WIDTH, HEIGHT, 1, FALSE))
+      exit(1);
+
+   graw_util_default_state(&info, FALSE);
+
+   {
+      struct pipe_rasterizer_state rasterizer;
+      void *handle;
+      memset(&rasterizer, 0, sizeof rasterizer);
+      rasterizer.cull_face = PIPE_FACE_NONE;
+      rasterizer.half_pixel_center = 1;
+      rasterizer.bottom_edge_rule = 1;
+      rasterizer.flatshade = FlatShade;
+      rasterizer.depth_clip = 1;
+      handle = info.ctx->create_rasterizer_state(info.ctx, &rasterizer);
+      info.ctx->bind_rasterizer_state(info.ctx, handle);
+   }
+
+
+   graw_util_viewport(&info, 0, 0, WIDTH, HEIGHT, 30, 1000);
+
+   set_vertices();
+   set_vertex_shader();
+   set_fragment_shader();
+}
+
+static void args(int argc, char *argv[])
+{
+   int i;
+
+   for (i = 1; i < argc; ) {
+      if (graw_parse_args(&i, argc, argv)) {
+         /* ok */
+      }
+      else if (strcmp(argv[i], "-f") == 0) {
+         FlatShade = TRUE;
+         i++;
+      }
+      else {
+         printf("Invalid arg %s\n", argv[i]);
+ exit(1);
+  }
+   }
+}
+
+int main( int argc, char *argv[] )
+{
+   args(argc, argv);
+   init();
+
+   graw_set_display_func( draw );
+   graw_main_loop();
+   return 0;
+}
-- 
1.8.3.2

[Mesa-dev] [PATCH 1/3] gallivm: support printing of 64 bit integers

2013-10-08 Thread Zack Rusin

Only 8-bit and 32-bit integers were supported before.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/gallivm/lp_bld_printf.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_printf.c 
b/src/gallium/auxiliary/gallivm/lp_bld_printf.c
index 1324da2..d06209a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_printf.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_printf.c
@@ -106,7 +106,11 @@ lp_build_print_value(struct gallivm_state *gallivm,
   type_fmt[4] = 'g';
   type_fmt[5] = '\0';
} else if (type_kind == LLVMIntegerTypeKind) {
-  if (LLVMGetIntTypeWidth(type_ref) == 8) {
+  if (LLVMGetIntTypeWidth(type_ref) == 64) {
+ type_fmt[2] = 'l';
+ type_fmt[3] = 'd';
+ type_fmt[4] = '\0';
+  } else if (LLVMGetIntTypeWidth(type_ref) == 8) {
  type_fmt[2] = 'u';
   } else {
  type_fmt[2] = 'i';
-- 
1.8.1.2
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
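
A side note on the format choice in the patch above: the 'l' length modifier
matches a 64-bit integer only where long is 64 bits wide (LP64 platforms).
The following is a minimal sketch of a portable way to print a 64-bit value,
using the standard PRId64 macro. format_i64 is a hypothetical helper for
illustration only; it is not a gallivm function and not what the patch does.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper, not part of gallivm: print a signed 64-bit value
 * into buf. PRId64 expands to the right length modifier for the platform
 * ("ld" where long is 64 bits, "lld" elsewhere).
 * Returns what snprintf returns: the number of characters that the full
 * output needs, excluding the terminating NUL. */
static int
format_i64(char *buf, size_t size, int64_t value)
{
   assert(buf != NULL && size > 0);
   return snprintf(buf, size, "%" PRId64, value);
}
```

Whether the extra portability matters depends on which platforms the debug
printing is expected to run on.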

[Mesa-dev] [PATCH 2/3] gallium: Add support for 32x32 muls with 64 bit results

2013-10-08 Thread Zack Rusin

The code introduces two new 32-bit integer multiplication opcodes,
IMUL_HI and UMUL_HI, which return the high 32 bits of the product and
can be used to produce correct 64-bit results. GLSL, OpenCL and D3D10+
require them. We use two separate opcodes because they match the
behavior of GLSL and OpenCL, are a lot easier to add than a single
opcode with multiple destinations, and because there's not much (any)
difference wrt code-generation.
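
The per-channel semantics can be sketched in standalone C as below. The
function names imul_hi and umul_hi are chosen here for illustration (the
patch itself uses the I64M/U64M macros inside tgsi_exec.c), and the signed
variant assumes the compiler implements right shift of negative values as an
arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption made explicit: >> on negative values is an arithmetic shift. */
static_assert((-1 >> 1) == -1, "arithmetic right shift assumed");

/* High 32 bits of the signed 64-bit product of two 32-bit values. */
static int32_t
imul_hi(int32_t a, int32_t b)
{
   return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* High 32 bits of the unsigned 64-bit product of two 32-bit values. */
static uint32_t
umul_hi(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}
```

Combined with the existing low-half multiplies (e.g. UMUL), the two halves
give the full 64-bit product: umul_hi(0xFFFFFFFF, 0xFFFFFFFF) is 0xFFFFFFFE
and the low half is 0x00000001.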

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/tgsi/tgsi_exec.c | 34 ++
 src/gallium/auxiliary/tgsi/tgsi_info.c |  6 
 src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h   |  3 ++
 src/gallium/auxiliary/tgsi/tgsi_util.c |  2 ++
 src/gallium/docs/source/tgsi.rst   | 30 +++
 src/gallium/include/pipe/p_shader_tokens.h |  5 +++-
 .../tests/graw/vertex-shader/vert-imul_hi.sh   | 13 +
 .../tests/graw/vertex-shader/vert-umul_hi.sh   | 11 +++
 8 files changed, 103 insertions(+), 1 deletion(-)
 create mode 100644 src/gallium/tests/graw/vertex-shader/vert-imul_hi.sh
 create mode 100644 src/gallium/tests/graw/vertex-shader/vert-umul_hi.sh

diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c 
b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index 0750a50..6db1238 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -3478,6 +3478,32 @@ micro_umul(union tgsi_exec_channel *dst,
 }
 
 static void
+micro_imul_hi(union tgsi_exec_channel *dst,
+              const union tgsi_exec_channel *src0,
+              const union tgsi_exec_channel *src1)
+{
+#define I64M(x, y) ((((int64_t)x) * ((int64_t)y)) >> 32)
+   dst->i[0] = I64M(src0->i[0], src1->i[0]);
+   dst->i[1] = I64M(src0->i[1], src1->i[1]);
+   dst->i[2] = I64M(src0->i[2], src1->i[2]);
+   dst->i[3] = I64M(src0->i[3], src1->i[3]);
+#undef I64M
+}
+
+static void
+micro_umul_hi(union tgsi_exec_channel *dst,
+              const union tgsi_exec_channel *src0,
+              const union tgsi_exec_channel *src1)
+{
+#define U64M(x, y) ((((uint64_t)x) * ((uint64_t)y)) >> 32)
+   dst->u[0] = U64M(src0->u[0], src1->u[0]);
+   dst->u[1] = U64M(src0->u[1], src1->u[1]);
+   dst->u[2] = U64M(src0->u[2], src1->u[2]);
+   dst->u[3] = U64M(src0->u[3], src1->u[3]);
+#undef U64M
+}
+
+static void
 micro_useq(union tgsi_exec_channel *dst,
const union tgsi_exec_channel *src0,
const union tgsi_exec_channel *src1)
@@ -4277,6 +4303,14 @@ exec_instruction(
   exec_vector_binary(mach, inst, micro_umul, TGSI_EXEC_DATA_UINT, 
TGSI_EXEC_DATA_UINT);
   break;
 
+   case TGSI_OPCODE_IMUL_HI:
+  exec_vector_binary(mach, inst, micro_imul_hi, TGSI_EXEC_DATA_INT, 
TGSI_EXEC_DATA_INT);
+  break;
+
+   case TGSI_OPCODE_UMUL_HI:
+  exec_vector_binary(mach, inst, micro_umul_hi, TGSI_EXEC_DATA_UINT, 
TGSI_EXEC_DATA_UINT);
+  break;
+
case TGSI_OPCODE_USEQ:
   exec_vector_binary(mach, inst, micro_useq, TGSI_EXEC_DATA_UINT, 
TGSI_EXEC_DATA_UINT);
   break;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c 
b/src/gallium/auxiliary/tgsi/tgsi_info.c
index 7a5d18f..0beef44 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_info.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
@@ -219,6 +219,8 @@ static const struct tgsi_opcode_info 
opcode_info[TGSI_OPCODE_LAST] =
    { 1, 3, 1, 0, 0, 0, OTHR, "TEX2", TGSI_OPCODE_TEX2 },
    { 1, 3, 1, 0, 0, 0, OTHR, "TXB2", TGSI_OPCODE_TXB2 },
    { 1, 3, 1, 0, 0, 0, OTHR, "TXL2", TGSI_OPCODE_TXL2 },
+   { 1, 2, 0, 0, 0, 0, COMP, "IMUL_HI", TGSI_OPCODE_IMUL_HI },
+   { 1, 2, 0, 0, 0, 0, COMP, "UMUL_HI", TGSI_OPCODE_UMUL_HI },
 };
 
 const struct tgsi_opcode_info *
@@ -297,6 +299,7 @@ tgsi_opcode_infer_type( uint opcode )
case TGSI_OPCODE_USLT:
case TGSI_OPCODE_USNE:
case TGSI_OPCODE_SVIEWINFO:
+   case TGSI_OPCODE_UMUL_HI:
   return TGSI_TYPE_UNSIGNED;
case TGSI_OPCODE_ARL:
case TGSI_OPCODE_ARR:
@@ -317,6 +320,7 @@ tgsi_opcode_infer_type( uint opcode )
case TGSI_OPCODE_UARL:
case TGSI_OPCODE_IABS:
case TGSI_OPCODE_ISSG:
+   case TGSI_OPCODE_IMUL_HI:
   return TGSI_TYPE_SIGNED;
default:
   return TGSI_TYPE_FLOAT;
@@ -339,7 +343,9 @@ tgsi_opcode_infer_src_type( uint opcode )
case TGSI_OPCODE_CASE:
case TGSI_OPCODE_SAMPLE_I:
case TGSI_OPCODE_SAMPLE_I_MS:
+   case TGSI_OPCODE_UMUL_HI:
   return TGSI_TYPE_UNSIGNED;
+   case TGSI_OPCODE_IMUL_HI:
case TGSI_OPCODE_I2F:
   return TGSI_TYPE_SIGNED;
case TGSI_OPCODE_ARL:
diff --git a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h 
b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
index b8144a8..1ef78dd 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
@@ -204,6 +204,9 @@ OP12(SAMPLE_INFO)
 
 OP13(UCMP)
 
+OP12(IMUL_HI)
+OP12(UMUL_HI)
+
 #undef OP00
 #undef OP01
 #undef OP10
diff --git a/src/gallium/auxiliary/tgsi/tgsi_util.c 
b/src/gallium/auxiliary/tgsi/tgsi_util.c
index b3bc8f2..73a0667 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_util.c
+++ b/src

[Mesa-dev] [PATCH 3/3] llvmpipe: implement 64 bit mul opcodes in llvmpipe

2013-10-08 Thread Zack Rusin

Both imul_hi and umul_hi work with this patch.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c | 60 ++
 1 file changed, 60 insertions(+)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
index 1cfaf78..8caaf83 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
@@ -763,6 +763,64 @@ umul_emit(
emit_data-args[0], emit_data-args[1]);
 }
 
+/* TGSI_OPCODE_IMUL_HI */
+static void
+imul_hi_emit(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data)
+{
+   LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+   struct lp_build_context *int_bld = &bld_base->int_bld;
+   struct lp_type type = int_bld->type;
+   LLVMValueRef src0, src1;
+   LLVMValueRef dst64;
+   LLVMTypeRef typeRef;
+
+   assert(type.width == 32);
+   type.width = 64;
+   typeRef = lp_build_vec_type(bld_base->base.gallivm, type);
+   src0 = LLVMBuildSExt(builder, emit_data->args[0], typeRef, "");
+   src1 = LLVMBuildSExt(builder, emit_data->args[1], typeRef, "");
+   dst64 = LLVMBuildMul(builder, src0, src1, "");
+   dst64 = LLVMBuildAShr(
+            builder, dst64,
+            lp_build_const_vec(bld_base->base.gallivm, type, 32), "");
+   type.width = 32;
+   typeRef = lp_build_vec_type(bld_base->base.gallivm, type);
+   emit_data->output[emit_data->chan] =
+         LLVMBuildTrunc(builder, dst64, typeRef, "");
+}
+
+/* TGSI_OPCODE_UMUL_HI */
+static void
+umul_hi_emit(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data)
+{
+   LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+   struct lp_build_context *uint_bld = &bld_base->uint_bld;
+   struct lp_type type = uint_bld->type;
+   LLVMValueRef src0, src1;
+   LLVMValueRef dst64;
+   LLVMTypeRef typeRef;
+
+   assert(type.width == 32);
+   type.width = 64;
+   typeRef = lp_build_vec_type(bld_base->base.gallivm, type);
+   src0 = LLVMBuildZExt(builder, emit_data->args[0], typeRef, "");
+   src1 = LLVMBuildZExt(builder, emit_data->args[1], typeRef, "");
+   dst64 = LLVMBuildMul(builder, src0, src1, "");
+   dst64 = LLVMBuildLShr(
+            builder, dst64,
+            lp_build_const_vec(bld_base->base.gallivm, type, 32), "");
+   type.width = 32;
+   typeRef = lp_build_vec_type(bld_base->base.gallivm, type);
+   emit_data->output[emit_data->chan] =
+         LLVMBuildTrunc(builder, dst64, typeRef, "");
+}
+
 /* TGSI_OPCODE_MAX */
 static void fmax_emit(
const struct lp_build_tgsi_action * action,
@@ -894,6 +952,8 @@ lp_set_default_actions(struct lp_build_tgsi_context * bld_base)
    bld_base->op_actions[TGSI_OPCODE_U2F].emit = u2f_emit;
    bld_base->op_actions[TGSI_OPCODE_UMAD].emit = umad_emit;
    bld_base->op_actions[TGSI_OPCODE_UMUL].emit = umul_emit;
+   bld_base->op_actions[TGSI_OPCODE_IMUL_HI].emit = imul_hi_emit;
+   bld_base->op_actions[TGSI_OPCODE_UMUL_HI].emit = umul_hi_emit;
 
    bld_base->op_actions[TGSI_OPCODE_MAX].emit = fmax_emit;
    bld_base->op_actions[TGSI_OPCODE_MIN].emit = fmin_emit;
-- 
1.8.1.2

[Mesa-dev] [PATCH] llvmpipe: abstract the code to set number of subpixel bits

2013-10-08 Thread Zack Rusin

As we're moving towards expanding the number of subpixel
bits and the width of the variables used in the computations
we need to make this code a bit more centralized.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_rast.h  |  9 +
 src/gallium/drivers/llvmpipe/lp_setup.c | 14 +-
 src/gallium/drivers/llvmpipe/lp_setup_tri.c |  2 +-
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_rast.h 
b/src/gallium/drivers/llvmpipe/lp_rast.h
index c57f2ea..43c598d 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.h
+++ b/src/gallium/drivers/llvmpipe/lp_rast.h
@@ -46,9 +46,18 @@ struct lp_scene;
 struct lp_fence;
 struct cmd_bin;
 
+#define FIXED_TYPE_WIDTH 32
 /** For sub-pixel positioning */
 #define FIXED_ORDER 4
 #define FIXED_ONE (1<<FIXED_ORDER)
+/** Maximum length of an edge in a primitive in pixels.
+ *  If the framebuffer is large we have to think about fixed-point
+ *  integer overflow. Coordinates need ((FIXED_TYPE_WIDTH/2) - 1) bits
+ *  to be able to fit product of two such coordinates inside
+ *  FIXED_TYPE_WIDTH, any larger and we could overflow a
+ *  FIXED_TYPE_WIDTH_-bit int.
+ */
+#define MAX_FIXED_LENGTH (1 << (((FIXED_TYPE_WIDTH/2) - 1) - FIXED_ORDER))
 
 /* Rasterizer output size going to jit fs, width/height */
 #define LP_RASTER_BLOCK_SIZE 4
diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c 
b/src/gallium/drivers/llvmpipe/lp_setup.c
index c8199b4..9b277d3 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup.c
@@ -1007,16 +1007,12 @@ try_update_scene_state( struct lp_setup_context *setup )
                                          &setup->draw_regions[i]);
          }
       }
-      /* If the framebuffer is large we have to think about fixed-point
-       * integer overflow.  For 2K by 2K images, coordinates need 15 bits
-       * (2^11 + 4 subpixel bits).  The product of two such numbers would
-       * use 30 bits.  Any larger and we could overflow a 32-bit int.
-       *
-       * To cope with this problem we check if triangles are large and
-       * subdivide them if needed.
+      /*
+       * Subdivide triangles if the framebuffer is larger than the
+       * MAX_FIXED_LENGTH.
        */
-      setup->subdivide_large_triangles = (setup->fb.width > 2048 ||
-                                          setup->fb.height > 2048);
+      setup->subdivide_large_triangles = (setup->fb.width > MAX_FIXED_LENGTH ||
+                                          setup->fb.height > MAX_FIXED_LENGTH);
    }
 
    setup->dirty = 0;
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c 
b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
index 051ffa0..9cc81e9 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
@@ -988,7 +988,7 @@ check_subdivide_triangle(struct lp_setup_context *setup,
  const float (*v2)[4],
  triangle_func_t tri)
 {
-   const float maxLen = 2048.0f;  /* longest permissible edge, in pixels */
+   const float maxLen = MAX_FIXED_LENGTH;  /* longest permissible edge, in pixels */
float dx10, dy10, len10;
float dx21, dy21, len21;
float dx02, dy02, len02;
-- 
1.8.1.2

[Mesa-dev] [PATCH] llvmpipe: we need to subdivide if fb is bigger in either direction

2013-09-24 Thread Zack Rusin

We need to subdivide triangles if either of the dimensions is
larger than the max edge length, not when both of them are larger.
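
The difference between the two conditions is easiest to see on a framebuffer
that is large in only one direction. A minimal sketch follows; the function
names are hypothetical and the limit is passed in as a parameter rather than
hard-coded as in lp_setup.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Old behaviour: subdivide only when BOTH dimensions exceed the limit. */
static bool
needs_subdivision_old(unsigned fb_width, unsigned fb_height, unsigned max_len)
{
   assert(max_len > 0);
   return fb_width > max_len && fb_height > max_len;
}

/* Fixed behaviour: subdivide when EITHER dimension exceeds the limit. */
static bool
needs_subdivision(unsigned fb_width, unsigned fb_height, unsigned max_len)
{
   assert(max_len > 0);
   return fb_width > max_len || fb_height > max_len;
}
```

For a 4096x1024 framebuffer and a limit of 2048 the old check says no
subdivision is needed, although a triangle can still have a horizontal edge
longer than 2048 pixels; the new check catches it.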

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c 
b/src/gallium/drivers/llvmpipe/lp_setup.c
index 5fde01f..c8199b4 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup.c
@@ -1015,7 +1015,7 @@ try_update_scene_state( struct lp_setup_context *setup )
* To cope with this problem we check if triangles are large and
* subdivide them if needed.
*/
-      setup->subdivide_large_triangles = (setup->fb.width > 2048 &&
+      setup->subdivide_large_triangles = (setup->fb.width > 2048 ||
                                           setup->fb.height > 2048);
}
   
-- 
1.8.3.2

[Mesa-dev] [PATCH] llvmpipe: align the array used for subdivided vertices

2013-09-23 Thread Zack Rusin

When subdividing a triangle we're using a temporary array to store
the new coordinates for the subdivided triangles. Unfortunately
the array used for that was not aligned properly, causing
random crashes in the llvm jit code, which was trying to load
vectors from it.
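
The effect of the alignment request can be sketched in standard C11. This is
an illustration under assumptions, not the Mesa code: alignas stands in for
the PIPE_ALIGN_VAR macro, and VEC_ALIGN (16 bytes) and MAX_ATTRIBS are
hypothetical stand-ins for LP_MIN_VECTOR_ALIGN and PIPE_MAX_ATTRIBS.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define VEC_ALIGN   16   /* assumption: 128-bit aligned vector loads */
#define MAX_ATTRIBS 32   /* hypothetical stand-in for PIPE_MAX_ATTRIBS */

/* Returns non-zero if ptr is a multiple of alignment (a power of two). */
static int
is_aligned(const void *ptr, uintptr_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* A stack array declared like the fixed vmid: without alignas a float
 * array is only guaranteed 4-byte alignment, so an aligned 16-byte vector
 * load from it may fault depending on where the compiler places it. */
static int
aligned_scratch_is_aligned(void)
{
   alignas(VEC_ALIGN) float vmid[MAX_ATTRIBS][4];

   vmid[0][0] = 0.0f;
   return is_aligned(vmid, VEC_ALIGN);
}
```

That the failure depends on stack layout is consistent with the crashes
being described as random.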

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_setup_tri.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c 
b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
index 8b0fcd0..cf67f29 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
@@ -909,7 +909,7 @@ subdiv_tri(struct lp_setup_context *setup,
    unsigned n = setup->fs.current.variant->shader->info.base.num_inputs + 1;
    const struct lp_shader_input *inputs =
       setup->fs.current.variant->shader->inputs;
-   float vmid[PIPE_MAX_ATTRIBS][4];
+   PIPE_ALIGN_VAR(LP_MIN_VECTOR_ALIGN) float vmid[PIPE_MAX_ATTRIBS][4];
const float (*vm)[4] = (const float (*)[4]) vmid;
unsigned i;
float w0, w1, wm;
-- 
1.8.3.2

[Mesa-dev] [PATCH 1/3] llvmpipe: count c_primitives before discarding null prims

2013-09-19 Thread Zack Rusin

We need to count the clipper primitives before the rasterizer
discards the ones it considers to be null.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_setup_tri.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c 
b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
index 23bc6e2..e61efd4 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
@@ -252,7 +252,6 @@ do_triangle_ccw(struct lp_setup_context *setup,
 const float (*v2)[4],
 boolean frontfacing )
 {
-   struct llvmpipe_context *lp_context = (struct llvmpipe_context *)setup->pipe;
    struct lp_scene *scene = setup->scene;
    const struct lp_setup_variant_key *key = &setup->setup.variant->key;
    struct lp_rast_triangle *tri;
@@ -340,11 +339,6 @@ do_triangle_ccw(struct lp_setup_context *setup,
 
    LP_COUNT(nr_tris);
 
-   if (lp_context->active_statistics_queries &&
-       !llvmpipe_rasterization_disabled(lp_context)) {
-      lp_context->pipeline_statistics.c_primitives++;
-   }
-
    /* Setup parameter interpolants:
     */
    setup->setup.variant->jit_function( v0,
@@ -803,7 +797,6 @@ static void retry_triangle_ccw( struct lp_setup_context *setup,
}
 }
 
-
 /**
  * Calculate fixed position data for a triangle
  */
@@ -1102,11 +1095,17 @@ static void triangle_both( struct lp_setup_context *setup,
                            const float (*v2)[4] )
 {
    struct fixed_position position;
+   struct llvmpipe_context *lp_context = (struct llvmpipe_context *)setup->pipe;
 
    if (setup->subdivide_large_triangles &&
        check_subdivide_triangle(setup, v0, v1, v2, triangle_both))
       return;
 
+   if (lp_context->active_statistics_queries &&
+       !llvmpipe_rasterization_disabled(lp_context)) {
+      lp_context->pipeline_statistics.c_primitives++;
+   }
+
    calc_fixed_position(setup, &position, v0, v1, v2);
 
if (0) {
-- 
1.8.3.2

[Mesa-dev] [PATCH 3/3] llvmpipe: increase number of subpixel bits to eight

2013-09-19 Thread Zack Rusin

Unfortunately d3d10 requires much higher precision (e.g. the
wgf11clipping tests check for it). The smallest number of subpixel
precision bits with which it passes is 8. That means that we need to
decrease the maximum length of an edge that we can handle without
subdivision by 4 bits. Abstracted the code a bit to make it easier
to change once we switch to 64-bit rasterization.
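
The arithmetic behind the "4 bits" can be worked through with a small
standalone sketch. It mirrors the MAX_FIXED_LENGTH formula from the patch,
but max_fixed_length and max_coord_product are hypothetical helper names
used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_TYPE_WIDTH 32

/* Longest edge, in pixels, that fits for a given number of subpixel bits:
 * a coordinate may use (FIXED_TYPE_WIDTH / 2) - 1 = 15 bits in total, and
 * fixed_order of those are spent on the subpixel fraction. */
static int
max_fixed_length(int fixed_order)
{
   const int coord_bits = (FIXED_TYPE_WIDTH / 2) - 1;

   assert(fixed_order >= 0 && fixed_order < coord_bits);
   return 1 << (coord_bits - fixed_order);
}

/* Product of two fixed-point coordinates at that limit, computed in 64 bits
 * so the result can be compared against the 32-bit range. */
static int64_t
max_coord_product(int fixed_order)
{
   const int64_t coord = (int64_t)max_fixed_length(fixed_order) << fixed_order;

   return coord * coord;
}
```

With 4 subpixel bits the limit is 1 << 11 = 2048 pixels, matching the old
hard-coded value; with 8 bits it drops to 1 << 7 = 128 pixels. In both cases
the worst-case product is 2^30, which still fits in a signed 32-bit int.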

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_rast.h  | 12 +++-
 src/gallium/drivers/llvmpipe/lp_setup.c | 14 +-
 src/gallium/drivers/llvmpipe/lp_setup_tri.c |  2 +-
 3 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_rast.h 
b/src/gallium/drivers/llvmpipe/lp_rast.h
index c57f2ea..b72be55 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.h
+++ b/src/gallium/drivers/llvmpipe/lp_rast.h
@@ -46,10 +46,20 @@ struct lp_scene;
 struct lp_fence;
 struct cmd_bin;
 
+#define FIXED_TYPE_WIDTH 32
 /** For sub-pixel positioning */
-#define FIXED_ORDER 4
+#define FIXED_ORDER 8
 #define FIXED_ONE (1<<FIXED_ORDER)
 
+/** Maximum length of an edge in a primitive in pixels.
+ *  If the framebuffer is large we have to think about fixed-point
+ *  integer overflow.  Coordinates need ((FIXED_TYPE_WIDTH/2) - 1) bits
+ *  to be able to fit product of two such coordinates inside 
+ *  FIXED_TYPE_WIDTH, any larger and we could overflow a 
+ *  FIXED_TYPE_WIDTH_-bit int.
+ */
+#define MAX_FIXED_LENGTH (1 << (((FIXED_TYPE_WIDTH/2) - 1) - FIXED_ORDER))
+
 /* Rasterizer output size going to jit fs, width/height */
 #define LP_RASTER_BLOCK_SIZE 4
 
diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c 
b/src/gallium/drivers/llvmpipe/lp_setup.c
index 5fde01f..edb55ad 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup.c
@@ -1007,16 +1007,12 @@ try_update_scene_state( struct lp_setup_context *setup )
                                          &setup->draw_regions[i]);
          }
       }
-      /* If the framebuffer is large we have to think about fixed-point
-       * integer overflow.  For 2K by 2K images, coordinates need 15 bits
-       * (2^11 + 4 subpixel bits).  The product of two such numbers would
-       * use 30 bits.  Any larger and we could overflow a 32-bit int.
-       *
-       * To cope with this problem we check if triangles are large and
-       * subdivide them if needed.
+      /*
+       * Subdivide triangles if the framebuffer is larger than our
+       * MAX_FIXED_LENGTH can accommodate.
        */
-      setup->subdivide_large_triangles = (setup->fb.width > 2048 &&
-                                          setup->fb.height > 2048);
+      setup->subdivide_large_triangles = (setup->fb.width > MAX_FIXED_LENGTH &&
+                                          setup->fb.height > MAX_FIXED_LENGTH);
    }
 
    setup->dirty = 0;
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c 
b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
index e61efd4..ee30a64 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
@@ -988,7 +988,7 @@ check_subdivide_triangle(struct lp_setup_context *setup,
  const float (*v2)[4],
  triangle_func_t tri)
 {
-   const float maxLen = 2048.0f;  /* longest permissible edge, in pixels */
+   const float maxLen = MAX_FIXED_LENGTH;  /* longest permissible edge, in pixels */
float dx10, dy10, len10;
float dx21, dy21, len21;
float dx02, dy02, len02;
-- 
1.8.3.2

[Mesa-dev] [PATCH 2/3] draw/clip: don't emit so many empty triangles

2013-09-19 Thread Zack Rusin

Compress empty triangles (don't emit more than one in a row) and
never emit empty triangles if we already generated a triangle
covering a non-null area. We can't skip all null triangles
because c_primitives expects the ones generated from vertices lying
exactly on the clipping plane to be emitted.
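
The null test itself is a zero-area check via the cross product of two edge
vectors. A standalone sketch follows; struct vec3, tri_is_null and
tri_is_null_examples are hypothetical names for illustration, whereas the
patch reads the positions out of the draw module's prim_header.

```c
#include <assert.h>
#include <stdbool.h>

struct vec3 { float x, y, z; };   /* hypothetical vertex position */

/* A triangle is null when the cross product of two of its edges is the
 * zero vector, i.e. the three vertices are collinear or coincide. */
static bool
tri_is_null(struct vec3 v0, struct vec3 v1, struct vec3 v2)
{
   /* Edge vectors from v0. */
   const float x1 = v1.x - v0.x, y1 = v1.y - v0.y, z1 = v1.z - v0.z;
   const float x2 = v2.x - v0.x, y2 = v2.y - v0.y, z2 = v2.z - v0.z;

   /* Cross product of the edges; its length is twice the area. */
   const float cx = y1 * z2 - z1 * y2;
   const float cy = z1 * x2 - x1 * z2;
   const float cz = x1 * y2 - y1 * x2;

   return (cx * cx + cy * cy + cz * cz) == 0.0f;
}

/* Usage: a collinear triangle is null, a right triangle is not. */
static void
tri_is_null_examples(void)
{
   const struct vec3 a = { 0.0f, 0.0f, 0.0f };
   const struct vec3 b = { 1.0f, 1.0f, 0.0f };
   const struct vec3 c = { 2.0f, 2.0f, 0.0f };
   const struct vec3 d = { 0.0f, 1.0f, 0.0f };

   assert(tri_is_null(a, b, c));
   assert(!tri_is_null(a, b, d));
}
```

The vy term in the patch has the opposite sign from cy here; since only the
squared length is compared against zero, the result is the same.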

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_pipe_clip.c | 39 +
 1 file changed, 39 insertions(+)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c 
b/src/gallium/auxiliary/draw/draw_pipe_clip.c
index 0f90bfd..2d6df81 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_clip.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c
@@ -209,6 +209,29 @@ static void interp( const struct clip_stage *clip,
}
 }
 
+/**
+ * Checks whether the specified triangle is empty and if it is returns
+ * true, otherwise returns false.
+ * Triangle is considered null/empty if its area is equal to zero.
+ */
+static INLINE boolean
+is_tri_null(struct draw_context *draw, const struct prim_header *header)
+{
+   const unsigned pos_attr = draw_current_shader_position_output(draw);
+   float x1 = header->v[1]->data[pos_attr][0] - header->v[0]->data[pos_attr][0];
+   float y1 = header->v[1]->data[pos_attr][1] - header->v[0]->data[pos_attr][1];
+   float z1 = header->v[1]->data[pos_attr][2] - header->v[0]->data[pos_attr][2];
+
+   float x2 = header->v[2]->data[pos_attr][0] - header->v[0]->data[pos_attr][0];
+   float y2 = header->v[2]->data[pos_attr][1] - header->v[0]->data[pos_attr][1];
+   float z2 = header->v[2]->data[pos_attr][2] - header->v[0]->data[pos_attr][2];
+
+   float vx = y1 * z2 - z1 * y2;
+   float vy = x1 * z2 - z1 * x2;
+   float vz = x1 * y2 - y1 * x2;
+
+   return (vx*vx  + vy*vy + vz*vz) == 0.f;
+}
 
 /**
  * Emit a post-clip polygon to the next pipeline stage.  The polygon
@@ -223,6 +246,8 @@ static void emit_poly( struct draw_stage *stage,
struct prim_header header;
unsigned i;
ushort edge_first, edge_middle, edge_last;
+   boolean last_tri_was_null = FALSE;
+   boolean tri_was_not_null = FALSE;
 
    if (stage->draw->rasterizer->flatshade_first) {
   edge_first  = DRAW_PIPE_EDGE_FLAG_0;
@@ -244,6 +269,7 @@ static void emit_poly( struct draw_stage *stage,
header.pad = 0;
 
    for (i = 2; i < n; i++, header.flags = edge_middle) {
+      boolean tri_null;
       /* order the triangle verts to respect the provoking vertex mode */
       if (stage->draw->rasterizer->flatshade_first) {
  header.v[0] = inlist[0];  /* the provoking vertex */
@@ -256,6 +282,19 @@ static void emit_poly( struct draw_stage *stage,
  header.v[2] = inlist[0];  /* the provoking vertex */
   }
 
+      tri_null = is_tri_null(stage->draw, &header);
+      /* If we generated a triangle with an area, aka. non-null triangle,
+       * or if the previous triangle was also null then skip all subsequent
+       * null triangles */
+      if ((tri_was_not_null && tri_null) || (last_tri_was_null && tri_null)) {
+ last_tri_was_null = tri_null;
+ continue;
+  }
+  last_tri_was_null = tri_null;
+  if (!tri_null) {
+ tri_was_not_null = TRUE;
+  }
+
   if (!edgeflags[i-1]) {
          header.flags &= ~edge_middle;
   }
-- 
1.8.3.2
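
The skip rule from the commit message can be modelled outside of draw in plain C. The sketch below is an illustration under assumptions: struct vec3, tri_is_null() and filter_null_tris() are hypothetical names, not Mesa code, and the area test uses the conventional cross product (the sign of the middle component differs from the patch, which makes no difference once the components are squared).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vertex position; not a Mesa type. */
struct vec3 { float x, y, z; };

/* A triangle is null when the cross product of two of its edges is
 * the zero vector, i.e. when its area is zero. */
static bool
tri_is_null(struct vec3 a, struct vec3 b, struct vec3 c)
{
   float x1 = b.x - a.x, y1 = b.y - a.y, z1 = b.z - a.z;
   float x2 = c.x - a.x, y2 = c.y - a.y, z2 = c.z - a.z;
   float vx = y1 * z2 - z1 * y2;
   float vy = z1 * x2 - x1 * z2;
   float vz = x1 * y2 - y1 * x2;
   return (vx * vx + vy * vy + vz * vz) == 0.0f;
}

/* Applies the skip rule to n triangles in emission order.
 * null_flags[i] says whether triangle i is null; emitted[i] is set
 * to whether it would be passed to the next stage.
 * Returns the number of triangles emitted. */
static size_t
filter_null_tris(const bool *null_flags, bool *emitted, size_t n)
{
   bool last_was_null = false;   /* previous triangle was null */
   bool seen_non_null = false;   /* some earlier triangle had area */
   size_t count = 0;

   for (size_t i = 0; i < n; i++) {
      bool is_null = null_flags[i];
      /* Skip a null triangle if it follows another null triangle or
       * if a triangle with area was already seen. */
      bool skip = is_null && (seen_non_null || last_was_null);

      last_was_null = is_null;
      if (!is_null)
         seen_non_null = true;

      emitted[i] = !skip;
      if (!skip)
         count++;
   }
   return count;
}
```

A leading null triangle is still emitted, so a primitive made only of vertices on the clipping plane continues to produce one triangle for c_primitives.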

Re: [Mesa-dev] [PATCH 3/3] util/u_blit: Implement util_blit_pixels via pipe_context::blit.

2013-09-17 Thread Zack Rusin

The entire series looks good to me.

Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
 From: José Fonseca jfons...@vmware.com
 
 This removes a lot of code, but not everything, as util_blit_pixels_tex
 is still useful when one needs to override pipe_sampler_view::swizzle_?.
 ---
  src/gallium/auxiliary/util/u_blit.c | 447
  +++-
  1 file changed, 37 insertions(+), 410 deletions(-)
 
 diff --git a/src/gallium/auxiliary/util/u_blit.c
 b/src/gallium/auxiliary/util/u_blit.c
 index e9bec4a..4ba71b9 100644
 --- a/src/gallium/auxiliary/util/u_blit.c
 +++ b/src/gallium/auxiliary/util/u_blit.c
 @@ -57,29 +57,20 @@ struct blit_state
 struct pipe_context *pipe;
 struct cso_context *cso;
  
 -   struct pipe_blend_state blend_write_color, blend_keep_color;
 +   struct pipe_blend_state blend_write_color;
 struct pipe_depth_stencil_alpha_state dsa_keep_depthstencil;
 -   struct pipe_depth_stencil_alpha_state dsa_write_depthstencil;
 -   struct pipe_depth_stencil_alpha_state dsa_write_depth;
 -   struct pipe_depth_stencil_alpha_state dsa_write_stencil;
 struct pipe_rasterizer_state rasterizer;
 struct pipe_sampler_state sampler;
 struct pipe_viewport_state viewport;
 struct pipe_vertex_element velem[2];
 -   enum pipe_texture_target internal_target;
  
 void *vs;
 void *fs[PIPE_MAX_TEXTURE_TYPES][TGSI_WRITEMASK_XYZW + 1];
 -   void *fs_depthstencil[PIPE_MAX_TEXTURE_TYPES];
 -   void *fs_depth[PIPE_MAX_TEXTURE_TYPES];
 -   void *fs_stencil[PIPE_MAX_TEXTURE_TYPES];
  
 struct pipe_resource *vbuf;  /** quad vertices */
 unsigned vbuf_slot;
  
 float vertices[4][2][4];   /** vertex/texcoords for quad */
 -
 -   boolean has_stencil_export;
  };
  
  
 @@ -103,20 +94,6 @@ util_create_blit(struct pipe_context *pipe, struct
 cso_context *cso)
 /* disabled blending/masking */
     ctx->blend_write_color.rt[0].colormask = PIPE_MASK_RGBA;
  
 -   /* depth stencil states */
 -   ctx->dsa_write_depth.depth.enabled = 1;
 -   ctx->dsa_write_depth.depth.writemask = 1;
 -   ctx->dsa_write_depth.depth.func = PIPE_FUNC_ALWAYS;
 -   ctx->dsa_write_stencil.stencil[0].enabled = 1;
 -   ctx->dsa_write_stencil.stencil[0].func = PIPE_FUNC_ALWAYS;
 -   ctx->dsa_write_stencil.stencil[0].fail_op = PIPE_STENCIL_OP_REPLACE;
 -   ctx->dsa_write_stencil.stencil[0].zpass_op = PIPE_STENCIL_OP_REPLACE;
 -   ctx->dsa_write_stencil.stencil[0].zfail_op = PIPE_STENCIL_OP_REPLACE;
 -   ctx->dsa_write_stencil.stencil[0].valuemask = 0xff;
 -   ctx->dsa_write_stencil.stencil[0].writemask = 0xff;
 -   ctx->dsa_write_depthstencil.depth = ctx->dsa_write_depth.depth;
 -   ctx->dsa_write_depthstencil.stencil[0] = ctx->dsa_write_stencil.stencil[0];
 -
 /* rasterizer */
     ctx->rasterizer.cull_face = PIPE_FACE_NONE;
     ctx->rasterizer.half_pixel_center = 1;
 @@ -147,14 +124,6 @@ util_create_blit(struct pipe_context *pipe, struct cso_context *cso)
        ctx->vertices[i][1][3] = 1.0f; /* q */
 }
  
 -   if(pipe->screen->get_param(pipe->screen, PIPE_CAP_NPOT_TEXTURES))
 -      ctx->internal_target = PIPE_TEXTURE_2D;
 -   else
 -      ctx->internal_target = PIPE_TEXTURE_RECT;
 -
 -   ctx->has_stencil_export =
 -      pipe->screen->get_param(pipe->screen, PIPE_CAP_SHADER_STENCIL_EXPORT);
 -
 return ctx;
  }
  
 @@ -178,18 +147,6 @@ util_destroy_blit(struct blit_state *ctx)
}
 }
  
 -   for (i = 0; i < PIPE_MAX_TEXTURE_TYPES; i++) {
 -      if (ctx->fs_depthstencil[i]) {
 -         pipe->delete_fs_state(pipe, ctx->fs_depthstencil[i]);
 -      }
 -      if (ctx->fs_depth[i]) {
 -         pipe->delete_fs_state(pipe, ctx->fs_depth[i]);
 -      }
 -      if (ctx->fs_stencil[i]) {
 -         pipe->delete_fs_state(pipe, ctx->fs_stencil[i]);
 -      }
 -   }
 -
     pipe_resource_reference(&ctx->vbuf, NULL);
  
 FREE(ctx);
 @@ -217,63 +174,6 @@ set_fragment_shader(struct blit_state *ctx, uint
 writemask,
  
  
  /**
 - * Helper function to set the shader which writes depth and stencil.
 - */
 -static INLINE void
 -set_depthstencil_fragment_shader(struct blit_state *ctx,
 - enum pipe_texture_target pipe_tex)
 -{
 -   if (!ctx->fs_depthstencil[pipe_tex]) {
 -      unsigned tgsi_tex = util_pipe_tex_to_tgsi_tex(pipe_tex, 0);
 -
 -      ctx->fs_depthstencil[pipe_tex] =
 -         util_make_fragment_tex_shader_writedepthstencil(ctx->pipe, tgsi_tex,
 -                                                         TGSI_INTERPOLATE_LINEAR);
 -   }
 -
 -   cso_set_fragment_shader_handle(ctx->cso, ctx->fs_depthstencil[pipe_tex]);
 -}
 -
 -
 -/**
 - * Helper function to set the shader which writes depth.
 - */
 -static INLINE void
 -set_depth_fragment_shader(struct blit_state *ctx,
 -  enum pipe_texture_target pipe_tex)
 -{
 -   if (!ctx->fs_depth[pipe_tex]) {
 -      unsigned tgsi_tex = util_pipe_tex_to_tgsi_tex(pipe_tex, 0);
 -
 -      ctx->fs_depth[pipe_tex] =
 -         util_make_fragment_tex_shader_writedepth(ctx->pipe, tgsi_tex

Re: [Mesa-dev] [PATCH] Revert draw: cleanup the extra attribs

2013-09-04 Thread Zack Rusin

 This reverts commit 57cd3267782fcf92d1e7d772760956516d4367df.
 
 This fixes piglit regressions with additional draw stages on
 llvmpipe, softpipe and i915g. The attributes can't be cleared at
 this point because they might be in use by the additional draw
 stages.

The attributes have to be cleared, but the interface for looking them up has to 
be exactly the same as in llvmpipe (i.e. only llvmpipe does it correctly).

 https://bugs.freedesktop.org/show_bug.cgi?id=67963
 https://bugs.freedesktop.org/show_bug.cgi?id=67965
 https://bugs.freedesktop.org/show_bug.cgi?id=67966

All of which have been fixed for a long time; no one has had the time to 
verify and close them. In other words, please don't revert. If you don't feel 
like changing the shader output lookup, just remove the prepare_shader_outputs 
call, like I mentioned, and that should get you the old behavior back.

z

Re: [Mesa-dev] [PATCH 1/3] draw: cleanup the extra attribs

2013-09-04 Thread Zack Rusin

Hi, Stéphane. 

No we should not revert to the old behavior. The old behavior was incorrect. 
Consider this: 

-- setup state that draws a wireframe - draw should inject frontface 
-- the driver needs to be able to find the injected wireframe output 
-- draw 
-- setup state that draws solid fill with fragment shader using primid input - 
draw should inject primid but not frontface 
-- driver needs to be able to find the injected primid but not frontface info 
-- draw 

Without cleaning the attributes before the second draw, draw will keep the 
frontface id in the extra attribs, incorrectly pointing the driver to an 
output that no longer exists. That's why the attribs need to be cleaned before 
rendering. 

i915g simply shouldn't call draw_prepare_shader_outputs because it doesn't know 
what to do with the injected front-face or primid anyway. That part I'd suggest 
you remove. It will get you back to the old behavior. 

z 

- Original Message -

 Hi Zack,

 This change regresses a bunch of point sprite piglit tests on i915g. Should
 we revert back to the old behaviour? As far as I can see, it was correct (it
 was keeping the attributes in case another stage is using them).

 Stéphane

 On Thu, Aug 8, 2013 at 12:46 PM, Zack Rusin  za...@vmware.com  wrote:

  Before inserting new front face and prim id outputs cleanup
  the old extra outputs, otherwise our cache will use previous
  output slots which will break as soon as outputs of the current
  shader don't match the last.

  Signed-off-by: Zack Rusin  za...@vmware.com 
  ---
  src/gallium/auxiliary/draw/draw_context.c | 1 +
  1 file changed, 1 insertion(+)

  diff --git a/src/gallium/auxiliary/draw/draw_context.c
  b/src/gallium/auxiliary/draw/draw_context.c
  index af9caee..2dc6772 100644
  --- a/src/gallium/auxiliary/draw/draw_context.c
  +++ b/src/gallium/auxiliary/draw/draw_context.c
  @@ -555,6 +555,7 @@ draw_get_shader_info(const struct draw_context *draw)
  void
  draw_prepare_shader_outputs(struct draw_context *draw)
  {
  + draw_remove_extra_vertex_attribs(draw);
  draw_ia_prepare_outputs(draw, draw->pipeline.ia);
  draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
  }
  --
  1.7.10.4
 
 

[Mesa-dev] [PATCH] gallivm: support indirect registers on both dimensions

2013-09-03 Thread Zack Rusin

We support indirect addressing only on the vertex index, but some
shaders also use indirect addressing on attributes. This patch
adds support for indirect addressing on both dimensions inside
gs arrays.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_llvm.c  | 23 +--
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h |  3 ++-
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c |  4 +++-
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index 820d6b0..03668d9 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1360,8 +1360,9 @@ clipmask_booli32(struct gallivm_state *gallivm,
 static LLVMValueRef
 draw_gs_llvm_fetch_input(const struct lp_build_tgsi_gs_iface *gs_iface,
  struct lp_build_tgsi_context * bld_base,
- boolean is_indirect,
+ boolean is_vindex_indirect,
  LLVMValueRef vertex_index,
+ boolean is_aindex_indirect,
  LLVMValueRef attrib_index,
  LLVMValueRef swizzle_index)
 {
@@ -1372,18 +1373,28 @@ draw_gs_llvm_fetch_input(const struct lp_build_tgsi_gs_iface *gs_iface,
    LLVMValueRef res;
    struct lp_type type = bld_base->base.type;
 
-   if (is_indirect) {
+   if (is_vindex_indirect || is_aindex_indirect) {
       int i;
       res = bld_base->base.zero;
       for (i = 0; i < type.length; ++i) {
          LLVMValueRef idx = lp_build_const_int32(gallivm, i);
-         LLVMValueRef vert_chan_index = LLVMBuildExtractElement(builder,
-                                                                vertex_index, idx, "");
+         LLVMValueRef vert_chan_index = vertex_index;
+         LLVMValueRef attr_chan_index = attrib_index;
          LLVMValueRef channel_vec, value;
+
+         if (is_vindex_indirect) {
+            vert_chan_index = LLVMBuildExtractElement(builder,
+                                                      vertex_index, idx, "");
+         }
+         if (is_aindex_indirect) {
+            attr_chan_index = LLVMBuildExtractElement(builder,
+                                                      attrib_index, idx, "");
+         }
+
          indices[0] = vert_chan_index;
-         indices[1] = attrib_index;
+         indices[1] = attr_chan_index;
          indices[2] = swizzle_index;
-         
+
          channel_vec = LLVMBuildGEP(builder, gs->input, indices, 3, "");
          channel_vec = LLVMBuildLoad(builder, channel_vec, "");
          value = LLVMBuildExtractElement(builder, channel_vec, idx, "");
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
index 522302e..8bcdbc8 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
@@ -395,8 +395,9 @@ struct lp_build_tgsi_gs_iface
 {
LLVMValueRef (*fetch_input)(const struct lp_build_tgsi_gs_iface *gs_iface,
struct lp_build_tgsi_context * bld_base,
-   boolean is_indirect,
+   boolean is_vindex_indirect,
LLVMValueRef vertex_index,
+   boolean is_aindex_indirect,
LLVMValueRef attrib_index,
LLVMValueRef swizzle_index);
void (*emit_vertex)(const struct lp_build_tgsi_gs_iface *gs_iface,
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 4c6b6ec..e50f1d1 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -1135,7 +1135,9 @@ emit_fetch_gs_input(
 
    res = bld->gs_iface->fetch_input(bld->gs_iface, bld_base,
                                     reg->Dimension.Indirect,
-                                    vertex_index, attrib_index,
+                                    vertex_index,
+                                    reg->Register.Indirect,
+                                    attrib_index,
                                     swizzle_index);
 
assert(res);
-- 
1.8.3.2
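
What the patched fetch does per SIMD lane can be written as scalar C. This is a sketch of the idea only: the array sizes, the [vertex][attrib][lane] layout for a single swizzle channel, and the fetch_input() signature are assumptions for illustration and do not mirror the gallivm code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes for this sketch. */
enum { LANES = 4, NUM_VERTS = 3, NUM_ATTRIBS = 2 };

/* input[vertex][attrib][lane] stands in for one swizzle channel of
 * the GS input array. An indirect index differs per lane, while a
 * direct index is the same for all lanes (element 0 is used). */
static void
fetch_input(float input[NUM_VERTS][NUM_ATTRIBS][LANES],
            bool vindex_indirect, const int vindex[LANES],
            bool aindex_indirect, const int aindex[LANES],
            float out[LANES])
{
   for (int lane = 0; lane < LANES; lane++) {
      int v = vindex_indirect ? vindex[lane] : vindex[0];
      int a = aindex_indirect ? aindex[lane] : aindex[0];

      assert(v >= 0 && v < NUM_VERTS);
      assert(a >= 0 && a < NUM_ATTRIBS);

      /* Load the vector for (v, a) and keep only this lane's value,
       * like the GEP + load + extractelement sequence in the patch. */
      out[lane] = input[v][a][lane];
   }
}
```

Before the patch only the vertex index could vary per lane; the second flag lets the attribute index vary per lane in the same loop.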

Re: [Mesa-dev] [PATCH] draw: fix PIPE_MAX_SAMPLER/PIPE_MAX_SHADER_SAMPLER_VIEWS issues

2013-08-30 Thread Zack Rusin

Looks good.

Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 pstipple/aaline stages used PIPE_MAX_SAMPLER instead of
 PIPE_MAX_SHADER_SAMPLER_VIEWS when dealing with sampler views.
 Now these stages can't actually handle sampler_unit != texture_unit anyway
 (they cannot work with d3d10 shaders at all due to using tex not sample
 opcodes as mixed mode shaders are impossible) but this leads to crashes if
 a driver just installs these stages and then more than PIPE_MAX_SAMPLER views
 are set even if the stages aren't even used.
 ---
  src/gallium/auxiliary/draw/draw_pipe_aaline.c   |6 +++---
  src/gallium/auxiliary/draw/draw_pipe_pstipple.c |6 +++---
  2 files changed, 6 insertions(+), 6 deletions(-)
 
 diff --git a/src/gallium/auxiliary/draw/draw_pipe_aaline.c
 b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
 index c44c236..8483bd7 100644
 --- a/src/gallium/auxiliary/draw/draw_pipe_aaline.c
 +++ b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
 @@ -107,7 +107,7 @@ struct aaline_stage
 struct aaline_fragment_shader *fs;
 struct {
void *sampler[PIPE_MAX_SAMPLERS];
 -  struct pipe_sampler_view *sampler_views[PIPE_MAX_SAMPLERS];
 +      struct pipe_sampler_view *sampler_views[PIPE_MAX_SHADER_SAMPLER_VIEWS];
 } state;
  
 /*
 @@ -763,7 +763,7 @@ aaline_destroy(struct draw_stage *stage)
     struct pipe_context *pipe = stage->draw->pipe;
     uint i;
  
 -   for (i = 0; i < PIPE_MAX_SAMPLERS; i++) {
 +   for (i = 0; i < PIPE_MAX_SHADER_SAMPLER_VIEWS; i++) {
        pipe_sampler_view_reference(&aaline->state.sampler_views[i], NULL);
     }
  
 @@ -937,7 +937,7 @@ aaline_set_sampler_views(struct pipe_context *pipe,
     for (i = 0; i < num; i++) {
        pipe_sampler_view_reference(&aaline->state.sampler_views[i],
                                    views[i]);
     }
 -   for ( ; i < PIPE_MAX_SAMPLERS; i++) {
 +   for ( ; i < PIPE_MAX_SHADER_SAMPLER_VIEWS; i++) {
        pipe_sampler_view_reference(&aaline->state.sampler_views[i], NULL);
     }
     aaline->num_sampler_views = num;
 diff --git a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
 b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
 index 51f5a86..f38addd 100644
 --- a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
 +++ b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
 @@ -87,7 +87,7 @@ struct pstip_stage
 struct pstip_fragment_shader *fs;
 struct {
void *samplers[PIPE_MAX_SAMPLERS];
 -  struct pipe_sampler_view *sampler_views[PIPE_MAX_SAMPLERS];
 +      struct pipe_sampler_view *sampler_views[PIPE_MAX_SHADER_SAMPLER_VIEWS];
const struct pipe_poly_stipple *stipple;
 } state;
  
 @@ -592,7 +592,7 @@ pstip_destroy(struct draw_stage *stage)
 struct pstip_stage *pstip = pstip_stage(stage);
 uint i;
  
 -   for (i = 0; i < PIPE_MAX_SAMPLERS; i++) {
 +   for (i = 0; i < PIPE_MAX_SHADER_SAMPLER_VIEWS; i++) {
        pipe_sampler_view_reference(&pstip->state.sampler_views[i], NULL);
     }
  
 @@ -731,7 +731,7 @@ pstip_set_sampler_views(struct pipe_context *pipe,
     for (i = 0; i < num; i++) {
        pipe_sampler_view_reference(&pstip->state.sampler_views[i], views[i]);
     }
 -   for (; i < PIPE_MAX_SAMPLERS; i++) {
 +   for (; i < PIPE_MAX_SHADER_SAMPLER_VIEWS; i++) {
        pipe_sampler_view_reference(&pstip->state.sampler_views[i], NULL);
     }
  
 --
 1.7.9.5
 

Re: [Mesa-dev] [PATCH 3/3] gallivm: handle unbound textures in texture sampling / texture queries

2013-08-30 Thread Zack Rusin

Same here.

- Original Message -
 Series LGTM.
 
 Jose
 
 - Original Message -
  From: Roland Scheidegger srol...@vmware.com
  
  Turns out we don't need to do much extra work for detecting this case,
  since we are guaranteed to get an empty static texture state in this case,
  hence just rely on format being 0 and return all zero then.
  Previously needed dummy textures (would just have crashed on format being 0
  otherwise) which cannot return the correct result for size queries and when
  sampling textures with wrap modes using border.
  As a bonus should hugely increase performance when sampling unbound
  textures - too bad it isn't a useful feature :-).
  ---
   src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |   26
   +
   1 file changed, 26 insertions(+)
  
  diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
  b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
  index db5e366..e0d3dd2 100644
  --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
  +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
  @@ -2088,6 +2088,19 @@ lp_build_sample_soa(struct gallivm_state *gallivm,
         debug_printf("Sample from %s\n", util_format_name(fmt));
      }
   
  +   if (static_texture_state->format == PIPE_FORMAT_NONE) {
  +      /*
  +       * If there's nothing bound, format is NONE, and we must return
  +       * all zero as mandated by d3d10 in this case.
  +       */
  +      unsigned chan;
  +      LLVMValueRef zero = lp_build_const_vec(gallivm, type, 0.0F);
  +      for (chan = 0; chan < 4; chan++) {
  +         texel_out[chan] = zero;
  +      }
  +      return;
  +   }
  +
  assert(type.floating);
   
  /* Setup our build context */
  @@ -2517,6 +2530,19 @@ lp_build_size_query_soa(struct gallivm_state
  *gallivm,
  unsigned num_lods = 1;
  struct lp_build_context bld_int_vec4;
   
  +   if (static_state->format == PIPE_FORMAT_NONE) {
  +      /*
  +       * If there's nothing bound, format is NONE, and we must return
  +       * all zero as mandated by d3d10 in this case.
  +       */
  +      unsigned chan;
  +      LLVMValueRef zero = lp_build_const_vec(gallivm, int_type, 0.0F);
  +      for (chan = 0; chan < 4; chan++) {
  +         sizes_out[chan] = zero;
  +      }
  +      return;
  +   }
  +
  /*
   * Do some sanity verification about bound texture and shader dcl
   target.
   * Not entirely sure what's possible but assume array/non-array
  --
  1.7.9.5
  
 

Re: [Mesa-dev] [PATCH] llvmpipe: fix stencil bug if we have both stencil and depth tests

2013-08-15 Thread Zack Rusin

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 This is a very well hidden bug found by accident (only the fixed glean
 tstencil2 test so far seems to hit it).
 We must use new mask with combined s_pass values and orig_mask values
 for zpass/zfail stencil ops, otherwise both the sfail op and one of
 zpass/zfail op are applied (probably not hit in most tests because
 some of the ops tend to be KEEP usually).
 
 Note: this is a candidate for the 9.2 branch.

Looks good

[Mesa-dev] [PATCH] draw: handle nan clipdistance

2013-08-15 Thread Zack Rusin

If clipdistance for one of the vertices is nan (or inf) then the
entire primitive should be discarded.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_cliptest_tmp.h |2 +-
 src/gallium/auxiliary/draw/draw_llvm.c |3 ++
 src/gallium/auxiliary/draw/draw_pipe_clip.c|   13 +-
 src/gallium/auxiliary/gallivm/lp_bld_arit.c|   53 
 src/gallium/auxiliary/gallivm/lp_bld_arit.h|   11 +
 5 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h 
b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
index e4500db..fc54810 100644
--- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
+++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
@@ -140,7 +140,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs,
             clipdist = out->data[cd[0]][i];
          else
             clipdist = out->data[cd[1]][i-4];
-         if (clipdist < 0)
+         if (clipdist < 0 || util_is_inf_or_nan(clipdist))
             mask |= 1 << plane_idx;
       } else {
          if (dot4(clipvertex, plane[plane_idx]) < 0)
diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index 84e3392..1e9eadb 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1261,6 +1261,7 @@ generate_clipmask(struct draw_llvm *llvm,
if (clip_user) {
   LLVMValueRef planes_ptr = draw_jit_context_planes(gallivm, context_ptr);
   LLVMValueRef indices[3];
+  LLVMValueRef is_nan;
 
   /* userclip planes */
   while (ucp_enable) {
@@ -1280,6 +1281,8 @@ generate_clipmask(struct draw_llvm *llvm,
             clipdist = LLVMBuildLoad(builder, outputs[cd[1]][i-4], "");
          }
          test = lp_build_compare(gallivm, f32_type, PIPE_FUNC_GREATER, zero, clipdist);
+         is_nan = lp_build_is_inf_or_nan(gallivm, vs_type, clipdist);
+         test = LLVMBuildOr(builder, test, is_nan, "");
          temp = lp_build_const_int_vec(gallivm, i32_type, 1 << plane_idx);
          test = LLVMBuildAnd(builder, test, temp, "");
          mask = LLVMBuildOr(builder, mask, test, "");
diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c 
b/src/gallium/auxiliary/draw/draw_pipe_clip.c
index b76e9a5..2f2aadb 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_clip.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c
@@ -104,7 +104,7 @@ static void interp_attr( float dst[4],
 float t,
 const float in[4],
 const float out[4] )
-{  
+{
dst[0] = LINTERP( t, out[0], in[0] );
dst[1] = LINTERP( t, out[1], in[1] );
dst[2] = LINTERP( t, out[2], in[2] );
@@ -380,6 +380,9 @@ do_clip_tri( struct draw_stage *stage,
       dp_prev = getclipdist(clipper, vert_prev, plane_idx);
       clipmask &= ~(1<<plane_idx);
 
+      if (util_is_inf_or_nan(dp_prev))
+         return; //discard nan
+
       assert(n < MAX_CLIPPED_VERTICES);
       if (n >= MAX_CLIPPED_VERTICES)
          return;
@@ -392,6 +395,9 @@ do_clip_tri( struct draw_stage *stage,
 
  float dp = getclipdist(clipper, vert, plane_idx);
 
+ if (util_is_inf_or_nan(dp))
+return; //discard nan
+
          if (!IS_NEGATIVE(dp_prev)) {
             assert(outcount < MAX_CLIPPED_VERTICES);
             if (outcount >= MAX_CLIPPED_VERTICES)
@@ -522,6 +528,9 @@ do_clip_line( struct draw_stage *stage,
   const float dp0 = getclipdist(clipper, v0, plane_idx);
   const float dp1 = getclipdist(clipper, v1, plane_idx);
 
+  if (util_is_inf_or_nan(dp0) || util_is_inf_or_nan(dp1))
+ return; //discard nan
+
       if (dp1 < 0.0F) {
 float t = dp1 / (dp1 - dp0);
  t1 = MAX2(t1, t);
@@ -594,7 +603,7 @@ clip_tri( struct draw_stage *stage,
    unsigned clipmask = (header->v[0]->clipmask | 
                         header->v[1]->clipmask | 
                         header->v[2]->clipmask);
-   
+
    if (clipmask == 0) {
       /* no clipping needed */
       stage->next->tri( stage->next, header );
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c 
b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
index 98409c3..72b563e 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
@@ -3671,3 +3671,56 @@ lp_build_isfinite(struct lp_build_context *bld,
    return lp_build_compare(bld->gallivm, int_type, PIPE_FUNC_NOTEQUAL,
intx, infornan32);
 }
+
+/*
+ * Returns true if the number is nan or inf or false otherwise.
+ * The input has to be a floating point vector.
+ */
+LLVMValueRef
+lp_build_is_inf_or_nan(struct gallivm_state *gallivm,
+   const struct lp_type type,
+   LLVMValueRef x)
+{
+   LLVMBuilderRef builder = gallivm->builder;
+   struct lp_type int_type = lp_int_type(type

Re: [Mesa-dev] [PATCH] draw: handle nan clipdistance

2013-08-15 Thread Zack Rusin

 I realize this function isn't used but it looks unnecessarily
 complicated - two constants one AND plus one comparison when you could
 simply do a single comparison (compare x with x with unordered not
 equal). This is actually doubly bad with AVX because the int comparison
 is going to use 4 instructions instead of 1 (extract/2 cmp/1 insert),
 well if this runs 8-wide at least.

I'm going to kill that function, we already have lp_build_isnan that does the 
correct thing.

 Otherwise looks good. Though I'm not sure you really need to kill the
 prims if the clip distances are infinite?

The d3d10 spec says "Coordinates coming in to clipping with infinites at x, y, 
z may or may not result in a discarded primitive." I liked handling them the 
same way as nan, otherwise we're just generating pointless primitives. I don't 
have a strong opinion though, wlk doesn't seem to test infinites.
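
The two approaches discussed in the review can be shown side by side in scalar C. This is a sketch, not the gallivm or util code: is_nan_cmp(), is_inf_or_nan_bits() and discard_prim() are hypothetical names. The first helper is the single self-comparison the reviewer suggests (it detects NaN only); the second is the mask-and-compare on the exponent bits, which also catches infinities.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* NaN is the only value that compares unequal to itself.
 * Note: this relies on IEEE semantics and can be optimized away
 * when building with options such as -ffast-math. */
static bool
is_nan_cmp(float x)
{
   return x != x;
}

/* Inf and NaN both have all eight exponent bits set, so one mask
 * and one integer compare cover both. */
static bool
is_inf_or_nan_bits(float x)
{
   uint32_t bits;
   memcpy(&bits, &x, sizeof bits);
   return (bits & 0x7f800000u) == 0x7f800000u;
}

/* Hypothetical helper: a primitive is discarded if any of its
 * vertices has a non-finite clip distance. */
static bool
discard_prim(const float *clipdist, int num_verts)
{
   for (int i = 0; i < num_verts; i++) {
      if (is_inf_or_nan_bits(clipdist[i]))
         return true;
   }
   return false;
}
```

Switching discard_prim() to is_nan_cmp() would keep primitives with infinite clip distances, which the quoted spec text leaves up to the implementation.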

z

[Mesa-dev] [PATCH] llvmpipe: fix pipeline statistics with a null ps

2013-08-14 Thread Zack Rusin

If the fragment shader is null then pixel shader invocations have
to be equal to zero. And if we're running a null ps then clipper
invocations and primitives should be equal to zero but only
if both stencil and depth testing are disabled.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_rast.c|3 ++-
 src/gallium/drivers/llvmpipe/lp_rast_priv.h   |3 ++-
 src/gallium/drivers/llvmpipe/lp_setup_line.c  |3 ++-
 src/gallium/drivers/llvmpipe/lp_setup_point.c |3 ++-
 src/gallium/drivers/llvmpipe/lp_setup_tri.c   |3 ++-
 src/gallium/drivers/llvmpipe/lp_setup_vbuf.c  |9 +++--
 src/gallium/drivers/llvmpipe/lp_state_fs.c|   24 +++-
 src/gallium/drivers/llvmpipe/lp_state_fs.h|4 
 8 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c 
b/src/gallium/drivers/llvmpipe/lp_rast.c
index 49cdbfe..af661e9 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.c
+++ b/src/gallium/drivers/llvmpipe/lp_rast.c
@@ -35,6 +35,7 @@
 #include "os/os_time.h"
 
 #include "lp_scene_queue.h"
+#include "lp_context.h"
 #include "lp_debug.h"
 #include "lp_fence.h"
 #include "lp_perf.h"
@@ -459,7 +460,7 @@ lp_rast_shade_quads_mask(struct lp_rasterizer_task *task,
    if ((x % TILE_SIZE) < task->width && (y % TILE_SIZE) < task->height) {
       /* not very accurate would need a popcount on the mask */
       /* always count this not worth bothering? */
-      task->ps_invocations++;
+      task->ps_invocations += 1 * variant->ps_inv_multiplier;
 
   /* run shader on 4x4 block */
   BEGIN_JIT_CALL(state, task);
diff --git a/src/gallium/drivers/llvmpipe/lp_rast_priv.h 
b/src/gallium/drivers/llvmpipe/lp_rast_priv.h
index b8bc99c..41fe097 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast_priv.h
+++ b/src/gallium/drivers/llvmpipe/lp_rast_priv.h
@@ -100,6 +100,7 @@ struct lp_rasterizer_task
/* occlude counter for visible pixels */
struct lp_jit_thread_data thread_data;
uint64_t ps_invocations;
+   uint8_t ps_inv_multiplier;
 
pipe_semaphore work_ready;
pipe_semaphore work_done;
@@ -308,7 +309,7 @@ lp_rast_shade_quads_all( struct lp_rasterizer_task *task,
    if ((x % TILE_SIZE) < task->width && (y % TILE_SIZE) < task->height) {
       /* not very accurate would need a popcount on the mask */
       /* always count this not worth bothering? */
-      task->ps_invocations++;
+      task->ps_invocations += 1 * variant->ps_inv_multiplier;
 
   /* run shader on 4x4 block */
   BEGIN_JIT_CALL(state, task);
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_line.c 
b/src/gallium/drivers/llvmpipe/lp_setup_line.c
index a25a6b0..e1686ea 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_line.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_line.c
@@ -600,7 +600,8 @@ try_setup_line( struct lp_setup_context *setup,
 
LP_COUNT(nr_tris);
 
-   if (lp_context->active_statistics_queries) {
+   if (lp_context->active_statistics_queries &&
+       !llvmpipe_rasterization_disabled(lp_context)) {
       lp_context->pipeline_statistics.c_primitives++;
}
 
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_point.c 
b/src/gallium/drivers/llvmpipe/lp_setup_point.c
index cbcc8d4..45068ec 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_point.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_point.c
@@ -384,7 +384,8 @@ try_setup_point( struct lp_setup_context *setup,
 
LP_COUNT(nr_tris);
 
-   if (lp_context->active_statistics_queries) {
+   if (lp_context->active_statistics_queries &&
+       !llvmpipe_rasterization_disabled(lp_context)) {
       lp_context->pipeline_statistics.c_primitives++;
}
 
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c 
b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
index 579f351..23bc6e2 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
@@ -340,7 +340,8 @@ do_triangle_ccw(struct lp_setup_context *setup,
 
LP_COUNT(nr_tris);
 
-   if (lp_context->active_statistics_queries) {
+   if (lp_context->active_statistics_queries &&
+       !llvmpipe_rasterization_disabled(lp_context)) {
       lp_context->pipeline_statistics.c_primitives++;
}
 
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_vbuf.c 
b/src/gallium/drivers/llvmpipe/lp_setup_vbuf.c
index 8173994..bf9f7e7 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_vbuf.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_vbuf.c
@@ -565,8 +565,13 @@ lp_setup_pipeline_statistics(
       stats->gs_invocations;
    llvmpipe->pipeline_statistics.gs_primitives +=
       stats->gs_primitives;
-   llvmpipe->pipeline_statistics.c_invocations +=
-      stats->c_invocations;
+   if (!llvmpipe_rasterization_disabled(llvmpipe)) {
+      llvmpipe->pipeline_statistics.c_invocations +=
+         stats->c_invocations;
+   } else {
+      llvmpipe->pipeline_statistics.c_invocations = 0;
+   }
+   
 }
 
 /**
diff --git a/src/gallium/drivers/llvmpipe/lp_state_fs.c 
b/src/gallium/drivers/llvmpipe/lp_state_fs.c
index

Re: [Mesa-dev] [PATCH] gallivm: already pass coords in the right place in the sampler interface

2013-08-14 Thread Zack Rusin

I have to admit that I don't know the sampling code, but the patches look good 
to me.

z

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 This makes things a bit nicer, and more importantly it fixes an issue
 where a downgraded array texture (due to view reduced to 1 layer and
 addressed with (non-array) samplec instruction) would use the wrong
 coord as shadow reference value. (This could also be fixed by passing
 target through the sampler interface much the same way as is done for
 size queries, might do this eventually anyway.)
 And if we'd ever want to support (shadow) cube map arrays, we'd need
 5 coords in any case.
 
 v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex
 using wrong shadow coord for 2d...). Plus need to project the shadow
 coord, and just for fun keep projecting the layer coord too.
 ---
  src/gallium/auxiliary/gallivm/lp_bld_sample.h |2 +
  src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |   28 +---
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c   |  159
  +++--
  3 files changed, 90 insertions(+), 99 deletions(-)
 
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.h
 b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
 index c25d171..6d8fe88 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.h
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
 @@ -335,7 +335,9 @@ texture_dims(enum pipe_texture_target tex)
 case PIPE_TEXTURE_2D_ARRAY:
 case PIPE_TEXTURE_RECT:
 case PIPE_TEXTURE_CUBE:
 +  return 2;
 case PIPE_TEXTURE_CUBE_ARRAY:
 +  assert(0);
return 2;
 case PIPE_TEXTURE_3D:
return 3;
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
 b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
 index 07ed48e..c312922 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
 @@ -1574,7 +1574,7 @@ lp_build_sample_soa(struct gallivm_state *gallivm,
 unsigned target = static_texture_state-target;
 unsigned dims = texture_dims(target);
 unsigned num_quads = type.length / 4;
 -   unsigned mip_filter;
 +   unsigned mip_filter, i;
 struct lp_build_sample_context bld;
 struct lp_static_sampler_state derived_sampler_state =
 *static_sampler_state;
     LLVMTypeRef i32t = LLVMInt32TypeInContext(gallivm->context);
 @@ -1726,30 +1726,8 @@ lp_build_sample_soa(struct gallivm_state *gallivm,
}
 }
  
 -   /*
 -* always use the same coords for layer, shadow cmp, should probably
 -* put that into gallivm sampler interface I get real tired shuffling
 -* coordinates.
 -*/
 -   newcoords[0] = coords[0]; /* 1st coord */
 -   newcoords[1] = coords[1]; /* 2nd coord */
 -   newcoords[2] = coords[2]; /* 3rd coord (for cube, 3d and layer) */
 -   newcoords[3] = coords[3]; /* 4th coord (intended for cube array layer) */
 -   newcoords[4] = coords[2]; /* shadow cmp coord */
 -   if (target == PIPE_TEXTURE_1D_ARRAY) {
 -  newcoords[2] = coords[1]; /* layer coord */
 -  /* FIXME: shadow cmp coord can be wrong if we don't take target from
 shader decl. */
 -   }
 -   else if (target == PIPE_TEXTURE_2D_ARRAY) {
 -  newcoords[2] = coords[2];
 -  newcoords[4] = coords[3];
 -   }
 -   else if (target == PIPE_TEXTURE_CUBE) {
 -  newcoords[4] = coords[3];
 -   }
 -   else if (target == PIPE_TEXTURE_CUBE_ARRAY) {
 -  assert(0); /* not handled */
 -  // layer coord is ok but shadow coord is impossible */
 +   for (i = 0; i < 5; i++) {
 +      newcoords[i] = coords[i];
 }
  
 if (0) {
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 index db8e997..cab53df 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 @@ -1614,13 +1614,14 @@ emit_tex( struct lp_build_tgsi_soa_context *bld,
 unsigned unit;
 LLVMValueRef lod_bias, explicit_lod;
 LLVMValueRef oow = NULL;
 -   LLVMValueRef coords[4];
 +   LLVMValueRef coords[5];
 LLVMValueRef offsets[3] = { NULL };
 struct lp_derivatives derivs;
 struct lp_derivatives *deriv_ptr = NULL;
 boolean scalar_lod;
 -   unsigned num_coords, num_derivs, num_offsets;
 -   unsigned i;
 +   unsigned num_derivs, num_offsets, i;
 +   unsigned shadow_coord = 0;
 +   unsigned layer_coord = 0;
  
     if (!bld->sampler) {
        _debug_printf("warning: found texture instruction but no sampler generator supplied\n");
 @@ -1631,55 +1632,58 @@ emit_tex( struct lp_build_tgsi_soa_context *bld,
 }
  
     switch (inst->Texture.Texture) {
 -   case TGSI_TEXTURE_1D:
 -  num_coords = 1;
 -  num_offsets = 1;
 -  num_derivs = 1;
 -  break;
 case TGSI_TEXTURE_1D_ARRAY:
 -  num_coords = 2;
 +  layer_coord = 1;
 +  /* fallthrough */
 +   case TGSI_TEXTURE_1D:
num_offsets = 1;
num_derivs = 1;
break;
 +   case

Re: [Mesa-dev] [PATCH] gallivm: do per-sample depth comparison instead of doing it post-filter

2013-08-14 Thread Zack Rusin

  -  lp_build_sample_compare(bld, newcoords[4], texel_out);
 +  if (0)
 + lp_build_sample_compare(bld, newcoords[4], texel_out);
 }

What does this do? 
The rest looks good to me!

Reviewed-by: Zack Rusin za...@vmware.com 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] llvmpipe: fix pipeline statistics with a null ps

2013-08-13 Thread Zack Rusin

If the fragment shader is null then pixel shader invocations have
to be equal to zero. And if we're running a null ps then clipper
invocations and primitives should be equal to zero but only
if both stencil and depth testing are disabled.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_query.c |   30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_query.c 
b/src/gallium/drivers/llvmpipe/lp_query.c
index cea2d07..fb24c36 100644
--- a/src/gallium/drivers/llvmpipe/lp_query.c
+++ b/src/gallium/drivers/llvmpipe/lp_query.c
@@ -32,6 +32,7 @@
 
 #include "draw/draw_context.h"
 #include "pipe/p_defines.h"
+#include "tgsi/tgsi_scan.h"
 #include "util/u_memory.h"
 #include "os/os_time.h"
 #include "lp_context.h"
@@ -95,6 +96,7 @@ llvmpipe_get_query_result(struct pipe_context *pipe,
   union pipe_query_result *vresult)
 {
    struct llvmpipe_screen *screen = llvmpipe_screen(pipe->screen);
+   struct llvmpipe_context *llvmpipe = llvmpipe_context(pipe);
    unsigned num_threads = MAX2(1, screen->num_threads);
struct llvmpipe_query *pq = llvmpipe_query(q);
uint64_t *result = (uint64_t *)vresult;
@@ -166,11 +168,31 @@ llvmpipe_get_query_result(struct pipe_context *pipe,
case PIPE_QUERY_PIPELINE_STATISTICS: {
   struct pipe_query_data_pipeline_statistics *stats =
  (struct pipe_query_data_pipeline_statistics *)vresult;
-      /* only ps_invocations come from binned query */
-      for (i = 0; i < num_threads; i++) {
-         pq->stats.ps_invocations += pq->end[i];
+      /* If we're running on what's considered a null fragment
+       * shader, i.e. fragment shader consisting of a single
+       * END opcode or if the fragment shader is null then
+       * the number of ps_invocations should be zero */
+      if (llvmpipe->fs && llvmpipe->fs->info.base.num_tokens > 1) {
+         /* only ps_invocations come from binned query */
+         for (i = 0; i < num_threads; i++) {
+            pq->stats.ps_invocations += pq->end[i];
+         }
+         pq->stats.ps_invocations *=
+            LP_RASTER_BLOCK_SIZE * LP_RASTER_BLOCK_SIZE;
+      } else {
+         /*
+          * Clipper primitives and invocations are equal to zero
+          * if we're running a null fragment shader but only
+          * if both stencil and depth testing are disabled.
+          */
+         if (!llvmpipe->depth_stencil->depth.enabled &&
+             !llvmpipe->depth_stencil->stencil[0].enabled &&
+             !llvmpipe->depth_stencil->stencil[1].enabled) {
+            pq->stats.c_primitives = 0;
+            pq->stats.c_invocations = 0;
+         }
+         pq->stats.ps_invocations = 0;
       }
-      pq->stats.ps_invocations *= LP_RASTER_BLOCK_SIZE * LP_RASTER_BLOCK_SIZE;
       *stats = pq->stats;
}
   break;
-- 
1.7.10.4
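
The rule this patch describes can be modelled outside llvmpipe as a small pure function. The sketch below is illustrative only: `struct stats_model`, `struct ds_model`, `resolve_stats` and `MODEL_BLOCK_SIZE` are hypothetical names rather than llvmpipe API, and the block size of 4 is an assumed stand-in for LP_RASTER_BLOCK_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for LP_RASTER_BLOCK_SIZE; the real value lives in llvmpipe. */
#define MODEL_BLOCK_SIZE 4

/* Hypothetical subset of the pipeline statistics counters. */
struct stats_model {
   uint64_t c_invocations;
   uint64_t c_primitives;
   uint64_t ps_invocations;
};

/* Hypothetical subset of the depth/stencil state. */
struct ds_model {
   bool depth_enabled;
   bool stencil_front_enabled;
   bool stencil_back_enabled;
};

static struct stats_model
resolve_stats(struct stats_model in, bool fs_is_null, struct ds_model ds,
              const uint64_t *per_thread_blocks, unsigned num_threads)
{
   struct stats_model out = in;
   unsigned i;

   if (!fs_is_null) {
      /* A real fragment shader ran: sum the per-thread block counts
       * and scale by the number of pixels in a block. */
      for (i = 0; i < num_threads; i++)
         out.ps_invocations += per_thread_blocks[i];
      out.ps_invocations *= MODEL_BLOCK_SIZE * MODEL_BLOCK_SIZE;
   } else {
      /* Null fragment shader: the clipper counters are zeroed only
       * when depth and both stencil faces are disabled. */
      if (!ds.depth_enabled &&
          !ds.stencil_front_enabled &&
          !ds.stencil_back_enabled) {
         out.c_primitives = 0;
         out.c_invocations = 0;
      }
      out.ps_invocations = 0;
   }
   return out;
}
```

With a null shader and depth testing still enabled, only ps_invocations is cleared; the clipper counters keep whatever value was accumulated.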

Re: [Mesa-dev] [PATCH] gallivm: simplify geometry shader mask handling a bit

2013-08-12 Thread Zack Rusin

 From: Roland Scheidegger srol...@vmware.com
 
 Instead of reducing masks to 0/1 simply use the mask directly as -1.
 Also use some signed comparison instead of unsigned (as far as I understand
 these values have to be (very) small and signed means llvm doesn't have to
 apply additional logic to do the unsigned comparisons the cpu can't do).
 Saves a couple of instructions in some test geometry shader here.
 
 v2: that was a bit too much optimization, don't skip combining the masks...

k, I think that one looks good. 

Reviewed-by: Zack Rusin za...@vmware.com

Re: [Mesa-dev] [PATCH 2/2] draw: simplify prim mask construction

2013-08-12 Thread Zack Rusin

Looks good.

Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 The code was quite weird: the second comparison was in fact a complete no-op,
 and we can also do the comparison with the vector directly instead of scalar,
 which should not only be faster but also makes it way more obvious what that
 mask is actually going to look like.
 (Not sure how many instructions that saves as it turned out the mask wasn't
 used in the test geometry shader I used at all after all...)
 ---
  src/gallium/auxiliary/draw/draw_llvm.c |   32
  ++--
  1 file changed, 10 insertions(+), 22 deletions(-)
 
 diff --git a/src/gallium/auxiliary/draw/draw_llvm.c
 b/src/gallium/auxiliary/draw/draw_llvm.c
 index 68f6369..84e3392 100644
 --- a/src/gallium/auxiliary/draw/draw_llvm.c
 +++ b/src/gallium/auxiliary/draw/draw_llvm.c
 @@ -2040,31 +2040,19 @@ generate_mask_value(struct draw_gs_llvm_variant
 *variant,
  {
     struct gallivm_state *gallivm = variant->gallivm;
     LLVMBuilderRef builder = gallivm->builder;
 -   LLVMValueRef bits[16];
 -   struct lp_type  mask_type = lp_int_type(gs_type);
 -   struct lp_type mask_elem_type = lp_elem_type(mask_type);
 -   LLVMValueRef mask_val = lp_build_const_vec(gallivm,
 -                                              mask_type,
 -                                              0);
 +   struct lp_type mask_type = lp_int_type(gs_type);
 +   LLVMValueRef num_prims;
 +   LLVMValueRef mask_val = lp_build_const_vec(gallivm, mask_type, 0);
     unsigned i;
  
 -   assert(gs_type.length <= Elements(bits));
 -
 -   for (i = gs_type.length; i >= 1; --i) {
 -      int idx = i - 1;
 -      LLVMValueRef ind = lp_build_const_int32(gallivm, i);
 -      bits[idx] = lp_build_compare(gallivm,
 -                                   mask_elem_type, PIPE_FUNC_GEQUAL,
 -                                   variant->num_prims, ind);
 -   }
 -   for (i = 0; i < gs_type.length; ++i) {
 -      LLVMValueRef ind = lp_build_const_int32(gallivm, i);
 -      mask_val = LLVMBuildInsertElement(builder, mask_val, bits[i], ind, "");
 +   num_prims = lp_build_broadcast(gallivm, lp_build_vec_type(gallivm, mask_type),
 +                                  variant->num_prims);
 +   for (i = 0; i <= gs_type.length; i++) {
 +      LLVMValueRef idx = lp_build_const_int32(gallivm, i);
 +      mask_val = LLVMBuildInsertElement(builder, mask_val, idx, idx, "");
     }
 -   mask_val = lp_build_compare(gallivm,
 -                               mask_type, PIPE_FUNC_NOTEQUAL,
 -                               mask_val,
 -                               lp_build_const_int_vec(gallivm, mask_type, 0));
 +   mask_val = lp_build_compare(gallivm, mask_type,
 +                               PIPE_FUNC_GREATER, num_prims, mask_val);
  
 return mask_val;
  }
 --
 1.7.9.5
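
The simplified construction amounts to comparing a broadcast num_prims against a vector of lane indices. A scalar model of that idea, assuming an all-ones integer (-1) represents a set lane, might look like the sketch below; `build_prim_mask` is a hypothetical helper and not part of draw or gallivm.

```c
#include <stdint.h>

/*
 * Scalar model of the vector compare: lane i of the mask is all ones
 * when num_prims > i, and zero otherwise. "lanes" plays the role of
 * the vector length.
 */
static void
build_prim_mask(int32_t num_prims, unsigned lanes, int32_t *out)
{
   unsigned i;

   for (i = 0; i < lanes; i++)
      out[i] = (num_prims > (int32_t)i) ? -1 : 0;
}
```

For num_prims = 2 and four lanes, the first two lanes are set and the rest are cleared, which is the shape of mask the commit message says is now obvious from the code.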
 

Re: [Mesa-dev] [PATCH] gallivm: fix exec_mask interaction with geometry shader after end of main

2013-08-12 Thread Zack Rusin

Ah, that looks like a great catch.

Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 Because we must maintain an exec_mask even if there's currently nothing
 on the mask stack, we can still have an exec_mask at the end of the program.
 Effectively, this mask should be set back to default when returning from
 main.
 Without relying on END/RET opcode (I think it's valid to have neither) it is
 actually difficult to do this, as there doesn't seem to be any reasonable place
 to do it, so instead let's just say the exec_mask is invalid outside main
 (which it really is effectively).
 The problem is that geometry shader called end_primitive outside the shader
 (in the epilogue), and as a result used a bogus mask, leading to bugs if we
 had to set the (somewhat misnamed) ret_in_main bit anywhere. So just avoid
 the mask combining function when called from outside the shader.
 ---
  src/gallium/auxiliary/gallivm/lp_bld_tgsi.c |2 +-
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c |   28
  +++
  2 files changed, 14 insertions(+), 16 deletions(-)
 
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
 b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
 index 495940c..5a9e8d0 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
 @@ -466,7 +466,7 @@ lp_build_tgsi_llvm(
  
     while (bld_base->pc != -1) {
        struct tgsi_full_instruction *instr = bld_base->instructions +
 -                                             bld_base->pc;
 +                                          bld_base->pc;
        const struct tgsi_opcode_info *opcode_info =
           tgsi_get_opcode_info(instr->Instruction.Opcode);
        if (!lp_build_tgsi_inst_llvm(bld_base, instr)) {
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 index 589ea4f..db8e997 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
 @@ -2691,11 +2691,21 @@ end_primitive_masked(struct lp_build_tgsi_context *
 bld_base,
     LLVMBuilderRef builder = bld->bld_base.base.gallivm->builder;
  
     if (bld->gs_iface->end_primitive) {
 +      struct lp_build_context *uint_bld = &bld_base->uint_bld;
        LLVMValueRef emitted_vertices_vec =
           LLVMBuildLoad(builder, bld->emitted_vertices_vec_ptr, "");
        LLVMValueRef emitted_prims_vec =
           LLVMBuildLoad(builder, bld->emitted_prims_vec_ptr, "");
  
 +      LLVMValueRef emitted_mask = lp_build_cmp(uint_bld, PIPE_FUNC_NOTEQUAL,
 +                                               emitted_vertices_vec,
 +                                               uint_bld->zero);
 +      /* We need to combine the current execution mask with the mask
 +         telling us which, if any, execution slots actually have
 +         unemitted primitives, this way we make sure that end_primitives
 +         executes only on the paths that have unflushed vertices */
 +      mask = LLVMBuildAnd(builder, mask, emitted_mask, "");
 +
        bld->gs_iface->end_primitive(bld->gs_iface, &bld->bld_base,
                                     emitted_vertices_vec,
                                     emitted_prims_vec);
 @@ -2735,20 +2745,7 @@ end_primitive(
 struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
  
     if (bld->gs_iface->end_primitive) {
 -      LLVMBuilderRef builder = bld_base->base.gallivm->builder;
        LLVMValueRef mask = mask_vec(bld_base);
 -      struct lp_build_context *uint_bld = &bld_base->uint_bld;
 -      LLVMValueRef emitted_verts = LLVMBuildLoad(
 -         builder, bld->emitted_vertices_vec_ptr, "");
 -      LLVMValueRef emitted_mask = lp_build_cmp(uint_bld, PIPE_FUNC_NOTEQUAL,
 -                                               emitted_verts,
 -                                               uint_bld->zero);
 -      /* We need to combine the current execution mask with the mask
 -         telling us which, if any, execution slots actually have
 -         unemitted primitives, this way we make sure that end_primitives
 -         executes only on the paths that have unflushed vertices */
 -      mask = LLVMBuildAnd(builder, mask, emitted_mask, "");
 -
end_primitive_masked(bld_base, mask);
 }
  }
 @@ -3148,8 +3145,9 @@ static void emit_epilogue(struct lp_build_tgsi_context
 * bld_base)
LLVMValueRef total_emitted_vertices_vec;
LLVMValueRef emitted_prims_vec;
/* implicit end_primitives, needed in case there are any unflushed
 - vertices in the cache */
 -  end_primitive(NULL, bld_base, NULL);
 + vertices in the cache. Note must not call end_primitive here
 + since the exec_mask is not valid at this point. */
 +      end_primitive_masked(bld_base, lp_build_mask_value(bld->mask));

total_emitted_vertices_vec =
   LLVMBuildLoad(builder, bld

Re: [Mesa-dev] [PATCH 3/3] gallivm: implement new float comparison instructions returning integer masks

2013-08-12 Thread Zack Rusin

Nice. The entire series looks good.

Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 FSEQ/FSGE/FSLT/FSNE work just the same as SEQ/SGE/SLT/SNE except skip the
 select.
 And just for consistency use the same appropriate ordered/unordered
 comparisons
 for the old opcodes as well.
 ---
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c |   81
  +++-
  1 file changed, 79 insertions(+), 2 deletions(-)
 
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
 b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
 index f461661..86c3249 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
 @@ -1094,6 +1094,70 @@ f2u_emit_cpu(
                                                    emit_data->args[0]);
  }
  
 +/* TGSI_OPCODE_FSET Helper (CPU Only) */
 +static void
 +fset_emit_cpu(
 +   const struct lp_build_tgsi_action * action,
 +   struct lp_build_tgsi_context * bld_base,
 +   struct lp_build_emit_data * emit_data,
 +   unsigned pipe_func)
 +{
 +   LLVMValueRef cond;
 +
 +   if (pipe_func != PIPE_FUNC_NOTEQUAL) {
 +      cond = lp_build_cmp_ordered(&bld_base->base, pipe_func,
 +                                  emit_data->args[0], emit_data->args[1]);
 +   }
 +   else {
 +      cond = lp_build_cmp(&bld_base->base, pipe_func,
 +                          emit_data->args[0], emit_data->args[1]);
 +
 +   }
 +   emit_data->output[emit_data->chan] = cond;
 +}
 +
 +
 +/* TGSI_OPCODE_FSEQ (CPU Only) */
 +static void
 +fseq_emit_cpu(
 +   const struct lp_build_tgsi_action * action,
 +   struct lp_build_tgsi_context * bld_base,
 +   struct lp_build_emit_data * emit_data)
 +{
 +   fset_emit_cpu(action, bld_base, emit_data, PIPE_FUNC_EQUAL);
 +}
 +
 +/* TGSI_OPCODE_FSGE (CPU Only) */
 +static void
 +fsge_emit_cpu(
 +   const struct lp_build_tgsi_action * action,
 +   struct lp_build_tgsi_context * bld_base,
 +   struct lp_build_emit_data * emit_data)
 +{
 +   fset_emit_cpu(action, bld_base, emit_data, PIPE_FUNC_GEQUAL);
 +}
 +
 +/* TGSI_OPCODE_FSLT (CPU Only) */
 +static void
 +fslt_emit_cpu(
 +   const struct lp_build_tgsi_action * action,
 +   struct lp_build_tgsi_context * bld_base,
 +   struct lp_build_emit_data * emit_data)
 +{
 +   fset_emit_cpu(action, bld_base, emit_data, PIPE_FUNC_LESS);
 +}
 +
 +/* TGSI_OPCODE_FSNE (CPU Only) */
 +
 +static void
 +fsne_emit_cpu(
 +   const struct lp_build_tgsi_action * action,
 +   struct lp_build_tgsi_context * bld_base,
 +   struct lp_build_emit_data * emit_data)
 +{
 +   fset_emit_cpu(action, bld_base, emit_data, PIPE_FUNC_NOTEQUAL);
 +}
 +
  /* TGSI_OPCODE_FLR (CPU Only) */
  
  static void
 @@ -1396,8 +1460,17 @@ set_emit_cpu(
 struct lp_build_emit_data * emit_data,
 unsigned pipe_func)
  {
 -   LLVMValueRef cond = lp_build_cmp(&bld_base->base, pipe_func,
 -                                    emit_data->args[0], emit_data->args[1]);
 +   LLVMValueRef cond;
 +
 +   if (pipe_func != PIPE_FUNC_NOTEQUAL) {
 +      cond = lp_build_cmp_ordered(&bld_base->base, pipe_func,
 +                                  emit_data->args[0], emit_data->args[1]);
 +   }
 +   else {
 +      cond = lp_build_cmp(&bld_base->base, pipe_func,
 +                          emit_data->args[0], emit_data->args[1]);
 +
 +   }
     emit_data->output[emit_data->chan] = lp_build_select(&bld_base->base,
                                           cond,
                                           bld_base->base.one,
 @@ -1716,6 +1789,10 @@ lp_set_default_actions_cpu(
     bld_base->op_actions[TGSI_OPCODE_F2I].emit = f2i_emit_cpu;
     bld_base->op_actions[TGSI_OPCODE_F2U].emit = f2u_emit_cpu;
     bld_base->op_actions[TGSI_OPCODE_FLR].emit = flr_emit_cpu;
 +   bld_base->op_actions[TGSI_OPCODE_FSEQ].emit = fseq_emit_cpu;
 +   bld_base->op_actions[TGSI_OPCODE_FSGE].emit = fsge_emit_cpu;
 +   bld_base->op_actions[TGSI_OPCODE_FSLT].emit = fslt_emit_cpu;
 +   bld_base->op_actions[TGSI_OPCODE_FSNE].emit = fsne_emit_cpu;
  
     bld_base->op_actions[TGSI_OPCODE_I2F].emit = i2f_emit_cpu;
     bld_base->op_actions[TGSI_OPCODE_IABS].emit = iabs_emit_cpu;
 --
 1.7.9.5
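
The split between ordered and unordered comparisons in this patch matters only when a NaN is involved. As a rough scalar model, assuming IEEE 754 floats, C's own operators already have the needed behaviour: ==, >= and < are ordered (false if either operand is NaN) while != is unordered (true if either operand is NaN). The `*_model` helpers below are hypothetical and return an all-ones integer for true, as the new opcodes are described as doing.

```c
#include <math.h>
#include <stdint.h>

/* Ordered comparisons: any NaN operand makes the result false (0). */
static int32_t fseq_model(float a, float b) { return (a == b) ? -1 : 0; }
static int32_t fsge_model(float a, float b) { return (a >= b) ? -1 : 0; }
static int32_t fslt_model(float a, float b) { return (a <  b) ? -1 : 0; }

/* Unordered not-equal: any NaN operand makes the result true (-1). */
static int32_t fsne_model(float a, float b) { return (a != b) ? -1 : 0; }
```

Under this model a NaN never compares equal, greater-or-equal or less than anything, but it always compares not-equal, which is why the patch singles out PIPE_FUNC_NOTEQUAL.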
 

[Mesa-dev] [PATCH] draw: make sure that the stages setup outputs

2013-08-12 Thread Zack Rusin

Calling the prepare outputs function cleans up the slot assignments
for outputs. Unfortunately aapoint and aaline didn't have
code to reset their slots after the initial setup, which
was messing up our slot assignments. The unfilled stage
was just missing the initial assignment of the face slot.
This fixes all of the reported piglit failures.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_context.c   |2 +
 src/gallium/auxiliary/draw/draw_pipe.h  |5 +-
 src/gallium/auxiliary/draw/draw_pipe_aaline.c   |   27 ---
 src/gallium/auxiliary/draw/draw_pipe_aapoint.c  |   56 ++-
 src/gallium/auxiliary/draw/draw_pipe_unfilled.c |2 +
 5 files changed, 62 insertions(+), 30 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_context.c 
b/src/gallium/auxiliary/draw/draw_context.c
index 2d4843e..d1fac0c 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -564,6 +564,8 @@ draw_prepare_shader_outputs(struct draw_context *draw)
    draw_remove_extra_vertex_attribs(draw);
    draw_prim_assembler_prepare_outputs(draw->ia);
    draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
+   draw_aapoint_prepare_outputs(draw, draw->pipeline.aapoint);
+   draw_aaline_prepare_outputs(draw, draw->pipeline.aaline);
 }
 
 /**
diff --git a/src/gallium/auxiliary/draw/draw_pipe.h 
b/src/gallium/auxiliary/draw/draw_pipe.h
index 7c9ed6c..ad3165f 100644
--- a/src/gallium/auxiliary/draw/draw_pipe.h
+++ b/src/gallium/auxiliary/draw/draw_pipe.h
@@ -101,7 +101,10 @@ void draw_pipe_passthrough_tri(struct draw_stage *stage, 
struct prim_header *hea
 void draw_pipe_passthrough_line(struct draw_stage *stage, struct prim_header 
*header);
 void draw_pipe_passthrough_point(struct draw_stage *stage, struct prim_header 
*header);
 
-
+void draw_aapoint_prepare_outputs(struct draw_context *context,
+  struct draw_stage *stage);
+void draw_aaline_prepare_outputs(struct draw_context *context,
+ struct draw_stage *stage);
 void draw_unfilled_prepare_outputs(struct draw_context *context,
struct draw_stage *stage);
 
diff --git a/src/gallium/auxiliary/draw/draw_pipe_aaline.c 
b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
index aa88459..c44c236 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_aaline.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
@@ -692,13 +692,7 @@ aaline_first_line(struct draw_stage *stage, struct 
prim_header *header)
   return;
}
 
-   /* update vertex attrib info */
-   aaline->pos_slot = draw_current_shader_position_output(draw);;
-
-   /* allocate the extra post-transformed vertex attribute */
-   aaline->tex_slot = draw_alloc_extra_vertex_attrib(draw,
-                                                     TGSI_SEMANTIC_GENERIC,
-                                                     aaline->fs->generic_attrib);
+   draw_aaline_prepare_outputs(draw, draw->pipeline.aaline);
 
/* how many samplers? */
/* we'll use sampler/texture[pstip-sampler_unit] for the stipple */
@@ -953,6 +947,25 @@ aaline_set_sampler_views(struct pipe_context *pipe,
 }
 
 
+void
+draw_aaline_prepare_outputs(struct draw_context *draw,
+struct draw_stage *stage)
+{
+   struct aaline_stage *aaline = aaline_stage(stage);
+   const struct pipe_rasterizer_state *rast = draw->rasterizer;
+
+   /* update vertex attrib info */
+   aaline->pos_slot = draw_current_shader_position_output(draw);;
+
+   if (!rast->line_smooth)
+      return;
+
+   /* allocate the extra post-transformed vertex attribute */
+   aaline->tex_slot = draw_alloc_extra_vertex_attrib(draw,
+                                                     TGSI_SEMANTIC_GENERIC,
+                                                     aaline->fs->generic_attrib);
+}
+
 /**
  * Called by drivers that want to install this AA line prim stage
  * into the draw module's pipeline.  This will not be used if the
diff --git a/src/gallium/auxiliary/draw/draw_pipe_aapoint.c 
b/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
index 0d7b88e..7ae1ddd 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
@@ -696,28 +696,7 @@ aapoint_first_point(struct draw_stage *stage, struct 
prim_header *header)
 */
bind_aapoint_fragment_shader(aapoint);
 
-   /* update vertex attrib info */
-   aapoint->pos_slot = draw_current_shader_position_output(draw);
-
-   /* allocate the extra post-transformed vertex attribute */
-   aapoint->tex_slot = draw_alloc_extra_vertex_attrib(draw,
-                                                      TGSI_SEMANTIC_GENERIC,
-                                                      aapoint->fs->generic_attrib);
-   assert(aapoint->tex_slot > 0); /* output[0] is vertex pos */
-
-   /* find psize slot in post-transform vertex */
-   aapoint->psize_slot = -1;
-   if (draw->rasterizer

Re: [Mesa-dev] [RFC]: gallium: add new float comparison opcodes returning integer booleans

2013-08-09 Thread Zack Rusin

- Original Message -
 This is a proposal for new comparison instructions, as the old ones
 don't really fit modern (graphic or opencl I guess for that matter)
 languages well.
 If you've got objections, think the naming is crazy or whatnot I'm open
 for suggestions :-). I would think this is not just a much better fit
 for d3d10/glsl but for hw as well.

Yea, that makes sense to me. Comparison instructions should return consistent 
results across types. I'd just add a line or so to the docs to make it explicit 
how they're different from the old opcodes; I expect it's going to be easy to 
miss for people new to gallium.

z
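
The difference worth documenting can be sketched in a few lines. The old SEQ-style opcodes produce a float (1.0 or 0.0), while the proposed ones produce an integer boolean (all bits set or zero) that can feed bitwise operations directly. All names below are hypothetical and only model the semantics; they are not TGSI or gallivm API.

```c
#include <stdint.h>
#include <string.h>

/* Old style: the comparison result is a float, 1.0f or 0.0f. */
static float seq_old(float a, float b) { return (a == b) ? 1.0f : 0.0f; }

/* New style: the comparison result is an integer mask, ~0 or 0. */
static uint32_t fseq_new(float a, float b) { return (a == b) ? 0xffffffffu : 0u; }

/* Bit-exact conversions between float and its 32-bit pattern. */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* An integer mask selects between two values with plain AND/OR. */
static float
select_by_mask(uint32_t mask, float if_true, float if_false)
{
   return bits_float((mask & float_bits(if_true)) |
                     (~mask & float_bits(if_false)));
}
```

A float result would first need another compare against zero before it could be used this way, which is the kind of detail a newcomer could easily miss.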

[Mesa-dev] [PATCH 1/3] draw: cleanup the extra attribs

2013-08-08 Thread Zack Rusin

Before inserting new front face and prim id outputs cleanup
the old extra outputs, otherwise our cache will use previous
output slots which will break as soon as outputs of the current
shader don't match the last.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_context.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/auxiliary/draw/draw_context.c 
b/src/gallium/auxiliary/draw/draw_context.c
index af9caee..2dc6772 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -555,6 +555,7 @@ draw_get_shader_info(const struct draw_context *draw)
 void
 draw_prepare_shader_outputs(struct draw_context *draw)
 {
+   draw_remove_extra_vertex_attribs(draw);
    draw_ia_prepare_outputs(draw, draw->pipeline.ia);
    draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
 }
-- 
1.7.10.4
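
A toy model of the failure mode, assuming extra attributes are looked up by semantic and placed after the shader's own outputs. `struct extra_model`, `alloc_extra_attrib` and `remove_extra_attribs` are hypothetical stand-ins, not the draw module's real data structures.

```c
#define MODEL_MAX_EXTRA 4

/* Hypothetical list of extra outputs appended after the shader's own. */
struct extra_model {
   unsigned num;
   int semantic[MODEL_MAX_EXTRA];
   unsigned slot[MODEL_MAX_EXTRA];
};

/* Forget every extra output, as the cleanup call is meant to do. */
static void
remove_extra_attribs(struct extra_model *extra)
{
   extra->num = 0;
}

/*
 * Return the slot for "semantic", reusing a previously recorded slot
 * if one exists. Returns ~0u when the toy table is full.
 */
static unsigned
alloc_extra_attrib(struct extra_model *extra, unsigned num_shader_outputs,
                   int semantic)
{
   unsigned i;

   for (i = 0; i < extra->num; i++) {
      if (extra->semantic[i] == semantic)
         return extra->slot[i];   /* possibly stale */
   }
   if (extra->num == MODEL_MAX_EXTRA)
      return ~0u;

   extra->semantic[extra->num] = semantic;
   extra->slot[extra->num] = num_shader_outputs + extra->num;
   return extra->slot[extra->num++];
}
```

If a shader with three outputs is replaced by one with five and the list is not cleared, the lookup still hands back slot 3, which now belongs to a real shader output. Clearing first yields slot 5.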

[Mesa-dev] [PATCH 2/3] draw: reset the vertex id when injecting new primitive id

2013-08-08 Thread Zack Rusin

Without resetting the vertex id, with primitives where the same
vertex is used with different primitives (e.g. tri/lines strips)
our vbuf module won't re-emit those vertices with the changed
primitive id. So let's reset the vertex id whenever injecting
new primitive id to make sure that the vertex data is correctly
emitted.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_pipe_ia.c |9 +
 1 file changed, 9 insertions(+)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_ia.c 
b/src/gallium/auxiliary/draw/draw_pipe_ia.c
index ecbb233..d64f19b 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_ia.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_ia.c
@@ -68,6 +68,15 @@ inject_primid(struct draw_stage *stage,
 
    for (i = 0; i < num_verts; ++i) {
       struct vertex_header *v = header->v[i];
+      /* We have to reset the vertex_id because it's used by
+       * vbuf to figure out if the vertex had already been
+       * emitted. For line/tri strips the first vertex of
+       * subsequent primitives would already be emitted,
+       * but since we're changing the primitive id on the vertex
+       * we want to make sure it's reemitted with the correct
+       * data.
+       */
+      v->vertex_id = UNDEFINED_VERTEX_ID;
       memcpy(&v->data[slot][0], &primid, sizeof(primid));
       memcpy(&v->data[slot][1], &primid, sizeof(primid));
       memcpy(&v->data[slot][2], &primid, sizeof(primid));
-- 
1.7.10.4
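
To see why the reset is needed, consider a toy emitter that, like the vbuf behaviour described above, only writes a vertex out when its id is still the undefined sentinel. Everything in this sketch is hypothetical (`struct vertex_model`, `struct emitter_model`, the sentinel value), and it models the caching idea only, not the real vbuf code.

```c
#include <stdint.h>

/* Assumed sentinel meaning "not emitted yet". */
#define MODEL_UNDEFINED_VERTEX_ID 0xffff

struct vertex_model {
   uint16_t vertex_id;   /* index in the emitted buffer, or the sentinel */
   uint32_t primid;      /* per-vertex copy of the primitive id */
};

struct emitter_model {
   uint32_t emitted_primid[16];
   unsigned count;
};

/* Emit the vertex only if it has not been emitted before. */
static uint16_t
emit_vertex(struct emitter_model *em, struct vertex_model *v)
{
   if (v->vertex_id == MODEL_UNDEFINED_VERTEX_ID) {
      em->emitted_primid[em->count] = v->primid;
      v->vertex_id = (uint16_t)em->count++;
   }
   return v->vertex_id;
}

/* Write a new primitive id, optionally invalidating the cached id. */
static void
inject_primid_model(struct vertex_model *v, uint32_t primid, int reset_id)
{
   if (reset_id)
      v->vertex_id = MODEL_UNDEFINED_VERTEX_ID;
   v->primid = primid;
}
```

Without the reset, a vertex shared between two strip primitives is emitted once and keeps the first primitive's id in the output. With the reset, a second copy carrying the new id is emitted.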

[Mesa-dev] [PATCH 3/3] draw: rewrite primitive assembler

2013-08-08 Thread Zack Rusin

We can't be injecting the primitive ids in the pipeline because
by that time the primitives have already been decomposed. To
properly number the primitives we need to handle the adjacency
primitives by hand. This patch moves the prim id injection into
the original primitive assembler and completely removes the
useless pipeline stage.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/Makefile.sources   |1 -
 src/gallium/auxiliary/draw/draw_context.c|8 +-
 src/gallium/auxiliary/draw/draw_pipe.c   |4 -
 src/gallium/auxiliary/draw/draw_pipe.h   |7 -
 src/gallium/auxiliary/draw/draw_pipe_ia.c|  259 --
 src/gallium/auxiliary/draw/draw_pipe_validate.c  |   14 --
 src/gallium/auxiliary/draw/draw_prim_assembler.c |  168 +-
 src/gallium/auxiliary/draw/draw_prim_assembler.h |   12 +
 src/gallium/auxiliary/draw/draw_private.h|4 +-
 9 files changed, 180 insertions(+), 297 deletions(-)
 delete mode 100644 src/gallium/auxiliary/draw/draw_pipe_ia.c

diff --git a/src/gallium/auxiliary/Makefile.sources 
b/src/gallium/auxiliary/Makefile.sources
index b0172de..acbcef7 100644
--- a/src/gallium/auxiliary/Makefile.sources
+++ b/src/gallium/auxiliary/Makefile.sources
@@ -13,7 +13,6 @@ C_SOURCES := \
draw/draw_pipe_clip.c \
draw/draw_pipe_cull.c \
draw/draw_pipe_flatshade.c \
-draw/draw_pipe_ia.c \
draw/draw_pipe_offset.c \
draw/draw_pipe_pstipple.c \
draw/draw_pipe_stipple.c \
diff --git a/src/gallium/auxiliary/draw/draw_context.c 
b/src/gallium/auxiliary/draw/draw_context.c
index 2dc6772..2d4843e 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -40,6 +40,7 @@
 #include "util/u_prim.h"
 #include "draw_context.h"
 #include "draw_pipe.h"
+#include "draw_prim_assembler.h"
 #include "draw_vs.h"
 #include "draw_gs.h"
 
@@ -95,6 +96,10 @@ draw_create_context(struct pipe_context *pipe, boolean 
try_llvm)
if (!draw_init(draw))
   goto err_destroy;
 
+   draw->ia = draw_prim_assembler_create(draw);
+   if (!draw->ia)
+      goto err_destroy;
+
return draw;
 
 err_destroy:
@@ -206,6 +211,7 @@ void draw_destroy( struct draw_context *draw )
       draw->render->destroy( draw->render );
    */
 
+   draw_prim_assembler_destroy(draw->ia);
draw_pipeline_destroy( draw );
draw_pt_destroy( draw );
draw_vs_destroy( draw );
@@ -556,7 +562,7 @@ void
 draw_prepare_shader_outputs(struct draw_context *draw)
 {
draw_remove_extra_vertex_attribs(draw);
-   draw_ia_prepare_outputs(draw, draw->pipeline.ia);
+   draw_prim_assembler_prepare_outputs(draw->ia);
    draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
 }
 
diff --git a/src/gallium/auxiliary/draw/draw_pipe.c 
b/src/gallium/auxiliary/draw/draw_pipe.c
index 8140299..f1ee6cb 100644
--- a/src/gallium/auxiliary/draw/draw_pipe.c
+++ b/src/gallium/auxiliary/draw/draw_pipe.c
@@ -49,7 +49,6 @@ boolean draw_pipeline_init( struct draw_context *draw )
    draw->pipeline.clip      = draw_clip_stage( draw );
    draw->pipeline.flatshade = draw_flatshade_stage( draw );
    draw->pipeline.cull      = draw_cull_stage( draw );
-   draw->pipeline.ia        = draw_ia_stage( draw );
    draw->pipeline.validate  = draw_validate_stage( draw );
    draw->pipeline.first     = draw->pipeline.validate;
 
@@ -62,7 +61,6 @@ boolean draw_pipeline_init( struct draw_context *draw )
        !draw->pipeline.clip ||
        !draw->pipeline.flatshade ||
        !draw->pipeline.cull ||
-       !draw->pipeline.ia ||
        !draw->pipeline.validate)
   return FALSE;
 
@@ -97,8 +95,6 @@ void draw_pipeline_destroy( struct draw_context *draw )
       draw->pipeline.flatshade->destroy( draw->pipeline.flatshade );
    if (draw->pipeline.cull)
       draw->pipeline.cull->destroy( draw->pipeline.cull );
-   if (draw->pipeline.ia)
-      draw->pipeline.ia->destroy( draw->pipeline.ia );
    if (draw->pipeline.validate)
       draw->pipeline.validate->destroy( draw->pipeline.validate );
    if (draw->pipeline.aaline)
diff --git a/src/gallium/auxiliary/draw/draw_pipe.h 
b/src/gallium/auxiliary/draw/draw_pipe.h
index 70822a4..7c9ed6c 100644
--- a/src/gallium/auxiliary/draw/draw_pipe.h
+++ b/src/gallium/auxiliary/draw/draw_pipe.h
@@ -91,10 +91,6 @@ extern struct draw_stage *draw_stipple_stage( struct draw_context *context );
 extern struct draw_stage *draw_wide_line_stage( struct draw_context *context );
 extern struct draw_stage *draw_wide_point_stage( struct draw_context *context );
 extern struct draw_stage *draw_validate_stage( struct draw_context *context );
-extern struct draw_stage *draw_ia_stage(struct draw_context *context);
-
-boolean draw_ia_stage_required(const struct draw_context *context,
-   unsigned prim);
 
 extern void draw_free_temp_verts( struct draw_stage *stage );
 extern boolean draw_alloc_temp_verts( struct draw_stage *stage, unsigned nr );
@@ -108,9 +104,6 @@ void

Re: [Mesa-dev] [PATCH 2/3] draw: reset the vertex id when injecting new primitive id

2013-08-08 Thread Zack Rusin

Don't worry about this one too much. The next patch removes draw_pipe_ia.c 
anyway...

- Original Message -
 Without resetting the vertex id, with primitives where the same
 vertex is used by different primitives (e.g. tri/line strips)
 our vbuf module won't re-emit those vertices with the changed
 primitive id. So let's reset the vertex id whenever injecting
 a new primitive id to make sure that the vertex data is correctly
 emitted.
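
 A minimal sketch of the re-emit problem described above, assuming a much simplified vbuf-like emitter. All `sketch_*` names, the struct layouts and the sentinel value are hypothetical and are not the draw module's actual code; only the idea (a vertex is skipped when it already has a valid id) comes from the commit message.

```c
#define UNDEFINED_VERTEX_ID 0xffffu   /* assumed sentinel value */
#define MAX_EMITTED 16

struct sketch_vertex {
   unsigned vertex_id;   /* output slot, or UNDEFINED_VERTEX_ID if not emitted */
   unsigned primid;      /* per-vertex copy of the primitive id */
};

struct sketch_emitter {
   unsigned out_primid[MAX_EMITTED];  /* primitive id stored with each emitted vertex */
   unsigned num_emitted;
};

/* Emit a vertex unless it already has an output slot; return the slot. */
static unsigned
sketch_emit_vertex(struct sketch_emitter *e, struct sketch_vertex *v)
{
   if (v->vertex_id == UNDEFINED_VERTEX_ID) {
      v->vertex_id = e->num_emitted;
      e->out_primid[e->num_emitted++] = v->primid;
   }
   return v->vertex_id;
}

/* Change the primitive id and force the vertex to be emitted again. */
static void
sketch_inject_primid(struct sketch_vertex *v, unsigned primid)
{
   v->vertex_id = UNDEFINED_VERTEX_ID;
   v->primid = primid;
}
```

 Without the reset in `sketch_inject_primid`, a vertex shared by two strip primitives would keep its first slot and the second primitive would read the stale primitive id.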
 
 Signed-off-by: Zack Rusin za...@vmware.com
 ---
  src/gallium/auxiliary/draw/draw_pipe_ia.c |9 +
  1 file changed, 9 insertions(+)
 
 diff --git a/src/gallium/auxiliary/draw/draw_pipe_ia.c
 b/src/gallium/auxiliary/draw/draw_pipe_ia.c
 index ecbb233..d64f19b 100644
 --- a/src/gallium/auxiliary/draw/draw_pipe_ia.c
 +++ b/src/gallium/auxiliary/draw/draw_pipe_ia.c
 @@ -68,6 +68,15 @@ inject_primid(struct draw_stage *stage,
  
 for (i = 0; i < num_verts; ++i) {
struct vertex_header *v = header->v[i];
 +  /* We have to reset the vertex_id because it's used by
 +   * vbuf to figure out if the vertex had already been
 +   * emitted. For line/tri strips the first vertex of
 +   * subsequent primitives would already be emitted,
 +   * but since we're changing the primitive id on the vertex
 +   * we want to make sure it's reemitted with the correct
 +   * data.
 +   */
 +  v->vertex_id = UNDEFINED_VERTEX_ID;
memcpy(&v->data[slot][0], &primid, sizeof(primid));
memcpy(&v->data[slot][1], &primid, sizeof(primid));
memcpy(&v->data[slot][2], &primid, sizeof(primid));
 --
 1.7.10.4
 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] gallivm: use texture target from shader instead of static state for size query

2013-08-08 Thread Zack Rusin

Series looks good to me.

Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 d3d10 has no notion of distinct array resources, neither at the resource nor
 at the sampler view level. However, the shader dcl of resources certainly has,
 and d3d10 expects resinfo to return the values according to that - in
 particular a resource might have been a 1d texture with some array layers,
 then the sampler view might have only used 1 layer so it can be accessed both
 as a 1d or a 1d array texture (I think - the former definitely works). resinfo
 of a resource declared as array needs to return the number of array layers
 but a non-array resource needs to return 0 (and not 1). Hence fix this by
 passing the target from the shader decl to emit_size_query and using that (in
 case of OpenGL the target will come from the instruction itself).
 Could probably do the same for actual sampling, though it may not matter
 there (as the bogus components will essentially get clamped away); it could
 possibly wreak havoc though if it REALLY doesn't match (which is of course an
 error but still).
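
 The rule stated above (layer count for array declarations, 0 for non-array ones) can be sketched as follows. This is an illustration only: the enum, the struct and the function names are hypothetical and do not correspond to the gallivm interfaces touched by the patch.

```c
/* Hypothetical shader-declared targets; not the real PIPE_TEXTURE_* enum. */
enum sketch_target {
   SKETCH_TEX_1D,
   SKETCH_TEX_1D_ARRAY,
   SKETCH_TEX_2D,
   SKETCH_TEX_2D_ARRAY
};

struct sketch_view {
   unsigned width;
   unsigned height;
   unsigned layers;   /* layers visible through the sampler view */
};

static int
sketch_target_is_array(enum sketch_target target)
{
   return target == SKETCH_TEX_1D_ARRAY || target == SKETCH_TEX_2D_ARRAY;
}

/*
 * Layer count reported by a size query. The decision is made from the
 * target declared in the shader, not from the bound resource.
 */
static unsigned
sketch_query_layers(const struct sketch_view *view,
                    enum sketch_target shader_target)
{
   return sketch_target_is_array(shader_target) ? view->layers : 0;
}
```

 The same single-layer view therefore reports 1 when the shader declares a 1d array and 0 when it declares a plain 1d texture.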
 ---
  src/gallium/auxiliary/draw/draw_llvm_sample.c |2 +
  src/gallium/auxiliary/gallivm/lp_bld_sample.h |1 +
  src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |   32 ++-
  src/gallium/auxiliary/gallivm/lp_bld_tgsi.h   |1 +
  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c   |   43
  -
  src/gallium/drivers/llvmpipe/lp_tex_sample.c  |2 +
  6 files changed, 77 insertions(+), 4 deletions(-)
 
 diff --git a/src/gallium/auxiliary/draw/draw_llvm_sample.c
 b/src/gallium/auxiliary/draw/draw_llvm_sample.c
 index 3016d7c..f10cba3 100644
 --- a/src/gallium/auxiliary/draw/draw_llvm_sample.c
 +++ b/src/gallium/auxiliary/draw/draw_llvm_sample.c
 @@ -270,6 +270,7 @@ draw_llvm_sampler_soa_emit_size_query(const struct lp_build_sampler_soa *base,
struct gallivm_state *gallivm,
struct lp_type type,
unsigned texture_unit,
 +  unsigned target,
boolean need_nr_mips,
boolean scalar_lod,
LLVMValueRef explicit_lod, /* optional */
 @@ -284,6 +285,7 @@ draw_llvm_sampler_soa_emit_size_query(const struct lp_build_sampler_soa *base,
 &sampler->dynamic_state.base,
 type,
 texture_unit,
 +   target,
 need_nr_mips,
 scalar_lod,
 explicit_lod,
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.h
 b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
 index dff8be2..db3ea1d 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.h
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
 @@ -497,6 +497,7 @@ lp_build_size_query_soa(struct gallivm_state *gallivm,
  struct lp_sampler_dynamic_state *dynamic_state,
  struct lp_type int_type,
  unsigned texture_unit,
 +unsigned target,
  boolean need_nr_mips,
  boolean scalar_lod,
  LLVMValueRef explicit_lod,
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
 b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
 index b0bb58b..e403ac8 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
 @@ -1943,6 +1943,7 @@ lp_build_size_query_soa(struct gallivm_state *gallivm,
  struct lp_sampler_dynamic_state *dynamic_state,
  struct lp_type int_type,
  unsigned texture_unit,
 +unsigned target,
  boolean need_nr_mips,
  boolean scalar_lod,
  LLVMValueRef explicit_lod,
 @@ -1955,9 +1956,36 @@ lp_build_size_query_soa(struct gallivm_state *gallivm,
 unsigned num_lods = 1;
 struct lp_build_context bld_int_vec;
  
 -   dims = texture_dims(static_state->target);
 +   /*
 +* Do some sanity verification about bound texture and shader dcl target.
 +* Not entirely sure what's possible but assume array/non-array
 +* always compatible (probably not ok for OpenGL but d3d10 has no
 +* distinction of arrays at the resource level).
 +* Everything else looks bogus (though not entirely sure about rect/2d).
 +* Currently disabled because it causes assertion failures if there's
 +* nothing bound (or rather a dummy texture, not that this case would
 +* return the right values).
 +*/
 +   if (0 && static_state

Re: [Mesa-dev] [PATCH] gallivm: set non-existing values really to zero in size queries for d3d10

2013-08-08 Thread Zack Rusin

Looks good.

Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 My previous attempt at doing so double-failed miserably (minification of
 zero still gives one, and even if it would not the value was never written
 anyway).
 While here also rename the confusingly named int_vec bld as we have int vecs
 of different sizes, and rename need_nr_mips (as this also changes
 out-of-bounds
 behavior) to is_sviewinfo too.
 ---
  src/gallium/auxiliary/draw/draw_llvm_sample.c |4 +--
  src/gallium/auxiliary/gallivm/lp_bld_sample.h |2 +-
  src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |   34
  ++---
  src/gallium/drivers/llvmpipe/lp_tex_sample.c  |4 +--
  4 files changed, 22 insertions(+), 22 deletions(-)
 
 diff --git a/src/gallium/auxiliary/draw/draw_llvm_sample.c
 b/src/gallium/auxiliary/draw/draw_llvm_sample.c
 index f10cba3..97b0255 100644
 --- a/src/gallium/auxiliary/draw/draw_llvm_sample.c
 +++ b/src/gallium/auxiliary/draw/draw_llvm_sample.c
 @@ -271,7 +271,7 @@ draw_llvm_sampler_soa_emit_size_query(const struct
 lp_build_sampler_soa *base,
struct lp_type type,
unsigned texture_unit,
unsigned target,
 -  boolean need_nr_mips,
 +  boolean is_sviewinfo,
boolean scalar_lod,
LLVMValueRef explicit_lod, /* optional
*/
LLVMValueRef *sizes_out)
 @@ -286,7 +286,7 @@ draw_llvm_sampler_soa_emit_size_query(const struct
 lp_build_sampler_soa *base,
 type,
 texture_unit,
 target,
 -   need_nr_mips,
 +   is_sviewinfo,
 scalar_lod,
 explicit_lod,
 sizes_out);
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.h
 b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
 index db3ea1d..75e8c59 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.h
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
 @@ -498,7 +498,7 @@ lp_build_size_query_soa(struct gallivm_state *gallivm,
  struct lp_type int_type,
  unsigned texture_unit,
  unsigned target,
 -boolean need_nr_mips,
 +boolean is_viewinfo,
  boolean scalar_lod,
  LLVMValueRef explicit_lod,
  LLVMValueRef *sizes_out);
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
 b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
 index e403ac8..65d6e7b 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
 @@ -1944,7 +1944,7 @@ lp_build_size_query_soa(struct gallivm_state *gallivm,
  struct lp_type int_type,
  unsigned texture_unit,
  unsigned target,
 -boolean need_nr_mips,
 +boolean is_sviewinfo,
  boolean scalar_lod,
  LLVMValueRef explicit_lod,
  LLVMValueRef *sizes_out)
 @@ -1954,7 +1954,7 @@ lp_build_size_query_soa(struct gallivm_state *gallivm,
 int dims, i;
 boolean has_array;
 unsigned num_lods = 1;
 -   struct lp_build_context bld_int_vec;
 +   struct lp_build_context bld_int_vec4;
  
 /*
  * Do some sanity verification about bound texture and shader dcl target.
 @@ -1997,24 +1997,19 @@ lp_build_size_query_soa(struct gallivm_state
 *gallivm,
  
 assert(!int_type.floating);
  
 -   lp_build_context_init(&bld_int_vec, gallivm, lp_type_int_vec(32, 128));
 +   lp_build_context_init(&bld_int_vec4, gallivm, lp_type_int_vec(32, 128));
  
 if (explicit_lod) {
/* FIXME: this needs to honor per-element lod */
lod = LLVMBuildExtractElement(gallivm->builder, explicit_lod,
lp_build_const_int32(gallivm, 0), "");
first_level = dynamic_state->first_level(dynamic_state, gallivm,
texture_unit);
level = LLVMBuildAdd(gallivm->builder, lod, first_level, "level");
 -  lod = lp_build_broadcast_scalar(&bld_int_vec, level);
 +  lod = lp_build_broadcast_scalar(&bld_int_vec4, level);
 } else {
 -  lod = bld_int_vec.zero;
 +  lod = bld_int_vec4.zero;
 }
  
 -   if (need_nr_mips) {
 -  size = bld_int_vec.zero;
 -   }
 -   else {
 -  size = bld_int_vec.undef;
 -   }
 +   size = bld_int_vec4.undef;
  
 size = LLVMBuildInsertElement(gallivm->builder, size

Re: [Mesa-dev] [PATCH 3/3] draw: rewrite primitive assembler

2013-08-08 Thread Zack Rusin

  Series looks good though I'm unsure why the pipeline stage doesn't work.
 Where does that decomposition happen? Is that something like GS
 outputting multiple prims in the same topology which all need the same id?

No, it's because the pipeline stage is run on the decomposed primitives. The
issue is that the pipeline stage is run after stream output, and stream output
requires decomposed primitives, meaning that by the time we get to the pipeline
we have lost the original primitive info. D3D10 wants the primitive IDs to be
injected into vertices, but in the order in which they are traversed on the
original (strip) primitives, so we need to do it when doing the original
decomposition, where we have access to the original topology and can number the
vertices correctly.
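
To illustrate why the numbering has to happen during decomposition, here is a small sketch that splits a triangle strip into independent triangles and assigns each one its id in traversal order. The `sketch_*` names are hypothetical and the winding swap on odd triangles is just one common convention; this is not the primitive assembler's code.

```c
struct sketch_tri {
   unsigned v[3];     /* indices into the original strip */
   unsigned primid;   /* id in original traversal order */
};

/*
 * Decompose a triangle strip of num_verts vertices into triangles.
 * The primitive id is only known here, while the strip topology is
 * still available. Returns the number of triangles written to out.
 */
static unsigned
sketch_decompose_tri_strip(unsigned num_verts, struct sketch_tri *out)
{
   unsigned i, n = 0;

   if (num_verts < 3)
      return 0;

   for (i = 0; i + 2 < num_verts; ++i) {
      if (i & 1) {
         /* swap the first two vertices on odd triangles to keep winding */
         out[n].v[0] = i + 1;
         out[n].v[1] = i;
      } else {
         out[n].v[0] = i;
         out[n].v[1] = i + 1;
      }
      out[n].v[2] = i + 2;
      out[n].primid = n;
      ++n;
   }
   return n;
}
```

Once the triangles are handed on as independent primitives, nothing in them says which strip position they came from, which is the information a later pipeline stage would be missing.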

z

Re: [Mesa-dev] [PATCH 3/3] draw: rewrite primitive assembler

2013-08-08 Thread Zack Rusin

 On 09.08.2013 00:40, Zack Rusin wrote:
Series looks good though I'm unsure why the pipeline stage doesn't work.
  Where does that decomposition happen? Is that something like GS
  outputting multiple prims in the same topology which all need the same id?
  
  No, it's because the pipeline stage is ran on the decomposed primitives.
  The issue is that the pipeline stage is ran after stream output and stream
  output requires decomposed primitives, meaning that by the time we get to
  the pipeline we lost the original primitive info. The d3d10 wants the
  primitive id's to be injected into vertices but in the order in which they
  are traversed on the original (striped) primitives, so we need to do it
  when doing the original decomposition where we have access to the original
  topology and can number the vertices correctly.
  
  z
  
 
 I see I totally forgot stream out needs decomposed primitives, and I
 guess stream out (and prim assembler) can't run as an ordinary pipeline
 stage?

I was thinking about that when I was doing it and I thought it should be
possible to rewrite SO as a pipeline stage, but we'd need to change the
interface to include some sort of a prepare stage and then redo the SO code.
Once SO was in the pipeline we could think about the primitive assembler,
but that would also require more changes to the pipeline because we want to
know if the primitives are adjacency primitives and pipeline stages only get
tris/lines/points... and this was the point at which I went "screw it, I'm
injecting prim ids in the primitive assembler".

z

Re: [Mesa-dev] [PATCH 2/2] gallivm: propagate scalar_lod to emit_size_query too

2013-08-07 Thread Zack Rusin

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 Clearly the returned values need to be per-element if the lod is per element.
 Does not actually change behavior yet.

Looks good. For the entire series:
Reviewed-by: Zack Rusin za...@vmware.com

Re: [Mesa-dev] [PATCH] gallivm: honor d3d10 floating point rules for shadow comparisons

2013-08-07 Thread Zack Rusin

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 d3d10 specifies ordered comparisons for everything but not_equal which is
 unordered
 (http://msdn.microsoft.com/en-us/library/windows/desktop/cc308050.aspx).
 OpenGL probably doesn't care.

This series looks good too. For all three:
Reviewed-by: Zack Rusin za...@vmware.com

[Mesa-dev] [PATCH] draw: fix slot detection

2013-08-06 Thread Zack Rusin

Nowadays -1 for slots means that the semantic is not present, so
we need to store it in a signed variable, otherwise < 0 comparisons
are pointless. Fixes
http://bugzilla.eng.vmware.com/show_bug.cgi?id=67811 (at least
with softpipe, edgeflags don't work with llvmpipe)
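
A small sketch of the problem, assuming -1 is the "not present" marker as described above. The `sketch_*` names are hypothetical; the point is only that the marker survives in an `int` but turns into a large positive value once it is stored in an `unsigned`.

```c
#include <limits.h>

#define SKETCH_SLOT_ABSENT (-1)   /* assumed "semantic not present" marker */

/* With a signed slot the "is it present" test works as intended. */
static int
sketch_slot_present(int slot)
{
   return slot >= 0;
}

/* What the value becomes when it is stored in an unsigned field. */
static unsigned
sketch_store_unsigned(int slot)
{
   return (unsigned)slot;
}
```

After the conversion the marker equals `UINT_MAX`, so a check for a negative slot can never fire and the value would be used as an out-of-range index.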

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_pipe_unfilled.c |2 +-
 src/gallium/drivers/llvmpipe/lp_setup_context.h |2 +-
 src/gallium/drivers/llvmpipe/lp_setup_line.c|1 -
 3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c 
b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
index c6ee95c..68bab72 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
@@ -67,7 +67,7 @@ inject_front_face_info(struct draw_stage *stage,
boolean is_front_face = (
   (stage->draw->rasterizer->front_ccw && ccw) ||
   (!stage->draw->rasterizer->front_ccw && !ccw));
-   unsigned slot = unfilled->face_slot;
+   int slot = unfilled->face_slot;
unsigned i;
 
/* In case the backend doesn't care about it */
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_context.h 
b/src/gallium/drivers/llvmpipe/lp_setup_context.h
index ea1d0d6..44be85f 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_context.h
+++ b/src/gallium/drivers/llvmpipe/lp_setup_context.h
@@ -106,7 +106,7 @@ struct lp_setup_context
float psize;
unsigned viewport_index_slot;
unsigned layer_slot;
-   unsigned face_slot;
+   int face_slot;
 
struct pipe_framebuffer_state fb;
struct u_rect framebuffer;
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_line.c 
b/src/gallium/drivers/llvmpipe/lp_setup_line.c
index 3b16163..a25a6b0 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_line.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_line.c
@@ -622,7 +622,6 @@ try_setup_line( struct lp_setup_context *setup,
} else {
   line->inputs.frontfacing = TRUE;
}
-   
 
/* Setup parameter interpolants:
 */
-- 
1.7.10.4

Re: [Mesa-dev] [PATCH] util: implement table-based + linear interpolation linear-to-srgb conversion

2013-08-05 Thread Zack Rusin

Looks good to me. A small comment above the disabled version noting that it's 
disabled because it's a bit slower might be useful for the next person who 
reads the code.

Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
 From: Roland Scheidegger srol...@vmware.com
 
 Should be much faster, seems to work in softpipe.
 While here (also it's now disabled) fix up the pow factor - the former value
 is what is in GL core it is however not actually accurate to fp32 standard
 (as it is 1.0/2.4), and if someone would do all the accurate math there's no
 reason to waste 8 mantissa bits or so...
 
 v2: use real table generating function instead of just printing the values
 (might take a bit longer as it does calculations on some 3+ million floats
 but much more descriptive obviously).
 Also fix up another pow factor (this time in the python code) - wondering
 where the couple one bit errors came from :-(.
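
 The table lookup in the patch below slices the bits of the clamped float into a table index and an interpolation fraction. The following sketch shows only that bit slicing, assuming the shifts and masks of the referenced gist (20 for the bucket, 12 and 0xff for the fraction); the `sketch_*` names are hypothetical and the table itself is not reproduced.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MINVAL_BITS    ((uint32_t)(127 - 13) << 23)  /* bits of 2^-13 */
#define SKETCH_ALMOSTONE_BITS 0x3f7fffffu                   /* bits of 1 - eps */

static uint32_t
sketch_float_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof u);
   return u;
}

static float
sketch_bits_float(uint32_t u)
{
   float f;
   memcpy(&f, &u, sizeof f);
   return f;
}

/*
 * Clamp x to [2^-13, 1 - eps] and split its bits into a bucket index
 * (exponent plus top 3 mantissa bits, 104 buckets in total) and an
 * 8-bit fraction taken from the next-highest mantissa bits.
 */
static void
sketch_srgb_lut_index(float x, unsigned *bucket, unsigned *frac)
{
   const float minval = sketch_bits_float(SKETCH_MINVAL_BITS);
   const float almostone = sketch_bits_float(SKETCH_ALMOSTONE_BITS);
   uint32_t bits;

   if (!(x > minval))      /* written this way so NaN also maps to minval */
      x = minval;
   if (x > almostone)
      x = almostone;

   bits = sketch_float_bits(x);
   *bucket = (bits - SKETCH_MINVAL_BITS) >> 20;
   *frac = (bits >> 12) & 0xff;
}
```

 With 13 exponents and 3 mantissa bits per bucket the index runs from 0 to 103, which matches the 104-entry helper table declared in the patch.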
 ---
  src/gallium/auxiliary/util/u_format_srgb.h  |   55
  +-
  src/gallium/auxiliary/util/u_format_srgb.py |   57
  ++-
  2 files changed, 101 insertions(+), 11 deletions(-)
 
 diff --git a/src/gallium/auxiliary/util/u_format_srgb.h
 b/src/gallium/auxiliary/util/u_format_srgb.h
 index 82ed957..f3e1b20 100644
 --- a/src/gallium/auxiliary/util/u_format_srgb.h
 +++ b/src/gallium/auxiliary/util/u_format_srgb.h
 @@ -39,6 +39,7 @@
  
  
  #include "pipe/p_compiler.h"
 +#include "u_pack_color.h"
  #include "u_math.h"
  
  
 @@ -51,23 +52,57 @@ util_format_srgb_to_linear_8unorm_table[256];
  extern const uint8_t
  util_format_linear_to_srgb_8unorm_table[256];
  
 +extern const unsigned
 +util_format_linear_to_srgb_helper_table[104];
 +
  
  /**
   * Convert a unclamped linear float to srgb value in the [0,255].
 - * XXX this hasn't been tested (render to srgb surface).
 - * XXX this needs optimization.
   */
  static INLINE uint8_t
  util_format_linear_float_to_srgb_8unorm(float x)
  {
 -   if (x >= 1.0f)
 -      return 255;
 -   else if (x >= 0.0031308f)
 -      return float_to_ubyte(1.055f * powf(x, 0.41666f) - 0.055f);
 -   else if (x > 0.0f)
 -      return float_to_ubyte(12.92f * x);
 -   else
 -      return 0;
 +   if (0) {
 +      if (x >= 1.0f)
 +         return 255;
 +      else if (x >= 0.0031308f)
 +         return float_to_ubyte(1.055f * powf(x, 0.41666666f) - 0.055f);
 +      else if (x > 0.0f)
 +         return float_to_ubyte(12.92f * x);
 +      else
 +         return 0;
 +   }
 +   else {
 +      /*
 +       * This is taken from https://gist.github.com/rygorous/2203834
 +       * Use LUT and do linear interpolation.
 +       */
 +      union fi almostone, minval, f;
 +      unsigned tab, bias, scale, t;
 +
 +      almostone.ui = 0x3f7fffff;
 +      minval.ui = (127-13) << 23;
 +
 +      /*
 +       * Clamp to [2^(-13), 1-eps]; these two values map to 0 and 1, respectively.
 +       * The tests are carefully written so that NaNs map to 0, same as in the
 +       * reference implementation.
 +       */
 +      if (!(x > minval.f))
 +         x = minval.f;
 +      if (x > almostone.f)
 +         x = almostone.f;
 +
 +      /* Do the table lookup and unpack bias, scale */
 +      f.f = x;
 +      tab = util_format_linear_to_srgb_helper_table[(f.ui - minval.ui) >> 20];
 +      bias = (tab >> 16) << 9;
 +      scale = tab & 0xffff;
 +
 +      /* Grab next-highest mantissa bits and perform linear interpolation */
 +      t = (f.ui >> 12) & 0xff;
 +      return (uint8_t) ((bias + scale*t) >> 16);
 +   }
  }
  
  
 diff --git a/src/gallium/auxiliary/util/u_format_srgb.py
 b/src/gallium/auxiliary/util/u_format_srgb.py
 index cd63ae7..c6c02f0 100644
 --- a/src/gallium/auxiliary/util/u_format_srgb.py
 +++ b/src/gallium/auxiliary/util/u_format_srgb.py
 @@ -40,6 +40,7 @@ CopyRight = '''
  
  
  import math
 +import struct
  
  
  def srgb_to_linear(x):
 @@ -51,10 +52,11 @@ def srgb_to_linear(x):
  
  def linear_to_srgb(x):
      if x >= 0.0031308:
 -        return 1.055 * math.pow(x, 0.41666) - 0.055
 +        return 1.055 * math.pow(x, 0.41666666) - 0.055
      else:
          return 12.92 * x
  
 +
  def generate_srgb_tables():
  print 'const float'
  print 'util_format_srgb_8unorm_to_linear_float_table[256] = {'
 @@ -84,6 +86,59 @@ def generate_srgb_tables():
  print '};'
  print
  
 +    # calculate the table interpolation values used in float linear to unorm8 srgb
 +    numexp = 13
 +    mantissa_msb = 3
 +    # stepshift is just used to only use every x-th float to make things faster,
 +    # 5 is largest value which still gives exact same table as 0
 +    stepshift = 5
 +    nbuckets = numexp << mantissa_msb
 +    bucketsize = (1 << (23 - mantissa_msb)) >> stepshift
 +    mantshift = 12
 +    valtable = []
 +    sum_aa = float(bucketsize)
 +    sum_ab = 0.0
 +    sum_bb = 0.0
 +    for i in range(0, bucketsize):
 +        j = (i << stepshift) >> mantshift
 +        sum_ab += j
 +        sum_bb += j*j
 +    inv_det = 1.0 / (sum_aa * sum_bb - sum_ab * sum_ab

[Mesa-dev] [PATCH 1/8] tgsi: detect prim id and front face usage in fs

2013-08-02 Thread Zack Rusin

Adding code to detect the usage of prim id and front face
semantics in fragment shaders.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/tgsi/tgsi_scan.c |9 +++--
 src/gallium/auxiliary/tgsi/tgsi_scan.h |1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.c 
b/src/gallium/auxiliary/tgsi/tgsi_scan.c
index 1fe1a07..e7bf6e6 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_scan.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_scan.c
@@ -166,9 +166,14 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
   info->input_cylindrical_wrap[reg] = (ubyte)fulldecl->Interp.CylindricalWrap;
   info->num_inputs++;
 
-  if (procType == TGSI_PROCESSOR_FRAGMENT &&
-  fulldecl->Semantic.Name == TGSI_SEMANTIC_POSITION)
+  if (procType == TGSI_PROCESSOR_FRAGMENT) {
+ if (fulldecl->Semantic.Name == TGSI_SEMANTIC_POSITION)
 info->reads_position = TRUE;
+ else if (fulldecl->Semantic.Name == TGSI_SEMANTIC_PRIMID)
+info->uses_primid = TRUE;
+ else if (fulldecl->Semantic.Name == TGSI_SEMANTIC_FACE)
+info->uses_frontface = TRUE;
+  }
}
else if (file == TGSI_FILE_SYSTEM_VALUE) {
   unsigned index = fulldecl->Range.First;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.h 
b/src/gallium/auxiliary/tgsi/tgsi_scan.h
index cfa2b8e..e2fa73a 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_scan.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_scan.h
@@ -74,6 +74,7 @@ struct tgsi_shader_info
boolean uses_instanceid;
boolean uses_vertexid;
boolean uses_primid;
+   boolean uses_frontface;
boolean origin_lower_left;
boolean pixel_center_integer;
boolean color0_writes_all_cbufs;
-- 
1.7.10.4

[Mesa-dev] [PATCH 2/8] draw: stop crashing with extra shader outputs

2013-08-02 Thread Zack Rusin

Draw sometimes injects extra shader outputs (aa points, lines or
front face); unfortunately most of the pipeline and llvm code
didn't handle them at all. It only worked if the number of inputs
happened to be greater than or equal to the number of shader outputs
plus the extra injected outputs. In particular, when running
the pipeline, which depends on the vertex_id in the vertex_header,
things were completely broken. The patch adjusts the code to
correctly use the total number of shader outputs (the standard
ones plus the injected ones) to make it all stop crashing and
work.
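
A minimal sketch of the counting the message describes, assuming each output is a four-float attribute that follows a fixed-size vertex header. The structs and `sketch_*` functions are hypothetical stand-ins and not the helpers added by the patch.

```c
#include <stddef.h>

struct sketch_shader_info {
   unsigned num_outputs;     /* outputs declared by the shader itself */
};

struct sketch_extra_outputs {
   unsigned num;             /* outputs injected by draw stages */
};

/* Total outputs: the shader's own plus whatever the draw stages inject. */
static unsigned
sketch_total_outputs(const struct sketch_shader_info *info,
                     const struct sketch_extra_outputs *extra)
{
   return info->num_outputs + extra->num;
}

/* Bytes needed per vertex: header plus one float[4] per output. */
static size_t
sketch_vertex_size(size_t header_size, unsigned num_outputs)
{
   return header_size + (size_t)num_outputs * 4 * sizeof(float);
}
```

Sizing the vertex from the shader's own output count alone leaves no room for the injected attributes, so any stage that writes them ends up past the end of the vertex.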

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_context.c  |   43 
 src/gallium/auxiliary/draw/draw_context.h  |5 +++
 src/gallium/auxiliary/draw/draw_gs.c   |2 +-
 src/gallium/auxiliary/draw/draw_llvm.c |3 ++
 src/gallium/auxiliary/draw/draw_llvm.h |4 +-
 src/gallium/auxiliary/draw/draw_pipe.h |3 +-
 .../auxiliary/draw/draw_pt_fetch_shade_pipeline.c  |6 +--
 .../draw/draw_pt_fetch_shade_pipeline_llvm.c   |8 +---
 src/gallium/auxiliary/draw/draw_vs_variant.c   |2 +-
 9 files changed, 61 insertions(+), 15 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_context.c 
b/src/gallium/auxiliary/draw/draw_context.c
index 2e95b5c..8bf3596 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -622,6 +622,49 @@ draw_num_shader_outputs(const struct draw_context *draw)
 
 
 /**
+ * Return total number of the vertex shader outputs.  This function
+ * also counts any extra vertex output attributes that may
+ * be filled in by some draw stages (such as AA point, AA line,
+ * front face).
+ */
+uint
+draw_total_vs_shader_outputs(const struct draw_context *draw)
+{
+   const struct tgsi_shader_info *info = &draw->vs.vertex_shader->info;
+   uint count;
+
+   count = info->num_outputs;
+   count += draw->extra_shader_outputs.num;
+
+   return count;
+}
+
+/**
+ * Return total number of the geometry shader outputs. This function
+ * also counts any extra geometry output attributes that may
+ * be filled in by some draw stages (such as AA point, AA line, front
+ * face).
+ */
+uint
+draw_total_gs_shader_outputs(const struct draw_context *draw)
+{
+   
+   const struct tgsi_shader_info *info;
+   uint count;
+
+   if (!draw->gs.geometry_shader)
+  return 0;
+
+   info = &draw->gs.geometry_shader->info;
+
+   count = info->num_outputs;
+   count += draw->extra_shader_outputs.num;
+
+   return count;
+}
+
+
+/**
  * Provide TGSI sampler objects for vertex/geometry shaders that use
  * texture fetches.  This state only needs to be set once per context.
  * This might only be used by software drivers for the time being.
diff --git a/src/gallium/auxiliary/draw/draw_context.h 
b/src/gallium/auxiliary/draw/draw_context.h
index 0815047..e9aa24d 100644
--- a/src/gallium/auxiliary/draw/draw_context.h
+++ b/src/gallium/auxiliary/draw/draw_context.h
@@ -139,6 +139,11 @@ draw_will_inject_frontface(const struct draw_context 
*draw);
 uint
 draw_num_shader_outputs(const struct draw_context *draw);
 
+uint
+draw_total_vs_shader_outputs(const struct draw_context *draw);
+
+uint
+draw_total_gs_shader_outputs(const struct draw_context *draw);
 
 void
 draw_texture_sampler(struct draw_context *draw,
diff --git a/src/gallium/auxiliary/draw/draw_gs.c 
b/src/gallium/auxiliary/draw/draw_gs.c
index cd63e2b..32fd91f 100644
--- a/src/gallium/auxiliary/draw/draw_gs.c
+++ b/src/gallium/auxiliary/draw/draw_gs.c
@@ -534,7 +534,7 @@ int draw_geometry_shader_run(struct draw_geometry_shader *shader,
 {
const float (*input)[4] = (const float (*)[4])input_verts->verts->data;
unsigned input_stride = input_verts->vertex_size;
-   unsigned num_outputs = shader->info.num_outputs;
+   unsigned num_outputs = draw_total_gs_shader_outputs(shader->draw);
unsigned vertex_size = sizeof(struct vertex_header) + num_outputs * 4 * sizeof(float);
unsigned num_input_verts = input_prim->linear ?
   input_verts->count :
diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index c195a2b..8ecb3e7 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1827,6 +1827,7 @@ draw_llvm_make_variant_key(struct draw_llvm *llvm, char *store)
key->need_edgeflags = (llvm->draw->vs.edgeflag_output ? TRUE : FALSE);
key->ucp_enable = llvm->draw->rasterizer->clip_plane_enable;
key->has_gs = llvm->draw->gs.geometry_shader != NULL;
+   key->num_outputs = draw_total_vs_shader_outputs(llvm->draw);
key->pad1 = 0;
 
/* All variants of this shader will have the same value for
@@ -2264,6 +2265,8 @@ draw_gs_llvm_make_variant_key(struct draw_llvm *llvm, char *store)
 
key = (struct draw_gs_llvm_variant_key *)store;
 
+   key->num_outputs = draw_total_gs_shader_outputs(llvm->draw);
+
/* All variants of this shader will have

[Mesa-dev] [PATCH 3/8] draw/llvm: add some extra debugging output

2013-08-02 Thread Zack Rusin

When dumping shader outputs it's nice to also have the integer
values of the outputs, in particular because some values
are integers.
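
For illustration, the same idea in plain C: print an output both as a float and as the integer that shares its bits. The `sketch_*` helpers are hypothetical and the output format is only modelled on the "val = " / "ival = " labels used in the patch.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret the bits of a float as a 32-bit integer. */
static int32_t
sketch_float_as_int(float f)
{
   int32_t i;
   memcpy(&i, &f, sizeof i);
   return i;
}

/* Reinterpret the bits of a 32-bit integer as a float. */
static float
sketch_int_as_float(int32_t i)
{
   float f;
   memcpy(&f, &i, sizeof f);
   return f;
}

/* Format one output value with both interpretations. */
static int
sketch_format_output(char *buf, size_t len, float val)
{
   return snprintf(buf, len, "val = %f  ival = %d",
                   val, (int)sketch_float_as_int(val));
}
```

An integer such as a primitive id of 7 prints as 0.000000 when viewed as a float, which is why the integer view is useful in the dump.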

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_llvm.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index 8ecb3e7..df0d2ed 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -977,6 +977,12 @@ convert_to_aos(struct gallivm_state *gallivm,
 
LLVMConstInt(LLVMInt32TypeInContext(gallivm->context),
  chan, 0));
 lp_build_print_value(gallivm, "val = ", out);
+{
+   LLVMValueRef iv =
+  LLVMBuildBitCast(builder, out, lp_build_int_vec_type(gallivm, soa_type), "");
+   
+   lp_build_print_value(gallivm, "  ival = ", iv);
+}
+}
 #endif
 soa[chan] = out;
  }
-- 
1.7.10.4

[Mesa-dev] [PATCH 4/8] draw: make sure clipping works with injected outputs

2013-08-02 Thread Zack Rusin

Clipping would drop the extra outputs because it always
used the number of standard vertex shader outputs, without
geometry shader or extra outputs. The commit makes sure
that clipping with geometry shaders which have more outputs
than the current vertex shader, and with extra outputs, correctly
propagates the entire vertex.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_pipe_clip.c |   89 ---
 1 file changed, 54 insertions(+), 35 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c 
b/src/gallium/auxiliary/draw/draw_pipe_clip.c
index e83586e..b76e9a5 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_clip.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c
@@ -136,7 +136,7 @@ static void interp( const struct clip_stage *clip,
const struct vertex_header *in,
 unsigned viewport_index )
 {
-   const unsigned nr_attrs = draw_current_shader_outputs(clip->stage.draw);
+   const unsigned nr_attrs = draw_num_shader_outputs(clip->stage.draw);
const unsigned pos_attr = draw_current_shader_position_output(clip->stage.draw);
const unsigned clip_attr = draw_current_shader_clipvertex_output(clip->stage.draw);
unsigned j;
@@ -264,7 +264,6 @@ static void emit_poly( struct draw_stage *stage,
  header.flags |= edge_last;
 
   if (DEBUG_CLIP) {
- const struct draw_vertex_shader *vs = stage->draw->vs.vertex_shader;
  uint j, k;
  debug_printf("Clipped tri: (flat-shade-first = %d)\n",
   stage->draw->rasterizer->flatshade_first);
@@ -274,7 +273,7 @@ static void emit_poly( struct draw_stage *stage,
  header.v[j]->clip[1],
  header.v[j]->clip[2],
  header.v[j]->clip[3]);
-for (k = 0; k < vs->info.num_outputs; k++) {
+for (k = 0; k < draw_num_shader_outputs(stage->draw); k++) {
debug_printf("  Vert %d: Attr %d:  %f %f %f %f\n", j, k,
 header.v[j]->data[k][0],
 header.v[j]->data[k][1],
@@ -283,7 +282,6 @@ static void emit_poly( struct draw_stage *stage,
 }
  }
   }
-
   stage->next->tri( stage->next, &header );
}
 }
@@ -609,6 +607,35 @@ clip_tri( struct draw_stage *stage,
 }
 
 
+static int
+find_interp(const struct draw_fragment_shader *fs, int *indexed_interp,
+uint semantic_name, uint semantic_index)
+{
+   int interp;
+   /* If it's gl_{Front,Back}{,Secondary}Color, pick up the mode
+* from the array we've filled before. */
+   if (semantic_name == TGSI_SEMANTIC_COLOR ||
+   semantic_name == TGSI_SEMANTIC_BCOLOR) {
+  interp = indexed_interp[semantic_index];
+   } else {
+  /* Otherwise, search in the FS inputs, with a decent default
+   * if we don't find it.
+   */
+  uint j;
+  interp = TGSI_INTERPOLATE_PERSPECTIVE;
+  if (fs) {
+ for (j = 0; j < fs->info.num_inputs; j++) {
+if (semantic_name == fs->info.input_semantic_name[j] &&
+semantic_index == fs->info.input_semantic_index[j]) {
+   interp = fs->info.input_interpolate[j];
+   break;
+}
+ }
+  }
+   }
+   return interp;
+}
+
 /* Update state.  Could further delay this until we hit the first
  * primitive that really requires clipping.
  */
@@ -616,11 +643,9 @@ static void
 clip_init_state( struct draw_stage *stage )
 {
struct clip_stage *clipper = clip_stage( stage );
-   const struct draw_vertex_shader *vs = stage->draw->vs.vertex_shader;
-   const struct draw_geometry_shader *gs = stage->draw->gs.geometry_shader;
const struct draw_fragment_shader *fs = stage->draw->fs.fragment_shader;
-   uint i;
-   const struct tgsi_shader_info *vs_info = gs ? &gs->info : &vs->info;
+   uint i, j;
+   const struct tgsi_shader_info *info = draw_get_shader_info(stage->draw);
 
/* We need to know for each attribute what kind of interpolation is
 * done on it (flat, smooth or noperspective).  But the information
@@ -663,42 +688,36 @@ clip_init_state( struct draw_stage *stage )
 
clipper->num_flat_attribs = 0;
memset(clipper->noperspective_attribs, 0, 
sizeof(clipper->noperspective_attribs));
-   for (i = 0; i < vs_info->num_outputs; i++) {
-  /* Find the interpolation mode for a specific attribute
-   */
-  int interp;
-
-  /* If it's gl_{Front,Back}{,Secondary}Color, pick up the mode
-   * from the array we've filled before. */
-  if (vs_info->output_semantic_name[i] == TGSI_SEMANTIC_COLOR ||
-  vs_info->output_semantic_name[i] == TGSI_SEMANTIC_BCOLOR) {
- interp = indexed_interp[vs_info->output_semantic_index[i]];
-  } else {
- /* Otherwise, search in the FS inputs, with a decent default
-  * if we don't find it.
-  */
- uint j;
- interp = TGSI_INTERPOLATE_PERSPECTIVE;
- if (fs) {
-for (j = 0; j < fs

[Mesa-dev] [PATCH 5/8] draw: use the vertex size

2013-08-02 Thread Zack Rusin

Instead of using the magical 4 use the above computed
vertex size. Doesn't change the behavior, just makes the code
a bit cleaner.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_pipe_vbuf.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_vbuf.c 
b/src/gallium/auxiliary/draw/draw_pipe_vbuf.c
index d3b38eb..092440e 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_vbuf.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_vbuf.c
@@ -250,7 +250,7 @@ vbuf_start_prim( struct vbuf_stage *vbuf, uint prim )
}
 
hw_key.nr_elements = vbuf->vinfo->num_attribs;
-   hw_key.output_stride = vbuf->vinfo->size * 4;
+   hw_key.output_stride = vbuf->vertex_size;
 
/* Don't bother with caching at this stage:
 */
-- 
1.7.10.4
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 6/8] draw: fix front face injection

2013-08-02 Thread Zack Rusin

Inject front face only if the fragment shader uses it and
propagate through all channels because otherwise we'll
need to figure out the exact swizzle that the fs expects and
it's just simpler to make sure all the components within
the front face register are correctly set.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_pipe_unfilled.c |   24 ++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c 
b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
index d8a603f..f9a31b0 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
@@ -37,6 +37,7 @@
 #include "pipe/p_defines.h"
 #include "draw_private.h"
 #include "draw_pipe.h"
+#include "draw_fs.h"
 
 
 struct unfilled_stage {
@@ -67,18 +68,20 @@ inject_front_face_info(struct draw_stage *stage,
   (stage->draw->rasterizer->front_ccw && ccw) ||
   (!stage->draw->rasterizer->front_ccw && !ccw));
unsigned slot = unfilled->face_slot;
-   struct vertex_header *v0 = header->v[0];
-   struct vertex_header *v1 = header->v[1];
-   struct vertex_header *v2 = header->v[2];
+   unsigned i;
 
/* In case the backend doesn't care about it */
if (slot < 0) {
   return;
}
 
-   v0->data[slot][0] = is_front_face;
-   v1->data[slot][0] = is_front_face;
-   v2->data[slot][0] = is_front_face;
+   for (i = 0; i < 3; ++i) {
+  struct vertex_header *v = header->v[i];
+  v->data[slot][0] = is_front_face;
+  v->data[slot][1] = is_front_face;
+  v->data[slot][2] = is_front_face;
+  v->data[slot][3] = is_front_face;
+   }
 }
 

@@ -231,9 +234,12 @@ draw_unfilled_prepare_outputs( struct draw_context *draw,
 {
struct unfilled_stage *unfilled = unfilled_stage(stage);
const struct pipe_rasterizer_state *rast = draw ? draw->rasterizer : 0;
-   if (rast &&
-   (rast->fill_front != PIPE_POLYGON_MODE_FILL ||
-rast->fill_back != PIPE_POLYGON_MODE_FILL)) {
+   boolean is_unfilled = (rast &&
+  (rast->fill_front != PIPE_POLYGON_MODE_FILL ||
+   rast->fill_back != PIPE_POLYGON_MODE_FILL));
+   const struct draw_fragment_shader *fs = draw->fs.fragment_shader;
+   
+   if (is_unfilled && fs && fs->info.uses_frontface)  {
   unfilled->face_slot = draw_alloc_extra_vertex_attrib(
  stage->draw, TGSI_SEMANTIC_FACE, 0);
} else {
-- 
1.7.10.4
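
As a rough illustration of the "propagate through all channels" idea from the commit message, here is a minimal standalone sketch. It is not the draw module code: `struct sketch_vertex`, `SKETCH_MAX_ATTRIBS` and `inject_face` are hypothetical stand-ins for draw's `vertex_header` and `inject_front_face_info`, and a signed slot is assumed so that "no slot allocated" can be expressed as a negative value.

```c
/* Hypothetical stand-in for draw's vertex_header: only the
 * float[4] attribute array is modelled here. */
#define SKETCH_MAX_ATTRIBS 8

struct sketch_vertex {
   float data[SKETCH_MAX_ATTRIBS][4];
};

/* Write the face flag into all four components of the given slot on
 * each of the three triangle vertices, so that a consumer can read it
 * through any swizzle.  A negative slot means that no extra attribute
 * was allocated and nothing is written. */
static void
inject_face(struct sketch_vertex *verts[3], int slot, float is_front_face)
{
   unsigned i, c;

   if (slot < 0)
      return;

   for (i = 0; i < 3; ++i) {
      for (c = 0; c < 4; ++c)
         verts[i]->data[slot][c] = is_front_face;
   }
}
```

Writing all four components trades a few redundant stores for not having to know which component the fragment shader actually reads.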

[Mesa-dev] [PATCH 7/8] llvmpipe: don't interpolate front face or prim id

2013-08-02 Thread Zack Rusin

The loop was iterating over all the fs inputs and setting them
to perspective interpolation, then after the loop we were
creating extra output slots with the correct interpolation. Instead
of injecting bogus extra outputs, just set the interpolation
on front face and prim id correctly when doing the initial scan
of fs inputs.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_state_derived.c |   30 +++
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_state_derived.c 
b/src/gallium/drivers/llvmpipe/lp_state_derived.c
index 5a51b50..7b1e6f6 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_derived.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_derived.c
@@ -69,8 +69,8 @@ compute_vertex_info(struct llvmpipe_context *llvmpipe)
vinfo->num_attribs = 0;
 
vs_index = draw_find_shader_output(llvmpipe->draw,
-   TGSI_SEMANTIC_POSITION,
-   0);
+  TGSI_SEMANTIC_POSITION,
+  0);
 
draw_emit_vertex_attr(vinfo, EMIT_4F, INTERP_PERSPECTIVE, vs_index);
 
@@ -89,12 +89,20 @@ compute_vertex_info(struct llvmpipe_context *llvmpipe)
  llvmpipe->color_slot[idx] = (int)vinfo->num_attribs;
   }
 
-  /*
-   * Emit the requested fs attribute for all but position.
-   */
-  draw_emit_vertex_attr(vinfo, EMIT_4F, INTERP_PERSPECTIVE, vs_index);
+  if (lpfs->info.base.input_semantic_index[i] == 0 &&
+  lpfs->info.base.input_semantic_name[i] == TGSI_SEMANTIC_FACE) {
+ llvmpipe->face_slot = vinfo->num_attribs;
+ draw_emit_vertex_attr(vinfo, EMIT_4F, INTERP_CONSTANT, vs_index);
+  } else if (lpfs->info.base.input_semantic_index[i] == 0 &&
+ lpfs->info.base.input_semantic_name[i] == 
TGSI_SEMANTIC_PRIMID) {
+ draw_emit_vertex_attr(vinfo, EMIT_4F, INTERP_CONSTANT, vs_index);
+  } else {
+ /*
+  * Emit the requested fs attribute for all but position.
+  */
+ draw_emit_vertex_attr(vinfo, EMIT_4F, INTERP_PERSPECTIVE, vs_index);
+  }
}
-
/* Figure out if we need bcolor as well.
 */
for (i = 0; i < 2; i++) {
@@ -140,14 +148,6 @@ compute_vertex_info(struct llvmpipe_context *llvmpipe)
   llvmpipe->layer_slot = 0;
}
 
-   /* Check for a fake front face for unfilled primitives*/
-   vs_index = draw_find_shader_output(llvmpipe->draw,
-  TGSI_SEMANTIC_FACE, 0);
-   if (vs_index >= 0) {
-  llvmpipe->face_slot = vinfo->num_attribs;
-  draw_emit_vertex_attr(vinfo, EMIT_4F, INTERP_CONSTANT, vs_index);
-   }
-
draw_compute_vertex_size(vinfo);
lp_setup_set_vertex_info(llvmpipe->setup, vinfo);
 }
-- 
1.7.10.4
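
The per-input decision this patch moves into the initial scan can be sketched in isolation as follows. This is only an illustration under assumptions: the enums and `choose_interp` are hypothetical names that stand in for the TGSI semantics and the `INTERP_*` modes, not llvmpipe's real types.

```c
/* Hypothetical stand-ins for the TGSI semantic names used above. */
enum sketch_semantic {
   SKETCH_SEMANTIC_GENERIC,
   SKETCH_SEMANTIC_FACE,
   SKETCH_SEMANTIC_PRIMID
};

/* Hypothetical stand-ins for INTERP_CONSTANT / INTERP_PERSPECTIVE. */
enum sketch_interp {
   SKETCH_INTERP_CONSTANT,
   SKETCH_INTERP_PERSPECTIVE
};

/* Front face and primitive id are per-primitive values, so they must
 * not be interpolated; every other input keeps the perspective
 * default that the loop used before the patch. */
static enum sketch_interp
choose_interp(enum sketch_semantic name, unsigned index)
{
   if (index == 0 &&
       (name == SKETCH_SEMANTIC_FACE || name == SKETCH_SEMANTIC_PRIMID))
      return SKETCH_INTERP_CONSTANT;

   return SKETCH_INTERP_PERSPECTIVE;
}
```

Deciding the mode while scanning the inputs means no extra output slot has to be appended after the loop.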

[Mesa-dev] [PATCH 8/8] draw: implement proper primitive assembler as a pipeline stage

2013-08-02 Thread Zack Rusin

We used to have a face primitive assembler that we ran afterwards if
the gs was missing but we had adjacency primitives in the pipeline.
Let's convert it to a pipeline stage, which allows us to use it
to inject outputs (primitive id) into the vertices. It's also
a lot cleaner because the decomposition is already handled for us.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/Makefile.sources |2 +-
 src/gallium/auxiliary/draw/draw_context.c  |1 +
 src/gallium/auxiliary/draw/draw_pipe.c |4 +
 src/gallium/auxiliary/draw/draw_pipe.h |5 +
 src/gallium/auxiliary/draw/draw_pipe_ia.c  |  253 
 src/gallium/auxiliary/draw/draw_pipe_validate.c|   15 +-
 src/gallium/auxiliary/draw/draw_prim_assembler.c   |  225 -
 src/gallium/auxiliary/draw/draw_prim_assembler.h   |   62 -
 .../auxiliary/draw/draw_prim_assembler_tmp.h   |   31 ---
 src/gallium/auxiliary/draw/draw_private.h  |1 +
 .../auxiliary/draw/draw_pt_fetch_shade_pipeline.c  |   18 +-
 .../draw/draw_pt_fetch_shade_pipeline_llvm.c   |   18 +-
 12 files changed, 283 insertions(+), 352 deletions(-)
 create mode 100644 src/gallium/auxiliary/draw/draw_pipe_ia.c
 delete mode 100644 src/gallium/auxiliary/draw/draw_prim_assembler.c
 delete mode 100644 src/gallium/auxiliary/draw/draw_prim_assembler.h
 delete mode 100644 src/gallium/auxiliary/draw/draw_prim_assembler_tmp.h

diff --git a/src/gallium/auxiliary/Makefile.sources 
b/src/gallium/auxiliary/Makefile.sources
index acbcef7..ee93e8b 100644
--- a/src/gallium/auxiliary/Makefile.sources
+++ b/src/gallium/auxiliary/Makefile.sources
@@ -13,6 +13,7 @@ C_SOURCES := \
draw/draw_pipe_clip.c \
draw/draw_pipe_cull.c \
draw/draw_pipe_flatshade.c \
+draw/draw_pipe_ia.c \
draw/draw_pipe_offset.c \
draw/draw_pipe_pstipple.c \
draw/draw_pipe_stipple.c \
@@ -23,7 +24,6 @@ C_SOURCES := \
draw/draw_pipe_vbuf.c \
draw/draw_pipe_wide_line.c \
draw/draw_pipe_wide_point.c \
-   draw/draw_prim_assembler.c \
draw/draw_pt.c \
draw/draw_pt_emit.c \
draw/draw_pt_fetch.c \
diff --git a/src/gallium/auxiliary/draw/draw_context.c 
b/src/gallium/auxiliary/draw/draw_context.c
index 8bf3596..bbb2904 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -555,6 +555,7 @@ draw_get_shader_info(const struct draw_context *draw)
 void
 draw_prepare_shader_outputs(struct draw_context *draw)
 {
+   draw_ia_prepare_outputs(draw, draw->pipeline.ia);
draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
 }
 
diff --git a/src/gallium/auxiliary/draw/draw_pipe.c 
b/src/gallium/auxiliary/draw/draw_pipe.c
index f1ee6cb..8140299 100644
--- a/src/gallium/auxiliary/draw/draw_pipe.c
+++ b/src/gallium/auxiliary/draw/draw_pipe.c
@@ -49,6 +49,7 @@ boolean draw_pipeline_init( struct draw_context *draw )
draw->pipeline.clip  = draw_clip_stage( draw );
draw->pipeline.flatshade = draw_flatshade_stage( draw );
draw->pipeline.cull  = draw_cull_stage( draw );
+   draw->pipeline.ia= draw_ia_stage( draw );
draw->pipeline.validate  = draw_validate_stage( draw );
draw->pipeline.first = draw->pipeline.validate;
 
@@ -61,6 +62,7 @@ boolean draw_pipeline_init( struct draw_context *draw )
!draw->pipeline.clip ||
!draw->pipeline.flatshade ||
!draw->pipeline.cull ||
+   !draw->pipeline.ia ||
!draw->pipeline.validate)
   return FALSE;
 
@@ -95,6 +97,8 @@ void draw_pipeline_destroy( struct draw_context *draw )
   draw->pipeline.flatshade->destroy( draw->pipeline.flatshade );
if (draw->pipeline.cull)
   draw->pipeline.cull->destroy( draw->pipeline.cull );
+   if (draw->pipeline.ia)
+  draw->pipeline.ia->destroy( draw->pipeline.ia );
if (draw->pipeline.validate)
   draw->pipeline.validate->destroy( draw->pipeline.validate );
if (draw->pipeline.aaline)
diff --git a/src/gallium/auxiliary/draw/draw_pipe.h 
b/src/gallium/auxiliary/draw/draw_pipe.h
index 70c286f..70822a4 100644
--- a/src/gallium/auxiliary/draw/draw_pipe.h
+++ b/src/gallium/auxiliary/draw/draw_pipe.h
@@ -91,7 +91,10 @@ extern struct draw_stage *draw_stipple_stage( struct 
draw_context *context );
 extern struct draw_stage *draw_wide_line_stage( struct draw_context *context );
 extern struct draw_stage *draw_wide_point_stage( struct draw_context *context 
);
 extern struct draw_stage *draw_validate_stage( struct draw_context *context );
+extern struct draw_stage *draw_ia_stage(struct draw_context *context);
 
+boolean draw_ia_stage_required(const struct draw_context *context,
+   unsigned prim);
 
 extern void draw_free_temp_verts( struct draw_stage *stage );
 extern boolean draw_alloc_temp_verts( struct draw_stage *stage, unsigned nr );
@@ -105,6 +108,8 @@ void draw_pipe_passthrough_point(struct draw_stage *stage

Re: [Mesa-dev] [PATCH 8/8] draw: implement proper primitive assembler as a pipeline stage

2013-08-02 Thread Zack Rusin

Yea, it's quite bonkers, but that's the way it has to be to make it work right 
now. Personally I'd really like to write a new version of draw, without the 5 
emit paths and 4 different vertex shading paths, and with an interface that is 
capable of emitting more than just float[4]'s... For now though this works, even 
if it is very ugly.

z

- Original Message -
 On 02.08.2013 08:28, Zack Rusin wrote:
  we used to have a face primitive assembler that we ran after if
  the gs was missing but we had adjacency primitives in the pipeline,
  lets convert it to a pipeline stage, which allows us to use it
  to inject outputs (primitive id) into the vertices. it's also
  a lot cleaner because the decomposition is already handled for us.
  
  Signed-off-by: Zack Rusin za...@vmware.com
  ---
   src/gallium/auxiliary/Makefile.sources |2 +-
   src/gallium/auxiliary/draw/draw_context.c  |1 +
   src/gallium/auxiliary/draw/draw_pipe.c |4 +
   src/gallium/auxiliary/draw/draw_pipe.h |5 +
   src/gallium/auxiliary/draw/draw_pipe_ia.c  |  253
   
   src/gallium/auxiliary/draw/draw_pipe_validate.c|   15 +-
   src/gallium/auxiliary/draw/draw_prim_assembler.c   |  225
   -
   src/gallium/auxiliary/draw/draw_prim_assembler.h   |   62 -
   .../auxiliary/draw/draw_prim_assembler_tmp.h   |   31 ---
   src/gallium/auxiliary/draw/draw_private.h  |1 +
   .../auxiliary/draw/draw_pt_fetch_shade_pipeline.c  |   18 +-
   .../draw/draw_pt_fetch_shade_pipeline_llvm.c   |   18 +-
   12 files changed, 283 insertions(+), 352 deletions(-)
   create mode 100644 src/gallium/auxiliary/draw/draw_pipe_ia.c
   delete mode 100644 src/gallium/auxiliary/draw/draw_prim_assembler.c
   delete mode 100644 src/gallium/auxiliary/draw/draw_prim_assembler.h
   delete mode 100644 src/gallium/auxiliary/draw/draw_prim_assembler_tmp.h
  
  diff --git a/src/gallium/auxiliary/Makefile.sources
  b/src/gallium/auxiliary/Makefile.sources
  index acbcef7..ee93e8b 100644
  --- a/src/gallium/auxiliary/Makefile.sources
  +++ b/src/gallium/auxiliary/Makefile.sources
  @@ -13,6 +13,7 @@ C_SOURCES := \
  draw/draw_pipe_clip.c \
  draw/draw_pipe_cull.c \
  draw/draw_pipe_flatshade.c \
  +draw/draw_pipe_ia.c \
 Formatting looks off here.
 
  draw/draw_pipe_offset.c \
  draw/draw_pipe_pstipple.c \
  draw/draw_pipe_stipple.c \
  @@ -23,7 +24,6 @@ C_SOURCES := \
  draw/draw_pipe_vbuf.c \
  draw/draw_pipe_wide_line.c \
  draw/draw_pipe_wide_point.c \
  -   draw/draw_prim_assembler.c \
  draw/draw_pt.c \
  draw/draw_pt_emit.c \
  draw/draw_pt_fetch.c \
  diff --git a/src/gallium/auxiliary/draw/draw_context.c
  b/src/gallium/auxiliary/draw/draw_context.c
  index 8bf3596..bbb2904 100644
  --- a/src/gallium/auxiliary/draw/draw_context.c
  +++ b/src/gallium/auxiliary/draw/draw_context.c
  @@ -555,6 +555,7 @@ draw_get_shader_info(const struct draw_context *draw)
   void
   draw_prepare_shader_outputs(struct draw_context *draw)
   {
  +   draw_ia_prepare_outputs(draw, draw-pipeline.ia);
  draw_unfilled_prepare_outputs(draw, draw-pipeline.unfilled);
   }
   
  diff --git a/src/gallium/auxiliary/draw/draw_pipe.c
  b/src/gallium/auxiliary/draw/draw_pipe.c
  index f1ee6cb..8140299 100644
  --- a/src/gallium/auxiliary/draw/draw_pipe.c
  +++ b/src/gallium/auxiliary/draw/draw_pipe.c
  @@ -49,6 +49,7 @@ boolean draw_pipeline_init( struct draw_context *draw )
  draw-pipeline.clip  = draw_clip_stage( draw );
  draw-pipeline.flatshade = draw_flatshade_stage( draw );
  draw-pipeline.cull  = draw_cull_stage( draw );
  +   draw-pipeline.ia= draw_ia_stage( draw );
  draw-pipeline.validate  = draw_validate_stage( draw );
  draw-pipeline.first = draw-pipeline.validate;
   
  @@ -61,6 +62,7 @@ boolean draw_pipeline_init( struct draw_context *draw )
  !draw-pipeline.clip ||
  !draw-pipeline.flatshade ||
  !draw-pipeline.cull ||
  +   !draw-pipeline.ia ||
  !draw-pipeline.validate)
 return FALSE;
   
  @@ -95,6 +97,8 @@ void draw_pipeline_destroy( struct draw_context *draw )
 draw-pipeline.flatshade-destroy( draw-pipeline.flatshade );
  if (draw-pipeline.cull)
 draw-pipeline.cull-destroy( draw-pipeline.cull );
  +   if (draw-pipeline.ia)
  +  draw-pipeline.ia-destroy( draw-pipeline.ia );
  if (draw-pipeline.validate)
 draw-pipeline.validate-destroy( draw-pipeline.validate );
  if (draw-pipeline.aaline)
  diff --git a/src/gallium/auxiliary/draw/draw_pipe.h
  b/src/gallium/auxiliary/draw/draw_pipe.h
  index 70c286f..70822a4 100644
  --- a/src/gallium/auxiliary/draw/draw_pipe.h
  +++ b/src/gallium/auxiliary/draw/draw_pipe.h
  @@ -91,7 +91,10 @@ extern struct draw_stage *draw_stipple_stage( struct
  draw_context *context );
   extern struct draw_stage *draw_wide_line_stage( struct draw_context

[Mesa-dev] [PATCH 1/2] llvmpipe: make the front-face behavior match the gallium spec

2013-07-31 Thread Zack Rusin

The spec says that front-face is true if the value is > 0 and false
if it's < 0. To make sure that we follow the spec, let's just
subtract 0.5 from our value (llvmpipe did 1 for frontface and 0
otherwise), which will get us a positive num for frontface and
negative for backface.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_state_setup.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_state_setup.c 
b/src/gallium/drivers/llvmpipe/lp_state_setup.c
index bb5cfc4..cecfbce 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_setup.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_setup.c
@@ -182,7 +182,10 @@ emit_facing_coef(struct gallivm_state *gallivm,
LLVMValueRef a0_0 = args->facing;
LLVMValueRef a0_0f = LLVMBuildSIToFP(builder, a0_0, float_type, "");
LLVMValueRef zero = lp_build_const_float(gallivm, 0.0);
-   LLVMValueRef a0 = vec4f(gallivm, a0_0f, zero, zero, zero, "facing");
+   LLVMValueRef face_val = LLVMBuildFSub(builder, a0_0f,
+ lp_build_const_float(gallivm, 0.5),
+ "");
+   LLVMValueRef a0 = vec4f(gallivm, face_val, zero, zero, zero, "facing");
LLVMValueRef zerovec = vec4f_from_scalar(gallivm, zero, "zero");
 
store_coef(gallivm, args, slot, a0, zerovec, zerovec);
-- 
1.7.10.4
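
To make the arithmetic in the commit message concrete, here is a small plain-C sketch of the same mapping. It is not the LLVM IR that llvmpipe builds; `encode_facing` and `is_front_face` are hypothetical helper names, and the sign convention (positive means front-facing) is the one stated in the commit message.

```c
#include <stdbool.h>

/* Map the integer facing flag (1 = front, 0 = back) onto a signed
 * float, mirroring what the patch does with LLVMBuildFSub. */
static float
encode_facing(int facing)
{
   return (float)facing - 0.5f;   /* 1 -> +0.5, 0 -> -0.5 */
}

/* Consumer side: a positive value means front-facing. */
static bool
is_front_face(float face_val)
{
   return face_val > 0.0f;
}
```

Subtracting 0.5 keeps the two cases on either side of zero, so a consumer only needs a sign test.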

[Mesa-dev] [PATCH 2/2] draw: inject frontface info into wireframe outputs

2013-07-31 Thread Zack Rusin

The draw module can decompose primitives into wireframe models, which
is a fancy word for 'lines'. Unfortunately that decomposition means
that we weren't able to preserve the original front-face info which
could be derived from the original primitives (lines don't have a
'face'). To fix it, allow the draw module to inject a fake face semantic
into outputs from which the backends can figure out the original
frontfacing info of the primitives.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_context.c   |   43 
 src/gallium/auxiliary/draw/draw_context.h   |6 +++
 src/gallium/auxiliary/draw/draw_pipe.h  |3 ++
 src/gallium/auxiliary/draw/draw_pipe_unfilled.c |   49 +++
 src/gallium/drivers/i915/i915_state_derived.c   |2 +
 src/gallium/drivers/llvmpipe/lp_context.h   |3 ++
 src/gallium/drivers/llvmpipe/lp_setup.c |1 +
 src/gallium/drivers/llvmpipe/lp_setup_context.h |1 +
 src/gallium/drivers/llvmpipe/lp_setup_line.c|   14 ++-
 src/gallium/drivers/llvmpipe/lp_state_derived.c |9 +
 src/gallium/drivers/r300/r300_state_derived.c   |1 +
 src/gallium/drivers/softpipe/sp_state_derived.c |2 +
 src/gallium/drivers/svga/svga_swtnl_state.c |1 +
 13 files changed, 133 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_context.c 
b/src/gallium/auxiliary/draw/draw_context.c
index 4a6ba1a..2e95b5c 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -39,6 +39,7 @@
 #include "util/u_helpers.h"
 #include "util/u_prim.h"
 #include "draw_context.h"
+#include "draw_pipe.h"
 #include "draw_vs.h"
 #include "draw_gs.h"
 
@@ -540,6 +541,22 @@ draw_get_shader_info(const struct draw_context *draw)
}
 }
 
+/**
+ * Prepare outputs slots from the draw module
+ *
+ * Certain parts of the draw module can emit additional
+ * outputs that can be quite useful to the backends, a good
+ * example of it is the process of decomposing primitives
+ * into wireframes (aka. lines) which normally would lose
+ * the face-side information, but using this method we can
+ * inject another shader output which passes the original
+ * face side information to the backend.
+ */
+void
+draw_prepare_shader_outputs(struct draw_context *draw)
+{
+   draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
+}
 
 /**
  * Ask the draw module for the location/slot of the given vertex attribute in
@@ -973,3 +990,29 @@ draw_stats_clipper_primitives(struct draw_context *draw,
   }
}
 }
+
+
+/**
+ * Returns true if the draw module will inject the frontface
+ * info into the outputs.
+ *
+ * Given the specified primitive and rasterizer state
+ * the function will figure out if the draw module
+ * will inject the front-face information into shader
+ * outputs. This is done to preserve the front-facing
+ * info when decomposing primitives into wireframes.
+ */
+boolean
+draw_will_inject_frontface(const struct draw_context *draw)
+{
+   unsigned reduced_prim = u_reduced_prim(draw->pt.prim);
+   const struct pipe_rasterizer_state *rast = draw->rasterizer;
+
+   if (reduced_prim != PIPE_PRIM_TRIANGLES) {
+  return FALSE;
+   }
+
+   return (rast &&
+   (rast->fill_front != PIPE_POLYGON_MODE_FILL ||
+rast->fill_back != PIPE_POLYGON_MODE_FILL));
+}
diff --git a/src/gallium/auxiliary/draw/draw_context.h 
b/src/gallium/auxiliary/draw/draw_context.h
index 4a1b27e..0815047 100644
--- a/src/gallium/auxiliary/draw/draw_context.h
+++ b/src/gallium/auxiliary/draw/draw_context.h
@@ -126,10 +126,16 @@ draw_install_pstipple_stage(struct draw_context *draw, 
struct pipe_context *pipe
 struct tgsi_shader_info *
 draw_get_shader_info(const struct draw_context *draw);
 
+void
+draw_prepare_shader_outputs(struct draw_context *draw);
+
 int
 draw_find_shader_output(const struct draw_context *draw,
 uint semantic_name, uint semantic_index);
 
+boolean
+draw_will_inject_frontface(const struct draw_context *draw);
+
 uint
 draw_num_shader_outputs(const struct draw_context *draw);
 
diff --git a/src/gallium/auxiliary/draw/draw_pipe.h 
b/src/gallium/auxiliary/draw/draw_pipe.h
index 4792507..2e48b56 100644
--- a/src/gallium/auxiliary/draw/draw_pipe.h
+++ b/src/gallium/auxiliary/draw/draw_pipe.h
@@ -102,6 +102,9 @@ void draw_pipe_passthrough_line(struct draw_stage *stage, 
struct prim_header *he
 void draw_pipe_passthrough_point(struct draw_stage *stage, struct prim_header 
*header);
 
 
+void draw_unfilled_prepare_outputs(struct draw_context *context,
+   struct draw_stage *stage);
+
 
 /**
  * Get a writeable copy of a vertex.
diff --git a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c 
b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
index d87741b..d8a603f 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
@@ -47,6 +47,8 @@ struct

Re: [Mesa-dev] [PATCH 2/2] draw: inject frontface info into wireframe outputs

2013-07-31 Thread Zack Rusin

> > +   if (draw_will_inject_frontface(lp_context->draw) &&
> I think it's annoying you have to do these calls to determine if there's
> a valid frontface here for each line instead of just per draw call but
> it doesn't seem easy to avoid it.

Yea, there's no trivial way of avoiding it.

> Also, no love for llvmpipe point face? I realize d3d10 doesn't require
> it but OpenGL (and IIRC d3d9) do.

I didn't know of any tests for the points and we care only about lines right 
now. It's just four extra lines of code or so, so I can trivially add it but I 
don't have anything to test it with.

> Looks like quite a heavy interface (and sort of silly to allocate 128
> bits in the vertex data (so actually twice that for one line) for 1 bit
> of information but given all our data passed on to the line/point funcs
> are float4 I don't really see any other easy way neither), but seems all
> necessary unfortunately. I guess another option would be to pass the
> face info always along the vertex data no matter what (which would mean
> all those additional calls for setting up outputs, determining if
> there's a valid frontface etc. could go along with the storage needed)
> for all primitives to the point/line/tri funcs but I'm not really
> thrilled about that idea neither (passing it for tris so it doesn't have
> to be recalculated may or may not be a good idea neither).

Yes, plus then we'd need a brand new pipeline stage that is always run and that 
is largely useless for the vast majority of rendering. It's sort of a lose-lose 
scenario. The only thing that is clear is that we have to pass the data along 
the shader outputs; everything else is messy glue to make it possible.

z
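
For readers following the thread, the check being discussed boils down to something like the sketch below. It mirrors the logic of `draw_will_inject_frontface` from the patch but uses hypothetical stand-in types (`sketch_prim`, `sketch_fill`, `sketch_rasterizer`) rather than the real `draw_context` and `pipe_rasterizer_state`, so treat it as an illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the reduced primitive type and the
 * PIPE_POLYGON_MODE_* fill modes. */
enum sketch_prim { SKETCH_PRIM_POINTS, SKETCH_PRIM_LINES, SKETCH_PRIM_TRIANGLES };
enum sketch_fill { SKETCH_FILL, SKETCH_LINE, SKETCH_POINT };

struct sketch_rasterizer {
   enum sketch_fill fill_front;
   enum sketch_fill fill_back;
};

/* Face info is only injected when triangles are being decomposed,
 * i.e. the reduced primitive is a triangle and at least one side is
 * drawn in a non-filled polygon mode. */
static bool
will_inject_frontface(enum sketch_prim reduced_prim,
                      const struct sketch_rasterizer *rast)
{
   if (reduced_prim != SKETCH_PRIM_TRIANGLES)
      return false;

   return rast != NULL &&
          (rast->fill_front != SKETCH_FILL ||
           rast->fill_back != SKETCH_FILL);
}
```

Because the result depends only on the primitive type and rasterizer state, it could in principle be computed once per draw call, which is the cost the review comment is pointing at.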
