Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On Wed, Jan 24, 2024 at 6:57 PM Marek Olšák wrote:
>
> Gallium looks like it was just a copy of DX10, and likely many things were
> known from DX10 in advance before anything started. Vulkanium doesn't have
> anything to draw inspiration from. It's a completely unexplored idea.

I'm not sure I follow this. GNU/Linux didn't have a unified driver interface to implement GL, but Windows did have a standardized interface to implement D3D10, which we drew inspiration from. The same is still true if you s/GL/Vulkan/ and s/D3D10/D3D12/. It's just that more features of modern APIs are tied to kernel features (i.e. WDDM versions) than in the past, but with gpuvm, the DRM scheduler and syncobj that's also going to be Vulkan's path.

Now, you might say that this time we're not going to use any lessons from Windows and this interface will be completely unlike what Windows does for D3D12. That's fine, but I still wouldn't call standardizing an interface for a low-level graphics API a completely unexplored idea, given that it works on Windows for an API that's a lot more like Vulkan than D3D10 was like GL.

z
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On Wed, Jan 24, 2024 at 10:27 AM Faith Ekstrand wrote:
>
> Jose,
>
> Thanks for your thoughts!
>
> On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca
> wrote:
> >
> > I don't know much about the current Vulkan driver internals to have or
> > provide an informed opinion on the path forward, but I'd like to share my
> > backwards looking perspective.
> >
> > Looking back, Gallium was two things effectively:
> > (1) an abstraction layer, that's watertight (as in upper layers shouldn't
> > reach through to lower layers)
> > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
> >
> > (1) was of course important -- and the discipline it imposed is what
> > enabled to great simplifications -- but it also became a straight-jacket,
> > as GPUs didn't stand still, and sooner or later the
> > see-every-hardware-as-the-same lenses stop reflecting reality.
> >
> > If I had to pick one, I'd say that (2) is far more useful and practical.
> > Take components like gallium's draw and other util modules. A driver can
> > choose to use them or not. One could fork them within Mesa source tree,
> > and only the drivers that opt-in into the fork would need to be
> > tested/adapted/etc
> >
> > On the flip side, Vulkan API is already a pretty low level HW abstraction.
> > It's also very flexible and extensible, so it's hard to provide a
> > watertight abstraction underneath it without either taking the lowest
> > common denominator, or having lots of optional bits of functionality
> > governed by a myriad of caps like you alluded to.
>
> There is a third thing that isn't really recognized in your description:
>
> (3) A common "language" to talk about GPUs and data structures that
> represent that language
>
> This is precisely what the Vulkan runtime today doesn't have. Classic
> meta sucked because we were trying to implement GL in GL. u_blitter,
> on the other hand, is pretty fantastic because Gallium provides a much
> more sane interface to write those common components in terms of.
>
> So far, we've been trying to build those components in terms of the
> Vulkan API itself with calls jumping back into the dispatch table to
> try and get inside the driver. This is working but it's getting more
> and more fragile the more tools we add to that box. A lot of what I
> want to do with gallium2 or whatever we're calling it is to fix our
> layering problems so that calls go in one direction and we can
> untangle the jumble. I'm still not sure what I want that to look like
> but I think I want it to look a lot like Vulkan, just with a handier
> interface.

Yes, that makes sense. When we were writing the initial components for gallium (draw and cso), I really liked the general concept and thought about reusing them in the old, non-gallium Mesa drivers. The obstacle was that there was no common interface to lay them on. Using GL to implement GL was silly, and using Vulkan to implement Vulkan is not much better.

Having said that, my general thoughts on GPU abstractions largely match what Jose has said. To me the question is whether a clean abstraction:
- on top of which you can build an entire GPU driver toolkit (i.e. all the components and helpers)
- that makes it trivial to figure out what needs to be done to write a new driver and makes bootstrapping a new driver a lot simpler
- that makes it easier to reason about cross-hardware concepts (it's a lot easier to understand the entirety of the ecosystem if every driver is not doing something unique to implement similar functionality)
is worth more than almost exponentially increasing the difficulty of:
- advancing the ecosystem (i.e. it might be easier to understand, but it's way harder to create clean abstractions across such different hardware)
- driver maintenance (i.e. there will be a constant stream of regressions hitting your driver as a result of other people working on their drivers)
- general development (i.e. bug fixes/new features being held back because they break some other driver)

Some of those can certainly be tilted one way or the other. For example, driver maintenance can be somewhat eased by requiring every driver built on top of the new abstraction to have a stable Mesa CI setup (be it LAVA or ci-tron, or whatever), but all of those things need to be reasoned about. In my experience abstractions never have uniform support, because some people will value their cons more than they value the pros. So the entire process requires some very steadfast individuals who keep going despite hearing that the effort is dumb, at least until the benefits of the new approach are impossible to deny. So you know... "how much do you believe in this approach, because some days will suck and you can't give up" ;) is probably the question.

z
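The layering goal in this exchange is that shared components call in one direction only, into a driver-facing interface, and never back through the public API dispatch table. Below is a minimal C sketch of that idea, in the spirit of how u_blitter sits on Gallium. `struct hw_iface`, `common_blit` and the fake driver hooks are hypothetical names for illustration. They are not an existing Mesa interface.

```c
#include <assert.h>

/* Hypothetical lower-level interface that a driver fills in. Shared
 * components sit on top of it and only ever call downward into it. */
struct hw_iface {
   void (*bind_pipeline)(struct hw_iface *hw, int pipeline_id);
   void (*draw)(struct hw_iface *hw, unsigned vertex_count);
   unsigned draw_calls; /* bookkeeping for this sketch only */
};

/* A shared helper in the spirit of u_blitter: it is written purely in
 * terms of hw_iface and never re-enters the public API dispatch table. */
static void common_blit(struct hw_iface *hw, int blit_pipeline)
{
   hw->bind_pipeline(hw, blit_pipeline);
   hw->draw(hw, 3); /* e.g. one full-screen triangle */
}

/* A trivial "driver" implementation used to exercise the helper. */
static int last_bound_pipeline = -1;

static void fake_bind_pipeline(struct hw_iface *hw, int pipeline_id)
{
   (void)hw;
   last_bound_pipeline = pipeline_id;
}

static void fake_draw(struct hw_iface *hw, unsigned vertex_count)
{
   (void)vertex_count;
   hw->draw_calls++;
}
```

A driver provides the function pointers once, and every shared helper is then written only against `hw_iface`. That gives the "common language" described above, without the helpers jumping back into the API layer.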
Re: [Mesa-dev] [PATCH] draw: fix clipvertex trouble if position comes from gs
On Aug 5, 2014, at 9:40 PM, srol...@vmware.com wrote:
From: Roland Scheidegger srol...@vmware.com

If the vertex shader has no position but the gs has, the clipvertex output was -1 (because it's the same as vs position in this case if there's no explicit clipvertex output). This caused crashes (or assertion failures) in clipping since in the end position (which came from gs) was different from cv (-1) and we then tried to use the bogus cv input.
Rather than just test for -1 cv value in clipping, make it explicitly return the position output of the gs instead, which seems cleaner (since we really don't want to use the clipvertex value from the vs (it could be a valid value in the (unsupported) case of vs writing clipvertex but still using a gs)).
This fixes piglit shader_runner clip-distance-out-values.shader_test.

Great. Well done! Both of those look good.

Reviewed-by: Zack Rusin za...@vmware.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
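The fix described above boils down to a selection rule: when a geometry shader is bound, clipping should read the GS position instead of a VS-derived clipvertex slot that may hold the -1 "not written" marker. The sketch below assumes a simplified model of output slots. `struct stage_outputs` and `select_clipvertex_output` are hypothetical names, not the draw module's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Output slot indices for one shader stage; -1 means "not written". */
struct stage_outputs {
   int position;
   int clipvertex;
};

/* Pick the output slot that clipping should treat as the clip vertex.
 * With a geometry shader bound, only its outputs reach clipping, so return
 * the GS position rather than anything derived from the VS (whose
 * clipvertex slot may be the -1 "not written" marker). */
static int select_clipvertex_output(const struct stage_outputs *vs,
                                    const struct stage_outputs *gs)
{
   if (gs)
      return gs->position;
   return vs->clipvertex >= 0 ? vs->clipvertex : vs->position;
}
```

With no GS bound, the function falls back to an explicit VS clipvertex if there is one, and otherwise to the VS position.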
Re: [Mesa-dev] [PATCH 4/8] draw/gs: fix segfault in glsl-1.50-gs-mismatch-prim-type triangles_adjacency
That looks wrong. The total number of verts per buffer is the maximum number of verts that can be output per invocation (primitive_boundary) times the number of invocations of the geometry shader (num_in_primitives). It's not the maximum number of verts that can be output per invocation (primitive_boundary) times the maximum number of primitives output by the geometry shader (max_out_prims).

z

----- Original Message -----
From: Dave Airlie airl...@redhat.com

This crashes on softpipe due to a lack of output memory allocated, it appears we allocate memory for enough primitives, but not vertices so convert to number of vertices.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/auxiliary/draw/draw_gs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/draw/draw_gs.c b/src/gallium/auxiliary/draw/draw_gs.c
index fc4f697..0a9bf81 100644
--- a/src/gallium/auxiliary/draw/draw_gs.c
+++ b/src/gallium/auxiliary/draw/draw_gs.c
@@ -555,7 +555,7 @@ int draw_geometry_shader_run(struct draw_geometry_shader *shader,
    /* we allocate exactly one extra vertex per primitive to allow the GS to emit
     * overflown vertices into some area where they won't harm anyone */
    unsigned total_verts_per_buffer = shader->primitive_boundary *
-      num_in_primitives;
+      max_out_prims * u_vertices_per_prim(shader->output_primitive);
 
    //Assume at least one primitive
    max_out_prims = MAX2(max_out_prims, 1);
--
1.9.3
Re: [Mesa-dev] [PATCH 4/8] draw/gs: fix segfault in glsl-1.50-gs-mismatch-prim-type triangles_adjacency
I think the code is already correct and something else goes wrong. The tgsi geometry shader code was never done properly, so it's more than likely that tgsi_exec is doing something wonky.

Geometry shaders specify the maximum number of vertices that they can emit. That's what draw_geometry_shader::max_output_vertices is. If a geometry shader emits more than that, the extra verts are ignored. So our primitive_boundary is max_output_vertices + 1, because we want to make sure that in SoA we have scratch space where we can keep writing the overflowed vertices. So the worst-case scenario for our output buffer is:

(max_output_vertices + 1) * geometry shader invocations

That's what we have there now, and that's correct. I don't remember what tgsi_exec does. I think I never even implemented proper SoA for gs in tgsi_exec, so if there's anything wrong I'd look for the bug there.

z

----- Original Message -----
On 11 June 2014 00:02, Zack Rusin za...@vmware.com wrote:
> That looks wrong. The total number of verts per buffer is the maximum number of verts that can be output per invocation (primitive_boundary) times number of invocations of geometry shader (num_in_primitives). It's not maximum number of verts that can be output per invocation (primitive_boundary) times maximum number of primitives output by geometry shader (max_out_prims).

Okay so just adding * u_vertices_per_prim(shader->output_primitive); would suffice?

Dave
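The sizing rule stated above can be written out directly. `primitive_boundary` is the declared maximum plus one scratch slot, and the buffer holds one boundary per GS invocation, with one invocation per input primitive. This is a minimal sketch; the helper names are hypothetical and not taken from draw_gs.c.

```c
#include <assert.h>
#include <stddef.h>

/* primitive_boundary: the GS's declared max output vertices plus one
 * scratch slot that absorbs overflowed vertex writes in SoA execution. */
static unsigned gs_primitive_boundary(unsigned max_output_vertices)
{
   return max_output_vertices + 1;
}

/* Worst-case number of vertex slots in the output buffer: one
 * primitive_boundary per GS invocation (one invocation per input
 * primitive). The output primitive type does not enter into it. */
static size_t gs_output_buffer_verts(unsigned max_output_vertices,
                                     unsigned num_in_primitives)
{
   return (size_t)gs_primitive_boundary(max_output_vertices) * num_in_primitives;
}
```

For example, a GS that declares 4 max output vertices and runs over 10 input primitives needs 5 * 10 = 50 vertex slots, no matter whether it emits points, lines or triangles.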
Re: [Mesa-dev] [PATCH 4/8] draw/gs: fix segfault in glsl-1.50-gs-mismatch-prim-type triangles_adjacency
> I'll revisit it today and see if I can spot something else wrong, it fails for triangle adj because there are 6 vertices per primitive and we have only malloced space for 4.

It has to be something else, because that's impossible. In fact it's 2x impossible ;)
1) It's illegal and impossible for a geometry shader to emit adjacency primitives. Only points, lines and triangles can be emitted from a gs.
2) The output primitive is irrelevant for the size of the buffer. If a geometry shader claims that the max output vertices is four, then it can, at most, emit 4 points, 2 lines or 1 triangle (incomplete primitives are discarded from the geometry shader, so the extra 4th vertex will be discarded). If a geometry shader claims to emit at most 4 vertices and you try to emit 100 points, you will still get only 4 points (96 will be counted as overflowed, but they won't be emitted).

My advice would be to check what's in the output buffer with llvmpipe. If tgsi_exec doesn't match llvmpipe, then there's a bug in tgsi_exec.

z
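Point 2 above says that emits past the declared maximum are counted as overflowed but never written. The sketch below shows that bookkeeping for a single invocation. `struct gs_emit_state` and `gs_emit_vertex` are hypothetical, and the sketch does not model the incomplete-primitive discard.

```c
#include <assert.h>

/* Per-invocation emit state for a hypothetical GS interpreter. */
struct gs_emit_state {
   unsigned max_output_vertices; /* declared by the shader */
   unsigned emitted;             /* vertices actually kept */
   unsigned overflowed;          /* emit calls past the limit */
};

/* Keep the vertex only while under the declared limit; anything beyond it
 * is counted as overflow and never written to the output buffer. */
static void gs_emit_vertex(struct gs_emit_state *s)
{
   if (s->emitted >= s->max_output_vertices) {
      s->overflowed++;
      return;
   }
   s->emitted++;
}
```

With `max_output_vertices = 4`, calling `gs_emit_vertex` 100 times leaves `emitted == 4` and `overflowed == 96`. That matches the 100-points example in the message.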
Re: [Mesa-dev] [PATCH] draw: avoid buffer overflows with bad geometry programs.
To be honest I still don't like it. The tgsi_exec-specific paths in draw_gs don't matter to me and can be as ugly as they need to be, but they can't pollute the draw_pt_emit code. In other words, the primitive_lengths can't be bogus at that point: prim_info can't lie about the amount of data that it's holding.

z

----- Original Message -----
From: Dave Airlie airl...@redhat.com

One of the mismatched tests has a max output vertices of 3, but emits 6 vertices. This means the output buffer is undersized and causes problems down the line, so limit things later if we have a number of vertices lower than the number required to execute a primitive.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/auxiliary/draw/draw_gs.c      | 4 ++--
 src/gallium/auxiliary/draw/draw_pt_emit.c | 8 +++-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_gs.c b/src/gallium/auxiliary/draw/draw_gs.c
index fc4f697..d07e88f 100644
--- a/src/gallium/auxiliary/draw/draw_gs.c
+++ b/src/gallium/auxiliary/draw/draw_gs.c
@@ -92,8 +92,8 @@ tgsi_fetch_gs_outputs(struct draw_geometry_shader *shader,
       unsigned num_verts_per_prim = machine->Primitives[prim_idx];
       shader->primitive_lengths[prim_idx + shader->emitted_primitives] =
          machine->Primitives[prim_idx];
-      shader->emitted_vertices += num_verts_per_prim;
-      for (j = 0; j < num_verts_per_prim; j++, current_idx++) {
+      shader->emitted_vertices += MIN2(num_verts_per_prim, shader->max_output_vertices);
+      for (j = 0; j < MIN2(num_verts_per_prim, shader->max_output_vertices); j++, current_idx++) {
          int idx = current_idx * shader->info.num_outputs;
 #ifdef DEBUG_OUTPUTS
          debug_printf("%d) Output vert:\n", idx / shader->info.num_outputs);
diff --git a/src/gallium/auxiliary/draw/draw_pt_emit.c b/src/gallium/auxiliary/draw/draw_pt_emit.c
index 011efe7..d8e2809 100644
--- a/src/gallium/auxiliary/draw/draw_pt_emit.c
+++ b/src/gallium/auxiliary/draw/draw_pt_emit.c
@@ -26,6 +26,7 @@
  **/
 
 #include "util/u_memory.h"
+#include "util/u_math.h"
 #include "draw/draw_context.h"
 #include "draw/draw_private.h"
 #include "draw/draw_vbuf.h"
@@ -255,9 +256,14 @@ draw_pt_emit_linear(struct pt_emit *emit,
         i < prim_info->primitive_count;
         start += prim_info->primitive_lengths[i],
         i++) {
+      int len;
+      if (start > count)
+         continue;
+      len = MIN2(prim_info->primitive_lengths[i], count);
       render->draw_arrays(render,
                           start,
-                          prim_info->primitive_lengths[i]);
+                          len);
+
    }
 
    render->release_vertices(render);
--
1.9.3
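The objection above is that downstream emit code should be able to trust prim_info rather than clamp around it. One way to state that trust is as an invariant: the per-primitive lengths add up to exactly the number of vertices the buffer holds. The sketch below assumes that invariant. `struct prim_info_sketch` is a hypothetical stand-in for the real `draw_prim_info`, not its actual layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the fields the emit loop relies on. */
struct prim_info_sketch {
   unsigned count;                    /* vertices actually in the buffer */
   unsigned primitive_count;
   const unsigned *primitive_lengths; /* vertices per primitive */
};

/* Assumed invariant: walking primitive_lengths (as the emit loop does with
 * start += primitive_lengths[i]) covers exactly count vertices, so no
 * clamping is needed downstream. */
static bool prim_info_is_consistent(const struct prim_info_sketch *info)
{
   unsigned total = 0;
   for (unsigned i = 0; i < info->primitive_count; i++)
      total += info->primitive_lengths[i];
   return total == info->count;
}
```

A check like this fits best right after the GS output is fetched, so a mismatch is caught at its source rather than patched over in the emit path.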
Re: [Mesa-dev] [PATCH] tgsi/gs: bound max output vertices in shader
Looks great. If I was into diffs I'd make sweet and passionate love to this one.

Reviewed-by: Zack Rusin za...@vmware.com

----- Original Message -----
From: Dave Airlie airl...@redhat.com

This limits the number of emitted vertices to the shaders max output vertices, and avoids us writing things into memory that isn't big enough for it.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/auxiliary/tgsi/tgsi_exec.c | 8
 src/gallium/auxiliary/tgsi/tgsi_exec.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index 69d98fd..d848348 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -789,6 +789,11 @@ tgsi_exec_machine_bind_shader(
          break;
 
       case TGSI_TOKEN_TYPE_PROPERTY:
+         if (mach->Processor == TGSI_PROCESSOR_GEOMETRY) {
+            if (parse.FullToken.FullProperty.Property.PropertyName == TGSI_PROPERTY_GS_MAX_OUTPUT_VERTICES) {
+               mach->MaxOutputVertices = parse.FullToken.FullProperty.u[0].Data;
+            }
+         }
          break;
 
       default:
@@ -1621,6 +1626,9 @@ emit_vertex(struct tgsi_exec_machine *mach)
       if ((mach->ExecMask & (1 << i)))
    */
    if (mach->ExecMask) {
+      if (mach->Primitives[mach->Temps[TEMP_PRIMITIVE_I].xyzw[TEMP_PRIMITIVE_C].u[0]] >= mach->MaxOutputVertices)
+         return;
+
       mach->Temps[TEMP_OUTPUT_I].xyzw[TEMP_OUTPUT_C].u[0] += mach->NumOutputs;
       mach->Primitives[mach->Temps[TEMP_PRIMITIVE_I].xyzw[TEMP_PRIMITIVE_C].u[0]]++;
    }
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.h b/src/gallium/auxiliary/tgsi/tgsi_exec.h
index 7a82f69..d53c4ba 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.h
@@ -297,6 +297,7 @@ struct tgsi_exec_machine
    unsigned *Primitives;
    unsigned NumOutputs;
    unsigned MaxGeometryShaderOutputs;
+   unsigned MaxOutputVertices;
 
    /* FRAGMENT processor only. */
    const struct tgsi_interp_coef *InterpCoefs;
--
1.9.3
Re: [Mesa-dev] [PATCH 7/8] gallium: create TGSI_PROPERTY to disable viewport and clipping
It's not relevant to anything we have. Last I looked, st/nine wasn't even a UMD. Everything that's needed for d3d9 (and d3d10) UMDs has already been added to gallium; we don't have any patches against core gallium that we've been keeping from the community. All we could do is review the patch for code quality, but so can everyone else.

z

----- Original Message -----
Hi,

Could somebody from VMWare please review this patch? It's for st/nine (open d3d9 state tracker).

Thanks,

Marek

On Sat, May 17, 2014 at 1:20 AM, Automated rebase david.heidelber...@ixit.cz wrote:
From: Christoph Bumiller e0425...@student.tuwien.ac.at

---
 src/gallium/auxiliary/tgsi/tgsi_strings.c  |  3 ++-
 src/gallium/auxiliary/tgsi/tgsi_ureg.c     | 16
 src/gallium/auxiliary/tgsi/tgsi_ureg.h     |  4
 src/gallium/docs/source/tgsi.rst           |  9 +
 src/gallium/include/pipe/p_shader_tokens.h |  3 ++-
 5 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c b/src/gallium/auxiliary/tgsi/tgsi_strings.c
index 5b6e47f..c3e7118 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_strings.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c
@@ -120,7 +120,8 @@ const char *tgsi_property_names[TGSI_PROPERTY_COUNT] =
    "FS_COORD_PIXEL_CENTER",
    "FS_COLOR0_WRITES_ALL_CBUFS",
    "FS_DEPTH_LAYOUT",
-   "VS_PROHIBIT_UCPS"
+   "VS_PROHIBIT_UCPS",
+   "VS_POSITION_WINDOW_SPACE"
 };
 
 const char *tgsi_type_names[5] =
diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
index 2bf93ee..bd0a3f7 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
@@ -173,6 +173,7 @@ struct ureg_program
    unsigned char property_fs_coord_pixel_center; /* = TGSI_FS_COORD_PIXEL_CENTER_* */
    unsigned char property_fs_color0_writes_all_cbufs; /* = TGSI_FS_COLOR0_WRITES_ALL_CBUFS * */
    unsigned char property_fs_depth_layout; /* TGSI_FS_DEPTH_LAYOUT */
+   boolean property_vs_window_space_position; /* TGSI_VS_WINDOW_SPACE_POSITION */
 
    unsigned nr_addrs;
    unsigned nr_preds;
@@ -331,6 +332,13 @@ ureg_property_fs_depth_layout(struct ureg_program *ureg,
    ureg->property_fs_depth_layout = fs_depth_layout;
 }
 
+void
+ureg_property_vs_window_space_position(struct ureg_program *ureg,
+                                       boolean vs_window_space_position)
+{
+   ureg->property_vs_window_space_position = vs_window_space_position;
+}
+
 struct ureg_src
 ureg_DECL_fs_input_cyl_centroid(struct ureg_program *ureg,
                        unsigned semantic_name,
@@ -1508,6 +1516,14 @@ static void emit_decls( struct ureg_program *ureg )
                     ureg->property_fs_depth_layout);
    }
 
+   if (ureg->property_vs_window_space_position) {
+      assert(ureg->processor == TGSI_PROCESSOR_VERTEX);
+
+      emit_property(ureg,
+                    TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION,
+                    ureg->property_vs_window_space_position);
+   }
+
    if (ureg->processor == TGSI_PROCESSOR_VERTEX) {
       for (i = 0; i < UREG_MAX_INPUT; i++) {
          if (ureg->vs_inputs[i/32] & (1 << (i%32))) {
diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
index a0a50b7..28edea6 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
@@ -184,6 +184,10 @@ void
 ureg_property_fs_depth_layout(struct ureg_program *ureg,
                               unsigned fs_depth_layout);
 
+void
+ureg_property_vs_window_space_position(struct ureg_program *ureg,
+                                       boolean vs_window_space_position);
+
 
 /***
  * Build shader declarations:
diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index 9500b9d..2ca3c3b 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -2848,6 +2848,15 @@ input primitive. Each invocation will have a different
 TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to
 be 1.
 
+VS_WINDOW_SPACE_POSITION
+
+If this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output
+is assumed to contain window space coordinates.
+Division of X,Y,Z by W and the viewport transformation are disabled, and 1/W is
+directly taken from the 4-th component of the shader output.
+Naturally, clipping is not performed on window coordinates either.
+The effect of this property is undefined if a geometry or tessellation shader
+are in use.
 
 Texture Sampling and Texture Formats

diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
index
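The documentation hunk above describes two paths for the position output. The normal path divides X, Y, Z by W and applies the viewport transform. The window-space path takes the output as-is and treats its fourth component as 1/W. Below is a minimal C sketch of the two paths under a simple scale/translate viewport model. `struct vec4`, `struct viewport` and `to_window` are hypothetical names. Clipping, which the property also disables, is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

struct vec4 { float x, y, z, w; };

/* Hypothetical viewport: per-axis scale and translate. */
struct viewport { float scale[3], translate[3]; };

/* Convert a VS position output to window coordinates. With the
 * window-space property set, the output is used as-is: X, Y, Z are already
 * window coordinates and the fourth component already holds 1/W.
 * Otherwise, divide by W, apply the viewport, and store 1/W in w. */
static struct vec4 to_window(struct vec4 pos, const struct viewport *vp,
                             bool window_space_position)
{
   if (window_space_position)
      return pos;

   float inv_w = 1.0f / pos.w;
   struct vec4 out;
   out.x = pos.x * inv_w * vp->scale[0] + vp->translate[0];
   out.y = pos.y * inv_w * vp->scale[1] + vp->translate[1];
   out.z = pos.z * inv_w * vp->scale[2] + vp->translate[2];
   out.w = inv_w;
   return out;
}
```

For a viewport with scale and translate of (10, 10, 0.5), the clip-space position (1, 1, 0, 2) maps to (15, 15, 0.5) with w = 0.5. The same input with the property set passes through unchanged.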
[Mesa-dev] [PATCH] draw/llvm: reduce memory usage
Let's make the draw_get_option_use_llvm function available unconditionally and use it to avoid useless allocations when the LLVM paths are active. The TGSI machine is never used when we're using LLVM.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_context.c |  6 ++
 src/gallium/auxiliary/draw/draw_context.h |  2 --
 src/gallium/auxiliary/draw/draw_gs.c      | 26 --
 src/gallium/auxiliary/draw/draw_vs.c      | 11 +++
 src/gallium/auxiliary/draw/draw_vs_exec.c |  2 ++
 5 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c
index 0a67879..ddc305b 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -68,6 +68,12 @@ draw_get_option_use_llvm(void)
    }
    return value;
 }
+#else
+boolean
+draw_get_option_use_llvm(void)
+{
+   return FALSE;
+}
 #endif
 
 
diff --git a/src/gallium/auxiliary/draw/draw_context.h b/src/gallium/auxiliary/draw/draw_context.h
index f114f50..48549fe 100644
--- a/src/gallium/auxiliary/draw/draw_context.h
+++ b/src/gallium/auxiliary/draw/draw_context.h
@@ -288,9 +288,7 @@ draw_get_shader_param(unsigned shader, enum pipe_shader_cap param);
 int
 draw_get_shader_param_no_llvm(unsigned shader, enum pipe_shader_cap param);
 
-#ifdef HAVE_LLVM
 boolean
 draw_get_option_use_llvm(void);
-#endif
 
 #endif /* DRAW_CONTEXT_H */
diff --git a/src/gallium/auxiliary/draw/draw_gs.c b/src/gallium/auxiliary/draw/draw_gs.c
index 7de5e03..5e503ff 100644
--- a/src/gallium/auxiliary/draw/draw_gs.c
+++ b/src/gallium/auxiliary/draw/draw_gs.c
@@ -674,11 +674,7 @@ int draw_geometry_shader_run(struct draw_geometry_shader *shader,
 void draw_geometry_shader_prepare(struct draw_geometry_shader *shader,
                                   struct draw_context *draw)
 {
-#ifdef HAVE_LLVM
    boolean use_llvm = draw_get_option_use_llvm();
-#else
-   boolean use_llvm = FALSE;
-#endif
    if (!use_llvm && shader && shader->machine->Tokens != shader->state.tokens) {
       tgsi_exec_machine_bind_shader(shader->machine,
                                     shader->state.tokens,
@@ -690,16 +686,18 @@ void draw_geometry_shader_prepare(struct draw_geometry_shader *shader,
 boolean
 draw_gs_init( struct draw_context *draw )
 {
-   draw->gs.tgsi.machine = tgsi_exec_machine_create();
-   if (!draw->gs.tgsi.machine)
-      return FALSE;
-
-   draw->gs.tgsi.machine->Primitives = align_malloc(
-      MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector), 16);
-   if (!draw->gs.tgsi.machine->Primitives)
-      return FALSE;
-   memset(draw->gs.tgsi.machine->Primitives, 0,
-          MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector));
+   if (!draw_get_option_use_llvm()) {
+      draw->gs.tgsi.machine = tgsi_exec_machine_create();
+      if (!draw->gs.tgsi.machine)
+         return FALSE;
+
+      draw->gs.tgsi.machine->Primitives = align_malloc(
+         MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector), 16);
+      if (!draw->gs.tgsi.machine->Primitives)
+         return FALSE;
+      memset(draw->gs.tgsi.machine->Primitives, 0,
+             MAX_PRIMITIVES * sizeof(struct tgsi_exec_vector));
+   }
 
    return TRUE;
 }
diff --git a/src/gallium/auxiliary/draw/draw_vs.c b/src/gallium/auxiliary/draw/draw_vs.c
index 55cbeb9..8bb9a7f 100644
--- a/src/gallium/auxiliary/draw/draw_vs.c
+++ b/src/gallium/auxiliary/draw/draw_vs.c
@@ -149,9 +149,11 @@ draw_vs_init( struct draw_context *draw )
 {
    draw->dump_vs = debug_get_option_gallium_dump_vs();
 
-   draw->vs.tgsi.machine = tgsi_exec_machine_create();
-   if (!draw->vs.tgsi.machine)
-      return FALSE;
+   if (!draw_get_option_use_llvm()) {
+      draw->vs.tgsi.machine = tgsi_exec_machine_create();
+      if (!draw->vs.tgsi.machine)
+         return FALSE;
+   }
 
    draw->vs.emit_cache = translate_cache_create();
    if (!draw->vs.emit_cache)
@@ -173,7 +175,8 @@ draw_vs_destroy( struct draw_context *draw )
    if (draw->vs.emit_cache)
       translate_cache_destroy(draw->vs.emit_cache);
 
-   tgsi_exec_machine_destroy(draw->vs.tgsi.machine);
+   if (draw_get_option_use_llvm())
+      tgsi_exec_machine_destroy(draw->vs.tgsi.machine);
 }
 
 
diff --git a/src/gallium/auxiliary/draw/draw_vs_exec.c b/src/gallium/auxiliary/draw/draw_vs_exec.c
index 133b116..6a18d8c 100644
--- a/src/gallium/auxiliary/draw/draw_vs_exec.c
+++ b/src/gallium/auxiliary/draw/draw_vs_exec.c
@@ -63,6 +63,7 @@ vs_exec_prepare( struct draw_vertex_shader *shader,
 {
    struct exec_vertex_shader *evs = exec_vertex_shader(shader);
 
+   debug_assert(!draw_get_option_use_llvm());
    /* Specify the vertex program to interpret/execute.
     * Avoid rebinding when possible.
     */
@@ -96,6 +97,7 @@ vs_exec_run_linear( struct draw_vertex_shader *shader,
    unsigned slot;
    boolean clamp_vertex_color = shader->draw->rasterizer->clamp_vertex_color;
 
+   debug_assert(!draw_get_option_use_llvm());
    tgsi_exec_set_constant_buffers(machine
Re: [Mesa-dev] [PATCH] draw/llvm: reduce memory usage
> -   tgsi_exec_machine_destroy(draw->vs.tgsi.machine);
> +   if (draw_get_option_use_llvm())
> +      tgsi_exec_machine_destroy(draw->vs.tgsi.machine);

That part should have used !draw_get_option_use_llvm()
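The bug caught here comes from the create and destroy paths being guarded by different predicates. A minimal sketch of the symmetric pattern follows, with a counter to show that nothing leaks. `struct interp_machine`, `use_llvm`, `vs_init` and `vs_destroy` are hypothetical stand-ins, not the draw module's real functions. The sketch also assumes the LLVM switch does not change during the context's lifetime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the interpreter machine and the LLVM switch. */
struct interp_machine { int dummy; };
static bool use_llvm;     /* assumed fixed for the lifetime of the context */
static int live_machines; /* leak counter for this sketch */

static struct interp_machine *machine_create(void)
{
   struct interp_machine *m = calloc(1, sizeof(*m));
   if (m)
      live_machines++;
   return m;
}

static void machine_destroy(struct interp_machine *m)
{
   live_machines--;
   free(m);
}

struct vs_state { struct interp_machine *machine; };

static bool vs_init(struct vs_state *vs)
{
   vs->machine = NULL;
   if (!use_llvm) { /* interpreter only needed without LLVM */
      vs->machine = machine_create();
      if (!vs->machine)
         return false;
   }
   return true;
}

static void vs_destroy(struct vs_state *vs)
{
   if (!use_llvm) /* same predicate as vs_init */
      machine_destroy(vs->machine);
}
```

Guarding both sides with the same `!use_llvm` check means the interpreter machine is freed exactly when it was created. Flipping only one of the checks either leaks the machine or destroys one that was never created.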
Re: [Mesa-dev] [PATCH 0/4] XA composite and perf improvements
----- Original Message -----
From: Rob Clark robcl...@freedesktop.org

While still more of a stop-gap solution (until glamor) for freedreno, with these few relatively simple changes I get a pretty big performance boost (~40%) for xf86-video-freedreno.

That looks great to me. Nice work. But to be honest the only thing I remember about this code is that it has been written in C and I'm probably like 40% certain of that.

z
Re: [Mesa-dev] [PATCH 2/2] llvmpipe: Simplify vertex and geometry shaders.
Actually Jose, I think we'll need to revert this. Draw always assumed that if a geometry shader object is present, an actual geometry shader is present, but that is not true anymore. d3d10 creates a null geometry shader to pass around the stream output. Before the patch, the draw geometry shader was only created if the tokens weren't null. With your change it always is, and that causes lots and lots of crashes because various parts try to execute a null geometry shader. Although I agree it'd be nice if we could handle that, I don't see a trivial way of fixing it.

z

----- Original Message -----
The series looks great to me.

----- Original Message -----
From: José Fonseca jfons...@vmware.com

Eliminate lp_vertex_shader, as it added nothing over draw_vertex_shader.

Simplify lp_geometry_shader, as most of the incoming state is unneeded. (We could also just use draw_geometry_shader if we were willing to peek inside the structure.)
---
 src/gallium/drivers/llvmpipe/lp_context.h     |  4 +--
 src/gallium/drivers/llvmpipe/lp_draw_arrays.c |  8 ++---
 src/gallium/drivers/llvmpipe/lp_state.h       | 13 ++------
 src/gallium/drivers/llvmpipe/lp_state_gs.c    | 32 +++
 src/gallium/drivers/llvmpipe/lp_state_vs.c    | 46 +++
 5 files changed, 33 insertions(+), 70 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_context.h b/src/gallium/drivers/llvmpipe/lp_context.h
index 05cdfe5..ee8033c 100644
--- a/src/gallium/drivers/llvmpipe/lp_context.h
+++ b/src/gallium/drivers/llvmpipe/lp_context.h
@@ -46,8 +46,8 @@ struct llvmpipe_vbuf_render;
 struct draw_context;
 struct draw_stage;
+struct draw_vertex_shader;
 struct lp_fragment_shader;
-struct lp_vertex_shader;
 struct lp_blend_state;
 struct lp_setup_context;
 struct lp_setup_variant;
@@ -63,7 +63,7 @@ struct llvmpipe_context {
    const struct pipe_depth_stencil_alpha_state *depth_stencil;
    const struct pipe_rasterizer_state *rasterizer;
    struct lp_fragment_shader *fs;
-   const struct lp_vertex_shader *vs;
+   struct draw_vertex_shader *vs;
    const struct lp_geometry_shader *gs;
    const struct lp_velems_state *velems;
    const struct lp_so_state *so;
diff --git a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c b/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
index 3df0a5c..99e6d19 100644
--- a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
+++ b/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
@@ -112,11 +112,11 @@ llvmpipe_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
    llvmpipe_prepare_geometry_sampling(lp,
                                       lp->num_sampler_views[PIPE_SHADER_GEOMETRY],
                                       lp->sampler_views[PIPE_SHADER_GEOMETRY]);
-   if (lp->gs && !lp->gs->shader.tokens) {
+   if (lp->gs && lp->gs->no_tokens) {
       /* we have an empty geometry shader with stream output, so attach the stream output info to the current vertex shader */
       if (lp->vs) {
-         draw_vs_attach_so(lp->vs->draw_data, &lp->gs->shader.stream_output);
+         draw_vs_attach_so(lp->vs, &lp->gs->stream_output);
       }
    }
    draw_collect_pipeline_statistics(draw,
@@ -136,11 +136,11 @@ llvmpipe_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
    }
    draw_set_mapped_so_targets(draw, 0, NULL);
 
-   if (lp->gs && !lp->gs->shader.tokens) {
+   if (lp->gs && lp->gs->no_tokens) {
       /* we have attached stream output to the vs for rendering, now lets reset it */
       if (lp->vs) {
-         draw_vs_reset_so(lp->vs->draw_data);
+         draw_vs_reset_so(lp->vs);
       }
    }
 
diff --git a/src/gallium/drivers/llvmpipe/lp_state.h b/src/gallium/drivers/llvmpipe/lp_state.h
index 8635cf1..2da6caa 100644
--- a/src/gallium/drivers/llvmpipe/lp_state.h
+++ b/src/gallium/drivers/llvmpipe/lp_state.h
@@ -65,17 +65,10 @@ struct llvmpipe_context;
 
-/** Subclass of pipe_shader_state */
-struct lp_vertex_shader
-{
-   struct pipe_shader_state shader;
-   struct draw_vertex_shader *draw_data;
-};
-
-/** Subclass of pipe_shader_state */
 struct lp_geometry_shader {
-   struct pipe_shader_state shader;
-   struct draw_geometry_shader *draw_data;
+   boolean no_tokens;
+   struct pipe_stream_output_info stream_output;
+   struct draw_geometry_shader *dgs;
 };
 
 /** Vertex element state */
diff --git a/src/gallium/drivers/llvmpipe/lp_state_gs.c b/src/gallium/drivers/llvmpipe/lp_state_gs.c
index 74cf992..c94afed 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_gs.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_gs.c
@@ -48,7 +48,7 @@ llvmpipe_create_gs_state(struct pipe_context *pipe,
 
    state = CALLOC_STRUCT(lp_geometry_shader);
    if (state == NULL )
-      goto fail;
+      goto no_state;
Re: [Mesa-dev] [PATCH 2/2] llvmpipe: Simplify vertex and geometry shaders.
> I see the crashes you're referring to. I don't quite understand why though: concerning the geometry shader, other than cosmetic changes, in theory I should just have replaced a null/non-null `tokens` pointer with a boolean `no_tokens`, though obviously I missed something.

Yeah, you missed the entire draw pipeline, because you replaced:

   if (templ->tokens) {
      ...
      state->draw_data = draw_create_geometry_shader(llvmpipe->draw, templ);
   }

with the unconditional:

   state->dgs = draw_create_geometry_shader(llvmpipe->draw, templ);

i.e. the draw gs is /always/ created, whether tokens are there or not. So draw_bind_geometry_shader will always bind gs's with null tokens, and that's what draw can't handle. I think that replacing that with:

   if (!state->no_tokens) {
      state->dgs = draw_create_geometry_shader(...);
      ...
   }

should work.

I should also had broken this in two separate changes: vs portion, and gs portion. vs's are fine because they're never created with null tokens.

z
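The suggested fix creates the draw-side geometry shader only when the template actually carries tokens. A "null" GS that exists only to carry stream output then never reaches draw. The sketch below shows that rule with hypothetical stand-in types. `struct shader_template`, `struct gs_state_sketch`, `create_draw_gs` and `gs_to_bind` are illustrative names, not llvmpipe's real ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the gallium/draw structures. */
struct shader_template { const void *tokens; };
struct draw_gs_sketch { const void *tokens; };

struct gs_state_sketch {
   bool no_tokens;             /* "null" GS that only carries stream output */
   struct draw_gs_sketch *dgs; /* created only for a real shader */
};

static struct draw_gs_sketch draw_gs_storage;

static struct draw_gs_sketch *create_draw_gs(const struct shader_template *t)
{
   draw_gs_storage.tokens = t->tokens;
   return &draw_gs_storage;
}

/* Only hand draw a geometry shader when there are tokens to execute. */
static void create_gs_state(struct gs_state_sketch *state,
                            const struct shader_template *templ)
{
   state->no_tokens = (templ->tokens == NULL);
   state->dgs = NULL;
   if (!state->no_tokens)
      state->dgs = create_draw_gs(templ);
}

/* What would be bound in draw: nothing for a token-less GS. */
static const struct draw_gs_sketch *gs_to_bind(const struct gs_state_sketch *state)
{
   return state ? state->dgs : NULL;
}
```

Because of this, the bind step sees a NULL draw GS for the stream-output-only case, so draw never has to execute a geometry shader with null tokens.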
[Mesa-dev] [PATCH 1/2] draw/gs: reduce the size of the gs output buffer
We used to overallocate the output buffer, sometimes running out of memory with applications rendering large geometries. The actual maximum number of vertices out is simply the maximum number of primitives in (number of gs invocations) multiplied by the maximum number of output vertices per gs input primitive (i.e. gs invocation).

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/draw/draw_gs.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_gs.c b/src/gallium/auxiliary/draw/draw_gs.c
index 97e8a90..7de5e03 100644
--- a/src/gallium/auxiliary/draw/draw_gs.c
+++ b/src/gallium/auxiliary/draw/draw_gs.c
@@ -552,6 +552,10 @@ int draw_geometry_shader_run(struct draw_geometry_shader *shader,
       u_decomposed_prims_for_vertices(shader->output_primitive,
                                       shader->max_output_vertices)
       * num_in_primitives;
+   /* we allocate exactly one extra vertex per primitive to allow the GS to emit
+    * overflown vertices into some area where they won't harm anyone */
+   unsigned total_verts_per_buffer = shader->primitive_boundary *
+      num_in_primitives;
 
    //Assume at least one primitive
    max_out_prims = MAX2(max_out_prims, 1);
@@ -559,23 +563,25 @@ int draw_geometry_shader_run(struct draw_geometry_shader *shader,
 
    output_verts->vertex_size = vertex_size;
    output_verts->stride = output_verts->vertex_size;
-   /* we allocate exactly one extra vertex per primitive to allow the GS to emit
-    * overflown vertices into some area where they won't harm anyone */
    output_verts->verts =
       (struct vertex_header *)MALLOC(output_verts->vertex_size *
-                                     max_out_prims *
-                                     shader->primitive_boundary);
+                                     total_verts_per_buffer);
+   debug_assert(output_verts->verts);
 
 #if 0
    debug_printf("%s count = %d (in prims # = %d)\n",
                 __FUNCTION__, num_input_verts, num_in_primitives);
    debug_printf("\tlinear = %d, prim_info->count = %d\n",
                 input_prim->linear, input_prim->count);
-   debug_printf("\tprim pipe = %s, shader in = %s, shader out = %s, max out = %d\n",
+   debug_printf("\tprim pipe = %s, shader in = %s, shader out = %s\n"
                 u_prim_name(input_prim->prim),
                 u_prim_name(shader->input_primitive),
-                u_prim_name(shader->output_primitive),
-                shader->max_output_vertices);
+                u_prim_name(shader->output_primitive));
+   debug_printf("\tmaxv = %d, maxp = %d, primitive_boundary = %d, "
+                "vertex_size = %d, tverts = %d\n",
+                shader->max_output_vertices, max_out_prims,
+                shader->primitive_boundary, output_verts->vertex_size,
+                total_verts_per_buffer);
 #endif
 
    shader->emitted_vertices = 0;
-- 
1.8.3.2
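The sizing change is plain arithmetic, so it can be checked in isolation. A minimal sketch comparing the old and new byte counts, assuming (as the moved comment suggests) that primitive_boundary is max_output_vertices plus one spare slot per invocation; the function names are hypothetical, not draw's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Old sizing: vertex_size * max_out_prims * primitive_boundary, where
 * max_out_prims already multiplies in num_in_primitives. */
static size_t gs_buffer_bytes_old(size_t vertex_size,
                                  size_t decomposed_prims_per_invocation,
                                  size_t num_in_primitives,
                                  size_t max_output_vertices)
{
   /* assumption: one spare vertex slot per invocation */
   size_t primitive_boundary = max_output_vertices + 1;
   size_t max_out_prims = decomposed_prims_per_invocation * num_in_primitives;
   if (max_out_prims < 1)
      max_out_prims = 1;   /* "Assume at least one primitive" */
   return vertex_size * max_out_prims * primitive_boundary;
}

/* New sizing: one GS invocation per input primitive, each emitting at most
 * primitive_boundary vertices. */
static size_t gs_buffer_bytes_new(size_t vertex_size,
                                  size_t num_in_primitives,
                                  size_t max_output_vertices)
{
   size_t primitive_boundary = max_output_vertices + 1;
   assert(vertex_size > 0);
   return vertex_size * primitive_boundary * num_in_primitives;
}
```

For example, a triangle-strip GS with max_output_vertices = 6 decomposes into 4 triangles per invocation. With 1000 input primitives and 64-byte vertices, the old formula asks for 1,792,000 bytes and the new one for 448,000, so the overallocation factor was exactly the number of decomposed primitives per invocation.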
[Mesa-dev] [PATCH 2/2] draw/llvm: improve debugging output a bit
It's useful to know what the LLVMBuildStore arguments are going to be before executing it, because it can crash. Also make sure to print out the inputs only if we're not generating a gs, because it fetches inputs differently.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/draw/draw_llvm.c          | 2 +-
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 53d13f3..b9f8bb9 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -939,11 +939,11 @@ store_aos_array(struct gallivm_state *gallivm,
       LLVMValueRef id_ptr = draw_jit_header_id(gallivm, io_ptrs[i]);
       val = LLVMBuildExtractElement(builder, cliptmp, linear_inds[i], "");
       val = adjust_mask(gallivm, val);
-      LLVMBuildStore(builder, val, id_ptr);
 #if DEBUG_STORE
       lp_build_printf(gallivm, "io = %p, index %d, clipmask = %x\n",
                       io_ptrs[i], inds[i], val);
 #endif
+      LLVMBuildStore(builder, val, id_ptr);
    }
 }
 
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index d2cb0a0..8791168 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -3569,7 +3569,8 @@ static void emit_prologue(struct lp_build_tgsi_context * bld_base)
    if (DEBUG_EXECUTION) {
       lp_build_printf(gallivm, "\n");
       emit_dump_file(bld, TGSI_FILE_CONSTANT);
-      emit_dump_file(bld, TGSI_FILE_INPUT);
+      if (!bld->gs_iface)
+         emit_dump_file(bld, TGSI_FILE_INPUT);
    }
 }
 
-- 
1.8.3.2
Re: [Mesa-dev] [PATCH 2/2] llvmpipe: Simplify vertex and geometry shaders.
The series looks great to me.

----- Original Message -----
From: José Fonseca <jfons...@vmware.com>

Eliminate lp_vertex_shader, as it added nothing over draw_vertex_shader.

Simplify lp_geometry_shader, as most of the incoming state is unneeded.
(We could also just use draw_geometry_shader if we were willing to peek inside the structure.)
---
 src/gallium/drivers/llvmpipe/lp_context.h     |  4 +--
 src/gallium/drivers/llvmpipe/lp_draw_arrays.c |  8 ++---
 src/gallium/drivers/llvmpipe/lp_state.h       | 13 ++--
 src/gallium/drivers/llvmpipe/lp_state_gs.c    | 32 +++
 src/gallium/drivers/llvmpipe/lp_state_vs.c    | 46 +++
 5 files changed, 33 insertions(+), 70 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_context.h b/src/gallium/drivers/llvmpipe/lp_context.h
index 05cdfe5..ee8033c 100644
--- a/src/gallium/drivers/llvmpipe/lp_context.h
+++ b/src/gallium/drivers/llvmpipe/lp_context.h
@@ -46,8 +46,8 @@ struct llvmpipe_vbuf_render;
 struct draw_context;
 struct draw_stage;
+struct draw_vertex_shader;
 struct lp_fragment_shader;
-struct lp_vertex_shader;
 struct lp_blend_state;
 struct lp_setup_context;
 struct lp_setup_variant;
@@ -63,7 +63,7 @@ struct llvmpipe_context {
    const struct pipe_depth_stencil_alpha_state *depth_stencil;
    const struct pipe_rasterizer_state *rasterizer;
    struct lp_fragment_shader *fs;
-   const struct lp_vertex_shader *vs;
+   struct draw_vertex_shader *vs;
    const struct lp_geometry_shader *gs;
    const struct lp_velems_state *velems;
    const struct lp_so_state *so;
diff --git a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c b/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
index 3df0a5c..99e6d19 100644
--- a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
+++ b/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
@@ -112,11 +112,11 @@ llvmpipe_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
    llvmpipe_prepare_geometry_sampling(lp,
                                       lp->num_sampler_views[PIPE_SHADER_GEOMETRY],
                                       lp->sampler_views[PIPE_SHADER_GEOMETRY]);
-   if (lp->gs && !lp->gs->shader.tokens) {
+   if (lp->gs && lp->gs->no_tokens) {
       /* we have an empty geometry shader with stream output, so
          attach the stream output info to the current vertex shader */
       if (lp->vs) {
-         draw_vs_attach_so(lp->vs->draw_data, &lp->gs->shader.stream_output);
+         draw_vs_attach_so(lp->vs, &lp->gs->stream_output);
       }
    }
    draw_collect_pipeline_statistics(draw,
@@ -136,11 +136,11 @@ llvmpipe_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
    }
    draw_set_mapped_so_targets(draw, 0, NULL);
 
-   if (lp->gs && !lp->gs->shader.tokens) {
+   if (lp->gs && lp->gs->no_tokens) {
       /* we have attached stream output to the vs for rendering,
          now lets reset it */
       if (lp->vs) {
-         draw_vs_reset_so(lp->vs->draw_data);
+         draw_vs_reset_so(lp->vs);
       }
    }
 
diff --git a/src/gallium/drivers/llvmpipe/lp_state.h b/src/gallium/drivers/llvmpipe/lp_state.h
index 8635cf1..2da6caa 100644
--- a/src/gallium/drivers/llvmpipe/lp_state.h
+++ b/src/gallium/drivers/llvmpipe/lp_state.h
@@ -65,17 +65,10 @@ struct llvmpipe_context;
 
 
 
-/** Subclass of pipe_shader_state */
-struct lp_vertex_shader
-{
-   struct pipe_shader_state shader;
-   struct draw_vertex_shader *draw_data;
-};
-
-/** Subclass of pipe_shader_state */
 struct lp_geometry_shader {
-   struct pipe_shader_state shader;
-   struct draw_geometry_shader *draw_data;
+   boolean no_tokens;
+   struct pipe_stream_output_info stream_output;
+   struct draw_geometry_shader *dgs;
 };
 
 /** Vertex element state */
diff --git a/src/gallium/drivers/llvmpipe/lp_state_gs.c b/src/gallium/drivers/llvmpipe/lp_state_gs.c
index 74cf992..c94afed 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_gs.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_gs.c
@@ -48,7 +48,7 @@ llvmpipe_create_gs_state(struct pipe_context *pipe,
 
    state = CALLOC_STRUCT(lp_geometry_shader);
    if (state == NULL )
-      goto fail;
+      goto no_state;
 
    /* debug */
    if (LP_DEBUG & DEBUG_TGSI) {
@@ -57,26 +57,19 @@ llvmpipe_create_gs_state(struct pipe_context *pipe,
    }
 
    /* copy stream output info */
-   state->shader = *templ;
-   if (templ->tokens) {
-      /* copy shader tokens, the ones passed in will go away. */
-      state->shader.tokens = tgsi_dup_tokens(templ->tokens);
-      if (state->shader.tokens == NULL)
-         goto fail;
-
-      state->draw_data = draw_create_geometry_shader(llvmpipe->draw, templ);
-      if (state->draw_data == NULL)
-         goto fail;
+   state->no_tokens = !templ->tokens;
+   memcpy(&state->stream_output, &templ->stream_output, sizeof state->stream_output);
+
+   state->dgs = draw_create_geometry_shader(llvmpipe->draw, templ);
+   if
Re: [Mesa-dev] [PATCH] gallivm: fix no-op n:n lp_build_resize()
Looks good to me.

z

----- Original Message -----
From: Roland Scheidegger <srol...@vmware.com>

This can get called in some circumstances if both src type and dst type have the same width (seen with float32->unorm32). While this particular case was bogus anyway, let's just fix that, as it can work trivially (due to the way it was called it actually worked anyway, apart from the assert).
---
 src/gallium/auxiliary/gallivm/lp_bld_pack.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
index 22a4f5a8..2b0a1fb 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c
@@ -710,9 +710,6 @@ lp_build_resize(struct gallivm_state *gallivm,
    /* We must not loose or gain channels. Only precision */
    assert(src_type.length * num_srcs == dst_type.length * num_dsts);
 
-   /* We don't support M:N conversion, only 1:N, M:1, or 1:1 */
-   assert(num_srcs == 1 || num_dsts == 1);
-
    assert(src_type.length <= LP_MAX_VECTOR_LENGTH);
    assert(dst_type.length <= LP_MAX_VECTOR_LENGTH);
    assert(num_srcs <= LP_MAX_VECTOR_LENGTH);
@@ -723,6 +720,7 @@ lp_build_resize(struct gallivm_state *gallivm,
        * Truncate bit width.
        */
 
+      /* Conversion must be M:1 */
       assert(num_dsts == 1);
 
       if (src_type.width * src_type.length == dst_type.width * dst_type.length) {
@@ -775,6 +773,7 @@ lp_build_resize(struct gallivm_state *gallivm,
        * Expand bit width.
        */
 
+      /* Conversion must be 1:N */
       assert(num_srcs == 1);
 
       if (src_type.width * src_type.length == dst_type.width * dst_type.length) {
@@ -813,10 +812,11 @@ lp_build_resize(struct gallivm_state *gallivm,
        * No-op
        */
 
-      assert(num_srcs == 1);
-      assert(num_dsts == 1);
+      /* Conversion must be N:N */
+      assert(num_srcs == num_dsts);
 
-      tmp[0] = src[0];
+      for(i = 0; i < num_dsts; ++i)
+         tmp[i] = src[i];
    }
 
    for(i = 0; i < num_dsts; ++i)
-- 
1.7.9.5
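The reshuffled assertions encode one rule per path: narrowing takes M sources to 1 destination, widening takes 1 source to N destinations, and equal widths now accept N:N instead of only 1:1. A scalar model of that rule, assuming the path is chosen by comparing element widths; `classify_resize()` is a hypothetical helper, not a gallivm function.

```c
#include <assert.h>

enum resize_kind { RESIZE_TRUNCATE, RESIZE_EXPAND, RESIZE_NOOP };

/* Hypothetical model of the shape checks in lp_build_resize() after the
 * patch: narrowing is M:1, widening is 1:N, equal widths are N:N. */
static enum resize_kind classify_resize(unsigned src_width, unsigned src_length,
                                        unsigned num_srcs,
                                        unsigned dst_width, unsigned dst_length,
                                        unsigned num_dsts)
{
   /* We must not lose or gain channels, only precision. */
   assert(src_length * num_srcs == dst_length * num_dsts);

   if (src_width > dst_width) {
      assert(num_dsts == 1);          /* M:1 */
      return RESIZE_TRUNCATE;
   } else if (src_width < dst_width) {
      assert(num_srcs == 1);          /* 1:N */
      return RESIZE_EXPAND;
   } else {
      assert(num_srcs == num_dsts);   /* N:N; previously only 1:1 passed */
      return RESIZE_NOOP;
   }
}
```

For instance, two 4 x 32-bit vectors resized to two 4 x 32-bit vectors (the float32->unorm32 case from the commit message) would have tripped the old `num_srcs == 1` assertion, while the model classifies it as a no-op copy of both vectors.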
[Mesa-dev] [PATCH] gallium: allow setting of the internal stream output offset
(This version includes comments from Roland.)

D3D10 allows setting of the internal offset of a buffer, which is in general only incremented via actual stream output writes. By allowing the internal offset to be set, draw_auto is capable of rendering from buffers which have not actually been streamed out to. Our interface didn't allow it. This change functionally shouldn't make any difference to OpenGL, where instead of an append_bitmask you just get a real array where -1 means append (like in D3D) and 0 means do not append.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/cso_cache/cso_context.c | 13 -
 src/gallium/auxiliary/cso_cache/cso_context.h |  2 +-
 src/gallium/auxiliary/draw/draw_context.h     |  3 +--
 src/gallium/auxiliary/draw/draw_pt.c          |  8 +---
 src/gallium/auxiliary/draw/draw_pt_so_emit.c  |  3 +--
 src/gallium/auxiliary/hud/hud_context.c       |  2 +-
 src/gallium/auxiliary/postprocess/pp_run.c    |  2 +-
 src/gallium/auxiliary/util/u_blit.c           |  2 +-
 src/gallium/auxiliary/util/u_blitter.c        | 13 +
 src/gallium/auxiliary/util/u_gen_mipmap.c     |  2 +-
 src/gallium/docs/source/context.rst           |  9 +
 src/gallium/drivers/galahad/glhd_context.c    |  4 ++--
 src/gallium/drivers/ilo/ilo_state.c           |  8 ++--
 src/gallium/drivers/llvmpipe/lp_state_so.c    | 12 ++--
 src/gallium/drivers/noop/noop_state.c         |  2 +-
 src/gallium/drivers/nouveau/nv50/nv50_state.c |  7 ---
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c |  7 ---
 src/gallium/drivers/radeon/r600_pipe_common.h |  2 +-
 src/gallium/drivers/radeon/r600_streamout.c   |  5 -
 src/gallium/drivers/radeonsi/si_descriptors.c |  4 ++--
 src/gallium/drivers/softpipe/sp_state_so.c    |  2 +-
 src/gallium/drivers/trace/tr_context.c        |  6 +++---
 src/gallium/include/pipe/p_context.h          |  2 +-
 src/gallium/tools/trace/dump_state.py         |  4 ++--
 src/mesa/state_tracker/st_cb_bitmap.c         |  2 +-
 src/mesa/state_tracker/st_cb_clear.c          |  2 +-
 src/mesa/state_tracker/st_cb_drawpixels.c     |  2 +-
 src/mesa/state_tracker/st_cb_drawtex.c        |  2 +-
 src/mesa/state_tracker/st_cb_xformfb.c        | 20 +---
 29 files changed, 88 insertions(+), 64 deletions(-)

diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c b/src/gallium/auxiliary/cso_cache/cso_context.c
index 2dcf01d..9146684 100644
--- a/src/gallium/auxiliary/cso_cache/cso_context.c
+++ b/src/gallium/auxiliary/cso_cache/cso_context.c
@@ -332,7 +332,7 @@ void cso_release_all( struct cso_context *ctx )
       ctx->pipe->bind_vertex_elements_state( ctx->pipe, NULL );
 
       if (ctx->pipe->set_stream_output_targets)
-         ctx->pipe->set_stream_output_targets(ctx->pipe, 0, NULL, 0);
+         ctx->pipe->set_stream_output_targets(ctx->pipe, 0, NULL, NULL);
    }
 
    /* free fragment sampler views */
@@ -1241,7 +1241,7 @@ void
 cso_set_stream_outputs(struct cso_context *ctx,
                        unsigned num_targets,
                        struct pipe_stream_output_target **targets,
-                       unsigned append_bitmask)
+                       const unsigned *offsets)
 {
    struct pipe_context *pipe = ctx->pipe;
    uint i;
@@ -1266,7 +1266,7 @@ cso_set_stream_outputs(struct cso_context *ctx,
    }
 
    pipe->set_stream_output_targets(pipe, num_targets, targets,
-                                   append_bitmask);
+                                   offsets);
    ctx->nr_so_targets = num_targets;
 }
 
@@ -1292,6 +1292,7 @@ cso_restore_stream_outputs(struct cso_context *ctx)
 {
    struct pipe_context *pipe = ctx->pipe;
    uint i;
+   unsigned offset[PIPE_MAX_SO_BUFFERS];
 
    if (!ctx->has_streamout) {
       return;
@@ -1302,19 +1303,21 @@ cso_restore_stream_outputs(struct cso_context *ctx)
       return;
    }
 
+   assert(ctx->nr_so_targets_saved <= PIPE_MAX_SO_BUFFERS);
    for (i = 0; i < ctx->nr_so_targets_saved; i++) {
       pipe_so_target_reference(&ctx->so_targets[i], NULL);
       /* move the reference from one pointer to another */
       ctx->so_targets[i] = ctx->so_targets_saved[i];
       ctx->so_targets_saved[i] = NULL;
+      /* -1 means append */
+      offset[i] = (unsigned)-1;
    }
    for (; i < ctx->nr_so_targets; i++) {
       pipe_so_target_reference(&ctx->so_targets[i], NULL);
    }
 
-   /* ~0 means append */
    pipe->set_stream_output_targets(pipe, ctx->nr_so_targets_saved,
-                                   ctx->so_targets, ~0);
+                                   ctx->so_targets, offset);
 
    ctx->nr_so_targets = ctx->nr_so_targets_saved;
    ctx->nr_so_targets_saved = 0;
diff --git a/src/gallium/auxiliary/cso_cache/cso_context.h b/src/gallium/auxiliary/cso_cache/cso_context.h
index 822e2df..1aa9998 100644
--- a/src/gallium/auxiliary/cso_cache/cso_context.h
+++ b/src/gallium/auxiliary/cso_cache/cso_context.h
@@ -115,7 +115,7 @@ unsigned cso_get_aux_vertex_buffer_slot(struct cso_context *ctx);
 
 void
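The interface change swaps a single append bitmask for a per-target offsets array, where (unsigned)-1 means append and 0 means do not append. A minimal sketch of translating an old-style bitmask into the new array, e.g. for a caller being ported; the helper name and the SO_MAX_TARGETS bound are hypothetical (the real bound in the patch is PIPE_MAX_SO_BUFFERS).

```c
#include <assert.h>

#define SO_MAX_TARGETS 4                  /* hypothetical stand-in for PIPE_MAX_SO_BUFFERS */
#define SO_OFFSET_APPEND ((unsigned)-1)   /* "-1 means append" */

/* Convert the old append_bitmask convention (bit i set => append to target i)
 * into the new per-target offsets array (-1 => append, 0 => do not append). */
static void so_bitmask_to_offsets(unsigned append_bitmask, unsigned num_targets,
                                  unsigned offsets[SO_MAX_TARGETS])
{
   unsigned i;

   assert(num_targets <= SO_MAX_TARGETS);
   for (i = 0; i < num_targets; i++)
      offsets[i] = (append_bitmask & (1u << i)) ? SO_OFFSET_APPEND : 0;
}
```

cso_restore_stream_outputs() in the patch does the equivalent of passing an all-ones mask, filling every saved slot with (unsigned)-1. The array form is also what allows an explicit D3D10-style starting offset, which a single bit per target cannot express.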
[Mesa-dev] [PATCH] gallium: allow setting of the internal stream output offset
D3D10 allows setting of the internal offset of a buffer, which is in general only incremented via actual stream output writes. By allowing the internal offset to be set, draw_auto is capable of rendering from buffers which have not actually been streamed out to. Our interface didn't allow it. This change functionally shouldn't make any difference to OpenGL, where instead of an append_bitmask you just get a real array where -1 means append (like in D3D) and 0 means do not append.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/cso_cache/cso_context.c | 13 -
 src/gallium/auxiliary/cso_cache/cso_context.h |  2 +-
 src/gallium/auxiliary/draw/draw_pt.c          |  8 +---
 src/gallium/auxiliary/hud/hud_context.c       |  2 +-
 src/gallium/auxiliary/postprocess/pp_run.c    |  2 +-
 src/gallium/auxiliary/util/u_blit.c           |  2 +-
 src/gallium/auxiliary/util/u_blitter.c        | 13 +
 src/gallium/auxiliary/util/u_gen_mipmap.c     |  2 +-
 src/gallium/docs/source/context.rst           |  9 +
 src/gallium/drivers/galahad/glhd_context.c    |  4 ++--
 src/gallium/drivers/ilo/ilo_state.c           |  8 ++--
 src/gallium/drivers/llvmpipe/lp_state_so.c    |  7 ---
 src/gallium/drivers/noop/noop_state.c         |  2 +-
 src/gallium/drivers/nouveau/nv50/nv50_state.c |  7 ---
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c |  7 ---
 src/gallium/drivers/radeon/r600_pipe_common.h |  2 +-
 src/gallium/drivers/radeon/r600_streamout.c   |  5 -
 src/gallium/drivers/radeonsi/si_descriptors.c |  4 ++--
 src/gallium/drivers/softpipe/sp_state_so.c    |  2 +-
 src/gallium/drivers/trace/tr_context.c        |  6 +++---
 src/gallium/include/pipe/p_context.h          |  2 +-
 src/gallium/tools/trace/dump_state.py         |  4 ++--
 src/mesa/state_tracker/st_cb_bitmap.c         |  2 +-
 src/mesa/state_tracker/st_cb_clear.c          |  2 +-
 src/mesa/state_tracker/st_cb_drawpixels.c     |  2 +-
 src/mesa/state_tracker/st_cb_drawtex.c        |  2 +-
 src/mesa/state_tracker/st_cb_xformfb.c        | 20 +---
 27 files changed, 84 insertions(+), 57 deletions(-)

diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c b/src/gallium/auxiliary/cso_cache/cso_context.c
index 2dcf01d..9146684 100644
--- a/src/gallium/auxiliary/cso_cache/cso_context.c
+++ b/src/gallium/auxiliary/cso_cache/cso_context.c
@@ -332,7 +332,7 @@ void cso_release_all( struct cso_context *ctx )
       ctx->pipe->bind_vertex_elements_state( ctx->pipe, NULL );
 
       if (ctx->pipe->set_stream_output_targets)
-         ctx->pipe->set_stream_output_targets(ctx->pipe, 0, NULL, 0);
+         ctx->pipe->set_stream_output_targets(ctx->pipe, 0, NULL, NULL);
    }
 
    /* free fragment sampler views */
@@ -1241,7 +1241,7 @@ void
 cso_set_stream_outputs(struct cso_context *ctx,
                        unsigned num_targets,
                        struct pipe_stream_output_target **targets,
-                       unsigned append_bitmask)
+                       const unsigned *offsets)
 {
    struct pipe_context *pipe = ctx->pipe;
    uint i;
@@ -1266,7 +1266,7 @@ cso_set_stream_outputs(struct cso_context *ctx,
    }
 
    pipe->set_stream_output_targets(pipe, num_targets, targets,
-                                   append_bitmask);
+                                   offsets);
    ctx->nr_so_targets = num_targets;
 }
 
@@ -1292,6 +1292,7 @@ cso_restore_stream_outputs(struct cso_context *ctx)
 {
    struct pipe_context *pipe = ctx->pipe;
    uint i;
+   unsigned offset[PIPE_MAX_SO_BUFFERS];
 
    if (!ctx->has_streamout) {
       return;
@@ -1302,19 +1303,21 @@ cso_restore_stream_outputs(struct cso_context *ctx)
       return;
    }
 
+   assert(ctx->nr_so_targets_saved <= PIPE_MAX_SO_BUFFERS);
    for (i = 0; i < ctx->nr_so_targets_saved; i++) {
       pipe_so_target_reference(&ctx->so_targets[i], NULL);
       /* move the reference from one pointer to another */
       ctx->so_targets[i] = ctx->so_targets_saved[i];
       ctx->so_targets_saved[i] = NULL;
+      /* -1 means append */
+      offset[i] = (unsigned)-1;
    }
    for (; i < ctx->nr_so_targets; i++) {
       pipe_so_target_reference(&ctx->so_targets[i], NULL);
    }
 
-   /* ~0 means append */
    pipe->set_stream_output_targets(pipe, ctx->nr_so_targets_saved,
-                                   ctx->so_targets, ~0);
+                                   ctx->so_targets, offset);
 
    ctx->nr_so_targets = ctx->nr_so_targets_saved;
    ctx->nr_so_targets_saved = 0;
diff --git a/src/gallium/auxiliary/cso_cache/cso_context.h b/src/gallium/auxiliary/cso_cache/cso_context.h
index 822e2df..1aa9998 100644
--- a/src/gallium/auxiliary/cso_cache/cso_context.h
+++ b/src/gallium/auxiliary/cso_cache/cso_context.h
@@ -115,7 +115,7 @@ unsigned cso_get_aux_vertex_buffer_slot(struct cso_context *ctx);
 
 void
 cso_set_stream_outputs(struct cso_context *ctx,
                        unsigned num_targets,
                        struct pipe_stream_output_target **targets
[Mesa-dev] [PATCH] draw/llvm: fix generation of the VS with GS present
The draw_current_shader_* functions return the final output slots, taking both the geometry shader and the vertex shader into account. But when generating code for the vertex shader we cannot use output slots from the geometry shader because, obviously, those can be completely different. This fixes a number of very non-obvious crashes. A side-effect of this bug was that sometimes the vertex shading code could save some random outputs as position/clip when the geometry shader was writing them and the vertex shader had different outputs at those slots (sometimes writing garbage and sometimes something correct).
---
 src/gallium/auxiliary/draw/draw_llvm.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 0bbb680..53d13f3 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1104,7 +1104,7 @@ generate_viewport(struct draw_llvm_variant *variant,
    int i;
    struct gallivm_state *gallivm = variant->gallivm;
    struct lp_type f32_type = vs_type;
-   const unsigned pos = draw_current_shader_position_output(variant->llvm->draw);
+   const unsigned pos = variant->llvm->draw->vs.position_output;
    LLVMTypeRef vs_type_llvm = lp_build_vec_type(gallivm, vs_type);
    LLVMValueRef out3 = LLVMBuildLoad(builder, outputs[pos][3], ""); /*w0 w1 .. wn*/
    LLVMValueRef const1 = lp_build_const_vec(gallivm, f32_type, 1.0);       /*1.0 1.0 1.0 1.0*/
@@ -1173,14 +1173,14 @@ generate_clipmask(struct draw_llvm *llvm,
    LLVMValueRef plane1, planes, plane_ptr, sum;
    struct lp_type f32_type = vs_type;
    struct lp_type i32_type = lp_int_type(vs_type);
-   const unsigned pos = draw_current_shader_position_output(llvm->draw);
-   const unsigned cv = draw_current_shader_clipvertex_output(llvm->draw);
+   const unsigned pos = llvm->draw->vs.position_output;
+   const unsigned cv = llvm->draw->vs.clipvertex_output;
    int num_written_clipdistance = llvm->draw->vs.vertex_shader->info.num_written_clipdistance;
    bool have_cd = false;
    unsigned cd[2];
 
-   cd[0] = draw_current_shader_clipdistance_output(llvm->draw, 0);
-   cd[1] = draw_current_shader_clipdistance_output(llvm->draw, 1);
+   cd[0] = llvm->draw->vs.clipdistance_output[0];
+   cd[1] = llvm->draw->vs.clipdistance_output[1];
 
    if (cd[0] != pos || cd[1] != pos)
       have_cd = true;
@@ -1551,8 +1551,8 @@ draw_llvm_generate(struct draw_llvm *llvm, struct draw_llvm_variant *variant,
                          key->clip_z ||
                          key->clip_user);
    LLVMValueRef variant_func;
-   const unsigned pos = draw_current_shader_position_output(llvm->draw);
-   const unsigned cv = draw_current_shader_clipvertex_output(llvm->draw);
+   const unsigned pos = llvm->draw->vs.position_output;
+   const unsigned cv = llvm->draw->vs.clipvertex_output;
    boolean have_clipdist = FALSE;
    struct lp_bld_tgsi_system_values system_values;
 
-- 
1.9.0
[Mesa-dev] [PATCH] translate: fix buffer overflows
Because draw always injects the position at slot 0, whenever the fragment shader takes the maximum number of inputs (32) we end up with PIPE_MAX_ATTRIBS + 1 slots to translate, which meant that we were crashing with fragment shaders that took the maximum number of attributes as inputs. The actual maximum number of attributes we need to translate is thus PIPE_MAX_ATTRIBS + 1.
---
 src/gallium/auxiliary/translate/translate_generic.c | 2 +-
 src/gallium/auxiliary/translate/translate_sse.c     | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/auxiliary/translate/translate_generic.c b/src/gallium/auxiliary/translate/translate_generic.c
index 5ffce32..82b4d00 100644
--- a/src/gallium/auxiliary/translate/translate_generic.c
+++ b/src/gallium/auxiliary/translate/translate_generic.c
@@ -73,7 +73,7 @@ struct translate_generic {
        */
       int copy_size;
 
-   } attrib[PIPE_MAX_ATTRIBS];
+   } attrib[PIPE_MAX_ATTRIBS + 1];
 
    unsigned nr_attrib;
 };
diff --git a/src/gallium/auxiliary/translate/translate_sse.c b/src/gallium/auxiliary/translate/translate_sse.c
index b6bc222..1833d8a 100644
--- a/src/gallium/auxiliary/translate/translate_sse.c
+++ b/src/gallium/auxiliary/translate/translate_sse.c
@@ -104,15 +104,15 @@ struct translate_sse
    int8_t reg_to_const[16];
    int8_t const_to_reg[NUM_CONSTS];
 
-   struct translate_buffer buffer[PIPE_MAX_ATTRIBS];
+   struct translate_buffer buffer[PIPE_MAX_ATTRIBS + 1];
    unsigned nr_buffers;
 
    /* Multiple buffer variants can map to a single buffer. */
-   struct translate_buffer_variant buffer_variant[PIPE_MAX_ATTRIBS];
+   struct translate_buffer_variant buffer_variant[PIPE_MAX_ATTRIBS + 1];
    unsigned nr_buffer_variants;
 
    /* Multiple elements can map to a single buffer variant. */
-   unsigned element_to_buffer_variant[PIPE_MAX_ATTRIBS];
+   unsigned element_to_buffer_variant[PIPE_MAX_ATTRIBS + 1];
 
    boolean use_instancing;
    unsigned instance_id;
-- 
1.9.0
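The off-by-one is easiest to see by counting slots: one injected position plus one per fragment shader input. A minimal model, taking PIPE_MAX_ATTRIBS = 32 from the commit message; the `fake_translate` struct and both helpers are hypothetical, not translate's real layout.

```c
#include <assert.h>

#define PIPE_MAX_ATTRIBS 32   /* value quoted in the commit message */

/* Hypothetical model: draw injects position at slot 0, then one slot per
 * fragment shader input, so the worst case is PIPE_MAX_ATTRIBS + 1. */
struct fake_translate {
   unsigned attrib[PIPE_MAX_ATTRIBS + 1];
   unsigned nr_attrib;
};

static unsigned slots_needed(unsigned num_fs_inputs)
{
   return 1 + num_fs_inputs;   /* position + FS inputs */
}

static void fill_slots(struct fake_translate *t, unsigned num_fs_inputs)
{
   unsigned i, n = slots_needed(num_fs_inputs);

   /* With attrib[PIPE_MAX_ATTRIBS] this assertion fails for 32 inputs. */
   assert(n <= sizeof t->attrib / sizeof t->attrib[0]);
   for (i = 0; i < n; i++)
      t->attrib[i] = i;
   t->nr_attrib = n;
}
```

With a fully loaded fragment shader, `fill_slots()` writes 33 entries, which fits only because the array is sized PIPE_MAX_ATTRIBS + 1.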
Re: [Mesa-dev] [PATCH] gallivm: fix F2U opcode
Looks good.

Reviewed-by: Zack Rusin <za...@vmware.com>

----- Original Message -----
From: Roland Scheidegger <srol...@vmware.com>

Previously, we were really doing F2I. And also move it to the generic section.
(Note that for llvmpipe the code generated is definitely bad, due to the lack of unsigned conversions with sse. I think though that what llvm does (using scalar conversions to 64bit signed, either with the x87 fpu (32bit) or sse (64bit), including lots of domain changes) is quite suboptimal; we could do something like
is_large = arg >= 2^31
half_arg = 0.5 * arg
small_c = fptoint(arg)
large_c = fptoint(half_arg) << 1
res = select(is_large, large_c, small_c)
which should be far fewer instructions, but that's something llvm should do itself.)
This fixes piglit fs/vs-float-uint-conversion.shader_test (maybe more, needs GL 3.0 version override to run.)
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c | 42 ++--
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
index caaeb01..b9546db 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
@@ -720,10 +720,23 @@ sub_emit(
    struct lp_build_tgsi_context * bld_base,
    struct lp_build_emit_data * emit_data)
 {
-   emit_data->output[emit_data->chan] = LLVMBuildFSub(
-            bld_base->base.gallivm->builder,
-            emit_data->args[0],
-            emit_data->args[1], "");
+   emit_data->output[emit_data->chan] =
+      LLVMBuildFSub(bld_base->base.gallivm->builder,
+                    emit_data->args[0],
+                    emit_data->args[1], "");
+}
+
+/* TGSI_OPCODE_F2U */
+static void
+f2u_emit(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data)
+{
+   emit_data->output[emit_data->chan] =
+      LLVMBuildFPToUI(bld_base->base.gallivm->builder,
+                      emit_data->args[0],
+                      bld_base->base.int_vec_type, "");
 }
 
 /* TGSI_OPCODE_U2F */
@@ -733,9 +746,10 @@ u2f_emit(
    struct lp_build_tgsi_context * bld_base,
    struct lp_build_emit_data * emit_data)
 {
-   emit_data->output[emit_data->chan] = LLVMBuildUIToFP(bld_base->base.gallivm->builder,
-                                                        emit_data->args[0],
-                                                        bld_base->base.vec_type, "");
+   emit_data->output[emit_data->chan] =
+      LLVMBuildUIToFP(bld_base->base.gallivm->builder,
+                      emit_data->args[0],
+                      bld_base->base.vec_type, "");
 }
 
 static void
@@ -949,6 +963,7 @@ lp_set_default_actions(struct lp_build_tgsi_context * bld_base)
    bld_base->op_actions[TGSI_OPCODE_SUB].emit = sub_emit;
 
    bld_base->op_actions[TGSI_OPCODE_UARL].emit = mov_emit;
+   bld_base->op_actions[TGSI_OPCODE_F2U].emit = f2u_emit;
    bld_base->op_actions[TGSI_OPCODE_U2F].emit = u2f_emit;
    bld_base->op_actions[TGSI_OPCODE_UMAD].emit = umad_emit;
    bld_base->op_actions[TGSI_OPCODE_UMUL].emit = umul_emit;
@@ -1128,18 +1143,6 @@ f2i_emit_cpu(
                                          emit_data->args[0]);
 }
 
-/* TGSI_OPCODE_F2U (CPU Only) */
-static void
-f2u_emit_cpu(
-   const struct lp_build_tgsi_action * action,
-   struct lp_build_tgsi_context * bld_base,
-   struct lp_build_emit_data * emit_data)
-{
-   /* FIXME: implement and use lp_build_utrunc() */
-   emit_data->output[emit_data->chan] = lp_build_itrunc(&bld_base->base,
-                                                        emit_data->args[0]);
-}
-
 /* TGSI_OPCODE_FSET Helper (CPU Only) */
 static void
 fset_emit_cpu(
@@ -1832,7 +1835,6 @@ lp_set_default_actions_cpu(
    bld_base->op_actions[TGSI_OPCODE_DIV].emit = div_emit_cpu;
    bld_base->op_actions[TGSI_OPCODE_EX2].emit = ex2_emit_cpu;
    bld_base->op_actions[TGSI_OPCODE_F2I].emit = f2i_emit_cpu;
-   bld_base->op_actions[TGSI_OPCODE_F2U].emit = f2u_emit_cpu;
    bld_base->op_actions[TGSI_OPCODE_FLR].emit = flr_emit_cpu;
    bld_base->op_actions[TGSI_OPCODE_FSEQ].emit = fseq_emit_cpu;
    bld_base->op_actions[TGSI_OPCODE_FSGE].emit = fsge_emit_cpu;
-- 
1.7.9.5
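The select-based sequence Roland sketches only needs signed float-to-int conversions, which SSE does have. A scalar C model of it, valid for 0 <= arg < 2^32. This only illustrates the proposed lowering: the patch itself just emits LLVMBuildFPToUI and leaves the lowering to LLVM, and `f2u_via_signed()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the suggested F2U lowering using only signed conversions.
 * Valid for 0.0f <= arg < 4294967296.0f (2^32). */
static uint32_t f2u_via_signed(float arg)
{
   assert(arg >= 0.0f && arg < 4294967296.0f);

   int is_large = arg >= 2147483648.0f;   /* arg >= 2^31 */
   float half_arg = 0.5f * arg;           /* exact for the large values used below */

   /* A SIMD version computes both results and selects per lane; in scalar C
    * an out-of-range float->int32 cast is undefined, so only the in-range
    * value is converted here. */
   if (is_large) {
      /* half_arg < 2^31 fits in int32. Floats >= 2^31 are multiples of 256,
       * so doubling the truncated half loses nothing. */
      uint32_t large_c = (uint32_t)(int32_t)half_arg << 1;
      return large_c;
   } else {
      uint32_t small_c = (uint32_t)(int32_t)arg;
      return small_c;
   }
}
```

The trick works because a 32-bit float carries only 24 significant bits: above 2^31 the low bit of the integer value is always zero, so converting half the value and shifting left by one reproduces it exactly.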
[Mesa-dev] [PATCH 1/3] gallivm: handle huge number of immediates
We only supported up to 256 immediates, which isn't enough. We had code
which allocated the immediates as a dynamically allocated array, but it
was always used alongside a statically backed array for performance
reasons. This commit adds code to skip that performance optimization and
always use just the dynamically allocated immediates if the number of
them is too great.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h     |   2 +-
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 112
 2 files changed, 77 insertions(+), 37 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
index 1a93951..46f7d77 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
@@ -482,7 +482,7 @@ struct lp_build_tgsi_soa_context
    struct lp_exec_mask exec_mask;
 
    uint num_immediates;
-
+   boolean use_immediates_array;
 };
 
 void
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 7c5de21..067e6af 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -1295,33 +1295,42 @@ emit_fetch_immediate(
    LLVMBuilderRef builder = gallivm->builder;
    LLVMValueRef res = NULL;
 
-   if (reg->Register.Indirect) {
-      LLVMValueRef indirect_index;
-      LLVMValueRef index_vec;  /* index into the immediate register array */
+   if (bld->use_immediates_array || reg->Register.Indirect) {
       LLVMValueRef imms_array;
       LLVMTypeRef fptr_type;
 
-      indirect_index = get_indirect_index(bld,
-                                          reg->Register.File,
-                                          reg->Register.Index,
-                                          &reg->Indirect);
-      /*
-       * Unlike for other reg classes, adding pixel offsets is unnecessary -
-       * immediates are stored as full vectors (FIXME??? - might be better
-       * to store them the same as constants) but all elements are the same
-       * in any case.
-       */
-      index_vec = get_soa_array_offsets(&bld_base->uint_bld,
-                                        indirect_index,
-                                        swizzle,
-                                        FALSE);
-
       /* cast imms_array pointer to float* */
       fptr_type = LLVMPointerType(LLVMFloatTypeInContext(gallivm->context), 0);
       imms_array = LLVMBuildBitCast(builder, bld->imms_array, fptr_type, "");
 
-      /* Gather values from the immediate register array */
-      res = build_gather(&bld_base->base, imms_array, index_vec, NULL);
+      if (reg->Register.Indirect) {
+         LLVMValueRef indirect_index;
+         LLVMValueRef index_vec;  /* index into the immediate register array */
+
+         indirect_index = get_indirect_index(bld,
+                                             reg->Register.File,
+                                             reg->Register.Index,
+                                             &reg->Indirect);
+         /*
+          * Unlike for other reg classes, adding pixel offsets is unnecessary -
+          * immediates are stored as full vectors (FIXME??? - might be better
+          * to store them the same as constants) but all elements are the same
+          * in any case.
+          */
+         index_vec = get_soa_array_offsets(&bld_base->uint_bld,
+                                           indirect_index,
+                                           swizzle,
+                                           FALSE);
+
+         /* Gather values from the immediate register array */
+         res = build_gather(&bld_base->base, imms_array, index_vec, NULL);
+      } else {
+         LLVMValueRef lindex = lp_build_const_int32(gallivm,
+                                        reg->Register.Index * 4 + swizzle);
+         LLVMValueRef imms_ptr = LLVMBuildGEP(builder,
+                                              bld->imms_array, &lindex, 1, "");
+         res = LLVMBuildLoad(builder, imms_ptr, "");
+      }
    }
    else {
       res = bld->immediates[reg->Register.Index][swizzle];
@@ -2728,51 +2737,71 @@ void lp_emit_immediate_soa(
 {
    struct lp_build_tgsi_soa_context *bld = lp_soa_context(bld_base);
    struct gallivm_state * gallivm = bld_base->base.gallivm;
-
-   /* simply copy the immediate values into the next immediates[] slot */
+   LLVMValueRef imms[4];
    unsigned i;
    const uint size = imm->Immediate.NrTokens - 1;
    assert(size <= 4);
-   assert(bld->num_immediates < LP_MAX_TGSI_IMMEDIATES);
    switch (imm->Immediate.DataType) {
    case TGSI_IMM_FLOAT32:
       for( i = 0; i < size; ++i )
-         bld->immediates[bld->num_immediates][i] =
-            lp_build_const_vec(gallivm, bld_base->base.type, imm->u[i].Float);
+         imms[i] =
+               lp_build_const_vec(gallivm, bld_base->base.type, imm->u[i].Float);
       break;
    case TGSI_IMM_UINT32
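The storage switch this patch makes can be modelled on the CPU side. Below is a minimal sketch in plain C, assuming a hypothetical `imm_store` type and a 256-entry static limit standing in for `LP_MAX_TGSI_IMMEDIATES`; the real patch emits LLVM IR (a GEP plus a load from `imms_array`) instead of running code like this. The point it shows is that the choice between the fast static array and the heap array is made once, up front, and every later fetch goes through the same accessor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the static limit (LP_MAX_TGSI_IMMEDIATES). */
#define MAX_STATIC_IMMEDIATES 256

struct imm_store {
   float fixed[MAX_STATIC_IMMEDIATES][4]; /* fast path: small shaders */
   float (*array)[4];                     /* heap path: too many immediates */
   unsigned capacity;
   unsigned count;
   bool use_array;
};

/* Choose the storage once, from the total number of immediates. */
static bool
imm_store_init(struct imm_store *s, unsigned total)
{
   s->count = 0;
   s->array = NULL;
   s->use_array = total > MAX_STATIC_IMMEDIATES;
   s->capacity = s->use_array ? total : MAX_STATIC_IMMEDIATES;
   if (s->use_array) {
      s->array = calloc(total, sizeof *s->array);
      if (!s->array)
         return false;
   }
   return true;
}

/* Append one vec4 immediate to whichever storage was chosen. */
static void
imm_store_emit(struct imm_store *s, const float v[4])
{
   float *dst;

   assert(s->count < s->capacity);
   dst = s->use_array ? s->array[s->count] : s->fixed[s->count];
   for (unsigned i = 0; i < 4; ++i)
      dst[i] = v[i];
   s->count++;
}

/* Read one channel back; callers never need to know which path is used. */
static float
imm_store_fetch(const struct imm_store *s, unsigned index, unsigned chan)
{
   assert(index < s->count && chan < 4);
   return s->use_array ? s->array[index][chan] : s->fixed[index][chan];
}

static void
imm_store_fini(struct imm_store *s)
{
   free(s->array);
   s->array = NULL;
}
```

Keeping the static array for small shaders preserves the original fast path; only shaders that exceed the limit pay for the heap array.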
[Mesa-dev] [PATCH 2/3] gallivm: make sure analysis works with large number of immediates
We need to handle a lot more immediates, and in order to do that we also
switch from allocating this structure on the stack to allocating it on
the heap.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c
index 184790b..ce0598d 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_info.c
@@ -47,7 +47,7 @@ struct analysis_context
    struct lp_tgsi_info *info;
 
    unsigned num_imms;
-   float imm[128][4];
+   float imm[4096][4];
 
    struct lp_tgsi_channel_info temp[32][4];
 };
@@ -487,7 +487,7 @@ lp_build_tgsi_info(const struct tgsi_token *tokens,
                    struct lp_tgsi_info *info)
 {
    struct tgsi_parse_context parse;
-   struct analysis_context ctx;
+   struct analysis_context *ctx;
    unsigned index;
    unsigned chan;
 
@@ -495,8 +495,8 @@ lp_build_tgsi_info(const struct tgsi_token *tokens,
 
    tgsi_scan_shader(tokens, &info->base);
 
-   memset(&ctx, 0, sizeof ctx);
-   ctx.info = info;
+   ctx = CALLOC(1, sizeof(struct analysis_context));
+   ctx->info = info;
 
    tgsi_parse_init(&parse, tokens);
 
@@ -518,7 +518,7 @@ lp_build_tgsi_info(const struct tgsi_token *tokens,
                goto finished;
             }
 
-            analyse_instruction(&ctx, inst);
+            analyse_instruction(ctx, inst);
          }
          break;
 
@@ -527,16 +527,16 @@ lp_build_tgsi_info(const struct tgsi_token *tokens,
             const unsigned size =
                   parse.FullToken.FullImmediate.Immediate.NrTokens - 1;
             assert(size <= 4);
-            if (ctx.num_imms < Elements(ctx.imm)) {
+            if (ctx->num_imms < Elements(ctx->imm)) {
                for (chan = 0; chan < size; ++chan) {
                   float value = parse.FullToken.FullImmediate.u[chan].Float;
-                  ctx.imm[ctx.num_imms][chan] = value;
+                  ctx->imm[ctx->num_imms][chan] = value;
 
                   if (value < 0.0f || value > 1.0f) {
                      info->unclamped_immediates = TRUE;
                   }
                }
-               ++ctx.num_imms;
+               ++ctx->num_imms;
             }
          }
          break;
@@ -551,6 +551,7 @@ lp_build_tgsi_info(const struct tgsi_token *tokens,
 finished:
 
    tgsi_parse_free(&parse);
+   FREE(ctx);
 
 
    /*
-- 
1.8.3.2

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
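Growing `imm` to 4096 vec4 entries makes the context about 64 KiB of floats, which is the reason it moves off the stack. A minimal sketch of the same allocate-zeroed / fill / free pattern, assuming a hypothetical `scan_immediates()` helper and plain `calloc`/`free` in place of Mesa's `CALLOC`/`FREE` wrappers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_IMMS 4096

/* Same shape as the patched analysis_context, minus the other members. */
struct analysis_ctx {
   unsigned num_imms;
   float imm[MAX_IMMS][4];
};

/* Hypothetical helper: records up to MAX_IMMS immediates, ignoring any
 * beyond that (like the Elements() guard in the patch), and reports
 * whether any channel lies outside [0, 1]. Returns -1 on allocation
 * failure, 1 if an unclamped value was seen, 0 otherwise. */
static int
scan_immediates(float (*imms)[4], unsigned n)
{
   struct analysis_ctx *ctx = calloc(1, sizeof *ctx); /* zeroed, like memset */
   bool unclamped = false;

   if (!ctx)
      return -1;

   for (unsigned i = 0; i < n; ++i) {
      if (ctx->num_imms >= MAX_IMMS)
         break;                  /* full: drop the excess */
      for (unsigned chan = 0; chan < 4; ++chan) {
         float value = imms[i][chan];
         ctx->imm[ctx->num_imms][chan] = value;
         if (value < 0.0f || value > 1.0f)
            unclamped = true;
      }
      ++ctx->num_imms;
   }

   free(ctx);                    /* pairs with calloc, like FREE(ctx) */
   return unclamped ? 1 : 0;
}
```

Every exit path after the allocation must release the context, which is why the patch places `FREE(ctx)` at the shared `finished:` label.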
[Mesa-dev] [PATCH 3/3] tgsi/ureg: increase the number of immediates
ureg_program is allocated on the heap so we can just bump the number of
immediates that it can handle. It's needed for d3d10.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/tgsi/tgsi_ureg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
index f06858e..f928f57 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
@@ -77,7 +77,7 @@ struct ureg_tokens {
 #define UREG_MAX_SYSTEM_VALUE PIPE_MAX_ATTRIBS
 #define UREG_MAX_OUTPUT PIPE_MAX_SHADER_OUTPUTS
 #define UREG_MAX_CONSTANT_RANGE 32
-#define UREG_MAX_IMMEDIATE 256
+#define UREG_MAX_IMMEDIATE 4096
 #define UREG_MAX_ADDR 2
 #define UREG_MAX_PRED 1
 #define UREG_MAX_ARRAY_TEMPS 256
-- 
1.8.3.2
Re: [Mesa-dev] [PATCH 3/3] tgsi/ureg: increase the number of immediates
Yes, they simply always behave as if they were accessed indirectly from
our code, but llvm seems to be pretty good at moving all of those
accesses to registers (a.k.a. eliminating allocas) if they're not
actually indirectly indexed, so it all ends up pretty.

z

----- Original Message -----
> On 05.02.2014 01:34, Zack Rusin wrote:
> > ureg_program is allocated on the heap so we can just bump the number of
> > immediates that it can handle. It's needed for d3d10.
> >
> > Signed-off-by: Zack Rusin <za...@vmware.com>
> > [...]
>
> Series looks good to me.
> llvm can still perform all optimizations on such immediates, right?
>
> Roland
Re: [Mesa-dev] [PATCH 1/3] gallivm: handle huge number of immediates
> > reasons. This commit adds code to skip that performance optimization
> > and always use just the dynamically allocated immediates if the number
> > of them is too great.
>
> So is there any limit on the number of immediates now?

Technically not. Practically, other parts of the code will max out and
assert at anything greater than 4096, which is what sm4 defines as the
maximum for temps. So at least theoretically the gallivm code will just
work if that limit is increased elsewhere.

z
[Mesa-dev] [PATCH] gallivm: allow large numbers of temporaries
The number of allowed temporaries increases with almost every iteration
of an API. We used to support 128, then we started increasing it, and
the newer APIs support 4096+. So if we notice that the number of
temporaries is larger than our statically allocated storage would
allow, we just treat them as indexable temporaries and allocate them as
an array from the start.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 9db41a9..7c5de21 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -2672,8 +2672,8 @@ lp_emit_declaration_soa(
    assert(last <= bld->bld_base.info->file_max[decl->Declaration.File]);
    switch (decl->Declaration.File) {
    case TGSI_FILE_TEMPORARY:
-      assert(idx < LP_MAX_TGSI_TEMPS);
       if (!(bld->indirect_files & (1 << TGSI_FILE_TEMPORARY))) {
+         assert(idx < LP_MAX_TGSI_TEMPS);
          for (i = 0; i < TGSI_NUM_CHANNELS; i++)
             bld->temps[idx][i] = lp_build_alloca(gallivm, vec_type, "temp");
       }
@@ -3621,6 +3621,15 @@ lp_build_tgsi_soa(struct gallivm_state *gallivm,
    bld.bld_base.info = info;
    bld.indirect_files = info->indirect_files;
 
+   /*
+    * If the number of temporaries is rather large then we just
+    * allocate them as an array right from the start and treat
+    * like indirect temporaries.
+    */
+   if (info->file_max[TGSI_FILE_TEMPORARY] >= LP_MAX_TGSI_TEMPS) {
+      bld.indirect_files |= (1 << TGSI_FILE_TEMPORARY);
+   }
+
    bld.bld_base.soa = TRUE;
    bld.bld_base.emit_debug = emit_debug;
    bld.bld_base.emit_fetch_funcs[TGSI_FILE_CONSTANT] = emit_fetch_constant;
-- 
1.8.3.2
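The decision in `lp_build_tgsi_soa()` is a single bitmask update keyed on the highest declared temporary index. A minimal sketch, assuming hypothetical stand-ins `FILE_TEMPORARY` and `MAX_STATIC_TEMPS` (not the real `TGSI_FILE_TEMPORARY` value or `LP_MAX_TGSI_TEMPS` limit):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values, standing in for TGSI_FILE_TEMPORARY and
 * LP_MAX_TGSI_TEMPS. */
#define FILE_TEMPORARY   4
#define MAX_STATIC_TEMPS 128

/* file_max holds the highest declared index, so an index equal to the
 * static limit is already one past the end of a MAX_STATIC_TEMPS array;
 * hence ">=" rather than ">". */
static unsigned
promote_large_temp_file(unsigned indirect_files, int temp_file_max)
{
   if (temp_file_max >= MAX_STATIC_TEMPS)
      indirect_files |= 1u << FILE_TEMPORARY;
   return indirect_files;
}

/* When the bit is set, temporaries live in one array instead of
 * per-register allocas, exactly as if they were indirectly addressed. */
static bool
temps_use_array(unsigned indirect_files)
{
   return (indirect_files & (1u << FILE_TEMPORARY)) != 0;
}
```

Reusing the existing "indirect temporaries" path means no new storage scheme is needed; large shaders simply take the array route that indirect addressing already required.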
[Mesa-dev] [PATCH 1/3] d3d10: allow indexable temporaries as relative registers
Indexable temporaries are 2d (the index of the array and the index
within the array) and can be used as outputs, as inputs and as relative
addressing registers. This fixes parsing of indexable temporaries and
their handling in relative addressing.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/state_trackers/d3d10/ShaderParse.c | 14 ++
 src/gallium/state_trackers/d3d10/ShaderParse.h |  2 +-
 src/gallium/state_trackers/d3d10/ShaderTGSI.c  |  8 +++-
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/src/gallium/state_trackers/d3d10/ShaderParse.c b/src/gallium/state_trackers/d3d10/ShaderParse.c
index 38ec2fe..7cec385 100644
--- a/src/gallium/state_trackers/d3d10/ShaderParse.c
+++ b/src/gallium/state_trackers/d3d10/ShaderParse.c
@@ -207,13 +207,19 @@ parse_relative_operand(const unsigned **curr,
    assert(operand->type != D3D10_SB_OPERAND_TYPE_IMMEDIATE32);
 
    /* Index dimension. */
-   assert(DECODE_D3D10_SB_OPERAND_INDEX_DIMENSION(**curr) == D3D10_SB_OPERAND_INDEX_1D);
    assert(DECODE_D3D10_SB_OPERAND_INDEX_REPRESENTATION(0, **curr) ==
           D3D10_SB_OPERAND_INDEX_IMMEDIATE32);
-   (*curr)++;
-
-   operand->index[0].imm = **curr;
+   if (DECODE_D3D10_SB_OPERAND_INDEX_DIMENSION(**curr) == D3D10_SB_OPERAND_INDEX_1D) {
+      (*curr)++;
+      operand->index[0].imm = **curr;
+   } else {
+      assert(DECODE_D3D10_SB_OPERAND_INDEX_DIMENSION(**curr) == D3D10_SB_OPERAND_INDEX_2D);
+      (*curr)++;
+      operand->index[0].imm = **curr;
+      (*curr)++;
+      operand->index[1].imm = **curr;
+   }
 
    (*curr)++;
 }
diff --git a/src/gallium/state_trackers/d3d10/ShaderParse.h b/src/gallium/state_trackers/d3d10/ShaderParse.h
index 64f177c..5971864 100644
--- a/src/gallium/state_trackers/d3d10/ShaderParse.h
+++ b/src/gallium/state_trackers/d3d10/ShaderParse.h
@@ -54,7 +54,7 @@ struct Shader_relative_index {
 
 struct Shader_relative_operand {
    D3D10_SB_OPERAND_TYPE type;
-   struct Shader_relative_index index[1];
+   struct Shader_relative_index index[2];
    D3D10_SB_4_COMPONENT_NAME comp;
 };
 
diff --git a/src/gallium/state_trackers/d3d10/ShaderTGSI.c b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
index 9fb6b1d..2e42b8b 100644
--- a/src/gallium/state_trackers/d3d10/ShaderTGSI.c
+++ b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
@@ -637,9 +637,15 @@ translate_relative_operand(struct Shader_xlate *sx,
       reg = sx->prim_id;
       break;
 
+   case D3D10_SB_OPERAND_TYPE_INDEXABLE_TEMP:
+      assert(operand->index[1].imm < SHADER_MAX_TEMPS);
+
+      reg = ureg_src(sx->temps[sx->indexable_temp_offsets[operand->index[0].imm] +
+                               operand->index[1].imm]);
+      break;
+
    case D3D10_SB_OPERAND_TYPE_INPUT:
    case D3D10_SB_OPERAND_TYPE_OUTPUT:
-   case D3D10_SB_OPERAND_TYPE_INDEXABLE_TEMP:
    case D3D10_SB_OPERAND_TYPE_IMMEDIATE32:
    case D3D10_SB_OPERAND_TYPE_IMMEDIATE64:
    case D3D10_SB_OPERAND_TYPE_SAMPLER:
-- 
1.8.3.2
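The new `D3D10_SB_OPERAND_TYPE_INDEXABLE_TEMP` case turns the 2d operand into a flat temporary index: `indexable_temp_offsets[index[0]] + index[1]`. A minimal sketch of that flattening, assuming (this layout is not shown in the patch) that indexable arrays are packed back to back after the ordinary temporaries; `layout_indexable_temps()`, `flatten_indexable_temp()` and `MAX_TEMPS` are hypothetical:

```c
#include <assert.h>

/* Hypothetical stand-in for SHADER_MAX_TEMPS. */
#define MAX_TEMPS 4096

/* Assumed layout: array i starts at offsets[i], and arrays are packed one
 * after another starting at first_free. */
static void
layout_indexable_temps(unsigned first_free, const unsigned *sizes,
                       unsigned count, unsigned *offsets)
{
   for (unsigned i = 0; i < count; ++i) {
      offsets[i] = first_free;
      first_free += sizes[i];
   }
}

/* x<array>[<elem>]: index[0] picks the array, index[1] the element. */
static unsigned
flatten_indexable_temp(const unsigned *offsets, unsigned array, unsigned elem)
{
   unsigned flat = offsets[array] + elem;

   /* Stricter than the patch, which only checks index[1]. */
   assert(flat < MAX_TEMPS);
   return flat;
}
```

For example, with arrays of 4 and 8 elements placed after 16 ordinary temporaries, element 3 of the second array lands on flat temporary 23.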
[Mesa-dev] [PATCH 2/3] d3d10: allow indirect addressing on outputs
Outputs can have relative addressing. This adds basic support for it.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/state_trackers/d3d10/ShaderTGSI.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/src/gallium/state_trackers/d3d10/ShaderTGSI.c b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
index 2e42b8b..1cf9e0e 100644
--- a/src/gallium/state_trackers/d3d10/ShaderTGSI.c
+++ b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
@@ -687,20 +687,26 @@ translate_operand(struct Shader_xlate *sx,
 
    case D3D10_SB_OPERAND_TYPE_OUTPUT:
       assert(operand->index_dim == 1);
-      assert(operand->index[0].index_rep == D3D10_SB_OPERAND_INDEX_IMMEDIATE32);
       assert(operand->index[0].imm < SHADER_MAX_OUTPUTS);
 
-      if (!writemask) {
-         reg = sx->outputs[operand->index[0].imm].reg[0];
-      } else {
-         unsigned i;
-         for (i = 0; i < 4; ++i) {
-            unsigned mask = 1 << i;
-            if ((writemask & mask)) {
-               reg = sx->outputs[operand->index[0].imm].reg[i];
-               break;
+      if (operand->index[0].index_rep == D3D10_SB_OPERAND_INDEX_IMMEDIATE32) {
+         if (!writemask) {
+            reg = sx->outputs[operand->index[0].imm].reg[0];
+         } else {
+            unsigned i;
+            for (i = 0; i < 4; ++i) {
+               unsigned mask = 1 << i;
+               if ((writemask & mask)) {
+                  reg = sx->outputs[operand->index[0].imm].reg[i];
+                  break;
+               }
             }
          }
+      } else {
+         struct ureg_src addr =
+            translate_relative_operand(sx, &operand->index[0].rel);
+         assert(operand->index[0].index_rep == D3D10_SB_OPERAND_INDEX_IMMEDIATE32_PLUS_RELATIVE);
+         reg = ureg_dst_indirect(sx->outputs[operand->index[0].imm].reg[0], addr);
       }
       break;
 
-- 
1.8.3.2
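On the immediate-index path, the loop in `translate_operand()` picks the output register of the first channel enabled in the writemask, falling back to channel 0 when the mask is empty. A minimal standalone sketch of that selection; `first_written_channel()` is a hypothetical name, and masks with bits above `w` are out of scope:

```c
#include <assert.h>

/* Returns the first channel enabled in a 4-bit writemask (bit 0 = x,
 * bit 3 = w), or 0 for an empty mask. */
static unsigned
first_written_channel(unsigned writemask)
{
   for (unsigned i = 0; i < 4; ++i) {
      unsigned mask = 1u << i;
      if (writemask & mask)
         return i;
   }
   return 0;
}
```

For `.zw` (mask `0xC`) this returns 2, so the write is routed through `reg[2]`.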
[Mesa-dev] [PATCH 3/3] d3d10: support 1d indirect addressing on inputs
We supported 2d indirect addressing (gs tests were using it) but not 1d
indirect addressing (which can be used in vs and ps). This adds support
for 1d indirect addressing.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/state_trackers/d3d10/ShaderTGSI.c | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/gallium/state_trackers/d3d10/ShaderTGSI.c b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
index 1cf9e0e..76126c5 100644
--- a/src/gallium/state_trackers/d3d10/ShaderTGSI.c
+++ b/src/gallium/state_trackers/d3d10/ShaderTGSI.c
@@ -828,11 +828,29 @@ translate_src_operand(struct Shader_xlate *sx,
    switch (operand->base.type) {
    case D3D10_SB_OPERAND_TYPE_INPUT:
       if (operand->base.index_dim == 1) {
-         assert(operand->base.index[0].index_rep ==
-                D3D10_SB_OPERAND_INDEX_IMMEDIATE32);
-         assert(operand->base.index[0].imm < SHADER_MAX_INPUTS);
+         switch (operand->base.index[0].index_rep) {
+         case D3D10_SB_OPERAND_INDEX_IMMEDIATE32:
+            assert(operand->base.index[0].imm < SHADER_MAX_INPUTS);
+            reg = sx->inputs[operand->base.index[0].imm].reg;
+            break;
+         case D3D10_SB_OPERAND_INDEX_RELATIVE: {
+            struct ureg_src tmp =
+               translate_relative_operand(sx, &operand->base.index[0].rel);
+            reg = ureg_src_indirect(sx->inputs[0].reg, tmp);
+         }
+            break;
+         case D3D10_SB_OPERAND_INDEX_IMMEDIATE32_PLUS_RELATIVE: {
+            struct ureg_src tmp =
+               translate_relative_operand(sx, &operand->base.index[0].rel);
+            reg = ureg_src_indirect(sx->inputs[operand->base.index[0].imm].reg, tmp);
+         }
+            break;
+         default:
+            /* XXX: Other index representations.
+             */
+            LOG_UNSUPPORTED(TRUE);
 
-         reg = sx->inputs[operand->base.index[0].imm].reg;
+         }
       } else {
          assert(operand->base.index_dim == 2);
          assert(operand->base.index[1].imm < SHADER_MAX_INPUTS);
-- 
1.8.3.2
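The three cases differ only in the base register the indirect access starts from: `inputs[imm]` for a plain immediate, `inputs[0]` plus the address register for `RELATIVE`, and `inputs[imm]` plus the address register for `IMMEDIATE32_PLUS_RELATIVE`. A minimal sketch of which input register those emitted accesses end up reading at run time, assuming a hypothetical `index_rep` enum and `MAX_INPUTS` limit (the translator itself only emits indirect registers; it never computes this index):

```c
#include <assert.h>

/* Hypothetical mirror of the index representations handled in the patch;
 * the real values come from D3D10_SB_OPERAND_INDEX_REPRESENTATION. */
enum index_rep {
   INDEX_IMMEDIATE,          /* v[imm]       */
   INDEX_RELATIVE,           /* v[r.x]       */
   INDEX_IMMEDIATE_PLUS_REL, /* v[imm + r.x] */
   INDEX_OTHER               /* unsupported  */
};

/* Hypothetical stand-in for SHADER_MAX_INPUTS. */
#define MAX_INPUTS 32

/* Input register read by a 1d access, given the immediate part and the
 * run-time value of the relative address register. Returns -1 for an
 * unsupported representation or an out-of-range result. */
static int
resolve_input_index(enum index_rep rep, unsigned imm, int rel)
{
   int idx;

   switch (rep) {
   case INDEX_IMMEDIATE:
      idx = (int)imm;
      break;
   case INDEX_RELATIVE:
      idx = rel;               /* indirect off inputs[0] */
      break;
   case INDEX_IMMEDIATE_PLUS_REL:
      idx = (int)imm + rel;    /* indirect off inputs[imm] */
      break;
   default:
      return -1;
   }
   return (idx >= 0 && idx < MAX_INPUTS) ? idx : -1;
}
```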
Re: [Mesa-dev] [PATCH] gallivm: fix opcode and function nesting
----- Original Message -----
> On 28.01.2014 23:08, Zack Rusin wrote:
> > gallivm soa code supported only a single level of nesting for control
> > flow opcodes (if, switch, loops...) but the d3d10 spec clearly states
> > that those are nested within functions. To support nesting of
> > conditionals inside functions we need to store the nesting data inside
> > function contexts and keep a stack of those.
> > Furthermore we make sure that if nesting for subroutines is deeper than
> > 32 then we simply ignore all subsequent 'call' invocations.
>
> Hmm, I thought nesting worked just fine, except for the fact that when
> using just one stack we'd have needed a much larger one. Wasn't that
> true? (Not arguing about using per-function stacks, just curious.)

The issue is that the d3d10 spec is very specific about the nesting
requirement being per-subroutine. So just increasing those nesting
levels wouldn't really work, because 63 levels in one subroutine and 65
in another isn't necessarily equivalent to a code path that has 64
levels in each of two subroutines (even though both add up to the same
total). We get away with it for conditionals but not for function
calls. It's also worth noting that the patch handles overflows, which
WHCK explicitly tests for and which we were just crashing on. The
overflow behavior is undefined only for conditionals; subroutine calls
above level 32 /have/ to be ignored (again, WHCK tests for it).

> > +   ctx->loop_stack[ctx->loop_stack_size].loop_block = ctx->loop_block;
> > +   ctx->loop_stack[ctx->loop_stack_size].cont_mask = mask->cont_mask;
> > +   ctx->loop_stack[ctx->loop_stack_size].break_mask = mask->break_mask;
>
> I am confused why some assignments use the variables from ctx and some
> from mask here.

The masks are in general global (the 'call' opcode could have been
inside switches, loops and/or conditionals), so the function contexts
push/pop the global masks and need to operate on those. Things that are
not masks are in general per-function-context, which means that we can
just store them in function contexts.

> As mentioned inline, I don't quite get when the values from mask or ctx
> are used. This might well be correct as this is tricky stuff and the
> diff is difficult to understand.
> Otherwise this looks good to me.
> Would that also help when we'd switch to not always inline all
> functions?

Yea, I think it would, but then we would need a lot of other changes
(storing those masks in some struct inside some global object that each
function can reference). But yea, this code isn't the cleanest; function
calls, conditionals, loops and switches are inherently difficult in SoA
mode, so there's not a lot we can do. We need to store the nesting data
inside something that resembles a function context because d3d is very
clear that that's what it wants, so anything else would be a hack that
just tries to imitate that behavior, and that's going to be uglier than
this code.

z
[Mesa-dev] [PATCH] gallivm: fix opcode and function nesting
gallivm soa code supported only a single level of nesting for control
flow opcodes (if, switch, loops...) but the d3d10 spec clearly states
that those are nested within functions. To support nesting of
conditionals inside functions we need to store the nesting data inside
function contexts and keep a stack of those.
Furthermore we make sure that if nesting for subroutines is deeper than
32 then we simply ignore all subsequent 'call' invocations.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h     |  72 ++---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 377
 2 files changed, 292 insertions(+), 157 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
index 4f988b8..839ab85 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
@@ -260,49 +260,51 @@ struct lp_exec_mask {
 
    LLVMTypeRef int_vec_type;
 
-   LLVMValueRef cond_stack[LP_MAX_TGSI_NESTING];
-   int cond_stack_size;
-   LLVMValueRef cond_mask;
-
-   /* keep track if break belongs to switch or loop */
-   enum lp_exec_mask_break_type break_type_stack[LP_MAX_TGSI_NESTING];
-   enum lp_exec_mask_break_type break_type;
+   LLVMValueRef exec_mask;
 
-   struct {
-      LLVMValueRef switch_val;
-      LLVMValueRef switch_mask;
-      LLVMValueRef switch_mask_default;
-      boolean switch_in_default;
-      unsigned switch_pc;
-   } switch_stack[LP_MAX_TGSI_NESTING];
-   int switch_stack_size;
-   LLVMValueRef switch_val;
+   LLVMValueRef ret_mask;
+   LLVMValueRef cond_mask;
    LLVMValueRef switch_mask;         /* current switch exec mask */
-   LLVMValueRef switch_mask_default; /* reverse of switch mask used for default */
-   boolean switch_in_default;        /* if switch exec is currently in default */
-   unsigned switch_pc;               /* when used points to default or endswitch-1 */
-
-   LLVMBasicBlockRef loop_block;
    LLVMValueRef cont_mask;
    LLVMValueRef break_mask;
-   LLVMValueRef break_var;
-   struct {
-      LLVMBasicBlockRef loop_block;
-      LLVMValueRef cont_mask;
-      LLVMValueRef break_mask;
-      LLVMValueRef break_var;
-   } loop_stack[LP_MAX_TGSI_NESTING];
-   int loop_stack_size;
 
-   LLVMValueRef ret_mask;
-   struct {
+   struct function_ctx {
       int pc;
       LLVMValueRef ret_mask;
-   } call_stack[LP_MAX_TGSI_NESTING];
-   int call_stack_size;
 
-   LLVMValueRef exec_mask;
-   LLVMValueRef loop_limiter;
+      LLVMValueRef cond_stack[LP_MAX_TGSI_NESTING];
+      int cond_stack_size;
+
+      /* keep track if break belongs to switch or loop */
+      enum lp_exec_mask_break_type break_type_stack[LP_MAX_TGSI_NESTING];
+      enum lp_exec_mask_break_type break_type;
+
+      struct {
+         LLVMValueRef switch_val;
+         LLVMValueRef switch_mask;
+         LLVMValueRef switch_mask_default;
+         boolean switch_in_default;
+         unsigned switch_pc;
+      } switch_stack[LP_MAX_TGSI_NESTING];
+      int switch_stack_size;
+      LLVMValueRef switch_val;
+      LLVMValueRef switch_mask_default; /* reverse of switch mask used for default */
+      boolean switch_in_default;        /* if switch exec is currently in default */
+      unsigned switch_pc;               /* when used points to default or endswitch-1 */
+
+      LLVMValueRef loop_limiter;
+      LLVMBasicBlockRef loop_block;
+      LLVMValueRef break_var;
+      struct {
+         LLVMBasicBlockRef loop_block;
+         LLVMValueRef cont_mask;
+         LLVMValueRef break_mask;
+         LLVMValueRef break_var;
+      } loop_stack[LP_MAX_TGSI_NESTING];
+      int loop_stack_size;
+
+   } *function_stack;
+   int function_stack_size;
 };
 
 struct lp_build_tgsi_inst_list
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index f01b50c..52e1b51 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -66,6 +66,10 @@
 #include "lp_bld_sample.h"
 #include "lp_bld_struct.h"
 
+/* SM 4.0 says that subroutines can nest 32 deep and
+ * we need one more for our main function */
+#define LP_MAX_NUM_FUNCS 33
+
 #define DUMP_GS_EMITS 0
 
 /*
@@ -98,38 +102,108 @@ emit_dump_reg(struct gallivm_state *gallivm,
    lp_build_print_value(gallivm, buf, value);
 }
 
+static INLINE struct function_ctx *
+func_ctx(struct lp_exec_mask *mask)
+{
+   assert(mask->function_stack_size > 0);
+   assert(mask->function_stack_size <= LP_MAX_NUM_FUNCS);
+   return &mask->function_stack[mask->function_stack_size - 1];
+}
 
-static void lp_exec_mask_init(struct lp_exec_mask *mask, struct lp_build_context *bld)
+static INLINE boolean
+mask_has_loop(struct lp_exec_mask *mask)
 {
-   LLVMTypeRef int_type = LLVMInt32TypeInContext(bld->gallivm->context);
-   LLVMBuilderRef builder = bld->gallivm->builder;
+   int i;
+   for (i = mask->function_stack_size - 1; i >= 0; --i
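The bookkeeping this patch introduces can be reduced to a small model: one context per active function (main plus up to 32 nested calls, hence `LP_MAX_NUM_FUNCS` 33), nesting counters that live in the current context, and calls past the limit that are skipped rather than treated as errors. A minimal sketch, assuming hypothetical names (`exec_state`, `exec_call`, `exec_push_cond`) and a hypothetical `MAX_NESTING` in place of `LP_MAX_TGSI_NESTING`; it models only the per-function counters, not the global execution masks that the reply above explains are saved and restored across calls:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* 32 levels of subroutine nesting plus the main function, as in the patch. */
#define MAX_NUM_FUNCS 33
/* Hypothetical per-function limit standing in for LP_MAX_TGSI_NESTING. */
#define MAX_NESTING 32

/* Per-function nesting state: every subroutine starts with empty stacks. */
struct func_ctx {
   int pc;              /* where to resume after this function returns */
   int cond_stack_size;
};

struct exec_state {
   struct func_ctx function_stack[MAX_NUM_FUNCS];
   int function_stack_size;
};

static void
exec_init(struct exec_state *s)
{
   memset(s, 0, sizeof *s);
   s->function_stack_size = 1;  /* the main function */
}

static struct func_ctx *
current_func(struct exec_state *s)
{
   assert(s->function_stack_size > 0);
   assert(s->function_stack_size <= MAX_NUM_FUNCS);
   return &s->function_stack[s->function_stack_size - 1];
}

/* Returns false when the call is skipped because nesting is at the limit. */
static bool
exec_call(struct exec_state *s, int return_pc)
{
   struct func_ctx *ctx;

   if (s->function_stack_size >= MAX_NUM_FUNCS)
      return false;
   ctx = &s->function_stack[s->function_stack_size++];
   memset(ctx, 0, sizeof *ctx);
   ctx->pc = return_pc;
   return true;
}

static int
exec_ret(struct exec_state *s)
{
   assert(s->function_stack_size > 1);
   return s->function_stack[--s->function_stack_size].pc;
}

/* Conditional nesting counts against the current function only. */
static bool
exec_push_cond(struct exec_state *s)
{
   struct func_ctx *ctx = current_func(s);

   if (ctx->cond_stack_size >= MAX_NESTING)
      return false;
   ctx->cond_stack_size++;
   return true;
}
```

Because each context starts with empty counters, a subroutine called from deep inside nested conditionals still gets its full nesting budget, which is the per-subroutine rule the reply above describes.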
[Mesa-dev] [PATCH] llvmpipe: fix possible constant buffer overflow
It's possible to bind a smaller buffer as a constant buffer than what
the shader actually uses/requires. This could cause nasty crashes.
This patch adds the architecture to pass the maximum allowable constant
buffer index to the jit to let it make sure that the constant buffer
indices are always within bounds.
The behavior follows the d3d10 spec, which says the overflow should
always return all zeros, and overflow is only defined as access beyond
the size of the currently bound buffer. Accesses beyond the declared
shader constant register size are not considered an overflow and are
expected to return garbage, but consistent garbage (we follow the
behavior which some WLK tests expect, which is to return the actual
values from the bound buffer).

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/draw/draw_llvm.c             | 42 ++
 src/gallium/auxiliary/draw/draw_llvm.h             | 32 +---
 .../draw/draw_pt_fetch_shade_pipeline_llvm.c       |  6 ++
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h        |  2 +
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c    | 89 ++
 src/gallium/drivers/llvmpipe/lp_jit.c              |  7 +-
 src/gallium/drivers/llvmpipe/lp_jit.h              |  5 ++
 src/gallium/drivers/llvmpipe/lp_setup.c            |  7 +-
 src/gallium/drivers/llvmpipe/lp_state_fs.c         |  5 +-
 9 files changed, 152 insertions(+), 43 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 331039a..0bbb680 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -242,17 +242,20 @@ create_jit_context_type(struct gallivm_state *gallivm,
 {
    LLVMTargetDataRef target = gallivm->target;
    LLVMTypeRef float_type = LLVMFloatTypeInContext(gallivm->context);
+   LLVMTypeRef int_type = LLVMInt32TypeInContext(gallivm->context);
    LLVMTypeRef elem_types[DRAW_JIT_CTX_NUM_FIELDS];
    LLVMTypeRef context_type;
 
    elem_types[0] = LLVMArrayType(LLVMPointerType(float_type, 0), /* vs_constants */
                                  LP_MAX_TGSI_CONST_BUFFERS);
-   elem_types[1] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
+   elem_types[1] = LLVMArrayType(int_type, /* num_vs_constants */
+                                 LP_MAX_TGSI_CONST_BUFFERS);
+   elem_types[2] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
                                                  DRAW_TOTAL_CLIP_PLANES), 0);
-   elem_types[2] = LLVMPointerType(float_type, 0); /* viewport */
-   elem_types[3] = LLVMArrayType(texture_type,
+   elem_types[3] = LLVMPointerType(float_type, 0); /* viewport */
+   elem_types[4] = LLVMArrayType(texture_type,
                                  PIPE_MAX_SHADER_SAMPLER_VIEWS); /* textures */
-   elem_types[4] = LLVMArrayType(sampler_type,
+   elem_types[5] = LLVMArrayType(sampler_type,
                                  PIPE_MAX_SAMPLERS); /* samplers */
    context_type = LLVMStructTypeInContext(gallivm->context, elem_types,
                                           Elements(elem_types), 0);
@@ -264,6 +267,8 @@ create_jit_context_type(struct gallivm_state *gallivm,
 
    LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, vs_constants,
                           target, context_type, DRAW_JIT_CTX_CONSTANTS);
+   LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, num_vs_constants,
+                          target, context_type, DRAW_JIT_CTX_NUM_CONSTANTS);
    LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, planes,
                           target, context_type, DRAW_JIT_CTX_PLANES);
    LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, viewport,
@@ -298,20 +303,22 @@ create_gs_jit_context_type(struct gallivm_state *gallivm,
 
    elem_types[0] = LLVMArrayType(LLVMPointerType(float_type, 0), /* constants */
                                  LP_MAX_TGSI_CONST_BUFFERS);
-   elem_types[1] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
+   elem_types[1] = LLVMArrayType(int_type, /* num_constants */
+                                 LP_MAX_TGSI_CONST_BUFFERS);
+   elem_types[2] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
                                                  DRAW_TOTAL_CLIP_PLANES), 0);
-   elem_types[2] = LLVMPointerType(float_type, 0); /* viewport */
+   elem_types[3] = LLVMPointerType(float_type, 0); /* viewport */
 
-   elem_types[3] = LLVMArrayType(texture_type,
+   elem_types[4] = LLVMArrayType(texture_type,
                                  PIPE_MAX_SHADER_SAMPLER_VIEWS); /* textures */
-   elem_types[4] = LLVMArrayType(sampler_type,
+   elem_types[5] = LLVMArrayType(sampler_type,
                                  PIPE_MAX_SAMPLERS); /* samplers */
-   elem_types[5] = LLVMPointerType(LLVMPointerType(int_type, 0), 0);
-   elem_types[6] = LLVMPointerType(LLVMVectorType(int_type,
-                                                  vector_length), 0);
+   elem_types[6] = LLVMPointerType
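The rule in the commit message (reads past the currently bound buffer return zeros) can be sketched in scalar C. This is an illustration, not the JIT code: the patch passes a per-buffer count (`num_vs_constants` / `num_constants`) in the jit context and the generated code does the equivalent with vector compares and selects. `fetch_constant()` and `fetch_constant_soa()` are hypothetical, and treating the count as a number of vec4 registers is an assumption.

```c
#include <assert.h>

/* Reads channel `chan` of constant register `index` from a bound buffer
 * holding num_regs vec4 registers. Out-of-range reads return 0.0f
 * without touching memory. */
static float
fetch_constant(const float *buf, unsigned num_regs, int index, unsigned chan)
{
   assert(chan < 4);
   if (index < 0 || (unsigned)index >= num_regs)
      return 0.0f;
   return buf[(unsigned)index * 4 + chan];
}

/* SoA flavour: each of the four lanes may carry a different indirect index. */
static void
fetch_constant_soa(const float *buf, unsigned num_regs,
                   const int index[4], unsigned chan, float out[4])
{
   for (unsigned lane = 0; lane < 4; ++lane)
      out[lane] = fetch_constant(buf, num_regs, index[lane], chan);
}
```

The bound is the size of the buffer actually bound, not the constant range the shader declares, which matches the WLK behavior described above.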
Re: [Mesa-dev] [PATCH] llvmpipe: fix primitive input to geom shaders
Yea, this sucks. Geometry shaders can take a primitive id (system value)
for passed-in primitives and generate one (semantic) for primitives
generated in the geometry shader. TBH, I thought we already handled
it... Maybe WLK doesn't test it; we'll see if it regresses.

z

----- Original Message -----
> Well, we were using a system value for prim id in gs, hence this was
> not necessary. I'm always confused about system value / normal
> semantic usage though, Zack might know better.
>
> Roland
>
> On 07.01.2014 09:55, Dave Airlie wrote:
> > Not sure this is 100% the correct way to do this, since it may be a
> > change at the glsl->tgsi level that is required, either way open
> > discussions!
> >
> > fixes piglit
> > tests/spec/glsl-1.50/execution/geometry/primitive-id-in.shader_test
> > with llvmpipe with fake MSAA
> >
> > Signed-off-by: Dave Airlie <airl...@redhat.com>
> > ---
> >  src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 5 +
> >  src/gallium/auxiliary/tgsi/tgsi_scan.c          | 3 +++
> >  2 files changed, 8 insertions(+)
> >
> > diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
> > index 6d8dc8c..de2c64f 100644
> > --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
> > +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
> > @@ -1173,6 +1173,10 @@ emit_fetch_gs_input(
> >     LLVMValueRef swizzle_index = lp_build_const_int32(gallivm, swizzle);
> >     LLVMValueRef res;
> >
> > +   if (bld_base->info->input_semantic_name[reg->Register.Index] == TGSI_SEMANTIC_PRIMID) {
> > +      res = bld->system_values.prim_id;
> > +      goto out;
> > +   }
> >     if (reg->Register.Indirect) {
> >        attrib_index = get_indirect_index(bld,
> >                                          reg->Register.File,
> > @@ -1200,6 +1204,7 @@ emit_fetch_gs_input(
> >
> >     assert(res);
> >
> > + out:
> >     if (stype == TGSI_TYPE_UNSIGNED) {
> >        res = LLVMBuildBitCast(builder, res, bld_base->uint_bld.vec_type, "");
> >     } else if (stype == TGSI_TYPE_SIGNED) {
> > diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.c b/src/gallium/auxiliary/tgsi/tgsi_scan.c
> > index 0f10556..ce1f7b6 100644
> > --- a/src/gallium/auxiliary/tgsi/tgsi_scan.c
> > +++ b/src/gallium/auxiliary/tgsi/tgsi_scan.c
> > @@ -198,6 +198,9 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
> >                 info->uses_primid = TRUE;
> >              else if (semName == TGSI_SEMANTIC_FACE)
> >                 info->uses_frontface = TRUE;
> > +         } else if (procType == TGSI_PROCESSOR_GEOMETRY) {
> > +            if (semName == TGSI_SEMANTIC_PRIMID)
> > +               info->uses_primid = TRUE;
> >           }
> >        }
> >        else if (file == TGSI_FILE_SYSTEM_VALUE) {
[Mesa-dev] [PATCH] llvmpipe: fix possible constant buffer overflow
It's possible to bind a constant buffer that is smaller than what the shader actually uses/requires, which could cause nasty crashes. This patch adds the infrastructure to pass the maximum allowable constant buffer index to the JIT, so that it can make sure constant buffer indices are always within bounds. It is currently only used for indirect addressing.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_llvm.c | 42 +++---
 src/gallium/auxiliary/draw/draw_llvm.h | 32 +++--
 .../draw/draw_pt_fetch_shade_pipeline_llvm.c | 6
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h | 2 ++
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 14 +++-
 src/gallium/drivers/llvmpipe/lp_jit.c | 7 +++-
 src/gallium/drivers/llvmpipe/lp_jit.h | 5 +++
 src/gallium/drivers/llvmpipe/lp_setup.c | 7 +++-
 src/gallium/drivers/llvmpipe/lp_state_fs.c | 6 ++--
 9 files changed, 92 insertions(+), 29 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 71cc45f..e5a3842 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -242,17 +242,20 @@ create_jit_context_type(struct gallivm_state *gallivm,
 {
    LLVMTargetDataRef target = gallivm->target;
    LLVMTypeRef float_type = LLVMFloatTypeInContext(gallivm->context);
+   LLVMTypeRef int_type = LLVMInt32TypeInContext(gallivm->context);
    LLVMTypeRef elem_types[DRAW_JIT_CTX_NUM_FIELDS];
    LLVMTypeRef context_type;
 
    elem_types[0] = LLVMArrayType(LLVMPointerType(float_type, 0), /* vs_constants */
                                  LP_MAX_TGSI_CONST_BUFFERS);
-   elem_types[1] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
+   elem_types[1] = LLVMArrayType(int_type, /* vs_constants_max_index */
+                                 LP_MAX_TGSI_CONST_BUFFERS);
+   elem_types[2] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
                                                  DRAW_TOTAL_CLIP_PLANES), 0);
-   elem_types[2] = LLVMPointerType(float_type, 0); /* viewport */
-   elem_types[3] = LLVMArrayType(texture_type,
+   elem_types[3] = LLVMPointerType(float_type, 0); /* viewport */
+   elem_types[4] = LLVMArrayType(texture_type,
                                  PIPE_MAX_SHADER_SAMPLER_VIEWS); /* textures */
-   elem_types[4] = LLVMArrayType(sampler_type,
+   elem_types[5] = LLVMArrayType(sampler_type,
                                  PIPE_MAX_SAMPLERS); /* samplers */
    context_type = LLVMStructTypeInContext(gallivm->context, elem_types,
                                           Elements(elem_types), 0);
@@ -264,6 +267,8 @@ create_jit_context_type(struct gallivm_state *gallivm,
    LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, vs_constants,
                           target, context_type, DRAW_JIT_CTX_CONSTANTS);
+   LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, vs_constants_max_index,
+                          target, context_type, DRAW_JIT_CTX_CONSTANTS_MAX_INDEX);
    LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, planes,
                           target, context_type, DRAW_JIT_CTX_PLANES);
    LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, viewport,
@@ -298,20 +303,22 @@ create_gs_jit_context_type(struct gallivm_state *gallivm,
    elem_types[0] = LLVMArrayType(LLVMPointerType(float_type, 0), /* constants */
                                  LP_MAX_TGSI_CONST_BUFFERS);
-   elem_types[1] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
+   elem_types[1] = LLVMArrayType(int_type, /* constants_max_index */
+                                 LP_MAX_TGSI_CONST_BUFFERS);
+   elem_types[2] = LLVMPointerType(LLVMArrayType(LLVMArrayType(float_type, 4),
                                                  DRAW_TOTAL_CLIP_PLANES), 0);
-   elem_types[2] = LLVMPointerType(float_type, 0); /* viewport */
+   elem_types[3] = LLVMPointerType(float_type, 0); /* viewport */
 
-   elem_types[3] = LLVMArrayType(texture_type,
+   elem_types[4] = LLVMArrayType(texture_type,
                                  PIPE_MAX_SHADER_SAMPLER_VIEWS); /* textures */
-   elem_types[4] = LLVMArrayType(sampler_type,
+   elem_types[5] = LLVMArrayType(sampler_type,
                                  PIPE_MAX_SAMPLERS); /* samplers */
-   elem_types[5] = LLVMPointerType(LLVMPointerType(int_type, 0), 0);
-   elem_types[6] = LLVMPointerType(LLVMVectorType(int_type,
-                                                  vector_length), 0);
+   elem_types[6] = LLVMPointerType(LLVMPointerType(int_type, 0), 0);
    elem_types[7] = LLVMPointerType(LLVMVectorType(int_type,
                                                   vector_length), 0);
+   elem_types[8] = LLVMPointerType(LLVMVectorType(int_type,
+                                                  vector_length), 0);
    context_type = LLVMStructTypeInContext(gallivm
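The idea behind the new `*_max_index` fields can be modeled in scalar C: before an indirectly addressed constant is read, the computed index is forced back into the range of the buffer that is actually bound. The sketch below is a hypothetical model, not draw/llvmpipe code. `fetch_const_clamped` is an invented name. It assumes the maximum index is counted in floats, and it assumes clamping as the in-bounds policy. The patch only says the JIT will keep indices within bounds; it does not specify how.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a bounds-checked constant fetch.
 * consts:    the bound constant buffer, 4 floats per vec4 slot
 * max_index: last valid float index in the bound buffer (modeled on
 *            the per-buffer "constants_max_index" the patch adds)
 * index:     flat float index from an indirect register
 *            (register * 4 + swizzle), possibly out of range */
static float
fetch_const_clamped(const float *consts, int32_t max_index, int32_t index)
{
   assert(max_index >= 0);
   /* Clamp instead of reading past the end of a too-small buffer. */
   if (index < 0)
      index = 0;
   if (index > max_index)
      index = max_index;
   return consts[index];
}
```

In the JIT-compiled code, the same check would presumably be applied to a whole vector of per-lane indices at once rather than to a single scalar.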
Re: [Mesa-dev] [PATCH] gallivm: fix pointer type for stmxcsr/ldmxcsr
Looks good. Thanks Roland!

- Original Message -
From: Roland Scheidegger srol...@vmware.com

The argument is an i8 pointer, not an i32 pointer (even though the value actually stored/loaded IS i32). Older llvm versions didn't care, but 3.2 and newer do, leading to crashes.
---
 src/gallium/auxiliary/gallivm/lp_bld_arit.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
index 440dd0b..e516ae8 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
@@ -3510,10 +3510,12 @@ lp_build_fpstate_get(struct gallivm_state *gallivm)
          gallivm,
          LLVMInt32TypeInContext(gallivm->context),
          "mxcsr_ptr");
+      LLVMValueRef mxcsr_ptr8 = LLVMBuildPointerCast(builder, mxcsr_ptr,
+          LLVMPointerType(LLVMInt8TypeInContext(gallivm->context), 0), "");
       lp_build_intrinsic(builder,
                          "llvm.x86.sse.stmxcsr",
                          LLVMVoidTypeInContext(gallivm->context),
-                         &mxcsr_ptr, 1);
+                         &mxcsr_ptr8, 1);
       return mxcsr_ptr;
    }
    return 0;
@@ -3554,7 +3556,10 @@ lp_build_fpstate_set(struct gallivm_state *gallivm,
                      LLVMValueRef mxcsr_ptr)
 {
    if (util_cpu_caps.has_sse) {
-      lp_build_intrinsic(gallivm->builder,
+      LLVMBuilderRef builder = gallivm->builder;
+      mxcsr_ptr = LLVMBuildPointerCast(builder, mxcsr_ptr,
+                     LLVMPointerType(LLVMInt8TypeInContext(gallivm->context), 0), "");
+      lp_build_intrinsic(builder,
                          "llvm.x86.sse.ldmxcsr",
                          LLVMVoidTypeInContext(gallivm->context),
                          &mxcsr_ptr, 1);
-- 
1.7.9.5

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] llvmpipe: (trivial) get rid of triangle subdivision code
Ah, good stuff, very sensual and does not need more cowbell.

Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
From: Roland Scheidegger srol...@vmware.com

This code was always problematic, and with 64bit rasterization we no longer need it at all.
---
 src/gallium/drivers/llvmpipe/lp_setup.c | 8 +-
 src/gallium/drivers/llvmpipe/lp_setup_context.h | 1 -
 src/gallium/drivers/llvmpipe/lp_setup_tri.c | 174 ---
 3 files changed, 1 insertion(+), 182 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c b/src/gallium/drivers/llvmpipe/lp_setup.c
index 49962af..2fad469 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup.c
@@ -1081,14 +1081,8 @@ try_update_scene_state( struct lp_setup_context *setup )
                                   &setup->draw_regions[i]);
          }
       }
-      /*
-       * Subdivide triangles if the framebuffer is larger than the
-       * MAX_FIXED_LENGTH.
-       */
-      setup->subdivide_large_triangles = (setup->fb.width > MAX_FIXED_LENGTH ||
-                                          setup->fb.height > MAX_FIXED_LENGTH);
    }
-   
+
    setup->dirty = 0;
 
    assert(setup->fs.stored);
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_context.h b/src/gallium/drivers/llvmpipe/lp_setup_context.h
index 8bb95c1..b3fb24a 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_context.h
+++ b/src/gallium/drivers/llvmpipe/lp_setup_context.h
@@ -93,7 +93,6 @@ struct lp_setup_context
    struct llvmpipe_query *active_queries[LP_MAX_ACTIVE_BINNED_QUERIES];
    unsigned active_binned_queries;
 
-   boolean subdivide_large_triangles;
    boolean flatshade_first;
    boolean ccw_is_frontface;
    boolean scissor_test;
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
index e22f14c..ce3a0a7 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
@@ -921,168 +921,6 @@ rotate_fixed_position_12( struct fixed_position* position )
 }
 
 
-typedef void (*triangle_func_t)(struct lp_setup_context *setup,
-                                const float (*v0)[4],
-                                const float (*v1)[4],
-                                const float (*v2)[4]);
-
-
-/**
- * Subdivide this triangle by bisecting edge (v0, v1).
- * \param pv  the provoking vertex (must = v0 or v1 or v2)
- * TODO: should probably think about non-overflowing arithmetic elsewhere.
- * This will definitely screw with pipeline counters for instance.
- */
-static void
-subdiv_tri(struct lp_setup_context *setup,
-           const float (*v0)[4],
-           const float (*v1)[4],
-           const float (*v2)[4],
-           const float (*pv)[4],
-           triangle_func_t tri)
-{
-   unsigned n = setup->fs.current.variant->shader->info.base.num_inputs + 1;
-   const struct lp_shader_input *inputs =
-      setup->fs.current.variant->shader->inputs;
-   PIPE_ALIGN_VAR(LP_MIN_VECTOR_ALIGN) float vmid[PIPE_MAX_ATTRIBS][4];
-   const float (*vm)[4] = (const float (*)[4]) vmid;
-   unsigned i;
-   float w0, w1, wm;
-   boolean flatshade = setup->fs.current.variant->key.flatshade;
-
-   /* find position midpoint (attrib[0] = position) */
-   vmid[0][0] = 0.5f * (v1[0][0] + v0[0][0]);
-   vmid[0][1] = 0.5f * (v1[0][1] + v0[0][1]);
-   vmid[0][2] = 0.5f * (v1[0][2] + v0[0][2]);
-   vmid[0][3] = 0.5f * (v1[0][3] + v0[0][3]);
-
-   w0 = v0[0][3];
-   w1 = v1[0][3];
-   wm = vmid[0][3];
-
-   /* interpolate other attributes */
-   for (i = 1; i < n; i++) {
-      if ((inputs[i - 1].interp == LP_INTERP_COLOR && flatshade) ||
-          inputs[i - 1].interp == LP_INTERP_CONSTANT) {
-         /* copy the provoking vertex's attribute */
-         vmid[i][0] = pv[i][0];
-         vmid[i][1] = pv[i][1];
-         vmid[i][2] = pv[i][2];
-         vmid[i][3] = pv[i][3];
-      }
-      else {
-         /* interpolate with perspective correction (for linear too) */
-         vmid[i][0] = 0.5f * (v1[i][0] * w1 + v0[i][0] * w0) / wm;
-         vmid[i][1] = 0.5f * (v1[i][1] * w1 + v0[i][1] * w0) / wm;
-         vmid[i][2] = 0.5f * (v1[i][2] * w1 + v0[i][2] * w0) / wm;
-         vmid[i][3] = 0.5f * (v1[i][3] * w1 + v0[i][3] * w0) / wm;
-      }
-   }
-
-   /* handling flat shading and first vs. last provoking vertex is a
-    * little tricky...
-    */
-   if (pv == v0) {
-      if (setup->flatshade_first) {
-         /* first vertex must be v0 or vm */
-         tri(setup, v0, vm, v2);
-         tri(setup, vm, v1, v2);
-      }
-      else {
-         /* last vertex must be v0 or vm */
-         tri(setup, vm, v2, v0);
-         tri(setup, v1, v2, vm);
-      }
-   }
-   else if (pv == v1) {
-      if (setup->flatshade_first) {
-         tri(setup, vm, v2, v0);
-         tri(setup, v1, v2, vm
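The removed `subdiv_tri()` built the midpoint vertex in two ways. Positions were averaged directly. Every other attribute was weighted by the endpoints' fourth position components (`w0`, `w1`) and then divided by the midpoint's value (`wm`), which the code comment calls perspective correction. The scalar C sketch below isolates that one formula. `midpoint_attrib` is a hypothetical name, not llvmpipe code.

```c
#include <assert.h>

/* Scalar version of the removed per-attribute formula:
 *   vmid = 0.5 * (a1 * w1 + a0 * w0) / wm,  with wm = 0.5 * (w0 + w1)
 * where w0/w1 are the endpoints' fourth position components and wm is
 * the fourth component of the averaged midpoint position. */
static float
midpoint_attrib(float a0, float w0, float a1, float w1)
{
   float wm = 0.5f * (w0 + w1);
   assert(wm != 0.0f);
   return 0.5f * (a1 * w1 + a0 * w0) / wm;
}
```

With equal weights, the formula reduces to a plain average. With `w0 = 1` and `w1 = 3`, the midpoint attribute between 0 and 1 becomes 0.75, pulled toward the endpoint with the larger weight.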
[Mesa-dev] [PATCH] llvmpipe: fix blending with half-float formats
The fact that we flush denorms to zero breaks our half-float conversion and blending. This patch enables denorms for blending. It's a little tricky due to the llvm bug that makes it incorrectly reorder the mxcsr intrinsics:
http://llvm.org/bugs/show_bug.cgi?id=6393

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/gallivm/lp_bld_arit.c | 67 +
 src/gallium/auxiliary/gallivm/lp_bld_arit.h | 11 +
 src/gallium/drivers/llvmpipe/lp_state_fs.c | 31 ++---
 3 files changed, 104 insertions(+), 5 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
index 70929e7..47e778c 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
@@ -64,6 +64,13 @@
 #include "lp_bld_arit.h"
 #include "lp_bld_flow.h"
 
+#if defined(PIPE_ARCH_SSE)
+#include <xmmintrin.h>
+/* This is defined in pmmintrin.h, but it can only be included when -msse3 is
+ * used, so just define it here to avoid further. */
+#define _MM_DENORMALS_ZERO_MASK 0x0040
+#endif
+
 
 #define EXP_POLY_DEGREE 5
 
@@ -3489,3 +3496,63 @@ lp_build_is_inf_or_nan(struct gallivm_state *gallivm,
 
    return ret;
 }
+
+LLVMValueRef
+lp_build_fpstate_get(struct gallivm_state *gallivm)
+{
+   if (util_cpu_caps.has_sse) {
+      LLVMBuilderRef builder = gallivm->builder;
+      LLVMValueRef mxcsr_ptr = lp_build_alloca(
+         gallivm,
+         LLVMInt32TypeInContext(gallivm->context),
+         "mxcsr_ptr");
+      lp_build_intrinsic(builder,
+                         "llvm.x86.sse.stmxcsr",
+                         LLVMVoidTypeInContext(gallivm->context),
+                         &mxcsr_ptr, 1);
+      return mxcsr_ptr;
+   }
+   return 0;
+}
+
+void
+lp_build_fpstate_set_denorms_zero(struct gallivm_state *gallivm,
+                                  boolean zero)
+{
+   if (util_cpu_caps.has_sse) {
+      /* turn on DAZ (64) | FTZ (32768) = 32832 if available */
+      int daz_ftz = _MM_FLUSH_ZERO_MASK;
+
+      LLVMBuilderRef builder = gallivm->builder;
+      LLVMValueRef mxcsr_ptr = lp_build_fpstate_get(gallivm);
+      LLVMValueRef mxcsr =
+         LLVMBuildLoad(builder, mxcsr_ptr, "mxcsr");
+
+      if (util_cpu_caps.has_daz) {
+         /* Enable denormals are zero mode */
+         daz_ftz |= _MM_DENORMALS_ZERO_MASK;
+      }
+      if (zero) {
+         mxcsr = LLVMBuildOr(builder, mxcsr,
+                             LLVMConstInt(LLVMTypeOf(mxcsr), daz_ftz, 0), "");
+      } else {
+         mxcsr = LLVMBuildAnd(builder, mxcsr,
+                              LLVMConstInt(LLVMTypeOf(mxcsr), ~daz_ftz, 0), "");
+      }
+
+      LLVMBuildStore(builder, mxcsr, mxcsr_ptr);
+      lp_build_fpstate_set(gallivm, mxcsr_ptr);
+   }
+}
+
+void
+lp_build_fpstate_set(struct gallivm_state *gallivm,
+                     LLVMValueRef mxcsr_ptr)
+{
+   if (util_cpu_caps.has_sse) {
+      lp_build_intrinsic(gallivm->builder,
+                         "llvm.x86.sse.ldmxcsr",
+                         LLVMVoidTypeInContext(gallivm->context),
+                         &mxcsr_ptr, 1);
+   }
+}
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.h b/src/gallium/auxiliary/gallivm/lp_bld_arit.h
index 75bf89e..9d29093 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_arit.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.h
@@ -358,4 +358,15 @@ lp_build_is_inf_or_nan(struct gallivm_state *gallivm,
                        const struct lp_type type,
                        LLVMValueRef x);
 
+
+LLVMValueRef
+lp_build_fpstate_get(struct gallivm_state *gallivm);
+
+void
+lp_build_fpstate_set_denorms_zero(struct gallivm_state *gallivm,
+                                  boolean zero);
+void
+lp_build_fpstate_set(struct gallivm_state *gallivm,
+                     LLVMValueRef mxcsr);
+
 #endif /* !LP_BLD_ARIT_H */
diff --git a/src/gallium/drivers/llvmpipe/lp_state_fs.c b/src/gallium/drivers/llvmpipe/lp_state_fs.c
index b5816e0..d0fdc80 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_fs.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_fs.c
@@ -1490,6 +1490,28 @@ generate_unswizzled_blend(struct gallivm_state *gallivm,
    const boolean is_1d = variant->key.resource_1d;
    unsigned num_fullblock_fs = is_1d ? 2 * num_fs : num_fs;
+   LLVMValueRef fpstate = 0;
+
+   /* Get type from output format */
+   lp_blend_type_from_format_desc(out_format_desc, &row_type);
+   lp_mem_type_from_format_desc(out_format_desc, &dst_type);
+
+   /*
+    * Technically this code should go into lp_build_smallfloat_to_float
+    * and lp_build_float_to_smallfloat but due to the
+    * http://llvm.org/bugs/show_bug.cgi?id=6393
+    * llvm reorders the mxcsr intrinsics in a way that breaks the code.
+    * So the ordering is important here and there shouldn't be any
+    * llvm ir instrunctions in this function before
+    * this, otherwise half-float format conversions won't work
+    * (again due to llvm bug #6393).
+    */
+   if (dst_type.floating
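`lp_build_fpstate_set_denorms_zero()` emits IR that loads MXCSR, ORs in the FTZ bit (plus DAZ when the CPU supports it) to flush denormals, or ANDs those bits out to keep denormals, and then stores the result back. The pure-integer C sketch below reproduces that bit manipulation so it can run on any host. `MXCSR_DAZ`, `MXCSR_FTZ` and `mxcsr_set_denorms_zero` are hypothetical local names. The two values match the patch's own comment, "DAZ (64) | FTZ (32768) = 32832".

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the MXCSR bit masks (values as in the patch comment). */
#define MXCSR_DAZ 0x0040u   /* denormals-are-zero, bit 6 */
#define MXCSR_FTZ 0x8000u   /* flush-to-zero, bit 15 */

/* Integer model of the IR emitted by lp_build_fpstate_set_denorms_zero():
 * zero != 0 -> set FTZ (and DAZ if has_daz), flushing denormals;
 * zero == 0 -> clear the same bits, keeping denormals (half-float blend). */
static uint32_t
mxcsr_set_denorms_zero(uint32_t mxcsr, int has_daz, int zero)
{
   uint32_t daz_ftz = MXCSR_FTZ;

   if (has_daz)
      daz_ftz |= MXCSR_DAZ;
   assert(daz_ftz == MXCSR_FTZ || daz_ftz == (MXCSR_FTZ | MXCSR_DAZ));
   return zero ? (mxcsr | daz_ftz) : (mxcsr & ~daz_ftz);
}
```

Starting from the x86 power-on MXCSR value `0x1F80` (all exceptions masked, round-to-nearest), enabling both bits gives `0x9FC0`. Clearing them again restores `0x1F80`.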
[Mesa-dev] [PATCH 2/2] llvmpipe: add a very useful (disabled) debugging output
Disabled by default, but it's very useful when needed.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_setup_point.c | 20
 1 file changed, 20 insertions(+)

diff --git a/src/gallium/drivers/llvmpipe/lp_setup_point.c b/src/gallium/drivers/llvmpipe/lp_setup_point.c
index 4b31495..c42646e 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_point.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_point.c
@@ -302,6 +302,23 @@ subpixel_snap(float a)
    return util_iround(FIXED_ONE * a);
 }
 
+/**
+ * Print point vertex attribs (for debug).
+ */
+static void
+print_point(struct lp_setup_context *setup,
+            const float (*v0)[4])
+{
+   const struct lp_setup_variant_key *key = &setup->setup.variant->key;
+   uint i;
+
+   debug_printf("llvmpipe point\n");
+   for (i = 0; i < 1 + key->num_inputs; i++) {
+      debug_printf("  v0[%d]:  %f %f %f %f\n", i,
+                   v0[i][0], v0[i][1], v0[i][2], v0[i][3]);
+   }
+}
+
 
 static boolean
 try_setup_point( struct lp_setup_context *setup,
@@ -342,6 +359,9 @@ try_setup_point( struct lp_setup_context *setup,
       layer = MIN2(layer, scene->fb_max_layer);
    }
 
+   if (0)
+      print_point(setup, v0);
+
    /* Bounding rectangle (in pixels) */
    {
       /* Yes this is necessary to accurately calculate bounding boxes
-- 
1.8.3.2
[Mesa-dev] [PATCH 1/2] draw: fix vbuf caching of vertices with inject front face
Caching in the vbuf module meant that once a vertex had been emitted it was cached, but a vertex at the same location can be emitted again, this time with a different front-face semantic. Caching caused the first version of the vertex to be emitted, so the renderer received incorrect front-face attributes. By resetting the vertex_id (which is used for caching), we make sure that once front-face info has been injected, the vertex will end up being emitted.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_pipe_unfilled.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
index 8cba07c..4f0326b 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
@@ -81,6 +81,7 @@ inject_front_face_info(struct draw_stage *stage,
          v->data[slot][1] = is_front_face;
          v->data[slot][2] = is_front_face;
          v->data[slot][3] = is_front_face;
+         v->vertex_id = UNDEFINED_VERTEX_ID;
       }
    }
 
-- 
1.8.3.2
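The commit message describes a cache that reuses any vertex that already has a `vertex_id`, and a fix that clears the id whenever an attribute is rewritten. The C model below shows that behavior in isolation. Everything in it is hypothetical (`sketch_vertex`, `SKETCH_UNDEFINED_ID`, `sketch_needs_emit`, `sketch_inject_front_face`); it is not the vbuf implementation, only the invalidation pattern the patch applies.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_UNDEFINED_ID 0xffffu   /* hypothetical "not cached" marker */

struct sketch_vertex {
   uint16_t vertex_id;   /* cache slot, or SKETCH_UNDEFINED_ID */
   float front_face;     /* injected attribute */
};

/* Returns 1 if the vertex must be (re-)emitted, 0 if the cached copy
 * is reused (and with it whatever attributes it had back then). */
static int
sketch_needs_emit(struct sketch_vertex *v, uint16_t *next_id)
{
   if (v->vertex_id != SKETCH_UNDEFINED_ID)
      return 0;
   assert(*next_id != SKETCH_UNDEFINED_ID);
   v->vertex_id = (*next_id)++;   /* assign a cache slot and emit */
   return 1;
}

/* Mirrors the fix: after rewriting an attribute, drop the cached id so
 * the updated vertex is emitted instead of the stale cached one. */
static void
sketch_inject_front_face(struct sketch_vertex *v, float is_front_face)
{
   v->front_face = is_front_face;
   v->vertex_id = SKETCH_UNDEFINED_ID;
}
```

Without the reset in `sketch_inject_front_face`, the third call in the test below would return 0, and the stale front-face value would be used.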
Re: [Mesa-dev] [PATCH 3/3] gallium/cso: fix sampler / sampler_view counts
The entire series looks good to me. Now that it is possible to query drivers for the max number of sampler views, it should be safe to increase this without crashing. I'm not entirely convinced this really works correctly, though, if state trackers that use non-linked samplers / sampler_views use this.

I'm not sure I get this. What would be the problem in that case?
Re: [Mesa-dev] [PATCH] llvmpipe: support 8bit subpixel precision
For me too, other than the fixed_position members, looks good. Thanks for your perseverance on this Zack!

Thanks! OK, attached is a version that makes position and dx/dy 32-bit again; it seems to work great. I have a question for you guys, if you run the piglits:

./bin/triangle-rasterization-overdraw -max_size -seed 0xA8402F24 -count 1 -auto

On master, does it fail for you? It fails for me on master, with and without the patch. I'm not sure what to make of it; I might have been looking at rasterization for too long. Looking at the rendering, it looks correct.

z

From 55c9a288c7ebc37b32bc75526e6de71a838ccaef Mon Sep 17 00:00:00 2001
From: Zack Rusin za...@vmware.com
Date: Thu, 24 Oct 2013 22:05:22 -0400
Subject: [PATCH] llvmpipe: support 8bit subpixel precision

8-bit precision is required by d3d10 but unfortunately requires a 64-bit rasterizer. This commit implements 64-bit rasterization with full support for 8-bit subpixel precision. It's a combination of all the individual commits from the llvmpipe-rast-64 branch.
---
 src/gallium/drivers/llvmpipe/lp_rast.c | 11 ++
 src/gallium/drivers/llvmpipe/lp_rast.h | 47 +--
 src/gallium/drivers/llvmpipe/lp_rast_debug.c | 6 +-
 src/gallium/drivers/llvmpipe/lp_rast_priv.h | 27
 src/gallium/drivers/llvmpipe/lp_rast_tri.c | 173
 src/gallium/drivers/llvmpipe/lp_rast_tri_tmp.h | 56
 src/gallium/drivers/llvmpipe/lp_setup_line.c | 2 +-
 src/gallium/drivers/llvmpipe/lp_setup_tri.c | 147 +
 src/gallium/tests/graw/SConscript | 1 +
 src/gallium/tests/graw/tri-large.c | 174 +
 10 files changed, 496 insertions(+), 148 deletions(-)
 create mode 100644 src/gallium/tests/graw/tri-large.c

diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c b/src/gallium/drivers/llvmpipe/lp_rast.c
index af661e9..0cd62c2 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.c
+++ b/src/gallium/drivers/llvmpipe/lp_rast.c
@@ -589,6 +589,17 @@ static lp_rast_cmd_func dispatch[LP_RAST_OP_MAX] =
    lp_rast_begin_query,
    lp_rast_end_query,
    lp_rast_set_state,
+   lp_rast_triangle_32_1,
+   lp_rast_triangle_32_2,
+   lp_rast_triangle_32_3,
+   lp_rast_triangle_32_4,
+   lp_rast_triangle_32_5,
+   lp_rast_triangle_32_6,
+   lp_rast_triangle_32_7,
+   lp_rast_triangle_32_8,
+   lp_rast_triangle_32_3_4,
+   lp_rast_triangle_32_3_16,
+   lp_rast_triangle_32_4_16
 };
 
 
diff --git a/src/gallium/drivers/llvmpipe/lp_rast.h b/src/gallium/drivers/llvmpipe/lp_rast.h
index 43c598d..b81d94f 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.h
+++ b/src/gallium/drivers/llvmpipe/lp_rast.h
@@ -46,10 +46,11 @@ struct lp_scene;
 struct lp_fence;
 struct cmd_bin;
 
-#define FIXED_TYPE_WIDTH 32
+#define FIXED_TYPE_WIDTH 64
 /** For sub-pixel positioning */
-#define FIXED_ORDER 4
+#define FIXED_ORDER 8
 #define FIXED_ONE (1<<FIXED_ORDER)
+#define FIXED_SHIFT (FIXED_TYPE_WIDTH - 1)
 /** Maximum length of an edge in a primitive in pixels.
  *  If the framebuffer is large we have to think about fixed-point
  *  integer overflow. Coordinates need ((FIXED_TYPE_WIDTH/2) - 1) bits
@@ -59,11 +60,14 @@ struct cmd_bin;
  */
 #define MAX_FIXED_LENGTH (1 << (((FIXED_TYPE_WIDTH/2) - 1) - FIXED_ORDER))
 
+#define MAX_FIXED_LENGTH32 (1 << (((32/2) - 1) - FIXED_ORDER))
+
 /* Rasterizer output size going to jit fs, width/height */
 #define LP_RASTER_BLOCK_SIZE 4
 
 #define LP_MAX_ACTIVE_BINNED_QUERIES 16
 
+#define IMUL64(a, b) (((int64_t)(a)) * ((int64_t)(b)))
 
 struct lp_rasterizer_task;
 
@@ -102,18 +106,15 @@ struct lp_rast_shader_inputs {
    /* followed by a0, dadx, dady and planes[] */
 };
 
-/* Note: the order of these values is important as they are loaded by
- * sse code in rasterization:
- */
 struct lp_rast_plane {
    /* edge function values at minx,miny ?? */
-   int c;
+   int64_t c;
 
-   int dcdx;
-   int dcdy;
+   int32_t dcdx;
+   int32_t dcdy;
 
    /* one-pixel sized trivial reject offsets for each plane */
-   int eo;
+   int64_t eo;
 };
 
 /**
@@ -277,8 +278,19 @@ lp_rast_arg_null( void )
 #define LP_RAST_OP_BEGIN_QUERY 0xf
 #define LP_RAST_OP_END_QUERY 0x10
 #define LP_RAST_OP_SET_STATE 0x11
-
-#define LP_RAST_OP_MAX 0x12
+#define LP_RAST_OP_TRIANGLE_32_1 0x12
+#define LP_RAST_OP_TRIANGLE_32_2 0x13
+#define LP_RAST_OP_TRIANGLE_32_3 0x14
+#define LP_RAST_OP_TRIANGLE_32_4 0x15
+#define LP_RAST_OP_TRIANGLE_32_5 0x16
+#define LP_RAST_OP_TRIANGLE_32_6 0x17
+#define LP_RAST_OP_TRIANGLE_32_7 0x18
+#define LP_RAST_OP_TRIANGLE_32_8 0x19
+#define LP_RAST_OP_TRIANGLE_32_3_4 0x1a
+#define LP_RAST_OP_TRIANGLE_32_3_16 0x1b
+#define LP_RAST_OP_TRIANGLE_32_4_16 0x1c
+
+#define LP_RAST_OP_MAX 0x1d
 #define LP_RAST_OP_MASK 0xff
 
 void
@@ -289,4 +301,17 @@ void
 lp_debug_draw_bins_by_coverage( struct lp_scene *scene );
 
 
+#ifdef PIPE_ARCH_SSE
+#include <emmintrin.h>
+#include "util/u_sse.h"
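The `MAX_FIXED_LENGTH` formula explains why the 8-bit sub-pixel order forces the 64-bit rasterizer. Each extra bit of sub-pixel precision halves the longest edge that still fits in the fixed-point type. The sketch below evaluates that formula for a few combinations and snaps a coordinate to the `FIXED_ONE` grid. The helper names are hypothetical. The rounding (half away from zero) is a stand-in chosen for illustration; it is not necessarily identical to llvmpipe's `util_iround` snapping.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the patch: 8 bits of sub-pixel precision. */
#define FIXED_ORDER 8
#define FIXED_ONE (1 << FIXED_ORDER)

/* Same formula as MAX_FIXED_LENGTH in lp_rast.h, with the width of
 * the fixed-point type and the sub-pixel order as parameters. */
static int64_t
max_fixed_length(int type_width, int order)
{
   assert(type_width == 32 || type_width == 64);
   return (int64_t)1 << (((type_width / 2) - 1) - order);
}

/* Snap a window coordinate to the FIXED_ONE grid, rounding half away
 * from zero (stand-in for lp_setup's util_iround-based snapping). */
static int64_t
subpixel_snap_sketch(float a)
{
   float v = (float)FIXED_ONE * a;
   return (int64_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}
```

With 32-bit coordinates, the limit drops from 2048 pixels at 4 sub-pixel bits to only 128 pixels at 8 bits. The 64-bit type raises it to 8388608 pixels. This is presumably also what `MAX_FIXED_LENGTH32` is for: deciding when the cheaper `lp_rast_triangle_32_*` paths are still safe.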
[Mesa-dev] [PATCH] llvmpipe: support 8bit subpixel precision
8-bit precision is required by d3d10 but unfortunately requires a 64-bit rasterizer. This commit implements 64-bit rasterization with full support for 8-bit subpixel precision. It's a combination of all the individual commits from the llvmpipe-rast-64 branch.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_rast.c | 11 ++
 src/gallium/drivers/llvmpipe/lp_rast.h | 47 +--
 src/gallium/drivers/llvmpipe/lp_rast_debug.c | 6 +-
 src/gallium/drivers/llvmpipe/lp_rast_priv.h | 27
 src/gallium/drivers/llvmpipe/lp_rast_tri.c | 173 +
 src/gallium/drivers/llvmpipe/lp_rast_tri_tmp.h | 56
 src/gallium/drivers/llvmpipe/lp_setup_line.c | 2 +-
 src/gallium/drivers/llvmpipe/lp_setup_tri.c | 155 ++
 src/gallium/tests/graw/SConscript | 1 +
 src/gallium/tests/graw/tri-large.c | 173 +
 10 files changed, 500 insertions(+), 151 deletions(-)
 create mode 100644 src/gallium/tests/graw/tri-large.c

diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c b/src/gallium/drivers/llvmpipe/lp_rast.c
index af661e9..0cd62c2 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.c
+++ b/src/gallium/drivers/llvmpipe/lp_rast.c
@@ -589,6 +589,17 @@ static lp_rast_cmd_func dispatch[LP_RAST_OP_MAX] =
    lp_rast_begin_query,
    lp_rast_end_query,
    lp_rast_set_state,
+   lp_rast_triangle_32_1,
+   lp_rast_triangle_32_2,
+   lp_rast_triangle_32_3,
+   lp_rast_triangle_32_4,
+   lp_rast_triangle_32_5,
+   lp_rast_triangle_32_6,
+   lp_rast_triangle_32_7,
+   lp_rast_triangle_32_8,
+   lp_rast_triangle_32_3_4,
+   lp_rast_triangle_32_3_16,
+   lp_rast_triangle_32_4_16
 };
 
 
diff --git a/src/gallium/drivers/llvmpipe/lp_rast.h b/src/gallium/drivers/llvmpipe/lp_rast.h
index 43c598d..b81d94f 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.h
+++ b/src/gallium/drivers/llvmpipe/lp_rast.h
@@ -46,10 +46,11 @@ struct lp_scene;
 struct lp_fence;
 struct cmd_bin;
 
-#define FIXED_TYPE_WIDTH 32
+#define FIXED_TYPE_WIDTH 64
 /** For sub-pixel positioning */
-#define FIXED_ORDER 4
+#define FIXED_ORDER 8
 #define FIXED_ONE (1<<FIXED_ORDER)
+#define FIXED_SHIFT (FIXED_TYPE_WIDTH - 1)
 /** Maximum length of an edge in a primitive in pixels.
  *  If the framebuffer is large we have to think about fixed-point
  *  integer overflow. Coordinates need ((FIXED_TYPE_WIDTH/2) - 1) bits
@@ -59,11 +60,14 @@ struct cmd_bin;
  */
 #define MAX_FIXED_LENGTH (1 << (((FIXED_TYPE_WIDTH/2) - 1) - FIXED_ORDER))
 
+#define MAX_FIXED_LENGTH32 (1 << (((32/2) - 1) - FIXED_ORDER))
+
 /* Rasterizer output size going to jit fs, width/height */
 #define LP_RASTER_BLOCK_SIZE 4
 
 #define LP_MAX_ACTIVE_BINNED_QUERIES 16
 
+#define IMUL64(a, b) (((int64_t)(a)) * ((int64_t)(b)))
 
 struct lp_rasterizer_task;
 
@@ -102,18 +106,15 @@ struct lp_rast_shader_inputs {
    /* followed by a0, dadx, dady and planes[] */
 };
 
-/* Note: the order of these values is important as they are loaded by
- * sse code in rasterization:
- */
 struct lp_rast_plane {
    /* edge function values at minx,miny ?? */
-   int c;
+   int64_t c;
 
-   int dcdx;
-   int dcdy;
+   int32_t dcdx;
+   int32_t dcdy;
 
    /* one-pixel sized trivial reject offsets for each plane */
-   int eo;
+   int64_t eo;
 };
 
 /**
@@ -277,8 +278,19 @@ lp_rast_arg_null( void )
 #define LP_RAST_OP_BEGIN_QUERY 0xf
 #define LP_RAST_OP_END_QUERY 0x10
 #define LP_RAST_OP_SET_STATE 0x11
-
-#define LP_RAST_OP_MAX 0x12
+#define LP_RAST_OP_TRIANGLE_32_1 0x12
+#define LP_RAST_OP_TRIANGLE_32_2 0x13
+#define LP_RAST_OP_TRIANGLE_32_3 0x14
+#define LP_RAST_OP_TRIANGLE_32_4 0x15
+#define LP_RAST_OP_TRIANGLE_32_5 0x16
+#define LP_RAST_OP_TRIANGLE_32_6 0x17
+#define LP_RAST_OP_TRIANGLE_32_7 0x18
+#define LP_RAST_OP_TRIANGLE_32_8 0x19
+#define LP_RAST_OP_TRIANGLE_32_3_4 0x1a
+#define LP_RAST_OP_TRIANGLE_32_3_16 0x1b
+#define LP_RAST_OP_TRIANGLE_32_4_16 0x1c
+
+#define LP_RAST_OP_MAX 0x1d
 #define LP_RAST_OP_MASK 0xff
 
 void
@@ -289,4 +301,17 @@ void
 lp_debug_draw_bins_by_coverage( struct lp_scene *scene );
 
 
+#ifdef PIPE_ARCH_SSE
+#include <emmintrin.h>
+#include "util/u_sse.h"
+
+static INLINE __m128i
+lp_plane_to_m128i(const struct lp_rast_plane *plane)
+{
+   return _mm_setr_epi32((int32_t)plane->c, (int32_t)plane->dcdx,
+                         (int32_t)plane->dcdy, (int32_t)plane->eo);
+}
+
+#endif
+
 #endif
diff --git a/src/gallium/drivers/llvmpipe/lp_rast_debug.c b/src/gallium/drivers/llvmpipe/lp_rast_debug.c
index 3bc75aa..587c793 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast_debug.c
+++ b/src/gallium/drivers/llvmpipe/lp_rast_debug.c
@@ -195,8 +195,8 @@ debug_triangle(int tilex, int tiley,
    while (plane_mask) {
       plane[nr_planes] = tri_plane[u_bit_scan(&plane_mask)];
       plane[nr_planes].c = (plane[nr_planes].c +
-                            plane[nr_planes].dcdy * tiley
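The new `IMUL64` macro casts both operands to `int64_t` before multiplying. Without that cast, a 32x32-bit product such as `dcdx * x` is computed in 32 bits and can overflow before it is ever stored in the wider `c` field. The sketch below copies the macro and uses it in a hypothetical `edge_value` helper, which evaluates a generic edge function `c + dcdx*x + dcdy*y`. llvmpipe's own sign conventions for the edge terms may differ. Conversely, `lp_plane_to_m128i` truncates `c` and `eo` back to 32 bits. That is presumably only valid on the 32-bit paths, where the values are known to fit.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the patch: widen both operands before multiplying. */
#define IMUL64(a, b) (((int64_t)(a)) * ((int64_t)(b)))

/* Hypothetical helper: evaluate a generic edge function with 64-bit
 * accumulation, matching the wider int64_t 'c' the patch introduces.
 * (llvmpipe's sign convention for the x/y terms may differ.) */
static int64_t
edge_value(int64_t c, int32_t dcdx, int32_t dcdy, int32_t x, int32_t y)
{
   return c + IMUL64(dcdx, x) + IMUL64(dcdy, y);
}
```

Here `1 << 20` times `1 << 20` gives `2^40`, which a 32-bit product could not represent.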
Re: [Mesa-dev] [PATCH] gallivm: Compile flag to debug TGSI execution through printfs.
That's very nice Jose! Looks good to me.

- Original Message -
From: José Fonseca jfons...@vmware.com

It is similar to tgsi_exec.c's DEBUG_EXECUTION compile flag.

I had prototyped this for a while while debugging an issue, but finally cleaned this up and added a few more bells and whistles.

Here is a sample output.

CONST[0]:
  X: 0.006250 0.006250 0.006250 0.006250
  Y: -0.007143 -0.007143 -0.007143 -0.007143
  Z: -1.00 -1.00 -1.00 -1.00
  W: 1.00 1.00 1.00 1.00
IN[0]:
  X: 143.50 175.50 175.50 143.50
  Y: 123.50 123.50 155.50 155.50
  Z: 0.00 0.00 0.00 0.00
  W: 1.00 1.00 1.00 1.00
  1: RCP TEMP[0].w, IN[0].
     TEMP[0].w = 1 1 1 1
  2: MAD TEMP[0].xy, IN[0], CONST[0], CONST[0].zwzw
     TEMP[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976
     TEMP[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316
  3: MUL OUT[0].xy, TEMP[0], TEMP[0].
     OUT[0].x = -0.103124976 0.0968750715 0.0968750715 -0.103124976
     OUT[0].y = 0.117857158 0.117857158 -0.110714316 -0.110714316
  4: MUL OUT[0].z, IN[0]., TEMP[0].
     OUT[0].z = 0 0 0 0
  5: MOV OUT[0].w, TEMP[0]
     OUT[0].w = 1 1 1 1
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 158 +++-
 src/gallium/auxiliary/tgsi/tgsi_dump.c | 23
 src/gallium/auxiliary/tgsi/tgsi_dump.h | 7 ++
 3 files changed, 159 insertions(+), 29 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 5f81066..917826d 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -47,6 +47,7 @@
 #include "tgsi/tgsi_parse.h"
 #include "tgsi/tgsi_util.h"
 #include "tgsi/tgsi_scan.h"
+#include "tgsi/tgsi_strings.h"
 #include "lp_bld_tgsi_action.h"
 #include "lp_bld_type.h"
 #include "lp_bld_const.h"
@@ -67,6 +68,17 @@
 
 #define DUMP_GS_EMITS 0
 
+/*
+ * If non-zero, the generated LLVM IR will print intermediate results on every TGSI
+ * instruction.
+ *
+ * TODO:
+ * - take execution masks in consideration
+ * - debug control-flow instructions
+ */
+#define DEBUG_EXECUTION 0
+
+
 static void lp_exec_mask_init(struct lp_exec_mask *mask, struct lp_build_context *bld)
 {
    LLVMTypeRef int_type = LLVMInt32TypeInContext(bld->gallivm->context);
@@ -664,6 +676,43 @@ static void lp_exec_mask_endsub(struct lp_exec_mask *mask, int *pc)
 }
 
 
+static LLVMValueRef
+get_file_ptr(struct lp_build_tgsi_soa_context *bld,
+             unsigned file,
+             unsigned index,
+             unsigned chan)
+{
+   LLVMBuilderRef builder = bld->bld_base.base.gallivm->builder;
+   LLVMValueRef (*array_of_vars)[TGSI_NUM_CHANNELS];
+   LLVMValueRef var_of_array;
+
+   switch (file) {
+   case TGSI_FILE_TEMPORARY:
+      array_of_vars = bld->temps;
+      var_of_array = bld->temps_array;
+      break;
+   case TGSI_FILE_OUTPUT:
+      array_of_vars = bld->outputs;
+      var_of_array = bld->outputs_array;
+      break;
+   default:
+      assert(0);
+      return NULL;
+   }
+
+   assert(chan < 4);
+
+   if (bld->indirect_files & (1 << file)) {
+      LLVMValueRef lindex = lp_build_const_int32(bld->bld_base.base.gallivm, index * 4 + chan);
+      return LLVMBuildGEP(builder, var_of_array, &lindex, 1, "");
+   }
+   else {
+      assert(index <= bld->bld_base.info->file_max[file]);
+      return array_of_vars[index][chan];
+   }
+}
+
+
 /**
  * Return pointer to a temporary register channel (src or dest).
  * Note that indirect addressing cannot be handled here.
@@ -675,15 +724,7 @@ lp_get_temp_ptr_soa(struct lp_build_tgsi_soa_context *bld,
                     unsigned index,
                     unsigned chan)
 {
-   LLVMBuilderRef builder = bld->bld_base.base.gallivm->builder;
-   assert(chan < 4);
-   if (bld->indirect_files & (1 << TGSI_FILE_TEMPORARY)) {
-      LLVMValueRef lindex = lp_build_const_int32(bld->bld_base.base.gallivm, index * 4 + chan);
-      return LLVMBuildGEP(builder, bld->temps_array, &lindex, 1, "");
-   }
-   else {
-      return bld->temps[index][chan];
-   }
+   return get_file_ptr(bld, TGSI_FILE_TEMPORARY, index, chan);
 }
 
 /**
@@ -697,16 +738,7 @@ lp_get_output_ptr(struct lp_build_tgsi_soa_context *bld,
                   unsigned index,
                   unsigned chan)
 {
-   LLVMBuilderRef builder = bld->bld_base.base.gallivm->builder;
-   assert(chan < 4);
-   if (bld->indirect_files & (1 << TGSI_FILE_OUTPUT)) {
-      LLVMValueRef lindex = lp_build_const_int32(bld->bld_base.base.gallivm,
-                                                 index * 4 + chan);
-      return LLVMBuildGEP(builder, bld->outputs_array, &lindex, 1, "");
-   }
-   else {
-      return bld->outputs[index][chan];
-   }
+   return get_file_ptr(bld, TGSI_FILE_OUTPUT, index, chan);
 }
 
 /*
@@ -1415,6
Re: [Mesa-dev] [PATCH] gallivm: deduplicate some indirect register address code
Looks good.

Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
From: Roland Scheidegger srol...@vmware.com

There's only one minor functional change: for immediates, the pixel offsets are no longer added, since the values are the same for all elements in any case (it might be better if those weren't stored as soa vectors in the first place, maybe).
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 253 +--
 1 file changed, 96 insertions(+), 157 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 75f6def..5f81066 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -898,6 +898,39 @@ stype_to_fetch(struct lp_build_tgsi_context * bld_base,
 }
 
 static LLVMValueRef
+get_soa_array_offsets(struct lp_build_context *uint_bld,
+                      LLVMValueRef indirect_index,
+                      unsigned chan_index,
+                      boolean need_perelement_offset)
+{
+   struct gallivm_state *gallivm = uint_bld->gallivm;
+   LLVMValueRef chan_vec =
+      lp_build_const_int_vec(uint_bld->gallivm, uint_bld->type, chan_index);
+   LLVMValueRef length_vec =
+      lp_build_const_int_vec(gallivm, uint_bld->type, uint_bld->type.length);
+   LLVMValueRef index_vec;
+
+   /* index_vec = (indirect_index * 4 + chan_index) * length + offsets */
+   index_vec = lp_build_shl_imm(uint_bld, indirect_index, 2);
+   index_vec = lp_build_add(uint_bld, index_vec, chan_vec);
+   index_vec = lp_build_mul(uint_bld, index_vec, length_vec);
+
+   if (need_perelement_offset) {
+      LLVMValueRef pixel_offsets;
+      int i;
+      /* build pixel offset vector: {0, 1, 2, 3, ...} */
+      pixel_offsets = uint_bld->undef;
+      for (i = 0; i < uint_bld->type.length; i++) {
+         LLVMValueRef ii = lp_build_const_int32(gallivm, i);
+         pixel_offsets = LLVMBuildInsertElement(gallivm->builder, pixel_offsets,
+                                                ii, ii, "");
+      }
+      index_vec = lp_build_add(uint_bld, index_vec, pixel_offsets);
+   }
+   return index_vec;
+}
+
+static LLVMValueRef
 emit_fetch_constant(
    struct lp_build_tgsi_context * bld_base,
    const struct tgsi_full_src_register * reg,
@@ -908,7 +941,6 @@ emit_fetch_constant(
    struct gallivm_state *gallivm = bld_base->base.gallivm;
    LLVMBuilderRef builder = gallivm->builder;
    struct lp_build_context *uint_bld = &bld_base->uint_bld;
-   LLVMValueRef indirect_index = NULL;
    unsigned dimension = 0;
    LLVMValueRef dimension_index;
    LLVMValueRef consts_ptr;
@@ -927,16 +959,15 @@ emit_fetch_constant(
    consts_ptr = lp_build_array_get(gallivm, bld->consts_ptr, dimension_index);
 
    if (reg->Register.Indirect) {
+      LLVMValueRef indirect_index;
+      LLVMValueRef swizzle_vec =
+         lp_build_const_int_vec(gallivm, uint_bld->type, swizzle);
+      LLVMValueRef index_vec;  /* index into the const buffer */
+
       indirect_index = get_indirect_index(bld,
                                           reg->Register.File,
                                           reg->Register.Index,
                                           &reg->Indirect);
-   }
-
-   if (reg->Register.Indirect) {
-      LLVMValueRef swizzle_vec =
-         lp_build_const_int_vec(bld->bld_base.base.gallivm, uint_bld->type, swizzle);
-      LLVMValueRef index_vec;  /* index into the const buffer */
 
       /* index_vec = indirect_index * 4 + swizzle */
       index_vec = lp_build_shl_imm(uint_bld, indirect_index, 2);
@@ -949,7 +980,7 @@ emit_fetch_constant(
       LLVMValueRef index;  /* index into the const buffer */
       LLVMValueRef scalar, scalar_ptr;
 
-      index = lp_build_const_int32(gallivm, reg->Register.Index*4 + swizzle);
+      index = lp_build_const_int32(gallivm, reg->Register.Index * 4 + swizzle);
 
       scalar_ptr = LLVMBuildGEP(builder, consts_ptr,
                                 &index, 1, "");
@@ -974,49 +1005,32 @@ emit_fetch_immediate(
    struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);
    struct gallivm_state *gallivm = bld->bld_base.base.gallivm;
    LLVMBuilderRef builder = gallivm->builder;
-   struct lp_build_context *uint_bld = &bld_base->uint_bld;
-   struct lp_build_context *float_bld = &bld_base->base;
    LLVMValueRef res = NULL;
-   LLVMValueRef indirect_index = NULL;
 
    if (reg->Register.Indirect) {
+      LLVMValueRef indirect_index;
+      LLVMValueRef index_vec;  /* index into the immediate register array */
+      LLVMValueRef imms_array;
+      LLVMTypeRef fptr_type;
+
       indirect_index = get_indirect_index(bld,
                                           reg->Register.File,
                                           reg->Register.Index,
                                           &reg->Indirect);
-   }
-
-   if (reg
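`get_soa_array_offsets()` builds, for every SIMD lane, the flat element index `(indirect_index * 4 + chan_index) * length`. It then optionally adds the lane number through the `{0, 1, 2, 3, ...}` pixel-offset vector. The scalar C model below performs the same arithmetic one lane at a time. `soa_array_offsets` is a hypothetical name, and `LENGTH = 4` stands in for `uint_bld->type.length` purely for illustration.

```c
#include <assert.h>

/* Stand-in for uint_bld->type.length (assumed 4 lanes here). */
#define LENGTH 4

/* Scalar model of get_soa_array_offsets(): per lane,
 *   out = (indirect_index * 4 + chan_index) * LENGTH [+ lane]        */
static void
soa_array_offsets(const unsigned indirect_index[LENGTH],
                  unsigned chan_index,
                  int need_perelement_offset,
                  unsigned out[LENGTH])
{
   assert(chan_index < 4);
   for (unsigned lane = 0; lane < LENGTH; lane++) {
      unsigned idx = ((indirect_index[lane] << 2) + chan_index) * LENGTH;
      if (need_perelement_offset)
         idx += lane;   /* the {0, 1, 2, 3, ...} pixel offset vector */
      out[lane] = idx;
   }
}
```

For per-lane register indices `{0, 1, 2, 2}` and channel 3, the result is `{12, 29, 46, 47}` with the per-element offsets, and `{12, 28, 44, 44}` without them. The second form is what the immediate path now uses, since every lane holds the same value there.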
[Mesa-dev] [PATCH] graw: add a test rendering a huge triangle
Used to test rasterization, because we often break down on subdivision of triangles with long edges.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/tests/graw/SConscript | 1 +
 src/gallium/tests/graw/tri-large.c | 173 +
 2 files changed, 174 insertions(+)
 create mode 100644 src/gallium/tests/graw/tri-large.c

diff --git a/src/gallium/tests/graw/SConscript b/src/gallium/tests/graw/SConscript
index 8740ff3..8723807 100644
--- a/src/gallium/tests/graw/SConscript
+++ b/src/gallium/tests/graw/SConscript
@@ -29,6 +29,7 @@ progs = [
     'tex-srgb',
     'tex-swizzle',
     'tri',
+    'tri-large',
     'tri-gs',
     'tri-instanced',
     'vs-test',
diff --git a/src/gallium/tests/graw/tri-large.c b/src/gallium/tests/graw/tri-large.c
new file mode 100644
index 000..3fbbfb3
--- /dev/null
+++ b/src/gallium/tests/graw/tri-large.c
@@ -0,0 +1,173 @@
+/* Display a cleared blue window.  This demo has no dependencies on
+ * any utility code, just the graw interface and gallium.
+ */
+
+#include "graw_util.h"
+#include "util/u_debug.h"
+
+#include <stdio.h>
+
+static struct graw_info info;
+
+static const int WIDTH = 4*2048;
+static const int HEIGHT = 4*2048;
+
+
+struct vertex {
+   float position[4];
+   float color[4];
+};
+
+static boolean FlatShade = FALSE;
+
+
+static struct vertex vertices[3] =
+{
+   {
+      { -1.0f, -1.0f, 0.0f, 1.0f },
+      { 1.0f, 0.0f, 0.0f, 1.0f }
+   },
+   {
+      { -1.0f, 1.0f, 0.0f, 1.0f },
+      { 0.0f, 1.0f, 0.0f, 1.0f }
+   },
+   {
+      { 1.0f, 1.0f, 0.0f, 1.0f },
+      { 0.0f, 0.0f, 1.0f, 1.0f }
+   }
+};
+
+
+static void set_vertices( void )
+{
+   struct pipe_vertex_element ve[2];
+   struct pipe_vertex_buffer vbuf;
+   void *handle;
+
+   memset(ve, 0, sizeof ve);
+
+   ve[0].src_offset = Offset(struct vertex, position);
+   ve[0].src_format = PIPE_FORMAT_R32G32B32A32_FLOAT;
+   ve[1].src_offset = Offset(struct vertex, color);
+   ve[1].src_format = PIPE_FORMAT_R32G32B32A32_FLOAT;
+
+   handle = info.ctx->create_vertex_elements_state(info.ctx, 2, ve);
+   info.ctx->bind_vertex_elements_state(info.ctx, handle);
+
+   memset(&vbuf, 0, sizeof vbuf);
+
+   vbuf.stride = sizeof( struct vertex );
+   vbuf.buffer_offset = 0;
+   vbuf.buffer = pipe_buffer_create_with_data(info.ctx,
+                                              PIPE_BIND_VERTEX_BUFFER,
+                                              PIPE_USAGE_STATIC,
+                                              sizeof(vertices),
+                                              vertices);
+
+   info.ctx->set_vertex_buffers(info.ctx, 0, 1, &vbuf);
+}
+
+
+static void set_vertex_shader( void )
+{
+   void *handle;
+   const char *text =
+      "VERT\n"
+      "DCL IN[0]\n"
+      "DCL IN[1]\n"
+      "DCL OUT[0], POSITION\n"
+      "DCL OUT[1], COLOR\n"
+      "0: MOV OUT[1], IN[1]\n"
+      "1: MOV OUT[0], IN[0]\n"
+      "2: END\n";
+
+   handle = graw_parse_vertex_shader(info.ctx, text);
+   info.ctx->bind_vs_state(info.ctx, handle);
+}
+
+
+static void set_fragment_shader( void )
+{
+   void *handle;
+   const char *text =
+      "FRAG\n"
+      "DCL IN[0], COLOR, LINEAR\n"
+      "DCL OUT[0], COLOR\n"
+      "0: MOV OUT[0], IN[0]\n"
+      "1: END\n";
+
+   handle = graw_parse_fragment_shader(info.ctx, text);
+   info.ctx->bind_fs_state(info.ctx, handle);
+}
+
+
+static void draw( void )
+{
+   union pipe_color_union clear_color = { {1,0,1,1} };
+
+   info.ctx->clear(info.ctx, PIPE_CLEAR_COLOR, &clear_color, 0, 0);
+   util_draw_arrays(info.ctx, PIPE_PRIM_TRIANGLES, 0, 3);
+   info.ctx->flush(info.ctx, NULL, 0);
+
+   graw_save_surface_to_file(info.ctx, info.color_surf[0], NULL);
+
+   graw_util_flush_front(&info);
+}
+
+
+static void init( void )
+{
+   if (!graw_util_create_window(&info, WIDTH, HEIGHT, 1, FALSE))
+      exit(1);
+
+   graw_util_default_state(&info, FALSE);
+
+   {
+      struct pipe_rasterizer_state rasterizer;
+      void *handle;
+      memset(&rasterizer, 0, sizeof rasterizer);
+      rasterizer.cull_face = PIPE_FACE_NONE;
+      rasterizer.half_pixel_center = 1;
+      rasterizer.bottom_edge_rule = 1;
+      rasterizer.flatshade = FlatShade;
+      rasterizer.depth_clip = 1;
+      handle = info.ctx->create_rasterizer_state(info.ctx, &rasterizer);
+      info.ctx->bind_rasterizer_state(info.ctx, handle);
+   }
+
+
+   graw_util_viewport(&info, 0, 0, WIDTH, HEIGHT, 30, 1000);
+
+   set_vertices();
+   set_vertex_shader();
+   set_fragment_shader();
+}
+
+static void args(int argc, char *argv[])
+{
+   int i;
+
+   for (i
= 1; i argc; ) { + if (graw_parse_args(i, argc, argv)) { + /* ok */ + } + else if (strcmp(argv[i], -f) == 0) { + FlatShade = TRUE; + i++; + } + else { + printf(Invalid arg %s\n, argv[i]); + exit(1); + } + } +} + +int main( int argc, char *argv[] ) +{ + args(argc, argv); + init(); + + graw_set_display_func( draw ); + graw_main_loop(); + return 0; +} -- 1.8.3.2
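The test's triangle covers half of an 8192 x 8192 window. That makes its edges far longer than the 2048-pixel limit llvmpipe used at the time before subdividing. The sketch below estimates which triangles would be subdivided. It assumes the usual viewport mapping of NDC [-1, 1] to [0, size] pixels and a Euclidean edge-length check similar in spirit to check_subdivide_triangle(). All helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: map an NDC coordinate in [-1, 1] to window pixels,
 * assuming a viewport that starts at 0 and spans `size` pixels. */
static float ndc_to_pixels(float ndc, int size)
{
   assert(size > 0);
   return (ndc + 1.0f) * 0.5f * (float)size;
}

/* Compare squared lengths so no square root (and no libm) is needed. */
static bool edge_longer_than(float x0, float y0, float x1, float y1,
                             float max_len)
{
   float dx = x1 - x0;
   float dy = y1 - y0;
   return dx * dx + dy * dy > max_len * max_len;
}

/* True if any edge of the NDC-space triangle exceeds max_len pixels once
 * it is mapped to a width x height window. */
static bool triangle_needs_subdivision(const float ndc[3][2], int width,
                                       int height, float max_len)
{
   float px[3], py[3];
   for (int i = 0; i < 3; i++) {
      px[i] = ndc_to_pixels(ndc[i][0], width);
      py[i] = ndc_to_pixels(ndc[i][1], height);
   }
   return edge_longer_than(px[0], py[0], px[1], py[1], max_len) ||
          edge_longer_than(px[1], py[1], px[2], py[2], max_len) ||
          edge_longer_than(px[2], py[2], px[0], py[0], max_len);
}
```

With the test's vertices and a 4*2048 window, the two axis-aligned edges are 8192 pixels long and the diagonal is roughly 11585 pixels. The same triangle in a 1024 x 1024 window would stay under the limit.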
[Mesa-dev] [PATCH 1/3] gallivm: support printing of 64 bit integers
Only 8- and 32-bit integers were supported before.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/gallivm/lp_bld_printf.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_printf.c b/src/gallium/auxiliary/gallivm/lp_bld_printf.c
index 1324da2..d06209a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_printf.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_printf.c
@@ -106,7 +106,11 @@ lp_build_print_value(struct gallivm_state *gallivm,
          type_fmt[4] = 'g';
          type_fmt[5] = '\0';
       } else if (type_kind == LLVMIntegerTypeKind) {
-         if (LLVMGetIntTypeWidth(type_ref) == 8) {
+         if (LLVMGetIntTypeWidth(type_ref) == 64) {
+            type_fmt[2] = 'l';
+            type_fmt[3] = 'd';
+            type_fmt[4] = '\0';
+         } else if (LLVMGetIntTypeWidth(type_ref) == 8) {
             type_fmt[2] = 'u';
          } else {
             type_fmt[2] = 'i';
--
1.8.1.2
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
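The patch picks a printf conversion based on the LLVM integer width. The sketch below reproduces the same dispatch outside gallivm; the helper names are hypothetical. It uses `PRId64` from `<inttypes.h>` instead of the patch's `%ld`. That is a deliberate swap: `%ld` only matches a 64-bit value where `long` is 64 bits wide, as on LP64 platforms such as x86-64 Linux.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the width dispatch in lp_build_print_value():
 * choose a printf conversion for an integer of `width` bits. */
static const char *int_format_for_width(unsigned width)
{
   if (width == 64)
      return "%" PRId64;   /* portable 64-bit conversion (the patch uses "%ld") */
   else if (width == 8)
      return "%u";         /* 8-bit values are printed unsigned */
   else
      return "%i";         /* everything else is printed as a 32-bit int */
}

/* Format a 64-bit value with the conversion chosen above. */
static int format_int64(char *buf, size_t size, int64_t value)
{
   int n = snprintf(buf, size, int_format_for_width(64), value);
   assert(n >= 0 && (size_t)n < size);   /* output must not be truncated */
   return n;
}
```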
[Mesa-dev] [PATCH 2/3] gallium: Add support for 32x32 muls with 64 bit results
The code introduces two new 32-bit integer multiplication opcodes which can be used to produce correct 64-bit results. GLSL, OpenCL and D3D10+ require them. We use two separate opcodes because they match the behavior of GLSL and OpenCL, are a lot easier to add than a single opcode with multiple destinations, and because there's not much (any) difference wrt code generation.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/tgsi/tgsi_exec.c             | 34 ++
 src/gallium/auxiliary/tgsi/tgsi_info.c             |  6
 src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h       |  3 ++
 src/gallium/auxiliary/tgsi/tgsi_util.c             |  2 ++
 src/gallium/docs/source/tgsi.rst                   | 30 +++
 src/gallium/include/pipe/p_shader_tokens.h         |  5 +++-
 .../tests/graw/vertex-shader/vert-imul_hi.sh       | 13 +
 .../tests/graw/vertex-shader/vert-umul_hi.sh       | 11 +++
 8 files changed, 103 insertions(+), 1 deletion(-)
 create mode 100644 src/gallium/tests/graw/vertex-shader/vert-imul_hi.sh
 create mode 100644 src/gallium/tests/graw/vertex-shader/vert-umul_hi.sh

diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c b/src/gallium/auxiliary/tgsi/tgsi_exec.c
index 0750a50..6db1238 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -3478,6 +3478,32 @@ micro_umul(union tgsi_exec_channel *dst,
 }
 
 static void
+micro_imul_hi(union tgsi_exec_channel *dst,
+              const union tgsi_exec_channel *src0,
+              const union tgsi_exec_channel *src1)
+{
+#define I64M(x, y) ((((int64_t)x) * ((int64_t)y)) >> 32)
+   dst->i[0] = I64M(src0->i[0], src1->i[0]);
+   dst->i[1] = I64M(src0->i[1], src1->i[1]);
+   dst->i[2] = I64M(src0->i[2], src1->i[2]);
+   dst->i[3] = I64M(src0->i[3], src1->i[3]);
+#undef I64M
+}
+
+static void
+micro_umul_hi(union tgsi_exec_channel *dst,
+              const union tgsi_exec_channel *src0,
+              const union tgsi_exec_channel *src1)
+{
+#define U64M(x, y) ((((uint64_t)x) * ((uint64_t)y)) >> 32)
+   dst->u[0] = U64M(src0->u[0], src1->u[0]);
+   dst->u[1] = U64M(src0->u[1], src1->u[1]);
+   dst->u[2] = U64M(src0->u[2], src1->u[2]);
+   dst->u[3] = U64M(src0->u[3], src1->u[3]);
+#undef U64M
+}
+
+static void
 micro_useq(union tgsi_exec_channel *dst,
            const union tgsi_exec_channel *src0,
            const union tgsi_exec_channel *src1)
@@ -4277,6 +4303,14 @@ exec_instruction(
       exec_vector_binary(mach, inst, micro_umul, TGSI_EXEC_DATA_UINT, TGSI_EXEC_DATA_UINT);
       break;
 
+   case TGSI_OPCODE_IMUL_HI:
+      exec_vector_binary(mach, inst, micro_imul_hi, TGSI_EXEC_DATA_INT, TGSI_EXEC_DATA_INT);
+      break;
+
+   case TGSI_OPCODE_UMUL_HI:
+      exec_vector_binary(mach, inst, micro_umul_hi, TGSI_EXEC_DATA_UINT, TGSI_EXEC_DATA_UINT);
+      break;
+
    case TGSI_OPCODE_USEQ:
       exec_vector_binary(mach, inst, micro_useq, TGSI_EXEC_DATA_UINT, TGSI_EXEC_DATA_UINT);
       break;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c
index 7a5d18f..0beef44 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_info.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
@@ -219,6 +219,8 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] =
    { 1, 3, 1, 0, 0, 0, OTHR, "TEX2", TGSI_OPCODE_TEX2 },
    { 1, 3, 1, 0, 0, 0, OTHR, "TXB2", TGSI_OPCODE_TXB2 },
    { 1, 3, 1, 0, 0, 0, OTHR, "TXL2", TGSI_OPCODE_TXL2 },
+   { 1, 2, 0, 0, 0, 0, COMP, "IMUL_HI", TGSI_OPCODE_IMUL_HI },
+   { 1, 2, 0, 0, 0, 0, COMP, "UMUL_HI", TGSI_OPCODE_UMUL_HI },
 };
 
 const struct tgsi_opcode_info *
@@ -297,6 +299,7 @@ tgsi_opcode_infer_type( uint opcode )
    case TGSI_OPCODE_USLT:
    case TGSI_OPCODE_USNE:
    case TGSI_OPCODE_SVIEWINFO:
+   case TGSI_OPCODE_UMUL_HI:
       return TGSI_TYPE_UNSIGNED;
    case TGSI_OPCODE_ARL:
    case TGSI_OPCODE_ARR:
@@ -317,6 +320,7 @@ tgsi_opcode_infer_type( uint opcode )
    case TGSI_OPCODE_UARL:
    case TGSI_OPCODE_IABS:
    case TGSI_OPCODE_ISSG:
+   case TGSI_OPCODE_IMUL_HI:
       return TGSI_TYPE_SIGNED;
    default:
       return TGSI_TYPE_FLOAT;
@@ -339,7 +343,9 @@ tgsi_opcode_infer_src_type( uint opcode )
    case TGSI_OPCODE_CASE:
    case TGSI_OPCODE_SAMPLE_I:
    case TGSI_OPCODE_SAMPLE_I_MS:
+   case TGSI_OPCODE_UMUL_HI:
       return TGSI_TYPE_UNSIGNED;
+   case TGSI_OPCODE_IMUL_HI:
    case TGSI_OPCODE_I2F:
       return TGSI_TYPE_SIGNED;
    case TGSI_OPCODE_ARL:
diff --git a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
index b8144a8..1ef78dd 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h
@@ -204,6 +204,9 @@ OP12(SAMPLE_INFO)
 
 OP13(UCMP)
 
+OP12(IMUL_HI)
+OP12(UMUL_HI)
+
 #undef OP00
 #undef OP01
 #undef OP10
diff --git a/src/gallium/auxiliary/tgsi/tgsi_util.c b/src/gallium/auxiliary/tgsi/tgsi_util.c
index b3bc8f2..73a0667 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_util.c
+++ b/src
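The softpipe reference code widens each 32-bit channel to 64 bits, multiplies, and keeps the upper half. Below is a scalar sketch of the two opcodes' semantics in standard C; the function names are hypothetical. Right-shifting a negative `int64_t` is implementation-defined in C, but it is an arithmetic shift on the usual compilers, which is also what the reference macro relies on.

```c
#include <assert.h>
#include <stdint.h>

/* IMUL_HI: the high 32 bits of the exact 64-bit signed product. */
static int32_t imul_hi(int32_t a, int32_t b)
{
   int64_t product = (int64_t)a * (int64_t)b;   /* cannot overflow */
   return (int32_t)(product >> 32);
}

/* UMUL_HI: the high 32 bits of the exact 64-bit unsigned product. */
static uint32_t umul_hi(uint32_t a, uint32_t b)
{
   uint64_t product = (uint64_t)a * (uint64_t)b;
   return (uint32_t)(product >> 32);
}

/* The low half is what a plain 32-bit multiply (UMUL) already returns,
 * so the two results together rebuild the full 32x32->64 product. */
static uint64_t umul_full(uint32_t a, uint32_t b)
{
   return ((uint64_t)umul_hi(a, b) << 32) | (uint32_t)(a * b);
}
```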
[Mesa-dev] [PATCH 3/3] llvmpipe: implement 64 bit mul opcodes in llvmpipe
Both the imul_hi and umul_hi are working with this patch.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c | 60 ++
 1 file changed, 60 insertions(+)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
index 1cfaf78..8caaf83 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
@@ -763,6 +763,64 @@ umul_emit(
                                    emit_data->args[0], emit_data->args[1]);
 }
 
+/* TGSI_OPCODE_IMUL_HI */
+static void
+imul_hi_emit(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data)
+{
+   LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+   struct lp_build_context *int_bld = &bld_base->int_bld;
+   struct lp_type type = int_bld->type;
+   LLVMValueRef src0, src1;
+   LLVMValueRef dst64;
+   LLVMTypeRef typeRef;
+
+   assert(type.width == 32);
+   type.width = 64;
+   typeRef = lp_build_vec_type(bld_base->base.gallivm, type);
+   src0 = LLVMBuildSExt(builder, emit_data->args[0], typeRef, "");
+   src1 = LLVMBuildSExt(builder, emit_data->args[1], typeRef, "");
+   dst64 = LLVMBuildMul(builder, src0, src1, "");
+   dst64 = LLVMBuildAShr(
+            builder, dst64,
+            lp_build_const_vec(bld_base->base.gallivm, type, 32), "");
+   type.width = 32;
+   typeRef = lp_build_vec_type(bld_base->base.gallivm, type);
+   emit_data->output[emit_data->chan] =
+         LLVMBuildTrunc(builder, dst64, typeRef, "");
+}
+
+/* TGSI_OPCODE_UMUL_HI */
+static void
+umul_hi_emit(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data)
+{
+   LLVMBuilderRef builder = bld_base->base.gallivm->builder;
+   struct lp_build_context *uint_bld = &bld_base->uint_bld;
+   struct lp_type type = uint_bld->type;
+   LLVMValueRef src0, src1;
+   LLVMValueRef dst64;
+   LLVMTypeRef typeRef;
+
+   assert(type.width == 32);
+   type.width = 64;
+   typeRef = lp_build_vec_type(bld_base->base.gallivm, type);
+   src0 = LLVMBuildZExt(builder, emit_data->args[0], typeRef, "");
+   src1 = LLVMBuildZExt(builder, emit_data->args[1], typeRef, "");
+   dst64 = LLVMBuildMul(builder, src0, src1, "");
+   dst64 = LLVMBuildLShr(
+            builder, dst64,
+            lp_build_const_vec(bld_base->base.gallivm, type, 32), "");
+   type.width = 32;
+   typeRef = lp_build_vec_type(bld_base->base.gallivm, type);
+   emit_data->output[emit_data->chan] =
+         LLVMBuildTrunc(builder, dst64, typeRef, "");
+}
+
 /* TGSI_OPCODE_MAX */
 static void fmax_emit(
    const struct lp_build_tgsi_action * action,
@@ -894,6 +952,8 @@ lp_set_default_actions(struct lp_build_tgsi_context * bld_base)
    bld_base->op_actions[TGSI_OPCODE_U2F].emit = u2f_emit;
    bld_base->op_actions[TGSI_OPCODE_UMAD].emit = umad_emit;
    bld_base->op_actions[TGSI_OPCODE_UMUL].emit = umul_emit;
+   bld_base->op_actions[TGSI_OPCODE_IMUL_HI].emit = imul_hi_emit;
+   bld_base->op_actions[TGSI_OPCODE_UMUL_HI].emit = umul_hi_emit;
 
    bld_base->op_actions[TGSI_OPCODE_MAX].emit = fmax_emit;
    bld_base->op_actions[TGSI_OPCODE_MIN].emit = fmin_emit;
--
1.8.1.2
[Mesa-dev] [PATCH] llvmpipe: abstract the code to set number of subpixel bits
As we're moving towards expanding the number of subpixel bits and the width of the variables used in the computations, we need to make this code a bit more centralized.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/drivers/llvmpipe/lp_rast.h      |  9 +
 src/gallium/drivers/llvmpipe/lp_setup.c     | 14 +-
 src/gallium/drivers/llvmpipe/lp_setup_tri.c |  2 +-
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_rast.h b/src/gallium/drivers/llvmpipe/lp_rast.h
index c57f2ea..43c598d 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.h
+++ b/src/gallium/drivers/llvmpipe/lp_rast.h
@@ -46,9 +46,18 @@ struct lp_scene;
 struct lp_fence;
 struct cmd_bin;
 
+#define FIXED_TYPE_WIDTH 32
 /** For sub-pixel positioning */
 #define FIXED_ORDER 4
 #define FIXED_ONE (1<<FIXED_ORDER)
+/** Maximum length of an edge in a primitive in pixels.
+ *  If the framebuffer is large we have to think about fixed-point
+ *  integer overflow. Coordinates need ((FIXED_TYPE_WIDTH/2) - 1) bits
+ *  to be able to fit product of two such coordinates inside
+ *  FIXED_TYPE_WIDTH, any larger and we could overflow a
+ *  FIXED_TYPE_WIDTH-bit int.
+ */
+#define MAX_FIXED_LENGTH (1 << (((FIXED_TYPE_WIDTH/2) - 1) - FIXED_ORDER))
 
 /* Rasterizer output size going to jit fs, width/height */
 #define LP_RASTER_BLOCK_SIZE 4
diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c b/src/gallium/drivers/llvmpipe/lp_setup.c
index c8199b4..9b277d3 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup.c
@@ -1007,16 +1007,12 @@ try_update_scene_state( struct lp_setup_context *setup )
                                          &setup->draw_regions[i]);
             }
          }
-      /* If the framebuffer is large we have to think about fixed-point
-       * integer overflow.  For 2K by 2K images, coordinates need 15 bits
-       * (2^11 + 4 subpixel bits).  The product of two such numbers would
-       * use 30 bits.  Any larger and we could overflow a 32-bit int.
-       *
-       * To cope with this problem we check if triangles are large and
-       * subdivide them if needed.
+      /*
+       * Subdivide triangles if the framebuffer is larger than the
+       * MAX_FIXED_LENGTH.
        */
-      setup->subdivide_large_triangles = (setup->fb.width > 2048 ||
-                                          setup->fb.height > 2048);
+      setup->subdivide_large_triangles = (setup->fb.width > MAX_FIXED_LENGTH ||
+                                          setup->fb.height > MAX_FIXED_LENGTH);
    }
 
    setup->dirty = 0;
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
index 051ffa0..9cc81e9 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
@@ -988,7 +988,7 @@ check_subdivide_triangle(struct lp_setup_context *setup,
                          const float (*v2)[4],
                          triangle_func_t tri)
 {
-   const float maxLen = 2048.0f;  /* longest permissible edge, in pixels */
+   const float maxLen = MAX_FIXED_LENGTH;  /* longest permissible edge, in pixels */
    float dx10, dy10, len10;
    float dx21, dy21, len21;
    float dx02, dy02, len02;
--
1.8.1.2
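The MAX_FIXED_LENGTH formula can be checked by hand. With a 32-bit type, each coordinate may use (32/2) - 1 = 15 bits, so a product of two coordinates stays below 2^30. The subpixel bits come out of that 15-bit budget. The sketch below evaluates the formula and the overflow bound in 64-bit arithmetic; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate the MAX_FIXED_LENGTH formula: each coordinate may use
 * (type_width/2 - 1) bits in total, and the integer-pixel part gets
 * whatever is left after the fixed_order subpixel bits. */
static int32_t max_fixed_length(int type_width, int fixed_order)
{
   int coord_bits = (type_width / 2) - 1;
   assert(coord_bits > fixed_order);
   return (int32_t)1 << (coord_bits - fixed_order);
}

/* Largest product of two fixed-point coordinates of max_len pixels with
 * fixed_order subpixel bits, computed in 64 bits so the check itself
 * cannot overflow. */
static int64_t max_fixed_product(int32_t max_len, int fixed_order)
{
   int64_t coord = (int64_t)max_len << fixed_order;
   return coord * coord;
}
```

With FIXED_ORDER 4, the formula reproduces the old hard-coded 2048, and 2048-pixel coordinates multiply to exactly 2^30. Doubling the length to 4096 would already exceed a signed 32-bit int.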
[Mesa-dev] [PATCH] llvmpipe: we need to subdivide if fb is bigger in either direction
We need to subdivide triangles if either of the dimensions is larger than the max edge length, not when both of them are larger.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/drivers/llvmpipe/lp_setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c b/src/gallium/drivers/llvmpipe/lp_setup.c
index 5fde01f..c8199b4 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup.c
@@ -1015,7 +1015,7 @@ try_update_scene_state( struct lp_setup_context *setup )
        * To cope with this problem we check if triangles are large and
        * subdivide them if needed.
        */
-      setup->subdivide_large_triangles = (setup->fb.width > 2048 &&
+      setup->subdivide_large_triangles = (setup->fb.width > 2048 ||
                                           setup->fb.height > 2048);
    }
 
--
1.8.3.2
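The fix turns `&&` into `||`. Below is a tiny sketch of the corrected predicate; the function is hypothetical and stands in for the `subdivide_large_triangles` assignment. With the old `&&`, a framebuffer that is large in only one direction, such as 4096 x 16, would never have been subdivided.

```c
#include <assert.h>
#include <stdbool.h>

/* Subdivision is needed as soon as EITHER framebuffer dimension
 * exceeds the longest edge the fixed-point setup can handle. */
static bool needs_subdivision(unsigned fb_width, unsigned fb_height,
                              unsigned max_len)
{
   assert(max_len > 0);
   return fb_width > max_len || fb_height > max_len;
}
```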
[Mesa-dev] [PATCH] llvmpipe: align the array used for subdivided vertices
When subdividing a triangle we're using a temporary array to store the new coordinates for the subdivided triangles. Unfortunately the array used for that was not aligned properly, causing random crashes in the llvm jit code, which was trying to load vectors from it.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/drivers/llvmpipe/lp_setup_tri.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
index 8b0fcd0..cf67f29 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
@@ -909,7 +909,7 @@ subdiv_tri(struct lp_setup_context *setup,
    unsigned n = setup->fs.current.variant->shader->info.base.num_inputs + 1;
    const struct lp_shader_input *inputs =
       setup->fs.current.variant->shader->inputs;
-   float vmid[PIPE_MAX_ATTRIBS][4];
+   PIPE_ALIGN_VAR(LP_MIN_VECTOR_ALIGN) float vmid[PIPE_MAX_ATTRIBS][4];
    const float (*vm)[4] = (const float (*)[4]) vmid;
    unsigned i;
    float w0, w1, wm;
--
1.8.3.2
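PIPE_ALIGN_VAR is Mesa's portable wrapper for variable alignment. Plain C11 gets the same effect with `alignas` from `<stdalign.h>`. A plain `float` array only needs 4-byte alignment, so whether it landed on a vector boundary depended on the surrounding stack layout. That plausibly explains why the crashes looked random. The sketch below assumes a 16-byte vector alignment and a 32-entry attribute table purely for illustration; the real `LP_MIN_VECTOR_ALIGN` and `PIPE_MAX_ATTRIBS` values may differ.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumed values for illustration only: 16 bytes matches one 4 x float
 * SSE vector. */
#define VEC_ALIGN 16
#define MAX_ATTRIBS 32

/* C11 analogue of PIPE_ALIGN_VAR(LP_MIN_VECTOR_ALIGN): force the stack
 * array onto a vector boundary.  Each row is 4 floats (16 bytes), so
 * once the array start is aligned, every vmid[row] is aligned too. */
static uintptr_t row_misalignment(unsigned row)
{
   alignas(VEC_ALIGN) float vmid[MAX_ATTRIBS][4];
   assert(row < MAX_ATTRIBS);
   vmid[row][0] = 0.0f;   /* touch the row so it is a real object */
   return (uintptr_t)(void *)vmid[row] % VEC_ALIGN;
}
```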
[Mesa-dev] [PATCH 1/3] llvmpipe: count c_primitives before discarding null prims
We need to count the clipper primitives before the rasterizer discards one it considers to be null.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/drivers/llvmpipe/lp_setup_tri.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
index 23bc6e2..e61efd4 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
@@ -252,7 +252,6 @@ do_triangle_ccw(struct lp_setup_context *setup,
                 const float (*v2)[4],
                 boolean frontfacing )
 {
-   struct llvmpipe_context *lp_context = (struct llvmpipe_context *)setup->pipe;
    struct lp_scene *scene = setup->scene;
    const struct lp_setup_variant_key *key = &setup->setup.variant->key;
    struct lp_rast_triangle *tri;
@@ -340,11 +339,6 @@ do_triangle_ccw(struct lp_setup_context *setup,
 
    LP_COUNT(nr_tris);
 
-   if (lp_context->active_statistics_queries &&
-       !llvmpipe_rasterization_disabled(lp_context)) {
-      lp_context->pipeline_statistics.c_primitives++;
-   }
-
    /* Setup parameter interpolants:
     */
    setup->setup.variant->jit_function( v0,
@@ -803,7 +797,6 @@ static void retry_triangle_ccw( struct lp_setup_context *setup,
    }
 }
 
-
 /**
  * Calculate fixed position data for a triangle
  */
@@ -1102,11 +1095,17 @@ static void triangle_both( struct lp_setup_context *setup,
                            const float (*v2)[4] )
 {
    struct fixed_position position;
+   struct llvmpipe_context *lp_context = (struct llvmpipe_context *)setup->pipe;
 
    if (setup->subdivide_large_triangles &&
        check_subdivide_triangle(setup, v0, v1, v2, triangle_both))
       return;
 
+   if (lp_context->active_statistics_queries &&
+       !llvmpipe_rasterization_disabled(lp_context)) {
+      lp_context->pipeline_statistics.c_primitives++;
+   }
+
    calc_fixed_position(setup, &position, v0, v1, v2);
 
    if (0) {
--
1.8.3.2
[Mesa-dev] [PATCH 3/3] llvmpipe: increase number of subpixel bits to eight
Unfortunately d3d10 requires much higher precision (e.g. wgf11clipping tests for it). The smallest number of precision bits with which it passes is 8. That means that we need to decrease the maximum length of an edge that we can handle without subdivision by 4 bits. Abstracted the code a bit to make it easier to change once we switch to 64-bit rasterization.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/drivers/llvmpipe/lp_rast.h      | 12 +++-
 src/gallium/drivers/llvmpipe/lp_setup.c     | 14 +-
 src/gallium/drivers/llvmpipe/lp_setup_tri.c |  2 +-
 3 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_rast.h b/src/gallium/drivers/llvmpipe/lp_rast.h
index c57f2ea..b72be55 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.h
+++ b/src/gallium/drivers/llvmpipe/lp_rast.h
@@ -46,10 +46,20 @@ struct lp_scene;
 struct lp_fence;
 struct cmd_bin;
 
+#define FIXED_TYPE_WIDTH 32
 /** For sub-pixel positioning */
-#define FIXED_ORDER 4
+#define FIXED_ORDER 8
 #define FIXED_ONE (1<<FIXED_ORDER)
+/** Maximum length of an edge in a primitive in pixels.
+ *  If the framebuffer is large we have to think about fixed-point
+ *  integer overflow. Coordinates need ((FIXED_TYPE_WIDTH/2) - 1) bits
+ *  to be able to fit product of two such coordinates inside
+ *  FIXED_TYPE_WIDTH, any larger and we could overflow a
+ *  FIXED_TYPE_WIDTH-bit int.
+ */
+#define MAX_FIXED_LENGTH (1 << (((FIXED_TYPE_WIDTH/2) - 1) - FIXED_ORDER))
+
 
 /* Rasterizer output size going to jit fs, width/height */
 #define LP_RASTER_BLOCK_SIZE 4
diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c b/src/gallium/drivers/llvmpipe/lp_setup.c
index 5fde01f..edb55ad 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup.c
@@ -1007,16 +1007,12 @@ try_update_scene_state( struct lp_setup_context *setup )
                                          &setup->draw_regions[i]);
             }
          }
-      /* If the framebuffer is large we have to think about fixed-point
-       * integer overflow.  For 2K by 2K images, coordinates need 15 bits
-       * (2^11 + 4 subpixel bits).  The product of two such numbers would
-       * use 30 bits.  Any larger and we could overflow a 32-bit int.
-       *
-       * To cope with this problem we check if triangles are large and
-       * subdivide them if needed.
+      /*
+       * Subdivide triangles if the framebuffer is larger than our
+       * MAX_FIXED_LENGTH can accommodate.
        */
-      setup->subdivide_large_triangles = (setup->fb.width > 2048 &&
-                                          setup->fb.height > 2048);
+      setup->subdivide_large_triangles = (setup->fb.width > MAX_FIXED_LENGTH &&
+                                          setup->fb.height > MAX_FIXED_LENGTH);
    }
 
    setup->dirty = 0;
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
index e61efd4..ee30a64 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
@@ -988,7 +988,7 @@ check_subdivide_triangle(struct lp_setup_context *setup,
                          const float (*v2)[4],
                          triangle_func_t tri)
 {
-   const float maxLen = 2048.0f;  /* longest permissible edge, in pixels */
+   const float maxLen = MAX_FIXED_LENGTH;  /* longest permissible edge, in pixels */
    float dx10, dy10, len10;
    float dx21, dy21, len21;
    float dx02, dy02, len02;
--
1.8.3.2
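With FIXED_ORDER 8, one pixel is 256 fixed-point units. MAX_FIXED_LENGTH drops to 1 << (15 - 8) = 128 pixels. The sketch below converts window coordinates to fixed point with 8 subpixel bits and checks whether a product of two coordinates still fits in 32 bits. The rounding mode and helper names are assumptions, not llvmpipe's actual conversion:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FIXED_ORDER 8                        /* subpixel bits, as in the patch */
#define SKETCH_FIXED_ONE   (1 << SKETCH_FIXED_ORDER)

/* Round a window-space coordinate to fixed point with
 * SKETCH_FIXED_ORDER fractional bits (round half away from zero). */
static int32_t to_fixed(float x)
{
   float scaled = x * (float)SKETCH_FIXED_ONE;
   return (int32_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Does the product of two fixed-point coordinates fit in a 32-bit int?
 * The multiplication itself is done in 64 bits so it cannot overflow. */
static int fixed_product_fits_32(int32_t a, int32_t b)
{
   int64_t p = (int64_t)a * (int64_t)b;
   return p >= INT32_MIN && p <= INT32_MAX;
}
```

A 128-pixel coordinate becomes 2^15 units, and its square is 2^30, which still fits. At 2048 pixels, the limit that was safe with 4 subpixel bits, the square grows to 2^38.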
[Mesa-dev] [PATCH 2/3] draw/clip: don't emit so many empty triangles
Compress empty triangles (don't emit more than one in a row) and never emit empty triangles if we already generated a triangle covering a non-null area. We can't skip all null triangles because c_primitives expects the ones that were generated from vertices exactly at the clipping plane to be emitted.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/draw/draw_pipe_clip.c | 39 +
 1 file changed, 39 insertions(+)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c b/src/gallium/auxiliary/draw/draw_pipe_clip.c
index 0f90bfd..2d6df81 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_clip.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c
@@ -209,6 +209,29 @@ static void interp( const struct clip_stage *clip,
    }
 }
 
+/**
+ * Checks whether the specified triangle is empty and if it is returns
+ * true, otherwise returns false.
+ * Triangle is considered null/empty if its area is equal to zero.
+ */
+static INLINE boolean
+is_tri_null(struct draw_context *draw, const struct prim_header *header)
+{
+   const unsigned pos_attr = draw_current_shader_position_output(draw);
+   float x1 = header->v[1]->data[pos_attr][0] - header->v[0]->data[pos_attr][0];
+   float y1 = header->v[1]->data[pos_attr][1] - header->v[0]->data[pos_attr][1];
+   float z1 = header->v[1]->data[pos_attr][2] - header->v[0]->data[pos_attr][2];
+
+   float x2 = header->v[2]->data[pos_attr][0] - header->v[0]->data[pos_attr][0];
+   float y2 = header->v[2]->data[pos_attr][1] - header->v[0]->data[pos_attr][1];
+   float z2 = header->v[2]->data[pos_attr][2] - header->v[0]->data[pos_attr][2];
+
+   float vx = y1 * z2 - z1 * y2;
+   float vy = x1 * z2 - z1 * x2;
+   float vz = x1 * y2 - y1 * x2;
+
+   return (vx*vx + vy*vy + vz*vz) == 0.f;
+}
 
 /**
  * Emit a post-clip polygon to the next pipeline stage.  The polygon
@@ -223,6 +246,8 @@ static void emit_poly( struct draw_stage *stage,
    struct prim_header header;
    unsigned i;
    ushort edge_first, edge_middle, edge_last;
+   boolean last_tri_was_null = FALSE;
+   boolean tri_was_not_null = FALSE;
 
    if (stage->draw->rasterizer->flatshade_first) {
       edge_first  = DRAW_PIPE_EDGE_FLAG_0;
@@ -244,6 +269,7 @@ static void emit_poly( struct draw_stage *stage,
    header.pad = 0;
 
    for (i = 2; i < n; i++, header.flags = edge_middle) {
+      boolean tri_null;
       /* order the triangle verts to respect the provoking vertex mode */
       if (stage->draw->rasterizer->flatshade_first) {
          header.v[0] = inlist[0];  /* the provoking vertex */
@@ -256,6 +282,19 @@ static void emit_poly( struct draw_stage *stage,
          header.v[2] = inlist[0];  /* the provoking vertex */
       }
 
+      tri_null = is_tri_null(stage->draw, &header);
+      /* If we generated a triangle with an area, aka. non-null triangle,
+       * or if the previous triangle was also null then skip all subsequent
+       * null triangles */
+      if ((tri_was_not_null && tri_null) || (last_tri_was_null && tri_null)) {
+         last_tri_was_null = tri_null;
+         continue;
+      }
+      last_tri_was_null = tri_null;
+      if (!tri_null) {
+         tri_was_not_null = TRUE;
+      }
+
       if (!edgeflags[i-1]) {
          header.flags &= ~edge_middle;
       }
--
1.8.3.2
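is_tri_null() is a zero-length cross-product test. The emit loop keeps at most one null triangle in a row, and none once a triangle with real area has been emitted. The standalone sketch below covers both pieces; the names are hypothetical. It uses the textbook cross product. The patch's `vy` has the opposite sign, which makes no difference because only the squared length is compared with zero.

```c
#include <assert.h>
#include <stdbool.h>

/* A triangle is degenerate exactly when the cross product of two of
 * its edge vectors is the zero vector. */
static bool tri_is_null(const float v0[3], const float v1[3], const float v2[3])
{
   float x1 = v1[0] - v0[0], y1 = v1[1] - v0[1], z1 = v1[2] - v0[2];
   float x2 = v2[0] - v0[0], y2 = v2[1] - v0[1], z2 = v2[2] - v0[2];
   float cx = y1 * z2 - z1 * y2;
   float cy = z1 * x2 - x1 * z2;
   float cz = x1 * y2 - y1 * x2;
   return cx * cx + cy * cy + cz * cz == 0.0f;
}

/* Replays the emit_poly() skip rule on per-triangle null flags and
 * returns how many triangles would be emitted: a null triangle is
 * dropped if a non-null one was already emitted or if the previous
 * triangle was also null. */
static unsigned count_emitted(const bool *is_null, unsigned n)
{
   bool last_tri_was_null = false, tri_was_not_null = false;
   unsigned emitted = 0;
   for (unsigned i = 0; i < n; i++) {
      bool skip = is_null[i] && (tri_was_not_null || last_tri_was_null);
      last_tri_was_null = is_null[i];
      if (skip)
         continue;
      if (!is_null[i])
         tri_was_not_null = true;
      emitted++;
   }
   return emitted;
}
```

For example, a fan whose triangles are null, null, non-null, null emits only the first null triangle and the non-null one.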
Re: [Mesa-dev] [PATCH 3/3] util/u_blit: Implement util_blit_pixels via pipe_context::blit.
The entire series looks good to me.

Reviewed-by: Zack Rusin <za...@vmware.com>

----- Original Message -----
From: José Fonseca <jfons...@vmware.com>

This removes a lot of code, but not everything, as util_blit_pixels_tex is still useful when one needs to override pipe_sampler_view::swizzle_?.
---
 src/gallium/auxiliary/util/u_blit.c | 447 +++-
 1 file changed, 37 insertions(+), 410 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_blit.c b/src/gallium/auxiliary/util/u_blit.c
index e9bec4a..4ba71b9 100644
--- a/src/gallium/auxiliary/util/u_blit.c
+++ b/src/gallium/auxiliary/util/u_blit.c
@@ -57,29 +57,20 @@ struct blit_state
    struct pipe_context *pipe;
    struct cso_context *cso;
 
-   struct pipe_blend_state blend_write_color, blend_keep_color;
+   struct pipe_blend_state blend_write_color;
    struct pipe_depth_stencil_alpha_state dsa_keep_depthstencil;
-   struct pipe_depth_stencil_alpha_state dsa_write_depthstencil;
-   struct pipe_depth_stencil_alpha_state dsa_write_depth;
-   struct pipe_depth_stencil_alpha_state dsa_write_stencil;
    struct pipe_rasterizer_state rasterizer;
    struct pipe_sampler_state sampler;
    struct pipe_viewport_state viewport;
    struct pipe_vertex_element velem[2];
-   enum pipe_texture_target internal_target;
 
    void *vs;
    void *fs[PIPE_MAX_TEXTURE_TYPES][TGSI_WRITEMASK_XYZW + 1];
-   void *fs_depthstencil[PIPE_MAX_TEXTURE_TYPES];
-   void *fs_depth[PIPE_MAX_TEXTURE_TYPES];
-   void *fs_stencil[PIPE_MAX_TEXTURE_TYPES];
 
    struct pipe_resource *vbuf;  /**< quad vertices */
    unsigned vbuf_slot;
 
    float vertices[4][2][4];   /**< vertex/texcoords for quad */
-
-   boolean has_stencil_export;
 };
 
 
@@ -103,20 +94,6 @@ util_create_blit(struct pipe_context *pipe, struct cso_context *cso)
    /* disabled blending/masking */
    ctx->blend_write_color.rt[0].colormask = PIPE_MASK_RGBA;
 
-   /* depth stencil states */
-   ctx->dsa_write_depth.depth.enabled = 1;
-   ctx->dsa_write_depth.depth.writemask = 1;
-   ctx->dsa_write_depth.depth.func = PIPE_FUNC_ALWAYS;
-   ctx->dsa_write_stencil.stencil[0].enabled = 1;
-   ctx->dsa_write_stencil.stencil[0].func = PIPE_FUNC_ALWAYS;
-   ctx->dsa_write_stencil.stencil[0].fail_op = PIPE_STENCIL_OP_REPLACE;
-   ctx->dsa_write_stencil.stencil[0].zpass_op = PIPE_STENCIL_OP_REPLACE;
-   ctx->dsa_write_stencil.stencil[0].zfail_op = PIPE_STENCIL_OP_REPLACE;
-   ctx->dsa_write_stencil.stencil[0].valuemask = 0xff;
-   ctx->dsa_write_stencil.stencil[0].writemask = 0xff;
-   ctx->dsa_write_depthstencil.depth = ctx->dsa_write_depth.depth;
-   ctx->dsa_write_depthstencil.stencil[0] = ctx->dsa_write_stencil.stencil[0];
-
    /* rasterizer */
    ctx->rasterizer.cull_face = PIPE_FACE_NONE;
    ctx->rasterizer.half_pixel_center = 1;
@@ -147,14 +124,6 @@ util_create_blit(struct pipe_context *pipe, struct cso_context *cso)
       ctx->vertices[i][1][3] = 1.0f; /* q */
    }
 
-   if(pipe->screen->get_param(pipe->screen, PIPE_CAP_NPOT_TEXTURES))
-      ctx->internal_target = PIPE_TEXTURE_2D;
-   else
-      ctx->internal_target = PIPE_TEXTURE_RECT;
-
-   ctx->has_stencil_export =
-      pipe->screen->get_param(pipe->screen, PIPE_CAP_SHADER_STENCIL_EXPORT);
-
    return ctx;
 }
 
@@ -178,18 +147,6 @@ util_destroy_blit(struct blit_state *ctx)
       }
    }
 
-   for (i = 0; i < PIPE_MAX_TEXTURE_TYPES; i++) {
-      if (ctx->fs_depthstencil[i]) {
-         pipe->delete_fs_state(pipe, ctx->fs_depthstencil[i]);
-      }
-      if (ctx->fs_depth[i]) {
-         pipe->delete_fs_state(pipe, ctx->fs_depth[i]);
-      }
-      if (ctx->fs_stencil[i]) {
-         pipe->delete_fs_state(pipe, ctx->fs_stencil[i]);
-      }
-   }
-
    pipe_resource_reference(&ctx->vbuf, NULL);
 
    FREE(ctx);
@@ -217,63 +174,6 @@ set_fragment_shader(struct blit_state *ctx, uint writemask,
 
 
 /**
- * Helper function to set the shader which writes depth and stencil.
- */
-static INLINE void
-set_depthstencil_fragment_shader(struct blit_state *ctx,
-                                 enum pipe_texture_target pipe_tex)
-{
-   if (!ctx->fs_depthstencil[pipe_tex]) {
-      unsigned tgsi_tex = util_pipe_tex_to_tgsi_tex(pipe_tex, 0);
-
-      ctx->fs_depthstencil[pipe_tex] =
-         util_make_fragment_tex_shader_writedepthstencil(ctx->pipe, tgsi_tex,
-                                                         TGSI_INTERPOLATE_LINEAR);
-   }
-
-   cso_set_fragment_shader_handle(ctx->cso, ctx->fs_depthstencil[pipe_tex]);
-}
-
-
-/**
- * Helper function to set the shader which writes depth.
- */
-static INLINE void
-set_depth_fragment_shader(struct blit_state *ctx,
-                          enum pipe_texture_target pipe_tex)
-{
-   if (!ctx->fs_depth[pipe_tex]) {
-      unsigned tgsi_tex = util_pipe_tex_to_tgsi_tex(pipe_tex, 0);
-
-      ctx->fs_depth[pipe_tex] =
-         util_make_fragment_tex_shader_writedepth(ctx->pipe, tgsi_tex
Re: [Mesa-dev] [PATCH] Revert "draw: cleanup the extra attribs"
This reverts commit 57cd3267782fcf92d1e7d772760956516d4367df.

This fixes piglit regressions with additional draw stages on llvmpipe, softpipe and i915g. The attributes can't be cleared at this point because they might be in use by the additional draw stages.

The attributes have to be cleared, but the interface for looking them up has to be exactly the same as in llvmpipe (i.e. only llvmpipe does it correctly).

https://bugs.freedesktop.org/show_bug.cgi?id=67963
https://bugs.freedesktop.org/show_bug.cgi?id=67965
https://bugs.freedesktop.org/show_bug.cgi?id=67966

All of which have been fixed for a long time; no one has just had the time to verify and close them.

In other words, please don't revert. If you don't feel like changing the shader output lookup, just remove the prepare_shader_outputs call, like I mentioned, and that should get you the old behavior back.

z
Re: [Mesa-dev] [PATCH 1/3] draw: cleanup the extra attribs
Hi, Stéphane. No we should not revert to the old behavior. The old behavior was incorrect. Consider this: -- setup state that draws a wireframe - draw should inject frontface -- the driver needs to be able to find the injected wireframe output -- draw -- setup state the draws solid fill with fragment shader using primid input - draw should inject primid but not frontface -- driver needs to be able to find the injected primid but not frontface info -- draw Without cleaning the attributed before the second draw the draw will keep the frontface id in the extra attribs, incorrectly pointing the driver to a non-existing crash. That's why the attribs need to be cleaned before rendering. i915g simply shouldn't call draw_prepare_shader_outputs because it doesn't know what to do with the injected front-face or primid anyway. That part I'd suggest you remove. It will get you back to the old behavior. z - Original Message - Hi Zack, This change regresses a bunch of point sprite piglit tests on i915g. Should we revert back to the old behaviour? As far as I can see, it was correct (it was keeping the attributes in case another stage is using them). Stéphane On Thu, Aug 8, 2013 at 12:46 PM, Zack Rusin za...@vmware.com wrote: Before inserting new front face and prim id outputs cleanup the old extra outputs, otherwise our cache will use previous output slots which will break as soon as outputs of the current shader don't match the last. 
Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_context.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c
index af9caee..2dc6772 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -555,6 +555,7 @@ draw_get_shader_info(const struct draw_context *draw)
 void
 draw_prepare_shader_outputs(struct draw_context *draw)
 {
+   draw_remove_extra_vertex_attribs(draw);
    draw_ia_prepare_outputs(draw, draw->pipeline.ia);
    draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
 }
-- 
1.7.10.4
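The two-draw scenario above can be modeled with a toy slot table. This is a minimal sketch under stated assumptions: `struct extra_attribs`, `prepare_outputs`, `inject_attrib` and `find_extra_attrib` are hypothetical names, not draw's real data structures; only the idea that injected outputs are appended after the shader's own outputs is taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of draw's "extra" vertex attributes.
 * None of these names are the real draw API. */
enum extra_semantic { EXTRA_FACE, EXTRA_PRIMID };

#define MAX_EXTRA 4

struct extra_attribs {
   int count;
   enum extra_semantic semantic[MAX_EXTRA];
   int slot[MAX_EXTRA];
};

/* Drop every previously injected attribute (the role played by
 * draw_remove_extra_vertex_attribs() in the patch). */
static void remove_extra_attribs(struct extra_attribs *ea)
{
   ea->count = 0;
}

/* Append an injected output right after the shader's own outputs. */
static void inject_attrib(struct extra_attribs *ea, enum extra_semantic sem,
                          int num_shader_outputs)
{
   assert(ea->count < MAX_EXTRA);
   ea->semantic[ea->count] = sem;
   ea->slot[ea->count] = num_shader_outputs + ea->count;
   ea->count++;
}

/* Per-draw preparation; `clear_first` toggles the cleanup on or off. */
static void prepare_outputs(struct extra_attribs *ea, bool need_face,
                            bool need_primid, int num_shader_outputs,
                            bool clear_first)
{
   if (clear_first)
      remove_extra_attribs(ea);
   if (need_face)
      inject_attrib(ea, EXTRA_FACE, num_shader_outputs);
   if (need_primid)
      inject_attrib(ea, EXTRA_PRIMID, num_shader_outputs);
}

/* Driver-side lookup: slot of an injected output, or -1 if absent. */
static int find_extra_attrib(const struct extra_attribs *ea,
                             enum extra_semantic sem)
{
   for (int i = 0; i < ea->count; i++)
      if (ea->semantic[i] == sem)
         return ea->slot[i];
   return -1;
}
```

With the cleanup, looking up the front face after the second (solid fill, primid) draw returns -1. Without it, the stale front-face entry survives and, in this model, even aliases the slot that now holds primid.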
[Mesa-dev] [PATCH] gallivm: support indirect registers on both dimensions
We support indirect addressing only on the vertex index, but some shaders also use indirect addressing on attributes. This patch adds support for indirect addressing on both dimensions inside gs arrays.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_llvm.c          | 23 +--
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h     |  3 ++-
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c |  4 +++-
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 820d6b0..03668d9 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1360,8 +1360,9 @@ clipmask_booli32(struct gallivm_state *gallivm,
 static LLVMValueRef
 draw_gs_llvm_fetch_input(const struct lp_build_tgsi_gs_iface *gs_iface,
                          struct lp_build_tgsi_context * bld_base,
-                         boolean is_indirect,
+                         boolean is_vindex_indirect,
                          LLVMValueRef vertex_index,
+                         boolean is_aindex_indirect,
                          LLVMValueRef attrib_index,
                          LLVMValueRef swizzle_index)
 {
@@ -1372,18 +1373,28 @@ draw_gs_llvm_fetch_input(const struct lp_build_tgsi_gs_iface *gs_iface,
    LLVMValueRef res;
    struct lp_type type = bld_base->base.type;
 
-   if (is_indirect) {
+   if (is_vindex_indirect || is_aindex_indirect) {
       int i;
       res = bld_base->base.zero;
       for (i = 0; i < type.length; ++i) {
          LLVMValueRef idx = lp_build_const_int32(gallivm, i);
-         LLVMValueRef vert_chan_index = LLVMBuildExtractElement(builder,
-                                                                vertex_index, idx, "");
+         LLVMValueRef vert_chan_index = vertex_index;
+         LLVMValueRef attr_chan_index = attrib_index;
          LLVMValueRef channel_vec, value;
+
+         if (is_vindex_indirect) {
+            vert_chan_index = LLVMBuildExtractElement(builder,
+                                                      vertex_index, idx, "");
+         }
+         if (is_aindex_indirect) {
+            attr_chan_index = LLVMBuildExtractElement(builder,
+                                                      attrib_index, idx, "");
+         }
+
          indices[0] = vert_chan_index;
-         indices[1] = attrib_index;
+         indices[1] = attr_chan_index;
          indices[2] = swizzle_index;
-
+
          channel_vec = LLVMBuildGEP(builder, gs->input, indices, 3, "");
          channel_vec = LLVMBuildLoad(builder, channel_vec, "");
          value = LLVMBuildExtractElement(builder, channel_vec, idx, "");
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
index 522302e..8bcdbc8 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
@@ -395,8 +395,9 @@ struct lp_build_tgsi_gs_iface
 {
    LLVMValueRef (*fetch_input)(const struct lp_build_tgsi_gs_iface *gs_iface,
                                struct lp_build_tgsi_context * bld_base,
-                               boolean is_indirect,
+                               boolean is_vindex_indirect,
                                LLVMValueRef vertex_index,
+                               boolean is_aindex_indirect,
                                LLVMValueRef attrib_index,
                                LLVMValueRef swizzle_index);
    void (*emit_vertex)(const struct lp_build_tgsi_gs_iface *gs_iface,
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 4c6b6ec..e50f1d1 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -1135,7 +1135,9 @@ emit_fetch_gs_input(
 
    res = bld->gs_iface->fetch_input(bld->gs_iface, bld_base,
                                     reg->Dimension.Indirect,
-                                    vertex_index, attrib_index,
+                                    vertex_index,
+                                    reg->Register.Indirect,
+                                    attrib_index,
                                     swizzle_index);
 
    assert(res);
-- 
1.8.3.2
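A scalar model of the per-lane gather may help when reading the loop above. This is a sketch under assumptions: the tiny array sizes, the `gs_input_t` layout `[vertex][attrib][chan][lane]` and the `fetch_input` name are hypothetical, and a non-indirect index is modeled as "use element 0 for every lane" rather than as a separate scalar value.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy sizes; a real GS input array is much larger. */
#define LANES       4
#define NUM_VERTS   3
#define NUM_ATTRIBS 2
#define NUM_CHANS   4

/* Hypothetical SoA-style input: input[vertex][attrib][chan][lane]. */
typedef float gs_input_t[NUM_VERTS][NUM_ATTRIBS][NUM_CHANS][LANES];

/*
 * Scalar model of the fetch: loop over lanes and, for whichever dimension
 * is indirect, pick that lane's own index; a non-indirect (uniform) index
 * is taken from element 0 for all lanes.
 */
static void fetch_input(gs_input_t input,
                        bool vindex_indirect, const int vertex_index[LANES],
                        bool aindex_indirect, const int attrib_index[LANES],
                        int swizzle, float out[LANES])
{
   for (int lane = 0; lane < LANES; lane++) {
      int v = vindex_indirect ? vertex_index[lane] : vertex_index[0];
      int a = aindex_indirect ? attrib_index[lane] : attrib_index[0];
      assert(v >= 0 && v < NUM_VERTS && a >= 0 && a < NUM_ATTRIBS);
      out[lane] = input[v][a][swizzle][lane];
   }
}
```

The point of the patch is visible in the two ternaries: before it, only the vertex index could vary per lane, so an indirect attribute index had no per-lane path.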
Re: [Mesa-dev] [PATCH] draw: fix PIPE_MAX_SAMPLER/PIPE_MAX_SHADER_SAMPLER_VIEWS issues
Looks good.
Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
From: Roland Scheidegger srol...@vmware.com
pstipple/aaline stages used PIPE_MAX_SAMPLERS instead of PIPE_MAX_SHADER_SAMPLER_VIEWS when dealing with sampler views. Now, these stages can't actually handle sampler_unit != texture_unit anyway (they cannot work with d3d10 shaders at all, since they use tex rather than sample opcodes and mixed-mode shaders are impossible), but this leads to crashes if a driver just installs these stages and then more than PIPE_MAX_SAMPLERS views are set, even if the stages aren't used.
---
 src/gallium/auxiliary/draw/draw_pipe_aaline.c   |    6 +++---
 src/gallium/auxiliary/draw/draw_pipe_pstipple.c |    6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_aaline.c b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
index c44c236..8483bd7 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_aaline.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
@@ -107,7 +107,7 @@ struct aaline_stage
    struct aaline_fragment_shader *fs;
    struct {
       void *sampler[PIPE_MAX_SAMPLERS];
-      struct pipe_sampler_view *sampler_views[PIPE_MAX_SAMPLERS];
+      struct pipe_sampler_view *sampler_views[PIPE_MAX_SHADER_SAMPLER_VIEWS];
    } state;
 
    /*
@@ -763,7 +763,7 @@ aaline_destroy(struct draw_stage *stage)
    struct pipe_context *pipe = stage->draw->pipe;
    uint i;
 
-   for (i = 0; i < PIPE_MAX_SAMPLERS; i++) {
+   for (i = 0; i < PIPE_MAX_SHADER_SAMPLER_VIEWS; i++) {
       pipe_sampler_view_reference(&aaline->state.sampler_views[i], NULL);
    }
 
@@ -937,7 +937,7 @@ aaline_set_sampler_views(struct pipe_context *pipe,
    for (i = 0; i < num; i++) {
       pipe_sampler_view_reference(&aaline->state.sampler_views[i], views[i]);
    }
-   for ( ; i < PIPE_MAX_SAMPLERS; i++) {
+   for ( ; i < PIPE_MAX_SHADER_SAMPLER_VIEWS; i++) {
       pipe_sampler_view_reference(&aaline->state.sampler_views[i], NULL);
    }
    aaline->num_sampler_views = num;
diff --git a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
index 51f5a86..f38addd 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_pstipple.c
@@ -87,7 +87,7 @@ struct pstip_stage
    struct pstip_fragment_shader *fs;
    struct {
       void *samplers[PIPE_MAX_SAMPLERS];
-      struct pipe_sampler_view *sampler_views[PIPE_MAX_SAMPLERS];
+      struct pipe_sampler_view *sampler_views[PIPE_MAX_SHADER_SAMPLER_VIEWS];
       const struct pipe_poly_stipple *stipple;
    } state;
 
@@ -592,7 +592,7 @@ pstip_destroy(struct draw_stage *stage)
    struct pstip_stage *pstip = pstip_stage(stage);
    uint i;
 
-   for (i = 0; i < PIPE_MAX_SAMPLERS; i++) {
+   for (i = 0; i < PIPE_MAX_SHADER_SAMPLER_VIEWS; i++) {
       pipe_sampler_view_reference(&pstip->state.sampler_views[i], NULL);
    }
 
@@ -731,7 +731,7 @@ pstip_set_sampler_views(struct pipe_context *pipe,
    for (i = 0; i < num; i++) {
       pipe_sampler_view_reference(&pstip->state.sampler_views[i], views[i]);
    }
-   for (; i < PIPE_MAX_SAMPLERS; i++) {
+   for (; i < PIPE_MAX_SHADER_SAMPLER_VIEWS; i++) {
       pipe_sampler_view_reference(&pstip->state.sampler_views[i], NULL);
    }
 
-- 
1.7.9.5
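The crash described above comes from sizing the view array by the sampler limit while views are indexed by the (larger) view limit. Here is a minimal sketch of the corrected pattern, assuming illustrative limits of 16 and 32 (the real values live in gallium's headers) and a hypothetical `view_reference` stand-in for `pipe_sampler_view_reference()` that only tracks a refcount.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative limits only, not necessarily gallium's real values. */
#define MAX_SAMPLERS      16
#define MAX_SAMPLER_VIEWS 32

struct view { int refcount; };

struct stage_state {
   void *sampler[MAX_SAMPLERS];
   struct view *sampler_views[MAX_SAMPLER_VIEWS]; /* sized by the view limit */
   unsigned num_sampler_views;
};

/* Hypothetical stand-in for pipe_sampler_view_reference():
 * point *slot at v, adjusting both refcounts. */
static void view_reference(struct view **slot, struct view *v)
{
   if (v)
      v->refcount++;
   if (*slot)
      (*slot)->refcount--;
   *slot = v;
}

static void set_sampler_views(struct stage_state *st, unsigned num,
                              struct view **views)
{
   unsigned i;
   assert(num <= MAX_SAMPLER_VIEWS);
   for (i = 0; i < num; i++)
      view_reference(&st->sampler_views[i], views[i]);
   /* Clear the tail up to the *view* limit, not the sampler limit. */
   for (; i < MAX_SAMPLER_VIEWS; i++)
      view_reference(&st->sampler_views[i], NULL);
   st->num_sampler_views = num;
}
```

Binding 20 views (more than the sampler limit) stays in bounds here; with the array sized by `MAX_SAMPLERS` the first loop would write past its end.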
Re: [Mesa-dev] [PATCH 3/3] gallivm: handle unbound textures in texture sampling / texture queries
Same here.

- Original Message -
Series LGTM.

Jose

- Original Message -
From: Roland Scheidegger srol...@vmware.com
Turns out we don't need to do much extra work to detect this case, since we are guaranteed to get an empty static texture state in this case; hence just rely on the format being 0 and return all zero then. Previously we needed dummy textures (we would just have crashed on format 0 otherwise), which cannot return the correct result for size queries and when sampling textures with wrap modes using border. As a bonus, this should hugely increase performance when sampling unbound textures - too bad it isn't a useful feature :-).
---
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 26 +
 1 file changed, 26 insertions(+)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
index db5e366..e0d3dd2 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
@@ -2088,6 +2088,19 @@ lp_build_sample_soa(struct gallivm_state *gallivm,
       debug_printf("Sample from %s\n", util_format_name(fmt));
    }
 
+   if (static_texture_state->format == PIPE_FORMAT_NONE) {
+      /*
+       * If there's nothing bound, format is NONE, and we must return
+       * all zero as mandated by d3d10 in this case.
+       */
+      unsigned chan;
+      LLVMValueRef zero = lp_build_const_vec(gallivm, type, 0.0F);
+      for (chan = 0; chan < 4; chan++) {
+         texel_out[chan] = zero;
+      }
+      return;
+   }
+
    assert(type.floating);
 
    /* Setup our build context */
@@ -2517,6 +2530,19 @@ lp_build_size_query_soa(struct gallivm_state *gallivm,
    unsigned num_lods = 1;
    struct lp_build_context bld_int_vec4;
 
+   if (static_state->format == PIPE_FORMAT_NONE) {
+      /*
+       * If there's nothing bound, format is NONE, and we must return
+       * all zero as mandated by d3d10 in this case.
+       */
+      unsigned chan;
+      LLVMValueRef zero = lp_build_const_vec(gallivm, int_type, 0.0F);
+      for (chan = 0; chan < 4; chan++) {
+         sizes_out[chan] = zero;
+      }
+      return;
+   }
+
    /*
     * Do some sanity verification about bound texture and shader dcl target.
     * Not entirely sure what's possible but assume array/non-array
-- 
1.7.9.5
Re: [Mesa-dev] [PATCH] llvmpipe: fix stencil bug if we have both stencil and depth tests
- Original Message -
From: Roland Scheidegger srol...@vmware.com
This is a very well hidden bug found by accident (only the fixed glean tstencil2 test so far seems to hit it). We must use a new mask combining the s_pass values and the orig_mask values for the zpass/zfail stencil ops; otherwise both the sfail op and one of the zpass/zfail ops are applied (probably not hit in most tests because some of the ops tend to be KEEP). Note: this is a candidate for the 9.2 branch.

Looks good
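The mask logic described above can be shown per pixel. This is a scalar sketch, not llvmpipe's vectorized code: the op names mirror GL's KEEP/ZERO/REPLACE/INCR, and the `fixed` flag (a hypothetical parameter) switches between masking the z ops with `orig_mask` alone and with `orig_mask & s_pass`.

```c
#include <assert.h>
#include <stdbool.h>

/* A few stencil ops, named after GL's KEEP/ZERO/REPLACE/INCR. */
enum stencil_op { OP_KEEP, OP_ZERO, OP_REPLACE, OP_INCR };

static unsigned apply_op(enum stencil_op op, unsigned s, unsigned ref)
{
   switch (op) {
   case OP_ZERO:    return 0;
   case OP_REPLACE: return ref;
   case OP_INCR:    return s < 255 ? s + 1 : 255;
   case OP_KEEP:
   default:         return s;
   }
}

/*
 * Per-pixel model of the masked stencil update. The sfail op applies
 * where the stencil test failed; the zfail/zpass ops must only apply
 * where it passed, i.e. under orig_mask & s_pass. With fixed == false
 * the z ops use orig_mask alone, so a stencil-failing pixel gets both
 * the sfail op and a z op.
 */
static unsigned stencil_update(unsigned s, unsigned ref, bool orig_mask,
                               bool s_pass, bool z_pass,
                               enum stencil_op sfail_op,
                               enum stencil_op zfail_op,
                               enum stencil_op zpass_op, bool fixed)
{
   bool sfail_mask = orig_mask && !s_pass;
   bool z_mask = fixed ? (orig_mask && s_pass) : orig_mask;
   unsigned out = s;

   if (sfail_mask)
      out = apply_op(sfail_op, out, ref);
   if (z_mask && !z_pass)
      out = apply_op(zfail_op, out, ref);
   if (z_mask && z_pass)
      out = apply_op(zpass_op, out, ref);
   return out;
}
```

With a KEEP zpass op the buggy and fixed paths agree, which is consistent with the observation that the bug rarely shows up in tests.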
[Mesa-dev] [PATCH] draw: handle nan clipdistance
If clipdistance for one of the vertices is nan (or inf) then the entire primitive should be discarded.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_cliptest_tmp.h |    2 +-
 src/gallium/auxiliary/draw/draw_llvm.c         |    3 ++
 src/gallium/auxiliary/draw/draw_pipe_clip.c    |   13 +-
 src/gallium/auxiliary/gallivm/lp_bld_arit.c    |   53
 src/gallium/auxiliary/gallivm/lp_bld_arit.h    |   11 +
 5 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
index e4500db..fc54810 100644
--- a/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
+++ b/src/gallium/auxiliary/draw/draw_cliptest_tmp.h
@@ -140,7 +140,7 @@ static boolean TAG(do_cliptest)( struct pt_post_vs *pvs,
                clipdist = out->data[cd[0]][i];
             else
                clipdist = out->data[cd[1]][i-4];
-            if (clipdist < 0)
+            if (clipdist < 0 || util_is_inf_or_nan(clipdist))
                mask |= 1 << plane_idx;
          } else {
             if (dot4(clipvertex, plane[plane_idx]) < 0)
diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 84e3392..1e9eadb 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1261,6 +1261,7 @@ generate_clipmask(struct draw_llvm *llvm,
    if (clip_user) {
       LLVMValueRef planes_ptr = draw_jit_context_planes(gallivm, context_ptr);
       LLVMValueRef indices[3];
+      LLVMValueRef is_nan;
 
       /* userclip planes */
       while (ucp_enable) {
@@ -1280,6 +1281,8 @@ generate_clipmask(struct draw_llvm *llvm,
                clipdist = LLVMBuildLoad(builder, outputs[cd[1]][i-4], "");
             }
             test = lp_build_compare(gallivm, f32_type, PIPE_FUNC_GREATER, zero, clipdist);
+            is_nan = lp_build_is_inf_or_nan(gallivm, vs_type, clipdist);
+            test = LLVMBuildOr(builder, test, is_nan, "");
             temp = lp_build_const_int_vec(gallivm, i32_type, 1 << plane_idx);
             test = LLVMBuildAnd(builder, test, temp, "");
             mask = LLVMBuildOr(builder, mask, test, "");
diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c b/src/gallium/auxiliary/draw/draw_pipe_clip.c
index b76e9a5..2f2aadb 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_clip.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c
@@ -104,7 +104,7 @@ static void interp_attr( float dst[4],
                          float t,
                          const float in[4],
                          const float out[4] )
-{
+{
    dst[0] = LINTERP( t, out[0], in[0] );
    dst[1] = LINTERP( t, out[1], in[1] );
    dst[2] = LINTERP( t, out[2], in[2] );
@@ -380,6 +380,9 @@ do_clip_tri( struct draw_stage *stage,
       dp_prev = getclipdist(clipper, vert_prev, plane_idx);
       clipmask &= ~(1<<plane_idx);
 
+      if (util_is_inf_or_nan(dp_prev))
+         return; //discard nan
+
       assert(n < MAX_CLIPPED_VERTICES);
       if (n >= MAX_CLIPPED_VERTICES)
          return;
@@ -392,6 +395,9 @@ do_clip_tri( struct draw_stage *stage,
 
          float dp = getclipdist(clipper, vert, plane_idx);
 
+         if (util_is_inf_or_nan(dp))
+            return; //discard nan
+
          if (!IS_NEGATIVE(dp_prev)) {
             assert(outcount < MAX_CLIPPED_VERTICES);
             if (outcount >= MAX_CLIPPED_VERTICES)
@@ -522,6 +528,9 @@ do_clip_line( struct draw_stage *stage,
       const float dp0 = getclipdist(clipper, v0, plane_idx);
       const float dp1 = getclipdist(clipper, v1, plane_idx);
 
+      if (util_is_inf_or_nan(dp0) || util_is_inf_or_nan(dp1))
+         return; //discard nan
+
       if (dp1 < 0.0F) {
          float t = dp1 / (dp1 - dp0);
          t1 = MAX2(t1, t);
@@ -594,7 +603,7 @@ clip_tri( struct draw_stage *stage,
    unsigned clipmask = (header->v[0]->clipmask |
                         header->v[1]->clipmask |
                         header->v[2]->clipmask);
-
+
    if (clipmask == 0) {
       /* no clipping needed */
       stage->next->tri( stage->next, header );
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
index 98409c3..72b563e 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
@@ -3671,3 +3671,56 @@ lp_build_isfinite(struct lp_build_context *bld,
    return lp_build_compare(bld->gallivm, int_type, PIPE_FUNC_NOTEQUAL,
                            intx, infornan32);
 }
+
+/*
+ * Returns true if the number is nan or inf or false otherwise.
+ * The input has to be a floating point vector.
+ */
+LLVMValueRef
+lp_build_is_inf_or_nan(struct gallivm_state *gallivm,
+                       const struct lp_type type,
+                       LLVMValueRef x)
+{
+   LLVMBuilderRef builder = gallivm->builder;
+   struct lp_type int_type = lp_int_type(type
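A scalar sketch of the clip-distance test after this patch may help. It follows the C path in draw_cliptest_tmp.h in spirit only: `clipdist_mask` and `triangle_discarded` are hypothetical helpers, and `isfinite()` from `<math.h>` stands in for `util_is_inf_or_nan()`. The key detail is that `NaN < 0` is false, so without the extra check a NaN distance would never flag the plane.

```c
#include <assert.h>
#include <math.h>

/*
 * A vertex is outside user plane i when its clip distance is negative
 * or, with this patch, when it is NaN or infinite.
 */
static unsigned clipdist_mask(const float *clipdist, unsigned num_planes)
{
   unsigned mask = 0;
   assert(num_planes <= 8);
   for (unsigned i = 0; i < num_planes; i++) {
      if (clipdist[i] < 0.0f || !isfinite(clipdist[i]))
         mask |= 1u << i;
   }
   return mask;
}

/* Clipper side (simplified to one plane): any non-finite distance
 * discards the whole primitive instead of being interpolated. */
static int triangle_discarded(const float d[3])
{
   for (int v = 0; v < 3; v++)
      if (!isfinite(d[v]))
         return 1;
   return 0;
}
```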
Re: [Mesa-dev] [PATCH] draw: handle nan clipdistance
I realize this function isn't used, but it looks unnecessarily complicated: two constants, one AND plus one comparison, when you could simply do a single comparison (compare x with itself using unordered not-equal). This is actually doubly bad with AVX because the int comparison is going to use 4 instructions instead of 1 (extract/2 cmp/1 insert), at least if this runs 8-wide.

I'm going to kill that function; we already have lp_build_isnan, which does the correct thing.

Otherwise looks good. Though I'm not sure you really need to kill the prims if the clip distances are infinite? The d3d10 spec says "Coordinates coming in to clipping with infinites at x, y, z may or may not result in a discarded primitive."

I liked handling them the same way as nan; otherwise we're just generating pointless primitives. I don't have a strong opinion though; wlk doesn't seem to test infinites.

z
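The two approaches discussed above differ in more than instruction count: a mask-and-compare on the exponent bits catches both inf and NaN, while the single self-comparison catches only NaN, which is why the question of whether infinities must be discarded matters. A scalar sketch (hypothetical function names; the vector versions apply the same test per lane):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Exponent-bits test: all exponent bits set means inf or NaN.
 * This is the AND + integer compare style of check. */
static int is_inf_or_nan_bits(float x)
{
   uint32_t bits;
   memcpy(&bits, &x, sizeof bits);   /* type-pun without aliasing UB */
   return (bits & 0x7f800000u) == 0x7f800000u;
}

/* Single unordered self-comparison: NaN compares unequal to everything,
 * itself included, so this is true only for NaN (not for inf).
 * Note: -ffast-math lets compilers assume no NaNs and fold this away. */
static int is_nan_selfcmp(float x)
{
   return x != x;
}
```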
[Mesa-dev] [PATCH] llvmpipe: fix pipeline statistics with a null ps
If the fragment shader is null, then pixel shader invocations have to be equal to zero. And if we're running a null ps, then clipper invocations and primitives should be equal to zero, but only if both stencil and depth testing are disabled.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_rast.c        |    3 ++-
 src/gallium/drivers/llvmpipe/lp_rast_priv.h   |    3 ++-
 src/gallium/drivers/llvmpipe/lp_setup_line.c  |    3 ++-
 src/gallium/drivers/llvmpipe/lp_setup_point.c |    3 ++-
 src/gallium/drivers/llvmpipe/lp_setup_tri.c   |    3 ++-
 src/gallium/drivers/llvmpipe/lp_setup_vbuf.c  |    9 +++--
 src/gallium/drivers/llvmpipe/lp_state_fs.c    |   24 +++-
 src/gallium/drivers/llvmpipe/lp_state_fs.h    |    4
 8 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c b/src/gallium/drivers/llvmpipe/lp_rast.c
index 49cdbfe..af661e9 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.c
+++ b/src/gallium/drivers/llvmpipe/lp_rast.c
@@ -35,6 +35,7 @@
 #include "os/os_time.h"
 
 #include "lp_scene_queue.h"
+#include "lp_context.h"
 #include "lp_debug.h"
 #include "lp_fence.h"
 #include "lp_perf.h"
@@ -459,7 +460,7 @@ lp_rast_shade_quads_mask(struct lp_rasterizer_task *task,
    if ((x % TILE_SIZE) < task->width && (y % TILE_SIZE) < task->height) {
       /* not very accurate would need a popcount on the mask */
       /* always count this not worth bothering? */
-      task->ps_invocations++;
+      task->ps_invocations += 1 * variant->ps_inv_multiplier;
 
       /* run shader on 4x4 block */
       BEGIN_JIT_CALL(state, task);
diff --git a/src/gallium/drivers/llvmpipe/lp_rast_priv.h b/src/gallium/drivers/llvmpipe/lp_rast_priv.h
index b8bc99c..41fe097 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast_priv.h
+++ b/src/gallium/drivers/llvmpipe/lp_rast_priv.h
@@ -100,6 +100,7 @@ struct lp_rasterizer_task
    /* occlude counter for visible pixels */
    struct lp_jit_thread_data thread_data;
    uint64_t ps_invocations;
+   uint8_t ps_inv_multiplier;
 
    pipe_semaphore work_ready;
    pipe_semaphore work_done;
@@ -308,7 +309,7 @@ lp_rast_shade_quads_all( struct lp_rasterizer_task *task,
    if ((x % TILE_SIZE) < task->width && (y % TILE_SIZE) < task->height) {
       /* not very accurate would need a popcount on the mask */
       /* always count this not worth bothering? */
-      task->ps_invocations++;
+      task->ps_invocations += 1 * variant->ps_inv_multiplier;
 
       /* run shader on 4x4 block */
       BEGIN_JIT_CALL(state, task);
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_line.c b/src/gallium/drivers/llvmpipe/lp_setup_line.c
index a25a6b0..e1686ea 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_line.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_line.c
@@ -600,7 +600,8 @@ try_setup_line( struct lp_setup_context *setup,
 
    LP_COUNT(nr_tris);
 
-   if (lp_context->active_statistics_queries) {
+   if (lp_context->active_statistics_queries &&
+       !llvmpipe_rasterization_disabled(lp_context)) {
       lp_context->pipeline_statistics.c_primitives++;
    }
 
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_point.c b/src/gallium/drivers/llvmpipe/lp_setup_point.c
index cbcc8d4..45068ec 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_point.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_point.c
@@ -384,7 +384,8 @@ try_setup_point( struct lp_setup_context *setup,
 
    LP_COUNT(nr_tris);
 
-   if (lp_context->active_statistics_queries) {
+   if (lp_context->active_statistics_queries &&
+       !llvmpipe_rasterization_disabled(lp_context)) {
       lp_context->pipeline_statistics.c_primitives++;
    }
 
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
index 579f351..23bc6e2 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
@@ -340,7 +340,8 @@ do_triangle_ccw(struct lp_setup_context *setup,
 
    LP_COUNT(nr_tris);
 
-   if (lp_context->active_statistics_queries) {
+   if (lp_context->active_statistics_queries &&
+       !llvmpipe_rasterization_disabled(lp_context)) {
       lp_context->pipeline_statistics.c_primitives++;
    }
 
diff --git a/src/gallium/drivers/llvmpipe/lp_setup_vbuf.c b/src/gallium/drivers/llvmpipe/lp_setup_vbuf.c
index 8173994..bf9f7e7 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_vbuf.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_vbuf.c
@@ -565,8 +565,13 @@ lp_setup_pipeline_statistics(
       stats->gs_invocations;
    llvmpipe->pipeline_statistics.gs_primitives +=
       stats->gs_primitives;
-   llvmpipe->pipeline_statistics.c_invocations +=
-      stats->c_invocations;
+   if (!llvmpipe_rasterization_disabled(llvmpipe)) {
+      llvmpipe->pipeline_statistics.c_invocations +=
+         stats->c_invocations;
+   } else {
+      llvmpipe->pipeline_statistics.c_invocations = 0;
+   }
+
 }
 
 /**
diff --git a/src/gallium/drivers/llvmpipe/lp_state_fs.c b/src/gallium/drivers/llvmpipe/lp_state_fs.c index
Re: [Mesa-dev] [PATCH] gallivm: already pass coords in the right place in the sampler interface
I have to admit that I don't know the sampling code, but the patches look good to me.

z

- Original Message -
From: Roland Scheidegger srol...@vmware.com
This makes things a bit nicer, and more importantly it fixes an issue where a downgraded array texture (due to the view being reduced to 1 layer and addressed with the (non-array) samplec instruction) would use the wrong coord as the shadow reference value. (This could also be fixed by passing the target through the sampler interface, much the same way as is done for size queries; might do this eventually anyway.) And if we'd ever want to support (shadow) cube map arrays, we'd need 5 coords in any case.

v2: fix bugs (texel fetch using wrong layer coord for 1d, shadow tex using wrong shadow coord for 2d...). Plus need to project the shadow coord, and just for fun keep projecting the layer coord too.
---
 src/gallium/auxiliary/gallivm/lp_bld_sample.h     |    2 +
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |   28 +---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c   |  159 +++--
 3 files changed, 90 insertions(+), 99 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.h b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
index c25d171..6d8fe88 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
@@ -335,7 +335,9 @@ texture_dims(enum pipe_texture_target tex)
    case PIPE_TEXTURE_2D_ARRAY:
    case PIPE_TEXTURE_RECT:
    case PIPE_TEXTURE_CUBE:
+      return 2;
    case PIPE_TEXTURE_CUBE_ARRAY:
+      assert(0);
       return 2;
    case PIPE_TEXTURE_3D:
       return 3;
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
index 07ed48e..c312922 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
@@ -1574,7 +1574,7 @@ lp_build_sample_soa(struct gallivm_state *gallivm,
    unsigned target = static_texture_state->target;
    unsigned dims = texture_dims(target);
    unsigned num_quads = type.length / 4;
-   unsigned mip_filter;
+   unsigned mip_filter, i;
    struct lp_build_sample_context bld;
    struct lp_static_sampler_state derived_sampler_state = *static_sampler_state;
    LLVMTypeRef i32t = LLVMInt32TypeInContext(gallivm->context);
@@ -1726,30 +1726,8 @@ lp_build_sample_soa(struct gallivm_state *gallivm,
       }
    }
 
-   /*
-    * always use the same coords for layer, shadow cmp, should probably
-    * put that into gallivm sampler interface I get real tired shuffling
-    * coordinates.
-    */
-   newcoords[0] = coords[0]; /* 1st coord */
-   newcoords[1] = coords[1]; /* 2nd coord */
-   newcoords[2] = coords[2]; /* 3rd coord (for cube, 3d and layer) */
-   newcoords[3] = coords[3]; /* 4th coord (intended for cube array layer) */
-   newcoords[4] = coords[2]; /* shadow cmp coord */
-   if (target == PIPE_TEXTURE_1D_ARRAY) {
-      newcoords[2] = coords[1]; /* layer coord */
-      /* FIXME: shadow cmp coord can be wrong if we don't take target from shader decl. */
-   }
-   else if (target == PIPE_TEXTURE_2D_ARRAY) {
-      newcoords[2] = coords[2];
-      newcoords[4] = coords[3];
-   }
-   else if (target == PIPE_TEXTURE_CUBE) {
-      newcoords[4] = coords[3];
-   }
-   else if (target == PIPE_TEXTURE_CUBE_ARRAY) {
-      assert(0); /* not handled */
-      // layer coord is ok but shadow coord is impossible */
+   for (i = 0; i < 5; i++) {
+      newcoords[i] = coords[i];
    }
 
    if (0) {
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index db8e997..cab53df 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -1614,13 +1614,14 @@ emit_tex( struct lp_build_tgsi_soa_context *bld,
    unsigned unit;
    LLVMValueRef lod_bias, explicit_lod;
    LLVMValueRef oow = NULL;
-   LLVMValueRef coords[4];
+   LLVMValueRef coords[5];
    LLVMValueRef offsets[3] = { NULL };
    struct lp_derivatives derivs;
    struct lp_derivatives *deriv_ptr = NULL;
    boolean scalar_lod;
-   unsigned num_coords, num_derivs, num_offsets;
-   unsigned i;
+   unsigned num_derivs, num_offsets, i;
+   unsigned shadow_coord = 0;
+   unsigned layer_coord = 0;
 
    if (!bld->sampler) {
       _debug_printf("warning: found texture instruction but no sampler generator supplied\n");
@@ -1631,55 +1632,58 @@ emit_tex( struct lp_build_tgsi_soa_context *bld,
    }
 
    switch (inst->Texture.Texture) {
-   case TGSI_TEXTURE_1D:
-      num_coords = 1;
-      num_offsets = 1;
-      num_derivs = 1;
-      break;
    case TGSI_TEXTURE_1D_ARRAY:
-      num_coords = 2;
+      layer_coord = 1;
+      /* fallthrough */
+   case TGSI_TEXTURE_1D:
       num_offsets = 1;
       num_derivs = 1;
       break;
+   case
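The fixed five-slot layout that the removed shuffling code produced can be sketched as a packing function. This is an illustration under stated assumptions: `pack_coords` and `enum tex_target` are hypothetical, and the per-target component choices (1D array layer in `.y`, shadow reference in `.w` for 2D arrays and cubes, otherwise `.z`) are taken from the removed code above rather than from the new `emit_tex()`, which is truncated here.

```c
#include <assert.h>

/* Hypothetical subset of texture targets. */
enum tex_target { TEX_1D, TEX_1D_ARRAY, TEX_2D, TEX_2D_ARRAY, TEX_CUBE };

/*
 * Fixed 5-slot layout, as built by the removed shuffle code:
 *   slots 0-2: texture coords; slot 2 also holds the array layer
 *   slot 3:    reserved (cube array layer)
 *   slot 4:    shadow comparison reference
 * `src` holds the x, y, z, w components of the TGSI source register.
 */
static void pack_coords(enum tex_target target, int shadow,
                        const float src[4], float out[5])
{
   int layer_comp = -1;   /* source component holding the layer, if any */
   int shadow_comp = 2;   /* default: shadow reference in .z */
   int num_coords = 0;

   switch (target) {
   case TEX_1D:       num_coords = 1; break;
   case TEX_1D_ARRAY: num_coords = 1; layer_comp = 1; break;
   case TEX_2D:       num_coords = 2; break;
   case TEX_2D_ARRAY: num_coords = 2; layer_comp = 2; shadow_comp = 3; break;
   case TEX_CUBE:     num_coords = 3; shadow_comp = 3; break;
   }
   assert(num_coords > 0);

   for (int i = 0; i < 5; i++)
      out[i] = 0.0f;
   for (int i = 0; i < num_coords; i++)
      out[i] = src[i];
   if (layer_comp >= 0)
      out[2] = src[layer_comp];
   if (shadow)
      out[4] = src[shadow_comp];
}
```

Doing this packing once, where the instruction's target is known, is what lets the sampler side become a plain five-element copy.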
Re: [Mesa-dev] [PATCH] gallivm: do per-sample depth comparison instead of doing it post-filter
-   lp_build_sample_compare(bld, newcoords[4], texel_out);
+   if (0)
+      lp_build_sample_compare(bld, newcoords[4], texel_out);
 }

What does this do?

The rest looks good to me!
Reviewed-by: Zack Rusin za...@vmware.com
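The subject line contrasts comparing each sample against the reference before filtering with comparing the already-filtered depth. A small scalar sketch of the difference for a bilinear footprint; the function names are hypothetical, and the "pass when ref <= texel" comparison is an arbitrary choice for the example.

```c
#include <assert.h>

/* Filter the four depth texels first, then compare once. */
static float compare_after_filter(const float texels[4], const float w[4],
                                  float ref)
{
   float depth = 0.0f;
   for (int i = 0; i < 4; i++)
      depth += w[i] * texels[i];
   return ref <= depth ? 1.0f : 0.0f;
}

/* Compare every texel, then filter the 0/1 results
 * (percentage-closer filtering). */
static float compare_per_sample(const float texels[4], const float w[4],
                                float ref)
{
   float result = 0.0f;
   for (int i = 0; i < 4; i++)
      result += w[i] * (ref <= texels[i] ? 1.0f : 0.0f);
   return result;
}
```

On an edge between depths 0.2 and 0.8 with equal weights and reference 0.4, the post-filter version returns 1.0 while the per-sample version returns 0.5, the partially shadowed result one expects from a filtered shadow lookup.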
[Mesa-dev] [PATCH] llvmpipe: fix pipeline statistics with a null ps
If the fragment shader is null, then pixel shader invocations have to be equal to zero. And if we're running a null ps, then clipper invocations and primitives should be equal to zero, but only if both stencil and depth testing are disabled.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_query.c | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_query.c b/src/gallium/drivers/llvmpipe/lp_query.c
index cea2d07..fb24c36 100644
--- a/src/gallium/drivers/llvmpipe/lp_query.c
+++ b/src/gallium/drivers/llvmpipe/lp_query.c
@@ -32,6 +32,7 @@
 
 #include "draw/draw_context.h"
 #include "pipe/p_defines.h"
+#include "tgsi/tgsi_scan.h"
 #include "util/u_memory.h"
 #include "os/os_time.h"
 #include "lp_context.h"
@@ -95,6 +96,7 @@ llvmpipe_get_query_result(struct pipe_context *pipe,
                           union pipe_query_result *vresult)
 {
    struct llvmpipe_screen *screen = llvmpipe_screen(pipe->screen);
+   struct llvmpipe_context *llvmpipe = llvmpipe_context(pipe);
    unsigned num_threads = MAX2(1, screen->num_threads);
    struct llvmpipe_query *pq = llvmpipe_query(q);
    uint64_t *result = (uint64_t *)vresult;
@@ -166,11 +168,31 @@ llvmpipe_get_query_result(struct pipe_context *pipe,
    case PIPE_QUERY_PIPELINE_STATISTICS: {
       struct pipe_query_data_pipeline_statistics *stats =
          (struct pipe_query_data_pipeline_statistics *)vresult;
-      /* only ps_invocations come from binned query */
-      for (i = 0; i < num_threads; i++) {
-         pq->stats.ps_invocations += pq->end[i];
+      /* If we're running on what's considered a null fragment
+       * shader, i.e. fragment shader consisting of a single
+       * END opcode or if the fragment shader is null then
+       * the number of ps_invocations should be zero */
+      if (llvmpipe->fs && llvmpipe->fs->info.base.num_tokens > 1) {
+         /* only ps_invocations come from binned query */
+         for (i = 0; i < num_threads; i++) {
+            pq->stats.ps_invocations += pq->end[i];
+         }
+         pq->stats.ps_invocations *=
+            LP_RASTER_BLOCK_SIZE * LP_RASTER_BLOCK_SIZE;
+      } else {
+         /*
+          * Clipper primitives and invocations are equal to zero
+          * if we're running a null fragment shader but only
+          * if both stencil and depth testing are disabled.
+          */
+         if (!llvmpipe->depth_stencil->depth.enabled &&
+             !llvmpipe->depth_stencil->stencil[0].enabled &&
+             !llvmpipe->depth_stencil->stencil[1].enabled) {
+            pq->stats.c_primitives = 0;
+            pq->stats.c_invocations = 0;
+         }
+         pq->stats.ps_invocations = 0;
       }
-      pq->stats.ps_invocations *= LP_RASTER_BLOCK_SIZE * LP_RASTER_BLOCK_SIZE;
       *stats = pq->stats;
    }
    break;
-- 
1.7.10.4
Re: [Mesa-dev] [PATCH] gallivm: simplify geometry shader mask handling a bit
From: Roland Scheidegger srol...@vmware.com
Instead of reducing masks to 0/1, simply use the mask directly as -1. Also use some signed comparisons instead of unsigned (as far as I understand, these values have to be (very) small, and signed means llvm doesn't have to apply additional logic to do the unsigned comparisons the cpu can't do). Saves a couple of instructions in some test geometry shader here.

v2: that was a bit too much optimization, don't skip combining the masks...

k, I think that one looks good.
Reviewed-by: Zack Rusin za...@vmware.com
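Why a -1/0 mask is cheaper than a 0/1 value: vector compares already produce all-ones or all-zeros lanes, and such a mask can be combined and used for selection with plain bitwise operations. A scalar-loop sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* SIMD-style compare: each lane becomes all ones (-1) or all zeros. */
static void cmp_lt(const int32_t a[LANES], const int32_t b[LANES],
                   int32_t mask[LANES])
{
   for (int i = 0; i < LANES; i++)
      mask[i] = -(int32_t)(a[i] < b[i]);   /* 1 -> -1, 0 -> 0 */
}

/* With a -1/0 mask, selection is just AND/ANDNOT/OR per lane;
 * a 0/1 mask would first have to be widened back to a full mask. */
static void select_lanes(const int32_t mask[LANES], const int32_t x[LANES],
                         const int32_t y[LANES], int32_t out[LANES])
{
   for (int i = 0; i < LANES; i++)
      out[i] = (x[i] & mask[i]) | (y[i] & ~mask[i]);
}
```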
Re: [Mesa-dev] [PATCH 2/2] draw: simplify prim mask construction
Looks good.
Reviewed-by: Zack Rusin za...@vmware.com

- Original Message -
From: Roland Scheidegger srol...@vmware.com
The code was quite weird: the second comparison was in fact a complete no-op, and we can also do the comparison with the vector directly instead of scalar, which should not only be faster but also makes it way more obvious what that mask is actually going to look like. (Not sure how many instructions that saves, as it turned out the mask wasn't used in the test geometry shader I used at all after all...)
---
 src/gallium/auxiliary/draw/draw_llvm.c | 32 ++--
 1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 68f6369..84e3392 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -2040,31 +2040,19 @@ generate_mask_value(struct draw_gs_llvm_variant *variant,
 {
    struct gallivm_state *gallivm = variant->gallivm;
    LLVMBuilderRef builder = gallivm->builder;
-   LLVMValueRef bits[16];
-   struct lp_type mask_type = lp_int_type(gs_type);
-   struct lp_type mask_elem_type = lp_elem_type(mask_type);
-   LLVMValueRef mask_val = lp_build_const_vec(gallivm,
-                                              mask_type,
-                                              0);
+   struct lp_type mask_type = lp_int_type(gs_type);
+   LLVMValueRef num_prims;
+   LLVMValueRef mask_val = lp_build_const_vec(gallivm, mask_type, 0);
    unsigned i;
 
-   assert(gs_type.length <= Elements(bits));
-
-   for (i = gs_type.length; i >= 1; --i) {
-      int idx = i - 1;
-      LLVMValueRef ind = lp_build_const_int32(gallivm, i);
-      bits[idx] = lp_build_compare(gallivm,
-                                   mask_elem_type, PIPE_FUNC_GEQUAL,
-                                   variant->num_prims, ind);
-   }
-   for (i = 0; i < gs_type.length; ++i) {
-      LLVMValueRef ind = lp_build_const_int32(gallivm, i);
-      mask_val = LLVMBuildInsertElement(builder, mask_val, bits[i], ind, "");
+   num_prims = lp_build_broadcast(gallivm, lp_build_vec_type(gallivm, mask_type),
+                                  variant->num_prims);
+   for (i = 0; i <= gs_type.length; i++) {
+      LLVMValueRef idx = lp_build_const_int32(gallivm, i);
+      mask_val = LLVMBuildInsertElement(builder, mask_val, idx, idx, "");
    }
-   mask_val = lp_build_compare(gallivm,
-                               mask_type, PIPE_FUNC_NOTEQUAL,
-                               mask_val,
-                               lp_build_const_int_vec(gallivm, mask_type, 0));
+   mask_val = lp_build_compare(gallivm, mask_type,
+                               PIPE_FUNC_GREATER, num_prims, mask_val);
 
    return mask_val;
 }
-- 
1.7.9.5
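The simplified construction amounts to "lane i is active exactly when i < num_prims": broadcast the count, build the lane-index vector {0, 1, 2, ...}, and compare with GREATER. A scalar sketch with a hypothetical `prim_mask` helper:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LANES 16

/* Lane i of the mask is -1 when num_prims > i, else 0. */
static void prim_mask(int32_t num_prims, unsigned length, int32_t mask[])
{
   int32_t lane_index[MAX_LANES];
   assert(length <= MAX_LANES);
   for (unsigned i = 0; i < length; i++)
      lane_index[i] = (int32_t)i;   /* the {0, 1, 2, ...} index vector */
   for (unsigned i = 0; i < length; i++)
      mask[i] = num_prims > lane_index[i] ? -1 : 0;
}
```

The result is already a -1/0 vector, so no trailing NOTEQUAL-against-zero pass is needed.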
Re: [Mesa-dev] [PATCH] gallivm: fix exec_mask interaction with geometry shader after end of main
Ah, that looks like a great catch.

Reviewed-by: Zack Rusin <za...@vmware.com>

----- Original Message -----
From: Roland Scheidegger <srol...@vmware.com>

Because we must maintain an exec_mask even if there's currently nothing on the mask stack, we can still have an exec_mask at the end of the program. Effectively, this mask should be set back to default when returning from main. Without relying on END/RET opcodes (I think it's valid to have neither) this is actually difficult to do, as there doesn't seem to be any reasonable place to do it, so instead let's just say the exec_mask is invalid outside main (which it effectively is).
The problem is that the geometry shader called end_primitive outside the shader (in the epilogue), and as a result used a bogus mask, leading to bugs if we had to set the (somewhat misnamed) ret_in_main bit anywhere. So just avoid the mask combining function when called from outside the shader.
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.c     |    2 +-
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c |   28 +++
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
index 495940c..5a9e8d0 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c
@@ -466,7 +466,7 @@ lp_build_tgsi_llvm(
    while (bld_base->pc != -1) {
       struct tgsi_full_instruction *instr = bld_base->instructions +
-         bld_base->pc;
+                                               bld_base->pc;
       const struct tgsi_opcode_info *opcode_info =
          tgsi_get_opcode_info(instr->Instruction.Opcode);
       if (!lp_build_tgsi_inst_llvm(bld_base, instr)) {
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 589ea4f..db8e997 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -2691,11 +2691,21 @@ end_primitive_masked(struct lp_build_tgsi_context * bld_base,
    LLVMBuilderRef builder = bld->bld_base.base.gallivm->builder;

    if (bld->gs_iface->end_primitive) {
+      struct lp_build_context *uint_bld = &bld_base->uint_bld;
       LLVMValueRef emitted_vertices_vec =
          LLVMBuildLoad(builder, bld->emitted_vertices_vec_ptr, "");
       LLVMValueRef emitted_prims_vec =
          LLVMBuildLoad(builder, bld->emitted_prims_vec_ptr, "");

+      LLVMValueRef emitted_mask = lp_build_cmp(uint_bld, PIPE_FUNC_NOTEQUAL,
+                                               emitted_vertices_vec,
+                                               uint_bld->zero);
+      /* We need to combine the current execution mask with the mask
+         telling us which, if any, execution slots actually have
+         unemitted primitives, this way we make sure that end_primitives
+         executes only on the paths that have unflushed vertices */
+      mask = LLVMBuildAnd(builder, mask, emitted_mask, "");
+
       bld->gs_iface->end_primitive(bld->gs_iface, &bld->bld_base,
                                    emitted_vertices_vec,
                                    emitted_prims_vec);
@@ -2735,20 +2745,7 @@ end_primitive(
    struct lp_build_tgsi_soa_context * bld = lp_soa_context(bld_base);

    if (bld->gs_iface->end_primitive) {
-      LLVMBuilderRef builder = bld_base->base.gallivm->builder;
       LLVMValueRef mask = mask_vec(bld_base);
-      struct lp_build_context *uint_bld = &bld_base->uint_bld;
-      LLVMValueRef emitted_verts = LLVMBuildLoad(
-         builder, bld->emitted_vertices_vec_ptr, "");
-      LLVMValueRef emitted_mask = lp_build_cmp(uint_bld, PIPE_FUNC_NOTEQUAL,
-                                               emitted_verts,
-                                               uint_bld->zero);
-      /* We need to combine the current execution mask with the mask
-         telling us which, if any, execution slots actually have
-         unemitted primitives, this way we make sure that end_primitives
-         executes only on the paths that have unflushed vertices */
-      mask = LLVMBuildAnd(builder, mask, emitted_mask, "");
-
       end_primitive_masked(bld_base, mask);
    }
 }
@@ -3148,8 +3145,9 @@ static void emit_epilogue(struct lp_build_tgsi_context * bld_base)
       LLVMValueRef total_emitted_vertices_vec;
       LLVMValueRef emitted_prims_vec;
       /* implicit end_primitives, needed in case there are any unflushed
-         vertices in the cache */
-      end_primitive(NULL, bld_base, NULL);
+         vertices in the cache. Note must not call end_primitive here
+         since the exec_mask is not valid at this point. */
+      end_primitive_masked(bld_base, lp_build_mask_value(bld->mask));

       total_emitted_vertices_vec =
          LLVMBuildLoad(builder, bld
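A scalar C sketch of what the IR built by end_primitive_masked() computes, one SIMD lane per array element: the lp_build_cmp(..., PIPE_FUNC_NOTEQUAL, emitted_vertices_vec, zero) step yields all-ones for lanes that still have unflushed vertices, and LLVMBuildAnd() combines that with the incoming mask. The lane count and function names here are hypothetical; the point is only that the caller decides which mask comes in (the exec mask inside main, the shader's overall mask from the epilogue).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LANES 4 /* hypothetical SIMD width, for illustration only */

/* All ones for lanes that still have unflushed vertices, zero otherwise
 * (mirrors comparing emitted_vertices_vec != 0 per lane). */
static void
build_emitted_mask(const uint32_t emitted_vertices[NUM_LANES],
                   uint32_t emitted_mask[NUM_LANES])
{
   for (int i = 0; i < NUM_LANES; i++)
      emitted_mask[i] = emitted_vertices[i] != 0 ? 0xffffffffu : 0u;
}

/* Combine the caller's mask with the emitted-vertices mask. Inside main
 * the caller passes the current exec mask; from the epilogue it must pass
 * the shader's overall mask instead, since the exec mask is not valid
 * outside main. */
static void
end_primitive_lanes(const uint32_t mask[NUM_LANES],
                    const uint32_t emitted_vertices[NUM_LANES],
                    uint32_t out_mask[NUM_LANES])
{
   uint32_t emitted_mask[NUM_LANES];

   assert(mask != NULL && emitted_vertices != NULL && out_mask != NULL);
   build_emitted_mask(emitted_vertices, emitted_mask);
   for (int i = 0; i < NUM_LANES; i++)
      out_mask[i] = mask[i] & emitted_mask[i];
}
```

Lanes that are inactive, or that have nothing left to flush, are excluded from the end_primitive call.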
Re: [Mesa-dev] [PATCH 3/3] gallivm: implement new float comparison instructions returning integer masks
Nice. The entire series looks good.

Reviewed-by: Zack Rusin <za...@vmware.com>

----- Original Message -----
From: Roland Scheidegger <srol...@vmware.com>

FSEQ/FSGE/FSLT/FSNE work just the same as SEQ/SGE/SLT/SNE, except that they skip the select.
And just for consistency, use the same appropriate ordered/unordered comparisons for the old opcodes as well.
---
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c |   81 +++-
 1 file changed, 79 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
index f461661..86c3249 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c
@@ -1094,6 +1094,70 @@ f2u_emit_cpu(
                                   emit_data->args[0]);
 }

+/* TGSI_OPCODE_FSET Helper (CPU Only) */
+static void
+fset_emit_cpu(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data,
+   unsigned pipe_func)
+{
+   LLVMValueRef cond;
+
+   if (pipe_func != PIPE_FUNC_NOTEQUAL) {
+      cond = lp_build_cmp_ordered(&bld_base->base, pipe_func,
+                                  emit_data->args[0], emit_data->args[1]);
+   }
+   else {
+      cond = lp_build_cmp(&bld_base->base, pipe_func,
+                          emit_data->args[0], emit_data->args[1]);
+
+   }
+   emit_data->output[emit_data->chan] = cond;
+}
+
+
+/* TGSI_OPCODE_FSEQ (CPU Only) */
+static void
+fseq_emit_cpu(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data)
+{
+   fset_emit_cpu(action, bld_base, emit_data, PIPE_FUNC_EQUAL);
+}
+
+/* TGSI_OPCODE_ISGE (CPU Only) */
+static void
+fsge_emit_cpu(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data)
+{
+   fset_emit_cpu(action, bld_base, emit_data, PIPE_FUNC_GEQUAL);
+}
+
+/* TGSI_OPCODE_ISLT (CPU Only) */
+static void
+fslt_emit_cpu(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data)
+{
+   fset_emit_cpu(action, bld_base, emit_data, PIPE_FUNC_LESS);
+}
+
+/* TGSI_OPCODE_USNE (CPU Only) */
+
+static void
+fsne_emit_cpu(
+   const struct lp_build_tgsi_action * action,
+   struct lp_build_tgsi_context * bld_base,
+   struct lp_build_emit_data * emit_data)
+{
+   fset_emit_cpu(action, bld_base, emit_data, PIPE_FUNC_NOTEQUAL);
+}
+
 /* TGSI_OPCODE_FLR (CPU Only) */

 static void
@@ -1396,8 +1460,17 @@ set_emit_cpu(
    struct lp_build_emit_data * emit_data,
    unsigned pipe_func)
 {
-   LLVMValueRef cond = lp_build_cmp(&bld_base->base, pipe_func,
-                                    emit_data->args[0], emit_data->args[1]);
+   LLVMValueRef cond;
+
+   if (pipe_func != PIPE_FUNC_NOTEQUAL) {
+      cond = lp_build_cmp_ordered(&bld_base->base, pipe_func,
+                                  emit_data->args[0], emit_data->args[1]);
+   }
+   else {
+      cond = lp_build_cmp(&bld_base->base, pipe_func,
+                          emit_data->args[0], emit_data->args[1]);
+
+   }
    emit_data->output[emit_data->chan] = lp_build_select(&bld_base->base,
                                           cond,
                                           bld_base->base.one,
@@ -1716,6 +1789,10 @@ lp_set_default_actions_cpu(
    bld_base->op_actions[TGSI_OPCODE_F2I].emit = f2i_emit_cpu;
    bld_base->op_actions[TGSI_OPCODE_F2U].emit = f2u_emit_cpu;
    bld_base->op_actions[TGSI_OPCODE_FLR].emit = flr_emit_cpu;
+   bld_base->op_actions[TGSI_OPCODE_FSEQ].emit = fseq_emit_cpu;
+   bld_base->op_actions[TGSI_OPCODE_FSGE].emit = fsge_emit_cpu;
+   bld_base->op_actions[TGSI_OPCODE_FSLT].emit = fslt_emit_cpu;
+   bld_base->op_actions[TGSI_OPCODE_FSNE].emit = fsne_emit_cpu;

    bld_base->op_actions[TGSI_OPCODE_I2F].emit = i2f_emit_cpu;
    bld_base->op_actions[TGSI_OPCODE_IABS].emit = iabs_emit_cpu;
--
1.7.9.5

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
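The difference between the old and new opcodes can be sketched in plain C: FS* produce an integer mask (all ones or zero) while S* select 1.0 or 0.0, and both use ordered comparisons except for not-equal, which is unordered. The enum and function names below are hypothetical stand-ins, not gallivm code; conveniently, C's built-in relational operators already follow these IEEE 754 rules.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PIPE_FUNC_* values used by the patch. */
enum cmp_func { CMP_EQUAL, CMP_GEQUAL, CMP_LESS, CMP_NOTEQUAL };

/* ==, >= and < are ordered (false if either operand is NaN) while != is
 * unordered (true if either operand is NaN), matching the split that
 * fset_emit_cpu() makes between lp_build_cmp_ordered and lp_build_cmp. */
static int
float_compare(enum cmp_func func, float a, float b)
{
   switch (func) {
   case CMP_EQUAL:    return a == b;
   case CMP_GEQUAL:   return a >= b;
   case CMP_LESS:     return a < b;
   case CMP_NOTEQUAL: return a != b;
   }
   assert(!"unknown comparison function");
   return 0;
}

/* New-style FSEQ/FSGE/FSLT/FSNE: integer mask, no select. */
static uint32_t
fset(enum cmp_func func, float a, float b)
{
   return float_compare(func, a, b) ? 0xffffffffu : 0u;
}

/* Old-style SEQ/SGE/SLT/SNE: select between 1.0 and 0.0. */
static float
set(enum cmp_func func, float a, float b)
{
   return float_compare(func, a, b) ? 1.0f : 0.0f;
}
```

Note that this relies on default floating-point semantics; options such as -ffast-math may let the compiler ignore NaNs.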
[Mesa-dev] [PATCH] draw: make sure that the stages setup outputs
Calling the prepare outputs cleans up the slot assignments for outputs; unfortunately aapoint and aaline didn't have code to reset their slots after the initial setup, which was messing up our slot assignments. The unfilled stage was just missing the initial assignment of the face slot.
This fixes all of the reported piglit failures.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/draw/draw_context.c       |    2 +
 src/gallium/auxiliary/draw/draw_pipe.h          |    5 +-
 src/gallium/auxiliary/draw/draw_pipe_aaline.c   |   27 ---
 src/gallium/auxiliary/draw/draw_pipe_aapoint.c  |   56 ++-
 src/gallium/auxiliary/draw/draw_pipe_unfilled.c |    2 +
 5 files changed, 62 insertions(+), 30 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c
index 2d4843e..d1fac0c 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -564,6 +564,8 @@ draw_prepare_shader_outputs(struct draw_context *draw)
    draw_remove_extra_vertex_attribs(draw);
    draw_prim_assembler_prepare_outputs(draw->ia);
    draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
+   draw_aapoint_prepare_outputs(draw, draw->pipeline.aapoint);
+   draw_aaline_prepare_outputs(draw, draw->pipeline.aaline);
 }

 /**
diff --git a/src/gallium/auxiliary/draw/draw_pipe.h b/src/gallium/auxiliary/draw/draw_pipe.h
index 7c9ed6c..ad3165f 100644
--- a/src/gallium/auxiliary/draw/draw_pipe.h
+++ b/src/gallium/auxiliary/draw/draw_pipe.h
@@ -101,7 +101,10 @@ void draw_pipe_passthrough_tri(struct draw_stage *stage, struct prim_header *hea
 void draw_pipe_passthrough_line(struct draw_stage *stage, struct prim_header *header);
 void draw_pipe_passthrough_point(struct draw_stage *stage, struct prim_header *header);

-
+void draw_aapoint_prepare_outputs(struct draw_context *context,
+                                  struct draw_stage *stage);
+void draw_aaline_prepare_outputs(struct draw_context *context,
+                                 struct draw_stage *stage);
 void draw_unfilled_prepare_outputs(struct draw_context *context,
                                    struct draw_stage *stage);
diff --git a/src/gallium/auxiliary/draw/draw_pipe_aaline.c b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
index aa88459..c44c236 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_aaline.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_aaline.c
@@ -692,13 +692,7 @@ aaline_first_line(struct draw_stage *stage, struct prim_header *header)
       return;
    }

-   /* update vertex attrib info */
-   aaline->pos_slot = draw_current_shader_position_output(draw);;
-
-   /* allocate the extra post-transformed vertex attribute */
-   aaline->tex_slot = draw_alloc_extra_vertex_attrib(draw,
-                                                     TGSI_SEMANTIC_GENERIC,
-                                                     aaline->fs->generic_attrib);
+   draw_aaline_prepare_outputs(draw, draw->pipeline.aaline);

    /* how many samplers? */
    /* we'll use sampler/texture[pstip->sampler_unit] for the stipple */
@@ -953,6 +947,25 @@ aaline_set_sampler_views(struct pipe_context *pipe,
 }


+void
+draw_aaline_prepare_outputs(struct draw_context *draw,
+                            struct draw_stage *stage)
+{
+   struct aaline_stage *aaline = aaline_stage(stage);
+   const struct pipe_rasterizer_state *rast = draw->rasterizer;
+
+   /* update vertex attrib info */
+   aaline->pos_slot = draw_current_shader_position_output(draw);;
+
+   if (!rast->line_smooth)
+      return;
+
+   /* allocate the extra post-transformed vertex attribute */
+   aaline->tex_slot = draw_alloc_extra_vertex_attrib(draw,
+                                                     TGSI_SEMANTIC_GENERIC,
+                                                     aaline->fs->generic_attrib);
+}
+
 /**
  * Called by drivers that want to install this AA line prim stage
  * into the draw module's pipeline.  This will not be used if the
diff --git a/src/gallium/auxiliary/draw/draw_pipe_aapoint.c b/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
index 0d7b88e..7ae1ddd 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_aapoint.c
@@ -696,28 +696,7 @@ aapoint_first_point(struct draw_stage *stage, struct prim_header *header)
     */
    bind_aapoint_fragment_shader(aapoint);

-   /* update vertex attrib info */
-   aapoint->pos_slot = draw_current_shader_position_output(draw);
-
-   /* allocate the extra post-transformed vertex attribute */
-   aapoint->tex_slot = draw_alloc_extra_vertex_attrib(draw,
-                                                      TGSI_SEMANTIC_GENERIC,
-                                                      aapoint->fs->generic_attrib);
-   assert(aapoint->tex_slot > 0); /* output[0] is vertex pos */
-
-   /* find psize slot in post-transform vertex */
-   aapoint->psize_slot = -1;
-   if (draw->rasterizer
Re: [Mesa-dev] [RFC]: gallium: add new float comparison opcodes returning integer booleans
- Original Message - This is a proposal for new comparison instructions, as the old ones don't really fit modern (graphic or opencl I guess for that matter) languages well. If you've got objections, think the naming is crazy or whatnot I'm open for suggestions :-). I would think this is not just a much better fit for d3d10/glsl but for hw as well. Yea, that makes sense to me. Comparison instructions should return consistent results across types. I'd just add a line or so to the docs to make it explicit how they're different from the old opcodes, I expect that for people new to gallium it's going to be easy to miss. z ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] draw: cleanup the extra attribs
Before inserting new front face and prim id outputs, clean up the old extra outputs; otherwise our cache will use previous output slots, which will break as soon as the outputs of the current shader don't match the last.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/draw/draw_context.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c
index af9caee..2dc6772 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -555,6 +555,7 @@ draw_get_shader_info(const struct draw_context *draw)
 void
 draw_prepare_shader_outputs(struct draw_context *draw)
 {
+   draw_remove_extra_vertex_attribs(draw);
    draw_ia_prepare_outputs(draw, draw->pipeline.ia);
    draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
 }
--
1.7.10.4
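To see why the stale slots break things, here is a toy model of an extra-attribute allocator in C. The struct and function names are hypothetical (only named after draw_alloc_extra_vertex_attrib() and draw_remove_extra_vertex_attribs(), not modelled on their code): extra attribs are appended after the shader's own outputs, so without a reset a second shader with fewer outputs gets a slot computed from the previous layout.

```c
#include <assert.h>

#define MAX_OUTPUTS 32 /* hypothetical limit */

struct output_slots {
   unsigned num_shader_outputs; /* outputs written by the bound shader */
   unsigned num_extra;          /* extra attribs appended by draw stages */
};

/* Forget any extra attribs allocated for a previously bound shader. */
static void
remove_extra_attribs(struct output_slots *s)
{
   s->num_extra = 0;
}

/* Append one extra attrib after the shader outputs; -1 if out of room. */
static int
alloc_extra_attrib(struct output_slots *s)
{
   unsigned slot = s->num_shader_outputs + s->num_extra;

   if (slot >= MAX_OUTPUTS)
      return -1;
   s->num_extra++;
   return (int)slot;
}

/* Called whenever a new shader is bound; returns e.g. the front-face slot.
 * reset != 0 models the fix (clean up before re-allocating). */
static int
prepare_outputs(struct output_slots *s, unsigned shader_outputs, int reset)
{
   assert(s);
   s->num_shader_outputs = shader_outputs;
   if (reset)
      remove_extra_attribs(s);
   return alloc_extra_attrib(s);
}
```

Binding a 4-output shader and then a 2-output shader yields slot 2 with the reset, but slot 3 without it, because the previous shader's extra attrib is still counted.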
[Mesa-dev] [PATCH 2/3] draw: reset the vertex id when injecting new primitive id
Without resetting the vertex id, with primitives where the same vertex is used by different primitives (e.g. tri/line strips), our vbuf module won't re-emit those vertices with the changed primitive id. So let's reset the vertex id whenever injecting a new primitive id, to make sure that the vertex data is correctly emitted.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/draw/draw_pipe_ia.c |    9 +
 1 file changed, 9 insertions(+)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_ia.c b/src/gallium/auxiliary/draw/draw_pipe_ia.c
index ecbb233..d64f19b 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_ia.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_ia.c
@@ -68,6 +68,15 @@ inject_primid(struct draw_stage *stage,

    for (i = 0; i < num_verts; ++i) {
       struct vertex_header *v = header->v[i];
+      /* We have to reset the vertex_id because it's used by
+       * vbuf to figure out if the vertex had already been
+       * emitted. For line/tri strips the first vertex of
+       * subsequent primitives would already be emitted,
+       * but since we're changing the primitive id on the vertex
+       * we want to make sure it's reemitted with the correct
+       * data.
+       */
+      v->vertex_id = UNDEFINED_VERTEX_ID;
       memcpy(&v->data[slot][0], &primid, sizeof(primid));
       memcpy(&v->data[slot][1], &primid, sizeof(primid));
       memcpy(&v->data[slot][2], &primid, sizeof(primid));
--
1.7.10.4
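A toy C model of the caching problem described above: a vertex is copied out only if it has no emission id yet, so on a triangle strip the vertices shared with the previous triangle would keep the old primitive id unless their id is reset. The sentinel value, struct and function names are hypothetical (draw's real UNDEFINED_VERTEX_ID and vbuf logic may differ), and winding order is ignored.

```c
#include <assert.h>

/* Hypothetical sentinel; draw's real UNDEFINED_VERTEX_ID may differ. */
#define NO_ID 0xffffu

struct vert {
   unsigned id;  /* emission id, NO_ID if not emitted yet */
   float primid; /* injected primitive id */
};

/* Toy vbuf: a vertex is copied out only if it has no id yet. */
static void
emit_vertex(struct vert *v, unsigned *next_id, unsigned *num_emitted)
{
   if (v->id == NO_ID) {
      v->id = (*next_id)++;
      (*num_emitted)++;
   }
}

/* Split an n-vertex triangle strip into triangles, injecting each
 * triangle's primitive id into its three vertices before emitting them.
 * Returns how many vertex copies were emitted. */
static unsigned
draw_tri_strip(struct vert *v, unsigned n, int reset_id)
{
   unsigned next_id = 0, num_emitted = 0;

   assert(v != 0 || n == 0);
   for (unsigned i = 0; i < n; i++)
      v[i].id = NO_ID;

   for (unsigned p = 0; p + 2 < n; p++) {
      for (unsigned k = 0; k < 3; k++) {
         v[p + k].primid = (float)p;
         if (reset_id)
            v[p + k].id = NO_ID; /* force re-emission with the new primid */
         emit_vertex(&v[p + k], &next_id, &num_emitted);
      }
   }
   return num_emitted;
}
```

For a 4-vertex strip (two triangles), resetting the id emits all 6 vertex copies; without it only 4 are emitted, so the second triangle's shared vertices go out with primitive id 0.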
[Mesa-dev] [PATCH 3/3] draw: rewrite primitive assembler
We can't be injecting the primitive ids in the pipeline, because by that time the primitives have already been decomposed. To properly number the primitives we need to handle the adjacency primitives by hand. This patch moves the prim id injection into the original primitive assembler and completely removes the useless pipeline stage.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/Makefile.sources           |    1 -
 src/gallium/auxiliary/draw/draw_context.c        |    8 +-
 src/gallium/auxiliary/draw/draw_pipe.c           |    4 -
 src/gallium/auxiliary/draw/draw_pipe.h           |    7 -
 src/gallium/auxiliary/draw/draw_pipe_ia.c        |  259 --
 src/gallium/auxiliary/draw/draw_pipe_validate.c  |   14 --
 src/gallium/auxiliary/draw/draw_prim_assembler.c |  168 +-
 src/gallium/auxiliary/draw/draw_prim_assembler.h |   12 +
 src/gallium/auxiliary/draw/draw_private.h        |    4 +-
 9 files changed, 180 insertions(+), 297 deletions(-)
 delete mode 100644 src/gallium/auxiliary/draw/draw_pipe_ia.c

diff --git a/src/gallium/auxiliary/Makefile.sources b/src/gallium/auxiliary/Makefile.sources
index b0172de..acbcef7 100644
--- a/src/gallium/auxiliary/Makefile.sources
+++ b/src/gallium/auxiliary/Makefile.sources
@@ -13,7 +13,6 @@ C_SOURCES := \
 	draw/draw_pipe_clip.c \
 	draw/draw_pipe_cull.c \
 	draw/draw_pipe_flatshade.c \
-	draw/draw_pipe_ia.c \
 	draw/draw_pipe_offset.c \
 	draw/draw_pipe_pstipple.c \
 	draw/draw_pipe_stipple.c \
diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c
index 2dc6772..2d4843e 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -40,6 +40,7 @@
 #include "util/u_prim.h"
 #include "draw_context.h"
 #include "draw_pipe.h"
+#include "draw_prim_assembler.h"
 #include "draw_vs.h"
 #include "draw_gs.h"

@@ -95,6 +96,10 @@ draw_create_context(struct pipe_context *pipe, boolean try_llvm)
    if (!draw_init(draw))
       goto err_destroy;

+   draw->ia = draw_prim_assembler_create(draw);
+   if (!draw->ia)
+      goto err_destroy;
+
    return draw;

 err_destroy:
@@ -206,6 +211,7 @@ void draw_destroy( struct draw_context *draw )
       draw->render->destroy( draw->render );
    */

+   draw_prim_assembler_destroy(draw->ia);
    draw_pipeline_destroy( draw );
    draw_pt_destroy( draw );
    draw_vs_destroy( draw );
@@ -556,7 +562,7 @@ void
 draw_prepare_shader_outputs(struct draw_context *draw)
 {
    draw_remove_extra_vertex_attribs(draw);
-   draw_ia_prepare_outputs(draw, draw->pipeline.ia);
+   draw_prim_assembler_prepare_outputs(draw->ia);
    draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
 }

diff --git a/src/gallium/auxiliary/draw/draw_pipe.c b/src/gallium/auxiliary/draw/draw_pipe.c
index 8140299..f1ee6cb 100644
--- a/src/gallium/auxiliary/draw/draw_pipe.c
+++ b/src/gallium/auxiliary/draw/draw_pipe.c
@@ -49,7 +49,6 @@ boolean draw_pipeline_init( struct draw_context *draw )
    draw->pipeline.clip = draw_clip_stage( draw );
    draw->pipeline.flatshade = draw_flatshade_stage( draw );
    draw->pipeline.cull = draw_cull_stage( draw );
-   draw->pipeline.ia = draw_ia_stage( draw );
    draw->pipeline.validate = draw_validate_stage( draw );
    draw->pipeline.first = draw->pipeline.validate;

@@ -62,7 +61,6 @@ boolean draw_pipeline_init( struct draw_context *draw )
        !draw->pipeline.clip ||
        !draw->pipeline.flatshade ||
        !draw->pipeline.cull ||
-       !draw->pipeline.ia ||
        !draw->pipeline.validate)
       return FALSE;

@@ -97,8 +95,6 @@ void draw_pipeline_destroy( struct draw_context *draw )
       draw->pipeline.flatshade->destroy( draw->pipeline.flatshade );
    if (draw->pipeline.cull)
       draw->pipeline.cull->destroy( draw->pipeline.cull );
-   if (draw->pipeline.ia)
-      draw->pipeline.ia->destroy( draw->pipeline.ia );
    if (draw->pipeline.validate)
       draw->pipeline.validate->destroy( draw->pipeline.validate );
    if (draw->pipeline.aaline)
diff --git a/src/gallium/auxiliary/draw/draw_pipe.h b/src/gallium/auxiliary/draw/draw_pipe.h
index 70822a4..7c9ed6c 100644
--- a/src/gallium/auxiliary/draw/draw_pipe.h
+++ b/src/gallium/auxiliary/draw/draw_pipe.h
@@ -91,10 +91,6 @@ extern struct draw_stage *draw_stipple_stage( struct draw_context *context );
 extern struct draw_stage *draw_wide_line_stage( struct draw_context *context );
 extern struct draw_stage *draw_wide_point_stage( struct draw_context *context );
 extern struct draw_stage *draw_validate_stage( struct draw_context *context );
-extern struct draw_stage *draw_ia_stage(struct draw_context *context);
-
-boolean draw_ia_stage_required(const struct draw_context *context,
-                               unsigned prim);

 extern void draw_free_temp_verts( struct draw_stage *stage );
 extern boolean draw_alloc_temp_verts( struct draw_stage *stage, unsigned nr );
@@ -108,9 +104,6 @@ void
Re: [Mesa-dev] [PATCH 2/3] draw: reset the vertex id when injecting new primitive id
Don't worry about this one too much. The next patch removes draw_pipe_ia.c anyway...
Re: [Mesa-dev] [PATCH 2/2] gallivm: use texture target from shader instead of static state for size query
Series looks good to me.

Reviewed-by: Zack Rusin <za...@vmware.com>

----- Original Message -----
From: Roland Scheidegger <srol...@vmware.com>

d3d10 has no notion of distinct array resources, either at the resource or at the sampler view level. However, the shader dcl of resources certainly has, and d3d10 expects resinfo to return the values according to that - in particular, a resource might have been a 1d texture with some array layers, then the sampler view might have only used 1 layer so it can be accessed both as a 1d or a 1d array texture (I think - the former definitely works). resinfo of a resource declared as an array needs to return the number of array layers, but a non-array resource needs to return 0 (and not 1). Hence fix this by passing the target from the shader decl to emit_size_query and using that (in the case of OpenGL the target will come from the instruction itself). Could probably do the same for actual sampling, though it may not matter there (as the bogus components will essentially get clamped away); it could possibly wreak havoc though if it REALLY doesn't match (which is of course an error, but still).
---
 src/gallium/auxiliary/draw/draw_llvm_sample.c     |    2 +
 src/gallium/auxiliary/gallivm/lp_bld_sample.h     |    1 +
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |   32 ++-
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h       |    1 +
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c   |   43 -
 src/gallium/drivers/llvmpipe/lp_tex_sample.c      |    2 +
 6 files changed, 77 insertions(+), 4 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm_sample.c b/src/gallium/auxiliary/draw/draw_llvm_sample.c
index 3016d7c..f10cba3 100644
--- a/src/gallium/auxiliary/draw/draw_llvm_sample.c
+++ b/src/gallium/auxiliary/draw/draw_llvm_sample.c
@@ -270,6 +270,7 @@ draw_llvm_sampler_soa_emit_size_query(const struct lp_build_sampler_soa *base,
                                       struct gallivm_state *gallivm,
                                       struct lp_type type,
                                       unsigned texture_unit,
+                                      unsigned target,
                                       boolean need_nr_mips,
                                       boolean scalar_lod,
                                       LLVMValueRef explicit_lod, /* optional */
@@ -284,6 +285,7 @@ draw_llvm_sampler_soa_emit_size_query(const struct lp_build_sampler_soa *base,
                            &sampler->dynamic_state.base,
                            type,
                            texture_unit,
+                           target,
                            need_nr_mips,
                            scalar_lod,
                            explicit_lod,
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.h b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
index dff8be2..db3ea1d 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
@@ -497,6 +497,7 @@ lp_build_size_query_soa(struct gallivm_state *gallivm,
                         struct lp_sampler_dynamic_state *dynamic_state,
                         struct lp_type int_type,
                         unsigned texture_unit,
+                        unsigned target,
                         boolean need_nr_mips,
                         boolean scalar_lod,
                         LLVMValueRef explicit_lod,
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
index b0bb58b..e403ac8 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
@@ -1943,6 +1943,7 @@ lp_build_size_query_soa(struct gallivm_state *gallivm,
                         struct lp_sampler_dynamic_state *dynamic_state,
                         struct lp_type int_type,
                         unsigned texture_unit,
+                        unsigned target,
                         boolean need_nr_mips,
                         boolean scalar_lod,
                         LLVMValueRef explicit_lod,
@@ -1955,9 +1956,36 @@ lp_build_size_query_soa(struct gallivm_state *gallivm,
    unsigned num_lods = 1;
    struct lp_build_context bld_int_vec;

-   dims = texture_dims(static_state->target);
+   /*
+    * Do some sanity verification about bound texture and shader dcl target.
+    * Not entirely sure what's possible but assume array/non-array
+    * always compatible (probably not ok for OpenGL but d3d10 has no
+    * distinction of arrays at the resource level).
+    * Everything else looks bogus (though not entirely sure about rect/2d).
+    * Currently disabled because it causes assertion failures if there's
+    * nothing bound (or rather a dummy texture, not that this case would
+    * return the right values).
+    */
+   if (0 && static_state
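The rule from the commit message can be sketched in plain C: the declared target, not the bound view, decides whether the layer count is reported, and a dimension the declared target lacks reads as 0 rather than 1. The enum, struct and function below are hypothetical simplifications (LOD and mip count are ignored); they are not lp_build_size_query_soa().

```c
#include <assert.h>

/* Hypothetical declaration targets; only the array/non-array split matters. */
enum decl_target { DECL_1D, DECL_1D_ARRAY, DECL_2D, DECL_2D_ARRAY };

struct view_desc {
   unsigned width, height, layers; /* what is actually bound */
};

/* out[0..2]: width, then height or layer count, then layer count,
 * depending on the declared target; missing components are 0. */
static void
size_query(enum decl_target target, const struct view_desc *view,
           unsigned out[3])
{
   assert(view != 0 && out != 0);
   out[0] = view->width;
   out[1] = 0;
   out[2] = 0;
   switch (target) {
   case DECL_1D:
      break;
   case DECL_1D_ARRAY:
      out[1] = view->layers;
      break;
   case DECL_2D:
      out[1] = view->height;
      break;
   case DECL_2D_ARRAY:
      out[1] = view->height;
      out[2] = view->layers;
      break;
   }
}
```

The same one-layer view thus answers 0 in the second component when the shader declared a 1d texture, and 1 when it declared a 1d array.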
Re: [Mesa-dev] [PATCH] gallivm: set non-existing values really to zero in size queries for d3d10
Looks good.

Reviewed-by: Zack Rusin <za...@vmware.com>

----- Original Message -----
From: Roland Scheidegger <srol...@vmware.com>

My previous attempt at doing so double-failed miserably (minification of zero still gives one, and even if it did not, the value was never written anyway).
While here, also rename the confusingly named int_vec bld, as we have int vecs of different sizes, and rename need_nr_mips (as this also changes out-of-bounds behavior) to is_sviewinfo too.
---
 src/gallium/auxiliary/draw/draw_llvm_sample.c     |    4 +--
 src/gallium/auxiliary/gallivm/lp_bld_sample.h     |    2 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |   34 ++---
 src/gallium/drivers/llvmpipe/lp_tex_sample.c      |    4 +--
 4 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm_sample.c b/src/gallium/auxiliary/draw/draw_llvm_sample.c
index f10cba3..97b0255 100644
--- a/src/gallium/auxiliary/draw/draw_llvm_sample.c
+++ b/src/gallium/auxiliary/draw/draw_llvm_sample.c
@@ -271,7 +271,7 @@ draw_llvm_sampler_soa_emit_size_query(const struct lp_build_sampler_soa *base,
                                       struct lp_type type,
                                       unsigned texture_unit,
                                       unsigned target,
-                                      boolean need_nr_mips,
+                                      boolean is_sviewinfo,
                                       boolean scalar_lod,
                                       LLVMValueRef explicit_lod, /* optional */
                                       LLVMValueRef *sizes_out)
@@ -286,7 +286,7 @@ draw_llvm_sampler_soa_emit_size_query(const struct lp_build_sampler_soa *base,
                            type,
                            texture_unit,
                            target,
-                           need_nr_mips,
+                           is_sviewinfo,
                            scalar_lod,
                            explicit_lod,
                            sizes_out);
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.h b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
index db3ea1d..75e8c59 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
@@ -498,7 +498,7 @@ lp_build_size_query_soa(struct gallivm_state *gallivm,
                         struct lp_type int_type,
                         unsigned texture_unit,
                         unsigned target,
-                        boolean need_nr_mips,
+                        boolean is_viewinfo,
                         boolean scalar_lod,
                         LLVMValueRef explicit_lod,
                         LLVMValueRef *sizes_out);
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
index e403ac8..65d6e7b 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
@@ -1944,7 +1944,7 @@ lp_build_size_query_soa(struct gallivm_state *gallivm,
                         struct lp_type int_type,
                         unsigned texture_unit,
                         unsigned target,
-                        boolean need_nr_mips,
+                        boolean is_sviewinfo,
                         boolean scalar_lod,
                         LLVMValueRef explicit_lod,
                         LLVMValueRef *sizes_out)
@@ -1954,7 +1954,7 @@ lp_build_size_query_soa(struct gallivm_state *gallivm,
    int dims, i;
    boolean has_array;
    unsigned num_lods = 1;
-   struct lp_build_context bld_int_vec;
+   struct lp_build_context bld_int_vec4;

    /*
     * Do some sanity verification about bound texture and shader dcl target.
@@ -1997,24 +1997,19 @@ lp_build_size_query_soa(struct gallivm_state *gallivm,

    assert(!int_type.floating);

-   lp_build_context_init(&bld_int_vec, gallivm, lp_type_int_vec(32, 128));
+   lp_build_context_init(&bld_int_vec4, gallivm, lp_type_int_vec(32, 128));

    if (explicit_lod) {
       /* FIXME: this needs to honor per-element lod */
       lod = LLVMBuildExtractElement(gallivm->builder, explicit_lod, lp_build_const_int32(gallivm, 0), "");
       first_level = dynamic_state->first_level(dynamic_state, gallivm, texture_unit);
       level = LLVMBuildAdd(gallivm->builder, lod, first_level, "level");
-      lod = lp_build_broadcast_scalar(&bld_int_vec, level);
+      lod = lp_build_broadcast_scalar(&bld_int_vec4, level);
    } else {
-      lod = bld_int_vec.zero;
+      lod = bld_int_vec4.zero;
    }

-   if (need_nr_mips) {
-      size = bld_int_vec.zero;
-   }
-   else {
-      size = bld_int_vec.undef;
-   }
+   size = bld_int_vec4.undef;

    size = LLVMBuildInsertElement(gallivm->builder, size
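Why "minification of zero still gives one": mip sizes are normally computed as max(1, size >> level), the same clamp gallium's u_minify() applies, so a dimension that doesn't exist cannot be zeroed by minifying it. A small C sketch with hypothetical names (minify, query_dim):

```c
#include <assert.h>
#include <limits.h>

/* Mip size rule: halve per level but never go below 1. */
static unsigned
minify(unsigned size, unsigned level)
{
   unsigned s;

   assert(level < sizeof(unsigned) * CHAR_BIT); /* larger shifts are undefined */
   s = size >> level;
   return s > 0 ? s : 1;
}

/* One component of a size query: a dimension the texture does not have
 * must read as 0, so it has to be forced to zero explicitly rather than
 * passed through minify(). */
static unsigned
query_dim(unsigned base_size, unsigned level, int has_dim)
{
   return has_dim ? minify(base_size, level) : 0u;
}
```

So minify(0, 0) is 1, which is exactly the value d3d10 does not want for a missing dimension.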
Re: [Mesa-dev] [PATCH 3/3] draw: rewrite primitive assembler
> Series looks good though I'm unsure why the pipeline stage doesn't work. Where does that decomposition happen? Is that something like GS outputting multiple prims in the same topology which all need the same id?

No, it's because the pipeline stage is run on the decomposed primitives. The issue is that the pipeline stage is run after stream output, and stream output requires decomposed primitives, meaning that by the time we get to the pipeline we have lost the original primitive info. d3d10 wants the primitive ids to be injected into vertices, but in the order in which they are traversed on the original (strip) primitives, so we need to do it during the original decomposition, where we have access to the original topology and can number the vertices correctly.

z
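A minimal C sketch of numbering primitives during decomposition: each triangle of a strip receives its index in the original strip as its primitive id, information that is no longer available once only independent triangles reach later stages. The swap of the first two vertices for odd triangles is one common way to keep winding consistent; adjacency topologies, which the patch also has to handle, are omitted, and all names are hypothetical.

```c
#include <assert.h>

struct tri {
   unsigned v[3];    /* indices into the original strip */
   unsigned prim_id; /* position in the original strip */
};

/* Decompose an n-vertex triangle strip into up to max_out triangles,
 * numbering them while the strip topology is still known. */
static unsigned
decompose_tri_strip(unsigned n, struct tri *out, unsigned max_out)
{
   unsigned count = 0;

   assert(out != 0 || max_out == 0);
   for (unsigned i = 0; i + 2 < n && count < max_out; i++) {
      struct tri t;

      if (i & 1) { /* odd triangles swap two vertices to keep winding */
         t.v[0] = i + 1;
         t.v[1] = i;
      } else {
         t.v[0] = i;
         t.v[1] = i + 1;
      }
      t.v[2] = i + 2;
      t.prim_id = i;
      out[count++] = t;
   }
   return count;
}
```

A 5-vertex strip yields three triangles with ids 0, 1 and 2.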
Re: [Mesa-dev] [PATCH 3/3] draw: rewrite primitive assembler
> On 09.08.2013 00:40, Zack Rusin wrote:
>>> Series looks good though I'm unsure why the pipeline stage doesn't work. Where does that decomposition happen? Is that something like GS outputting multiple prims in the same topology which all need the same id?
>>
>> No, it's because the pipeline stage is run on the decomposed primitives. The issue is that the pipeline stage is run after stream output, and stream output requires decomposed primitives, meaning that by the time we get to the pipeline we have lost the original primitive info. d3d10 wants the primitive ids to be injected into vertices, but in the order in which they are traversed on the original (strip) primitives, so we need to do it during the original decomposition, where we have access to the original topology and can number the vertices correctly.
>>
>> z
>
> I see, I totally forgot stream out needs decomposed primitives, and I guess stream out (and the prim assembler) can't run as an ordinary pipeline stage?

I was thinking about that when I was doing it, and I thought it should be possible to rewrite SO as a pipeline stage, but we'd need to change the interface to include some sort of a prepare stage and then redo the code in SO. Once SO was in the pipeline we could think about the primitive assembler, but that would also require more changes to the pipeline, because we want to know if the primitives are adjacency primitives and pipeline stages only get tris/lines/points... and this was the point at which I went "screw it, I'm injecting prim ids in the primitive assembler".

z
Re: [Mesa-dev] [PATCH 2/2] gallivm: propagate scalar_lod to emit_size_query too
----- Original Message -----
> From: Roland Scheidegger <srol...@vmware.com>
>
> Clearly the returned values need to be per-element if the lod is per element.
> Does not actually change behavior yet.

Looks good. For the entire series:
Reviewed-by: Zack Rusin <za...@vmware.com>
Re: [Mesa-dev] [PATCH] gallivm: honor d3d10 floating point rules for shadow comparisons
----- Original Message -----
> From: Roland Scheidegger <srol...@vmware.com>
>
> d3d10 specifies ordered comparisons for everything but not_equal, which is unordered (http://msdn.microsoft.com/en-us/library/windows/desktop/cc308050.aspx).
> OpenGL probably doesn't care.

This series looks good too. For all three:
Reviewed-by: Zack Rusin <za...@vmware.com>
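The d3d10 rule quoted above can be written out for all eight compare functions as a C sketch. The enum and function are hypothetical, and the `ref OP texel` operand order is an assumption for illustration; what the sketch shows is that NEVER/ALWAYS ignore NaN, every relational test fails on NaN, and only NOTEQUAL passes.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical enum covering the eight usual compare functions. */
enum compare_func {
   FUNC_NEVER, FUNC_LESS, FUNC_EQUAL, FUNC_LEQUAL,
   FUNC_GREATER, FUNC_NOTEQUAL, FUNC_GEQUAL, FUNC_ALWAYS
};

/* Returns 1.0f if the comparison passes, else 0.0f. C's <, <=, ==, >, >=
 * are ordered (fail on NaN) and != is unordered (passes on NaN). */
static float
shadow_compare(enum compare_func func, float ref, float texel)
{
   int pass = 0;

   switch (func) {
   case FUNC_NEVER:    pass = 0;             break;
   case FUNC_LESS:     pass = ref < texel;   break;
   case FUNC_EQUAL:    pass = ref == texel;  break;
   case FUNC_LEQUAL:   pass = ref <= texel;  break;
   case FUNC_GREATER:  pass = ref > texel;   break;
   case FUNC_NOTEQUAL: pass = ref != texel;  break;
   case FUNC_GEQUAL:   pass = ref >= texel;  break;
   case FUNC_ALWAYS:   pass = 1;             break;
   default:            assert(!"bad compare func");
   }
   return pass ? 1.0f : 0.0f;
}
```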
[Mesa-dev] [PATCH] draw: fix slot detection
Nowadays -1 for slots means that the semantic is not present, so we need to store it in a signed variables, otherwise 0 comparisons are pointless. Fixes http://bugzilla.eng.vmware.com/show_bug.cgi?id=67811 (at least with softpipe, edgeflags don't work wit llvmpipe) Signed-off-by: Zack Rusin za...@vmware.com --- src/gallium/auxiliary/draw/draw_pipe_unfilled.c |2 +- src/gallium/drivers/llvmpipe/lp_setup_context.h |2 +- src/gallium/drivers/llvmpipe/lp_setup_line.c|1 - 3 files changed, 2 insertions(+), 3 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c index c6ee95c..68bab72 100644 --- a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c +++ b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c @@ -67,7 +67,7 @@ inject_front_face_info(struct draw_stage *stage, boolean is_front_face = ( (stage-draw-rasterizer-front_ccw ccw) || (!stage-draw-rasterizer-front_ccw !ccw)); - unsigned slot = unfilled-face_slot; + int slot = unfilled-face_slot; unsigned i; /* In case the backend doesn't care about it */ diff --git a/src/gallium/drivers/llvmpipe/lp_setup_context.h b/src/gallium/drivers/llvmpipe/lp_setup_context.h index ea1d0d6..44be85f 100644 --- a/src/gallium/drivers/llvmpipe/lp_setup_context.h +++ b/src/gallium/drivers/llvmpipe/lp_setup_context.h @@ -106,7 +106,7 @@ struct lp_setup_context float psize; unsigned viewport_index_slot; unsigned layer_slot; - unsigned face_slot; + int face_slot; struct pipe_framebuffer_state fb; struct u_rect framebuffer; diff --git a/src/gallium/drivers/llvmpipe/lp_setup_line.c b/src/gallium/drivers/llvmpipe/lp_setup_line.c index 3b16163..a25a6b0 100644 --- a/src/gallium/drivers/llvmpipe/lp_setup_line.c +++ b/src/gallium/drivers/llvmpipe/lp_setup_line.c @@ -622,7 +622,6 @@ try_setup_line( struct lp_setup_context *setup, } else { line-inputs.frontfacing = TRUE; } - /* Setup parameter interpolants: */ -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org 
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
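The bug comes down to C's signed-to-unsigned conversion: once the -1 "not present" sentinel is stored in an `unsigned`, it becomes `UINT_MAX`, and a `slot < 0` test can never be true. A minimal sketch, assuming a hypothetical `find_slot()` helper in place of the real draw lookup:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical lookup: index of `wanted` in `semantics`, or -1 if absent. */
static int find_slot(const int *semantics, int n, int wanted)
{
   assert(n >= 0);
   for (int i = 0; i < n; i++)
      if (semantics[i] == wanted)
         return i;
   return -1;
}

/* What storing the result in an unsigned does: -1 becomes UINT_MAX,
 * so a later "slot < 0" test can never fire. */
static unsigned slot_as_unsigned(int slot)
{
   return (unsigned)slot;
}

/* The fix: keep the slot signed so "< 0" means "semantic not present". */
static int semantic_present(const int *semantics, int n, int wanted)
{
   int slot = find_slot(semantics, n, wanted);
   return !(slot < 0);
}
```

Most compilers will also warn that an unsigned `< 0` comparison is always false, which is a cheap way to catch this class of bug.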
Re: [Mesa-dev] [PATCH] util: implement table-based + linear interpolation linear-to-srgb conversion
Looks good to me. A small comment above the disabled version noting that it's disabled because it's a bit slower might be useful for the next person who reads the code. Reviewed-by: Zack Rusin za...@vmware.com - Original Message - From: Roland Scheidegger srol...@vmware.com Should be much faster, seems to work in softpipe. While here (also it's now disabled) fix up the pow factor - the former value is what is in GL core it is however not actually accurate to fp32 standard (as it is 1.0/2.4), and if someone would do all the accurate math there's no reason to waste 8 mantissa bits or so... v2: use real table generating function instead of just printing the values (might take a bit longer as it does calculations on some 3+ million floats but much more descriptive obviously). Also fix up another pow factor (this time in the python code) - wondering where the couple one bit errors came from :-(. --- src/gallium/auxiliary/util/u_format_srgb.h | 55 +- src/gallium/auxiliary/util/u_format_srgb.py | 57 ++- 2 files changed, 101 insertions(+), 11 deletions(-) diff --git a/src/gallium/auxiliary/util/u_format_srgb.h b/src/gallium/auxiliary/util/u_format_srgb.h index 82ed957..f3e1b20 100644 --- a/src/gallium/auxiliary/util/u_format_srgb.h +++ b/src/gallium/auxiliary/util/u_format_srgb.h @@ -39,6 +39,7 @@ #include pipe/p_compiler.h +#include u_pack_color.h #include u_math.h @@ -51,23 +52,57 @@ util_format_srgb_to_linear_8unorm_table[256]; extern const uint8_t util_format_linear_to_srgb_8unorm_table[256]; +extern const unsigned +util_format_linear_to_srgb_helper_table[104]; + /** * Convert a unclamped linear float to srgb value in the [0,255]. - * XXX this hasn't been tested (render to srgb surface). - * XXX this needs optimization. 
*/ static INLINE uint8_t util_format_linear_float_to_srgb_8unorm(float x) { - if (x = 1.0f) - return 255; - else if (x = 0.0031308f) - return float_to_ubyte(1.055f * powf(x, 0.41666f) - 0.055f); - else if (x 0.0f) - return float_to_ubyte(12.92f * x); - else - return 0; + if (0) { + if (x = 1.0f) + return 255; + else if (x = 0.0031308f) + return float_to_ubyte(1.055f * powf(x, 0.4166f) - 0.055f); + else if (x 0.0f) + return float_to_ubyte(12.92f * x); + else + return 0; + } + else { + /* + * This is taken from https://gist.github.com/rygorous/2203834 + * Use LUT and do linear interpolation. + */ + union fi almostone, minval, f; + unsigned tab, bias, scale, t; + + almostone.ui = 0x3f7f; + minval.ui = (127-13) 23; + + /* + * Clamp to [2^(-13), 1-eps]; these two values map to 0 and 1, respectively. + * The tests are carefully written so that NaNs map to 0, same as in the + * reference implementation. + */ + if (!(x minval.f)) + x = minval.f; + if (x almostone.f) + x = almostone.f; + + /* Do the table lookup and unpack bias, scale */ + f.f = x; + tab = util_format_linear_to_srgb_helper_table[(f.ui - minval.ui) 20]; + bias = (tab 16) 9; + scale = tab 0x; + + /* Grab next-highest mantissa bits and perform linear interpolation */ + t = (f.ui 12) 0xff; + return (uint8_t) ((bias + scale*t) 16); + } } diff --git a/src/gallium/auxiliary/util/u_format_srgb.py b/src/gallium/auxiliary/util/u_format_srgb.py index cd63ae7..c6c02f0 100644 --- a/src/gallium/auxiliary/util/u_format_srgb.py +++ b/src/gallium/auxiliary/util/u_format_srgb.py @@ -40,6 +40,7 @@ CopyRight = ''' import math +import struct def srgb_to_linear(x): @@ -51,10 +52,11 @@ def srgb_to_linear(x): def linear_to_srgb(x): if x = 0.0031308: -return 1.055 * math.pow(x, 0.41666) - 0.055 +return 1.055 * math.pow(x, 0.4166) - 0.055 else: return 12.92 * x + def generate_srgb_tables(): print 'const float' print 'util_format_srgb_8unorm_to_linear_float_table[256] = {' @@ -84,6 +86,59 @@ def generate_srgb_tables(): print '};' 
print +# calculate the table interpolation values used in float linear to unorm8 srgb +numexp = 13 +mantissa_msb = 3 +# stepshift is just used to only use every x-th float to make things faster, +# 5 is largest value which still gives exact same table as 0 +stepshift = 5 +nbuckets = numexp mantissa_msb +bucketsize = (1 (23 - mantissa_msb)) stepshift +mantshift = 12 +valtable = [] +sum_aa = float(bucketsize) +sum_ab = 0.0 +sum_bb = 0.0 +for i in range(0, bucketsize): +j = (i stepshift) mantshift +sum_ab += j +sum_bb += j*j +inv_det = 1.0 / (sum_aa * sum_bb - sum_ab * sum_ab
[Mesa-dev] [PATCH 1/8] tgsi: detect prim id and front face usage in fs
Adding code to detect the usage of prim id and front face semantics
in fragment shaders.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/tgsi/tgsi_scan.c |    9 +++++++--
 src/gallium/auxiliary/tgsi/tgsi_scan.h |    1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.c b/src/gallium/auxiliary/tgsi/tgsi_scan.c
index 1fe1a07..e7bf6e6 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_scan.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_scan.c
@@ -166,9 +166,14 @@ tgsi_scan_shader(const struct tgsi_token *tokens,
                info->input_cylindrical_wrap[reg] = (ubyte)fulldecl->Interp.CylindricalWrap;
                info->num_inputs++;
 
-               if (procType == TGSI_PROCESSOR_FRAGMENT &&
-                   fulldecl->Semantic.Name == TGSI_SEMANTIC_POSITION)
+               if (procType == TGSI_PROCESSOR_FRAGMENT) {
+                  if (fulldecl->Semantic.Name == TGSI_SEMANTIC_POSITION)
                      info->reads_position = TRUE;
+                  else if (fulldecl->Semantic.Name == TGSI_SEMANTIC_PRIMID)
+                     info->uses_primid = TRUE;
+                  else if (fulldecl->Semantic.Name == TGSI_SEMANTIC_FACE)
+                     info->uses_frontface = TRUE;
+               }
             }
             else if (file == TGSI_FILE_SYSTEM_VALUE) {
                unsigned index = fulldecl->Range.First;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_scan.h b/src/gallium/auxiliary/tgsi/tgsi_scan.h
index cfa2b8e..e2fa73a 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_scan.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_scan.h
@@ -74,6 +74,7 @@ struct tgsi_shader_info
    boolean uses_instanceid;
    boolean uses_vertexid;
    boolean uses_primid;
+   boolean uses_frontface;
    boolean origin_lower_left;
    boolean pixel_center_integer;
    boolean color0_writes_all_cbufs;
--
1.7.10.4
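The scan is a single pass over the fragment shader's input declarations that sets one flag per semantic of interest. A minimal stand-alone sketch; the enum and struct are simplified, hypothetical stand-ins for the TGSI semantic names and `struct tgsi_shader_info`:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TGSI semantic names (values are arbitrary). */
enum semantic { SEM_POSITION, SEM_COLOR, SEM_PRIMID, SEM_FACE };

struct shader_info_sketch {
   bool reads_position;
   bool uses_primid;
   bool uses_frontface;
};

/* Record which special inputs a fragment shader declares. */
static void scan_fs_inputs(const enum semantic *inputs, int num_inputs,
                           struct shader_info_sketch *info)
{
   assert(num_inputs >= 0);
   info->reads_position = info->uses_primid = info->uses_frontface = false;
   for (int i = 0; i < num_inputs; i++) {
      switch (inputs[i]) {
      case SEM_POSITION: info->reads_position = true; break;
      case SEM_PRIMID:   info->uses_primid = true;    break;
      case SEM_FACE:     info->uses_frontface = true; break;
      default:           break;
      }
   }
}
```

Later stages can then test a flag such as `uses_frontface` instead of rescanning the shader.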
[Mesa-dev] [PATCH 2/8] draw: stop crashing with extra shader outputs
Draw sometimes injects extra shader outputs (aa points, lines or front face), unfortunately most of the pipeline and llvm code didn't handle them at all. It only worked if number of inputs happened to be bigger or equal to the number of shader outputs plus the extra injected outputs. In particular when running the pipeline which depends on the vertex_id in the vertex_header things were completely broken. The patch adjust the code to correctly use the total number of shader outputs (the standard ones plus the injected ones) to make it all stop crashing and work. Signed-off-by: Zack Rusin za...@vmware.com --- src/gallium/auxiliary/draw/draw_context.c | 43 src/gallium/auxiliary/draw/draw_context.h |5 +++ src/gallium/auxiliary/draw/draw_gs.c |2 +- src/gallium/auxiliary/draw/draw_llvm.c |3 ++ src/gallium/auxiliary/draw/draw_llvm.h |4 +- src/gallium/auxiliary/draw/draw_pipe.h |3 +- .../auxiliary/draw/draw_pt_fetch_shade_pipeline.c |6 +-- .../draw/draw_pt_fetch_shade_pipeline_llvm.c |8 +--- src/gallium/auxiliary/draw/draw_vs_variant.c |2 +- 9 files changed, 61 insertions(+), 15 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c index 2e95b5c..8bf3596 100644 --- a/src/gallium/auxiliary/draw/draw_context.c +++ b/src/gallium/auxiliary/draw/draw_context.c @@ -622,6 +622,49 @@ draw_num_shader_outputs(const struct draw_context *draw) /** + * Return total number of the vertex shader outputs. This function + * also counts any extra vertex output attributes that may + * be filled in by some draw stages (such as AA point, AA line, + * front face). + */ +uint +draw_total_vs_shader_outputs(const struct draw_context *draw) +{ + const struct tgsi_shader_info *info = draw-vs.vertex_shader-info; + uint count; + + count = info-num_outputs; + count += draw-extra_shader_outputs.num; + + return count; +} + +/** + * Return total number of the geometry shader outputs. 
This function + * also counts any extra geometry output attributes that may + * be filled in by some draw stages (such as AA point, AA line, front + * face). + */ +uint +draw_total_gs_shader_outputs(const struct draw_context *draw) +{ + + const struct tgsi_shader_info *info; + uint count; + + if (!draw-gs.geometry_shader) + return 0; + + info = draw-gs.geometry_shader-info; + + count = info-num_outputs; + count += draw-extra_shader_outputs.num; + + return count; +} + + +/** * Provide TGSI sampler objects for vertex/geometry shaders that use * texture fetches. This state only needs to be set once per context. * This might only be used by software drivers for the time being. diff --git a/src/gallium/auxiliary/draw/draw_context.h b/src/gallium/auxiliary/draw/draw_context.h index 0815047..e9aa24d 100644 --- a/src/gallium/auxiliary/draw/draw_context.h +++ b/src/gallium/auxiliary/draw/draw_context.h @@ -139,6 +139,11 @@ draw_will_inject_frontface(const struct draw_context *draw); uint draw_num_shader_outputs(const struct draw_context *draw); +uint +draw_total_vs_shader_outputs(const struct draw_context *draw); + +uint +draw_total_gs_shader_outputs(const struct draw_context *draw); void draw_texture_sampler(struct draw_context *draw, diff --git a/src/gallium/auxiliary/draw/draw_gs.c b/src/gallium/auxiliary/draw/draw_gs.c index cd63e2b..32fd91f 100644 --- a/src/gallium/auxiliary/draw/draw_gs.c +++ b/src/gallium/auxiliary/draw/draw_gs.c @@ -534,7 +534,7 @@ int draw_geometry_shader_run(struct draw_geometry_shader *shader, { const float (*input)[4] = (const float (*)[4])input_verts-verts-data; unsigned input_stride = input_verts-vertex_size; - unsigned num_outputs = shader-info.num_outputs; + unsigned num_outputs = draw_total_gs_shader_outputs(shader-draw); unsigned vertex_size = sizeof(struct vertex_header) + num_outputs * 4 * sizeof(float); unsigned num_input_verts = input_prim-linear ? 
input_verts-count : diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c index c195a2b..8ecb3e7 100644 --- a/src/gallium/auxiliary/draw/draw_llvm.c +++ b/src/gallium/auxiliary/draw/draw_llvm.c @@ -1827,6 +1827,7 @@ draw_llvm_make_variant_key(struct draw_llvm *llvm, char *store) key-need_edgeflags = (llvm-draw-vs.edgeflag_output ? TRUE : FALSE); key-ucp_enable = llvm-draw-rasterizer-clip_plane_enable; key-has_gs = llvm-draw-gs.geometry_shader != NULL; + key-num_outputs = draw_total_vs_shader_outputs(llvm-draw); key-pad1 = 0; /* All variants of this shader will have the same value for @@ -2264,6 +2265,8 @@ draw_gs_llvm_make_variant_key(struct draw_llvm *llvm, char *store) key = (struct draw_gs_llvm_variant_key *)store; + key-num_outputs = draw_total_gs_shader_outputs(llvm-draw); + /* All variants of this shader will have
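The arithmetic behind this patch: a vertex is a header followed by one float[4] per output (`sizeof(struct vertex_header) + num_outputs * 4 * sizeof(float)`), so the size has to count the injected outputs too, or writes to the extra slots run past the end of the vertex. A sketch with a simplified, hypothetical header (the real `struct vertex_header` carries more fields):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct vertex_header. */
struct vertex_header_sketch {
   unsigned flags;
   float clip[4];
   float data[][4];   /* one float[4] per shader output */
};

/* Total outputs = outputs the shader writes + outputs draw injects
 * (e.g. AA point/line coordinates or front face). */
static unsigned total_outputs(unsigned shader_outputs, unsigned extra_outputs)
{
   return shader_outputs + extra_outputs;
}

/* Bytes needed for one vertex with the given number of outputs. */
static size_t vertex_size(unsigned num_outputs)
{
   return sizeof(struct vertex_header_sketch) + num_outputs * 4 * sizeof(float);
}
```

Each injected output adds 16 bytes per vertex, which is exactly what the old `info.num_outputs`-based sizing left out.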
[Mesa-dev] [PATCH 3/8] draw/llvm: add some extra debugging output
When dumping shader outputs it's nice to also see their integer values,
in particular because some outputs are integers.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/draw/draw_llvm.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 8ecb3e7..df0d2ed 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -977,6 +977,12 @@ convert_to_aos(struct gallivm_state *gallivm,
                                LLVMConstInt(LLVMInt32TypeInContext(gallivm->context),
                                             chan, 0));
          lp_build_print_value(gallivm, "val = ", out);
+         {
+            LLVMValueRef iv =
+               LLVMBuildBitCast(builder, out, lp_build_int_vec_type(gallivm, soa_type), "");
+
+            lp_build_print_value(gallivm, "ival = ", iv);
+         }
 #endif
          soa[chan] = out;
       }
--
1.7.10.4
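In plain C, the added `LLVMBuildBitCast` corresponds to reinterpreting a float's bits as an integer. For an output that really holds integer data (a primitive id, say), the float print is meaningless while the integer view is readable. A scalar sketch of that reinterpretation; the helper names are hypothetical, and the LLVM code does the same on whole SoA vectors:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret the bits of a float as a signed 32-bit integer
 * (memcpy avoids strict-aliasing problems). */
static int32_t float_bits_as_int(float f)
{
   int32_t i;
   memcpy(&i, &f, sizeof i);
   return i;
}

/* Print a shader output both ways, like the "val = " / "ival = " dumps. */
static void dump_output(float val)
{
   printf("val = %f\n", val);
   printf("ival = %d\n", (int)float_bits_as_int(val));
}
```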
[Mesa-dev] [PATCH 4/8] draw: make sure clipping works with injected outputs
clipping would drop the extra outputs because it always used the number of standard vertex shader outputs, without geometry shader or extra outputs. The commit makes sure that clipping with geometry shaders which have more outputs than the current vertex shader and with extra outputs correctly propagates the entire vertex. Signed-off-by: Zack Rusin za...@vmware.com --- src/gallium/auxiliary/draw/draw_pipe_clip.c | 89 --- 1 file changed, 54 insertions(+), 35 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c b/src/gallium/auxiliary/draw/draw_pipe_clip.c index e83586e..b76e9a5 100644 --- a/src/gallium/auxiliary/draw/draw_pipe_clip.c +++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c @@ -136,7 +136,7 @@ static void interp( const struct clip_stage *clip, const struct vertex_header *in, unsigned viewport_index ) { - const unsigned nr_attrs = draw_current_shader_outputs(clip-stage.draw); + const unsigned nr_attrs = draw_num_shader_outputs(clip-stage.draw); const unsigned pos_attr = draw_current_shader_position_output(clip-stage.draw); const unsigned clip_attr = draw_current_shader_clipvertex_output(clip-stage.draw); unsigned j; @@ -264,7 +264,6 @@ static void emit_poly( struct draw_stage *stage, header.flags |= edge_last; if (DEBUG_CLIP) { - const struct draw_vertex_shader *vs = stage-draw-vs.vertex_shader; uint j, k; debug_printf(Clipped tri: (flat-shade-first = %d)\n, stage-draw-rasterizer-flatshade_first); @@ -274,7 +273,7 @@ static void emit_poly( struct draw_stage *stage, header.v[j]-clip[1], header.v[j]-clip[2], header.v[j]-clip[3]); -for (k = 0; k vs-info.num_outputs; k++) { +for (k = 0; k draw_num_shader_outputs(stage-draw); k++) { debug_printf( Vert %d: Attr %d: %f %f %f %f\n, j, k, header.v[j]-data[k][0], header.v[j]-data[k][1], @@ -283,7 +282,6 @@ static void emit_poly( struct draw_stage *stage, } } } - stage-next-tri( stage-next, header ); } } @@ -609,6 +607,35 @@ clip_tri( struct draw_stage *stage, } +static int +find_interp(const struct 
draw_fragment_shader *fs, int *indexed_interp, +uint semantic_name, uint semantic_index) +{ + int interp; + /* If it's gl_{Front,Back}{,Secondary}Color, pick up the mode +* from the array we've filled before. */ + if (semantic_name == TGSI_SEMANTIC_COLOR || + semantic_name == TGSI_SEMANTIC_BCOLOR) { + interp = indexed_interp[semantic_index]; + } else { + /* Otherwise, search in the FS inputs, with a decent default + * if we don't find it. + */ + uint j; + interp = TGSI_INTERPOLATE_PERSPECTIVE; + if (fs) { + for (j = 0; j fs-info.num_inputs; j++) { +if (semantic_name == fs-info.input_semantic_name[j] +semantic_index == fs-info.input_semantic_index[j]) { + interp = fs-info.input_interpolate[j]; + break; +} + } + } + } + return interp; +} + /* Update state. Could further delay this until we hit the first * primitive that really requires clipping. */ @@ -616,11 +643,9 @@ static void clip_init_state( struct draw_stage *stage ) { struct clip_stage *clipper = clip_stage( stage ); - const struct draw_vertex_shader *vs = stage-draw-vs.vertex_shader; - const struct draw_geometry_shader *gs = stage-draw-gs.geometry_shader; const struct draw_fragment_shader *fs = stage-draw-fs.fragment_shader; - uint i; - const struct tgsi_shader_info *vs_info = gs ? gs-info : vs-info; + uint i, j; + const struct tgsi_shader_info *info = draw_get_shader_info(stage-draw); /* We need to know for each attribute what kind of interpolation is * done on it (flat, smooth or noperspective). But the information @@ -663,42 +688,36 @@ clip_init_state( struct draw_stage *stage ) clipper-num_flat_attribs = 0; memset(clipper-noperspective_attribs, 0, sizeof(clipper-noperspective_attribs)); - for (i = 0; i vs_info-num_outputs; i++) { - /* Find the interpolation mode for a specific attribute - */ - int interp; - - /* If it's gl_{Front,Back}{,Secondary}Color, pick up the mode - * from the array we've filled before. 
*/ - if (vs_info-output_semantic_name[i] == TGSI_SEMANTIC_COLOR || - vs_info-output_semantic_name[i] == TGSI_SEMANTIC_BCOLOR) { - interp = indexed_interp[vs_info-output_semantic_index[i]]; - } else { - /* Otherwise, search in the FS inputs, with a decent default - * if we don't find it. - */ - uint j; - interp = TGSI_INTERPOLATE_PERSPECTIVE; - if (fs) { -for (j = 0; j fs
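The new `find_interp()` helper resolves one attribute's interpolation mode. Colors take the mode collected earlier in `indexed_interp`; anything else is looked up among the fragment shader's inputs, defaulting to perspective. A self-contained sketch of the same lookup, with hypothetical simplified enums and a fixed-size struct in place of the TGSI and draw types:

```c
#include <assert.h>
#include <stddef.h>

enum sem { SEM_COLOR, SEM_BCOLOR, SEM_GENERIC, SEM_FACE };
enum interp { INTERP_CONSTANT, INTERP_LINEAR, INTERP_PERSPECTIVE };

/* Hypothetical stand-in for the fragment shader's input info. */
struct fs_inputs {
   unsigned num_inputs;
   enum sem name[8];
   unsigned index[8];
   enum interp interp[8];
};

static enum interp find_interp_sketch(const struct fs_inputs *fs,
                                      const enum interp *indexed_interp,
                                      enum sem name, unsigned index)
{
   /* Front/back colors use the mode collected beforehand. */
   if (name == SEM_COLOR || name == SEM_BCOLOR)
      return indexed_interp[index];

   /* Otherwise search the FS inputs, defaulting to perspective. */
   if (fs) {
      assert(fs->num_inputs <= 8);
      for (unsigned j = 0; j < fs->num_inputs; j++)
         if (fs->name[j] == name && fs->index[j] == index)
            return fs->interp[j];
   }
   return INTERP_PERSPECTIVE;
}
```

Factoring the lookup out this way lets the clipper call it for every output, including injected ones, rather than only for the vertex shader's own outputs.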
[Mesa-dev] [PATCH 5/8] draw: use the vertex size
Instead of using the magical 4, use the vertex size computed above.
This doesn't change the behavior; it just makes the code a bit cleaner.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/draw/draw_pipe_vbuf.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_vbuf.c b/src/gallium/auxiliary/draw/draw_pipe_vbuf.c
index d3b38eb..092440e 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_vbuf.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_vbuf.c
@@ -250,7 +250,7 @@ vbuf_start_prim( struct vbuf_stage *vbuf, uint prim )
    }
 
    hw_key.nr_elements = vbuf->vinfo->num_attribs;
-   hw_key.output_stride = vbuf->vinfo->size * 4;
+   hw_key.output_stride = vbuf->vertex_size;
 
    /* Don't bother with caching at this stage: */
--
1.7.10.4
[Mesa-dev] [PATCH 6/8] draw: fix front face injection
Inject front face only if the fragment shader uses it, and propagate it
through all channels: otherwise we'd need to figure out the exact swizzle
that the fs expects, and it's simpler to make sure all the components of
the front face register are set correctly.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/draw/draw_pipe_unfilled.c |   24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
index d8a603f..f9a31b0 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
@@ -37,6 +37,7 @@
 #include "pipe/p_defines.h"
 #include "draw_private.h"
 #include "draw_pipe.h"
+#include "draw_fs.h"
 
 
 struct unfilled_stage {
@@ -67,18 +68,20 @@ inject_front_face_info(struct draw_stage *stage,
       (stage->draw->rasterizer->front_ccw && ccw) ||
       (!stage->draw->rasterizer->front_ccw && !ccw));
    unsigned slot = unfilled->face_slot;
-   struct vertex_header *v0 = header->v[0];
-   struct vertex_header *v1 = header->v[1];
-   struct vertex_header *v2 = header->v[2];
+   unsigned i;
 
    /* In case the backend doesn't care about it */
    if (slot < 0) {
       return;
    }
 
-   v0->data[slot][0] = is_front_face;
-   v1->data[slot][0] = is_front_face;
-   v2->data[slot][0] = is_front_face;
+   for (i = 0; i < 3; ++i) {
+      struct vertex_header *v = header->v[i];
+      v->data[slot][0] = is_front_face;
+      v->data[slot][1] = is_front_face;
+      v->data[slot][2] = is_front_face;
+      v->data[slot][3] = is_front_face;
+   }
 }
 
 
@@ -231,9 +234,12 @@ draw_unfilled_prepare_outputs( struct draw_context *draw,
 {
    struct unfilled_stage *unfilled = unfilled_stage(stage);
    const struct pipe_rasterizer_state *rast = draw ? draw->rasterizer : 0;
-   if (rast &&
-       (rast->fill_front != PIPE_POLYGON_MODE_FILL ||
-        rast->fill_back != PIPE_POLYGON_MODE_FILL)) {
+   boolean is_unfilled = (rast &&
+                          (rast->fill_front != PIPE_POLYGON_MODE_FILL ||
+                           rast->fill_back != PIPE_POLYGON_MODE_FILL));
+   const struct draw_fragment_shader *fs = draw->fs.fragment_shader;
+
+   if (is_unfilled && fs && fs->info.uses_frontface) {
       unfilled->face_slot = draw_alloc_extra_vertex_attrib(
          stage->draw, TGSI_SEMANTIC_FACE, 0);
    } else {
--
1.7.10.4
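The injection writes the same facing value into all four channels of the face slot for each of the triangle's three vertices, so whatever swizzle the fragment shader uses to read it, the value is consistent. A negative slot means no face output was allocated. A sketch using a hypothetical fixed-size vertex type:

```c
#include <assert.h>

#define MAX_ATTRIBS 8

/* Hypothetical simplified vertex: one float[4] per attribute slot. */
struct vertex_sketch {
   float data[MAX_ATTRIBS][4];
};

/* Broadcast the facing flag into every channel of the face slot. */
static void inject_front_face(struct vertex_sketch *v[3], int slot,
                              int is_front_face)
{
   if (slot < 0)   /* backend doesn't want the face attribute */
      return;
   assert(slot < MAX_ATTRIBS);
   for (int i = 0; i < 3; ++i)
      for (int c = 0; c < 4; ++c)
         v[i]->data[slot][c] = (float)is_front_face;
}
```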
[Mesa-dev] [PATCH 7/8] llvmpipe: don't interpolate front face or prim id
The loop was iterating over all the fs inputs and setting them to
perspective interpolation, and then after the loop we were creating extra
output slots with the correct interpolation. Instead of injecting bogus
extra outputs, just set the interpolation on front face and prim id
correctly when doing the initial scan of the fs inputs.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/drivers/llvmpipe/lp_state_derived.c |   30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_state_derived.c b/src/gallium/drivers/llvmpipe/lp_state_derived.c
index 5a51b50..7b1e6f6 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_derived.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_derived.c
@@ -69,8 +69,8 @@ compute_vertex_info(struct llvmpipe_context *llvmpipe)
    vinfo->num_attribs = 0;
 
    vs_index = draw_find_shader_output(llvmpipe->draw,
-                                       TGSI_SEMANTIC_POSITION,
-                                       0);
+                                      TGSI_SEMANTIC_POSITION,
+                                      0);
 
    draw_emit_vertex_attr(vinfo, EMIT_4F, INTERP_PERSPECTIVE, vs_index);
 
@@ -89,12 +89,20 @@ compute_vertex_info(struct llvmpipe_context *llvmpipe)
          llvmpipe->color_slot[idx] = (int)vinfo->num_attribs;
       }
 
-      /*
-       * Emit the requested fs attribute for all but position.
-       */
-      draw_emit_vertex_attr(vinfo, EMIT_4F, INTERP_PERSPECTIVE, vs_index);
+      if (lpfs->info.base.input_semantic_index[i] == 0 &&
+          lpfs->info.base.input_semantic_name[i] == TGSI_SEMANTIC_FACE) {
+         llvmpipe->face_slot = vinfo->num_attribs;
+         draw_emit_vertex_attr(vinfo, EMIT_4F, INTERP_CONSTANT, vs_index);
+      } else if (lpfs->info.base.input_semantic_index[i] == 0 &&
+                 lpfs->info.base.input_semantic_name[i] == TGSI_SEMANTIC_PRIMID) {
+         draw_emit_vertex_attr(vinfo, EMIT_4F, INTERP_CONSTANT, vs_index);
+      } else {
+         /*
+          * Emit the requested fs attribute for all but position.
+          */
+         draw_emit_vertex_attr(vinfo, EMIT_4F, INTERP_PERSPECTIVE, vs_index);
+      }
    }
-
    /* Figure out if we need bcolor as well.
     */
    for (i = 0; i < 2; i++) {
@@ -140,14 +148,6 @@ compute_vertex_info(struct llvmpipe_context *llvmpipe)
       llvmpipe->layer_slot = 0;
    }
 
-   /* Check for a fake front face for unfilled primitives*/
-   vs_index = draw_find_shader_output(llvmpipe->draw,
-                                      TGSI_SEMANTIC_FACE, 0);
-   if (vs_index >= 0) {
-      llvmpipe->face_slot = vinfo->num_attribs;
-      draw_emit_vertex_attr(vinfo, EMIT_4F, INTERP_CONSTANT, vs_index);
-   }
-
    draw_compute_vertex_size(vinfo);
    lp_setup_set_vertex_info(llvmpipe->setup, vinfo);
 }
--
1.7.10.4
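The per-input decision this patch makes: front face and primitive id are per-primitive values, so they get constant (flat) interpolation, while every other input stays perspective. A sketch of that selection with hypothetical stand-in enums; for simplicity the returned face slot is just the input index and ignores the position attribute that the real code emits first:

```c
#include <assert.h>

enum sem_kind { SEM_GENERIC, SEM_FACE, SEM_PRIMID };
enum interp_mode { INTERP_PERSPECTIVE, INTERP_CONSTANT };

/* Only index 0 of FACE and PRIMID is special, mirroring the patch. */
static enum interp_mode pick_interp(enum sem_kind name, unsigned index)
{
   if (index == 0 && (name == SEM_FACE || name == SEM_PRIMID))
      return INTERP_CONSTANT;
   return INTERP_PERSPECTIVE;
}

/* Fill one mode per input; return the face input's slot, or -1. */
static int emit_fs_inputs(const enum sem_kind *names, const unsigned *indices,
                          int n, enum interp_mode *modes)
{
   int face_slot = -1;
   assert(n >= 0);
   for (int i = 0; i < n; i++) {
      modes[i] = pick_interp(names[i], indices[i]);
      if (indices[i] == 0 && names[i] == SEM_FACE)
         face_slot = i;
   }
   return face_slot;
}
```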
[Mesa-dev] [PATCH 8/8] draw: implement proper primitive assembler as a pipeline stage
we used to have a face primitive assembler that we ran after if the gs was missing but we had adjacency primitives in the pipeline, lets convert it to a pipeline stage, which allows us to use it to inject outputs (primitive id) into the vertices. it's also a lot cleaner because the decomposition is already handled for us. Signed-off-by: Zack Rusin za...@vmware.com --- src/gallium/auxiliary/Makefile.sources |2 +- src/gallium/auxiliary/draw/draw_context.c |1 + src/gallium/auxiliary/draw/draw_pipe.c |4 + src/gallium/auxiliary/draw/draw_pipe.h |5 + src/gallium/auxiliary/draw/draw_pipe_ia.c | 253 src/gallium/auxiliary/draw/draw_pipe_validate.c| 15 +- src/gallium/auxiliary/draw/draw_prim_assembler.c | 225 - src/gallium/auxiliary/draw/draw_prim_assembler.h | 62 - .../auxiliary/draw/draw_prim_assembler_tmp.h | 31 --- src/gallium/auxiliary/draw/draw_private.h |1 + .../auxiliary/draw/draw_pt_fetch_shade_pipeline.c | 18 +- .../draw/draw_pt_fetch_shade_pipeline_llvm.c | 18 +- 12 files changed, 283 insertions(+), 352 deletions(-) create mode 100644 src/gallium/auxiliary/draw/draw_pipe_ia.c delete mode 100644 src/gallium/auxiliary/draw/draw_prim_assembler.c delete mode 100644 src/gallium/auxiliary/draw/draw_prim_assembler.h delete mode 100644 src/gallium/auxiliary/draw/draw_prim_assembler_tmp.h diff --git a/src/gallium/auxiliary/Makefile.sources b/src/gallium/auxiliary/Makefile.sources index acbcef7..ee93e8b 100644 --- a/src/gallium/auxiliary/Makefile.sources +++ b/src/gallium/auxiliary/Makefile.sources @@ -13,6 +13,7 @@ C_SOURCES := \ draw/draw_pipe_clip.c \ draw/draw_pipe_cull.c \ draw/draw_pipe_flatshade.c \ +draw/draw_pipe_ia.c \ draw/draw_pipe_offset.c \ draw/draw_pipe_pstipple.c \ draw/draw_pipe_stipple.c \ @@ -23,7 +24,6 @@ C_SOURCES := \ draw/draw_pipe_vbuf.c \ draw/draw_pipe_wide_line.c \ draw/draw_pipe_wide_point.c \ - draw/draw_prim_assembler.c \ draw/draw_pt.c \ draw/draw_pt_emit.c \ draw/draw_pt_fetch.c \ diff --git a/src/gallium/auxiliary/draw/draw_context.c 
b/src/gallium/auxiliary/draw/draw_context.c index 8bf3596..bbb2904 100644 --- a/src/gallium/auxiliary/draw/draw_context.c +++ b/src/gallium/auxiliary/draw/draw_context.c @@ -555,6 +555,7 @@ draw_get_shader_info(const struct draw_context *draw) void draw_prepare_shader_outputs(struct draw_context *draw) { + draw_ia_prepare_outputs(draw, draw-pipeline.ia); draw_unfilled_prepare_outputs(draw, draw-pipeline.unfilled); } diff --git a/src/gallium/auxiliary/draw/draw_pipe.c b/src/gallium/auxiliary/draw/draw_pipe.c index f1ee6cb..8140299 100644 --- a/src/gallium/auxiliary/draw/draw_pipe.c +++ b/src/gallium/auxiliary/draw/draw_pipe.c @@ -49,6 +49,7 @@ boolean draw_pipeline_init( struct draw_context *draw ) draw-pipeline.clip = draw_clip_stage( draw ); draw-pipeline.flatshade = draw_flatshade_stage( draw ); draw-pipeline.cull = draw_cull_stage( draw ); + draw-pipeline.ia= draw_ia_stage( draw ); draw-pipeline.validate = draw_validate_stage( draw ); draw-pipeline.first = draw-pipeline.validate; @@ -61,6 +62,7 @@ boolean draw_pipeline_init( struct draw_context *draw ) !draw-pipeline.clip || !draw-pipeline.flatshade || !draw-pipeline.cull || + !draw-pipeline.ia || !draw-pipeline.validate) return FALSE; @@ -95,6 +97,8 @@ void draw_pipeline_destroy( struct draw_context *draw ) draw-pipeline.flatshade-destroy( draw-pipeline.flatshade ); if (draw-pipeline.cull) draw-pipeline.cull-destroy( draw-pipeline.cull ); + if (draw-pipeline.ia) + draw-pipeline.ia-destroy( draw-pipeline.ia ); if (draw-pipeline.validate) draw-pipeline.validate-destroy( draw-pipeline.validate ); if (draw-pipeline.aaline) diff --git a/src/gallium/auxiliary/draw/draw_pipe.h b/src/gallium/auxiliary/draw/draw_pipe.h index 70c286f..70822a4 100644 --- a/src/gallium/auxiliary/draw/draw_pipe.h +++ b/src/gallium/auxiliary/draw/draw_pipe.h @@ -91,7 +91,10 @@ extern struct draw_stage *draw_stipple_stage( struct draw_context *context ); extern struct draw_stage *draw_wide_line_stage( struct draw_context *context ); extern 
struct draw_stage *draw_wide_point_stage( struct draw_context *context ); extern struct draw_stage *draw_validate_stage( struct draw_context *context ); +extern struct draw_stage *draw_ia_stage(struct draw_context *context); +boolean draw_ia_stage_required(const struct draw_context *context, + unsigned prim); extern void draw_free_temp_verts( struct draw_stage *stage ); extern boolean draw_alloc_temp_verts( struct draw_stage *stage, unsigned nr ); @@ -105,6 +108,8 @@ void draw_pipe_passthrough_point(struct draw_stage *stage
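Draw's pipeline is a chain of stages: each receives already-decomposed primitives and forwards them to `next`. That is what lets the new stage inject an output such as the primitive id without doing its own decomposition. A heavily simplified, hypothetical sketch of such a chain; the real `struct draw_stage` also has point, line, flush, reset and destroy callbacks, and this is not the code in `draw_pipe_ia.c`:

```c
#include <assert.h>
#include <stddef.h>

struct tri { float primid[3]; };   /* hypothetical: only the injected slot */

struct stage {
   struct stage *next;
   void (*tri)(struct stage *s, struct tri *t);
   unsigned prim_counter;   /* used by the id-injecting stage */
   unsigned tris_seen;      /* used by the terminal stage */
};

/* Writes a running primitive id into every vertex, then passes it on. */
static void ia_tri(struct stage *s, struct tri *t)
{
   for (int i = 0; i < 3; i++)
      t->primid[i] = (float)s->prim_counter;
   s->prim_counter++;
   s->next->tri(s->next, t);
}

/* Terminal stage: stands in for vbuf/rasterization. */
static void sink_tri(struct stage *s, struct tri *t)
{
   assert(t->primid[0] == t->primid[2]);
   s->tris_seen++;
}
```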
Re: [Mesa-dev] [PATCH 8/8] draw: implement proper primitive assembler as a pipeline stage
Yea, it's quite bonkers, but that's the way it has to be to make it work right now. Personally I'd really like to write a new version of draw, without the 5 emit paths, 4 different vertex shading paths, with interface that is capable of emitting more than just float[4]'s... For now though this works, even if it is very ugly. z - Original Message - Am 02.08.2013 08:28, schrieb Zack Rusin: we used to have a face primitive assembler that we ran after if the gs was missing but we had adjacency primitives in the pipeline, lets convert it to a pipeline stage, which allows us to use it to inject outputs (primitive id) into the vertices. it's also a lot cleaner because the decomposition is already handled for us. Signed-off-by: Zack Rusin za...@vmware.com --- src/gallium/auxiliary/Makefile.sources |2 +- src/gallium/auxiliary/draw/draw_context.c |1 + src/gallium/auxiliary/draw/draw_pipe.c |4 + src/gallium/auxiliary/draw/draw_pipe.h |5 + src/gallium/auxiliary/draw/draw_pipe_ia.c | 253 src/gallium/auxiliary/draw/draw_pipe_validate.c| 15 +- src/gallium/auxiliary/draw/draw_prim_assembler.c | 225 - src/gallium/auxiliary/draw/draw_prim_assembler.h | 62 - .../auxiliary/draw/draw_prim_assembler_tmp.h | 31 --- src/gallium/auxiliary/draw/draw_private.h |1 + .../auxiliary/draw/draw_pt_fetch_shade_pipeline.c | 18 +- .../draw/draw_pt_fetch_shade_pipeline_llvm.c | 18 +- 12 files changed, 283 insertions(+), 352 deletions(-) create mode 100644 src/gallium/auxiliary/draw/draw_pipe_ia.c delete mode 100644 src/gallium/auxiliary/draw/draw_prim_assembler.c delete mode 100644 src/gallium/auxiliary/draw/draw_prim_assembler.h delete mode 100644 src/gallium/auxiliary/draw/draw_prim_assembler_tmp.h diff --git a/src/gallium/auxiliary/Makefile.sources b/src/gallium/auxiliary/Makefile.sources index acbcef7..ee93e8b 100644 --- a/src/gallium/auxiliary/Makefile.sources +++ b/src/gallium/auxiliary/Makefile.sources @@ -13,6 +13,7 @@ C_SOURCES := \ draw/draw_pipe_clip.c \ draw/draw_pipe_cull.c \ 
draw/draw_pipe_flatshade.c \ +draw/draw_pipe_ia.c \ Formatting looks off here. draw/draw_pipe_offset.c \ draw/draw_pipe_pstipple.c \ draw/draw_pipe_stipple.c \ @@ -23,7 +24,6 @@ C_SOURCES := \ draw/draw_pipe_vbuf.c \ draw/draw_pipe_wide_line.c \ draw/draw_pipe_wide_point.c \ - draw/draw_prim_assembler.c \ draw/draw_pt.c \ draw/draw_pt_emit.c \ draw/draw_pt_fetch.c \ diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c index 8bf3596..bbb2904 100644 --- a/src/gallium/auxiliary/draw/draw_context.c +++ b/src/gallium/auxiliary/draw/draw_context.c @@ -555,6 +555,7 @@ draw_get_shader_info(const struct draw_context *draw) void draw_prepare_shader_outputs(struct draw_context *draw) { + draw_ia_prepare_outputs(draw, draw-pipeline.ia); draw_unfilled_prepare_outputs(draw, draw-pipeline.unfilled); } diff --git a/src/gallium/auxiliary/draw/draw_pipe.c b/src/gallium/auxiliary/draw/draw_pipe.c index f1ee6cb..8140299 100644 --- a/src/gallium/auxiliary/draw/draw_pipe.c +++ b/src/gallium/auxiliary/draw/draw_pipe.c @@ -49,6 +49,7 @@ boolean draw_pipeline_init( struct draw_context *draw ) draw-pipeline.clip = draw_clip_stage( draw ); draw-pipeline.flatshade = draw_flatshade_stage( draw ); draw-pipeline.cull = draw_cull_stage( draw ); + draw-pipeline.ia= draw_ia_stage( draw ); draw-pipeline.validate = draw_validate_stage( draw ); draw-pipeline.first = draw-pipeline.validate; @@ -61,6 +62,7 @@ boolean draw_pipeline_init( struct draw_context *draw ) !draw-pipeline.clip || !draw-pipeline.flatshade || !draw-pipeline.cull || + !draw-pipeline.ia || !draw-pipeline.validate) return FALSE; @@ -95,6 +97,8 @@ void draw_pipeline_destroy( struct draw_context *draw ) draw-pipeline.flatshade-destroy( draw-pipeline.flatshade ); if (draw-pipeline.cull) draw-pipeline.cull-destroy( draw-pipeline.cull ); + if (draw-pipeline.ia) + draw-pipeline.ia-destroy( draw-pipeline.ia ); if (draw-pipeline.validate) draw-pipeline.validate-destroy( draw-pipeline.validate ); 
if (draw-pipeline.aaline) diff --git a/src/gallium/auxiliary/draw/draw_pipe.h b/src/gallium/auxiliary/draw/draw_pipe.h index 70c286f..70822a4 100644 --- a/src/gallium/auxiliary/draw/draw_pipe.h +++ b/src/gallium/auxiliary/draw/draw_pipe.h @@ -91,7 +91,10 @@ extern struct draw_stage *draw_stipple_stage( struct draw_context *context ); extern struct draw_stage *draw_wide_line_stage( struct draw_context
[Mesa-dev] [PATCH 1/2] llvmpipe: make the front-face behavior match the gallium spec
The spec says that front-face is true if the value is > 0 and false if
it's < 0. To make sure that we follow the spec, let's just subtract 0.5
from our value (llvmpipe did 1 for frontface and 0 otherwise), which will
get us a positive number for frontface and a negative one for backface.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/drivers/llvmpipe/lp_state_setup.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_state_setup.c b/src/gallium/drivers/llvmpipe/lp_state_setup.c
index bb5cfc4..cecfbce 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_setup.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_setup.c
@@ -182,7 +182,10 @@ emit_facing_coef(struct gallivm_state *gallivm,
    LLVMValueRef a0_0 = args->facing;
    LLVMValueRef a0_0f = LLVMBuildSIToFP(builder, a0_0, float_type, "");
    LLVMValueRef zero = lp_build_const_float(gallivm, 0.0);
-   LLVMValueRef a0 = vec4f(gallivm, a0_0f, zero, zero, zero, "facing");
+   LLVMValueRef face_val = LLVMBuildFSub(builder, a0_0f,
+                                         lp_build_const_float(gallivm, 0.5),
+                                         "");
+   LLVMValueRef a0 = vec4f(gallivm, face_val, zero, zero, zero, "facing");
    LLVMValueRef zerovec = vec4f_from_scalar(gallivm, zero, "zero");
    store_coef(gallivm, args, slot, a0, zerovec, zerovec);
--
1.7.10.4
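The arithmetic of this change: llvmpipe's facing input is 1 for front-facing and 0 for back-facing primitives, and subtracting 0.5 maps these to +0.5 and -0.5, so the sign alone carries the meaning (positive for frontface, negative for backface). A scalar C sketch of what the generated IR computes (SIToFP followed by FSub); the function names are illustrative:

```c
#include <assert.h>

/* facing is 1 for front-facing and 0 for back-facing primitives. */
static float face_value(int facing)
{
   return (float)facing - 0.5f;
}

/* Consumer side: front-facing iff the value is positive. */
static int is_front_facing(float face)
{
   return face > 0.0f;
}
```

Keeping the magnitude away from zero also means a consumer never has to decide which side an exact 0.0 belongs to.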
[Mesa-dev] [PATCH 2/2] draw: inject frontface info into wireframe outputs
The draw module can decompose primitives into wireframe models, which is a fancy word for 'lines'. Unfortunately, that decomposition means we weren't able to preserve the original front-face info, which could be derived from the original primitives (lines don't have a 'face'). To fix it, allow the draw module to inject a fake face semantic into the outputs, from which the backends can figure out the original front-facing info of the primitives.

Signed-off-by: Zack Rusin <za...@vmware.com>
---
 src/gallium/auxiliary/draw/draw_context.c       | 43
 src/gallium/auxiliary/draw/draw_context.h       |  6
 src/gallium/auxiliary/draw/draw_pipe.h          |  3
 src/gallium/auxiliary/draw/draw_pipe_unfilled.c | 49
 src/gallium/drivers/i915/i915_state_derived.c   |  2
 src/gallium/drivers/llvmpipe/lp_context.h       |  3
 src/gallium/drivers/llvmpipe/lp_setup.c         |  1
 src/gallium/drivers/llvmpipe/lp_setup_context.h |  1
 src/gallium/drivers/llvmpipe/lp_setup_line.c    | 14
 src/gallium/drivers/llvmpipe/lp_state_derived.c |  9
 src/gallium/drivers/r300/r300_state_derived.c   |  1
 src/gallium/drivers/softpipe/sp_state_derived.c |  2
 src/gallium/drivers/svga/svga_swtnl_state.c     |  1
 13 files changed, 133 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c
index 4a6ba1a..2e95b5c 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -39,6 +39,7 @@
 #include "util/u_helpers.h"
 #include "util/u_prim.h"
 #include "draw_context.h"
+#include "draw_pipe.h"
 #include "draw_vs.h"
 #include "draw_gs.h"
 
@@ -540,6 +541,22 @@ draw_get_shader_info(const struct draw_context *draw)
    }
 }
 
+/**
+ * Prepare outputs slots from the draw module
+ *
+ * Certain parts of the draw module can emit additional
+ * outputs that can be quite useful to the backends, a good
+ * example of it is the process of decomposing primitives
+ * into wireframes (aka. lines) which normally would lose
+ * the face-side information, but using this method we can
+ * inject another shader output which passes the original
+ * face side information to the backend.
+ */
+void
+draw_prepare_shader_outputs(struct draw_context *draw)
+{
+   draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
+}
 
 /**
  * Ask the draw module for the location/slot of the given vertex attribute in
@@ -973,3 +990,29 @@ draw_stats_clipper_primitives(struct draw_context *draw,
       }
    }
 }
+
+
+/**
+ * Returns true if the draw module will inject the frontface
+ * info into the outputs.
+ *
+ * Given the specified primitive and rasterizer state
+ * the function will figure out if the draw module
+ * will inject the front-face information into shader
+ * outputs. This is done to preserve the front-facing
+ * info when decomposing primitives into wireframes.
+ */
+boolean
+draw_will_inject_frontface(const struct draw_context *draw)
+{
+   unsigned reduced_prim = u_reduced_prim(draw->pt.prim);
+   const struct pipe_rasterizer_state *rast = draw->rasterizer;
+
+   if (reduced_prim != PIPE_PRIM_TRIANGLES) {
+      return FALSE;
+   }
+
+   return (rast &&
+           (rast->fill_front != PIPE_POLYGON_MODE_FILL ||
+            rast->fill_back != PIPE_POLYGON_MODE_FILL));
+}
diff --git a/src/gallium/auxiliary/draw/draw_context.h b/src/gallium/auxiliary/draw/draw_context.h
index 4a1b27e..0815047 100644
--- a/src/gallium/auxiliary/draw/draw_context.h
+++ b/src/gallium/auxiliary/draw/draw_context.h
@@ -126,10 +126,16 @@ draw_install_pstipple_stage(struct draw_context *draw, struct pipe_context *pipe
 struct tgsi_shader_info *
 draw_get_shader_info(const struct draw_context *draw);
 
+void
+draw_prepare_shader_outputs(struct draw_context *draw);
+
 int
 draw_find_shader_output(const struct draw_context *draw,
                         uint semantic_name, uint semantic_index);
 
+boolean
+draw_will_inject_frontface(const struct draw_context *draw);
+
 uint
 draw_num_shader_outputs(const struct draw_context *draw);
 
diff --git a/src/gallium/auxiliary/draw/draw_pipe.h b/src/gallium/auxiliary/draw/draw_pipe.h
index 4792507..2e48b56 100644
--- a/src/gallium/auxiliary/draw/draw_pipe.h
+++ b/src/gallium/auxiliary/draw/draw_pipe.h
@@ -102,6 +102,9 @@ void draw_pipe_passthrough_line(struct draw_stage *stage, struct prim_header *he
 void draw_pipe_passthrough_point(struct draw_stage *stage, struct prim_header *header);
 
 
+void draw_unfilled_prepare_outputs(struct draw_context *context,
+                                   struct draw_stage *stage);
+
 
 /**
  * Get a writeable copy of a vertex.
diff --git a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
index d87741b..d8a603f 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_unfilled.c
@@ -47,6 +47,8 @@ struct
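To make the idea concrete, here is a standalone sketch of the two pieces the patch relies on: the decision `draw_will_inject_frontface()` makes (only triangles, and only when a non-fill polygon mode is active), and the reason injection is needed at all, namely that facing must be computed from the whole triangle before it is split into edges. All types and names below are hypothetical stand-ins rather than the real gallium enums and structs, and the winding test assumes a y-up coordinate system; the real draw module works in window coordinates and has its own sign convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real PIPE_PRIM_* / PIPE_POLYGON_MODE_*. */
enum reduced_prim { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES };
enum polygon_mode { MODE_FILL, MODE_LINE, MODE_POINT };

struct rast_state {
   enum polygon_mode fill_front;
   enum polygon_mode fill_back;
   bool front_ccw;
};

/* Same shape as draw_will_inject_frontface(): only triangles can be
 * decomposed, and only when either face uses a non-fill mode. */
static bool will_inject_frontface(enum reduced_prim prim,
                                  const struct rast_state *rast)
{
   if (prim != PRIM_TRIANGLES)
      return false;
   return rast != NULL &&
          (rast->fill_front != MODE_FILL || rast->fill_back != MODE_FILL);
}

struct vec2 { float x, y; };

/* Twice the signed area; positive for counter-clockwise winding
 * (assuming y points up). */
static float signed_area2(struct vec2 a, struct vec2 b, struct vec2 c)
{
   return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

struct line { struct vec2 v0, v1; bool front; };

/* Decompose a triangle into its three edges. Facing is computed from
 * the triangle first, because a single line has no face, and is then
 * copied onto every edge: the information the patch carries in an
 * extra shader output. */
static void triangle_to_lines(struct vec2 a, struct vec2 b, struct vec2 c,
                              bool front_ccw, struct line out[3])
{
   assert(out != NULL);
   bool ccw = signed_area2(a, b, c) > 0.0f;
   bool front = (ccw == front_ccw);
   out[0] = (struct line){ a, b, front };
   out[1] = (struct line){ b, c, front };
   out[2] = (struct line){ c, a, front };
}
```

Swapping two vertices of a triangle flips the sign of the area, so the same edges come out tagged as back-facing.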
Re: [Mesa-dev] [PATCH 2/2] draw: inject frontface info into wireframe outputs
> > +   if (draw_will_inject_frontface(lp_context->draw) &&
>
> I think it's annoying you have to do these calls to determine if there's a valid frontface here for each line instead of just per draw call, but it doesn't seem easy to avoid it.

Yea, there's no trivial way of avoiding it.

> Also, no love for llvmpipe point face? I realize d3d10 doesn't require it but OpenGL (and IIRC d3d9) do.

I didn't know of any tests for the points and we care only about lines right now. It's just four extra lines of code or so, so I can trivially add it, but I don't have anything to test it with.

> Looks like quite a heavy interface (and sort of silly to allocate 128 bits in the vertex data (so actually twice that for one line) for 1 bit of information, but given all our data passed on to the line/point funcs are float4 I don't really see any other easy way either), but seems all necessary unfortunately.
> I guess another option would be to pass the face info always along the vertex data no matter what (which would mean all those additional calls for setting up outputs, determining if there's a valid frontface etc. could go along with the storage needed) for all primitives to the point/line/tri funcs, but I'm not really thrilled about that idea either (passing it for tris so it doesn't have to be recalculated may or may not be a good idea either).

Yes, plus then we'd need a brand new pipeline stage that is always run and that is largely useless for the vast majority of rendering. It's sort of a lose-lose scenario. The only thing that is clear is that we have to pass the data along the shader outputs; everything else is messy glue to make it possible.

z
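The cost discussed above works out as follows: one extra float4 output slot is 4 × 32 = 128 bits per vertex, and a line has two vertices, so 256 bits travel with each line to carry a single bit of facing. A sketch of that arithmetic, plus how a line backend might read the injected value back. The vertex layout, `MAX_SLOTS`, the "positive component 0 means front-facing" encoding, and the "negative slot means nothing was injected" convention are all hypothetical assumptions for illustration, not taken from the patch:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vertex layout: every output slot is a float4, matching
 * the "all our data ... are float4" point above. MAX_SLOTS is an
 * arbitrary illustrative bound. */
#define MAX_SLOTS 8

struct vertex {
   float data[MAX_SLOTS][4];
};

/* Bits consumed by one extra float4 slot per vertex
 * (128 where float is 32 bits). */
static size_t face_slot_bits_per_vertex(void)
{
   return sizeof(float[4]) * CHAR_BIT;
}

/* A line carries two vertices, so twice that. */
static size_t face_slot_bits_per_line(void)
{
   return 2 * face_slot_bits_per_vertex();
}

/* Hypothetical backend read. Assumes a positive component 0 in
 * face_slot marks a front-facing source triangle, and that a negative
 * face_slot means no face was injected, in which case the caller's
 * default is used (for example, treating plain lines as front-facing). */
static bool line_is_front(const struct vertex *v0, int face_slot,
                          bool default_front)
{
   assert(v0 != NULL);
   if (face_slot < 0)
      return default_front;
   assert(face_slot < MAX_SLOTS);
   return v0->data[face_slot][0] > 0.0f;
}
```

Reading only the first vertex is enough under this assumption, since the decomposition stage writes the same facing to every vertex of a triangle.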