Re: [Mesa-dev] [glsl] indvar in ir_loop
Ian,

I am sure I have run into trouble with the following code.

(function main
  (signature void
    (parameters
    )
    (
      (loop ((declare () int i@0x8d19434)) ((constant int (0)) ) ((constant int (32)) ) ((constant int (1)) ) (
        (call foo ((var_ref sampler2d@0x8eef134) (var_ref myTexCoord@0x8eef05c) ))
      ))

The loop is generated by hand, using the following code:

    ir_loop *loop = new(ctx) ir_loop();
    ir_variable *indvar = new(ctx) ir_variable(glsl_type::int_type, "i", ir_var_auto);
    ir_dereference *idx = new(ctx) ir_dereference_variable(indvar);
    loop->from = new(ctx) ir_constant(0);
    loop->to = new(ctx) ir_constant(32);
    loop->increment = new(ctx) ir_constant(1);
    loop->cmp = ir_binop_less;
    loop->counter = indvar;
    loop->body_instructions = sig->body;
    sig->body.make_empty();

call_link_visitor (link_function.cpp) cannot see the variable i@0x8d19434. That is because call_link_visitor extends ir_hierarchical_visitor, and ir_loop::accept(ir_hierarchical_visitor *v) doesn't look at ir_loop::counter. That is to say, it assumes the indvar is declared outside the loop construct, right?

Perhaps my usage is wrong. With my changeset it works like a breeze.

index be8b36a..4e4dd4c 100644
--- a/src/glsl/ir_hv_accept.cpp
+++ b/src/glsl/ir_hv_accept.cpp
@@ -71,6 +71,7 @@ ir_loop::accept(ir_hierarchical_visitor *v)
    if (s != visit_continue)
       return (s == visit_continue_with_parent) ? visit_continue : s;
 
+   if (this->counter) s = this->counter->accept(v);
    s = visit_list_elements(v, this->body_instructions);
    if (s == visit_stop)
       return s;

thanks,
--lx

On 10/12/2013 05:39 AM, Ian Romanick wrote:
> On 10/10/2013 11:14 PM, Liu Xin wrote:
>> Hi, Mesa developers,
>>
>> According to glsl v1.0, we have the loop construct:
>>
>>     for (for-init-statement; condition(opt); expression)
>>         statement-no-new-scope
>>
>> Variables declared in for-init-statement or condition are only in scope until the end of the statement-no-new-scope of the for loop.
>>
>> Let's assume I have a fragment shader:
>>
>> ~/testbed$ cat indvar.frag
>> void main(void)
>> {
>>     vec4 a[32];
>>     for (int i = 0; i < 10; ++i) {
>>         if (i == 9)
>>             gl_FragColor = a[i];
>>     }
>> }
>>
>> I found that the current glsl compiler emits HIR like this:
>
> The HIR loses all notions of scope.
>
>> (function main
>>   (signature void
>>     (parameters
>>     )
>>     (
>>       (declare () int i@0x988eb84)
>>       (declare () (array vec4 32) a@0x988ec5c)
>>       (declare (temporary ) int assignment_tmp@0x988eaac)
>>       (assign (constant bool (1)) (x) (var_ref assignment_tmp@0x988eaac) (constant int (0)) )
>>       (assign (constant bool (1)) (x) (var_ref i@0x988eb84) (var_ref assignment_tmp@0x988eaac) )
>>       (loop () () () () (
>>         (if (expression bool ! (expression bool < (var_ref i@0x988eb84) (constant int (10)) ) ) (
>>           break
>>         )
>>         ())
>>
>>         (if (expression bool all_equal (var_ref i@0x988eb84) (constant int (9)) ) (
>>           (declare (temporary ) vec4 assignment_tmp@0x987cee4)
>>           (assign (constant bool (1)) (xyzw) (var_ref assignment_tmp@0x987cee4) (array_ref (var_ref a@0x988ec5c) (var_ref i@0x988eb84) ) )
>>           (assign (constant bool (1)) (xyzw) (var_ref gl_FragColor@0x96d8fc4) (var_ref assignment_tmp@0x987cee4) )
>>         )
>>         ())
>>
>>         (declare (temporary ) int assignment_tmp@0x987cb84)
>>         (assign (constant bool (1)) (x) (var_ref assignment_tmp@0x987cb84) (expression int + (var_ref i@0x988eb84) (constant int (1)) ) )
>>         (assign (constant bool (1)) (x) (var_ref i@0x988eb84) (var_ref assignment_tmp@0x987cb84) )
>>       ))
>>     ))
>> )
>>
>> I think the glsl compiler translates the AST like this:
>>
>> int i = 0;
>> for (;;) {
>>     if (!(i < 10)) break;
>>     if (i == 9) gl_FragColor = a[i];
>>     i = i + 1;
>> }
>>
>> Is it correct?
>
> I believe this block is implicitly surrounded by { and }. I'm pretty sure that we have test cases for the situation you're describing, but I'd have to go dig around.
>
>> Another question: for class ir_loop, why is ir_loop::counter an ir_variable while from/to/increment are all ir_rvalue? I create an ir_variable for the ir_loop counter, but the hierarchical visitor won't access it. I don't think ir_loop::accept(ir_hierarchical_visitor *v) will visit ir_loop::counter at all.
>
> ir_loop::counter is the variable that holds the loop counter. ir_loop::from is the initial value of the counter, ir_loop::to is the end value, and ir_loop::increment is the value that ::counter is modified by on each iteration.
>
>> thanks,
>> --lx

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
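
To make the visitor question in this thread concrete, here is a minimal sketch of the difference between a traversal that walks only the loop body and one that also visits the counter first, as the one-line change above proposes. ToyVisitor, ToyLoop and both accept functions are hypothetical stand-ins written for illustration; they are not Mesa's ir_hierarchical_visitor or ir_loop.

```cpp
#include <string>
#include <vector>

// Hypothetical visitor: records the name of every node the traversal reaches.
struct ToyVisitor {
    std::vector<std::string> seen;
    void visit(const std::string &name) { seen.push_back(name); }
};

// Hypothetical loop node. `counter` stands in for ir_loop::counter
// (empty string means "no counter"), `body` for ir_loop::body_instructions.
struct ToyLoop {
    std::string counter;
    std::vector<std::string> body;

    // Behaviour described in the thread: only the body is walked,
    // so the counter declaration is never seen by the visitor.
    void accept_body_only(ToyVisitor &v) const {
        for (const std::string &inst : body)
            v.visit(inst);
    }

    // Behaviour after the proposed change: visit the counter, if any,
    // before walking the body.
    void accept_with_counter(ToyVisitor &v) const {
        if (!counter.empty())
            v.visit(counter);
        accept_body_only(v);
    }
};
```

With a loop whose counter is "i" and whose body holds a single call, the first traversal records one node and the second records two, with "i" first. Whether the counter should be visited by the loop node at all, or declared outside it, is the open design question in the thread; the sketch only shows the observable difference.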
Re: [Mesa-dev] [PATCH 00/18] Implement GLX_MESA_query_renderer
> Do either of you guys plan to implement support for this extension? The value to developers is obviously increased if more drivers support the extension. This extension was born from feedback that I received from people at FOSDEM and from various game developers at Game Developer Conference and elsewhere.
>
> I'd like to land this extension, and I haven't received any review. I know you guys are both pretty busy, so I don't expect detailed reviews. I would really appreciate a quick skim of the extension spec (patch 15) and an Acked-by or two.

Is there a test app or piglit set for this? I might try and fit in looking at this,

Dave.
Re: [Mesa-dev] [PATCH] R600: Make sure OQAP defs and uses happen in the same clause
This patch should work for checking that no OQAP is used before being queued, assuming that a value in OQAP is consumed and cannot be read twice. However, I'm not sure I cover all LDS instructions that queue a value; I only use LDS_RET_READ in the switch case.

Vincent

----- Original message -----
> From: Tom Stellard t...@stellard.net
> To: Vincent Lejeune v...@ovi.com
> Cc: llvm-comm...@cs.uiuc.edu llvm-comm...@cs.uiuc.edu; mesa-dev@lists.freedesktop.org mesa-dev@lists.freedesktop.org; Tom Stellard thomas.stell...@amd.com
> Sent: Tuesday, 22 October 2013, 23:20
> Subject: Re: [PATCH] R600: Make sure OQAP defs and uses happen in the same clause
>
> Hi Vincent,
>
> Here is an updated patch. I wasn't sure where to put the assertion to check that UnscheduledNoLiveOut{Defs,Uses} is empty when switching to a new clause. I tried adding it to R600SchedStrategy::schedNode() behind the if (NextInstKind != CurInstKind) condition, but it always failed. Any suggestions on where I should put it?
>
> -Tom
>
> On Mon, Oct 21, 2013 at 12:40:28PM -0700, Vincent Lejeune wrote:
>> ----- Original message -----
>>> From: Tom Stellard t...@stellard.net
>>> To: llvm-comm...@cs.uiuc.edu
>>> Cc: mesa-dev@lists.freedesktop.org; Tom Stellard thomas.stell...@amd.com
>>> Sent: Friday, 11 October 2013, 20:10
>>> Subject: [PATCH] R600: Make sure OQAP defs and uses happen in the same clause
>>>
>>> From: Tom Stellard thomas.stell...@amd.com
>>>
>>> Reading the special OQAP register pops the top value off the LDS input queue and returns it to the instruction. This queue is invalidated at the end of an ALU clause and leaving values in the queue can lead to GPU hangs. This means that if we load a value into the queue, we must use it before the end of the clause.
>>>
>>> This fixes some hangs in the OpenCV test suite.
>>> ---
>>>  lib/Target/R600/R600MachineScheduler.cpp | 25 +++++++++++++------------
>>>  lib/Target/R600/R600MachineScheduler.h   |  4 ++--
>>>  test/CodeGen/R600/lds-input-queue.ll     | 26 ++++++++++++++++++++++++++
>>>  3 files changed, 41 insertions(+), 14 deletions(-)
>>>  create mode 100644 test/CodeGen/R600/lds-input-queue.ll
>>>
>>> diff --git a/lib/Target/R600/R600MachineScheduler.cpp b/lib/Target/R600/R600MachineScheduler.cpp
>>> index 6c26d9e..611b7f4 100644
>>> --- a/lib/Target/R600/R600MachineScheduler.cpp
>>> +++ b/lib/Target/R600/R600MachineScheduler.cpp
>>> @@ -93,11 +93,12 @@ SUnit* R600SchedStrategy::pickNode(bool IsTopNode) {
>>>    }
>>>
>>> -  // We want to scheduled AR defs as soon as possible to make sure they aren't
>>> -  // put in a different ALU clause from their uses.
>>> -  if (!SU && !UnscheduledARDefs.empty()) {
>>> -    SU = UnscheduledARDefs[0];
>>> -    UnscheduledARDefs.erase(UnscheduledARDefs.begin());
>>> +  // We want to scheduled defs that cannot be live outside of this clause
>>> +  // as soon as possible to make sure they aren't put in a different
>>> +  // ALU clause from their uses.
>>> +  if (!SU && !UnscheduledNoLiveOutDefs.empty()) {
>>> +    SU = UnscheduledNoLiveOutDefs[0];
>>> +    UnscheduledNoLiveOutDefs.erase(UnscheduledNoLiveOutDefs.begin());
>>>      NextInstKind = IDAlu;
>>>    }
>>>
>>> @@ -132,9 +133,9 @@ SUnit* R600SchedStrategy::pickNode(bool IsTopNode) {
>>>
>>>    // We want to schedule the AR uses as late as possible to make sure that
>>>    // the AR defs have been released.
>>> -  if (!SU && !UnscheduledARUses.empty()) {
>>> -    SU = UnscheduledARUses[0];
>>> -    UnscheduledARUses.erase(UnscheduledARUses.begin());
>>> +  if (!SU && !UnscheduledNoLiveOutUses.empty()) {
>>> +    SU = UnscheduledNoLiveOutUses[0];
>>> +    UnscheduledNoLiveOutUses.erase(UnscheduledNoLiveOutUses.begin());
>>
>> Can we use std::queue<SUnit *> instead of a std::vector for UnscheduledNoLiveOutUses? I had to use a vector because I needed to be able to pop a non-topmost SUnit in some cases (to fit the Instruction Group const read limitation), but I would rather avoid the erase(iterator) call when possible.
>>
>>>      NextInstKind = IDAlu;
>>>    }
>>>
>>> @@ -217,15 +218,15 @@ void R600SchedStrategy::releaseBottomNode(SUnit *SU) {
>>>
>>>    int IK = getInstKind(SU);
>>>
>>> -  // Check for AR register defines
>>> +  // Check for registers that do not live across ALU clauses.
>>>    for (MachineInstr::const_mop_iterator I = SU->getInstr()->operands_begin(),
>>>                                          E = SU->getInstr()->operands_end();
>>>         I != E; ++I) {
>>> -    if (I->isReg() && I->getReg() == AMDGPU::AR_X) {
>>> +    if (I->isReg() && (I->getReg() == AMDGPU::AR_X || I->getReg() == AMDGPU::OQAP)) {
>>>        if (I->isDef()) {
>>> -        UnscheduledARDefs.push_back(SU);
>>> +        UnscheduledNoLiveOutDefs.push_back(SU);
>>>        } else {
>>> -        UnscheduledARUses.push_back(SU);
>>> +        UnscheduledNoLiveOutUses.push_back(SU);
>>>        }
>>>        return;
>>>      }
>>> diff --git
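
The constraint this patch enforces (every value queued for OQAP must be read before the ALU clause ends, and each queued value is consumed by exactly one read) can be sketched as a small checker over an instruction stream. This is a toy model under the assumptions stated in the thread, not scheduler code: the Op enum and clause_discipline_ok are hypothetical names, and the end of the stream is treated as the end of the last clause.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds for the toy model.
enum class Op {
    QueueValue,  // an LDS read that pushes a value onto the input queue
    ReadOQAP,    // an instruction that pops one value by reading OQAP
    EndClause    // ALU clause boundary: the queue is invalidated here
};

// Returns true when no read happens on an empty queue and no value is
// still queued when a clause (or the stream) ends.
bool clause_discipline_ok(const std::vector<Op> &ops) {
    std::size_t pending = 0;
    for (Op op : ops) {
        switch (op) {
        case Op::QueueValue:
            ++pending;
            break;
        case Op::ReadOQAP:
            if (pending == 0)
                return false;  // read before anything was queued
            --pending;
            break;
        case Op::EndClause:
            if (pending != 0)
                return false;  // value left in the queue at clause end
            break;
        }
    }
    return pending == 0;
}
```

A stream that queues, reads, then ends the clause passes; one that queues, ends the clause, then reads fails at the clause boundary. An assertion of this shape is roughly what the thread is trying to place in the scheduler, although where it belongs there is left open above.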
Re: [Mesa-dev] [PATCH] i965: Make fs gl_PrimitiveID input work even when there's no gs.
On 23 October 2013 10:51, Eric Anholt e...@anholt.net wrote:
> Paul Berry stereotype...@gmail.com writes:
>> When a geometry shader is present, the fragment shader gl_PrimitiveID input acts like an ordinary varying, receiving data from the gs gl_PrimitiveID output. When there's no geometry shader, we have to ask the fixed function SF hardware to provide the primitive ID to the fragment shader instead.
>>
>> Previously, the SF setup code would handle this situation by recognizing that the FS gl_PrimitiveID input didn't match to any VS output; since normally an FS input with no corresponding VS output leads to undefined data, the SF setup code used to just arbitrarily assign it to receive data from attribute 0.
>>
>> This patch changes the SF setup code so that instead of arbitrarily using attribute 0, it assigns the unmatched FS input to receive gl_PrimitiveID. In the case where the FS input really is gl_PrimitiveID, this produces the intended result. In all other cases, no harm is done since GL specifies that the behaviour is undefined.
>>
>> Fixes piglit test primitive-id-no-gs.
>
> Reviewed-by: Eric Anholt e...@anholt.net

I was about to push this when I realized that it regressed point sprite functionality. It seems that if an attribute has its component override bits set *and* its point sprite texture coordinate enable bit set, the component override takes precedence (this isn't documented; I found it out by running piglit tests). As a result, this patch was causing gl_PointCoord to get overridden with gl_PrimitiveID.

I'll follow up shortly with a corrected patch.
[Mesa-dev] [PATCH v2] i965: Make fs gl_PrimitiveID input work even when there's no gs.
When a geometry shader is present, the fragment shader gl_PrimitiveID input acts like an ordinary varying, receiving data from the gs gl_PrimitiveID output. When there's no geometry shader, we have to ask the fixed function SF hardware to provide the primitive ID to the fragment shader instead. Previously, the SF setup code would handle this situation by recognizing that the FS gl_PrimitiveID input didn't match to any VS output; since normally an FS input with no corresponding VS output leads to undefined data, the SF setup code used to just arbitrarily assign it to receive data from attribute 0. This patch changes the SF setup code so that instead of arbitrarily using attribute 0, it assigns the unmatched FS input to receive gl_PrimitiveID. In the case where the FS input really is gl_PrimitiveID, this produces the intended result. In all other cases, no harm is done since GL specifies that the behaviour is undefined. Fixes piglit test primitive-id-no-gs. Reviewed-by: Eric Anholt e...@anholt.net v2: If an attribute is already being overridden with point coordinates, don't try to also override it with gl_PrimitiveID. This is necessary to avoid regressing piglit tests such as shaders/glsl-fs-pointcoord. 
---
 src/mesa/drivers/dri/i965/brw_defines.h   |  4 ++++
 src/mesa/drivers/dri/i965/gen6_sf_state.c | 27 ++++++++++++++++++++++-----
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 5ba9d45..b661194 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1508,6 +1508,10 @@ enum brw_message_target {
 # define ATTRIBUTE_0_OVERRIDE_Y          (1 << 13)
 # define ATTRIBUTE_0_OVERRIDE_X          (1 << 12)
 # define ATTRIBUTE_0_CONST_SOURCE_SHIFT  9
+#  define ATTRIBUTE_CONST_0000           0
+#  define ATTRIBUTE_CONST_0001_FLOAT     1
+#  define ATTRIBUTE_CONST_1111_FLOAT     2
+#  define ATTRIBUTE_CONST_PRIM_ID        3
 # define ATTRIBUTE_0_SWIZZLE_SHIFT       6
 # define ATTRIBUTE_0_SOURCE_SHIFT        0
 
diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index 6a9fa60..47d76e9 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -80,10 +80,23 @@ get_attr_override(const struct brw_vue_map *vue_map, int urb_entry_read_offset,
        * the vertex shader, so its value is undefined.  Therefore the
        * attribute override we supply doesn't matter.
        *
-       * In either case the attribute override we supply doesn't matter, so
-       * just reference the first available attribute.
+       * (c) This attribute is gl_PrimitiveID, and it wasn't written by the
+       * previous shader stage.
+       *
+       * Note that we don't have to worry about the cases where the attribute
+       * is gl_PointCoord or is undergoing point sprite coordinate
+       * replacement, because in those cases, this function isn't called.
+       *
+       * In case (c), we need to program the attribute overrides so that the
+       * primitive ID will be stored in this slot.  In every other case, the
+       * attribute override we supply doesn't matter.  So just go ahead and
+       * program primitive ID in every case.
        */
-      return 0;
+      return (ATTRIBUTE_0_OVERRIDE_W |
+              ATTRIBUTE_0_OVERRIDE_Z |
+              ATTRIBUTE_0_OVERRIDE_Y |
+              ATTRIBUTE_0_OVERRIDE_X |
+              (ATTRIBUTE_CONST_PRIM_ID << ATTRIBUTE_0_CONST_SOURCE_SHIFT));
    }
 
    /* Compute the location of the attribute relative to urb_entry_read_offset.
@@ -149,13 +162,17 @@ calculate_attr_overrides(const struct brw_context *brw,
          continue;
 
       /* _NEW_POINT */
+      bool point_sprite = false;
       if (brw->ctx.Point.PointSprite &&
           (attr >= VARYING_SLOT_TEX0 && attr <= VARYING_SLOT_TEX7) &&
           brw->ctx.Point.CoordReplace[attr - VARYING_SLOT_TEX0]) {
-         *point_sprite_enables |= (1 << input_index);
+         point_sprite = true;
       }
 
       if (attr == VARYING_SLOT_PNTC)
+         point_sprite = true;
+
+      if (point_sprite)
          *point_sprite_enables |= (1 << input_index);
 
       /* flat shading */
@@ -165,7 +182,7 @@ calculate_attr_overrides(const struct brw_context *brw,
          *flat_enables |= (1 << input_index);
 
       /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_LIGHT | _NEW_PROGRAM */
-      uint16_t attr_override =
+      uint16_t attr_override = point_sprite ? 0 :
          get_attr_override(&brw->vue_map_geom_out,
                            urb_entry_read_offset, attr,
                            brw->ctx.VertexProgram._TwoSideEnabled,
-- 
1.8.4.1
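
As a rough illustration of the override word this patch programs, the sketch below composes the same bit fields in isolation. The values for the Y and X override bits, the constant-source shift and the PRIM_ID selector follow the defines quoted in the patch; the W and Z bits are assumed to sit at bits 15 and 14, which the quoted hunk does not show. The function names are hypothetical, and the second function only models the v2 rule for an unmatched input.

```cpp
#include <cstdint>

// Bit positions: Y, X, the shift and PRIM_ID follow the patch text;
// W and Z at bits 15 and 14 are an assumption made for this sketch.
constexpr std::uint16_t OVERRIDE_W = 1u << 15;  // assumed
constexpr std::uint16_t OVERRIDE_Z = 1u << 14;  // assumed
constexpr std::uint16_t OVERRIDE_Y = 1u << 13;
constexpr std::uint16_t OVERRIDE_X = 1u << 12;
constexpr unsigned CONST_SOURCE_SHIFT = 9;
constexpr unsigned CONST_PRIM_ID = 3;

// Override word that replaces all four components with the primitive ID.
constexpr std::uint16_t prim_id_override() {
    return static_cast<std::uint16_t>(
        OVERRIDE_W | OVERRIDE_Z | OVERRIDE_Y | OVERRIDE_X |
        (CONST_PRIM_ID << CONST_SOURCE_SHIFT));
}

// v2 rule for an FS input with no matching output from the previous stage:
// leave the override at zero when the slot is already a point sprite
// coordinate, otherwise program the primitive ID.
constexpr std::uint16_t unmatched_input_override(bool point_sprite) {
    return point_sprite ? std::uint16_t{0} : prim_id_override();
}
```

Under these assumptions the composed word is 0xF600, and the point sprite case yields 0 so that the coordinate replacement is not clobbered, which is the regression v2 is meant to avoid.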
[Mesa-dev] [Bug 34495] Selecting objects in Blender 2.56 slow due the software gl_select mode
https://bugs.freedesktop.org/show_bug.cgi?id=34495

Alex Deucher ag...@yahoo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|dri-devel@lists.freedesktop |mesa-dev@lists.freedesktop.
                   |.org                        |org
            Summary|Selecting objects in        |Selecting objects in
                   |Blender 2.56 slow with      |Blender 2.56 slow due the
                   |gallium r600 driver         |software gl_select mode
          Component|Drivers/Gallium/r600        |Mesa core

--- Comment #73 from Alex Deucher ag...@yahoo.com ---
(In reply to comment #71)
> (In reply to comment #70)
> > I wonder if the fix has been committed? I am using Debian testing with the mesa 9.1-7 package provided in the testing repository, and the selections with Blender are still very slow.
>
> It has been merged to master in mesa 9.2.

It hasn't been merged yet.

--
You are receiving this mail because:
You are the assignee for the bug.
Re: [Mesa-dev] [PATCH] graw: add a test rendering a huge triangle
Looks good. A future improvement could be querying PIPE_CAP_MAX_TEXTURE_2D_LEVELS instead of using a constant width/height.

Jose

----- Original Message -----
> Used to test rasterization, because we often breakdown on subdivision of triangles with long edges.
>
> Signed-off-by: Zack Rusin za...@vmware.com
> ---
>  src/gallium/tests/graw/SConscript  | 1 +
>  src/gallium/tests/graw/tri-large.c | 173 +
>  2 files changed, 174 insertions(+)
>  create mode 100644 src/gallium/tests/graw/tri-large.c
>
> diff --git a/src/gallium/tests/graw/SConscript b/src/gallium/tests/graw/SConscript
> index 8740ff3..8723807 100644
> --- a/src/gallium/tests/graw/SConscript
> +++ b/src/gallium/tests/graw/SConscript
> @@ -29,6 +29,7 @@ progs = [
>      'tex-srgb',
>      'tex-swizzle',
>      'tri',
> +    'tri-large',
>      'tri-gs',
>      'tri-instanced',
>      'vs-test',
> diff --git a/src/gallium/tests/graw/tri-large.c b/src/gallium/tests/graw/tri-large.c
> new file mode 100644
> index 0000000..3fbbfb3
> --- /dev/null
> +++ b/src/gallium/tests/graw/tri-large.c
> @@ -0,0 +1,173 @@
> +/* Display a cleared blue window.  This demo has no dependencies on
> + * any utility code, just the graw interface and gallium.
> + */
> +
> +#include "graw_util.h"
> +#include "util/u_debug.h"
> +
> +#include <stdio.h>
> +
> +static struct graw_info info;
> +
> +static const int WIDTH = 4*2048;
> +static const int HEIGHT = 4*2048;
> +
> +
> +struct vertex {
> +   float position[4];
> +   float color[4];
> +};
> +
> +static boolean FlatShade = FALSE;
> +
> +
> +static struct vertex vertices[3] =
> +{
> +   {
> +      { -1.0f, -1.0f, 0.0f, 1.0f },
> +      { 1.0f, 0.0f, 0.0f, 1.0f }
> +   },
> +   {
> +      { -1.0f, 1.0f, 0.0f, 1.0f },
> +      { 0.0f, 1.0f, 0.0f, 1.0f }
> +   },
> +   {
> +      { 1.0f, 1.0f, 0.0f, 1.0f },
> +      { 0.0f, 0.0f, 1.0f, 1.0f }
> +   }
> +};
> +
> +
> +static void set_vertices( void )
> +{
> +   struct pipe_vertex_element ve[2];
> +   struct pipe_vertex_buffer vbuf;
> +   void *handle;
> +
> +   memset(ve, 0, sizeof ve);
> +
> +   ve[0].src_offset = Offset(struct vertex, position);
> +   ve[0].src_format = PIPE_FORMAT_R32G32B32A32_FLOAT;
> +   ve[1].src_offset = Offset(struct vertex, color);
> +   ve[1].src_format = PIPE_FORMAT_R32G32B32A32_FLOAT;
> +
> +   handle = info.ctx->create_vertex_elements_state(info.ctx, 2, ve);
> +   info.ctx->bind_vertex_elements_state(info.ctx, handle);
> +
> +   memset(&vbuf, 0, sizeof vbuf);
> +
> +   vbuf.stride = sizeof( struct vertex );
> +   vbuf.buffer_offset = 0;
> +   vbuf.buffer = pipe_buffer_create_with_data(info.ctx,
> +                                              PIPE_BIND_VERTEX_BUFFER,
> +                                              PIPE_USAGE_STATIC,
> +                                              sizeof(vertices),
> +                                              vertices);
> +
> +   info.ctx->set_vertex_buffers(info.ctx, 0, 1, &vbuf);
> +}
> +
> +
> +static void set_vertex_shader( void )
> +{
> +   void *handle;
> +   const char *text =
> +      "VERT\n"
> +      "DCL IN[0]\n"
> +      "DCL IN[1]\n"
> +      "DCL OUT[0], POSITION\n"
> +      "DCL OUT[1], COLOR\n"
> +      "  0: MOV OUT[1], IN[1]\n"
> +      "  1: MOV OUT[0], IN[0]\n"
> +      "  2: END\n";
> +
> +   handle = graw_parse_vertex_shader(info.ctx, text);
> +   info.ctx->bind_vs_state(info.ctx, handle);
> +}
> +
> +
> +static void set_fragment_shader( void )
> +{
> +   void *handle;
> +   const char *text =
> +      "FRAG\n"
> +      "DCL IN[0], COLOR, LINEAR\n"
> +      "DCL OUT[0], COLOR\n"
> +      "  0: MOV OUT[0], IN[0]\n"
> +      "  1: END\n";
> +
> +   handle = graw_parse_fragment_shader(info.ctx, text);
> +   info.ctx->bind_fs_state(info.ctx, handle);
> +}
> +
> +
> +static void draw( void )
> +{
> +   union pipe_color_union clear_color = { {1,0,1,1} };
> +
> +   info.ctx->clear(info.ctx, PIPE_CLEAR_COLOR, &clear_color, 0, 0);
> +   util_draw_arrays(info.ctx, PIPE_PRIM_TRIANGLES, 0, 3);
> +   info.ctx->flush(info.ctx, NULL, 0);
> +
> +   graw_save_surface_to_file(info.ctx, info.color_surf[0], NULL);
> +
> +   graw_util_flush_front(&info);
> +}
> +
> +
> +static void init( void )
> +{
> +   if (!graw_util_create_window(&info, WIDTH, HEIGHT, 1, FALSE))
> +      exit(1);
> +
> +   graw_util_default_state(&info, FALSE);
> +
> +   {
> +      struct pipe_rasterizer_state rasterizer;
> +      void *handle;
> +      memset(&rasterizer, 0, sizeof rasterizer);
> +      rasterizer.cull_face = PIPE_FACE_NONE;
> +      rasterizer.half_pixel_center = 1;
> +      rasterizer.bottom_edge_rule = 1;
> +      rasterizer.flatshade = FlatShade;
> +      rasterizer.depth_clip = 1;
> +      handle = info.ctx->create_rasterizer_state(info.ctx, &rasterizer);
> +      info.ctx->bind_rasterizer_state(info.ctx, handle);
> +   }
> +
> +
> +   graw_util_viewport(&info, 0, 0, WIDTH, HEIGHT, 30, 1000);
> +
> +   set_vertices();
> +   set_vertex_shader();
> +   set_fragment_shader();
> +}
> +
> +static void args(int argc, char *argv[])
> +{
> +   int i;
> +
> +   for (i = 1; i < argc; ) {
> +      if (graw_parse_args(&i, argc, argv)) {
> +         /* ok */
> +      }
> +      else if (strcmp(argv[i], "-f")
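
The review suggestion of deriving the window size from PIPE_CAP_MAX_TEXTURE_2D_LEVELS instead of a constant could look roughly like the helper below. It is a sketch that assumes the cap reports a count of mipmap levels, so the largest 2D dimension is 1 << (levels - 1); max_dimension_from_levels is a hypothetical name, and the actual query against the screen is deliberately left out.

```cpp
// Converts a mip level count into the largest texture dimension it implies.
// Assumes the level count includes the base level, so 1 level means 1 texel.
// Returns 0 for counts that cannot be represented in a 32-bit unsigned.
unsigned max_dimension_from_levels(unsigned levels) {
    if (levels == 0 || levels > 32)
        return 0;
    return 1u << (levels - 1);
}
```

Under that assumption a driver reporting 14 levels would give 8192, which happens to match the 4*2048 constant the test uses for WIDTH and HEIGHT.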
Re: [Mesa-dev] [PATCH] mesa: Update MESA_INFO to eliminate error
On 10/24/2013 01:13 PM, Courtney Goeltzenleuchter wrote:
> If a user sets MESA_INFO and the OpenGL application uses a 3.0 or later context, then the MESA_INFO debug output will have an error when it queries for extensions using the deprecated enum GL_EXTENSIONS. Passing the context argument allows the code to return the extension list directly regardless of profile.
>
> Commit title updated as recommended by Kenneth Graunke.
> ---

Reviewed-by: Brian Paul bri...@vmware.com
[Mesa-dev] [Bug 70864] New: classic drivers needlessly link to libdrm_intel / libdrm_nouveau / libdrm_radeon
https://bugs.freedesktop.org/show_bug.cgi?id=70864

          Priority: medium
            Bug ID: 70864
          Assignee: mesa-dev@lists.freedesktop.org
           Summary: classic drivers needlessly link to libdrm_intel / libdrm_nouveau / libdrm_radeon
          Severity: normal
    Classification: Unclassified
                OS: Linux (All)
          Reporter: fabio@libero.it
          Hardware: x86 (IA32)
            Status: NEW
           Version: git
         Component: Mesa core
           Product: Mesa

I noticed (using ldd, incidentally while testing the new mesa_dri_drivers.so) that the classic drivers (radeon, r200, i915, i965, nouveau_vieux) link to all three of libdrm_intel / libdrm_nouveau / libdrm_radeon, while only the matching one should be needed.

Gallium drivers (i915, r300, r600, radeonsi, nouveau) are OK, linking only to their own libdrm.
[Mesa-dev] [Bug 70864] classic drivers needlessly link to libdrm_intel / libdrm_nouveau / libdrm_radeon
https://bugs.freedesktop.org/show_bug.cgi?id=70864

--- Comment #1 from Fabio Pedretti fabio@libero.it ---
It looks like every classic driver (including swrast) now includes all the classic drivers; they are more or less a copy of mesa_dri_drivers.so.
[Mesa-dev] [Bug 69437] Composite Bypass no longer works
https://bugs.freedesktop.org/show_bug.cgi?id=69437

U. Artie Eoff ullysses.a.e...@intel.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |VERIFIED

--- Comment #8 from U. Artie Eoff ullysses.a.e...@intel.com ---
Verified fixed on both master and 9.2 branches... Thanks!
[Mesa-dev] [Bug 70864] classic drivers needlessly link to libdrm_intel / libdrm_nouveau / libdrm_radeon
https://bugs.freedesktop.org/show_bug.cgi?id=70864

Matt Turner matts...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |NOTABUG

--- Comment #2 from Matt Turner matts...@gmail.com ---
(In reply to comment #1)
> It looks every classic driver (including swrast) now includes all the classic drivers, they are more or less a copy of mesa_dri_drivers.so.

They are in fact exact copies -- they're hardlinks.

The point of mega drivers was to link all of the (classic) drivers into a single file.
[Mesa-dev] [Bug 70864] classic drivers needlessly link to libdrm_intel / libdrm_nouveau / libdrm_radeon
https://bugs.freedesktop.org/show_bug.cgi?id=70864

--- Comment #3 from Fabio Pedretti fabio@libero.it ---
Thanks, but it doesn't look like they are hardlinks anyway: the md5sums all differ, and their sizes are slightly different too.
[Mesa-dev] [Bug 34495] Selecting objects in Blender 2.56 slow due the software gl_select mode
https://bugs.freedesktop.org/show_bug.cgi?id=34495

--- Comment #74 from hapoofesg...@goingon.ir ---
(In reply to comment #73)
> (In reply to comment #71)
> > (In reply to comment #70)
> > > I wonder if the fix has been committed? I am using Debian testing with the mesa 9.1-7 package provided in the testing repository, and the selections with Blender are still very slow.
> >
> > It has been merged to master in mesa 9.2.
>
> It hasn't been merged yet.

So why don't I have any selection problems? I'm currently running Fedora 19 with mesa 9.2.1 and I don't see those slow selections anymore. BTW, sorry if I was/am wrong.
[Mesa-dev] [Bug 70864] classic drivers needlessly link to libdrm_intel / libdrm_nouveau / libdrm_radeon
https://bugs.freedesktop.org/show_bug.cgi?id=70864

--- Comment #4 from Johannes Obermayr johannesoberm...@gmx.de ---
(In reply to comment #2)
> The point of mega drivers was to link all of the (classic) drivers into a single file.

... to waste memory at runtime if you make use of packages provided by distributions which contain all classic drivers ...

I bet the solution of making only the required symbols PUBLIC in the former libdricore and libgallium isn't nearly as bad as you make it out to be.

I just want to mention that there was never a comparison to my patchset, which closes a lot of symbols for libdricore's replacement:
http://lists.freedesktop.org/archives/mesa-dev/2013-September/044593.html
[Mesa-dev] [PATCH] llvmpipe: fix bogus layer clamping in setup
From: Roland Scheidegger srol...@vmware.com

The layer coming from GS needs to be clamped (not sure if that's actually the correct error behavior but we need something) as the number can be higher than the amount of layers in the fb. However, this code was using the layer calculation from the scene, and this was actually calculated in lp_scene_begin_rasterization() hence too late (so setup was using the value from the _previous_ scene or just zero if it was the first scene).

Since the value is used in both rasterization and setup, move calculation up to lp_scene_begin_binning() though it's a bit more inconvenient to calculate there. (Theoretically could move _all_ code which was in lp_scene_begin_rasterization() to there, because ever since we got rid of swizzled render/depth buffers our map functions preparing the fb data for render don't actually change the data in there at all, but it feels like it would be a hack.)
---
 src/gallium/drivers/llvmpipe/lp_scene.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_scene.c b/src/gallium/drivers/llvmpipe/lp_scene.c
index 2abbd25..483bfa5 100644
--- a/src/gallium/drivers/llvmpipe/lp_scene.c
+++ b/src/gallium/drivers/llvmpipe/lp_scene.c
@@ -151,7 +151,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
 {
    const struct pipe_framebuffer_state *fb = &scene->fb;
    int i;
-   unsigned max_layer = ~0;
 
    //LP_DBG(DEBUG_RAST, "%s\n", __FUNCTION__);
 
@@ -162,7 +161,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
                                                           cbuf->u.tex.level);
          scene->cbufs[i].layer_stride = llvmpipe_layer_stride(cbuf->texture,
                                                               cbuf->u.tex.level);
-         max_layer = MIN2(max_layer, cbuf->u.tex.last_layer - cbuf->u.tex.first_layer);
 
          scene->cbufs[i].map = llvmpipe_resource_map(cbuf->texture,
                                                      cbuf->u.tex.level,
@@ -173,7 +171,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
          struct llvmpipe_resource *lpr = llvmpipe_resource(cbuf->texture);
          unsigned pixstride = util_format_get_blocksize(cbuf->format);
          scene->cbufs[i].stride = cbuf->texture->width0;
-         max_layer = 0;
 
          scene->cbufs[i].map = lpr->data;
          scene->cbufs[i].map += cbuf->u.buf.first_element * pixstride;
@@ -184,15 +181,12 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
       struct pipe_surface *zsbuf = scene->fb.zsbuf;
       scene->zsbuf.stride = llvmpipe_resource_stride(zsbuf->texture, zsbuf->u.tex.level);
       scene->zsbuf.layer_stride = llvmpipe_layer_stride(zsbuf->texture, zsbuf->u.tex.level);
-      max_layer = MIN2(max_layer, zsbuf->u.tex.last_layer - zsbuf->u.tex.first_layer);
 
       scene->zsbuf.map = llvmpipe_resource_map(zsbuf->texture,
                                                zsbuf->u.tex.level,
                                                zsbuf->u.tex.first_layer,
                                                LP_TEX_USAGE_READ_WRITE);
    }
-
-   scene->fb_max_layer = max_layer;
 }
 
 
@@ -506,6 +500,9 @@ end:
 void lp_scene_begin_binning( struct lp_scene *scene,
                              struct pipe_framebuffer_state *fb, boolean discard )
 {
+   int i;
+   unsigned max_layer = ~0;
+
    assert(lp_scene_is_empty(scene));
 
    scene->discard = discard;
@@ -513,9 +510,23 @@ void lp_scene_begin_binning( struct lp_scene *scene,
 
    scene->tiles_x = align(fb->width, TILE_SIZE) / TILE_SIZE;
    scene->tiles_y = align(fb->height, TILE_SIZE) / TILE_SIZE;
-
    assert(scene->tiles_x <= TILES_X);
    assert(scene->tiles_y <= TILES_Y);
+
+   for (i = 0; i < scene->fb.nr_cbufs; i++) {
+      struct pipe_surface *cbuf = scene->fb.cbufs[i];
+      if (llvmpipe_resource_is_texture(cbuf->texture)) {
+         max_layer = MIN2(max_layer, cbuf->u.tex.last_layer - cbuf->u.tex.first_layer);
+      }
+      else {
+         max_layer = 0;
+      }
+   }
+   if (fb->zsbuf) {
+      struct pipe_surface *zsbuf = scene->fb.zsbuf;
+      max_layer = MIN2(max_layer, zsbuf->u.tex.last_layer - zsbuf->u.tex.first_layer);
+   }
+   scene->fb_max_layer = max_layer;
 }
 
 
-- 
1.7.9.5
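
The calculation this patch moves can be read as "the highest usable layer index is the minimum layer span across all bound surfaces, and a buffer surface forces it to zero". A standalone sketch of that reduction, plus the clamp that setup applies to the layer coming from the geometry shader, might look like the code below. ToySurface, fb_max_layer and clamp_layer are hypothetical names for illustration; they are not llvmpipe's types or functions.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a bound surface.
struct ToySurface {
    bool is_texture;       // false models a buffer resource
    unsigned first_layer;
    unsigned last_layer;
};

// Highest layer index valid for every bound surface. Starts at ~0u so that
// an empty framebuffer imposes no limit, mirroring the patch's initial value.
unsigned fb_max_layer(const std::vector<ToySurface> &cbufs,
                      const ToySurface *zsbuf) {
    unsigned max_layer = ~0u;
    for (const ToySurface &s : cbufs) {
        if (s.is_texture)
            max_layer = std::min(max_layer, s.last_layer - s.first_layer);
        else
            max_layer = 0;  // buffer surfaces have a single layer
    }
    if (zsbuf)
        max_layer = std::min(max_layer, zsbuf->last_layer - zsbuf->first_layer);
    return max_layer;
}

// Clamp applied to the layer index produced by the geometry shader.
unsigned clamp_layer(unsigned gs_layer, unsigned max_layer) {
    return std::min(gs_layer, max_layer);
}
```

The ordering problem described in the commit message follows directly: if the clamp runs during setup but the reduction only runs later, at the start of rasterization, the clamp sees a stale or zero limit.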
[Mesa-dev] [Bug 70864] classic drivers needlessly link to libdrm_intel / libdrm_nouveau / libdrm_radeon
https://bugs.freedesktop.org/show_bug.cgi?id=70864

--- Comment #5 from Matt Turner matts...@gmail.com ---
(In reply to comment #3)
> Thanks but it doesn't look they are hardlinks anyway, the md5sum all differ, also their size is slightly different.

Strange, that's not what I see on my system:

mattst88@work-Thinkpad mesa % md5sum $(find -name '*_dri.so')
e918941fd19d4afaeb5834e90c5f88a4  ./lib/swrast_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./lib/r200_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./lib/i915_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./lib/i965_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./lib/radeon_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./install/usr/local/lib/dri/swrast_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./install/usr/local/lib/dri/r200_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./install/usr/local/lib/dri/i915_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./install/usr/local/lib/dri/i965_dri.so
e918941fd19d4afaeb5834e90c5f88a4  ./install/usr/local/lib/dri/radeon_dri.so
Re: [Mesa-dev] [PATCH v2 06/14] glsl: Add built-in functions and constants required for ARB_shader_atomic_counters.
Reviewed-by: Ian Romanick ian.d.roman...@intel.com

On 10/01/2013 07:15 PM, Francisco Jerez wrote:
> v2: Represent atomics as GLSL intrinsics.
> ---
>  src/glsl/builtin_functions.cpp  | 58 +
>  src/glsl/builtin_variables.cpp  | 15 +++
>  src/glsl/glcpp/glcpp-parse.y    | 3 +++
>  src/glsl/glsl_parser_extras.cpp | 6 +
>  src/glsl/glsl_parser_extras.h   | 7 +
>  5 files changed, 89 insertions(+)
>
> diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
> index 03efb6d..d704b84 100644
> --- a/src/glsl/builtin_functions.cpp
> +++ b/src/glsl/builtin_functions.cpp
> @@ -300,6 +300,13 @@ tex3d_lod(const _mesa_glsl_parse_state *state)
>  {
>     return tex3d(state) && lod_exists_in_stage(state);
>  }
> +
> +static bool
> +shader_atomic_counters(const _mesa_glsl_parse_state *state)
> +{
> +   return state->ARB_shader_atomic_counters_enable;
> +}
> +
>  /** @} */
>
>  /******************************************************************************/
> @@ -515,6 +522,11 @@ private:
>     B1(fma)
>     B2(ldexp)
>     B2(frexp)
> +
> +   ir_function_signature *_atomic_intrinsic(builtin_available_predicate avail);
> +   ir_function_signature *_atomic_op(const char *intrinsic,
> +                                     builtin_available_predicate avail);
> +
>  #undef B0
>  #undef B1
>  #undef B2
> @@ -621,6 +633,15 @@ builtin_builder::create_shader()
>  void
>  builtin_builder::create_intrinsics()
>  {
> +   add_function("__intrinsic_atomic_read",
> +                _atomic_intrinsic(shader_atomic_counters),
> +                NULL);
> +   add_function("__intrinsic_atomic_increment",
> +                _atomic_intrinsic(shader_atomic_counters),
> +                NULL);
> +   add_function("__intrinsic_atomic_predecrement",
> +                _atomic_intrinsic(shader_atomic_counters),
> +                NULL);
>  }
>
>  /**
> @@ -1856,6 +1877,20 @@ builtin_builder::create_builtins()
>                  _frexp(glsl_type::vec3_type, glsl_type::ivec3_type),
>                  _frexp(glsl_type::vec4_type, glsl_type::ivec4_type),
>                  NULL);
> +
> +   add_function("atomicCounter",
> +                _atomic_op("__intrinsic_atomic_read",
> +                           shader_atomic_counters),
> +                NULL);
> +   add_function("atomicCounterIncrement",
> +                _atomic_op("__intrinsic_atomic_increment",
> +                           shader_atomic_counters),
> +                NULL);
> +   add_function("atomicCounterDecrement",
> +                _atomic_op("__intrinsic_atomic_predecrement",
> +                           shader_atomic_counters),
> +                NULL);
> +
>  #undef F
>  #undef FI
>  #undef FIU
> @@ -3606,6 +3641,29 @@ builtin_builder::_frexp(const glsl_type *x_type, const glsl_type *exp_type)
>     return sig;
>  }
>
> +
> +ir_function_signature *
> +builtin_builder::_atomic_intrinsic(builtin_available_predicate avail)
> +{
> +   ir_variable *counter = in_var(glsl_type::atomic_uint_type, "counter");
> +   MAKE_INTRINSIC(glsl_type::uint_type, avail, 1, counter);
> +   return sig;
> +}
> +
> +ir_function_signature *
> +builtin_builder::_atomic_op(const char *intrinsic,
> +                            builtin_available_predicate avail)
> +{
> +   ir_variable *counter = in_var(glsl_type::atomic_uint_type, "atomic_counter");
> +   MAKE_SIG(glsl_type::uint_type, avail, 1, counter);
> +
> +   ir_variable *retval = body.make_temp(glsl_type::uint_type, "atomic_retval");
> +   body.emit(call(shader->symbols->get_function(intrinsic), retval, 1,
> +                  operand(counter)));
> +   body.emit(ret(retval));
> +   return sig;
> +}
> +
>  /** @} */
>
>  /******************************************************************************/
> diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp
> index 6a808c0..49f0f42 100644
> --- a/src/glsl/builtin_variables.cpp
> +++ b/src/glsl/builtin_variables.cpp
> @@ -555,6 +555,21 @@ builtin_variable_generator::generate_constants()
>         */
>        add_const("gl_MaxTextureCoords", state->Const.MaxTextureCoords);
>     }
> +
> +   if (state->ARB_shader_atomic_counters_enable) {
> +      add_const("gl_MaxVertexAtomicCounters",
> +                state->Const.MaxVertexAtomicCounters);
> +      add_const("gl_MaxGeometryAtomicCounters",
> +                state->Const.MaxGeometryAtomicCounters);
> +      add_const("gl_MaxFragmentAtomicCounters",
> +                state->Const.MaxFragmentAtomicCounters);
> +      add_const("gl_MaxCombinedAtomicCounters",
> +                state->Const.MaxCombinedAtomicCounters);
> +      add_const("gl_MaxAtomicCounterBindings",
> +                state->Const.MaxAtomicBufferBindings);
> +      add_const("gl_MaxTessControlAtomicCounters", 0);
> +      add_const("gl_MaxTessEvaluationAtomicCounters", 0);
> +   }
>  }
>
>
> diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y
> index 6eaa5f9..2b4e988 100644
> --- a/src/glsl/glcpp/glcpp-parse.y
> +++ b/src/glsl/glcpp/glcpp-parse.y
> @@ -1248,6 +1248,9 @@ glcpp_parser_create (const struct gl_extensions
Re: [Mesa-dev] [PATCH 02/14] glsl: Add type predicate to check whether a type contains any opaque types.
Reviewed-by: Ian Romanick ian.d.roman...@intel.com

On 10/01/2013 07:15 PM, Francisco Jerez wrote:
> And use it to forbid comparisons of opaque operands. According to
> the GL 4.2 specification:
>
> "Except for array indexing, structure member selection, and
>  parentheses, opaque variables are not allowed to be operands in
>  expressions."
> ---
>  src/glsl/ast_to_hir.cpp |  4 ++++
>  src/glsl/glsl_types.cpp | 18 ++++++++++++++++++
>  src/glsl/glsl_types.h   |  5 +++++
>  3 files changed, 27 insertions(+)
>
> diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
> index 99159dc..db59d0a 100644
> --- a/src/glsl/ast_to_hir.cpp
> +++ b/src/glsl/ast_to_hir.cpp
> @@ -1197,6 +1197,10 @@ ast_expression::hir(exec_list *instructions,
>                 !state->check_version(120, 300, &loc,
>                                       "array comparisons forbidden")) {
>           error_emitted = true;
> +      } else if ((op[0]->type->contains_opaque() ||
> +                  op[1]->type->contains_opaque())) {
> +         _mesa_glsl_error(&loc, state, "opaque type comparisons forbidden");
> +         error_emitted = true;
>        }
>
>        if (error_emitted) {
> diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
> index e1fe153..a9b7eb3 100644
> --- a/src/glsl/glsl_types.cpp
> +++ b/src/glsl/glsl_types.cpp
> @@ -162,6 +162,24 @@ glsl_type::contains_integer() const
>     }
>  }
>
> +bool
> +glsl_type::contains_opaque() const {
> +   switch (base_type) {
> +   case GLSL_TYPE_SAMPLER:
> +   case GLSL_TYPE_ATOMIC_UINT:
> +      return true;
> +   case GLSL_TYPE_ARRAY:
> +      return element_type()->contains_opaque();
> +   case GLSL_TYPE_STRUCT:
> +      for (unsigned int i = 0; i < length; i++) {
> +         if (fields.structure[i].type->contains_opaque())
> +            return true;
> +      }
> +      return false;
> +   default:
> +      return false;
> +   }
> +}
>
>  gl_texture_index
>  glsl_type::sampler_index() const
> diff --git a/src/glsl/glsl_types.h b/src/glsl/glsl_types.h
> index d00b9e7..133b0af 100644
> --- a/src/glsl/glsl_types.h
> +++ b/src/glsl/glsl_types.h
> @@ -463,6 +463,11 @@ struct glsl_type {
>     }
>
>     /**
> +    * Return whether a type contains any opaque types.
> +    */
> +   bool contains_opaque() const;
> +
> +   /**
>      * Query the full type of a matrix row
>      *
>      * \return

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
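
For readers who want to try the recursion in the patch above outside of Mesa, here is a minimal C sketch of the same idea: a type "contains an opaque type" if it is one, if its array element does, or if any struct member does. The struct sketch_type layout and every sketch_ name are assumptions made for illustration; they are not Mesa's glsl_type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for glsl_type (not Mesa's layout). */
enum sketch_base_type {
   SKETCH_FLOAT,
   SKETCH_SAMPLER,
   SKETCH_ATOMIC_UINT,
   SKETCH_ARRAY,
   SKETCH_STRUCT
};

struct sketch_type {
   enum sketch_base_type base;
   const struct sketch_type *element;        /* SKETCH_ARRAY: element type */
   const struct sketch_type *const *fields;  /* SKETCH_STRUCT: member types */
   size_t length;                            /* SKETCH_STRUCT: member count */
};

/* Returns true if t is opaque or (recursively) contains an opaque type. */
static bool sketch_contains_opaque(const struct sketch_type *t)
{
   assert(t != NULL);

   switch (t->base) {
   case SKETCH_SAMPLER:
   case SKETCH_ATOMIC_UINT:
      return true;
   case SKETCH_ARRAY:
      return sketch_contains_opaque(t->element);
   case SKETCH_STRUCT:
      for (size_t i = 0; i < t->length; i++) {
         if (sketch_contains_opaque(t->fields[i]))
            return true;
      }
      return false;
   default:
      return false;
   }
}
```

With this sketch, a struct holding a float and an array of samplers is reported as containing an opaque type, while a plain float is not.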
Re: [Mesa-dev] [PATCH 00/18] Implement GLX_MESA_query_renderer
On 10/25/2013 03:20 AM, Dave Airlie wrote:
>> Do either of you guys plan to implement support for this extension?
>> The value to developers is obviously increased if more drivers support
>> the extension. This extension was born from feedback that I received
>> from people at FOSDEM and from various game developers at Game Developer
>> Conference and elsewhere.
>>
>> I'd like to land this extension, and I haven't received any review. I
>> know you guys are both pretty busy, so I don't expect detailed reviews.
>> I would really appreciate a quick skim of the extension spec (patch 15)
>> and an Acked-by or two.
>
> Is there a test app or piglit set for this? I might try and fit in
> looking at this,

There is a piglit test... that I just sent to the list about a minute
ago. :) I was also planning to update glxinfo, but I haven't gotten
around to it / I forgot.

> Dave.
Re: [Mesa-dev] Mesa (master): glx: Propagate failures from SendMakeCurrentRequest where possible
On 10/08/2013 10:24 AM, Adam Jackson wrote:
> Module: Mesa
> Branch: master
> Commit: d101204c23ba2f593881ede357309f3924cd
> URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=d101204c23ba2f593881ede357309f3924cd
>
> Author: Adam Jackson a...@redhat.com
> Date: Fri Oct 4 09:25:51 2013 -0400
>
> glx: Propagate failures from SendMakeCurrentRequest where possible
>
> Reviewed-by: Brian Paul bri...@vmware.com
> Signed-off-by: Adam Jackson a...@redhat.com
>
> ---
>
>  src/glx/indirect_glx.c |    7 ++++---
>  1 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/src/glx/indirect_glx.c b/src/glx/indirect_glx.c
> index d0457fe..d27b019 100644
> --- a/src/glx/indirect_glx.c
> +++ b/src/glx/indirect_glx.c
> @@ -132,6 +132,7 @@ indirect_bind_context(struct glx_context *gc, struct glx_context *old,
>     __GLXattribute *state;
>     Display *dpy = gc->psc->dpy;
>     int opcode = __glXSetupForCommand(dpy);
> +   Bool ret;
>
>     if (old != &dummyContext && !old->isDirect && old->psc->dpy == dpy) {
>        tag = old->currentContextTag;
> @@ -140,8 +141,8 @@ indirect_bind_context(struct glx_context *gc, struct glx_context *old,
>        tag = 0;
>     }
>
> -   SendMakeCurrentRequest(dpy, opcode, gc->xid, tag, draw, read,
> -                          &gc->currentContextTag);
> +   ret = SendMakeCurrentRequest(dpy, opcode, gc->xid, tag, draw, read,
> +                                &gc->currentContextTag);
>
>     if (!IndirectAPI)
>        IndirectAPI = __glXNewIndirectAPI();
> @@ -154,7 +155,7 @@ indirect_bind_context(struct glx_context *gc, struct glx_context *old,
>        __glXInitVertexArrayState(gc);
>     }
>
> -   return Success;
> +   return ret;

This is completely wrong. SendMakeCurrentRequest returns the value from
_XReply. _XReply returns True on success, and False on failure.
However, Success is 0. So now indirect_bind_context returns 1 (True)
every time it is successful, and the caller interprets that to mean
failure (non-Success).

This is the source of https://bugs.freedesktop.org/show_bug.cgi?id=70486

>  }
>
>  static void
> ___
> mesa-commit mailing list
> mesa-com...@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-commit
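
The mismatch described above can be reproduced in a few lines of C. This is only a sketch: the constants repeat the Xlib values (Success is 0, True is 1, False is 0) under local names so it builds without X11 headers, and every sketch_ function is a hypothetical stand-in, not the real GLX code.

```c
#include <assert.h>

/* Xlib defines Success as 0, True as 1 and False as 0; the values are
 * repeated under sketch-local names so this builds without X11 headers. */
enum { SKETCH_SUCCESS = 0, SKETCH_FALSE = 0, SKETCH_TRUE = 1 };

/* Hypothetical stand-in for a request helper that reports Bool-style. */
static int sketch_send_request(int reply_arrived)
{
   return reply_arrived ? SKETCH_TRUE : SKETCH_FALSE;
}

/* Mirrors the problem described above: forward the Bool unchanged. */
static int sketch_bind_forwarding_bool(int reply_arrived)
{
   return sketch_send_request(reply_arrived);
}

/* A caller following the "0 means Success" convention. */
static int sketch_caller_sees_failure(int status)
{
   return status != SKETCH_SUCCESS;
}

/* Shows that both outcomes end up inverted from the caller's view. */
static void sketch_demo(void)
{
   /* The reply arrived, yet the caller concludes that binding failed. */
   assert(sketch_caller_sees_failure(sketch_bind_forwarding_bool(1)));
   /* The reply was lost, yet the caller concludes that binding worked. */
   assert(!sketch_caller_sees_failure(sketch_bind_forwarding_bool(0)));
}
```

Both directions are inverted in this sketch, which is why a successful bind shows up as an error in the caller.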
Re: [Mesa-dev] [PATCH] nv50: implement multisample textures
On 21/10/13 23:23, Bryan Cain wrote:
> This is a port of 4da54c91d24da (nvc0: implement multisample textures)
> to nv50. When coupled with the patch to only report 16 texture samplers
> (to fix crashes), all of the Piglit tests in
> spec/arb_texture_multisample pass.

Hello Bryan,

Big thanks for your work. As promised, here is a quick piglit summary on
my nv96:

pass/fail/crash 69/32/27

* dmesg does not spit anything nouveau related during the tests
* any geometry shader related tests were skipped
  (piglit: info: Failed to create GL 3.2 core context)
* all the crashes are due to the following assert
  codegen/nv50_ir_emit_nv50.cpp:1393:emitTEX: Assertion `argc <= 4' failed.

PASS   arb_texture_multisample-*
PASS   fb-completeness/*
FAIL   sample-position/*
FAIL   texelFetch fs sampler2DMS 4*
CRASH  texelFetch fs sampler2DMSArray 4*
FAIL   texelFetch/*-*s-isampler2DMS
CRASH  texelFetch/*-*s-isampler2DMSArray
PASS   textureSize/*

Hope you find this useful :)

No real world apps that use multisample textures were tested, yet.

Cheers
Emil
[Mesa-dev] [PATCH] glx: Fix return value from indirect_bind_context
_XReply returns 1 on success, but indirect_bind_context returns 0 on
success.

Signed-off-by: Adam Jackson a...@redhat.com
---
 src/glx/indirect_glx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/glx/indirect_glx.c b/src/glx/indirect_glx.c
index d27b019..28b8cd0 100644
--- a/src/glx/indirect_glx.c
+++ b/src/glx/indirect_glx.c
@@ -132,7 +132,7 @@ indirect_bind_context(struct glx_context *gc, struct glx_context *old,
    __GLXattribute *state;
    Display *dpy = gc->psc->dpy;
    int opcode = __glXSetupForCommand(dpy);
-   Bool ret;
+   Bool sent;

    if (old != &dummyContext && !old->isDirect && old->psc->dpy == dpy) {
       tag = old->currentContextTag;
@@ -141,8 +141,8 @@ indirect_bind_context(struct glx_context *gc, struct glx_context *old,
       tag = 0;
    }

-   ret = SendMakeCurrentRequest(dpy, opcode, gc->xid, tag, draw, read,
-                                &gc->currentContextTag);
+   sent = SendMakeCurrentRequest(dpy, opcode, gc->xid, tag, draw, read,
+                                 &gc->currentContextTag);

    if (!IndirectAPI)
       IndirectAPI = __glXNewIndirectAPI();
@@ -155,7 +155,7 @@ indirect_bind_context(struct glx_context *gc, struct glx_context *old,
       __glXInitVertexArrayState(gc);
    }

-   return ret;
+   return !sent;
 }

 static void
-- 
1.8.3.1
[Mesa-dev] [PATCH 1/3] gallium/auxiliary/indices: add start param
From: Rob Clark robcl...@freedesktop.org Add 'start' parameter to generator/translator. Signed-off-by: Rob Clark robcl...@freedesktop.org --- src/gallium/auxiliary/indices/u_indices.c | 6 -- src/gallium/auxiliary/indices/u_indices.h | 4 +++- src/gallium/auxiliary/indices/u_indices_gen.py | 21 +++-- src/gallium/auxiliary/indices/u_unfilled_gen.py| 13 +++-- src/gallium/auxiliary/indices/u_unfilled_indices.c | 19 --- src/gallium/drivers/svga/svga_draw_arrays.c| 3 ++- src/gallium/drivers/svga/svga_draw_elements.c | 1 + 7 files changed, 40 insertions(+), 27 deletions(-) diff --git a/src/gallium/auxiliary/indices/u_indices.c b/src/gallium/auxiliary/indices/u_indices.c index 72c46f7..30b54b9 100644 --- a/src/gallium/auxiliary/indices/u_indices.c +++ b/src/gallium/auxiliary/indices/u_indices.c @@ -26,17 +26,19 @@ #include u_indices_priv.h static void translate_memcpy_ushort( const void *in, + unsigned start, unsigned nr, void *out ) { - memcpy(out, in, nr*sizeof(short)); + memcpy(out, ((short *)in)[start], nr*sizeof(short)); } static void translate_memcpy_uint( const void *in, + unsigned start, unsigned nr, void *out ) { - memcpy(out, in, nr*sizeof(int)); + memcpy(out, ((int *)in)[start], nr*sizeof(int)); } diff --git a/src/gallium/auxiliary/indices/u_indices.h b/src/gallium/auxiliary/indices/u_indices.h index be522c6..922bfe6 100644 --- a/src/gallium/auxiliary/indices/u_indices.h +++ b/src/gallium/auxiliary/indices/u_indices.h @@ -32,10 +32,12 @@ #define PV_COUNT 2 typedef void (*u_translate_func)( const void *in, + unsigned start, unsigned nr, void *out ); -typedef void (*u_generate_func)( unsigned nr, +typedef void (*u_generate_func)( unsigned start, + unsigned nr, void *out ); diff --git a/src/gallium/auxiliary/indices/u_indices_gen.py b/src/gallium/auxiliary/indices/u_indices_gen.py index af63d09..2714df8 100644 --- a/src/gallium/auxiliary/indices/u_indices_gen.py +++ b/src/gallium/auxiliary/indices/u_indices_gen.py @@ -153,6 +153,7 @@ def preamble(intype, outtype, 
inpv, outpv, prim): print 'static void ' + name( intype, outtype, inpv, outpv, prim ) + '(' if intype != GENERATE: print 'const void * _in,' +print 'unsigned start,' print 'unsigned nr,' print 'void *_out )' print '{' @@ -168,28 +169,28 @@ def postamble(): def points(intype, outtype, inpv, outpv): preamble(intype, outtype, inpv, outpv, prim='points') -print ' for (i = 0; i nr; i++) { ' +print ' for (i = start; i (nr+start); i++) { ' do_point( intype, outtype, 'out+i', 'i' ); print ' }' postamble() def lines(intype, outtype, inpv, outpv): preamble(intype, outtype, inpv, outpv, prim='lines') -print ' for (i = 0; i nr; i+=2) { ' +print ' for (i = start; i (nr+start); i+=2) { ' do_line( intype, outtype, 'out+i', 'i', 'i+1', inpv, outpv ); print ' }' postamble() def linestrip(intype, outtype, inpv, outpv): preamble(intype, outtype, inpv, outpv, prim='linestrip') -print ' for (j = i = 0; j nr; j+=2, i++) { ' +print ' for (i = start, j = 0; j nr; j+=2, i++) { ' do_line( intype, outtype, 'out+j', 'i', 'i+1', inpv, outpv ); print ' }' postamble() def lineloop(intype, outtype, inpv, outpv): preamble(intype, outtype, inpv, outpv, prim='lineloop') -print ' for (j = i = 0; j nr - 2; j+=2, i++) { ' +print ' for (i = start, j = 0; j nr - 2; j+=2, i++) { ' do_line( intype, outtype, 'out+j', 'i', 'i+1', inpv, outpv ); print ' }' do_line( intype, outtype, 'out+j', 'i', '0', inpv, outpv ); @@ -197,7 +198,7 @@ def lineloop(intype, outtype, inpv, outpv): def tris(intype, outtype, inpv, outpv): preamble(intype, outtype, inpv, outpv, prim='tris') -print ' for (i = 0; i nr; i+=3) { ' +print ' for (i = start; i (nr+start); i+=3) { ' do_tri( intype, outtype, 'out+i', 'i', 'i+1', 'i+2', inpv, outpv ); print ' }' postamble() @@ -205,7 +206,7 @@ def tris(intype, outtype, inpv, outpv): def tristrip(intype, outtype, inpv, outpv): preamble(intype, outtype, inpv, outpv, prim='tristrip') -print ' for (j = i = 0; j nr; j+=3, i++) { ' +print ' for (i = start, j = 0; j nr; j+=3, i++) { ' if inpv == 
FIRST: do_tri( intype, outtype, 'out+j', 'i', 'i+1+(i1)', 'i+2-(i1)', inpv, outpv ); else: @@ -216,7 +217,7 @@ def tristrip(intype,
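
As a rough illustration of what the new 'start' parameter means for the memcpy-style translators, here is a standalone C sketch. It assumes the copy is meant to begin at element 'start' of the input, so memcpy is handed the address of that element; sketch_copy_ushort_indices is a hypothetical name, not a function from u_indices.c.

```c
#include <assert.h>
#include <string.h>

/* Copies nr 16-bit indices from in[start .. start+nr-1] to out.
 * Hypothetical helper for illustration only. */
static void sketch_copy_ushort_indices(const void *in, unsigned start,
                                       unsigned nr, void *out)
{
   const unsigned short *src = (const unsigned short *)in;

   assert(in != NULL && out != NULL);

   /* memcpy needs a pointer, so take the address of element 'start'. */
   memcpy(out, &src[start], nr * sizeof(unsigned short));
}
```

Called with start = 1 and nr = 3 on the input {10, 11, 12, 13, 14}, this sketch writes 11, 12, 13 to the output.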
[Mesa-dev] [PATCH 0/3] Add u_primconvert front-end to u_indices
From: Rob Clark robcl...@freedesktop.org

This patchset (compared to RFC I sent previously) changes u_primconvert
to just be a front-end to the u_indices stuff. It handles binding/
restoring new index buffer state, etc. So driver using it just has to
put this at the top of their pipe->draw_vbo():

   if (prim_needs_emulating) {
      util_primconvert_save_index_buffer(ctx->primconvert, &ctx->indexbuf);
      util_primconvert_save_rasterizer_state(ctx->primconvert, ctx->rasterizer);
      util_primconvert_draw_vbo(ctx->primconvert, info);
      return;
   }

It does not yet handle changing provoking vertex (since I didn't need
this), but that looks like it should be easy to add.

I considered first just using u_indices directly from freedreno, like
svga does. But it is at least more complex than it needs to be and it
seemed like eventually more code could be shared with this approach. I
suspect some of the index buffer caching done in svga could be moved
into u_primconvert and shared between svga and freedreno (and any other
future drivers for GLES hw which might need the same thing).

The last patch converts freedreno over to use this. It depends on
another patch with regenerated envytools headers (updated to take into
account differences between a20x/a22x/a3xx) for the draw initiator.

Rob Clark (3):
  gallium/auxiliary/indices: add start param
  gallium/auxiliary/indices: add u_primconvert
  freedreno: emulated unsupported primitive types

 src/gallium/auxiliary/Makefile.sources             | 1 +
 src/gallium/auxiliary/indices/u_indices.c          | 6 +-
 src/gallium/auxiliary/indices/u_indices.h          | 4 +-
 src/gallium/auxiliary/indices/u_indices_gen.py     | 21 +--
 src/gallium/auxiliary/indices/u_primconvert.c      | 171 +
 src/gallium/auxiliary/indices/u_primconvert.h      | 46 ++
 src/gallium/auxiliary/indices/u_unfilled_gen.py    | 13 +-
 src/gallium/auxiliary/indices/u_unfilled_indices.c | 19 ++-
 src/gallium/drivers/freedreno/a2xx/fd2_context.c   | 24 ++-
 src/gallium/drivers/freedreno/a3xx/fd3_context.c   | 12 +-
 src/gallium/drivers/freedreno/freedreno_context.c  | 16 +-
 src/gallium/drivers/freedreno/freedreno_context.h  | 18 ++-
 src/gallium/drivers/freedreno/freedreno_draw.c     | 29 ++--
 src/gallium/drivers/svga/svga_draw_arrays.c        | 3 +-
 src/gallium/drivers/svga/svga_draw_elements.c      | 1 +
 15 files changed, 332 insertions(+), 52 deletions(-)
 create mode 100644 src/gallium/auxiliary/indices/u_primconvert.c
 create mode 100644 src/gallium/auxiliary/indices/u_primconvert.h

-- 
1.8.3.1
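
To make "emulating a primitive with a supported primitive plus index buffer" concrete, here is a small C sketch that turns a line loop into a line list by generating indices. It is only an illustration under stated assumptions: sketch_line_loop_to_lines is a hypothetical helper, it emits 16-bit indices, and it is not the code generated by u_indices_gen.py.

```c
#include <assert.h>
#include <stddef.h>

/* Writes line-list indices that draw the same segments as a line loop
 * over vertices start .. start+nr-1. 'out' must have room for 2*nr
 * entries. Returns the number of indices written. Hypothetical helper. */
static unsigned sketch_line_loop_to_lines(unsigned start, unsigned nr,
                                          unsigned short *out)
{
   unsigned i, j = 0;

   assert(out != NULL);

   if (nr < 2)
      return 0;

   /* One segment per pair of neighbouring vertices. */
   for (i = 0; i + 1 < nr; i++) {
      out[j++] = (unsigned short)(start + i);
      out[j++] = (unsigned short)(start + i + 1);
   }

   /* Closing segment from the last vertex back to the first one. */
   out[j++] = (unsigned short)(start + nr - 1);
   out[j++] = (unsigned short)start;

   return j;
}
```

A driver without native line-loop support could then draw the generated indices as a plain line list; for start = 4 and nr = 3 the sketch produces 4,5, 5,6, 6,4.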
[Mesa-dev] [PATCH 2/3] gallium/auxiliary/indices: add u_primconvert
From: Rob Clark robcl...@freedesktop.org A convenient front end to indices generate/translate code, for emulating primitives which are not supported natively by the driver. This handles saving/restoring index buffer state, etc. Signed-off-by: Rob Clark robcl...@freedesktop.org --- src/gallium/auxiliary/Makefile.sources| 1 + src/gallium/auxiliary/indices/u_primconvert.c | 171 ++ src/gallium/auxiliary/indices/u_primconvert.h | 46 +++ 3 files changed, 218 insertions(+) create mode 100644 src/gallium/auxiliary/indices/u_primconvert.c create mode 100644 src/gallium/auxiliary/indices/u_primconvert.h diff --git a/src/gallium/auxiliary/Makefile.sources b/src/gallium/auxiliary/Makefile.sources index acbcef7..c89cbdd 100644 --- a/src/gallium/auxiliary/Makefile.sources +++ b/src/gallium/auxiliary/Makefile.sources @@ -43,6 +43,7 @@ C_SOURCES := \ hud/hud_cpu.c \ hud/hud_fps.c \ hud/hud_driver_query.c \ + indices/u_primconvert.c \ os/os_misc.c \ os/os_process.c \ os/os_time.c \ diff --git a/src/gallium/auxiliary/indices/u_primconvert.c b/src/gallium/auxiliary/indices/u_primconvert.c new file mode 100644 index 000..f7cf349 --- /dev/null +++ b/src/gallium/auxiliary/indices/u_primconvert.c @@ -0,0 +1,171 @@ +/* -*- mode: C; c-file-style: kr; tab-width 4; indent-tabs-mode: t; -*- */ + +/* + * Copyright (C) 2013 Rob Clark robcl...@freedesktop.org + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. 
+ * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * Authors: + *Rob Clark robcl...@freedesktop.org + */ + +/** + * This module converts provides a more convenient front-end to u_indices, + * etc, utils to convert primitive types supported not supported by the + * hardware. It handles binding new index buffer state, and restoring + * previous state after. To use, put something like this at the front of + * drivers pipe-draw_vbo(): + * + *// emulate unsupported primitives: + *if (info-mode needs emulating) { + * util_primconvert_save_index_buffer(ctx-primconvert, ctx-indexbuf); + * util_primconvert_save_rasterizer_state(ctx-primconvert, ctx-rasterizer); + * util_primconvert_draw_vbo(ctx-primconvert, info); + * return; + *} + * + */ + +#include pipe/p_state.h +#include util/u_memory.h +#include util/u_inlines.h + +#include indices/u_primconvert.h +#include indices/u_indices.h + +struct primconvert_context { + struct pipe_context *pipe; + struct pipe_index_buffer saved_ib; + uint32_t primtypes_mask; + unsigned api_pv; + // TODO we could cache/recycle the indexbuf created to translate prims.. 
+}; + + +struct primconvert_context *util_primconvert_create(struct pipe_context *pipe, + uint32_t primtypes_mask) +{ + struct primconvert_context *pc = CALLOC_STRUCT(primconvert_context); + if (!pc) + return NULL; + pc-pipe = pipe; + pc-primtypes_mask = primtypes_mask; + return pc; +} + +void util_primconvert_destroy(struct primconvert_context *pc) +{ + util_primconvert_save_index_buffer(pc, NULL); + free(pc); +} + +void util_primconvert_save_index_buffer(struct primconvert_context *pc, + const struct pipe_index_buffer *ib) +{ + if (ib) { + pipe_resource_reference(pc-saved_ib.buffer, ib-buffer); + pc-saved_ib.index_size = ib-index_size; + pc-saved_ib.offset = ib-offset; + pc-saved_ib.user_buffer = ib-user_buffer; + } else { + pipe_resource_reference(pc-saved_ib.buffer, NULL); + } +} + +void util_primconvert_save_rasterizer_state(struct primconvert_context *pc, + const struct pipe_rasterizer_state *rast) +{ + /* if we actually translated the provoking vertex for the buffer, +* we would actually need to save/restore rasterizer state. As +* it is, we
[Mesa-dev] [PATCH 3/3] freedreno: emulated unsupported primitive types
From: Rob Clark robcl...@freedesktop.org Use u_primconvert to convert unsupported primitives into supported primitive plus index buffer. Signed-off-by: Rob Clark robcl...@freedesktop.org --- src/gallium/drivers/freedreno/a2xx/fd2_context.c | 24 ++- src/gallium/drivers/freedreno/a3xx/fd3_context.c | 12 +- src/gallium/drivers/freedreno/freedreno_context.c | 16 +++-- src/gallium/drivers/freedreno/freedreno_context.h | 18 +- src/gallium/drivers/freedreno/freedreno_draw.c| 29 +++ 5 files changed, 74 insertions(+), 25 deletions(-) diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_context.c b/src/gallium/drivers/freedreno/a2xx/fd2_context.c index a319275..ec9eaf6 100644 --- a/src/gallium/drivers/freedreno/a2xx/fd2_context.c +++ b/src/gallium/drivers/freedreno/a2xx/fd2_context.c @@ -67,9 +67,29 @@ create_solid_vertexbuf(struct pipe_context *pctx) return prsc; } +static const uint8_t a22x_primtypes[PIPE_PRIM_MAX] = { + [PIPE_PRIM_POINTS] = DI_PT_POINTLIST_A2XX, + [PIPE_PRIM_LINES] = DI_PT_LINELIST, + [PIPE_PRIM_LINE_STRIP] = DI_PT_LINESTRIP, + [PIPE_PRIM_LINE_LOOP] = DI_PT_LINELOOP, + [PIPE_PRIM_TRIANGLES] = DI_PT_TRILIST, + [PIPE_PRIM_TRIANGLE_STRIP] = DI_PT_TRISTRIP, + [PIPE_PRIM_TRIANGLE_FAN] = DI_PT_TRIFAN, +}; + +static const uint8_t a20x_primtypes[PIPE_PRIM_MAX] = { + [PIPE_PRIM_POINTS] = DI_PT_POINTLIST_A2XX, + [PIPE_PRIM_LINES] = DI_PT_LINELIST, + [PIPE_PRIM_LINE_STRIP] = DI_PT_LINESTRIP, + [PIPE_PRIM_TRIANGLES] = DI_PT_TRILIST, + [PIPE_PRIM_TRIANGLE_STRIP] = DI_PT_TRISTRIP, + [PIPE_PRIM_TRIANGLE_FAN] = DI_PT_TRIFAN, +}; + struct pipe_context * fd2_context_create(struct pipe_screen *pscreen, void *priv) { + struct fd_screen *screen = fd_screen(pscreen); struct fd2_context *fd2_ctx = CALLOC_STRUCT(fd2_context); struct pipe_context *pctx; @@ -88,7 +108,9 @@ fd2_context_create(struct pipe_screen *pscreen, void *priv) fd2_texture_init(pctx); fd2_prog_init(pctx); - pctx = fd_context_init(fd2_ctx-base, pscreen, priv); + pctx = fd_context_init(fd2_ctx-base, pscreen, + 
(screen-gpu_id = 220) ? a22x_primtypes : a20x_primtypes, + priv); if (!pctx) return NULL; diff --git a/src/gallium/drivers/freedreno/a3xx/fd3_context.c b/src/gallium/drivers/freedreno/a3xx/fd3_context.c index 589aeed..13f91e9 100644 --- a/src/gallium/drivers/freedreno/a3xx/fd3_context.c +++ b/src/gallium/drivers/freedreno/a3xx/fd3_context.c @@ -82,6 +82,16 @@ create_blit_texcoord_vertexbuf(struct pipe_context *pctx) return prsc; } +static const uint8_t primtypes[PIPE_PRIM_MAX] = { + [PIPE_PRIM_POINTS] = DI_PT_POINTLIST_A3XX, + [PIPE_PRIM_LINES] = DI_PT_LINELIST, + [PIPE_PRIM_LINE_STRIP] = DI_PT_LINESTRIP, + [PIPE_PRIM_LINE_LOOP] = DI_PT_LINELOOP, + [PIPE_PRIM_TRIANGLES] = DI_PT_TRILIST, + [PIPE_PRIM_TRIANGLE_STRIP] = DI_PT_TRISTRIP, + [PIPE_PRIM_TRIANGLE_FAN] = DI_PT_TRIFAN, +}; + struct pipe_context * fd3_context_create(struct pipe_screen *pscreen, void *priv) { @@ -106,7 +116,7 @@ fd3_context_create(struct pipe_screen *pscreen, void *priv) fd3_texture_init(pctx); fd3_prog_init(pctx); - pctx = fd_context_init(fd3_ctx-base, pscreen, priv); + pctx = fd_context_init(fd3_ctx-base, pscreen, primtypes, priv); if (!pctx) return NULL; diff --git a/src/gallium/drivers/freedreno/freedreno_context.c b/src/gallium/drivers/freedreno/freedreno_context.c index 96e1ef6..ddb8a0b 100644 --- a/src/gallium/drivers/freedreno/freedreno_context.c +++ b/src/gallium/drivers/freedreno/freedreno_context.c @@ -123,6 +123,9 @@ fd_context_destroy(struct pipe_context *pctx) if (ctx-blitter) util_blitter_destroy(ctx-blitter); + if (ctx-primconvert) + util_primconvert_destroy(ctx-primconvert); + fd_ringmarker_del(ctx-draw_start); fd_ringmarker_del(ctx-draw_end); fd_ringbuffer_del(ctx-ring); @@ -131,8 +134,8 @@ fd_context_destroy(struct pipe_context *pctx) } struct pipe_context * -fd_context_init(struct fd_context *ctx, - struct pipe_screen *pscreen, void *priv) +fd_context_init(struct fd_context *ctx, struct pipe_screen *pscreen, + const uint8_t *primtypes, void *priv) { struct fd_screen *screen 
= fd_screen(pscreen); struct pipe_context *pctx; @@ -140,6 +143,12 @@ fd_context_init(struct fd_context *ctx, ctx-screen = screen; + ctx-primtypes = primtypes; + ctx-primtype_mask = 0; +
Re: [Mesa-dev] [PATCH] glx: Fix return value from indirect_bind_context
On 10/25/2013 12:14 PM, Adam Jackson wrote:
> _XReply returns 1 on success, but indirect_bind_context returns 0 on
> success.
>
> Signed-off-by: Adam Jackson a...@redhat.com

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70486
Reviewed-and-tested-by: Ian Romanick ian.d.roman...@intel.com

The other way to fix this would be to make glx_context_vtable::bind
return Bool instead of int. That would be more work, but it may be a
tiny bit cleaner in the end. *shrug*

> ---
>  src/glx/indirect_glx.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/glx/indirect_glx.c b/src/glx/indirect_glx.c
> index d27b019..28b8cd0 100644
> --- a/src/glx/indirect_glx.c
> +++ b/src/glx/indirect_glx.c
> @@ -132,7 +132,7 @@ indirect_bind_context(struct glx_context *gc, struct glx_context *old,
>     __GLXattribute *state;
>     Display *dpy = gc->psc->dpy;
>     int opcode = __glXSetupForCommand(dpy);
> -   Bool ret;
> +   Bool sent;
>
>     if (old != &dummyContext && !old->isDirect && old->psc->dpy == dpy) {
>        tag = old->currentContextTag;
> @@ -141,8 +141,8 @@ indirect_bind_context(struct glx_context *gc, struct glx_context *old,
>        tag = 0;
>     }
>
> -   ret = SendMakeCurrentRequest(dpy, opcode, gc->xid, tag, draw, read,
> -                                &gc->currentContextTag);
> +   sent = SendMakeCurrentRequest(dpy, opcode, gc->xid, tag, draw, read,
> +                                 &gc->currentContextTag);
>
>     if (!IndirectAPI)
>        IndirectAPI = __glXNewIndirectAPI();
> @@ -155,7 +155,7 @@ indirect_bind_context(struct glx_context *gc, struct glx_context *old,
>        __glXInitVertexArrayState(gc);
>     }
>
> -   return ret;
> +   return !sent;
>  }
>
>  static void
Re: [Mesa-dev] [PATCH 6/9] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.
On 10/21/2013 05:55 PM, Eric Anholt wrote:
[snip]
> This interface means synchronizing with the GPU, which sucks when we
> have the ability to actually do DTFB in the hardware pipeline (Indirect
> Parameter Enable of 3DPRIMITIVE).

It's not that simple.

The 3DPRIMITIVE indirect registers require you to specify a vertex
count (which should be the number of vertices actually written to the SO
buffer, which may be less than you asked for due to overflow).

As far as I can tell, the Gen7 SOL stage has no mechanism to give you
the number of vertices written to the SOL buffer. There is
SO_NUM_PRIMS_WRITTEN(0-3), which gives you the number of primitives
actually written. For POINTS, this works since each primitive is a
single vertex. But for LINES and TRIANGLES, you need to multiply this
count by 2 or 3 vertices per primitive.

Haswell has an MI_MATH command which might be usable for this. But on
Ivybridge, I don't know how to do this other than writing a shader
program that reads from the buffer, does the multiplication, and writes
it back out (and draw a single point). Then MI_LOAD_REGISTER_MEM it
into the indirect vertex count register. That might work, but is it
better?

The other complexity is PauseTransformFeedback and switching. The
vertex count is the # of verts actually written between Begin/End on a
single object. If you have two objects, you might do:

   Begin A, draw, Pause A, Begin B, draw, End B, Resume A, draw, End A.

But there is only one SO_NUM_PRIMS_WRITTEN register, which is intended
to be free running. If you leave it free running, you need to take
snapshots at Begin/End/Pause/Resume and subtract deltas to get the
actual number of primitives written...then do the multiplication above.

We could violate the free running assumption and set
SO_NUM_PRIMS_WRITTEN to 0 on Begin, and save/restore it on Pause/Resume.
Then the value at End would be the final value, and we wouldn't have to
deal with deltas, which would be simpler. I'm open to trying that if
people would prefer it.

Maybe I am fundamentally missing something here, but it seems far from
obvious to me how to use draw indirect to do this properly. On
Ivybridge doing it on the GPU sounds very complex and heavyweight.
Haswell could probably do it if we adopt the save/restore approach.

> We could mostly use the hw pipelined version only, as long as we had
> core contexts (meaning that we don't need vertex start/count to figure
> out how much user vertex array data to upload).

Right, so we'd need this for that case, anyway.

> But, given that we have sw primitive restart on some lame hardware that
> we want to support this on, we've got to have this path anyway.

Where by "lame hardware" you mean Ivybridge.

--Ken
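
The snapshot-and-delta bookkeeping described above can be sketched on the CPU side as follows. This is a sketch under stated assumptions, not driver code: the free-running counter value is simply passed in as 'hw', and struct sketch_tfb plus all sketch_tfb_ functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Per-object bookkeeping for one free-running "primitives written"
 * counter that is shared by all transform feedback objects. */
struct sketch_tfb {
   uint64_t prims_written;  /* sum over all intervals the object was active */
   uint64_t snapshot;       /* counter value at the last Begin/Resume */
   int active;              /* nonzero between Begin/Resume and Pause/End */
};

static void sketch_tfb_begin(struct sketch_tfb *t, uint64_t hw)
{
   t->prims_written = 0;
   t->snapshot = hw;
   t->active = 1;
}

static void sketch_tfb_pause(struct sketch_tfb *t, uint64_t hw)
{
   assert(t->active);
   t->prims_written += hw - t->snapshot;  /* delta for this interval */
   t->active = 0;
}

static void sketch_tfb_resume(struct sketch_tfb *t, uint64_t hw)
{
   assert(!t->active);
   t->snapshot = hw;
   t->active = 1;
}

static void sketch_tfb_end(struct sketch_tfb *t, uint64_t hw)
{
   /* Ending a paused object adds nothing; its delta was already taken. */
   if (t->active)
      t->prims_written += hw - t->snapshot;
   t->active = 0;
}

/* verts_per_prim is 1 for points, 2 for lines, 3 for triangles. */
static uint64_t sketch_tfb_vertex_count(const struct sketch_tfb *t,
                                        unsigned verts_per_prim)
{
   return t->prims_written * verts_per_prim;
}
```

Running the "Begin A, draw, Pause A, Begin B, draw, End B, Resume A, draw, End A" sequence through this sketch attributes only A's own two intervals to A, even though B advanced the shared counter in between.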
Re: [Mesa-dev] [PATCH 6/9] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.
On 10/22/2013 04:30 AM, Marek Olšák wrote: On Fri, Oct 18, 2013 at 8:09 AM, Kenneth Graunke kenn...@whitecape.org wrote: DrawTransformFeedback() needs to obtain the number of vertices written to a particular stream during the last Begin/EndTransformFeedback block. The new driver hook returns exactly that information. Gallium drivers already implement this functionality by passing the transform feedback object to the drawing function. I prefer to avoid this for two reasons: 1. Complexity: Normally, the drawing function takes an array of _mesa_prim objects, each of which specifies a vertex count. If tfb_vertcount != NULL, however, there will only be one _mesa_prim object with an invalid vertex count (of 1), so it needs to be ignored. Since the _mesa_prim pointers are const, you can't even override it to the proper value; you need to pass around extra ignore that, here's the real count parameters. The drawing function is already terribly complicated, so I don't want to make it even more complicated. I don't understand this. Are you saying that the software emulation of the feature is always better because of complexity the real hardware-accelerated solution would have? On Ivybridge hardware, I think that a GPU-only implementation of DrawTransformFeedback would be very complicated, and probably less efficient than this (extremely simple) software solution. It might be possible to do a reasonable GPU-only implementation on Haswell, but I haven't looked into the details yet. (See my reply to Eric.) At least for Ivybridge, I think I want this software path 100% of the time. We may want to remove the stall on Haswell as a later optimization. It sounds like for Gallium, you already have a decent GPU-only solution. I tried to follow that code to understand how it works, and got lost after jumping through around 5 files...which is probably just my poor understanding of the Gallium architecture. 
[snip]

>> diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
>> index 1670409..11bb76a 100644
>> --- a/src/mesa/vbo/vbo_exec_array.c
>> +++ b/src/mesa/vbo/vbo_exec_array.c
>> @@ -1464,6 +1464,12 @@ vbo_draw_transform_feedback(struct gl_context *ctx, GLenum mode,
>>        return;
>>     }
>> +   if (ctx->Driver.GetTransformFeedbackVertexCount) {
>> +      GLsizei n = ctx->Driver.GetTransformFeedbackVertexCount(ctx, obj, stream);
>> +      vbo_draw_arrays(ctx, mode, 0, n, numInstances, 0);
>> +      return;
>> +   }
>
> As you mentioned, the only issue is with primitive restart, so why is this done even if primitive restart is disabled? Drivers which will have to implement this just to make e.g. non-VBO vertex uploads work will suffer from the CPU-GPU synchronization this code forces.
>
> Marek

I hadn't thought about non-VBO vertex uploads. What does Gallium do in that case? Has it just been broken this whole time?

I guess I figured drivers would either implement this hook, or do the tfb_vertcount approach, but not both. Maybe that's a bad assumption.

--Ken
___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
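For readers following the hook discussion, here is a minimal C sketch of the "optional driver hook" pattern the quoted hunk uses: if the driver supplies the function pointer, core code asks it for the count and issues an ordinary draw; otherwise it falls through to the other path. All `demo_*` names are hypothetical stand-ins, not Mesa's real `gl_context` or `dd_function_table` types, and the fallback is only a flag here.

```c
#include <assert.h>
#include <stddef.h>

struct demo_context;

/* Hypothetical, heavily reduced stand-in for a driver function table. */
struct demo_driver_hooks {
    /* Optional hook: vertex count written by the last feedback block. */
    int (*get_tfb_vertex_count)(struct demo_context *ctx, unsigned stream);
};

struct demo_context {
    struct demo_driver_hooks driver;
    int last_draw_count;  /* what the plain draw path was asked to render */
    int used_fallback;    /* set when the hook is absent */
};

/* Stand-in for an ordinary "draw N vertices" entry point. */
static void demo_draw_arrays(struct demo_context *ctx, int count)
{
    ctx->last_draw_count = count;
}

static void demo_draw_transform_feedback(struct demo_context *ctx, unsigned stream)
{
    if (ctx->driver.get_tfb_vertex_count) {
        /* Hook present: fetch the count (this is where a CPU-GPU sync
         * would happen in a real driver) and draw normally. */
        int n = ctx->driver.get_tfb_vertex_count(ctx, stream);
        demo_draw_arrays(ctx, n);
        return;
    }
    /* Hook absent: a real implementation would hand the feedback object
     * to the driver's draw function instead. */
    ctx->used_fallback = 1;
}

/* Example hook used only for illustration. */
static int demo_fixed_count(struct demo_context *ctx, unsigned stream)
{
    (void)ctx;
    return stream == 0 ? 42 : 0;
}
```

Note that in this shape the hook, when present, always wins; that is the behaviour Marek's question about primitive restart is probing.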
Re: [Mesa-dev] [PATCH] llvmpipe: fix bogus layer clamping in setup
On 10/25/2013 10:14 AM, srol...@vmware.com wrote:
> From: Roland Scheidegger srol...@vmware.com
>
> The layer coming from GS needs to be clamped (not sure if that's actually the correct error behavior but we need something) as the number can be higher than the amount of layers in the fb. However, this code was using the layer calculation from the scene, and this was actually calculated in lp_scene_begin_rasterization() hence too late (so setup was using the value from the _previous_ scene or just zero if it was the first scene).
> Since the value is used in both rasterization and setup, move calculation up to lp_scene_begin_binning() though it's a bit more inconvenient to calculate there.
> (Theoretically could move _all_ code which was in lp_scene_begin_rasterization() to there, because ever since we got rid of swizzled render/depth buffers our map functions preparing the fb data for render don't actually change the data in there at all, but it feels like it would be a hack.)
> ---
>  src/gallium/drivers/llvmpipe/lp_scene.c | 25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/src/gallium/drivers/llvmpipe/lp_scene.c b/src/gallium/drivers/llvmpipe/lp_scene.c
> index 2abbd25..483bfa5 100644
> --- a/src/gallium/drivers/llvmpipe/lp_scene.c
> +++ b/src/gallium/drivers/llvmpipe/lp_scene.c
> @@ -151,7 +151,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
>  {
>     const struct pipe_framebuffer_state *fb = &scene->fb;
>     int i;
> -   unsigned max_layer = ~0;
>     //LP_DBG(DEBUG_RAST, "%s\n", __FUNCTION__);
> @@ -162,7 +161,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
>                                                             cbuf->u.tex.level);
>           scene->cbufs[i].layer_stride = llvmpipe_layer_stride(cbuf->texture,
>                                                                cbuf->u.tex.level);
> -         max_layer = MIN2(max_layer, cbuf->u.tex.last_layer - cbuf->u.tex.first_layer);
>           scene->cbufs[i].map = llvmpipe_resource_map(cbuf->texture,
>                                                       cbuf->u.tex.level,
> @@ -173,7 +171,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
>           struct llvmpipe_resource *lpr = llvmpipe_resource(cbuf->texture);
>           unsigned pixstride = util_format_get_blocksize(cbuf->format);
>           scene->cbufs[i].stride = cbuf->texture->width0;
> -         max_layer = 0;
>           scene->cbufs[i].map = lpr->data;
>           scene->cbufs[i].map += cbuf->u.buf.first_element * pixstride;
> @@ -184,15 +181,12 @@ lp_scene_begin_rasterization(struct lp_scene *scene)
>        struct pipe_surface *zsbuf = scene->fb.zsbuf;
>        scene->zsbuf.stride = llvmpipe_resource_stride(zsbuf->texture, zsbuf->u.tex.level);
>        scene->zsbuf.layer_stride = llvmpipe_layer_stride(zsbuf->texture, zsbuf->u.tex.level);
> -      max_layer = MIN2(max_layer, zsbuf->u.tex.last_layer - zsbuf->u.tex.first_layer);
>        scene->zsbuf.map = llvmpipe_resource_map(zsbuf->texture,
>                                                 zsbuf->u.tex.level,
>                                                 zsbuf->u.tex.first_layer,
>                                                 LP_TEX_USAGE_READ_WRITE);
>     }
> -
> -   scene->fb_max_layer = max_layer;
>  }
> @@ -506,6 +500,9 @@ end:
>  void lp_scene_begin_binning( struct lp_scene *scene,
>                               struct pipe_framebuffer_state *fb, boolean discard )
>  {
> +   int i;
> +   unsigned max_layer = ~0;
> +
>     assert(lp_scene_is_empty(scene));
>     scene->discard = discard;
> @@ -513,9 +510,23 @@ void lp_scene_begin_binning( struct lp_scene *scene,
>     scene->tiles_x = align(fb->width, TILE_SIZE) / TILE_SIZE;
>     scene->tiles_y = align(fb->height, TILE_SIZE) / TILE_SIZE;
> -
>     assert(scene->tiles_x <= TILES_X);
>     assert(scene->tiles_y <= TILES_Y);
> +

Maybe add a comment here indicating what we're doing.

> +   for (i = 0; i < scene->fb.nr_cbufs; i++) {
> +      struct pipe_surface *cbuf = scene->fb.cbufs[i];
> +      if (llvmpipe_resource_is_texture(cbuf->texture)) {
> +         max_layer = MIN2(max_layer, cbuf->u.tex.last_layer - cbuf->u.tex.first_layer);
> +      }
> +      else {
> +         max_layer = 0;
> +      }
> +   }
> +   if (fb->zsbuf) {
> +      struct pipe_surface *zsbuf = scene->fb.zsbuf;
> +      max_layer = MIN2(max_layer, zsbuf->u.tex.last_layer - zsbuf->u.tex.first_layer);
> +   }
> +   scene->fb_max_layer = max_layer;

Suppose we have a layered color buffer and layered Z/S buffer, but the number of layers differs. Are you sure we shouldn't be using separate max_layers for color vs. Z/S?

For the time being, though, I'm fine with the code as-is.
Reviewed-by: Brian Paul bri...@vmware.com
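As a rough illustration of what the patch computes, the C sketch below takes the minimum of (last_layer - first_layer) over all attachments and clamps an incoming layer index to it. The `demo_*` types and functions are hypothetical simplifications: they model only texture attachments and omit the buffer-resource case, which the patch handles by forcing max_layer to 0.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for the layer range of a bound surface. */
struct demo_surface {
    unsigned first_layer;
    unsigned last_layer;
};

/* Highest layer index usable by every attachment, relative to each
 * attachment's own first layer. Returns ~0u if nothing is bound. */
static unsigned demo_fb_max_layer(const struct demo_surface *cbufs,
                                  unsigned nr_cbufs,
                                  const struct demo_surface *zsbuf)
{
    unsigned max_layer = ~0u;
    for (unsigned i = 0; i < nr_cbufs; i++)
        max_layer = DEMO_MIN2(max_layer,
                              cbufs[i].last_layer - cbufs[i].first_layer);
    if (zsbuf)
        max_layer = DEMO_MIN2(max_layer,
                              zsbuf->last_layer - zsbuf->first_layer);
    return max_layer;
}

/* Clamp the layer coming from the geometry shader to the usable range. */
static unsigned demo_clamp_layer(unsigned layer, unsigned max_layer)
{
    return DEMO_MIN2(layer, max_layer);
}
```

With a single shared limit like this, a two-layer Z/S buffer restricts rendering to two layers even if the color buffers have more, which is the trade-off Brian's question is about.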
Re: [Mesa-dev] [PATCH] glx: Fix return value from indirect_bind_context
On Fri, 2013-10-25 at 12:59 -0700, Ian Romanick wrote:
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70486
> Reviewed-and-tested-by: Ian Romanick ian.d.roman...@intel.com

Pushed, thanks and sorry.

- ajax
Re: [Mesa-dev] [PATCH] i965/fs: Drop no-op shifts by 0.
Erik Faye-Lund kusmab...@gmail.com writes:

> Why is this tagged as i965/fs, when everything seems to happen in the glsl-optimizer?
>
> On Thu, Oct 24, 2013 at 5:53 PM, Eric Anholt e...@anholt.net wrote:
>> I noticed this in a shader in Unigine Heaven that was spilling. While it doesn't really reduce register pressure, it shaves a few instructions anyway (7955 -> 7882).
>> ---
>>  src/glsl/opt_algebraic.cpp | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
>> index 37b2f02..ff06cfc 100644
>> --- a/src/glsl/opt_algebraic.cpp
>> +++ b/src/glsl/opt_algebraic.cpp
>> @@ -387,6 +387,14 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>>        }
>>        break;
>> +   case ir_binop_rshift:
>> +   case ir_binop_lshift:
>> +      if (is_vec_zero(op_const[0]))
>> +         return ir->operands[1];
>> +      else if (is_vec_zero(op_const[1]))
>> +         return ir->operands[0];
>> +      break;
>> +
>
> Maybe update progress inside the conditionals also?
>
> But wait a minute. x shifted by 0 is x, so the latter part looks correct. But the first conditional seems to assume that 0 shifted by x is x, but it's really 0, no? Shouldn't both cases return ir->operands[0]? What am I missing?

You're not missing anything -- it was just copy-and-paste fail. New patch series incoming that should fix that, plus should reduce other copy and paste fail in this code.
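The two identities under review (0 shifted by x is 0, and x shifted by 0 is x) can be checked directly in C. The sketch below is illustrative only: `demo_fold_shift` is a hypothetical reduction of the optimizer's decision to "which operand index should replace the expression", and it is not Mesa code.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if both identities hold for all sampled values and for
 * every shift count that is defined for a 32-bit operand. */
static int demo_check_shift_identities(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x80000000u, 0xdeadbeefu, 0xffffffffu
    };
    /* x << 0 == x and x >> 0 == x */
    for (unsigned s = 0; s < sizeof samples / sizeof samples[0]; s++) {
        uint32_t x = samples[s];
        if ((x << 0) != x || (x >> 0) != x)
            return 0;
    }
    /* 0 << n == 0 and 0 >> n == 0 */
    for (unsigned n = 0; n < 32; n++) {
        if ((UINT32_C(0) << n) != 0 || (UINT32_C(0) >> n) != 0)
            return 0;
    }
    return 1;
}

/* Which operand index may replace the shift, or -1 for "no folding".
 * In both foldable cases the answer is operand 0: either it is the
 * constant zero being shifted, or it is x being shifted by zero. */
static int demo_fold_shift(int op0_is_zero, int op1_is_zero)
{
    if (op0_is_zero)
        return 0;
    if (op1_is_zero)
        return 0;
    return -1;
}
```

Returning operand 1 in the first case, as the v1 hunk did, would turn `0 << x` into `x`, which is the bug Erik spotted.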
Re: [Mesa-dev] [PATCH] llvmpipe: fix bogus layer clamping in setup
Am 25.10.2013 22:33, schrieb Brian Paul: On 10/25/2013 10:14 AM, srol...@vmware.com wrote: From: Roland Scheidegger srol...@vmware.com The layer coming from GS needs to be clamped (not sure if that's actually the correct error behavior but we need something) as the number can be higher than the amount of layers in the fb. However, this code was using the layer calculation from the scene, and this was actually calculated in lp_scene_begin_rasterization() hence too late (so setup was using the value from the _previous_ scene or just zero if it was the first scene). Since the value is used in both rasterization and setup, move calculation up to lp_scene_begin_binning() though it's a bit more inconvenient to calculate there. (Theoretically could move _all_ code which was in lp_scene_begin_rasterization() to there, because ever since we got rid of swizzled render/depth buffers our map functions preparing the fb data for render don't actually change the data in there at all, but it feels like it would be a hack.) 
--- src/gallium/drivers/llvmpipe/lp_scene.c | 25 ++--- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/src/gallium/drivers/llvmpipe/lp_scene.c b/src/gallium/drivers/llvmpipe/lp_scene.c index 2abbd25..483bfa5 100644 --- a/src/gallium/drivers/llvmpipe/lp_scene.c +++ b/src/gallium/drivers/llvmpipe/lp_scene.c @@ -151,7 +151,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene) { const struct pipe_framebuffer_state *fb = scene-fb; int i; - unsigned max_layer = ~0; //LP_DBG(DEBUG_RAST, %s\n, __FUNCTION__); @@ -162,7 +161,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene) cbuf-u.tex.level); scene-cbufs[i].layer_stride = llvmpipe_layer_stride(cbuf-texture, cbuf-u.tex.level); - max_layer = MIN2(max_layer, cbuf-u.tex.last_layer - cbuf-u.tex.first_layer); scene-cbufs[i].map = llvmpipe_resource_map(cbuf-texture, cbuf-u.tex.level, @@ -173,7 +171,6 @@ lp_scene_begin_rasterization(struct lp_scene *scene) struct llvmpipe_resource *lpr = llvmpipe_resource(cbuf-texture); unsigned pixstride = util_format_get_blocksize(cbuf-format); scene-cbufs[i].stride = cbuf-texture-width0; - max_layer = 0; scene-cbufs[i].map = lpr-data; scene-cbufs[i].map += cbuf-u.buf.first_element * pixstride; @@ -184,15 +181,12 @@ lp_scene_begin_rasterization(struct lp_scene *scene) struct pipe_surface *zsbuf = scene-fb.zsbuf; scene-zsbuf.stride = llvmpipe_resource_stride(zsbuf-texture, zsbuf-u.tex.level); scene-zsbuf.layer_stride = llvmpipe_layer_stride(zsbuf-texture, zsbuf-u.tex.level); - max_layer = MIN2(max_layer, zsbuf-u.tex.last_layer - zsbuf-u.tex.first_layer); scene-zsbuf.map = llvmpipe_resource_map(zsbuf-texture, zsbuf-u.tex.level, zsbuf-u.tex.first_layer, LP_TEX_USAGE_READ_WRITE); } - - scene-fb_max_layer = max_layer; } @@ -506,6 +500,9 @@ end: void lp_scene_begin_binning( struct lp_scene *scene, struct pipe_framebuffer_state *fb, boolean discard ) { + int i; + unsigned max_layer = ~0; + assert(lp_scene_is_empty(scene)); scene-discard = discard; @@ -513,9 +510,23 @@ void 
lp_scene_begin_binning( struct lp_scene *scene, scene-tiles_x = align(fb-width, TILE_SIZE) / TILE_SIZE; scene-tiles_y = align(fb-height, TILE_SIZE) / TILE_SIZE; - assert(scene-tiles_x = TILES_X); assert(scene-tiles_y = TILES_Y); + Maybe add a comment here indicating what we're doing. Ok. + for (i = 0; i scene-fb.nr_cbufs; i++) { + struct pipe_surface *cbuf = scene-fb.cbufs[i]; + if (llvmpipe_resource_is_texture(cbuf-texture)) { + max_layer = MIN2(max_layer, cbuf-u.tex.last_layer - cbuf-u.tex.first_layer); + } + else { + max_layer = 0; + } + } + if (fb-zsbuf) { + struct pipe_surface *zsbuf = scene-fb.zsbuf; + max_layer = MIN2(max_layer, zsbuf-u.tex.last_layer - zsbuf-u.tex.first_layer); + } + scene-fb_max_layer = max_layer; Suppose we have a layered color buffer and layered Z/S buffer, but the number of layers differs. Are you sure we shouldn't be using separate max_layers for color vs. Z/S? I believe this should be fine (such a setup is illegal in d3d10 fwiw). I've put a comment already at some point to the fb_max_layer variable in lp_scene.h which reads: /* OpenGL permits different amount of layers per rt, but rendering limited to minimum */ It is legal in OpenGL not only to have different
Re: [Mesa-dev] [PATCH] nv50: implement multisample textures
On 25.10.2013 20:35, Emil Velikov wrote:
> On 21/10/13 23:23, Bryan Cain wrote:
>> This is a port of 4da54c91d24da (nvc0: implement multisample textures) to nv50. When coupled with the patch to only report 16 texture samplers (to fix crashes), all of the Piglit tests in spec/arb_texture_multisample pass.
>
> Hello Bryan,
>
> Big thanks for your work. As promised here is a quick piglit summary on my nv96
>
> pass/fail/crash 69/32/27
>
> * dmesg does not spit anything nouveau related during the tests
> * any geometry shader related tests were skipped (piglit: info: Failed to create GL 3.2 core context)
> * all the crashes are due to the following assert
>   codegen/nv50_ir_emit_nv50.cpp:1393:emitTEX: Assertion `argc <= 4' failed.

I'm not sure how you'd get 4 arguments there (x y layer sample ?). There are no mipmaps for multisample textures.

But either way you're probably going to have to do things by hand: E.g. MS8 textures contain contiguous 4x2 rectangles of samples for each pixel, so you multiply x by 4 and y by 2 to arrive at the sub-rectangle and then add the correct offsets for the sample id as seen in get_sample_position (store the info in a constant buffer, that has to be updated when texture changes).

You might want to use a lookup table like in nve4 compute (look for MS sample coordinate offsets) to map sample id to coordinate offset, that one works for any sample count as long as you don't use the ALT modes (nve4 doesn't need to for textures, but for images/surfaces/UAVs/RATs where the whole VM address calculation is done by hand).

> PASS  arb_texture_multisample-*
> PASS  fb-completeness/*
> FAIL  sample-position/*
> FAIL  texelFetch fs sampler2DMS 4*
> CRASH texelFetch fs sampler2DMSArray 4*
> FAIL  texelFetch/*-*s-isampler2DMS
> CRASH texelFetch/*-*s-isampler2DMSArray
> PASS  textureSize/*
>
> Hope you find this useful :) No real world apps that use multisample textures were tested, yet.
>
> Cheers
> Emil
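To make the MS8 addressing described in this reply concrete, here is a small C sketch: each pixel owns a contiguous 4x2 block of samples, so the pixel coordinate is scaled by (4, 2) and a per-sample offset is added from a lookup table. The table order below is an assumption chosen for illustration (plain row-major); the real hardware sample order is not given in the thread and would have to come from the driver's sample-position data.

```c
#include <assert.h>

struct demo_texel {
    int x;
    int y;
};

/* ASSUMED sample-id to (dx, dy) mapping inside the 4x2 block.
 * Row-major order is a placeholder, not the actual nv50 layout. */
static const struct demo_texel demo_ms8_offsets[8] = {
    {0, 0}, {1, 0}, {2, 0}, {3, 0},
    {0, 1}, {1, 1}, {2, 1}, {3, 1}
};

/* Map (pixel x, pixel y, sample id) to the coordinate of that sample
 * in the underlying single-sample surface. */
static struct demo_texel demo_ms8_texel(int px, int py, unsigned sample)
{
    struct demo_texel t;
    t.x = px * 4 + demo_ms8_offsets[sample & 7].x;
    t.y = py * 2 + demo_ms8_offsets[sample & 7].y;
    return t;
}
```

Keeping the offsets in a table rather than deriving them arithmetically is what lets the same lookup approach cover other sample counts, as the reply suggests.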
Re: [Mesa-dev] [PATCH 1/2] gallium: add PIPE_CAP_MIXED_FRAMEBUFFER_SIZES
Thanks, Marek. Could someone with commit access pick this up? Let me know if you'd like me to reformat/resend/create a git tree/whatever. -ilia On Sun, Oct 13, 2013 at 9:16 AM, Marek Olšák mar...@gmail.com wrote: For the series: Reviewed-by: Marek Olšák marek.ol...@amd.com Marek On Sun, Oct 13, 2013 at 3:43 AM, Ilia Mirkin imir...@alum.mit.edu wrote: ping On Fri, Oct 4, 2013 at 4:32 AM, Ilia Mirkin imir...@alum.mit.edu wrote: This CAP will determine whether ARB_framebuffer_object can be enabled. The nv30 driver does not allow mixing swizzled and linear zsbuf/cbuf textures. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- src/gallium/docs/source/screen.rst | 3 +++ src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 1 + src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/radeonsi_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 1 + src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 3 ++- 14 files changed, 17 insertions(+), 1 deletion(-) diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst index d19cd1a..a01f548 100644 --- a/src/gallium/docs/source/screen.rst +++ b/src/gallium/docs/source/screen.rst @@ -173,6 +173,9 @@ The integer capabilities: viewport/scissor combination. * ''PIPE_CAP_ENDIANNESS``:: The endianness of the device. Either PIPE_ENDIAN_BIG or PIPE_ENDIAN_LITTLE. +* ``PIPE_CAP_MIXED_FRAMEBUFFER_SIZES``: Whether it is allowed to have + different sizes for fb color/zs attachments. This controls whether + ARB_framebuffer_object is provided. .. 
_pipe_capf: diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index a038a77..7d0fb3b 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -140,6 +140,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) switch (param) { /* Supported features (boolean caps). */ case PIPE_CAP_NPOT_TEXTURES: + case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES: case PIPE_CAP_TWO_SIDED_STENCIL: case PIPE_CAP_ANISOTROPIC_FILTER: case PIPE_CAP_POINT_SPRITE: diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c index 556dda8..77607d0 100644 --- a/src/gallium/drivers/i915/i915_screen.c +++ b/src/gallium/drivers/i915/i915_screen.c @@ -172,6 +172,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap cap) /* Supported features (boolean caps). */ case PIPE_CAP_ANISOTROPIC_FILTER: case PIPE_CAP_NPOT_TEXTURES: + case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES: case PIPE_CAP_POINT_SPRITE: case PIPE_CAP_PRIMITIVE_RESTART: /* draw module */ case PIPE_CAP_TEXTURE_SHADOW_MAP: diff --git a/src/gallium/drivers/ilo/ilo_screen.c b/src/gallium/drivers/ilo/ilo_screen.c index 3f8d431..ddf11ff 100644 --- a/src/gallium/drivers/ilo/ilo_screen.c +++ b/src/gallium/drivers/ilo/ilo_screen.c @@ -286,6 +286,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap param) switch (param) { case PIPE_CAP_NPOT_TEXTURES: + case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES: case PIPE_CAP_TWO_SIDED_STENCIL: return true; case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS: diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c index b3cd77f..2bbc2c9 100644 --- a/src/gallium/drivers/llvmpipe/lp_screen.c +++ b/src/gallium/drivers/llvmpipe/lp_screen.c @@ -109,6 +109,7 @@ llvmpipe_get_param(struct pipe_screen *screen, enum pipe_cap param) case PIPE_CAP_MAX_COMBINED_SAMPLERS: return 2 * PIPE_MAX_SAMPLERS; /* VS + FS samplers */ 
case PIPE_CAP_NPOT_TEXTURES: + case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES: return 1; case PIPE_CAP_TWO_SIDED_STENCIL: return 1; diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c b/src/gallium/drivers/nouveau/nv30/nv30_screen.c index 50ddfec..807100e 100644 --- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c @@ -125,6 +125,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_QUERY_PIPELINE_STATISTICS: case PIPE_CAP_TEXTURE_BORDER_COLOR_QUIRK: case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE: + case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES: return 0; case
Re: [Mesa-dev] [PATCH] nv50: implement multisample textures
On 10/25/2013 01:35 PM, Emil Velikov wrote:
> On 21/10/13 23:23, Bryan Cain wrote:
>> This is a port of 4da54c91d24da (nvc0: implement multisample textures) to nv50. When coupled with the patch to only report 16 texture samplers (to fix crashes), all of the Piglit tests in spec/arb_texture_multisample pass.
>
> Hello Bryan,
>
> Big thanks for your work. As promised here is a quick piglit summary on my nv96
>
> pass/fail/crash 69/32/27
>
> * dmesg does not spit anything nouveau related during the tests
> * any geometry shader related tests were skipped (piglit: info: Failed to create GL 3.2 core context)
> * all the crashes are due to the following assert
>   codegen/nv50_ir_emit_nv50.cpp:1393:emitTEX: Assertion `argc <= 4' failed.
>
> PASS  arb_texture_multisample-*
> PASS  fb-completeness/*
> FAIL  sample-position/*
> FAIL  texelFetch fs sampler2DMS 4*
> CRASH texelFetch fs sampler2DMSArray 4*
> FAIL  texelFetch/*-*s-isampler2DMS
> CRASH texelFetch/*-*s-isampler2DMSArray
> PASS  textureSize/*
>
> Hope you find this useful :) No real world apps that use multisample textures were tested, yet.
>
> Cheers
> Emil

Hi Emil,

Thanks for testing on nv96. It seems, though, that I messed up my piglit-run command and didn't include all of the relevant tests as a result. Now that I've fixed that, I'm seeing the same failures and crashes on my nva5.

It seems that multisampling is broken with texelFetch (both the texelFetch and sample-position tests use it) but works otherwise, unless it turns out not to produce the right results in real world applications for pre-nva3 cards. I'm going to take some time this weekend to see what's going on with multisampling and texelFetch.

Thanks again,
Bryan
Re: [Mesa-dev] [PATCH 6/9] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.
Kenneth Graunke kenn...@whitecape.org writes:

> On 10/21/2013 05:55 PM, Eric Anholt wrote:
> [snip]
>> This interface means synchronizing with the GPU, which sucks when we have the ability to actually do DTFB in the hardware pipeline (Indirect Parameter Enable of 3DPRIMITIVE).
>
> It's not that simple. The 3DPRIMITIVE indirect registers require you to specify a vertex count (which should be the number of vertices actually written to the SO buffer, which may be less than you asked for due to overflow).
>
> As far as I can tell, the Gen7 SOL stage has no mechanism to give you the number of vertices written to the SOL buffer. There is SO_NUM_PRIMS_WRITTEN(0-3), which gives you the number of primitives actually written.

*headdesk* OK, so it looks like our hardware is just really not cut out for this job, and the SW fallback's the way to go.
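The mismatch Ken describes (the hardware counts primitives, the draw needs vertices) is easy to bridge on the CPU, which is part of why a software path is simple. The C sketch below shows that arithmetic under the assumption that feedback output is recorded as independent points, lines or triangles, so each primitive contributes a fixed number of vertices. The `demo_*` names are hypothetical and this is not the driver's actual code.

```c
#include <assert.h>

/* Output primitive types as recorded by transform feedback
 * (strips and fans are assumed to be decomposed before capture). */
enum demo_prim_mode {
    DEMO_POINTS,
    DEMO_LINES,
    DEMO_TRIANGLES
};

static unsigned demo_vertices_per_prim(enum demo_prim_mode mode)
{
    switch (mode) {
    case DEMO_POINTS:    return 1;
    case DEMO_LINES:     return 2;
    case DEMO_TRIANGLES: return 3;
    }
    return 0; /* unreachable for valid modes */
}

/* Convert a "primitives written" counter value, as read back from the
 * GPU, into the vertex count that a subsequent draw needs. */
static unsigned long long demo_vertex_count(unsigned long long prims_written,
                                            enum demo_prim_mode mode)
{
    return prims_written * demo_vertices_per_prim(mode);
}
```

Reading the counter back is what forces the CPU-GPU synchronization discussed earlier in the thread; the multiplication itself is trivial.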
Re: [Mesa-dev] [PATCH v2] i965: Make fs gl_PrimitiveID input work even when there's no gs.
Paul Berry stereotype...@gmail.com writes:

> When a geometry shader is present, the fragment shader gl_PrimitiveID input acts like an ordinary varying, receiving data from the gs gl_PrimitiveID output. When there's no geometry shader, we have to ask the fixed function SF hardware to provide the primitive ID to the fragment shader instead.
>
> Previously, the SF setup code would handle this situation by recognizing that the FS gl_PrimitiveID input didn't match to any VS output; since normally an FS input with no corresponding VS output leads to undefined data, the SF setup code used to just arbitrarily assign it to receive data from attribute 0.
>
> This patch changes the SF setup code so that instead of arbitrarily using attribute 0, it assigns the unmatched FS input to receive gl_PrimitiveID. In the case where the FS input really is gl_PrimitiveID, this produces the intended result. In all other cases, no harm is done since GL specifies that the behaviour is undefined.
>
> Fixes piglit test primitive-id-no-gs.
>
> Reviewed-by: Eric Anholt e...@anholt.net

Looks good still.
[Mesa-dev] [PATCH 3/3] i965/fs: Drop no-op shifts involving 0.
I noticed this in a shader in Unigine Heaven that was spilling. While it doesn't really reduce register pressure, it shaves a few instructions anyway (7955 -> 7882).

v2: Fix turning 0 >> x into x instead of 0 (caught by Erik Faye-Lund).
---
 src/glsl/opt_algebraic.cpp | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
index 2e33dfe..a07e153 100644
--- a/src/glsl/opt_algebraic.cpp
+++ b/src/glsl/opt_algebraic.cpp
@@ -346,6 +346,16 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
       }
       break;
 
+   case ir_binop_rshift:
+   case ir_binop_lshift:
+      /* 0 >> x == 0 */
+      if (is_vec_zero(op_const[0]))
+         return ir->operands[0];
+      /* x >> 0 == x */
+      if (is_vec_zero(op_const[1]))
+         return ir->operands[0];
+      break;
+
    case ir_binop_logic_and:
       /* FINISHME: Also simplify (a && a) to (a). */
       if (is_vec_one(op_const[0])) {
-- 
1.8.4.rc3
[Mesa-dev] [PATCH 2/3] glsl: Use ir_builder more in opt_algebraic.
While ir_builder is slightly less efficient, we're only increasing the work when there's actual optimization being done, and it's way more readable code. --- src/glsl/opt_algebraic.cpp | 40 ++-- 1 file changed, 10 insertions(+), 30 deletions(-) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index 8d02cad..2e33dfe 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -219,10 +219,7 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) switch (op_expr[0]-operation) { case ir_unop_abs: case ir_unop_neg: - return new(mem_ctx) ir_expression(ir_unop_abs, - ir-type, - op_expr[0]-operands[0], - NULL); + return abs(op_expr[0]-operands[0]); default: break; } @@ -285,12 +282,8 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) break; case ir_binop_sub: - if (is_vec_zero(op_const[0])) { -return new(mem_ctx) ir_expression(ir_unop_neg, - ir-operands[1]-type, - ir-operands[1], - NULL); - } + if (is_vec_zero(op_const[0])) +return neg(ir-operands[1]); if (is_vec_zero(op_const[1])) return ir-operands[0]; break; @@ -304,18 +297,10 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) if (is_vec_zero(op_const[0]) || is_vec_zero(op_const[1])) return ir_constant::zero(ir, ir-type); - if (is_vec_negative_one(op_const[0])) { - return new(mem_ctx) ir_expression(ir_unop_neg, - ir-operands[1]-type, - ir-operands[1], - NULL); - } - if (is_vec_negative_one(op_const[1])) { - return new(mem_ctx) ir_expression(ir_unop_neg, - ir-operands[0]-type, - ir-operands[0], - NULL); - } + if (is_vec_negative_one(op_const[0])) + return neg(ir-operands[1]); + if (is_vec_negative_one(op_const[1])) + return neg(ir-operands[0]); /* Reassociate multiplication of constants so that we can do @@ -386,11 +371,9 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) } else if (is_vec_zero(op_const[1])) { return ir-operands[0]; } else if (is_vec_one(op_const[0])) { -return new(mem_ctx) ir_expression(ir_unop_logic_not, ir-type, - 
ir-operands[1], NULL); +return logic_not(ir-operands[1]); } else if (is_vec_one(op_const[1])) { -return new(mem_ctx) ir_expression(ir_unop_logic_not, ir-type, - ir-operands[0], NULL); +return logic_not(ir-operands[0]); } break; @@ -428,10 +411,7 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) /* As far as we know, all backends are OK with rsq. */ if (op_expr[0] op_expr[0]-operation == ir_unop_sqrt) { -return new(mem_ctx) ir_expression(ir_unop_rsq, - op_expr[0]-operands[0]-type, - op_expr[0]-operands[0], - NULL); +return rsq(op_expr[0]-operands[0]); } break; -- 1.8.4.rc3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] glsl: Move common code out of opt_algebraic's handle_expression().
Matt and I had each screwed up these common required patterns recently, in ways that wouldn't have been noticed for a long time if not for code review. Just enforce it in the caller so that we don't rely on code review catching these bugs. --- src/glsl/opt_algebraic.cpp | 117 +++-- 1 file changed, 39 insertions(+), 78 deletions(-) diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp index 1351904..8d02cad 100644 --- a/src/glsl/opt_algebraic.cpp +++ b/src/glsl/opt_algebraic.cpp @@ -197,7 +197,6 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) { ir_constant *op_const[4] = {NULL, NULL, NULL, NULL}; ir_expression *op_expr[4] = {NULL, NULL, NULL, NULL}; - ir_expression *temp; unsigned int i; assert(ir-get_num_operands() = 4); @@ -220,12 +219,10 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) switch (op_expr[0]-operation) { case ir_unop_abs: case ir_unop_neg: - this-progress = true; - temp = new(mem_ctx) ir_expression(ir_unop_abs, + return new(mem_ctx) ir_expression(ir_unop_abs, ir-type, op_expr[0]-operands[0], NULL); - return swizzle_if_required(ir, temp); default: break; } @@ -236,8 +233,7 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) break; if (op_expr[0]-operation == ir_unop_neg) { - this-progress = true; - return swizzle_if_required(ir, op_expr[0]-operands[0]); + return op_expr[0]-operands[0]; } break; @@ -264,7 +260,6 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) } if (new_op != ir_unop_logic_not) { -this-progress = true; return new(mem_ctx) ir_expression(new_op, ir-type, op_expr[0]-operands[0], @@ -275,14 +270,10 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) } case ir_binop_add: - if (is_vec_zero(op_const[0])) { -this-progress = true; -return swizzle_if_required(ir, ir-operands[1]); - } - if (is_vec_zero(op_const[1])) { -this-progress = true; -return swizzle_if_required(ir, ir-operands[0]); - } + if (is_vec_zero(op_const[0])) +return ir-operands[1]; + if 
(is_vec_zero(op_const[1])) +return ir-operands[0]; /* Reassociate addition of constants so that we can do constant * folding. @@ -295,48 +286,35 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir) case ir_binop_sub: if (is_vec_zero(op_const[0])) { -this-progress = true; -temp = new(mem_ctx) ir_expression(ir_unop_neg, +return new(mem_ctx) ir_expression(ir_unop_neg, ir-operands[1]-type, ir-operands[1], NULL); -return swizzle_if_required(ir, temp); - } - if (is_vec_zero(op_const[1])) { -this-progress = true; -return swizzle_if_required(ir, ir-operands[0]); } + if (is_vec_zero(op_const[1])) +return ir-operands[0]; break; case ir_binop_mul: - if (is_vec_one(op_const[0])) { -this-progress = true; -return swizzle_if_required(ir, ir-operands[1]); - } - if (is_vec_one(op_const[1])) { -this-progress = true; -return swizzle_if_required(ir, ir-operands[0]); - } + if (is_vec_one(op_const[0])) +return ir-operands[1]; + if (is_vec_one(op_const[1])) +return ir-operands[0]; - if (is_vec_zero(op_const[0]) || is_vec_zero(op_const[1])) { -this-progress = true; + if (is_vec_zero(op_const[0]) || is_vec_zero(op_const[1])) return ir_constant::zero(ir, ir-type); - } + if (is_vec_negative_one(op_const[0])) { - this-progress = true; - temp = new(mem_ctx) ir_expression(ir_unop_neg, + return new(mem_ctx) ir_expression(ir_unop_neg, ir-operands[1]-type, ir-operands[1], NULL); - return swizzle_if_required(ir, temp); } if (is_vec_negative_one(op_const[1])) { - this-progress = true; - temp = new(mem_ctx) ir_expression(ir_unop_neg, + return new(mem_ctx) ir_expression(ir_unop_neg, ir-operands[0]-type, ir-operands[0], NULL); - return swizzle_if_required(ir, temp); } @@ -352,26 +330,20 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
Re: [Mesa-dev] [PATCH] nv50: implement multisample textures
On 10/25/2013 04:11 PM, Christoph Bumiller wrote:
> On 25.10.2013 20:35, Emil Velikov wrote:
>> On 21/10/13 23:23, Bryan Cain wrote:
>>> This is a port of 4da54c91d24da (nvc0: implement multisample textures) to nv50. When coupled with the patch to only report 16 texture samplers (to fix crashes), all of the Piglit tests in spec/arb_texture_multisample pass.
>>
>> Hello Bryan,
>>
>> Big thanks for your work. As promised here is a quick piglit summary on my nv96
>>
>> pass/fail/crash 69/32/27
>>
>> * dmesg does not spit anything nouveau related during the tests
>> * any geometry shader related tests were skipped (piglit: info: Failed to create GL 3.2 core context)
>> * all the crashes are due to the following assert
>>   codegen/nv50_ir_emit_nv50.cpp:1393:emitTEX: Assertion `argc <= 4' failed.
>
> I'm not sure how you'd get 4 arguments there (x y layer sample ?). There are no mipmaps for multisample textures.
>
> But either way you're probably going to have to do things by hand: E.g. MS8 textures contain contiguous 4x2 rectangles of samples for each pixel, so you multiply x by 4 and y by 2 to arrive at the sub-rectangle and then add the correct offsets for the sample id as seen in get_sample_position (store the info in a constant buffer, that has to be updated when texture changes).
>
> You might want to use a lookup table like in nve4 compute (look for MS sample coordinate offsets) to map sample id to coordinate offset, that one works for any sample count as long as you don't use the ALT modes (nve4 doesn't need to for textures, but for images/surfaces/UAVs/RATs where the whole VM address calculation is done by hand).

You're probably right. I don't know why MSAA appears to work for me, but there's probably something wrong with the output that I haven't noticed. I'll work on implementing it properly this weekend.
Re: [Mesa-dev] [PATCH] glsl: Move error message inside validation check reducing duplicate message handling
On 17 October 2013 04:42, Timothy Arceri t_arc...@yahoo.com.au wrote:
> ---
>  src/glsl/ast_to_hir.cpp | 27 ++-
>  1 file changed, 14 insertions(+), 13 deletions(-)
>
> diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
> index dfa32d9..f96ed53 100644
> --- a/src/glsl/ast_to_hir.cpp
> +++ b/src/glsl/ast_to_hir.cpp
> @@ -637,8 +637,8 @@ shift_result_type(const struct glsl_type *type_a,
>   */
>  ir_rvalue *
>  validate_assignment(struct _mesa_glsl_parse_state *state,
> -                    const glsl_type *lhs_type, ir_rvalue *rhs,
> -                    bool is_initializer)
> +                    YYLTYPE loc, const glsl_type *lhs_type,
> +                    ir_rvalue *rhs, bool is_initializer)
>  {
>     /* If there is already some error in the RHS, just return it.  Anything
>      * else will lead to an avalanche of error message back to the user.
> @@ -670,6 +670,12 @@ validate_assignment(struct _mesa_glsl_parse_state *state,
>        return rhs;
>     }
>
> +   _mesa_glsl_error(&loc, state,
> +                    is_initializer ? "initializer" : "value"
> +                    " of type %s cannot be assigned to "
> +                    "variable of type %s",
> +                    rhs->type->name, lhs_type->name);
> +

This doesn't produce the output you want. String concatenation happens at compile time and takes precedence over everything else, so this is being interpreted as:

   _mesa_glsl_error(&loc, state,
                    is_initializer ? "initializer" : "value of type %s cannot be assigned to variable of type %s",
                    rhs->type->name, lhs_type->name);

Adding parenthesis doesn't help because string concatenation only works on string literals. I believe what you actually want is:

   _mesa_glsl_error(&loc, state,
                    "%s of type %s cannot be assigned to "
                    "variable of type %s",
                    is_initializer ? "initializer" : "value",
                    rhs->type->name, lhs_type->name);

With that change, this patch is:

Reviewed-by: Paul Berry stereotype...@gmail.com

Do you have push access? I can push the patch for you (with this change) if you'd like.

>     return NULL;
>  }
> @@ -700,10 +706,10 @@ do_assignment(exec_list *instructions, struct _mesa_glsl_parse_state *state,
>        if (unlikely(expr->operation == ir_binop_vector_extract)) {
>           ir_rvalue *new_rhs =
> -            validate_assignment(state, lhs->type, rhs, is_initializer);
> +            validate_assignment(state, lhs_loc, lhs->type,
> +                                rhs, is_initializer);
>
>           if (new_rhs == NULL) {
> -            _mesa_glsl_error(& lhs_loc, state, "type mismatch");
>              return lhs;
>           } else {
>              rhs = new(ctx) ir_expression(ir_triop_vector_insert,
> @@ -752,10 +758,8 @@ do_assignment(exec_list *instructions, struct _mesa_glsl_parse_state *state,
>     }
>
>     ir_rvalue *new_rhs =
> -      validate_assignment(state, lhs->type, rhs, is_initializer);
> -   if (new_rhs == NULL) {
> -      _mesa_glsl_error(& lhs_loc, state, "type mismatch");
> -   } else {
> +      validate_assignment(state, lhs_loc, lhs->type, rhs, is_initializer);
> +   if (new_rhs != NULL) {
>        rhs = new_rhs;
>
>        /* If the LHS array was not declared with a size, it takes it size from
> @@ -2495,7 +2499,8 @@ process_initializer(ir_variable *var, ast_declaration *decl,
>      */
>     if (type->qualifier.flags.q.constant
>         || type->qualifier.flags.q.uniform) {
> -      ir_rvalue *new_rhs = validate_assignment(state, var->type, rhs, true);
> +      ir_rvalue *new_rhs = validate_assignment(state, initializer_loc,
> +                                               var->type, rhs, true);
>        if (new_rhs != NULL) {
>           rhs = new_rhs;
>
> @@ -2524,10 +2529,6 @@ process_initializer(ir_variable *var, ast_declaration *decl,
>              var->constant_value = constant_value;
>           }
>        } else {
> -         _mesa_glsl_error(&initializer_loc, state,
> -                          "initializer of type %s cannot be assigned to "
> -                          "variable of type %s",
> -                          rhs->type->name, var->type->name);
>           if (var->type->is_numeric()) {
>              /* Reduce cascading errors. */
>              var->constant_value = ir_constant::zero(state, var->type);
> --
> 1.8.3.1
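The review comment above about string-literal concatenation can be reproduced outside Mesa. This is a minimal sketch in plain C that uses snprintf in place of _mesa_glsl_error; `pick_buggy` and `format_mismatch` are hypothetical helper names introduced only for this illustration.

```c
#include <stdio.h>

/* Adjacent string literals are merged before the ?: expression is
 * parsed, so " of type" fuses onto "value" only.  The "initializer"
 * branch never sees the rest of the message. */
static const char *
pick_buggy(int is_initializer)
{
   return is_initializer ? "initializer" : "value" " of type";
}

/* Suggested form: keep the whole format string as one literal and pass
 * the varying word as a %s argument. */
static void
format_mismatch(char *buf, size_t size, int is_initializer,
                const char *rhs_type, const char *lhs_type)
{
   snprintf(buf, size,
            "%s of type %s cannot be assigned to "
            "variable of type %s",
            is_initializer ? "initializer" : "value",
            rhs_type, lhs_type);
}
```

Wrapping the ternary in parentheses would not change `pick_buggy`, since the literals are joined before any expression is evaluated; moving the choice into an argument avoids the problem entirely.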
Re: [Mesa-dev] [PATCH] nv50: implement multisample textures
On 25.10.2013 23:51, Bryan Cain wrote:
> On 10/25/2013 04:11 PM, Christoph Bumiller wrote:
>> [snip]
>
> You're probably right. I don't know why MSAA appears to work for me, but there's probably something wrong with the output that I haven't noticed. I'll work on implementing it properly this weekend.

MSAA itself (rendering and resolving) has been working before, the only thing that ARB_texture_multisample adds is texelFetch from MS resources.
Re: [Mesa-dev] [PATCH] nv50: implement multisample textures
On 10/25/2013 05:05 PM, Christoph Bumiller wrote:
> On 25.10.2013 23:51, Bryan Cain wrote:
>> [snip]
>> You're probably right. I don't know why MSAA appears to work for me, but there's probably something wrong with the output that I haven't noticed. I'll work on implementing it properly this weekend.
>
> MSAA itself (rendering and resolving) has been working before, the only thing that ARB_texture_multisample adds is texelFetch from MS resources.

I really should read an extension's spec carefully before trying to implement it so that I don't waste other people's time. Sorry.

Bryan
Re: [Mesa-dev] [PATCH 6/9] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.
On Fri, Oct 25, 2013 at 10:28 PM, Kenneth Graunke kenn...@whitecape.org wrote:
> On 10/22/2013 04:30 AM, Marek Olšák wrote:
>> On Fri, Oct 18, 2013 at 8:09 AM, Kenneth Graunke kenn...@whitecape.org wrote:
>>> DrawTransformFeedback() needs to obtain the number of vertices written to a particular stream during the last Begin/EndTransformFeedback block. The new driver hook returns exactly that information.
>>>
>>> Gallium drivers already implement this functionality by passing the transform feedback object to the drawing function. I prefer to avoid this for two reasons:
>>>
>>> 1. Complexity: Normally, the drawing function takes an array of _mesa_prim objects, each of which specifies a vertex count. If tfb_vertcount != NULL, however, there will only be one _mesa_prim object with an invalid vertex count (of 1), so it needs to be ignored. Since the _mesa_prim pointers are const, you can't even override it to the proper value; you need to pass around extra "ignore that, here's the real count" parameters. The drawing function is already terribly complicated, so I don't want to make it even more complicated.
>>
>> I don't understand this. Are you saying that the software emulation of the feature is always better because of complexity the real hardware-accelerated solution would have?
>
> On Ivybridge hardware, I think that a GPU-only implementation of DrawTransformFeedback would be very complicated, and probably less efficient than this (extremely simple) software solution.
>
> It might be possible to do a reasonable GPU-only implementation on Haswell, but I haven't looked into the details yet. (See my reply to Eric.)
>
> At least for Ivybridge, I think I want this software path 100% of the time. We may want to remove the stall on Haswell as a later optimization.

I'd like to have a dedicated flag for this fallback like we have Const.PrimitiveRestartInSoftware, in case we need to implement the query for something else.

> It sounds like for Gallium, you already have a decent GPU-only solution. I tried to follow that code to understand how it works, and got lost after jumping through around 5 files...which is probably just my poor understanding of the Gallium architecture.

Gallium doesn't do anything, the interface is pretty much the same as the vbo one.

On the hardware side, there are 4 counters containing the number of bytes written to each TFB buffer. If TFB is started, the counters are set to 0. Every time TFB is ended or paused, the counters are stored for each buffer in memory. When resuming TFB, the counters are simply loaded from memory. When we have to do DrawTransformFeedback, we copy the value of the counter from memory to a special draw register. Since the value is in bytes, we also have to set the TFB buffer stride to another special draw register. That's all. The hardware then calculates count = bytes/stride before drawing.

> [snip]
>>> diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
>>> index 1670409..11bb76a 100644
>>> --- a/src/mesa/vbo/vbo_exec_array.c
>>> +++ b/src/mesa/vbo/vbo_exec_array.c
>>> @@ -1464,6 +1464,12 @@ vbo_draw_transform_feedback(struct gl_context *ctx, GLenum mode,
>>>        return;
>>>     }
>>>
>>> +   if (ctx->Driver.GetTransformFeedbackVertexCount) {
>>> +      GLsizei n = ctx->Driver.GetTransformFeedbackVertexCount(ctx, obj, stream);
>>> +      vbo_draw_arrays(ctx, mode, 0, n, numInstances, 0);
>>> +      return;
>>> +   }
>>
>> As you mentioned, the only issue is with primitive restart, so why is this done even if primitive restart is disabled? Drivers which will have to implement this just to make e.g. non-VBO vertex uploads work will suffer from the CPU-GPU synchronization this code forces.
>>
>> Marek
>
> I hadn't thought about non-VBO vertex uploads. What does Gallium do in that case? Has it just been broken this whole time?

Yes, it has, I completely forgot about it. :(

> I guess I figured drivers would either implement this hook, or do the tfb_vertcount approach, but not both. Maybe that's a bad assumption.

For vertex uploads and vertex fetch fallbacks (where we translate and align vertex buffers to what a gallium driver supports - util/u_vbuf.c), we can use a query like the one you want to add. However, gallium drivers should use the tfb_vertcount approach (AKA pipe_draw_info::count_from_stream_output) whenever they see it's not NULL. Since most Gallium hardware drivers will never see non-VBO vertex data or an unsupported vertex format, it's the only approach they have to implement.

Marek
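The counter handling described above (reset on start, store on pause or end, reload on resume, divide by stride at draw time) can be modelled in a few lines of C. This is only a software sketch of the idea with hypothetical names (`tfb_counter`, `tfb_begin`, and so on); it is not code from any driver, and real hardware keeps the live counter in a register rather than a struct.

```c
#include <stdint.h>

/* Hypothetical model of one transform feedback buffer's byte counter. */
struct tfb_counter {
   uint32_t live_bytes;   /* counter while TFB is active */
   uint32_t saved_bytes;  /* value stored to memory on pause/end */
};

static void tfb_begin(struct tfb_counter *c)        { c->live_bytes = 0; }
static void tfb_pause_or_end(struct tfb_counter *c) { c->saved_bytes = c->live_bytes; }
static void tfb_resume(struct tfb_counter *c)       { c->live_bytes = c->saved_bytes; }

/* Stand-in for the GPU appending vertex data to the buffer. */
static void
tfb_write(struct tfb_counter *c, uint32_t bytes)
{
   c->live_bytes += bytes;
}

/* Vertex count for DrawTransformFeedback: count = bytes / stride. */
static uint32_t
tfb_vertex_count(const struct tfb_counter *c, uint32_t stride)
{
   return stride ? c->saved_bytes / stride : 0;
}
```

The point of doing the division on the GPU is that the CPU never has to read the counter back, which is the CPU-GPU synchronization the software hook would otherwise force.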
Re: [Mesa-dev] [PATCH 1/2] gallium: add PIPE_CAP_MIXED_FRAMEBUFFER_SIZES
I'll do it in a moment. Marek On Fri, Oct 25, 2013 at 11:25 PM, Ilia Mirkin imir...@alum.mit.edu wrote: Thanks, Marek. Could someone with commit access pick this up? Let me know if you'd like me to reformat/resend/create a git tree/whatever. -ilia On Sun, Oct 13, 2013 at 9:16 AM, Marek Olšák mar...@gmail.com wrote: For the series: Reviewed-by: Marek Olšák marek.ol...@amd.com Marek On Sun, Oct 13, 2013 at 3:43 AM, Ilia Mirkin imir...@alum.mit.edu wrote: ping On Fri, Oct 4, 2013 at 4:32 AM, Ilia Mirkin imir...@alum.mit.edu wrote: This CAP will determine whether ARB_framebuffer_object can be enabled. The nv30 driver does not allow mixing swizzled and linear zsbuf/cbuf textures. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- src/gallium/docs/source/screen.rst | 3 +++ src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 1 + src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/radeonsi_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 1 + src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 3 ++- 14 files changed, 17 insertions(+), 1 deletion(-) diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst index d19cd1a..a01f548 100644 --- a/src/gallium/docs/source/screen.rst +++ b/src/gallium/docs/source/screen.rst @@ -173,6 +173,9 @@ The integer capabilities: viewport/scissor combination. * ''PIPE_CAP_ENDIANNESS``:: The endianness of the device. Either PIPE_ENDIAN_BIG or PIPE_ENDIAN_LITTLE. +* ``PIPE_CAP_MIXED_FRAMEBUFFER_SIZES``: Whether it is allowed to have + different sizes for fb color/zs attachments. 
This controls whether + ARB_framebuffer_object is provided. .. _pipe_capf: diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index a038a77..7d0fb3b 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -140,6 +140,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) switch (param) { /* Supported features (boolean caps). */ case PIPE_CAP_NPOT_TEXTURES: + case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES: case PIPE_CAP_TWO_SIDED_STENCIL: case PIPE_CAP_ANISOTROPIC_FILTER: case PIPE_CAP_POINT_SPRITE: diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c index 556dda8..77607d0 100644 --- a/src/gallium/drivers/i915/i915_screen.c +++ b/src/gallium/drivers/i915/i915_screen.c @@ -172,6 +172,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap cap) /* Supported features (boolean caps). */ case PIPE_CAP_ANISOTROPIC_FILTER: case PIPE_CAP_NPOT_TEXTURES: + case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES: case PIPE_CAP_POINT_SPRITE: case PIPE_CAP_PRIMITIVE_RESTART: /* draw module */ case PIPE_CAP_TEXTURE_SHADOW_MAP: diff --git a/src/gallium/drivers/ilo/ilo_screen.c b/src/gallium/drivers/ilo/ilo_screen.c index 3f8d431..ddf11ff 100644 --- a/src/gallium/drivers/ilo/ilo_screen.c +++ b/src/gallium/drivers/ilo/ilo_screen.c @@ -286,6 +286,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap param) switch (param) { case PIPE_CAP_NPOT_TEXTURES: + case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES: case PIPE_CAP_TWO_SIDED_STENCIL: return true; case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS: diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c index b3cd77f..2bbc2c9 100644 --- a/src/gallium/drivers/llvmpipe/lp_screen.c +++ b/src/gallium/drivers/llvmpipe/lp_screen.c @@ -109,6 +109,7 @@ llvmpipe_get_param(struct pipe_screen *screen, enum pipe_cap param) case 
PIPE_CAP_MAX_COMBINED_SAMPLERS: return 2 * PIPE_MAX_SAMPLERS; /* VS + FS samplers */ case PIPE_CAP_NPOT_TEXTURES: + case PIPE_CAP_MIXED_FRAMEBUFFER_SIZES: return 1; case PIPE_CAP_TWO_SIDED_STENCIL: return 1; diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c b/src/gallium/drivers/nouveau/nv30/nv30_screen.c index 50ddfec..807100e 100644 --- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c @@ -125,6 +125,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_QUERY_PIPELINE_STATISTICS: case PIPE_CAP_TEXTURE_BORDER_COLOR_QUIRK: case
Re: [Mesa-dev] [PATCH] R600: Expand vector FSQRT ops
Reviewed-by: Aaron Watry awa...@gmail.com I have tested this on a Radeon 5400 (Cedar), and I just sent a few generated tests to the piglit list. --Aaron On Wed, Oct 23, 2013 at 6:28 PM, Tom Stellard t...@stellard.net wrote: From: Tom Stellard thomas.stell...@amd.com --- lib/Target/R600/AMDGPUISelLowering.cpp | 1 + test/CodeGen/R600/llvm.sqrt.ll | 54 ++ 2 files changed, 55 insertions(+) create mode 100644 test/CodeGen/R600/llvm.sqrt.ll diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp index 91d85d3..52dd010 100644 --- a/lib/Target/R600/AMDGPUISelLowering.cpp +++ b/lib/Target/R600/AMDGPUISelLowering.cpp @@ -181,6 +181,7 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine TM) : setOperationAction(ISD::FFLOOR, VT, Expand); setOperationAction(ISD::FMUL, VT, Expand); setOperationAction(ISD::FRINT, VT, Expand); +setOperationAction(ISD::FSQRT, VT, Expand); setOperationAction(ISD::FSUB, VT, Expand); } } diff --git a/test/CodeGen/R600/llvm.sqrt.ll b/test/CodeGen/R600/llvm.sqrt.ll new file mode 100644 index 000..0d0d186 --- /dev/null +++ b/test/CodeGen/R600/llvm.sqrt.ll @@ -0,0 +1,54 @@ +; RUN: llc %s -march=r600 --mcpu=redwood | FileCheck %s --check-prefix=R600-CHECK +; RUN: llc %s -march=r600 --mcpu=SI | FileCheck %s --check-prefix=SI-CHECK + +; R600-CHECK-LABEL: @sqrt_f32 +; R600-CHECK: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[2].Z +; R600-CHECK: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[2].Z, PS +; SI-CHECK-LABEL: @sqrt_f32 +; SI-CHECK: V_SQRT_F32_e32 +define void @sqrt_f32(float addrspace(1)* %out, float %in) { +entry: + %0 = call float @llvm.sqrt.f32(float %in) + store float %0, float addrspace(1)* %out + ret void +} + +; R600-CHECK-LABEL: @sqrt_v2f32 +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[2].W +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[2].W, PS +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[3].X +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[3].X, PS +; 
SI-CHECK-LABEL: @sqrt_v2f32 +; SI-CHECK: V_SQRT_F32_e32 +; SI-CHECK: V_SQRT_F32_e32 +define void @sqrt_v2f32(2 x float addrspace(1)* %out, 2 x float %in) { +entry: + %0 = call 2 x float @llvm.sqrt.v2f32(2 x float %in) + store 2 x float %0, 2 x float addrspace(1)* %out + ret void +} + +; R600-CHECK-LABEL: @sqrt_v4f32 +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[3].Y +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[3].Y, PS +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[3].Z +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[3].Z, PS +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[3].W +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[3].W, PS +; R600-CHECK-DAG: RECIPSQRT_CLAMPED * T{{[0-9]\.[XYZW]}}, KC0[4].X +; R600-CHECK-DAG: MUL NON-IEEE T{{[0-9]\.[XYZW]}}, KC0[4].X, PS +; SI-CHECK-LABEL: @sqrt_v4f32 +; SI-CHECK: V_SQRT_F32_e32 +; SI-CHECK: V_SQRT_F32_e32 +; SI-CHECK: V_SQRT_F32_e32 +; SI-CHECK: V_SQRT_F32_e32 +define void @sqrt_v4f32(4 x float addrspace(1)* %out, 4 x float %in) { +entry: + %0 = call 4 x float @llvm.sqrt.v4f32(4 x float %in) + store 4 x float %0, 4 x float addrspace(1)* %out + ret void +} + +declare float @llvm.sqrt.f32(float %in) +declare 2 x float @llvm.sqrt.v2f32(2 x float %in) +declare 4 x float @llvm.sqrt.v4f32(4 x float %in) -- 1.7.11.4 ___ llvm-commits mailing list llvm-comm...@cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Ivybridge support for ARB_transform_feedback2
On 10/17/2013 11:09 PM, Kenneth Graunke wrote:
> Here's my implementation of ARB_transform_feedback2. I believe it's complete; it passes all of our Piglit tests and a lot of Intel's oglconform tests.
>
> This should work out of the box on Ivybridge and Baytrail.
>
> It won't work on Haswell at the moment, due to restrictions on register writes (to be solved in a future kernel version). Patch 9 will need to be replaced with something that detects whether or not we can write registers from userspace batchbuffers. In the meantime, I figured I'd send out the rest for review.
>
> Porting this back to Sandybridge is probably doable, but annoying. Sandybridge doesn't have the MI_LOAD_REGISTER_MEM command, so we'd have to map the buffers and use MI_LOAD_REGISTER_IMM. Seems pretty gross. Plus, transform feedback is done very differently pre-Ivybridge. I'm not sure it's worth it, seeing as it's a GL 4.0 feature.

Patches 5, 7, 8, and 9 (with Eric's suggested change) are all

Reviewed-by: Ian Romanick ian.d.roman...@intel.com

I share Eric's concern about patch 4. It sounds like you, Eric, and Marek are trending towards a solution for patch 6, so I'll stay out of it. :)
Re: [Mesa-dev] [PATCH 2/2] implement NV_vdpau_interop v3
On Sun, Oct 20, 2013 at 11:57 AM, Christian König deathsim...@vodafone.de wrote:
> Hi Marek,
>
> I've just sent out a v6 of the patch, please take a second look. Most things are fixed now, but there are still a couple of open issues:
>
>> 3) There should also probably be some checking for GL_ARB_texture_non_power_of_two, but the spec doesn't say what we should do (probably return GL_INVALID_OPERATION).
>
> Actually I think VDPAU holds the answer to this. The specification there states that the different surface creation functions should round up the width/height to supported values (which can then be queried later by the application). So we will always end up with correct values independent of GL_ARB_texture_non_power_of_two.
>
>> 6) Registered and mapped VDPAU textures are not allowed to be re-specified by TexImage, TexSubImage, TexImage*Multisample, CopyTexImage, CopyTexSubImage, TexStorage, TexStorage*Multisample, and similar functions. This should be properly handled in those functions and GL errors should be returned.
>
> I would rather like to avoid touching those functions, because they are not directly related to the spec and I don't want to risk breaking anything there. Would it be valid to set/clear the immutable flag instead (honestly I don't have the slightest idea how the frontend handling works in this code)?

Yes, it seems to be sufficient.

>> 7) The extension spec says that all VDPAU textures should be y-inverted. Is that actually the case here?
>
> Uhm, no idea? It does seem to work, but where is that information stored?

It means that a VDPAU surface is upside-down when it's used as an OpenGL texture. I don't remember whether we need to do a blit or whether OpenGL textures are y-inverted by default (then we don't have to do anything). If we do the same thing as NVIDIA, it's probably okay.

Please review and squash the attached patch with your version 6, and feel free to push it.

Marek

From 1ca52d1ae40fd81276f56e8a61fbed3ad819eb41 Mon Sep 17 00:00:00 2001
From: Marek Olšák marek.ol...@amd.com
Date: Sat, 26 Oct 2013 00:39:52 +0200
Subject: [PATCH] squash this with the vdpau patch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Marek Olšák marek.ol...@amd.com
---
 src/mesa/main/vdpau.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/vdpau.c b/src/mesa/main/vdpau.c
index 157df47..7792b4b 100644
--- a/src/mesa/main/vdpau.c
+++ b/src/mesa/main/vdpau.c
@@ -131,7 +131,7 @@ register_surface(struct gl_context *ctx, GLboolean isOutput,
       return (GLintptr)NULL;
    }

-   surf = MALLOC_STRUCT( vdp_surface );
+   surf = CALLOC_STRUCT( vdp_surface );
    surf->vdpSurface = vdpSurface;
    surf->target = target;
    surf->access = GL_READ_WRITE;
@@ -144,6 +144,7 @@ register_surface(struct gl_context *ctx, GLboolean isOutput,

       _mesa_lock_texture(ctx, tex);
       if (tex->Immutable) {
+         _mesa_unlock_texture(ctx, tex);
          FREE(surf);
          _mesa_error(ctx, GL_INVALID_OPERATION,
                      "VDPAURegisterSurfaceNV(texture is immutable)");
@@ -153,15 +154,18 @@ register_surface(struct gl_context *ctx, GLboolean isOutput,
       if (tex->Target == 0)
          tex->Target = target;
       else if (tex->Target != target) {
+         _mesa_unlock_texture(ctx, tex);
          FREE(surf);
          _mesa_error(ctx, GL_INVALID_OPERATION,
                      "VDPAURegisterSurfaceNV(target mismatch)");
          return (GLintptr)NULL;
       }
+      /* This will disallow respecifying the storage. */
+      tex->Immutable = GL_TRUE;
       _mesa_unlock_texture(ctx, tex);

-      surf->textures[i] = tex;
+      _mesa_reference_texobj(&surf->textures[i], tex);
    }

    _mesa_set_add(ctx->vdpSurfaces, _mesa_hash_pointer(surf), surf);
@@ -223,6 +227,7 @@ _mesa_VDPAUUnregisterSurfaceNV(GLintptr surface)
 {
    struct vdp_surface *surf = (struct vdp_surface *)surface;
    struct set_entry *entry;
+   int i;
    GET_CURRENT_CONTEXT(ctx);

    if (!ctx->vdpDevice || !ctx->vdpGetProcAddress || !ctx->vdpSurfaces) {
@@ -240,6 +245,13 @@ _mesa_VDPAUUnregisterSurfaceNV(GLintptr surface)
       return;
    }

+   for (i = 0; i < MAX_TEXTURES; i++) {
+      if (surf->textures[i]) {
+         surf->textures[i]->Immutable = GL_FALSE;
+         _mesa_reference_texobj(&surf->textures[i], NULL);
+      }
+   }
+
    _mesa_set_remove(ctx->vdpSurfaces, entry);
    FREE(surf);
 }
--
1.8.1.2
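The three things the squash patch above changes (unlock on every error path, mark the texture immutable while it is registered, and hold a reference until unregistration) can be shown with a toy model. The `struct tex` below is a hypothetical stand-in, not Mesa's gl_texture_object, and the boolean `locked` field only imitates a mutex so the invariant is easy to check.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a texture object. */
struct tex {
   bool locked;
   bool immutable;
   int target;      /* 0 means "not yet specified" */
   int refcount;
};

/* Returns true on success.  Every exit path releases the lock. */
static bool
register_tex(struct tex *t, int target)
{
   t->locked = true;
   if (t->immutable || (t->target != 0 && t->target != target)) {
      t->locked = false;   /* error path: unlock before bailing out */
      return false;
   }
   if (t->target == 0)
      t->target = target;
   t->immutable = true;    /* blocks later re-specification of storage */
   t->locked = false;
   t->refcount++;          /* the surface now holds a reference */
   return true;
}

/* Undo registration: storage may be respecified again, reference dropped. */
static void
unregister_tex(struct tex *t)
{
   t->immutable = false;
   t->refcount--;
}
```

One consequence of reusing the immutable flag this way, visible even in the toy model, is that a texture that was already immutable for another reason cannot be registered at all, which matches the existing check in the patch.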
[Mesa-dev] [PATCH V2 00/12] Implement GL_ARB_sample_shading on Intel hardware
Patches listed below implement the GL_ARB_sample_shading extension on Intel hardware >= gen6. I verified the implementation with a number of piglit tests, currently under review on piglit mailing list. Observed no piglit, gles3 CTS regressions with these patches on SNB, IVB & HSW.

These patches can also be found at my github branch:
https://github.com/aphogat/mesa.git
branch: sample-shading-8

This is the V2 of the series I posted earlier. "[PATCH 5/8] i965: Implement FS backend for ARB_sample_shading" in my original series is split in to 3 patches here. Changes in individual patches are listed in commit message.

Following patches in this series need a 'reviewed-by':
4/12, 6/12, 7/12, 8/12, 9/12, 10/12, 11,12

Anuj Phogat (12):
  mesa: Add infrastructure for GL_ARB_sample_shading
  mesa: Add new functions and enums required by GL_ARB_sample_shading
  mesa: Pass number of samples as a program state variable
  mesa: Add a helper function _mesa_get_min_invocations_per_fragment()
  glsl: Add new builtins required by GL_ARB_sample_shading
  i965: Don't do vector splitting for ir_var_system_value
  i965: Add FS backend for builtin gl_SamplePosition
  i965: Add FS backend for builtin gl_SampleID
  i965: Add FS backend for builtin gl_SampleMask[]
  i965/gen6: Enable the features required for GL_ARB_sample_shading
  i965/gen7: Enable the features required for GL_ARB_sample_shading
  i965: Enable ARB_sample_shading on intel hardware >= gen6

 src/glsl/builtin_variables.cpp                     |  18
 src/glsl/glcpp/glcpp-parse.y                       |   3 +
 src/glsl/glsl_parser_extras.cpp                    |   1 +
 src/glsl/glsl_parser_extras.h                      |   2 +
 src/glsl/standalone_scaffolding.cpp                |   1 +
 src/mapi/glapi/gen/ARB_sample_shading.xml          |  19
 src/mapi/glapi/gen/GL4x.xml                        |  21
 src/mapi/glapi/gen/Makefile.am                     |   4 +-
 src/mapi/glapi/gen/gl_API.xml                      |   3 +-
 src/mesa/drivers/dri/i965/brw_context.h            |   2 +
 src/mesa/drivers/dri/i965/brw_defines.h            |   2 +
 src/mesa/drivers/dri/i965/brw_fs.cpp               | 114 +
 src/mesa/drivers/dri/i965/brw_fs.h                 |  14 +++
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp     |  50 +
 .../drivers/dri/i965/brw_fs_vector_splitting.cpp   |   1 +
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp       |  22
 src/mesa/drivers/dri/i965/brw_wm.c                 |  12 +++
 src/mesa/drivers/dri/i965/brw_wm.h                 |   3 +
 src/mesa/drivers/dri/i965/gen6_wm_state.c          |  52 +-
 src/mesa/drivers/dri/i965/gen7_wm_state.c          |  53 +-
 src/mesa/drivers/dri/i965/intel_extensions.c       |   1 +
 src/mesa/main/enable.c                             |  16 +++
 src/mesa/main/extensions.c                         |   1 +
 src/mesa/main/get.c                                |   8 ++
 src/mesa/main/get_hash_params.py                   |   3 +
 src/mesa/main/mtypes.h                             |  13 ++-
 src/mesa/main/multisample.c                        |  18
 src/mesa/main/multisample.h                        |   2 +
 src/mesa/main/tests/dispatch_sanity.cpp            |   2 +-
 src/mesa/program/prog_print.c                      |   1 +
 src/mesa/program/prog_statevars.c                  |  11 ++
 src/mesa/program/prog_statevars.h                  |   2 +
 src/mesa/program/program.c                         |  31 ++
 src/mesa/program/program.h                         |   3 +
 34 files changed, 499 insertions(+), 10 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_sample_shading.xml
 create mode 100644 src/mapi/glapi/gen/GL4x.xml

--
1.8.1.4
[Mesa-dev] [PATCH V2 01/12] mesa: Add infrastructure for GL_ARB_sample_shading
This patch implements the common support code required for the GL_ARB_sample_shading extension. V2: Move GL_ARB_sample_shading to ARB extension list. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Reviewed-by: Ian Romanick i...@freedesktop.org Reviewed-by: Ken Graunke kenn...@whitecape.org --- src/glsl/glcpp/glcpp-parse.y| 3 +++ src/glsl/glsl_parser_extras.cpp | 1 + src/glsl/glsl_parser_extras.h | 2 ++ src/glsl/standalone_scaffolding.cpp | 1 + src/mesa/main/extensions.c | 1 + src/mesa/main/mtypes.h | 1 + 6 files changed, 9 insertions(+) diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y index 02100ab..5141bdd 100644 --- a/src/glsl/glcpp/glcpp-parse.y +++ b/src/glsl/glcpp/glcpp-parse.y @@ -1249,6 +1249,9 @@ glcpp_parser_create (const struct gl_extensions *extensions, int api) if (extensions-ARB_shading_language_420pack) add_builtin_define(parser, GL_ARB_shading_language_420pack, 1); + if (extensions-ARB_sample_shading) +add_builtin_define(parser, GL_ARB_sample_shading, 1); + if (extensions-EXT_shader_integer_mix) add_builtin_define(parser, GL_EXT_shader_integer_mix, 1); diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp index be17109..669f531 100644 --- a/src/glsl/glsl_parser_extras.cpp +++ b/src/glsl/glsl_parser_extras.cpp @@ -533,6 +533,7 @@ static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = { EXT(AMD_vertex_shader_layer,true, false, AMD_vertex_shader_layer), EXT(EXT_shader_integer_mix, true, true, EXT_shader_integer_mix), EXT(ARB_texture_gather, true, false, ARB_texture_gather), + EXT(ARB_sample_shading, true, false, ARB_sample_shading), }; #undef EXT diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h index a674384..872dd80 100644 --- a/src/glsl/glsl_parser_extras.h +++ b/src/glsl/glsl_parser_extras.h @@ -323,6 +323,8 @@ struct _mesa_glsl_parse_state { bool AMD_vertex_shader_layer_warn; bool ARB_shading_language_420pack_enable; bool ARB_shading_language_420pack_warn; + 
bool ARB_sample_shading_enable; + bool ARB_sample_shading_warn; bool EXT_shader_integer_mix_enable; bool EXT_shader_integer_mix_warn; /*@}*/ diff --git a/src/glsl/standalone_scaffolding.cpp b/src/glsl/standalone_scaffolding.cpp index 7a1cf68..cbff6d1 100644 --- a/src/glsl/standalone_scaffolding.cpp +++ b/src/glsl/standalone_scaffolding.cpp @@ -97,6 +97,7 @@ void initialize_context_to_defaults(struct gl_context *ctx, gl_api api) ctx-Extensions.ARB_explicit_attrib_location = true; ctx-Extensions.ARB_fragment_coord_conventions = true; ctx-Extensions.ARB_gpu_shader5 = true; + ctx-Extensions.ARB_sample_shading = true; ctx-Extensions.ARB_shader_bit_encoding = true; ctx-Extensions.ARB_shader_stencil_export = true; ctx-Extensions.ARB_shader_texture_lod = true; diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c index e8e0a20..f3300e3 100644 --- a/src/mesa/main/extensions.c +++ b/src/mesa/main/extensions.c @@ -118,6 +118,7 @@ static const struct extension extension_table[] = { { GL_ARB_point_sprite,o(ARB_point_sprite), GL, 2003 }, { GL_ARB_provoking_vertex,o(EXT_provoking_vertex), GL, 2009 }, { GL_ARB_robustness, o(dummy_true), GL, 2010 }, + { GL_ARB_sample_shading, o(ARB_sample_shading), GL, 2009 }, { GL_ARB_sampler_objects, o(dummy_true), GL, 2009 }, { GL_ARB_seamless_cube_map, o(ARB_seamless_cube_map), GL, 2009 }, { GL_ARB_shader_bit_encoding, o(ARB_shader_bit_encoding), GL, 2010 }, diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 6374e8c..67f1bf6 100644 --- a/src/mesa/main/mtypes.h +++ b/src/mesa/main/mtypes.h @@ -3198,6 +3198,7 @@ struct gl_extensions GLboolean ARB_occlusion_query; GLboolean ARB_occlusion_query2; GLboolean ARB_point_sprite; + GLboolean ARB_sample_shading; GLboolean ARB_seamless_cube_map; GLboolean ARB_shader_bit_encoding; GLboolean ARB_shader_stencil_export; -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH V2 02/12] mesa: Add new functions and enums required by GL_ARB_sample_shading
New functions added by GL_ARB_sample_shading: glMinSampleShadingARB() New enums: GL_SAMPLE_SHADING_ARB GL_MIN_SAMPLE_SHADING_VALUE_ARB V2: Update comments. Create new GL4x.xml. Remove redundant code in get.c. Update the API_XML list in Makefile.am. Add extra_gl40_ARB_sample_shading predicate to get.c. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Reviewed-by: Ken Graunke kenn...@whitecape.org --- src/mapi/glapi/gen/ARB_sample_shading.xml | 19 +++ src/mapi/glapi/gen/GL4x.xml | 21 + src/mapi/glapi/gen/Makefile.am| 4 +++- src/mapi/glapi/gen/gl_API.xml | 3 ++- src/mesa/main/enable.c| 16 src/mesa/main/get.c | 8 src/mesa/main/get_hash_params.py | 3 +++ src/mesa/main/mtypes.h| 2 ++ src/mesa/main/multisample.c | 18 ++ src/mesa/main/multisample.h | 2 ++ src/mesa/main/tests/dispatch_sanity.cpp | 2 +- 11 files changed, 95 insertions(+), 3 deletions(-) create mode 100644 src/mapi/glapi/gen/ARB_sample_shading.xml create mode 100644 src/mapi/glapi/gen/GL4x.xml diff --git a/src/mapi/glapi/gen/ARB_sample_shading.xml b/src/mapi/glapi/gen/ARB_sample_shading.xml new file mode 100644 index 000..a87a517 --- /dev/null +++ b/src/mapi/glapi/gen/ARB_sample_shading.xml @@ -0,0 +1,19 @@ +?xml version=1.0? +!DOCTYPE OpenGLAPI SYSTEM gl_API.dtd + +!-- Note: no GLX protocol info yet. -- + +OpenGLAPI + +category name=GL_ARB_sample_shading number=70 + + enum name=SAMPLE_SHADING_ARB value=0x8C36/ + enum name=MIN_SAMPLE_SHADING_VALUE_ARBvalue=0x8C37/ + + function name=MinSampleShadingARB alias=MinSampleShading + param name=value type=GLclampf/ + /function + +/category + +/OpenGLAPI diff --git a/src/mapi/glapi/gen/GL4x.xml b/src/mapi/glapi/gen/GL4x.xml new file mode 100644 index 000..367741f --- /dev/null +++ b/src/mapi/glapi/gen/GL4x.xml @@ -0,0 +1,21 @@ +?xml version=1.0? +!DOCTYPE OpenGLAPI SYSTEM gl_API.dtd + +!-- Note: no GLX protocol info yet. 
-- + +OpenGLAPI + +category name=4.0 + enum name=SAMPLE_SHADING value=0x8C36/ + enum name=MIN_SAMPLE_SHADING_VALUEvalue=0x8C37/ + + function name=MinSampleShading offset=assign +param name=value type=GLclampf/ + /function +/category + +category name=4.3 + +/category + +/OpenGLAPI diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am index d71d5d2..b8d280c 100644 --- a/src/mapi/glapi/gen/Makefile.am +++ b/src/mapi/glapi/gen/Makefile.am @@ -108,6 +108,7 @@ API_XML = \ ARB_invalidate_subdata.xml \ ARB_map_buffer_range.xml \ ARB_robustness.xml \ + ARB_sample_shading.xml \ ARB_sampler_objects.xml \ ARB_seamless_cube_map.xml \ ARB_sync.xml \ @@ -142,7 +143,8 @@ API_XML = \ NV_primitive_restart.xml \ NV_texture_barrier.xml \ OES_EGL_image.xml \ - GL3x.xml + GL3x.xml \ + GL4x.xml COMMON = $(API_XML) \ diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml index 48fce36..f0eea9b 100644 --- a/src/mapi/glapi/gen/gl_API.xml +++ b/src/mapi/glapi/gen/gl_API.xml @@ -8187,7 +8187,7 @@ xi:include href=ARB_draw_buffers_blend.xml xmlns:xi=http://www.w3.org/2001/XInclude/ xi:include href=AMD_draw_buffers_blend.xml xmlns:xi=http://www.w3.org/2001/XInclude/ -!-- 70. GL_ARB_sample_shading -- +xi:include href=ARB_sample_shading.xml xmlns:xi=http://www.w3.org/2001/XInclude/ xi:include href=ARB_texture_cube_map_array.xml xmlns:xi=http://www.w3.org/2001/XInclude/ xi:include href=ARB_texture_gather.xml xmlns:xi=http://www.w3.org/2001/XInclude/ !-- 73. 
GL_ARB_texture_query_lod -- @@ -13150,4 +13150,5 @@ xi:include href=EXT_transform_feedback.xml xmlns:xi=http://www.w3.org/2001/XInclude/ +xi:include href=GL4x.xml xmlns:xi=http://www.w3.org/2001/XInclude/ /OpenGLAPI diff --git a/src/mesa/main/enable.c b/src/mesa/main/enable.c index 5e2fd80..c9ccfd2 100644 --- a/src/mesa/main/enable.c +++ b/src/mesa/main/enable.c @@ -802,6 +802,15 @@ _mesa_set_enable(struct gl_context *ctx, GLenum cap, GLboolean state) ctx-Multisample.SampleCoverageInvert = state; break; + /* GL_ARB_sample_shading */ + case GL_SAMPLE_SHADING: + CHECK_EXTENSION(ARB_sample_shading, cap); + if (ctx-Multisample.SampleShading == state) +return; + FLUSH_VERTICES(ctx, _NEW_MULTISAMPLE); + ctx-Multisample.SampleShading = state; + break; + /* GL_IBM_rasterpos_clip */ case GL_RASTER_POSITION_UNCLIPPED_IBM: if (ctx-API != API_OPENGL_COMPAT) @@ -1594,6 +1603,13 @@ _mesa_IsEnabled( GLenum cap ) CHECK_EXTENSION(ARB_texture_multisample); return ctx-Multisample.SampleMask; + /* ARB_sample_shading */ + case
[Mesa-dev] [PATCH V2 03/12] mesa: Pass number of samples as a program state variable
Number of samples will be required in fragment shader program by new GLSL builtin uniform gl_NumSamples. V2: Use state.numsamples in place of state.num.samples Use _NEW_BUFFERS flag in place of _NEW_MULTISAMPLE Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Reviewed-by: Ian Romanick i...@freedesktop.org Reviewed-by: Ken Graunke kenn...@whitecape.org Reviewed-by: Paul Berry stereotype...@gmail.com --- src/mesa/program/prog_statevars.c | 11 +++ src/mesa/program/prog_statevars.h | 2 ++ 2 files changed, 13 insertions(+) diff --git a/src/mesa/program/prog_statevars.c b/src/mesa/program/prog_statevars.c index 145c07c..f6fd535 100644 --- a/src/mesa/program/prog_statevars.c +++ b/src/mesa/program/prog_statevars.c @@ -349,6 +349,9 @@ _mesa_fetch_state(struct gl_context *ctx, const gl_state_index state[], } } return; + case STATE_NUM_SAMPLES: + ((int *)value)[0] = ctx-DrawBuffer-Visual.samples; + return; case STATE_DEPTH_RANGE: value[0] = ctx-Viewport.Near; /* near */ value[1] = ctx-Viewport.Far; /* far*/ @@ -665,6 +668,9 @@ _mesa_program_state_flags(const gl_state_index state[STATE_LENGTH]) case STATE_PROGRAM_MATRIX: return _NEW_TRACK_MATRIX; + case STATE_NUM_SAMPLES: + return _NEW_BUFFERS; + case STATE_DEPTH_RANGE: return _NEW_VIEWPORT; @@ -852,6 +858,9 @@ append_token(char *dst, gl_state_index k) case STATE_TEXENV_COLOR: append(dst, texenv); break; + case STATE_NUM_SAMPLES: + append(dst, numsamples); + break; case STATE_DEPTH_RANGE: append(dst, depth.range); break; @@ -1027,6 +1036,8 @@ _mesa_program_state_string(const gl_state_index state[STATE_LENGTH]) break; case STATE_FOG_COLOR: break; + case STATE_NUM_SAMPLES: + break; case STATE_DEPTH_RANGE: break; case STATE_FRAGMENT_PROGRAM: diff --git a/src/mesa/program/prog_statevars.h b/src/mesa/program/prog_statevars.h index ec22b73..c3081c4 100644 --- a/src/mesa/program/prog_statevars.h +++ b/src/mesa/program/prog_statevars.h @@ -103,6 +103,8 @@ typedef enum gl_state_index_ { STATE_TEXENV_COLOR, + STATE_NUM_SAMPLES, + 
STATE_DEPTH_RANGE, STATE_VERTEX_PROGRAM, -- 1.8.1.4
[Mesa-dev] [PATCH 04/12] mesa: Add a helper function _mesa_get_min_invocations_per_fragment()
This function is used to test if we need to do per-sample shading or per-fragment shading. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/program/program.c | 31 +++ src/mesa/program/program.h | 3 +++ 2 files changed, 34 insertions(+) diff --git a/src/mesa/program/program.c b/src/mesa/program/program.c index 093d372..e12e6ec 100644 --- a/src/mesa/program/program.c +++ b/src/mesa/program/program.c @@ -1024,3 +1024,34 @@ _mesa_postprocess_program(struct gl_context *ctx, struct gl_program *prog) } } + +/* Gets the minimum number of shader invocations per fragment. + * This function is useful to determine if we need to do per + * sample shading or per fragment shading. + */ +GLint +_mesa_get_min_invocations_per_fragment(struct gl_context *ctx, + const struct gl_fragment_program *prog) +{ + /* From ARB_sample_shading specification: +* Using gl_SampleID in a fragment shader causes the entire shader +* to be evaluated per-sample. +* +* Using gl_SamplePosition in a fragment shader causes the entire +* shader to be evaluated per-sample. +* +* If MULTISAMPLE or SAMPLE_SHADING_ARB is disabled, sample shading +* has no effect. 
+*/ + if (ctx->Multisample.Enabled) { + if (prog->Base.SystemValuesRead & SYSTEM_BIT_SAMPLE_ID || + prog->Base.SystemValuesRead & SYSTEM_BIT_SAMPLE_POS) + return ctx->DrawBuffer->Visual.samples; + else if (ctx->Multisample.SampleShading) + return ceil(ctx->Multisample.MinSampleShadingValue * + ctx->DrawBuffer->Visual.samples); + else + return 1; + } + return 1; +} diff --git a/src/mesa/program/program.h b/src/mesa/program/program.h index 34965ab..353ccab 100644 --- a/src/mesa/program/program.h +++ b/src/mesa/program/program.h @@ -187,6 +187,9 @@ _mesa_valid_register_index(const struct gl_context *ctx, extern void _mesa_postprocess_program(struct gl_context *ctx, struct gl_program *prog); +extern GLint +_mesa_get_min_invocations_per_fragment(struct gl_context *ctx, + const struct gl_fragment_program *prog); static inline GLuint _mesa_program_target_to_index(GLenum v) -- 1.8.1.4
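For readers following the series, the decision made by _mesa_get_min_invocations_per_fragment() can be sketched as standalone C. This is only an illustration, not the Mesa code: struct sample_state and its fields are hypothetical stand-ins for the gl_context and gl_fragment_program members the patch reads, and ceil_to_int() is a local replacement for ceil() so the sketch needs no math library.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the helper in this patch reads. */
struct sample_state {
   bool multisample_enabled;     /* ctx->Multisample.Enabled */
   bool sample_shading_enabled;  /* ctx->Multisample.SampleShading */
   float min_sample_shading;     /* ctx->Multisample.MinSampleShadingValue */
   int samples;                  /* ctx->DrawBuffer->Visual.samples */
   bool reads_sample_id;         /* shader reads gl_SampleID */
   bool reads_sample_pos;        /* shader reads gl_SamplePosition */
};

/* Ceiling for non-negative values; assumed helper, replaces ceil(). */
static int
ceil_to_int(float x)
{
   int n = (int) x;                      /* truncates toward zero */
   return ((float) n < x) ? n + 1 : n;
}

/* Mirrors the branch structure of the patch. */
static int
min_invocations_per_fragment(const struct sample_state *s)
{
   if (!s->multisample_enabled)
      return 1;
   if (s->reads_sample_id || s->reads_sample_pos)
      return s->samples;                 /* whole shader runs per sample */
   if (s->sample_shading_enabled)
      return ceil_to_int(s->min_sample_shading * (float) s->samples);
   return 1;
}
```

A result greater than 1 is what the later i965 patches in this series use to select per-sample dispatch.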
[Mesa-dev] [PATCH V2 05/12] glsl: Add new builtins required by GL_ARB_sample_shading
New builtins added by GL_ARB_sample_shading: in vec2 gl_SamplePosition in int gl_SampleID in int gl_NumSamples out int gl_SampleMask[] V2: - Use SWIZZLE_ for STATE_NUM_SAMPLES. - Use result.samplemask in arb_output_attrib_string. - Add comment to explain the size of gl_SampleMask[] array. - Make gl_SampleID and gl_SamplePosition system values. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Reviewed-by: Paul Berry stereotype...@gmail.com --- src/glsl/builtin_variables.cpp | 18 ++ src/mesa/main/mtypes.h | 10 +- src/mesa/program/prog_print.c | 1 + 3 files changed, 28 insertions(+), 1 deletion(-) diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp index 64f3406..bb3d057 100644 --- a/src/glsl/builtin_variables.cpp +++ b/src/glsl/builtin_variables.cpp @@ -30,6 +30,9 @@ #include program/prog_statevars.h #include program/prog_instruction.h +static struct gl_builtin_uniform_element gl_NumSamples_elements[] = { + {NULL, {STATE_NUM_SAMPLES, 0, 0}, SWIZZLE_} +}; static struct gl_builtin_uniform_element gl_DepthRange_elements[] = { {near, {STATE_DEPTH_RANGE, 0, 0}, SWIZZLE_}, @@ -236,6 +239,7 @@ static struct gl_builtin_uniform_element gl_NormalMatrix_elements[] = { #define STATEVAR(name) {#name, name ## _elements, Elements(name ## _elements)} static const struct gl_builtin_uniform_desc _mesa_builtin_uniform_desc[] = { + STATEVAR(gl_NumSamples), STATEVAR(gl_DepthRange), STATEVAR(gl_ClipPlane), STATEVAR(gl_Point), @@ -645,6 +649,7 @@ builtin_variable_generator::generate_constants() void builtin_variable_generator::generate_uniforms() { + add_uniform(int_t, gl_NumSamples); add_uniform(type(gl_DepthRangeParameters), gl_DepthRange); add_uniform(array(vec4_t, VERT_ATTRIB_MAX), gl_CurrentAttribVertMESA); add_uniform(array(vec4_t, VARYING_SLOT_MAX), gl_CurrentAttribFragMESA); @@ -821,6 +826,19 @@ builtin_variable_generator::generate_fs_special_vars() if (state-AMD_shader_stencil_export_warn) var-warn_extension = GL_AMD_shader_stencil_export; } + + if 
(state-ARB_sample_shading_enable) { + add_system_value(SYSTEM_VALUE_SAMPLE_ID, int_t, gl_SampleID); + add_system_value(SYSTEM_VALUE_SAMPLE_POS, vec2_t, gl_SamplePosition); + /* From the ARB_sample_shading specification: + * The number of elements in the array is ceil(s/32), where s + * is the maximum number of color samples supported by the + * implementation. + * Since no drivers expose more than 32x MSAA, we can simply set + * the array size to 1 rather than computing it. + */ + add_output(FRAG_RESULT_SAMPLE_MASK, array(int_t, 1), gl_SampleMask); + } } diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 8306969..869470e 100644 --- a/src/mesa/main/mtypes.h +++ b/src/mesa/main/mtypes.h @@ -274,6 +274,11 @@ typedef enum #define VARYING_BIT_VAR(V) BITFIELD64_BIT(VARYING_SLOT_VAR0 + (V)) /*@}*/ +/** + * Bitflags for system values. + */ +#define SYSTEM_BIT_SAMPLE_ID BITFIELD64_BIT(SYSTEM_VALUE_SAMPLE_ID) +#define SYSTEM_BIT_SAMPLE_POS BITFIELD64_BIT(SYSTEM_VALUE_SAMPLE_POS) /** * Determine if the given gl_varying_slot appears in the fragment shader. @@ -306,12 +311,13 @@ typedef enum * register is written. No FRAG_RESULT_DATAn will be written. */ FRAG_RESULT_COLOR = 2, + FRAG_RESULT_SAMPLE_MASK = 3, /* FRAG_RESULT_DATAn are the per-render-target (GLSL gl_FragData[n] * or ARB_fragment_program fragment.color[n]) color results. If * any are written, FRAG_RESULT_COLOR will not be written. 
*/ - FRAG_RESULT_DATA0 = 3, + FRAG_RESULT_DATA0 = 4, FRAG_RESULT_MAX = (FRAG_RESULT_DATA0 + MAX_DRAW_BUFFERS) } gl_frag_result; @@ -1904,6 +1910,8 @@ typedef enum SYSTEM_VALUE_FRONT_FACE, /** Fragment shader only (not done yet) */ SYSTEM_VALUE_VERTEX_ID, /** Vertex shader only */ SYSTEM_VALUE_INSTANCE_ID, /** Vertex shader only */ + SYSTEM_VALUE_SAMPLE_ID, /** Fragment shader only */ + SYSTEM_VALUE_SAMPLE_POS, /** Fragment shader only */ SYSTEM_VALUE_MAX /** Number of values */ } gl_system_value; diff --git a/src/mesa/program/prog_print.c b/src/mesa/program/prog_print.c index cf85213..fa9063f 100644 --- a/src/mesa/program/prog_print.c +++ b/src/mesa/program/prog_print.c @@ -311,6 +311,7 @@ arb_output_attrib_string(GLint index, GLenum progType) "result.depth", /* FRAG_RESULT_DEPTH */ "result.(one)", /* FRAG_RESULT_STENCIL */ "result.color", /* FRAG_RESULT_COLOR */ + "result.samplemask", /* FRAG_RESULT_SAMPLE_MASK */ "result.color[0]", /* FRAG_RESULT_DATA0 (named for GLSL's gl_FragData) */ "result.color[1]", "result.color[2]", -- 1.8.1.4
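The comment in this patch quotes the rule that gl_SampleMask[] has ceil(s/32) elements and then hard-codes a size of 1. A minimal sketch of the general computation, assuming a hypothetical helper name (sample_mask_array_size is not a Mesa function), would be:

```c
/* Number of 32-bit words needed for gl_SampleMask[], i.e. ceil(s / 32),
 * where s is the maximum number of color samples supported.
 * Hypothetical helper, written only to illustrate the spec wording.
 */
static unsigned
sample_mask_array_size(unsigned max_samples)
{
   return (max_samples + 31u) / 32u;
}
```

For any sample count up to 32 this yields 1, which is why the patch can use a fixed array size.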
[Mesa-dev] [PATCH 06/12] i965: Don't do vector splitting for ir_var_system_value
This is required when adding builtin system value vec{2, 3, 4} variables. For example: (declare (sys) vec2 gl_SamplePosition) Without this patch, the GLSL IR above is split into: (declare (temporary) float gl_SamplePosition_x) (declare (temporary) float gl_SamplePosition_y) Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp b/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp index eb7851b..6284b59 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp @@ -111,6 +111,7 @@ ir_vector_reference_visitor::get_variable_entry(ir_variable *var) case ir_var_uniform: case ir_var_shader_in: case ir_var_shader_out: + case ir_var_system_value: case ir_var_function_in: case ir_var_function_out: case ir_var_function_inout: -- 1.8.1.4
[Mesa-dev] [PATCH V2 07/12] i965: Add FS backend for builtin gl_SamplePosition
V2: - Update comments. - Make changes to support simd16 mode. - Add compute_pos_offset variable in brw_wm_prog_key. - Add variable uses_omask in brw_wm_prog_data. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/drivers/dri/i965/brw_context.h | 1 + src/mesa/drivers/dri/i965/brw_fs.cpp | 65 src/mesa/drivers/dri/i965/brw_fs.h | 2 + src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 5 +++ src/mesa/drivers/dri/i965/brw_wm.c | 6 +++ src/mesa/drivers/dri/i965/brw_wm.h | 2 + 6 files changed, 81 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 3b95922..d16f257 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -380,6 +380,7 @@ struct brw_wm_prog_data { GLuint nr_params; /** number of float params/constants */ GLuint nr_pull_params; bool dual_src_blend; + bool uses_pos_offset; uint32_t prog_offset_16; /** diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 65a4b66..0f8213e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -1118,6 +1118,64 @@ fs_visitor::emit_frontfacing_interpolation(ir_variable *ir) return reg; } +void +fs_visitor::compute_sample_position(fs_reg dst, fs_reg int_sample_pos) +{ + assert(dst.type == BRW_REGISTER_TYPE_F); + + if (c-key.compute_pos_offset) { + /* Convert int_sample_pos to floating point */ + emit(MOV(dst, int_sample_pos)); + /* Scale to the range [0, 1] */ + emit(MUL(dst, dst, fs_reg(1 / 16.0f))); + } + else { + /* From ARB_sample_shading specification: + * When rendering to a non-multisample buffer, or if multisample + * rasterization is disabled, gl_SamplePosition will always be + * (0.5, 0.5). 
+ */ + emit(MOV(dst, fs_reg(0.5f))); + } +} + +fs_reg * +fs_visitor::emit_samplepos_setup(ir_variable *ir) +{ + assert(brw->gen >= 6); + assert(ir->type == glsl_type::vec2_type); + + this->current_annotation = "compute sample position"; + fs_reg *reg = new(this->mem_ctx) fs_reg(this, ir->type); + fs_reg pos = *reg; + fs_reg int_sample_x = fs_reg(this, glsl_type::int_type); + fs_reg int_sample_y = fs_reg(this, glsl_type::int_type); + + /* WM will be run in MSDISPMODE_PERSAMPLE. So, only one of SIMD8 or SIMD16 +* mode will be enabled. +* +* From the Ivy Bridge PRM, volume 2 part 1, page 344: +* R31.1:0 Position Offset X/Y for Slot[3:0] +* R31.3:2 Position Offset X/Y for Slot[7:4] +* . +* +* The X, Y sample positions come in as bytes in thread payload. So, read +* the positions using vstride=16, width=8, hstride=2. +*/ + struct brw_reg sample_pos_reg = + stride(retype(brw_vec1_grf(c->sample_pos_reg, 0), +BRW_REGISTER_TYPE_B), 16, 8, 2); + + emit(MOV(int_sample_x, fs_reg(sample_pos_reg))); + /* Compute gl_SamplePosition.x */ + compute_sample_position(pos, int_sample_x); + pos.reg_offset += dispatch_width / 8; + emit(MOV(int_sample_y, fs_reg(suboffset(sample_pos_reg, 1)))); + /* Compute gl_SamplePosition.y */ + compute_sample_position(pos, int_sample_y); + return reg; +} + fs_reg fs_visitor::fix_math_operand(fs_reg src) { @@ -2985,7 +3043,14 @@ fs_visitor::setup_payload_gen6() c->nr_payload_regs++; } } + + c->prog_data.uses_pos_offset = c->key.compute_pos_offset; /* R31: MSAA position offsets. */ + if (c->prog_data.uses_pos_offset) { + c->sample_pos_reg = c->nr_payload_regs; + c->nr_payload_regs++; + } + /* R32-: bary for 32-pixel. */ /* R58-59: interp W for 32-pixel. 
*/ diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index b5aed23..db5df39 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -333,9 +333,11 @@ public: glsl_interp_qualifier interpolation_mode, bool is_centroid); fs_reg *emit_frontfacing_interpolation(ir_variable *ir); + fs_reg *emit_samplepos_setup(ir_variable *ir); fs_reg *emit_general_interpolation(ir_variable *ir); void emit_interpolation_setup_gen4(); void emit_interpolation_setup_gen6(); + void compute_sample_position(fs_reg dst, fs_reg int_sample_pos); fs_reg rescale_texcoord(ir_texture *ir, fs_reg coordinate, bool is_rect, int sampler, int texunit); fs_inst *emit_texture_gen4(ir_texture *ir, fs_reg dst, fs_reg coordinate, diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 9f37013..51972fe 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -125,6 +125,11 @@ fs_visitor::visit(ir_variable *ir) reg =
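As a rough scalar model of what compute_sample_position() emits per channel, the following C sketch may help. It assumes, as the patch comments describe, that the payload delivers each position offset as a byte in 1/16-pixel units; the function name and parameters are hypothetical and only mirror the two branches of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar model of one component of gl_SamplePosition.
 * compute_pos_offset: models the compute_pos_offset program key.
 * payload_offset:     byte read from the thread payload (assumed 0..15).
 */
static float
sample_position_component(bool compute_pos_offset, int8_t payload_offset)
{
   if (compute_pos_offset) {
      /* Convert to float and scale to the range [0, 1]. */
      return (float) payload_offset * (1.0f / 16.0f);
   }
   /* Non-multisample buffer or multisample rasterization disabled. */
   return 0.5f;
}
```

With an offset of 8 the scaled result is 0.5, the same value the fallback branch returns for the pixel center.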
[Mesa-dev] [PATCH V2 08/12] i965: Add FS backend for builtin gl_SampleID
V2: - Update comments - Make changes to support simd16 mode. - Add compute_sample_id variables in brw_wm_prog_key - Add a special backend instruction to compute sample_id. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/drivers/dri/i965/brw_defines.h| 1 + src/mesa/drivers/dri/i965/brw_fs.cpp | 49 ++ src/mesa/drivers/dri/i965/brw_fs.h | 7 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 27 ++ src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 ++ src/mesa/drivers/dri/i965/brw_wm.c | 6 src/mesa/drivers/dri/i965/brw_wm.h | 1 + 7 files changed, 93 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 5ba9d45..f3c994b 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -788,6 +788,7 @@ enum opcode { FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7, FS_OPCODE_MOV_DISPATCH_TO_FLAGS, FS_OPCODE_DISCARD_JUMP, + FS_OPCODE_SET_SAMPLE_ID, FS_OPCODE_SET_SIMD4X2_OFFSET, FS_OPCODE_PACK_HALF_2x16_SPLIT, FS_OPCODE_UNPACK_HALF_2x16_SPLIT_X, diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 0f8213e..5773c6f 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -1176,6 +1176,55 @@ fs_visitor::emit_samplepos_setup(ir_variable *ir) return reg; } +fs_reg * +fs_visitor::emit_sampleid_setup(ir_variable *ir) +{ + assert(brw-gen = 6); + + this-current_annotation = compute sample id; + fs_reg *reg = new(this-mem_ctx) fs_reg(this, ir-type); + + if (c-key.compute_sample_id) { + fs_reg t1 = fs_reg(this, glsl_type::int_type); + fs_reg t2 = fs_reg(this, glsl_type::int_type); + t2.type = BRW_REGISTER_TYPE_UW; + + /* The PS will be run in MSDISPMODE_PERSAMPLE. For example with + * 8x multisampling, subspan 0 will represent sample N (where N + * is 0, 2, 4 or 6), subspan 1 will represent sample 1, 3, 5 or + * 7. 
We can find the value of N by looking at R0.0 bits 7:6 + * (Starting Sample Pair Index (SSPI)) and multiplying by two + * (since samples are always delivered in pairs). That is, we + * compute 2*((R0.0 & 0xc0) >> 6) == (R0.0 & 0xc0) >> 5. Then + * we need to add N to the sequence (0, 0, 0, 0, 1, 1, 1, 1) in + * case of SIMD8 and sequence (0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, + * 2, 3, 3, 3, 3) in case of SIMD16. We compute this sequence by + * populating a temporary variable with the sequence (0, 1, 2, 3), + * and then reading from it using vstride=1, width=4, hstride=0. + * These computations hold good for 4x multisampling as well. + */ + emit(BRW_OPCODE_AND, t1, + fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)), + fs_reg(brw_imm_d(0xc0))); + emit(BRW_OPCODE_SHR, t1, t1, fs_reg(5)); + /* This works for both SIMD8 and SIMD16 */ + emit(MOV(t2, brw_imm_v(0x3210))); + /* This special instruction takes care of setting vstride=1, + * width=4, hstride=0 of t2 during an ADD instruction. + */ + emit(FS_OPCODE_SET_SAMPLE_ID, *reg, t1, t2); + } + else { + /* As per GL_ARB_sample_shading specification: + * When rendering to a non-multisample buffer, or if multisample + * rasterization is disabled, gl_SampleID will always be zero. 
+ */ + emit(BRW_OPCODE_MOV, *reg, fs_reg(0)); + } + + return reg; +} + fs_reg fs_visitor::fix_math_operand(fs_reg src) { diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index db5df39..8a1a414 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -334,6 +334,7 @@ public: bool is_centroid); fs_reg *emit_frontfacing_interpolation(ir_variable *ir); fs_reg *emit_samplepos_setup(ir_variable *ir); + fs_reg *emit_sampleid_setup(ir_variable *ir); fs_reg *emit_general_interpolation(ir_variable *ir); void emit_interpolation_setup_gen4(); void emit_interpolation_setup_gen6(); @@ -538,6 +539,12 @@ private: struct brw_reg index, struct brw_reg offset); void generate_mov_dispatch_to_flags(fs_inst *inst); + + void generate_set_sample_id(fs_inst *inst, + struct brw_reg dst, + struct brw_reg src0, + struct brw_reg src1); + void generate_set_simd4x2_offset(fs_inst *inst, struct brw_reg dst, struct brw_reg offset); diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index fa15f7b..4f15ed7 100644 ---
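The sample ID arithmetic described in the comment of emit_sampleid_setup() can be checked with a small scalar model. This is a sketch under the assumptions stated in that comment (SSPI in bits 7:6 of R0.0, four channels per subspan); sample_id_for_channel and its parameters are hypothetical names, not part of the driver.

```c
#include <stdint.h>

/* r0_0:    hypothetical copy of thread payload dword R0.0.
 * channel: SIMD channel index (0..7 for SIMD8, 0..15 for SIMD16).
 */
static unsigned
sample_id_for_channel(uint32_t r0_0, unsigned channel)
{
   /* N = 2 * SSPI, i.e. 2 * ((R0.0 & 0xc0) >> 6) == (R0.0 & 0xc0) >> 5 */
   unsigned n = (r0_0 & 0xc0u) >> 5;

   /* Per-channel sequence 0,0,0,0,1,1,1,1,... (one value per subspan). */
   unsigned subspan = channel / 4u;

   return n + subspan;
}
```

In the patch the per-subspan sequence is produced by the register region (vstride=1, width=4, hstride=0) rather than by a division; the integer division here is just the scalar equivalent.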
[Mesa-dev] [PATCH V2 09/12] i965: Add FS backend for builtin gl_SampleMask[]
V2: - Update comments - Use fs_reg(0x) in AND instruction to get the 16 bit sample_mask. - Add a special backend instructions to compute sample_mask. - Add a new variable uses_omask in brw_wm_prog_data. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/drivers/dri/i965/brw_context.h| 1 + src/mesa/drivers/dri/i965/brw_defines.h| 1 + src/mesa/drivers/dri/i965/brw_fs.h | 5 + src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 23 +++ src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 15 +++ 5 files changed, 45 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index d16f257..d623368 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -381,6 +381,7 @@ struct brw_wm_prog_data { GLuint nr_pull_params; bool dual_src_blend; bool uses_pos_offset; + bool uses_omask; uint32_t prog_offset_16; /** diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index f3c994b..f9556a5 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -788,6 +788,7 @@ enum opcode { FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7, FS_OPCODE_MOV_DISPATCH_TO_FLAGS, FS_OPCODE_DISCARD_JUMP, + FS_OPCODE_SET_OMASK, FS_OPCODE_SET_SAMPLE_ID, FS_OPCODE_SET_SIMD4X2_OFFSET, FS_OPCODE_PACK_HALF_2x16_SPLIT, diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 8a1a414..c9bcc4e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -436,6 +436,7 @@ public: struct hash_table *variable_ht; fs_reg frag_depth; + fs_reg sample_mask; fs_reg outputs[BRW_MAX_DRAW_BUFFERS]; unsigned output_components[BRW_MAX_DRAW_BUFFERS]; fs_reg dual_src_output; @@ -540,6 +541,10 @@ private: struct brw_reg offset); void generate_mov_dispatch_to_flags(fs_inst *inst); + void generate_set_omask(fs_inst *inst, + struct brw_reg dst, + struct brw_reg sample_mask); + void 
generate_set_sample_id(fs_inst *inst, struct brw_reg dst, struct brw_reg src0, diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index 4f15ed7..fc8e0bd 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -1024,6 +1024,25 @@ fs_generator::generate_set_simd4x2_offset(fs_inst *inst, brw_pop_insn_state(p); } +/* Sets vstride=16, width=8, hstride=2 of register mask before + * moving to register dst. + */ +void +fs_generator::generate_set_omask(fs_inst *inst, + struct brw_reg dst, + struct brw_reg mask) +{ + assert(dst.type == BRW_REGISTER_TYPE_UW); + if (dispatch_width == 16) + dst = vec16(dst); + brw_push_insn_state(p); + brw_set_compression_control(p, BRW_COMPRESSION_NONE); + brw_set_mask_control(p, BRW_MASK_DISABLE); + brw_MOV(p, dst, stride(retype(brw_vec1_reg(mask.file, mask.nr, 0), + dst.type), 16, 8, 2)); + brw_pop_insn_state(p); +} + /* Sets vstride=1, width=4, hstride=0 of register src1 during * the ADD instruction. 
*/ @@ -1576,6 +1595,10 @@ fs_generator::generate_code(exec_list *instructions) generate_set_simd4x2_offset(inst, dst, src[0]); break; + case FS_OPCODE_SET_OMASK: + generate_set_omask(inst, dst, src[0]); + break; + case FS_OPCODE_SET_SAMPLE_ID: generate_set_sample_id(inst, dst, src[0], src[1]); break; diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 7a6a0b5..b9eb5b8 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -82,6 +82,8 @@ fs_visitor::visit(ir_variable *ir) } } else if (ir-location == FRAG_RESULT_DEPTH) { this-frag_depth = *reg; + } else if (ir-location == FRAG_RESULT_SAMPLE_MASK) { + this-sample_mask = *reg; } else { /* gl_FragData or a user-defined FS output */ assert(ir-location = FRAG_RESULT_DATA0 @@ -2510,6 +2512,19 @@ fs_visitor::emit_fb_writes() pop_force_uncompressed(); } + c-prog_data.uses_omask = + fp-Base.OutputsWritten BITFIELD64_BIT(FRAG_RESULT_SAMPLE_MASK); + if(c-prog_data.uses_omask) { + this-current_annotation = FB write oMask; + assert(this-sample_mask.file != BAD_FILE); + fs_reg reg = fs_reg(this, glsl_type::int_type); + + /* Hand over gl_SampleMask. Only lower 16 bits are relevant. */ + emit(AND(reg, this-sample_mask,
[Mesa-dev] [PATCH V2 10/12] i965/gen6: Enable the features required for GL_ARB_sample_shading
- Enable GEN6_WM_MSDISPMODE_PERSAMPLE, GEN6_WM_POSOFFSET_SAMPLE, GEN6_WM_OMASK_TO_RENDER_TARGET as per extension's specification. - Only enable one of GEN6_WM_8_DISPATCH_ENABLE or GEN6_WM_16_DISPATCH_ENABLE when GEN6_WM_MSDISPMODE_PERSAMPLE is enabled. Refer SNB PRM Vol. 2, Part 1, Page 279 for details. V2: - Add a comment explaining why only SIMD8 mode is enabled with MSDISPMODE_PERSAMPLE. - Use shared function _mesa_get_min_invocations_per_fragment(). - Use brw_wm_prog_data variables: uses_pos_offset, uses_omask. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/drivers/dri/i965/gen6_wm_state.c | 52 +-- 1 file changed, 49 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c b/src/mesa/drivers/dri/i965/gen6_wm_state.c index e3395ce..25ecc11 100644 --- a/src/mesa/drivers/dri/i965/gen6_wm_state.c +++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c @@ -30,6 +30,7 @@ #include brw_defines.h #include brw_util.h #include brw_wm.h +#include program/program.h #include program/prog_parameter.h #include program/prog_statevars.h #include intel_batchbuffer.h @@ -153,8 +154,20 @@ upload_wm_state(struct brw_context *brw) dw5 |= (brw-max_wm_threads - 1) GEN6_WM_MAX_THREADS_SHIFT; /* CACHE_NEW_WM_PROG */ + + /* In case of non 1x (i.e 4x, 8x) multisampling with MSDISPMODE_PERSAMPLE, +* only one of SIMD8 and SIMD16 should be enabled. So, we have two options +* in above mentioned case: +* 'SIMD8 only' dispatch: allowed on gen6. +* 'SIMD16 only' dispatch: not allowed on gen6. +* +* So, we enable 'SIMD8 only' dispatch in above case. 
+*/ dw5 |= GEN6_WM_8_DISPATCH_ENABLE; - if (brw->wm.prog_data->prog_offset_16) + + if (brw->wm.prog_data->prog_offset_16 && + !(_mesa_get_min_invocations_per_fragment(ctx, + brw->fragment_program) > 1)) dw5 |= GEN6_WM_16_DISPATCH_ENABLE; /* CACHE_NEW_WM_PROG | _NEW_COLOR */ @@ -183,7 +196,8 @@ upload_wm_state(struct brw_context *brw) /* _NEW_COLOR, _NEW_MULTISAMPLE */ if (fp->program.UsesKill || ctx->Color.AlphaEnabled || - ctx->Multisample.SampleAlphaToCoverage) + ctx->Multisample.SampleAlphaToCoverage || + brw->wm.prog_data->uses_omask) dw5 |= GEN6_WM_KILL_ENABLE; if (brw_color_buffer_write_enabled(brw) || @@ -191,6 +205,16 @@ upload_wm_state(struct brw_context *brw) dw5 |= GEN6_WM_DISPATCH_ENABLE; } + /* From the SNB PRM, volume 2 part 1, page 278: +* This bit is inserted in the PS payload header and made available to +* the DataPort (either via the message header or via header bypass) to +* indicate that oMask data (one or two phases) is included in Render +* Target Write messages. If present, the oMask data is used to mask off +* samples. +*/ +if(brw->wm.prog_data->uses_omask) + dw5 |= GEN6_WM_OMASK_TO_RENDER_TARGET; + /* CACHE_NEW_WM_PROG */ dw6 |= brw->wm.prog_data->num_varying_inputs << GEN6_WM_NUM_SF_OUTPUTS_SHIFT; @@ -200,12 +224,34 @@ upload_wm_state(struct brw_context *brw) dw6 |= GEN6_WM_MSRAST_ON_PATTERN; else dw6 |= GEN6_WM_MSRAST_OFF_PIXEL; - dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL; + + if (_mesa_get_min_invocations_per_fragment(ctx, brw->fragment_program) > 1) + dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE; + else + dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL; } else { dw6 |= GEN6_WM_MSRAST_OFF_PIXEL; dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE; } + /* _NEW_BUFFERS, _NEW_MULTISAMPLE */ + /* From the SNB PRM, volume 2 part 1, page 281: +* If the PS kernel does not need the Position XY Offsets +* to compute a Position XY value, then this field should be +* programmed to POSOFFSET_NONE. 
+* +* SW Recommendation: If the PS kernel needs the Position Offsets +* to compute a Position XY value, this field should match Position +* ZW Interpolation Mode to ensure a consistent position.xyzw +* computation. +* We only require XY sample offsets. So, this recommendation doesn't +* look useful at the moment. We might need this in future. +*/ + if (brw-wm.prog_data-uses_pos_offset) + dw6 |= GEN6_WM_POSOFFSET_SAMPLE; + else + dw6 |= GEN6_WM_POSOFFSET_NONE; + BEGIN_BATCH(9); OUT_BATCH(_3DSTATE_WM 16 | (9 - 2)); OUT_BATCH(brw-wm.base.prog_offset); -- 1.8.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
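The dispatch-width rule described in the patch comment can be sketched in isolation. This is only an illustration: the `EX_*` bit values and the `ex_dispatch_bits()` helper are hypothetical names invented here, not the real GEN6_WM_* definitions from brw_defines.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only.  The real GEN6_WM_*
 * definitions live in brw_defines.h and are not reproduced here. */
#define EX_DISPATCH_8  (1u << 0)
#define EX_DISPATCH_16 (1u << 1)

/* SIMD8 dispatch is always enabled.  SIMD16 is added only when a SIMD16
 * program exists and the shader runs at most once per fragment, i.e. when
 * per-sample dispatch is not in use. */
static uint32_t
ex_dispatch_bits(bool has_simd16_program, int min_invocations_per_fragment)
{
   uint32_t bits = EX_DISPATCH_8;

   if (has_simd16_program && !(min_invocations_per_fragment > 1))
      bits |= EX_DISPATCH_16;

   return bits;
}
```

With a SIMD16 program and one invocation per fragment both bits are set; as soon as more than one invocation per fragment is required, only the SIMD8 bit remains.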
[Mesa-dev] [PATCH V2 11/12] i965/gen7: Enable the features required for GL_ARB_sample_shading
- Enable GEN7_WM_MSDISPMODE_PERSAMPLE, GEN7_WM_POSOFFSET_SAMPLE,
  GEN7_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
- Only enable one of GEN7_WM_8_DISPATCH_ENABLE or GEN7_WM_16_DISPATCH_ENABLE
  when GEN7_WM_MSDISPMODE_PERSAMPLE is enabled.
  Refer to the IVB PRM Vol. 2, Part 1, Page 288 for details.

V2:
  - Add a comment explaining why only SIMD8 mode is enabled
    with MSDISPMODE_PERSAMPLE.
  - Use shared function _mesa_get_min_invocations_per_fragment().
  - Use brw_wm_prog_data variables: uses_pos_offset, uses_omask.

Signed-off-by: Anuj Phogat <anuj.pho...@gmail.com>
---
 src/mesa/drivers/dri/i965/gen7_wm_state.c | 53 +--
 1 file changed, 50 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c b/src/mesa/drivers/dri/i965/gen7_wm_state.c
index a2046c3..493b107 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
@@ -27,6 +27,7 @@
 #include "brw_defines.h"
 #include "brw_util.h"
 #include "brw_wm.h"
+#include "program/program.h"
 #include "program/prog_parameter.h"
 #include "program/prog_statevars.h"
 #include "intel_batchbuffer.h"
@@ -82,9 +83,13 @@ upload_wm_state(struct brw_context *brw)
       GEN7_WM_BARYCENTRIC_INTERPOLATION_MODE_SHIFT;

    /* _NEW_COLOR, _NEW_MULTISAMPLE */
+   /* Enable if the pixel shader kernel generates and outputs oMask.
+    */
    if (fp->program.UsesKill || ctx->Color.AlphaEnabled ||
-       ctx->Multisample.SampleAlphaToCoverage)
+       ctx->Multisample.SampleAlphaToCoverage ||
+       brw->wm.prog_data->uses_omask) {
       dw1 |= GEN7_WM_KILL_ENABLE;
+   }

    /* _NEW_BUFFERS */
    if (brw_color_buffer_write_enabled(brw) || writes_depth ||
@@ -97,7 +102,11 @@ upload_wm_state(struct brw_context *brw)
          dw1 |= GEN7_WM_MSRAST_ON_PATTERN;
       else
          dw1 |= GEN7_WM_MSRAST_OFF_PIXEL;
-      dw2 |= GEN7_WM_MSDISPMODE_PERPIXEL;
+
+      if (_mesa_get_min_invocations_per_fragment(ctx, brw->fragment_program) > 1)
+         dw2 |= GEN7_WM_MSDISPMODE_PERSAMPLE;
+      else
+         dw2 |= GEN7_WM_MSDISPMODE_PERPIXEL;
    } else {
       dw1 |= GEN7_WM_MSRAST_OFF_PIXEL;
       dw2 |= GEN7_WM_MSDISPMODE_PERSAMPLE;
@@ -169,6 +178,32 @@ upload_ps_state(struct brw_context *brw)
    if (brw->wm.prog_data->nr_params > 0)
       dw4 |= GEN7_PS_PUSH_CONSTANT_ENABLE;

+   /* From the IVB PRM, volume 2 part 1, page 287:
+    * This bit is inserted in the PS payload header and made available to
+    * the DataPort (either via the message header or via header bypass) to
+    * indicate that oMask data (one or two phases) is included in Render
+    * Target Write messages. If present, the oMask data is used to mask off
+    * samples.
+    */
+   if (brw->wm.prog_data->uses_omask)
+      dw4 |= GEN7_PS_OMASK_TO_RENDER_TARGET;
+
+   /* From the IVB PRM, volume 2 part 1, page 287:
+    * If the PS kernel does not need the Position XY Offsets to
+    * compute a Position Value, then this field should be programmed
+    * to POSOFFSET_NONE.
+    * SW Recommendation: If the PS kernel needs the Position Offsets
+    * to compute a Position XY value, this field should match Position
+    * ZW Interpolation Mode to ensure a consistent position.xyzw
+    * computation.
+    * We only require XY sample offsets. So, this recommendation doesn't
+    * look useful at the moment. We might need this in future.
+    */
+   if (brw->wm.prog_data->uses_pos_offset)
+      dw4 |= GEN7_PS_POSOFFSET_SAMPLE;
+   else
+      dw4 |= GEN7_PS_POSOFFSET_NONE;
+
    /* CACHE_NEW_WM_PROG | _NEW_COLOR
     *
     * The hardware wedges if you have this bit set but don't turn on any dual
@@ -184,8 +219,20 @@ upload_ps_state(struct brw_context *brw)
    if (brw->wm.prog_data->num_varying_inputs != 0)
       dw4 |= GEN7_PS_ATTRIBUTE_ENABLE;

+   /* In case of non 1x (i.e 4x, 8x) multisampling with MSDISPMODE_PERSAMPLE,
+    * only one of SIMD8 and SIMD16 should be enabled. So, we have two options
+    * in that case:
+    * 'SIMD8 only' dispatch: allowed on gen7.
+    * 'SIMD16 only' dispatch: allowed on gen7 except when in PERSAMPLE mode
+    * with number of multisamples = 8.
+    * TODO: Currently we enable 'SIMD8 only' dispatch in above mentioned case.
+    * Make changes to allow 'SIMD16 only' dispatch for multisamples < 8.
+    */
    dw4 |= GEN7_PS_8_DISPATCH_ENABLE;
-   if (brw->wm.prog_data->prog_offset_16)
+
+   if (brw->wm.prog_data->prog_offset_16 &&
+       !(_mesa_get_min_invocations_per_fragment(ctx,
+                                                brw->fragment_program ) > 1))
       dw4 |= GEN7_PS_16_DISPATCH_ENABLE;

    dw5 |= (brw->wm.prog_data->first_curbe_grf <<
--
1.8.1.4
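Both patches switch to MSDISPMODE_PERSAMPLE when the minimum number of shader invocations per fragment exceeds one. The sketch below shows one way that number could be derived, assuming it follows the ARB_sample_shading rule max(ceil(MIN_SAMPLE_SHADING_VALUE * samples), 1). The patch does not show the body of _mesa_get_min_invocations_per_fragment(), so this is not claimed to match it; the `ex_*` names are hypothetical.

```c
#include <stdbool.h>

/* Assumed rule (from the ARB_sample_shading specification):
 *    max(ceil(min_sample_shading_value * samples), 1)
 * The real Mesa helper is not shown in the patch and may consider more
 * state than this sketch does. */
static int
ex_min_invocations(bool sample_shading_enabled, float min_value, int samples)
{
   if (!sample_shading_enabled || samples <= 1)
      return 1;

   float scaled = min_value * (float) samples;
   int n = (int) scaled;      /* truncates toward zero */
   if ((float) n < scaled)    /* round up: ceil() for non-negative input */
      n++;

   return n > 1 ? n : 1;
}

/* Per-sample dispatch is selected only when more than one invocation per
 * fragment is required; otherwise per-pixel dispatch is kept. */
static bool
ex_use_persample_dispatch(int min_invocations)
{
   return min_invocations > 1;
}
```

For example, a minimum sample shading value of 0.5 on a 4x multisampled target gives two invocations per fragment, which would select per-sample dispatch under this rule.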
[Mesa-dev] [PATCH 12/12] i965: Enable ARB_sample_shading on intel hardware >= gen6
Signed-off-by: Anuj Phogat <anuj.pho...@gmail.com>
Reviewed-by: Paul Berry <stereotype...@gmail.com>
Reviewed-by: Ken Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index 803d090..88201bd 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -150,6 +150,7 @@ intelInitExtensions(struct gl_context *ctx)
       ctx->Extensions.OES_depth_texture_cube_map = true;
       ctx->Extensions.ARB_shading_language_packing = true;
       ctx->Extensions.ARB_texture_multisample = true;
+      ctx->Extensions.ARB_sample_shading = true;

       /* Test if the kernel has the ioctl. */
       if (drm_intel_reg_read(brw->bufmgr, TIMESTAMP, &dummy) == 0)
--
1.8.1.4
Re: [Mesa-dev] [PATCH 6/9] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.
On 10/25/2013 03:26 PM, Marek Olšák wrote:
[snip]
>> At least for Ivybridge, I think I want this software path 100% of the
>> time. We may want to remove the stall on Haswell as a later optimization.
>
> I'd like to have a dedicated flag for this fallback like we have
> Const.PrimitiveRestartInSoftware, in case we need to implement the
> query for something else.

Sure, that seems reasonable. I'll send out a proposed patch and CC you.

>> It sounds like for Gallium, you already have a decent GPU-only solution.
>> I tried to follow that code to understand how it works, and got lost
>> after jumping through around 5 files...which is probably just my poor
>> understanding of the Gallium architecture.
>
> Gallium doesn't do anything, the interface is pretty much the same as
> the vbo one. On the hardware side, there are 4 counters containing the
> number of bytes written to each TFB buffer. If TFB is started, the
> counters are set to 0. Every time TFB is ended or paused, the counters
> are stored for each buffer in memory. When resuming TFB, the counters
> are simply loaded from memory. When we have to do DrawTransformFeedback,
> we copy the value of the counter from memory to a special draw register.
> Since the value is in bytes, we also have to set the TFB buffer stride
> to another special draw register. That's all. The hardware then
> calculates count = bytes/stride before drawing.

Oh, interesting! I would have expected it to count in vertices, but
bytes - that's pretty clever. If the units were the same on i965, I
would've definitely done it that way...it makes a lot of sense.

[snip]
>> I hadn't thought about non-VBO vertex uploads. What does Gallium do in
>> that case? Has it just been broken this whole time?
>
> Yes, it has, I completely forgot about it. :(
>
>> I guess I figured drivers would either implement this hook, or do the
>> tfb_vertcount approach, but not both. Maybe that's a bad assumption.
>
> For vertex uploads and vertex fetch fallbacks (where we translate and
> align vertex buffers to what a gallium driver supports -
> util/u_vbuf.c), we can use a query like the one you want to add.
> However, gallium drivers should use the tfb_vertcount approach (AKA
> pipe_draw_info::count_from_stream_output) whenever they see it's not
> NULL. Since most Gallium hardware drivers will never see non-VBO
> vertex data or an unsupported vertex format, it's the only approach
> they have to implement.
>
> Marek
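The count = bytes/stride step described in the quoted explanation can be written out as a small sketch. On the hardware being discussed the division happens on the GPU; the function below only illustrates the arithmetic, and the name `ex_tfb_vertex_count` plus the zero-stride guard are assumptions made for this example.

```c
#include <stdint.h>

/* Vertices available to a DrawTransformFeedback-style draw, computed from
 * the number of bytes written to a transform feedback buffer and the
 * per-vertex stride of that buffer.  A zero stride is treated as "nothing
 * to draw" to avoid dividing by zero (a guard added for this sketch). */
static uint32_t
ex_tfb_vertex_count(uint32_t bytes_written, uint32_t stride_bytes)
{
   if (stride_bytes == 0)
      return 0;

   return bytes_written / stride_bytes;
}
```

For instance, 96 bytes written with a 32-byte stride corresponds to 3 vertices.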
Re: [Mesa-dev] [PATCH 0/4] GL_OES_get_program_binary extension
On Thu, Oct 24, 2013 at 1:28 AM, Tapani Pälli <tapani.pa...@intel.com> wrote:
> Hello;
>
> These patches introduce GL_OES_get_program_binary extension support for
> Mesa. There are already stub functions for this extension, patches add
> the missing functionality part.
>
> This is based on the 'more automatic' shader cache work I've been
> implementing. I wanted to implement this first as this is a standard
> for applications to use and the automatic cache can be built separately
> based on these same enablers.
>
> As well as code review I would also appreciate any testing efforts with
> this. I've tested this with my own test apps but as you can imagine the
> coverage ain't that big. I'm also thinking of building piglit test
> cases to exercise cache shader but that is still on planning stage.

Are the implementations for serializing and unserializing cache shaders
mostly shared between the automatic shader cache and this extension's
implementation? I worry that the only way to get sufficient testing of
that is via the automatic shader cache, and that only once it's stable
can this extension proceed.
Re: [Mesa-dev] [PATCH 3/3] i965/fs: Drop no-op shifts involving 0.
On Fri, Oct 25, 2013 at 2:49 PM, Eric Anholt <e...@anholt.net> wrote:
> I noticed this in a shader in Unigine Heaven that was spilling. While it
> doesn't really reduce register pressure, it shaves a few instructions
> anyway (7955 -> 7882).
>
> v2: Fix turning 0 >> x into x instead of 0 (caught by Erik Faye-Lund).
> ---
>  src/glsl/opt_algebraic.cpp | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/src/glsl/opt_algebraic.cpp b/src/glsl/opt_algebraic.cpp
> index 2e33dfe..a07e153 100644
> --- a/src/glsl/opt_algebraic.cpp
> +++ b/src/glsl/opt_algebraic.cpp
> @@ -346,6 +346,16 @@ ir_algebraic_visitor::handle_expression(ir_expression *ir)
>        }
>        break;
>
> +   case ir_binop_rshift:
> +   case ir_binop_lshift:
> +      /* 0 >> x == 0 */
> +      if (is_vec_zero(op_const[0]))
> +         return ir->operands[0];

Any value to writing ir_constant::zero(ir, ir->type) here instead?

Either way, this series is

Reviewed-by: Matt Turner <matts...@gmail.com>

... for whatever that's worth these days. :)

I do think these clean ups make the code a lot clearer. Nice.
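The simplification under review can be illustrated outside of Mesa's IR with a tiny expression type. Everything below (`struct ex_expr`, `ex_fold_shift`, and so on) is hypothetical and written in C rather than the C++ of opt_algebraic.cpp. The quoted hunk only shows the "0 shifted by x" case; the "x shifted by 0" case is an assumption based on the patch subject.

```c
#include <stdbool.h>
#include <stddef.h>

enum ex_op { EX_CONST, EX_VAR, EX_LSHIFT, EX_RSHIFT };

struct ex_expr {
   enum ex_op op;
   int value;                    /* meaningful only when op == EX_CONST */
   struct ex_expr *operands[2];  /* meaningful only for the shift ops */
};

static bool
ex_is_zero(const struct ex_expr *e)
{
   return e != NULL && e->op == EX_CONST && e->value == 0;
}

/* Return a simpler equivalent of a shift expression, or the expression
 * itself when no rule applies. */
static struct ex_expr *
ex_fold_shift(struct ex_expr *e)
{
   if (e->op != EX_LSHIFT && e->op != EX_RSHIFT)
      return e;

   /* 0 << x and 0 >> x are 0; operand 0 already is that zero constant. */
   if (ex_is_zero(e->operands[0]))
      return e->operands[0];

   /* x << 0 and x >> 0 are x (assumed from the patch subject). */
   if (ex_is_zero(e->operands[1]))
      return e->operands[0];

   return e;
}
```

Returning operand 0 in the first rule reuses the existing zero constant, which is the behaviour the review question contrasts with constructing a fresh zero of the expression's type.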
Re: [Mesa-dev] [PATCH 1/3] glsl: Move common code out of opt_algebraic's handle_expression().
On 10/25/2013 02:49 PM, Eric Anholt wrote:
> Matt and I had each screwed up these common required patterns recently,
> in ways that wouldn't have been noticed for a long time if not for code
> review. Just enforce it in the caller so that we don't rely on code
> review catching these bugs.
> ---
>  src/glsl/opt_algebraic.cpp | 117 +++--
>  1 file changed, 39 insertions(+), 78 deletions(-)

Yes, thank you! These first two patches are great. Much less likely to
botch things after the first patch, and I've been meaning to convert a
bunch of our code to IR builder, but never got to it. Thanks for beating
me to it :)

All three are:
Reviewed-by: Kenneth Graunke <kenn...@whitecape.org>
[Mesa-dev] [PATCH v2 1/8] mesa: Separate transform feedback object initialization from allocation.
Both Gallium and i965 subclass gl_transform_feedback_object, which
requires implementing a custom NewTransformFeedback hook. Creating
a helper function to initialize the fields avoids code duplication and
divergence.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
Cc: Eric Anholt <e...@anholt.net>
Cc: Marek Olšák <mar...@gmail.com>
---
 src/mesa/main/transformfeedback.c | 20 +++-
 src/mesa/main/transformfeedback.h | 3 +++
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/transformfeedback.c b/src/mesa/main/transformfeedback.c
index bc9b52a..76d213b 100644
--- a/src/mesa/main/transformfeedback.c
+++ b/src/mesa/main/transformfeedback.c
@@ -205,17 +205,27 @@ _mesa_free_transform_feedback(struct gl_context *ctx)
 }


+/** Initialize the fields of a gl_transform_feedback_object. */
+void
+_mesa_init_transform_feedback_object(struct gl_transform_feedback_object *obj,
+                                     GLuint name)
+{
+   if (!obj)
+      return;
+
+   obj->Name = name;
+   obj->RefCount = 1;
+   obj->EverBound = GL_FALSE;
+}
+
+
 /** Default fallback for ctx->Driver.NewTransformFeedback() */
 static struct gl_transform_feedback_object *
 new_transform_feedback(struct gl_context *ctx, GLuint name)
 {
    struct gl_transform_feedback_object *obj;
    obj = CALLOC_STRUCT(gl_transform_feedback_object);
-   if (obj) {
-      obj->Name = name;
-      obj->RefCount = 1;
-      obj->EverBound = GL_FALSE;
-   }
+   _mesa_init_transform_feedback_object(obj, name);
    return obj;
 }

diff --git a/src/mesa/main/transformfeedback.h b/src/mesa/main/transformfeedback.h
index 0ffaab5..7aecd66 100644
--- a/src/mesa/main/transformfeedback.h
+++ b/src/mesa/main/transformfeedback.h
@@ -91,6 +91,9 @@ _mesa_GetTransformFeedbackVarying(GLuint program, GLuint index,


 /*** GL_ARB_transform_feedback2 ***/
+extern void
+_mesa_init_transform_feedback_object(struct gl_transform_feedback_object *obj,
+                                     GLuint name);

 struct gl_transform_feedback_object *
 _mesa_lookup_transform_feedback_object(struct gl_context *ctx, GLuint name);
--
1.8.3.2
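The pattern this patch enables, a shared initializer for the base struct that each driver "subclass" calls from its own allocator, can be sketched in a self-contained form. The `ex_*` types and functions are hypothetical stand-ins, not Mesa's real structures.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the common base object. */
struct ex_base_object {
   unsigned name;
   int ref_count;
   bool ever_bound;
};

/* Stand-in for a driver subclass; the base must be the first member so a
 * pointer to the base is also a pointer to the whole allocation. */
struct ex_driver_object {
   struct ex_base_object base;
   int driver_private;
};

/* Shared initializer: the single place that knows about the base fields. */
static void
ex_init_base_object(struct ex_base_object *obj, unsigned name)
{
   if (!obj)
      return;

   obj->name = name;
   obj->ref_count = 1;
   obj->ever_bound = false;
}

/* Driver allocator: allocates the larger struct, then defers to the
 * shared initializer instead of duplicating the field assignments. */
static struct ex_base_object *
ex_new_driver_object(unsigned name)
{
   struct ex_driver_object *drv = calloc(1, sizeof(*drv));
   if (!drv)
      return NULL;

   ex_init_base_object(&drv->base, name);
   return &drv->base;
}
```

If a field is later added to the base struct, only the shared initializer has to change, which is the divergence the commit message is trying to prevent.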
[Mesa-dev] [PATCH v2 2/8] st/mesa: Use the new _mesa_init_transform_feedback_object() helper.
This picks up a missing obj->EverBound = GL_FALSE line, and will catch any
new fields that get added in the future.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
Cc: Marek Olšák <mar...@gmail.com>
---
 src/mesa/state_tracker/st_cb_xformfb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Compile tested with --with-gallium-drivers=swrast. Not tested beyond that.

diff --git a/src/mesa/state_tracker/st_cb_xformfb.c b/src/mesa/state_tracker/st_cb_xformfb.c
index e1a7a88..a1c643b 100644
--- a/src/mesa/state_tracker/st_cb_xformfb.c
+++ b/src/mesa/state_tracker/st_cb_xformfb.c
@@ -74,8 +74,8 @@ st_new_transform_feedback(struct gl_context *ctx, GLuint name)
    if (!obj)
       return NULL;

-   obj->base.Name = name;
-   obj->base.RefCount = 1;
+   _mesa_init_transform_feedback_object(obj, name);
+
    return &obj->base;
 }

--
1.8.3.2
[Mesa-dev] [PATCH v2 3/8] i965: Create a new brw_transform_feedback_object subclass.
This adds the basic driver hooks to allocate/free the brw variant. It
doesn't contain any additional information yet, but it will soon.

v2: Use the new _mesa_init_transform_feedback_object helper function
    (requested by Eric and Ian).

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_context.c | 2 ++
 src/mesa/drivers/dri/i965/brw_context.h | 9 +
 src/mesa/drivers/dri/i965/gen6_sol.c    | 29 +
 3 files changed, 40 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index 8420c65..2df12ed 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -250,6 +250,8 @@ brw_init_driver_functions(struct brw_context *brw,

    functions->QuerySamplesForFormat = brw_query_samples_for_format;

+   functions->NewTransformFeedback = brw_new_transform_feedback;
+   functions->DeleteTransformFeedback = brw_delete_transform_feedback;
    if (brw->gen >= 7) {
       functions->BeginTransformFeedback = gen7_begin_transform_feedback;
       functions->EndTransformFeedback = gen7_end_transform_feedback;
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 4bff63e..54ad929 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -880,6 +880,10 @@ struct intel_batchbuffer {
    } saved;
 };

+struct brw_transform_feedback_object {
+   struct gl_transform_feedback_object base;
+};
+
 /**
  * Data shared between each programmable stage in the pipeline (vs, gs, and
  * wm).
@@ -1556,6 +1560,11 @@ extern int intel_translate_logic_op(GLenum opcode);
 void intel_init_syncobj_functions(struct dd_function_table *functions);

 /* gen6_sol.c */
+struct gl_transform_feedback_object *
+brw_new_transform_feedback(struct gl_context *ctx, GLuint name);
+void
+brw_delete_transform_feedback(struct gl_context *ctx,
+                              struct gl_transform_feedback_object *obj);
 void
 brw_begin_transform_feedback(struct gl_context *ctx, GLenum mode,
                              struct gl_transform_feedback_object *obj);
diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c b/src/mesa/drivers/dri/i965/gen6_sol.c
index 21da444..0d84cee 100644
--- a/src/mesa/drivers/dri/i965/gen6_sol.c
+++ b/src/mesa/drivers/dri/i965/gen6_sol.c
@@ -26,6 +26,7 @@
  * Code to initialize the binding table entries used by transform feedback.
  */

+#include "main/bufferobj.h"
 #include "main/macros.h"
 #include "brw_context.h"
 #include "intel_batchbuffer.h"
@@ -132,6 +133,34 @@ const struct brw_tracked_state gen6_gs_binding_table = {
    .emit = brw_gs_upload_binding_table,
 };

+struct gl_transform_feedback_object *
+brw_new_transform_feedback(struct gl_context *ctx, GLuint name)
+{
+   struct brw_context *brw = brw_context(ctx);
+   struct brw_transform_feedback_object *brw_obj =
+      CALLOC_STRUCT(brw_transform_feedback_object);
+   if (!brw_obj)
+      return NULL;
+
+   _mesa_init_transform_feedback_object(&brw_obj->base, name);
+
+   return &brw_obj->base;
+}
+
+void
+brw_delete_transform_feedback(struct gl_context *ctx,
+                              struct gl_transform_feedback_object *obj)
+{
+   struct brw_transform_feedback_object *brw_obj =
+      (struct brw_transform_feedback_object *) obj;
+
+   for (unsigned i = 0; i < Elements(obj->Buffers); i++) {
+      _mesa_reference_buffer_object(ctx, &obj->Buffers[i], NULL);
+   }
+
+   free(brw_obj);
+}
+
 void
 brw_begin_transform_feedback(struct gl_context *ctx, GLenum mode,
                              struct gl_transform_feedback_object *obj)
--
1.8.3.2
[Mesa-dev] [PATCH v2 4/8] i965: Implement Pause/ResumeTransformfeedback driver hooks on Gen7+.
The ARB_transform_feedback2 extension introduces the ability to pause
and resume transform feedback sessions. Although only one can be active
at a time, it's possible to switch between multiple transform feedback
objects while paused.

In order to facilitate this, we need to save/restore the SO_WRITE_OFFSET
registers so that after resuming, the GPU continues writing where it
left off.

This functionality also exists in ES 3.0, but somehow we completely
forgot to implement it.

v2: Reduce alignment from 4096 to 64 (it seemed excessive).

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.roman...@intel.com>
Reviewed-by: Eric Anholt <e...@anholt.net>
---
 src/mesa/drivers/dri/i965/brw_context.c    | 2 ++
 src/mesa/drivers/dri/i965/brw_context.h    | 9 +++
 src/mesa/drivers/dri/i965/gen6_sol.c       | 5
 src/mesa/drivers/dri/i965/gen7_sol_state.c | 40 ++
 4 files changed, 56 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index 2df12ed..90d9be4 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -255,6 +255,8 @@ brw_init_driver_functions(struct brw_context *brw,
    if (brw->gen >= 7) {
       functions->BeginTransformFeedback = gen7_begin_transform_feedback;
       functions->EndTransformFeedback = gen7_end_transform_feedback;
+      functions->PauseTransformFeedback = gen7_pause_transform_feedback;
+      functions->ResumeTransformFeedback = gen7_resume_transform_feedback;
    } else {
       functions->BeginTransformFeedback = brw_begin_transform_feedback;
       functions->EndTransformFeedback = brw_end_transform_feedback;
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 54ad929..48aa4c1 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -882,6 +882,9 @@ struct intel_batchbuffer {

 struct brw_transform_feedback_object {
    struct gl_transform_feedback_object base;
+
+   /** A buffer to hold SO_WRITE_OFFSET(n) values while paused. */
+   drm_intel_bo *offset_bo;
 };

 /**
@@ -1579,6 +1582,12 @@ gen7_begin_transform_feedback(struct gl_context *ctx, GLenum mode,
 void
 gen7_end_transform_feedback(struct gl_context *ctx,
                             struct gl_transform_feedback_object *obj);
+void
+gen7_pause_transform_feedback(struct gl_context *ctx,
+                              struct gl_transform_feedback_object *obj);
+void
+gen7_resume_transform_feedback(struct gl_context *ctx,
+                               struct gl_transform_feedback_object *obj);

 /* brw_blorp_blit.cpp */
 GLbitfield
diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c b/src/mesa/drivers/dri/i965/gen6_sol.c
index 0d84cee..2e6c86a 100644
--- a/src/mesa/drivers/dri/i965/gen6_sol.c
+++ b/src/mesa/drivers/dri/i965/gen6_sol.c
@@ -144,6 +144,9 @@ brw_new_transform_feedback(struct gl_context *ctx, GLuint name)

    _mesa_init_transform_feedback_object(&brw_obj->base, name);

+   brw_obj->offset_bo =
+      drm_intel_bo_alloc(brw->bufmgr, "transform feedback offsets", 16, 64);
+
    return &brw_obj->base;
 }

@@ -158,6 +161,8 @@ brw_delete_transform_feedback(struct gl_context *ctx,
       _mesa_reference_buffer_object(ctx, &obj->Buffers[i], NULL);
    }

+   drm_intel_bo_unreference(brw_obj->offset_bo);
+
    free(brw_obj);
 }

diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index abfe0a0..27421da 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -273,3 +273,43 @@ gen7_end_transform_feedback(struct gl_context *ctx,

    intel_batchbuffer_emit_mi_flush(brw);
 }
+
+void
+gen7_pause_transform_feedback(struct gl_context *ctx,
+                              struct gl_transform_feedback_object *obj)
+{
+   struct brw_context *brw = brw_context(ctx);
+   struct brw_transform_feedback_object *brw_obj =
+      (struct brw_transform_feedback_object *) obj;
+
+   /* Save the SOL buffer offset register values. */
+   for (int i = 0; i < 4; i++) {
+      BEGIN_BATCH(3);
+      OUT_BATCH(MI_STORE_REGISTER_MEM | (3 - 2));
+      OUT_BATCH(GEN7_SO_WRITE_OFFSET(i));
+      OUT_RELOC(brw_obj->offset_bo,
+                I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER,
+                i * sizeof(uint32_t));
+      ADVANCE_BATCH();
+   }
+}
+
+void
+gen7_resume_transform_feedback(struct gl_context *ctx,
+                               struct gl_transform_feedback_object *obj)
+{
+   struct brw_context *brw = brw_context(ctx);
+   struct brw_transform_feedback_object *brw_obj =
+      (struct brw_transform_feedback_object *) obj;
+
+   /* Reload the SOL buffer offset registers. */
+   for (int i = 0; i < 4; i++) {
+      BEGIN_BATCH(3);
+      OUT_BATCH(GEN7_MI_LOAD_REGISTER_MEM | (3 - 2));
+      OUT_BATCH(GEN7_SO_WRITE_OFFSET(i));
+      OUT_RELOC(brw_obj->offset_bo,
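The save/restore idea from the commit message can be modelled on the CPU as a rough sketch. The real driver emits MI_STORE_REGISTER_MEM and load commands into the batch so the GPU does the copying; here plain arrays stand in for the SO_WRITE_OFFSET registers and for the per-object offset buffer. All `ex_*` names are hypothetical.

```c
#include <stdint.h>

#define EX_NUM_BUFFERS 4

/* Stand-in for the four hardware write-offset registers. */
struct ex_hw {
   uint32_t so_write_offset[EX_NUM_BUFFERS];
};

/* Stand-in for the per-object 16-byte buffer holding saved offsets. */
struct ex_tfb_object {
   uint32_t saved_offsets[EX_NUM_BUFFERS];
};

/* Pause: copy each register value into the object's own storage. */
static void
ex_pause(const struct ex_hw *hw, struct ex_tfb_object *obj)
{
   for (int i = 0; i < EX_NUM_BUFFERS; i++)
      obj->saved_offsets[i] = hw->so_write_offset[i];
}

/* Resume: reload the registers from the object's storage, so writing
 * continues where this object left off. */
static void
ex_resume(struct ex_hw *hw, const struct ex_tfb_object *obj)
{
   for (int i = 0; i < EX_NUM_BUFFERS; i++)
      hw->so_write_offset[i] = obj->saved_offsets[i];
}
```

Because every object carries its own saved offsets, an application can pause object A, resume and advance object B, pause B, and then resume A without either object's write position being disturbed.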
[Mesa-dev] [PATCH v2 5/8] mesa: Add a new GetTransformFeedbackVertexCount() driver hook.
DrawTransformFeedback() needs to obtain the number of vertices written
to a particular stream during the last Begin/EndTransformFeedback block.
The new driver hook returns exactly that information.

Gallium drivers already implement this by passing the transform feedback
object to the drawing function, counting the number of vertices written
on the GPU, and using draw indirect. This is efficient, but doesn't
always work:

If vertex data comes from user arrays, then the VBO module needs to
know how many vertices to upload, so we need to synchronously count.
Gallium drivers are currently broken in this case.

It also doesn't work if primitive restart is done in software. For
normal drawing, vbo_draw_arrays() performs software primitive restart,
splitting the draw call in two. vbo_draw_transform_feedback() currently
doesn't because it has no idea how many vertices need to be drawn. The
new driver hook gives it that information, allowing us to reuse the
existing vbo_draw_arrays() code to do everything right.

On Intel hardware (at least Ivybridge), using the draw indirect approach
is difficult since the hardware counts primitives, rather than vertices,
which requires doing some simple math. So we always use this hook.

Gallium drivers will likely want to use this hook in some cases, but
want to use the existing draw indirect approach where possible. Hence,
I've added a flag to allow drivers to opt-in to this call.

v2: Make it possible to implement this hook but only use this path when
    necessary (suggested by Marek).

Cc: Marek Olšák <mar...@gmail.com>
Cc: Eric Anholt <e...@anholt.net>
Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_context.c | 2 ++
 src/mesa/main/dd.h                      | 8
 src/mesa/main/mtypes.h                  | 6 ++
 src/mesa/vbo/vbo_exec_array.c           | 10 ++
 4 files changed, 26 insertions(+)

Marek,

Does this look like what you wanted? I feel a bit silly adding all of
this seeing as the later conditions are totally untested - i965 sets the
"always use this hook" flag, so it short-circuits them, and Gallium
drivers don't yet implement the hook, so they don't hit it either. :)

But I think this is probably roughly what you're going to want...

Eric, does this look reasonable?

Thanks for everything!

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index 90d9be4..623273c 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -329,6 +329,8 @@ brw_initialize_context_constants(struct brw_context *brw)
    ctx->Const.MaxTransformFeedbackSeparateComponents =
       BRW_MAX_SOL_BINDINGS / BRW_MAX_SOL_BUFFERS;

+   ctx->Const.AlwaysUseGetTransformFeedbackVertexCount = true;
+
    if (brw->gen == 6) {
       ctx->Const.MaxSamples = 4;
       ctx->Const.MaxColorTextureSamples = 4;
diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h
index 29469ce..11d5a9e 100644
--- a/src/mesa/main/dd.h
+++ b/src/mesa/main/dd.h
@@ -843,6 +843,14 @@ struct dd_function_table {
                                    struct gl_transform_feedback_object *obj);

    /**
+    * Return the number of vertices written to a stream during the last
+    * Begin/EndTransformFeedback block.
+    */
+   GLsizei (*GetTransformFeedbackVertexCount)(struct gl_context *ctx,
+                                       struct gl_transform_feedback_object *obj,
+                                       GLuint stream);
+
+   /**
     * \name GL_NV_texture_barrier interface
     */
    void (*TextureBarrier)(struct gl_context *ctx);
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 97ed1bd..f5e1f01 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3131,6 +3131,12 @@ struct gl_constants
     */
    GLboolean PrimitiveRestartInSoftware;

+   /**
+    * Always use the GetTransformFeedbackVertexCount() driver hook, rather
+    * than passing the transform feedback object to the drawing function.
+    */
+   GLboolean AlwaysUseGetTransformFeedbackVertexCount;
+
    /** GL_ARB_map_buffer_alignment */
    GLuint MinMapBufferAlignment;

diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
index 1670409..f25a9de 100644
--- a/src/mesa/vbo/vbo_exec_array.c
+++ b/src/mesa/vbo/vbo_exec_array.c
@@ -1464,6 +1464,16 @@ vbo_draw_transform_feedback(struct gl_context *ctx, GLenum mode,
       return;
    }

+   if (ctx->Driver.GetTransformFeedbackVertexCount &&
+       (ctx->Const.AlwaysUseGetTransformFeedbackVertexCount ||
+        (ctx->Const.PrimitiveRestartInSoftware &&
+         ctx->Array._PrimitiveRestart) ||
+        !vbo_all_varyings_in_vbos(exec->array.inputs))) {
+      GLsizei n = ctx->Driver.GetTransformFeedbackVertexCount(ctx, obj, stream);
+      vbo_draw_arrays(ctx, mode, 0, n, numInstances, 0);
+      return;
+   }
+
    vbo_bind_arrays(ctx);

    /* init most fields to zero */
--
1.8.3.2
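The condition added to vbo_draw_transform_feedback() can be restated as a standalone predicate, which makes the three trigger cases easier to see. The struct and function below are hypothetical; they only mirror the boolean structure of the hunk above.

```c
#include <stdbool.h>

/* Flattened view of the state the patch consults (illustrative names). */
struct ex_draw_state {
   bool driver_has_count_hook;          /* hook pointer is non-NULL */
   bool always_use_count_hook;          /* driver opted in unconditionally */
   bool primitive_restart_in_software;  /* driver needs software restart */
   bool primitive_restart_enabled;      /* restart is active for this draw */
   bool all_inputs_in_vbos;             /* no user-array vertex data */
};

/* True when the draw should query the vertex count synchronously and go
 * through the ordinary draw-arrays path instead of handing the transform
 * feedback object to the driver. */
static bool
ex_use_count_hook(const struct ex_draw_state *s)
{
   if (!s->driver_has_count_hook)
      return false;

   return s->always_use_count_hook ||
          (s->primitive_restart_in_software && s->primitive_restart_enabled) ||
          !s->all_inputs_in_vbos;
}
```

A driver that sets the always-use flag takes the hook path for every draw, while a driver that only implements the hook falls back to it for software primitive restart or user-array inputs.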
[Mesa-dev] [PATCH v2 6/8] i965: Mark brw_draw_prims tfb_vertcount parameter as unused.
Renaming it makes it obvious that it isn't used, and the assertion
verifies that the VBO module never passes us such an object.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.roman...@intel.com>
Reviewed-by: Eric Anholt <e...@anholt.net>
---
 src/mesa/drivers/dri/i965/brw_draw.c | 4 +++-
 src/mesa/drivers/dri/i965/brw_draw.h | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 0acd089..7b33b76 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -463,11 +463,13 @@ void brw_draw_prims( struct gl_context *ctx,
                      GLboolean index_bounds_valid,
                      GLuint min_index,
                      GLuint max_index,
-                     struct gl_transform_feedback_object *tfb_vertcount )
+                     struct gl_transform_feedback_object *unused_tfb_object)
 {
    struct brw_context *brw = brw_context(ctx);
    const struct gl_client_array **arrays = ctx->Array._DrawArrays;

+   assert(unused_tfb_object == NULL);
+
    if (!_mesa_check_conditional_render(ctx))
       return;

diff --git a/src/mesa/drivers/dri/i965/brw_draw.h b/src/mesa/drivers/dri/i965/brw_draw.h
index aac375f..fb96813 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.h
+++ b/src/mesa/drivers/dri/i965/brw_draw.h
@@ -41,7 +41,7 @@ void brw_draw_prims( struct gl_context *ctx,
                      GLboolean index_bounds_valid,
                      GLuint min_index,
                      GLuint max_index,
-                     struct gl_transform_feedback_object *tfb_vertcount );
+                     struct gl_transform_feedback_object *unused_tfb_object);

 void brw_draw_init( struct brw_context *brw );
 void brw_draw_destroy( struct brw_context *brw );
--
1.8.3.2
[Mesa-dev] [PATCH v2 7/8] i965: Implement glDrawTransformFeedback().
Implementing the GetTransformFeedbackVertexCount() driver hook allows
the VBO module to call us with the right number of vertices.

The hardware doesn't directly count the number of vertices written by
SOL, so we instead use the SO_NUM_PRIMS_WRITTEN(n) counters and multiply
by the number of vertices per primitive.

Unfortunately, counting the number of primitives generated is tricky:
a program might pause a transform feedback operation, start a second one
with a different object, then switch back and resume. Both transform
feedback operations share the SO_NUM_PRIMS_WRITTEN counters.

To work around this, we save the counter values at Begin, Pause, Resume,
and End. This "bookends" each section where transform feedback is
active for the current object. Adding up differences of pairs gives
us the number of primitives generated. (This is similar to what we
do for occlusion queries on platforms without hardware contexts.)

v2: Fix missing parenthesis in assertion (caught by Eric Anholt).

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.roman...@intel.com>
Reviewed-by: Eric Anholt <e...@anholt.net>
---
 src/mesa/drivers/dri/i965/brw_context.c    | 2 +
 src/mesa/drivers/dri/i965/brw_context.h    | 26
 src/mesa/drivers/dri/i965/gen6_sol.c       | 1 +
 src/mesa/drivers/dri/i965/gen7_sol_state.c | 190 -
 4 files changed, 218 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index 623273c..f4e04b6 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -252,6 +252,8 @@ brw_init_driver_functions(struct brw_context *brw,

    functions->NewTransformFeedback = brw_new_transform_feedback;
    functions->DeleteTransformFeedback = brw_delete_transform_feedback;
+   functions->GetTransformFeedbackVertexCount =
+      brw_get_transform_feedback_vertex_count;
    if (brw->gen >= 7) {
       functions->BeginTransformFeedback = gen7_begin_transform_feedback;
       functions->EndTransformFeedback = gen7_end_transform_feedback;
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 48aa4c1..c72bad1 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -880,11 +880,33 @@ struct intel_batchbuffer {
    } saved;
 };

+#define BRW_MAX_XFB_STREAMS 4
+
 struct brw_transform_feedback_object {
    struct gl_transform_feedback_object base;

    /** A buffer to hold SO_WRITE_OFFSET(n) values while paused. */
    drm_intel_bo *offset_bo;
+
+   /** The most recent primitive mode (GL_TRIANGLES/GL_POINTS/GL_LINES). */
+   GLenum primitive_mode;
+
+   /**
+    * Count of primitives generated during this transform feedback operation.
+    *  @{
+    */
+   uint64_t prims_generated[BRW_MAX_XFB_STREAMS];
+   drm_intel_bo *prim_count_bo;
+   unsigned prim_count_buffer_index; /**< in number of uint64_t units */
+   /** @} */
+
+   /**
+    * Number of vertices written between last Begin/EndTransformFeedback().
+    *
+    * Used to implement DrawTransformFeedback().
+    */
+   uint64_t vertices_written[BRW_MAX_XFB_STREAMS];
+   bool vertices_written_valid;
 };

 /**
@@ -1574,6 +1596,10 @@ brw_begin_transform_feedback(struct gl_context *ctx, GLenum mode,
 void
 brw_end_transform_feedback(struct gl_context *ctx,
                            struct gl_transform_feedback_object *obj);
+GLsizei
+brw_get_transform_feedback_vertex_count(struct gl_context *ctx,
+                                        struct gl_transform_feedback_object *obj,
+                                        GLuint stream);

 /* gen7_sol_state.c */
 void
diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c b/src/mesa/drivers/dri/i965/gen6_sol.c
index 2e6c86a..af5bed9 100644
--- a/src/mesa/drivers/dri/i965/gen6_sol.c
+++ b/src/mesa/drivers/dri/i965/gen6_sol.c
@@ -162,6 +162,7 @@ brw_delete_transform_feedback(struct gl_context *ctx,
    }

    drm_intel_bo_unreference(brw_obj->offset_bo);
+   drm_intel_bo_unreference(brw_obj->prim_count_bo);

    free(brw_obj);
 }
diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c
index 27421da..7cac8fe 100644
--- a/src/mesa/drivers/dri/i965/gen7_sol_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c
@@ -249,14 +249,179 @@ const struct brw_tracked_state gen7_sol_state = {
    .emit = upload_sol_state,
 };

+/**
+ * Tally the number of primitives generated so far.
+ *
+ * The buffer contains a series of pairs:
+ * (start0, start1, start2, start3, end0, end1, end2, end3) ;
+ * (start0, start1, start2, start3, end0, end1, end2, end3) ;
+ *
+ * For each stream, we subtract the pair of values (end - start) to get the
+ * number of primitives generated during one section. We accumulate these
+ * values, adding them up to get the total number of primitives generated.
+ */
+static void