Re: [Mesa-dev] [PATCH] i965: Make TCS precompile use the TES primitive mode when available.
Reviewed-by: Jordan Justen

On 2016-01-01 23:23:51, Kenneth Graunke wrote:
> If there's a linked TES program, we should just use the actual
> primitive mode. If not, just guess triangles (as we did before).
>
> Signed-off-by: Kenneth Graunke
> ---
>  src/mesa/drivers/dri/i965/brw_tcs.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_tcs.c b/src/mesa/drivers/dri/i965/brw_tcs.c
> index 2c925e7..7e41426 100644
> --- a/src/mesa/drivers/dri/i965/brw_tcs.c
> +++ b/src/mesa/drivers/dri/i965/brw_tcs.c
> @@ -307,7 +307,9 @@ brw_tcs_precompile(struct gl_context *ctx,
>     /* Guess that the input and output patches have the same dimensionality. */
>     key.input_vertices = shader_prog->TessCtrl.VerticesOut;
>
> -   key.tes_primitive_mode = GL_TRIANGLES;
> +   key.tes_primitive_mode =
> +      shader_prog->_LinkedShaders[MESA_SHADER_TESS_EVAL] ?
> +      shader_prog->TessEval.PrimitiveMode : GL_TRIANGLES;
>
>     key.outputs_written = prog->OutputsWritten;
>     key.patch_outputs_written = prog->PatchOutputsWritten;
> --
> 2.6.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 93561] ninja: error: '$(PRIVATE_SCRIPT)', needed by 'out/target/product/rpi2/gen/STATIC_LIBRARIES/libmesa_dri_common_intermediates/xmlpool/options.h', missing and no known rule to make
https://bugs.freedesktop.org/show_bug.cgi?id=93561

Bug ID: 93561
Summary: ninja: error: '$(PRIVATE_SCRIPT)', needed by 'out/target/product/rpi2/gen/STATIC_LIBRARIES/libmesa_dri_common_intermediates/xmlpool/options.h', missing and no known rule to make it
Product: Mesa
Version: git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: Mesa core
Assignee: mesa-dev@lists.freedesktop.org
Reporter: changuan...@hotmail.my
QA Contact: mesa-dev@lists.freedesktop.org

external/mesa3d/src/mesa/drivers/dri/common/Android.mk:85: kati doesn't support .SECONDEXPANSION
Starting build with ninja
ninja: Entering directory `.'
ninja: error: '$(PRIVATE_SCRIPT)', needed by 'out/target/product/rpi2/gen/STATIC_LIBRARIES/libmesa_dri_common_intermediates/xmlpool/options.h', missing and no known rule to make it
make: *** [ninja_wrapper] Error 1

--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
Re: [Mesa-dev] [PATCH 1/2] glsl: Fix undefined shifts.
Reviewed-by: Jordan Justen

On 2015-12-30 12:26:25, Matt Turner wrote:
> Shifting into the sign bit is undefined, as is shifting by 32.
> ---
>  src/glsl/ir_constant_expression.cpp | 10 +++++-----
>  src/glsl/nir/nir_opcodes.py         |  6 +++---
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/src/glsl/ir_constant_expression.cpp b/src/glsl/ir_constant_expression.cpp
> index 5bf5ce5..cf62f96 100644
> --- a/src/glsl/ir_constant_expression.cpp
> +++ b/src/glsl/ir_constant_expression.cpp
> @@ -1539,10 +1539,10 @@ ir_expression::constant_expression_value(struct hash_table *variable_context)
>                 data.i[c] = -1;
>              else {
>                 int count = 0;
> -               int top_bit = op[0]->type->base_type == GLSL_TYPE_UINT
> -                  ? 0 : v & (1 << 31);
> +               unsigned top_bit = op[0]->type->base_type == GLSL_TYPE_UINT
> +                  ? 0 : v & (1u << 31);
>
> -               while (((v & (1 << 31)) == top_bit) && count != 32) {
> +               while (((v & (1u << 31)) == top_bit) && count != 32) {
>                    count++;
>                    v <<= 1;
>                 }
> @@ -1628,7 +1628,7 @@ ir_expression::constant_expression_value(struct hash_table *variable_context)
>              else if (offset + bits > 32)
>                 data.u[c] = 0; /* Undefined for bitfieldInsert, per spec. */
>              else
> -               data.u[c] = ((1 << bits) - 1) << offset;
> +               data.u[c] = ((1ul << bits) - 1) << offset;
>           }
>           break;
>        }
> @@ -1738,7 +1738,7 @@ ir_expression::constant_expression_value(struct hash_table *variable_context)
>              else if (offset + bits > 32)
>                 data.u[c] = 0; /* Undefined, per spec. */
>              else {
> -               unsigned insert_mask = ((1 << bits) - 1) << offset;
> +               unsigned insert_mask = ((1ul << bits) - 1) << offset;
>
>                 unsigned insert = op[1]->value.u[c];
>                 insert <<= offset;
> diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
> index 1cd01a4..e8b5123 100644
> --- a/src/glsl/nir/nir_opcodes.py
> +++ b/src/glsl/nir/nir_opcodes.py
> @@ -516,7 +516,7 @@ int offset = src0, bits = src1;
>  if (offset < 0 || bits < 0 || offset + bits > 32)
>     dst = 0; /* undefined per the spec */
>  else
> -   dst = ((1 << bits)- 1) << offset;
> +   dst = ((1ul << bits) - 1) << offset;
>  """)
>
>  opcode("ldexp", 0, tfloat, [0, 0], [tfloat, tint], "", """
> @@ -578,7 +578,7 @@ if (bits == 0) {
>  } else if (bits < 0 || offset < 0 || offset + bits > 32) {
>     dst = 0; /* undefined per the spec */
>  } else {
> -   dst = (base >> offset) & ((1 << bits) - 1);
> +   dst = (base >> offset) & ((1ul << bits) - 1);
>  }
>  """)
>  opcode("ibitfield_extract", 0, tint,
> @@ -618,7 +618,7 @@ if (bits == 0) {
>  } else if (offset < 0 || bits < 0 || bits + offset > 32) {
>     dst = 0;
>  } else {
> -   unsigned mask = ((1 << bits) - 1) << offset;
> +   unsigned mask = ((1ul << bits) - 1) << offset;
>     dst = (base & ~mask) | ((insert << bits) & mask);
>  }
>  """)
> --
> 2.4.9
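For readers following the shift discussion, here is a minimal sketch of one way to build the mask so that bits == 32 stays well defined regardless of how wide unsigned long is on the target. The patch itself uses 1ul, which is 64 bits wide on LP64 systems but only 32 bits wide on ILP32 and 64-bit Windows targets. The helper name bitfield_mask is hypothetical and not part of Mesa.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not Mesa code): returns a mask of `bits` ones
 * starting at bit `offset`, for 0 <= offset, 0 <= bits, offset + bits <= 32.
 */
static uint32_t
bitfield_mask(int offset, int bits)
{
   assert(offset >= 0 && bits >= 0 && offset + bits <= 32);

   /* Shift in a 64-bit type so that bits == 32 never shifts by the full
    * width of the operand, which would be undefined in C.
    */
   uint64_t ones = (UINT64_C(1) << bits) - 1;

   /* The precondition guarantees the result fits in 32 bits. */
   return (uint32_t)(ones << offset);
}
```

With this shape, bitfield_mask(0, 32) yields 0xffffffff and bitfield_mask(28, 4) yields 0xf0000000, without relying on the platform's unsigned long width.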
[Mesa-dev] [Bug 90264] [Regression, bisected] Tooltip corruption in Chrome
https://bugs.freedesktop.org/show_bug.cgi?id=90264

--- Comment #59 from pavi...@yahoo.fr ---
I hope the bug you filed about this gets some attention:
https://code.google.com/p/chromium/issues/detail?id=505969

But you didn't say whether anything still needs to be fixed in nouveau ;)
[Mesa-dev] [PATCH v3] gallium/tests: fix build with clang compiler
Nested functions are supported as an extension in GNU C, but Clang
doesn't support them.

This fixes compilation errors when (manually) building compute.c,
or when passing --enable-gallium-tests to the configure script.

Changes from v3:
 - refactor by introducing test_default_init()

Changes from v2:
 - fix typo

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75165
Signed-off-by: Samuel Pitoiset
---
 src/gallium/tests/trivial/compute.c | 603
 1 file changed, 330 insertions(+), 273 deletions(-)

diff --git a/src/gallium/tests/trivial/compute.c b/src/gallium/tests/trivial/compute.c
index bcdfb11..5ce12ab 100644
--- a/src/gallium/tests/trivial/compute.c
+++ b/src/gallium/tests/trivial/compute.c
@@ -428,6 +428,35 @@ static void launch_grid(struct context *ctx, const uint *block_layout,
         pipe->launch_grid(pipe, block_layout, grid_layout, pc, input);
 }

+static void test_default_init(void *p, int s, int x, int y)
+{
+        *(uint32_t *)p = 0xdeadbeef;
+}
+
+/* test_system_values */
+static void test_system_values_expect(void *p, int s, int x, int y)
+{
+        int id = x / 16, sv = (x % 16) / 4, c = x % 4;
+        int tid[] = { id % 20, (id % 240) / 20, id / 240, 0 };
+        int bsz[] = { 4, 3, 5, 1};
+        int gsz[] = { 5, 4, 1, 1};
+
+        switch (sv) {
+        case 0:
+                *(uint32_t *)p = tid[c] / bsz[c];
+                break;
+        case 1:
+                *(uint32_t *)p = bsz[c];
+                break;
+        case 2:
+                *(uint32_t *)p = gsz[c];
+                break;
+        case 3:
+                *(uint32_t *)p = tid[c] % bsz[c];
+                break;
+        }
+}
+
 static void test_system_values(struct context *ctx)
 {
         const char *src = "COMP\n"
@@ -461,44 +490,31 @@ static void test_system_values(struct context *ctx)
                 " STORE RES[0].xyzw, TEMP[0], SV[3]\n"
                 " RET\n"
                 "ENDSUB\n";
-        void init(void *p, int s, int x, int y) {
-                *(uint32_t *)p = 0xdeadbeef;
-        }
-        void expect(void *p, int s, int x, int y) {
-                int id = x / 16, sv = (x % 16) / 4, c = x % 4;
-                int tid[] = { id % 20, (id % 240) / 20, id / 240, 0 };
-                int bsz[] = { 4, 3, 5, 1};
-                int gsz[] = { 5, 4, 1, 1};
-
-                switch (sv) {
-                case 0:
-                        *(uint32_t *)p = tid[c] / bsz[c];
-                        break;
-                case 1:
-                        *(uint32_t *)p = bsz[c];
-                        break;
-                case 2:
-                        *(uint32_t *)p = gsz[c];
-                        break;
-                case 3:
-                        *(uint32_t *)p = tid[c] % bsz[c];
-                        break;
-                }
-        }

         printf("- %s\n", __func__);

         init_prog(ctx, 0, 0, 0, src, NULL);
         init_tex(ctx, 0, PIPE_BUFFER, true, PIPE_FORMAT_R32_FLOAT,
-                 76800, 0, init);
+                 76800, 0, test_default_init);
         init_compute_resources(ctx, (int []) { 0, -1 });
         launch_grid(ctx, (uint []){4, 3, 5}, (uint []){5, 4, 1}, 0, NULL);
-        check_tex(ctx, 0, expect, NULL);
+        check_tex(ctx, 0, test_system_values_expect, NULL);
         destroy_compute_resources(ctx);
         destroy_tex(ctx);
         destroy_prog(ctx);
 }

+/* test_resource_access */
+static void test_resource_access_init0(void *p, int s, int x, int y)
+{
+        *(float *)p = 8.0 - (float)x;
+}
+
+static void test_resource_access_expect(void *p, int s, int x, int y)
+{
+        *(float *)p = 8.0 - (float)((x + 4 * y) & 0x3f);
+}
+
 static void test_resource_access(struct context *ctx)
 {
         const char *src = "COMP\n"
@@ -519,31 +535,33 @@ static void test_resource_access(struct context *ctx)
                 " STORE RES[1].xyzw, TEMP[1], TEMP[0]\n"
                 " RET\n"
                 "ENDSUB\n";
-        void init0(void *p, int s, int x, int y) {
-                *(float *)p = 8.0 - (float)x;
-        }
-        void init1(void *p, int s, int x, int y) {
-                *(uint32_t *)p = 0xdeadbeef;
-        }
-        void expect(void *p, int s, int x, int y) {
-                *(float *)p = 8.0 - (float)((x + 4*y) & 0x3f);
-        }

         printf("- %s\n", __func__);

         init_prog(ctx, 0, 0, 0, src, NULL);
         init_tex(ctx, 0, PIPE_BUFFER, true, PIPE_FORMAT_R32_FLOAT,
-                 256, 0, init0);
+                 256, 0, test_resource_access_init0);
         init_tex(ctx, 1, PIPE_TEXTURE_2D, true, PIPE_FORMAT_R32_FLOAT,
-                 60, 12, init1);
+                 60, 12, test_default_init);
         init_compute_resources(ctx, (int []) { 0, 1, -1 });
         launch_grid(ctx, (uint []){1, 1, 1}, (uint []){15, 12, 1}, 0, NULL);
-        check_tex(ctx, 1, expect, NULL);
+
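The refactoring pattern in this patch can be reduced to a small standalone sketch: callbacks that used to be GNU C nested functions become file-scope static functions and are passed by pointer, which is plain ISO C and so builds with Clang as well. The names init_fn, default_init and fill_buffer below are made up for illustration; fill_buffer is only a stand-in for the role init_tex() plays in compute.c, not its real signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback type mirroring the (p, s, x, y) shape used in the patch. */
typedef void (*init_fn)(void *p, int s, int x, int y);

/* File-scope replacement for a nested init() function. */
static void
default_init(void *p, int s, int x, int y)
{
   (void)s; (void)x; (void)y;
   *(uint32_t *)p = 0xdeadbeef;
}

/* Hypothetical stand-in for init_tex(): calls `init` once per element. */
static void
fill_buffer(uint32_t *buf, int width, init_fn init)
{
   assert(buf != NULL && init != NULL);

   for (int x = 0; x < width; x++)
      init(&buf[x], (int)sizeof(buf[0]), x, 0);
}
```

The cost of this approach is that each callback needs a unique file-scope name, which is presumably why the patch prefixes them with the name of the test they belong to.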
Re: [Mesa-dev] [PATCH v4] nv50, nvc0: optimize coherent buffer checking at draw time
Reviewed-by: Ilia MirkinOn Sat, Jan 2, 2016 at 12:09 PM, Samuel Pitoiset wrote: > Instead of iterating over all the buffer resources looking for coherent > buffers, we keep track of a context-wide count. This will save some > iterations (and CPU cycles) in 99.99% case because usually coherent > buffers are not so used. > > Changes from v4: > - fix flag for textures > > Changes from v3: > - check if views[i] and views[i]->texture are not NULL > - fix use of nv50->textures_coherent > - check if vb[i].buffer is not NULL > - clear out the flag for UBO > > Changes from v2: > - forgot to apply some changes for nv50 (texture/vertex bufs) > > Signed-off-by: Samuel Pitoiset > --- > src/gallium/drivers/nouveau/nv50/nv50_context.h | 3 ++ > src/gallium/drivers/nouveau/nv50/nv50_state.c | 25 +++ > src/gallium/drivers/nouveau/nv50/nv50_vbo.c | 42 > + > src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 3 ++ > src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 36 + > src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c | 41 +--- > 6 files changed, 82 insertions(+), 68 deletions(-) > > diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.h > b/src/gallium/drivers/nouveau/nv50/nv50_context.h > index 2cebcd9..712d00e 100644 > --- a/src/gallium/drivers/nouveau/nv50/nv50_context.h > +++ b/src/gallium/drivers/nouveau/nv50/nv50_context.h > @@ -134,9 +134,11 @@ struct nv50_context { > struct nv50_constbuf constbuf[3][NV50_MAX_PIPE_CONSTBUFS]; > uint16_t constbuf_dirty[3]; > uint16_t constbuf_valid[3]; > + uint16_t constbuf_coherent[3]; > > struct pipe_vertex_buffer vtxbuf[PIPE_MAX_ATTRIBS]; > unsigned num_vtxbufs; > + uint32_t vtxbufs_coherent; > struct pipe_index_buffer idxbuf; > uint32_t vbo_fifo; /* bitmask of vertex elements to be pushed to FIFO */ > uint32_t vbo_user; /* bitmask of vertex buffers pointing to user memory */ > @@ -148,6 +150,7 @@ struct nv50_context { > > struct pipe_sampler_view *textures[3][PIPE_MAX_SAMPLERS]; > unsigned num_textures[3]; > + uint32_t textures_coherent[3]; 
> struct nv50_tsc_entry *samplers[3][PIPE_MAX_SAMPLERS]; > unsigned num_samplers[3]; > > diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state.c > b/src/gallium/drivers/nouveau/nv50/nv50_state.c > index de65597..cb04043 100644 > --- a/src/gallium/drivers/nouveau/nv50/nv50_state.c > +++ b/src/gallium/drivers/nouveau/nv50/nv50_state.c > @@ -664,6 +664,17 @@ nv50_stage_set_sampler_views(struct nv50_context *nv50, > int s, >if (old) > nv50_screen_tic_unlock(nv50->screen, old); > > + if (views[i] && views[i]->texture) { > + struct pipe_resource *res = views[i]->texture; > + if (res->target == PIPE_BUFFER && > + (res->flags & PIPE_RESOURCE_FLAG_MAP_COHERENT)) > +nv50->textures_coherent[s] |= 1 << i; > + else > +nv50->textures_coherent[s] &= ~(1 << i); > + } else { > + nv50->textures_coherent[s] &= ~(1 << i); > + } > + >pipe_sampler_view_reference(>textures[s][i], views[i]); > } > > @@ -847,13 +858,19 @@ nv50_set_constant_buffer(struct pipe_context *pipe, > uint shader, uint index, >nv50->constbuf[s][i].u.data = cb->user_buffer; >nv50->constbuf[s][i].size = MIN2(cb->buffer_size, 0x1); >nv50->constbuf_valid[s] |= 1 << i; > + nv50->constbuf_coherent[s] &= ~(1 << i); > } else > if (res) { >nv50->constbuf[s][i].offset = cb->buffer_offset; >nv50->constbuf[s][i].size = MIN2(align(cb->buffer_size, 0x100), > 0x1); >nv50->constbuf_valid[s] |= 1 << i; > + if (res->flags & PIPE_RESOURCE_FLAG_MAP_COHERENT) > + nv50->constbuf_coherent[s] |= 1 << i; > + else > + nv50->constbuf_coherent[s] &= ~(1 << i); > } else { >nv50->constbuf_valid[s] &= ~(1 << i); > + nv50->constbuf_coherent[s] &= ~(1 << i); > } > nv50->constbuf_dirty[s] |= 1 << i; > > @@ -1003,6 +1020,7 @@ nv50_set_vertex_buffers(struct pipe_context *pipe, > if (!vb) { >nv50->vbo_user &= ~(((1ull << count) - 1) << start_slot); >nv50->vbo_constant &= ~(((1ull << count) - 1) << start_slot); > + nv50->vtxbufs_coherent &= ~(((1ull << count) - 1) << start_slot); >return; > } > > @@ -1015,9 +1033,16 @@ nv50_set_vertex_buffers(struct 
pipe_context *pipe, > nv50->vbo_constant |= 1 << dst_index; > else > nv50->vbo_constant &= ~(1 << dst_index); > + nv50->vtxbufs_coherent &= ~(1 << dst_index); >} else { > nv50->vbo_user &= ~(1 << dst_index); > nv50->vbo_constant &= ~(1 << dst_index); > + > + if (vb[i].buffer && > + vb[i].buffer->flags & PIPE_RESOURCE_FLAG_MAP_COHERENT) > +nv50->vtxbufs_coherent |= (1 <<
[Mesa-dev] [PATCH] arb_indirect_parameters: add basic rendering tests
Creates an array with 3 draws, the last of which is "bad", and makes sure that the "bad" one is never drawn. Parameter count is supplied from an earlier XFB draw to ensure that proper fencing occurs. Signed-off-by: Ilia Mirkin--- tests/spec/CMakeLists.txt | 1 + .../spec/arb_indirect_parameters/CMakeLists.gl.txt | 13 ++ tests/spec/arb_indirect_parameters/CMakeLists.txt | 1 + .../spec/arb_indirect_parameters/tf-count-arrays.c | 220 .../arb_indirect_parameters/tf-count-elements.c| 229 + 5 files changed, 464 insertions(+) create mode 100644 tests/spec/arb_indirect_parameters/CMakeLists.gl.txt create mode 100644 tests/spec/arb_indirect_parameters/CMakeLists.txt create mode 100644 tests/spec/arb_indirect_parameters/tf-count-arrays.c create mode 100644 tests/spec/arb_indirect_parameters/tf-count-elements.c diff --git a/tests/spec/CMakeLists.txt b/tests/spec/CMakeLists.txt index 3c4bcfb..a984734 100644 --- a/tests/spec/CMakeLists.txt +++ b/tests/spec/CMakeLists.txt @@ -142,3 +142,4 @@ add_subdirectory (mesa_pack_invert) add_subdirectory (ext_texture_format_bgra) add_subdirectory (oes_draw_elements_base_vertex) add_subdirectory (arb_shader_draw_parameters) +add_subdirectory (arb_indirect_parameters) diff --git a/tests/spec/arb_indirect_parameters/CMakeLists.gl.txt b/tests/spec/arb_indirect_parameters/CMakeLists.gl.txt new file mode 100644 index 000..88f533d --- /dev/null +++ b/tests/spec/arb_indirect_parameters/CMakeLists.gl.txt @@ -0,0 +1,13 @@ +include_directories( + ${GLEXT_INCLUDE_DIR} + ${OPENGL_INCLUDE_PATH} + ${piglit_SOURCE_DIR}/tests/mesa/util +) + +link_libraries ( + piglitutil_${piglit_target_api} + ${OPENGL_gl_LIBRARY} +) + +piglit_add_executable (arb_indirect_parameters-tf-count-elements tf-count-elements.c) +piglit_add_executable (arb_indirect_parameters-tf-count-arrays tf-count-arrays.c) diff --git a/tests/spec/arb_indirect_parameters/CMakeLists.txt b/tests/spec/arb_indirect_parameters/CMakeLists.txt new file mode 100644 index 000..144a306 --- /dev/null +++ 
b/tests/spec/arb_indirect_parameters/CMakeLists.txt @@ -0,0 +1 @@ +piglit_include_target_api() diff --git a/tests/spec/arb_indirect_parameters/tf-count-arrays.c b/tests/spec/arb_indirect_parameters/tf-count-arrays.c new file mode 100644 index 000..e88a7ba --- /dev/null +++ b/tests/spec/arb_indirect_parameters/tf-count-arrays.c @@ -0,0 +1,220 @@ +/* + * Copyright (C) 2016 Ilia Mirkin + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ */ + +#include "piglit-util-gl.h" + +PIGLIT_GL_TEST_CONFIG_BEGIN + + config.supports_gl_core_version = 31; + config.window_visual = PIGLIT_GL_VISUAL_RGBA | PIGLIT_GL_VISUAL_DOUBLE; + +PIGLIT_GL_TEST_CONFIG_END + +static const char *vs_tf = + "#version 140\n" + "out int tf;\n" + "uniform int tf_val;\n" + "void main() { gl_Position = vec4(0); tf = tf_val; }\n"; + +static const char *vs_draw = + "#version 140\n" + "out vec4 color;\n" + "in vec4 vtx, in_color;\n" + "void main() { gl_Position = vtx; color = in_color; }\n"; + +static const char *fs_draw = + "#version 140\n" + "out vec4 c;\n" + "in vec4 color;\n" + "void main() { c = color; }\n"; + +static GLint tf_prog, draw_prog; +static GLint tf_val; +static GLuint tf_vao, draw_vao; + +void +piglit_init(int argc, char **argv) +{ + static const char *varying = "tf"; + static const unsigned cmds[] = { + 4, 1, 0, 0, + 4, 1, 4, 0, + 4, 1, 8, 0, + }; + static const struct { + float vertex_array[12 * 2]; + float colors[12 * 4]; + } geometry = { + { + -1, -1, + 0, -1, + 0, 1, +
Re: [Mesa-dev] [PATCH 2/2] glsl: Handle bits=32 case in bitfieldInsert/bitfieldExtract.
On 2015-12-30 13:26:48, Ilia Mirkin wrote:
> On Wed, Dec 30, 2015 at 3:26 PM, Matt Turner wrote:
> > The OpenGL specifications for these functions say:
> >
> >    The result will be undefined if <offset> or <bits> is negative, or if
> >    the sum of <offset> and <bits> is greater than the number of bits
> >    used to store the operand.
> >
> > Therefore passing bits=32, offset=0 is legal and defined in GLSL.
> >
> > But the earlier DX11/SM5 bfi/ibfe/ubfe opcodes are specified to accept a
> > bitfield width ranging from 0-31. As such, Intel and AMD instructions
> > read only the low 5 bits of the width operand, making them not compliant
> > with the GLSL spec, so we have to special case the bits=32 case.
> >
> > Checking that offset=0 is not necessary, since for any other value,
> > <offset> + <bits> will be greater than 32, which is specified as
> > generating an undefined result.
> >
> > Fixes:
> >    ES31-CTS.shader_bitfield_operation.bitfieldInsert.uint_2
> >    ES31-CTS.shader_bitfield_operation.bitfieldInsert.uvec4_3
> >    ES31-CTS.shader_bitfield_operation.bitfieldExtract.uvec3_0
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595
> > ---
> > Yuck. Suggestions welcome.
>
> Can you make a piglit test? Want to see if nvidia has the same
> problem. According to
> http://docs.nvidia.com/cuda/parallel-thread-execution/#integer-arithmetic-instructions-bfe,
> offset/bits can actually be up to 255 (although I can't fully imagine
> why one might want that). However perhaps the HW differs.
>

Matt,

Should we move this into the driver then?

-Jordan

> >
> >  src/glsl/builtin_functions.cpp  | 6 +++++-
> >  src/glsl/lower_instructions.cpp | 7 +++----
> >  2 files changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
> > index 602852a..3d5de83 100644
> > --- a/src/glsl/builtin_functions.cpp
> > +++ b/src/glsl/builtin_functions.cpp
> > @@ -4894,7 +4894,11 @@ builtin_builder::_bitfieldExtract(const glsl_type *type)
> >     ir_variable *bits = in_var(glsl_type::int_type, "bits");
> >     MAKE_SIG(type, gpu_shader5_or_es31, 3, value, offset, bits);
> >
> > -   body.emit(ret(expr(ir_triop_bitfield_extract, value, offset, bits)));
> > +   ir_if *if_32 = new(mem_ctx) ir_if(greater(bits, imm(31)));
> > +   if_32->then_instructions.push_tail(ret(rshift(value, offset)));
> > +   if_32->else_instructions.push_tail(
> > +      ret(expr(ir_triop_bitfield_extract, value, offset, bits)));
> > +   body.emit(if_32);
> >
> >     return sig;
> >  }
> > diff --git a/src/glsl/lower_instructions.cpp b/src/glsl/lower_instructions.cpp
> > index 845cfff..8a425a8 100644
> > --- a/src/glsl/lower_instructions.cpp
> > +++ b/src/glsl/lower_instructions.cpp
> > @@ -359,10 +359,9 @@ lower_instructions_visitor::bitfield_insert_to_bfm_bfi(ir_expression *ir)
> >     ir_rvalue *base_expr = ir->operands[0];
> >
> >     ir->operation = ir_triop_bfi;
> > -   ir->operands[0] = new(ir) ir_expression(ir_binop_bfm,
> > -                                           ir->type->get_base_type(),
> > -                                           ir->operands[3],
> > -                                           ir->operands[2]);
> > +   ir->operands[0] = lshift(rshift(new(ir) ir_constant(~0u),
> > +                                   sub(new(ir) ir_constant(32), ir->operands[3])),
> > +                            ir->operands[2]);
> >     /* ir->operands[1] is still the value to insert. */
> >     ir->operands[2] = base_expr;
> >     ir->operands[3] = NULL;
> > --
> > 2.4.9
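To make the bits=32 problem concrete, here is a small CPU-side sketch in C. hw_ubfe is a toy model, based only on the description in the commit message, of an instruction that reads just the low 5 bits of the width operand; it is an assumption for illustration, not a description of any real ISA. bitfield_extract_u32 then applies the same special case the patch emits for bitfieldExtract. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (assumption): an extract instruction that only looks at the
 * low 5 bits of the width and offset operands, so bits == 32 behaves
 * like bits == 0.
 */
static uint32_t
hw_ubfe(uint32_t value, int offset, int bits)
{
   int width = bits & 31;

   if (width == 0)
      return 0;

   return (value >> (offset & 31)) & ((1u << width) - 1);
}

/* GLSL-style unsigned bitfieldExtract for defined inputs only. */
static uint32_t
bitfield_extract_u32(uint32_t value, int offset, int bits)
{
   assert(offset >= 0 && bits >= 0 && offset + bits <= 32);

   /* bits == 32 implies offset == 0 here, so a plain shift suffices,
    * mirroring the rshift(value, offset) branch in the patch.
    */
   if (bits > 31)
      return value >> offset;

   return hw_ubfe(value, offset, bits);
}
```

In this model hw_ubfe(0xdeadbeef, 0, 32) returns 0, while the wrapped version returns the full value, which is the behaviour the GLSL wording requires.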
Re: [Mesa-dev] [PATCH v3] gallium/tests: fix build with clang compiler
omg I don't know why folks insist on using gnuc nested functions they are insane. Thanks for working though this one! Reviewed-by: Edward O'CallaghanOn 2016-01-03 04:20, Samuel Pitoiset wrote: Nested functions are supported as an extension in GNU C, but Clang don't support them. This fixes compilation errors when (manually) building compute.c, or by setting --enable-gallium-tests to the configure script. Changes from v3: - refactor by introducing test_default_init() Changes from v2: - fix typo Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75165 Signed-off-by: Samuel Pitoiset --- src/gallium/tests/trivial/compute.c | 603 1 file changed, 330 insertions(+), 273 deletions(-) diff --git a/src/gallium/tests/trivial/compute.c b/src/gallium/tests/trivial/compute.c index bcdfb11..5ce12ab 100644 --- a/src/gallium/tests/trivial/compute.c +++ b/src/gallium/tests/trivial/compute.c @@ -428,6 +428,35 @@ static void launch_grid(struct context *ctx, const uint *block_layout, pipe->launch_grid(pipe, block_layout, grid_layout, pc, input); } +static void test_default_init(void *p, int s, int x, int y) +{ +*(uint32_t *)p = 0xdeadbeef; +} + +/* test_system_values */ +static void test_system_values_expect(void *p, int s, int x, int y) +{ +int id = x / 16, sv = (x % 16) / 4, c = x % 4; +int tid[] = { id % 20, (id % 240) / 20, id / 240, 0 }; +int bsz[] = { 4, 3, 5, 1}; +int gsz[] = { 5, 4, 1, 1}; + +switch (sv) { +case 0: +*(uint32_t *)p = tid[c] / bsz[c]; +break; +case 1: +*(uint32_t *)p = bsz[c]; +break; +case 2: +*(uint32_t *)p = gsz[c]; +break; +case 3: +*(uint32_t *)p = tid[c] % bsz[c]; +break; +} +} + static void test_system_values(struct context *ctx) { const char *src = "COMP\n" @@ -461,44 +490,31 @@ static void test_system_values(struct context *ctx) " STORE RES[0].xyzw, TEMP[0], SV[3]\n" " RET\n" "ENDSUB\n"; -void init(void *p, int s, int x, int y) { -*(uint32_t *)p = 0xdeadbeef; -} -void expect(void *p, int s, int x, int y) { -int id = x / 16, sv = (x % 16) / 4, c = x % 
4; -int tid[] = { id % 20, (id % 240) / 20, id / 240, 0 }; -int bsz[] = { 4, 3, 5, 1}; -int gsz[] = { 5, 4, 1, 1}; - -switch (sv) { -case 0: -*(uint32_t *)p = tid[c] / bsz[c]; -break; -case 1: -*(uint32_t *)p = bsz[c]; -break; -case 2: -*(uint32_t *)p = gsz[c]; -break; -case 3: -*(uint32_t *)p = tid[c] % bsz[c]; -break; -} -} printf("- %s\n", __func__); init_prog(ctx, 0, 0, 0, src, NULL); init_tex(ctx, 0, PIPE_BUFFER, true, PIPE_FORMAT_R32_FLOAT, - 76800, 0, init); + 76800, 0, test_default_init); init_compute_resources(ctx, (int []) { 0, -1 }); launch_grid(ctx, (uint []){4, 3, 5}, (uint []){5, 4, 1}, 0, NULL); -check_tex(ctx, 0, expect, NULL); +check_tex(ctx, 0, test_system_values_expect, NULL); destroy_compute_resources(ctx); destroy_tex(ctx); destroy_prog(ctx); } +/* test_resource_access */ +static void test_resource_access_init0(void *p, int s, int x, int y) +{ +*(float *)p = 8.0 - (float)x; +} + +static void test_resource_access_expect(void *p, int s, int x, int y) +{ +*(float *)p = 8.0 - (float)((x + 4 * y) & 0x3f); +} + static void test_resource_access(struct context *ctx) { const char *src = "COMP\n" @@ -519,31 +535,33 @@ static void test_resource_access(struct context *ctx) " STORE RES[1].xyzw, TEMP[1], TEMP[0]\n" " RET\n" "ENDSUB\n"; -void init0(void *p, int s, int x, int y) { -*(float *)p = 8.0 - (float)x; -} -void init1(void *p, int s, int x, int y) { -*(uint32_t *)p = 0xdeadbeef; -} -void expect(void *p, int s, int x, int y) { -*(float *)p = 8.0 - (float)((x + 4*y) & 0x3f); -} printf("- %s\n", __func__); init_prog(ctx, 0, 0, 0, src, NULL); init_tex(ctx, 0, PIPE_BUFFER, true, PIPE_FORMAT_R32_FLOAT, - 256, 0, init0); + 256, 0, test_resource_access_init0); init_tex(ctx, 1, PIPE_TEXTURE_2D, true, PIPE_FORMAT_R32_FLOAT, - 60, 12, init1); +
[Mesa-dev] [PATCH v4] nv50, nvc0: optimize coherent buffer checking at draw time
Instead of iterating over all the buffer resources looking for coherent
buffers, we keep track of a context-wide count. This saves some
iterations (and CPU cycles) in 99.99% of cases, because coherent
buffers are rarely used.

Changes from v4:
 - fix flag for textures

Changes from v3:
 - check if views[i] and views[i]->texture are not NULL
 - fix use of nv50->textures_coherent
 - check if vb[i].buffer is not NULL
 - clear out the flag for UBO

Changes from v2:
 - forgot to apply some changes for nv50 (texture/vertex bufs)

Signed-off-by: Samuel Pitoiset
---
 src/gallium/drivers/nouveau/nv50/nv50_context.h | 3 ++
 src/gallium/drivers/nouveau/nv50/nv50_state.c | 25 +++
 src/gallium/drivers/nouveau/nv50/nv50_vbo.c | 42 +
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 3 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 36 +
 src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c | 41 +---
 6 files changed, 82 insertions(+), 68 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.h b/src/gallium/drivers/nouveau/nv50/nv50_context.h
index 2cebcd9..712d00e 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_context.h
+++ b/src/gallium/drivers/nouveau/nv50/nv50_context.h
@@ -134,9 +134,11 @@ struct nv50_context {
    struct nv50_constbuf constbuf[3][NV50_MAX_PIPE_CONSTBUFS];
    uint16_t constbuf_dirty[3];
    uint16_t constbuf_valid[3];
+   uint16_t constbuf_coherent[3];

    struct pipe_vertex_buffer vtxbuf[PIPE_MAX_ATTRIBS];
    unsigned num_vtxbufs;
+   uint32_t vtxbufs_coherent;
    struct pipe_index_buffer idxbuf;
    uint32_t vbo_fifo; /* bitmask of vertex elements to be pushed to FIFO */
    uint32_t vbo_user; /* bitmask of vertex buffers pointing to user memory */
@@ -148,6 +150,7 @@ struct nv50_context {

    struct pipe_sampler_view *textures[3][PIPE_MAX_SAMPLERS];
    unsigned num_textures[3];
+   uint32_t textures_coherent[3];
    struct nv50_tsc_entry *samplers[3][PIPE_MAX_SAMPLERS];
    unsigned num_samplers[3];

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state.c b/src/gallium/drivers/nouveau/nv50/nv50_state.c
index de65597..cb04043 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_state.c
@@ -664,6 +664,17 @@ nv50_stage_set_sampler_views(struct nv50_context *nv50, int s,
       if (old)
          nv50_screen_tic_unlock(nv50->screen, old);

+      if (views[i] && views[i]->texture) {
+         struct pipe_resource *res = views[i]->texture;
+         if (res->target == PIPE_BUFFER &&
+             (res->flags & PIPE_RESOURCE_FLAG_MAP_COHERENT))
+            nv50->textures_coherent[s] |= 1 << i;
+         else
+            nv50->textures_coherent[s] &= ~(1 << i);
+      } else {
+         nv50->textures_coherent[s] &= ~(1 << i);
+      }
+
       pipe_sampler_view_reference(&nv50->textures[s][i], views[i]);
    }

@@ -847,13 +858,19 @@ nv50_set_constant_buffer(struct pipe_context *pipe, uint shader, uint index,
       nv50->constbuf[s][i].u.data = cb->user_buffer;
       nv50->constbuf[s][i].size = MIN2(cb->buffer_size, 0x10000);
       nv50->constbuf_valid[s] |= 1 << i;
+      nv50->constbuf_coherent[s] &= ~(1 << i);
    } else
    if (res) {
       nv50->constbuf[s][i].offset = cb->buffer_offset;
       nv50->constbuf[s][i].size = MIN2(align(cb->buffer_size, 0x100), 0x10000);
       nv50->constbuf_valid[s] |= 1 << i;
+      if (res->flags & PIPE_RESOURCE_FLAG_MAP_COHERENT)
+         nv50->constbuf_coherent[s] |= 1 << i;
+      else
+         nv50->constbuf_coherent[s] &= ~(1 << i);
    } else {
       nv50->constbuf_valid[s] &= ~(1 << i);
+      nv50->constbuf_coherent[s] &= ~(1 << i);
    }
    nv50->constbuf_dirty[s] |= 1 << i;

@@ -1003,6 +1020,7 @@ nv50_set_vertex_buffers(struct pipe_context *pipe,
    if (!vb) {
       nv50->vbo_user &= ~(((1ull << count) - 1) << start_slot);
       nv50->vbo_constant &= ~(((1ull << count) - 1) << start_slot);
+      nv50->vtxbufs_coherent &= ~(((1ull << count) - 1) << start_slot);
       return;
    }

@@ -1015,9 +1033,16 @@ nv50_set_vertex_buffers(struct pipe_context *pipe,
             nv50->vbo_constant |= 1 << dst_index;
          else
             nv50->vbo_constant &= ~(1 << dst_index);
+         nv50->vtxbufs_coherent &= ~(1 << dst_index);
       } else {
          nv50->vbo_user &= ~(1 << dst_index);
          nv50->vbo_constant &= ~(1 << dst_index);
+
+         if (vb[i].buffer &&
+             vb[i].buffer->flags & PIPE_RESOURCE_FLAG_MAP_COHERENT)
+            nv50->vtxbufs_coherent |= (1 << dst_index);
+         else
+            nv50->vtxbufs_coherent &= ~(1 << dst_index);
       }
    }
 }
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
index 2d1aa6a..60fa2bc 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c
@@ -765,7 +765,7 @@
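The bookkeeping in the state setters above boils down to one bit per binding slot, updated at bind time, so that the draw path only has to compare a mask against zero. Below is a minimal standalone sketch of that idea. The struct, the function names and FLAG_MAP_COHERENT are all hypothetical stand-ins (the latter for PIPE_RESOURCE_FLAG_MAP_COHERENT); none of this is the actual nouveau code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLOTS 32
#define FLAG_MAP_COHERENT (1u << 0) /* stand-in for the gallium flag */

/* Hypothetical context: one bit per slot that holds a coherent buffer. */
struct tracker {
   uint32_t coherent_mask;
};

/* Called whenever a slot is (re)bound or unbound. */
static void
bind_slot(struct tracker *t, unsigned slot, bool has_buffer, uint32_t flags)
{
   assert(t != 0 && slot < MAX_SLOTS);

   if (has_buffer && (flags & FLAG_MAP_COHERENT))
      t->coherent_mask |= 1u << slot;
   else
      t->coherent_mask &= ~(1u << slot);
}

/* Draw-time check: constant time instead of a walk over every binding. */
static bool
has_coherent_buffers(const struct tracker *t)
{
   return t->coherent_mask != 0;
}
```

The trade-off is that every bind path has to remember to set or clear its bit, including the unbind and user-buffer cases, which is what most of the v3 and v4 changelog entries are about.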
Re: [Mesa-dev] [PATCH] arb_indirect_parameters: add basic rendering tests
Errr... wrong list. And I forgot to add the tests to all.py. Please disregard;
I will send a fixed version to the right list shortly.

On Sat, Jan 2, 2016 at 3:02 PM, Ilia Mirkin wrote:
> Creates an array with 3 draws, the last of which is "bad", and makes
> sure that the "bad" one is never drawn. Parameter count is supplied from
> an earlier XFB draw to ensure that proper fencing occurs.
>
> Signed-off-by: Ilia Mirkin
> ---
>  tests/spec/CMakeLists.txt                          |   1 +
>  .../spec/arb_indirect_parameters/CMakeLists.gl.txt |  13 ++
>  tests/spec/arb_indirect_parameters/CMakeLists.txt  |   1 +
>  .../spec/arb_indirect_parameters/tf-count-arrays.c | 220
>  .../arb_indirect_parameters/tf-count-elements.c    | 229
>  5 files changed, 464 insertions(+)
>  create mode 100644 tests/spec/arb_indirect_parameters/CMakeLists.gl.txt
>  create mode 100644 tests/spec/arb_indirect_parameters/CMakeLists.txt
>  create mode 100644 tests/spec/arb_indirect_parameters/tf-count-arrays.c
>  create mode 100644 tests/spec/arb_indirect_parameters/tf-count-elements.c
>
> diff --git a/tests/spec/CMakeLists.txt b/tests/spec/CMakeLists.txt
> index 3c4bcfb..a984734 100644
> --- a/tests/spec/CMakeLists.txt
> +++ b/tests/spec/CMakeLists.txt
> @@ -142,3 +142,4 @@ add_subdirectory (mesa_pack_invert)
>  add_subdirectory (ext_texture_format_bgra)
>  add_subdirectory (oes_draw_elements_base_vertex)
>  add_subdirectory (arb_shader_draw_parameters)
> +add_subdirectory (arb_indirect_parameters)
> diff --git a/tests/spec/arb_indirect_parameters/CMakeLists.gl.txt b/tests/spec/arb_indirect_parameters/CMakeLists.gl.txt
> new file mode 100644
> index 0000000..88f533d
> --- /dev/null
> +++ b/tests/spec/arb_indirect_parameters/CMakeLists.gl.txt
> @@ -0,0 +1,13 @@
> +include_directories(
> +	${GLEXT_INCLUDE_DIR}
> +	${OPENGL_INCLUDE_PATH}
> +	${piglit_SOURCE_DIR}/tests/mesa/util
> +)
> +
> +link_libraries (
> +	piglitutil_${piglit_target_api}
> +	${OPENGL_gl_LIBRARY}
> +)
> +
> +piglit_add_executable (arb_indirect_parameters-tf-count-elements tf-count-elements.c)
> +piglit_add_executable (arb_indirect_parameters-tf-count-arrays tf-count-arrays.c)
> diff --git a/tests/spec/arb_indirect_parameters/CMakeLists.txt b/tests/spec/arb_indirect_parameters/CMakeLists.txt
> new file mode 100644
> index 0000000..144a306
> --- /dev/null
> +++ b/tests/spec/arb_indirect_parameters/CMakeLists.txt
> @@ -0,0 +1 @@
> +piglit_include_target_api()
> diff --git a/tests/spec/arb_indirect_parameters/tf-count-arrays.c b/tests/spec/arb_indirect_parameters/tf-count-arrays.c
> new file mode 100644
> index 0000000..e88a7ba
> --- /dev/null
> +++ b/tests/spec/arb_indirect_parameters/tf-count-arrays.c
> @@ -0,0 +1,220 @@
> +/*
> + * Copyright (C) 2016 Ilia Mirkin
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include "piglit-util-gl.h"
> +
> +PIGLIT_GL_TEST_CONFIG_BEGIN
> +
> +	config.supports_gl_core_version = 31;
> +	config.window_visual = PIGLIT_GL_VISUAL_RGBA | PIGLIT_GL_VISUAL_DOUBLE;
> +
> +PIGLIT_GL_TEST_CONFIG_END
> +
> +static const char *vs_tf =
> +	"#version 140\n"
> +	"out int tf;\n"
> +	"uniform int tf_val;\n"
> +	"void main() { gl_Position = vec4(0); tf = tf_val; }\n";
> +
> +static const char *vs_draw =
> +	"#version 140\n"
> +	"out vec4 color;\n"
> +	"in vec4 vtx, in_color;\n"
> +	"void main() { gl_Position = vtx; color = in_color; }\n";
> +
> +static const char *fs_draw =
> +	"#version 140\n"
> +	"out vec4 c;\n"
> +	"in vec4 color;\n"
> +	"void main() { c = color; }\n";
> +
> +static GLint tf_prog, draw_prog;
> +static GLint tf_val;
> +static GLuint tf_vao, draw_vao;
> +
> +void
> +piglit_init(int argc, char **argv)
> +{
> +
Re: [Mesa-dev] [PATCH 5/9] gallium/radeon: always add +DumpCode to the LLVM target machine for LLVM <= 3.5
What's the reason for always having +DumpCode? Generating the assembly is
some overhead that's usually unnecessary. Even if it's a small part of the
profiles I've seen, it still seems like a natural thing to just skip.

From what I can tell it should be dependent on any of the shader dumping
flags + DBG_CHECK_VM being set. In any case, I suppose that would be for a
separate commit.

Cheers,
Nicolai

On 01.01.2016 09:13, Marek Olšák wrote:
> From: Marek Olšák
>
> It's the same behavior that we use for later LLVM.
> ---
>  src/gallium/drivers/r600/r600_llvm.c          | 2 +-
>  src/gallium/drivers/radeon/radeon_llvm_emit.c | 5 ++---
>  src/gallium/drivers/radeon/radeon_llvm_emit.h | 2 +-
>  src/gallium/drivers/radeonsi/si_shader.c      | 2 +-
>  4 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_llvm.c b/src/gallium/drivers/r600/r600_llvm.c
> index 1cc3031..7d93658 100644
> --- a/src/gallium/drivers/r600/r600_llvm.c
> +++ b/src/gallium/drivers/r600/r600_llvm.c
> @@ -922,7 +922,7 @@ unsigned r600_llvm_compile(
>  	const char * gpu_family = r600_get_llvm_processor_name(family);
>
>  	memset(&binary, 0, sizeof(struct radeon_shader_binary));
> -	r = radeon_llvm_compile(mod, &binary, gpu_family, dump, dump, NULL);
> +	r = radeon_llvm_compile(mod, &binary, gpu_family, dump, NULL);
>
>  	r = r600_create_shader(bc, &binary, use_kill);
>
> diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.c b/src/gallium/drivers/radeon/radeon_llvm_emit.c
> index 61ed940..f8c7f54 100644
> --- a/src/gallium/drivers/radeon/radeon_llvm_emit.c
> +++ b/src/gallium/drivers/radeon/radeon_llvm_emit.c
> @@ -141,7 +141,7 @@ static void radeonDiagnosticHandler(LLVMDiagnosticInfoRef di, void *context)
>   * @returns 0 for success, 1 for failure
>   */
>  unsigned radeon_llvm_compile(LLVMModuleRef M, struct radeon_shader_binary *binary,
> -			  const char *gpu_family, bool dump_ir, bool dump_asm,
> +			  const char *gpu_family, bool dump_ir,
>  			  LLVMTargetMachineRef tm)
>  {
>
> @@ -165,8 +165,7 @@ unsigned radeon_llvm_compile(LLVMModuleRef M, struct radeon_shader_binary *binar
>  		}
>  		strncpy(cpu, gpu_family, CPU_STRING_LEN);
>  		memset(fs, 0, sizeof(fs));
> -		if (dump_asm)
> -			strncpy(fs, "+DumpCode", FS_STRING_LEN);
> +		strncpy(fs, "+DumpCode", FS_STRING_LEN);
>  		tm = LLVMCreateTargetMachine(target, triple, cpu, fs,
>  				  LLVMCodeGenLevelDefault, LLVMRelocDefault,
>  				  LLVMCodeModelDefault);
> diff --git a/src/gallium/drivers/radeon/radeon_llvm_emit.h b/src/gallium/drivers/radeon/radeon_llvm_emit.h
> index e20aed9..5f956dd 100644
> --- a/src/gallium/drivers/radeon/radeon_llvm_emit.h
> +++ b/src/gallium/drivers/radeon/radeon_llvm_emit.h
> @@ -38,7 +38,7 @@ void radeon_llvm_shader_type(LLVMValueRef F, unsigned type);
>  LLVMTargetRef radeon_llvm_get_r600_target(const char *triple);
>
>  unsigned radeon_llvm_compile(LLVMModuleRef M, struct radeon_shader_binary *binary,
> -			  const char *gpu_family, bool dump_ir, bool dump_asm,
> +			  const char *gpu_family, bool dump_ir,
>  			  LLVMTargetMachineRef tm);
>
>  #endif /* RADEON_LLVM_EMIT_H */
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> index a9297a5..4044961 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -3884,7 +3884,7 @@ int si_compile_llvm(struct si_screen *sscreen, struct si_shader *shader,
>  	bool dump_ir = dump_asm && !(sscreen->b.debug_flags & DBG_NO_IR);
>
>  	r = radeon_llvm_compile(mod, &shader->binary,
> -		r600_get_llvm_processor_name(sscreen->b.family), dump_ir, dump_asm, tm);
> +		r600_get_llvm_processor_name(sscreen->b.family), dump_ir, tm);
>  	if (r)
>  		return r;
>
Re: [Mesa-dev] [PATCH 0/9] RadeonSI: Some shaders cleanups
This looks much better now :)

For the series:

Reviewed-by: Nicolai Hähnle

On 01.01.2016 09:13, Marek Olšák wrote:
> Hi,
>
> These are shader cleanups mostly around si_compile_llvm.
>
> You may wonder why there are "move si_shader_binary_upload out of xxx"
> patches. They are part of my one-variant-per-shader rework, which needs
> a lot of restructuring.
>
> Besides this, I have 2 more series of cleanup patches, which I will send
> when this lands.
>
> Please review.
>
> Marek
[Mesa-dev] [PATCH 5/6] nvc0/ir: add support for PK2H/UP2H
Signed-off-by: Ilia Mirkin
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp |  1 +
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  |  5 ++++-
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 23 +++++++++++++++++++++++
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c     |  2 +-
 4 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index e9ddd36..ec74e7a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -740,6 +740,7 @@ CodeEmitterGM107::emitF2F()
    emitCC   (0x2f);
    emitField(0x2d, 1, (insn->op == OP_NEG) || insn->src(0).mod.neg());
    emitFMZ  (0x2c, 1);
+   emitField(0x29, 1, insn->subOp);
    emitRND  (0x27, rnd, 0x2a);
    emitField(0x0a, 2, util_logbase2(typeSizeof(insn->sType)));
    emitField(0x08, 2, util_logbase2(typeSizeof(insn->dType)));
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 1d4f0d9..0b28047 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -1030,7 +1030,10 @@ CodeEmitterNVC0::emitCVT(Instruction *i)

    // for 8/16 source types, the byte/word is in subOp. word 1 is
    // represented as 2.
-   code[1] |= i->subOp << 0x17;
+   if (!isFloatType(i->sType))
+      code[1] |= i->subOp << 0x17;
+   else
+      code[1] |= i->subOp << 0x18;

    if (sat)
       code[0] |= 0x20;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index beb67fe..e0b9435 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -319,6 +319,10 @@ unsigned int Instruction::srcMask(unsigned int s) const
          x |= 2;
       return x;
    }
+   case TGSI_OPCODE_PK2H:
+      return 0x3;
+   case TGSI_OPCODE_UP2H:
+      return 0x1;
    default:
       break;
    }
@@ -452,6 +456,7 @@ nv50_ir::DataType Instruction::inferSrcType() const
    case TGSI_OPCODE_ATOMUMAX:
    case TGSI_OPCODE_UBFE:
    case TGSI_OPCODE_UMSB:
+   case TGSI_OPCODE_UP2H:
       return nv50_ir::TYPE_U32;
    case TGSI_OPCODE_I2F:
    case TGSI_OPCODE_I2D:
@@ -516,10 +521,12 @@ nv50_ir::DataType Instruction::inferDstType() const
    case TGSI_OPCODE_DSGE:
    case TGSI_OPCODE_DSLT:
    case TGSI_OPCODE_DSNE:
+   case TGSI_OPCODE_PK2H:
       return nv50_ir::TYPE_U32;
    case TGSI_OPCODE_I2F:
    case TGSI_OPCODE_U2F:
    case TGSI_OPCODE_D2F:
+   case TGSI_OPCODE_UP2H:
       return nv50_ir::TYPE_F32;
    case TGSI_OPCODE_I2D:
    case TGSI_OPCODE_U2D:
@@ -2807,6 +2814,22 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
       FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi)
          mkCvt(OP_CVT, dstTy, dst0[c], srcTy, fetchSrc(0, c));
       break;
+   case TGSI_OPCODE_PK2H:
+      val0 = getScratch();
+      val1 = getScratch();
+      mkCvt(OP_CVT, TYPE_F16, val0, TYPE_F32, fetchSrc(0, 0));
+      mkCvt(OP_CVT, TYPE_F16, val1, TYPE_F32, fetchSrc(0, 1));
+      mkOp3(OP_INSBF, TYPE_U32, dst0[0], val1, mkImm(0x1010), val0);
+      break;
+   case TGSI_OPCODE_UP2H:
+      src0 = fetchSrc(0, 0);
+      if (dst0[0])
+         mkCvt(OP_CVT, TYPE_F32, dst0[0], TYPE_F16, src0);
+      if (dst0[1]) {
+         geni = mkCvt(OP_CVT, TYPE_F32, dst0[1], TYPE_F16, src0);
+         geni->subOp = 1;
+      }
+      break;
    case TGSI_OPCODE_EMIT:
       /* export the saved viewport index */
       if (viewport != NULL) {
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 58b712e..43f6164 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -197,6 +197,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_DRAW_PARAMETERS:
    case PIPE_CAP_MULTI_DRAW_INDIRECT:
    case PIPE_CAP_MULTI_DRAW_INDIRECT_PARAMS:
+   case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
       return 1;
    case PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE:
       return (class_3d >= NVE4_3D_CLASS) ? 1 : 0;
@@ -219,7 +220,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_VERTEXID_NOBASE:
    case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
    case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
-   case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
       return 0;

    case PIPE_CAP_VENDOR_ID:
--
2.4.10
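For readers following the PK2H lowering in this patch: the two converted halves are merged with OP_INSBF and the immediate 0x1010. Below is a minimal C sketch of a bitfield insert under the assumption that the immediate carries the bit offset in bits 0..7 and the field width in bits 8..15, so 0x1010 would mean "16 bits at offset 16". That encoding and the helper name insbf are assumptions for illustration only, not something the patch states.

```c
#include <stdint.h>

/* Hypothetical model of a bitfield insert (not the nv50_ir implementation).
 * Places the low `width` bits of `insert` into `base` at bit `offset`.
 * ASSUMPTION: `field` encodes offset in bits 0..7 and width in bits 8..15. */
static uint32_t insbf(uint32_t insert, uint32_t field, uint32_t base)
{
    uint32_t offset = field & 0xffu;
    uint32_t width = (field >> 8) & 0xffu;
    uint32_t mask;

    if (width == 0 || offset >= 32)
        return base;                 /* nothing to insert */
    if (width > 32 - offset)
        width = 32 - offset;         /* clamp to the register size */

    mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
    return (base & ~(mask << offset)) | ((insert & mask) << offset);
}
```

With the argument order used in the patch (val1, immediate, val0), this would put the half from src.y in the upper 16 bits and keep the half from src.x in the lower 16 bits.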
[Mesa-dev] [PATCH 3/6] gallium: add PIPE_CAP_TGSI_PACK_HALF_FLOAT to indicate UP2H/PK2H support
Signed-off-by: Ilia Mirkin
---
 src/gallium/docs/source/screen.rst               | 2 ++
 src/gallium/drivers/freedreno/freedreno_screen.c | 1 +
 src/gallium/drivers/i915/i915_screen.c           | 1 +
 src/gallium/drivers/ilo/ilo_screen.c             | 1 +
 src/gallium/drivers/llvmpipe/lp_screen.c         | 1 +
 src/gallium/drivers/nouveau/nv30/nv30_screen.c   | 1 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c   | 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   | 1 +
 src/gallium/drivers/r300/r300_screen.c           | 1 +
 src/gallium/drivers/r600/r600_pipe.c             | 1 +
 src/gallium/drivers/radeonsi/si_pipe.c           | 1 +
 src/gallium/drivers/softpipe/sp_screen.c         | 1 +
 src/gallium/drivers/svga/svga_screen.c           | 1 +
 src/gallium/drivers/vc4/vc4_screen.c             | 1 +
 src/gallium/drivers/virgl/virgl_screen.c         | 1 +
 src/gallium/include/pipe/p_defines.h             | 1 +
 16 files changed, 17 insertions(+)

diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
index db70cc8..39ecc63 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -290,6 +290,8 @@ The integer capabilities:
 * ``PIPE_CAP_DRAW_PARAMETERS``: Whether ``TGSI_SEMANTIC_BASEVERTEX``,
   ``TGSI_SEMANTIC_BASEINSTANCE``, and ``TGSI_SEMANTIC_DRAWID`` are
   supported in vertex shaders.
+* ``PIPE_CAP_TGSI_PACK_HALF_FLOAT``: Whether the ``UP2H`` and ``PK2H``
+  TGSI opcodes are supported.


 .. _pipe_capf:
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c
index c684019..a8030f2 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -241,6 +241,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
 	case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS:
 	case PIPE_CAP_CLEAR_TEXTURE:
 	case PIPE_CAP_DRAW_PARAMETERS:
+	case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
 		return 0;

 	case PIPE_CAP_MAX_VIEWPORTS:
diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c
index d8cfcf0..f42fc37 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -255,6 +255,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap cap)
    case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS:
    case PIPE_CAP_CLEAR_TEXTURE:
    case PIPE_CAP_DRAW_PARAMETERS:
+   case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
       return 0;

    case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS:
diff --git a/src/gallium/drivers/ilo/ilo_screen.c b/src/gallium/drivers/ilo/ilo_screen.c
index 4ca62a6..3a18e74 100644
--- a/src/gallium/drivers/ilo/ilo_screen.c
+++ b/src/gallium/drivers/ilo/ilo_screen.c
@@ -479,6 +479,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap param)
    case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS:
    case PIPE_CAP_CLEAR_TEXTURE:
    case PIPE_CAP_DRAW_PARAMETERS:
+   case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
       return 0;

    case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c
index fcef3b6..ef91c1a 100644
--- a/src/gallium/drivers/llvmpipe/lp_screen.c
+++ b/src/gallium/drivers/llvmpipe/lp_screen.c
@@ -304,6 +304,7 @@ llvmpipe_get_param(struct pipe_screen *screen, enum pipe_cap param)
    case PIPE_CAP_DRAW_PARAMETERS:
    case PIPE_CAP_MULTI_DRAW_INDIRECT:
    case PIPE_CAP_MULTI_DRAW_INDIRECT_PARAMS:
+   case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
       return 0;
    }
    /* should only get here on unhandled cases */
diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
index dbe5b3c..6c4a0f3 100644
--- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c
@@ -177,6 +177,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS:
    case PIPE_CAP_CLEAR_TEXTURE:
    case PIPE_CAP_DRAW_PARAMETERS:
+   case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
       return 0;

    case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
index bcb8577..d6131c2 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c
@@ -220,6 +220,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
    case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
    case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
    case PIPE_CAP_DRAW_PARAMETERS:
+   case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
       return 0;

    case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 22f7885..58b712e 100644
---
[Mesa-dev] [PATCH 6/6] r600: add support for PK2H/UP2H
Signed-off-by: Ilia Mirkin
---
 src/gallium/drivers/r600/r600_pipe.c   |   2 +-
 src/gallium/drivers/r600/r600_shader.c | 102 +++--
 2 files changed, 99 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index 70c1ec1..359fe41 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -328,6 +328,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 	case PIPE_CAP_TEXTURE_QUERY_LOD:
 	case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
 	case PIPE_CAP_SAMPLER_VIEW_TARGET:
+	case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
 		return family >= CHIP_CEDAR ? 1 : 0;
 	case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
 		return family >= CHIP_CEDAR ? 4 : 0;
@@ -351,7 +352,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 	case PIPE_CAP_DRAW_PARAMETERS:
 	case PIPE_CAP_MULTI_DRAW_INDIRECT:
 	case PIPE_CAP_MULTI_DRAW_INDIRECT_PARAMS:
-	case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
 		return 0;

 	case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index d411b0b..23ea34e 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -8959,6 +8959,100 @@ static int tgsi_umad(struct r600_shader_ctx *ctx)
 	return 0;
 }

+static int tgsi_pk2h(struct r600_shader_ctx *ctx)
+{
+	struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
+	struct r600_bytecode_alu alu;
+	int r;
+
+	/* temp.xy = f32_to_f16(src) */
+	memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+	alu.op = ALU_OP1_FLT32_TO_FLT16;
+	alu.dst.chan = 0;
+	alu.dst.sel = ctx->temp_reg;
+	alu.dst.write = 1;
+	r600_bytecode_src(&alu.src[0], &ctx->src[0], 0);
+	r = r600_bytecode_add_alu(ctx->bc, &alu);
+	if (r)
+		return r;
+	alu.dst.chan = 1;
+	r600_bytecode_src(&alu.src[0], &ctx->src[0], 1);
+	alu.last = 1;
+	r = r600_bytecode_add_alu(ctx->bc, &alu);
+	if (r)
+		return r;
+
+	/* dst.x = temp.y * 0x10000 + temp.x */
+	memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+	alu.op = ALU_OP3_MULADD_UINT24;
+	alu.is_op3 = 1;
+	tgsi_dst(ctx, &inst->Dst[0], 0, &alu.dst);
+	alu.last = 1;
+	alu.src[0].sel = ctx->temp_reg;
+	alu.src[0].chan = 1;
+	alu.src[1].sel = V_SQ_ALU_SRC_LITERAL;
+	alu.src[1].value = 0x10000;
+	alu.src[2].sel = ctx->temp_reg;
+	alu.src[2].chan = 0;
+	r = r600_bytecode_add_alu(ctx->bc, &alu);
+	if (r)
+		return r;
+
+	return 0;
+}
+
+static int tgsi_up2h(struct r600_shader_ctx *ctx)
+{
+	struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
+	struct r600_bytecode_alu alu;
+	int r;
+	int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
+
+	/* temp.x = src.x */
+	/* note: no need to mask out the high bits */
+	memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+	alu.op = ALU_OP1_MOV;
+	alu.dst.chan = 0;
+	alu.dst.sel = ctx->temp_reg;
+	alu.dst.write = 1;
+	r600_bytecode_src(&alu.src[0], &ctx->src[0], 0);
+	r = r600_bytecode_add_alu(ctx->bc, &alu);
+	if (r)
+		return r;
+
+	/* temp.y = src.x >> 16 */
+	memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+	alu.op = ALU_OP2_LSHR_INT;
+	alu.dst.chan = 1;
+	alu.dst.sel = ctx->temp_reg;
+	alu.dst.write = 1;
+	r600_bytecode_src(&alu.src[0], &ctx->src[0], 0);
+	alu.src[1].sel = V_SQ_ALU_SRC_LITERAL;
+	alu.src[1].value = 16;
+	alu.last = 1;
+	r = r600_bytecode_add_alu(ctx->bc, &alu);
+	if (r)
+		return r;
+
+	/* dst.xy = f16_to_f32(temp.xy) */
+	for (int i = 0; i < lasti + 1; i++) {
+		if (!(inst->Dst[0].Register.WriteMask & (1 << i)))
+			continue;
+		memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+		tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
+		alu.op = ALU_OP1_FLT16_TO_FLT32;
+		alu.src[0].sel = ctx->temp_reg;
+		alu.src[0].chan = i;
+		if (i == lasti)
+			alu.last = 1;
+		r = r600_bytecode_add_alu(ctx->bc, &alu);
+		if (r)
+			return r;
+	}
+
+	return 0;
+}
+
 static const struct r600_shader_tgsi_instruction r600_shader_tgsi_instruction[] = {
 	[TGSI_OPCODE_ARL]	= { ALU_OP0_NOP, tgsi_r600_arl},
 	[TGSI_OPCODE_MOV]	= { ALU_OP1_MOV, tgsi_op2},
@@ -9205,7 +9299,7 @@ static const struct r600_shader_tgsi_instruction eg_shader_tgsi_instruction[] =
 	[TGSI_OPCODE_DDX]	= { FETCH_OP_GET_GRADIENTS_H, tgsi_tex},
 	[TGSI_OPCODE_DDY]	= {
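A side note on the PK2H path in this patch, which merges the two halves with a 24-bit multiply-add instead of a shift and an OR. The sketch below, in plain C rather than r600 bytecode, illustrates why the two forms agree when both inputs are 16-bit values. It assumes the multiplier is 1 << 16 (0x10000); the function names are hypothetical.

```c
#include <stdint.h>

/* Multiply-add form: hi * 0x10000 + lo.
 * ASSUMPTION: lo and hi each hold a 16-bit half-float bit pattern,
 * so the product fits in 32 bits and the addition never carries. */
static uint32_t pack_muladd(uint32_t lo, uint32_t hi)
{
    return hi * 0x10000u + lo;
}

/* Equivalent shift-and-OR form, shown for comparison. */
static uint32_t pack_shift_or(uint32_t lo, uint32_t hi)
{
    return (hi << 16) | lo;
}
```

Because lo occupies only bits 0..15 and hi * 0x10000 occupies only bits 16..31, the addition cannot carry, which is why a single multiply-add can stand in for the shift and the OR.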
[Mesa-dev] [PATCH 4/6] st/mesa: use PK2H/UP2H when supported
Signed-off-by: Ilia Mirkin
---
 src/mesa/state_tracker/st_context.c        |  2 ++
 src/mesa/state_tracker/st_context.h        |  1 +
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 16 +++++++++++-----
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/mesa/state_tracker/st_context.c b/src/mesa/state_tracker/st_context.c
index e532c6b..d53de1e 100644
--- a/src/mesa/state_tracker/st_context.c
+++ b/src/mesa/state_tracker/st_context.c
@@ -250,6 +250,8 @@ st_create_context_priv( struct gl_context *ctx, struct pipe_context *pipe,
       screen->get_param(screen, PIPE_CAP_QUERY_TIME_ELAPSED);
    st->has_multi_draw_indirect =
       screen->get_param(screen, PIPE_CAP_MULTI_DRAW_INDIRECT);
+   st->has_half_float_packing =
+      screen->get_param(screen, PIPE_CAP_TGSI_PACK_HALF_FLOAT);

    /* GL limits and extensions */
    st_init_limits(st->pipe->screen, &ctx->Const, &ctx->Extensions);
diff --git a/src/mesa/state_tracker/st_context.h b/src/mesa/state_tracker/st_context.h
index ccebdd9..ae0114c 100644
--- a/src/mesa/state_tracker/st_context.h
+++ b/src/mesa/state_tracker/st_context.h
@@ -102,6 +102,7 @@ struct st_context
    boolean force_persample_in_shader;
    boolean has_shareable_shaders;
    boolean has_multi_draw_indirect;
+   boolean has_half_float_packing;

    /**
     * If a shader can be created when we get its source.
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index cdbe2f4..2adb57d 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -2163,15 +2163,20 @@ glsl_to_tgsi_visitor::visit(ir_expression *ir)
       }
       break;

+   case ir_unop_pack_half_2x16:
+      emit_asm(ir, TGSI_OPCODE_PK2H, result_dst, op[0]);
+      break;
+   case ir_unop_unpack_half_2x16:
+      emit_asm(ir, TGSI_OPCODE_UP2H, result_dst, op[0]);
+      break;
+
    case ir_unop_pack_snorm_2x16:
    case ir_unop_pack_unorm_2x16:
-   case ir_unop_pack_half_2x16:
    case ir_unop_pack_snorm_4x8:
    case ir_unop_pack_unorm_4x8:

    case ir_unop_unpack_snorm_2x16:
    case ir_unop_unpack_unorm_2x16:
-   case ir_unop_unpack_half_2x16:
    case ir_unop_unpack_half_2x16_split_x:
    case ir_unop_unpack_half_2x16_split_y:
    case ir_unop_unpack_snorm_4x8:
@@ -5853,13 +5858,14 @@ st_link_shader(struct gl_context *ctx, struct gl_shader_program *prog)
                                LOWER_PACK_SNORM_4x8 |
                                LOWER_UNPACK_SNORM_4x8 |
                                LOWER_UNPACK_UNORM_4x8 |
-                               LOWER_PACK_UNORM_4x8 |
-                               LOWER_PACK_HALF_2x16 |
-                               LOWER_UNPACK_HALF_2x16;
+                               LOWER_PACK_UNORM_4x8;

          if (ctx->Extensions.ARB_gpu_shader5)
             lower_inst |= LOWER_PACK_USE_BFI |
                           LOWER_PACK_USE_BFE;
+         if (!ctx->st->has_half_float_packing)
+            lower_inst |= LOWER_PACK_HALF_2x16 |
+                          LOWER_UNPACK_HALF_2x16;

          lower_packing_builtins(ir, lower_inst);
       }
--
2.4.10
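The st_link_shader change in this patch boils down to: always request the lowering passes every driver needs, and request the half-float lowering only when the driver cannot handle PK2H/UP2H natively. A stand-alone C sketch of that pattern follows. The EX_* constants and choose_lowering() are hypothetical stand-ins for illustration; the real LOWER_* flags are defined by Mesa's GLSL compiler and have different values.

```c
#include <stdbool.h>

/* Hypothetical flag values, for illustration only. */
enum {
    EX_LOWER_PACK_UNORM_4x8   = 1 << 0,
    EX_LOWER_PACK_HALF_2x16   = 1 << 1,
    EX_LOWER_UNPACK_HALF_2x16 = 1 << 2
};

/* Build the lowering mask: the half-float packing builtins are lowered
 * to generic arithmetic only when the driver lacks native support. */
static unsigned choose_lowering(bool has_half_float_packing)
{
    unsigned lower_inst = EX_LOWER_PACK_UNORM_4x8;

    if (!has_half_float_packing)
        lower_inst |= EX_LOWER_PACK_HALF_2x16 | EX_LOWER_UNPACK_HALF_2x16;

    return lower_inst;
}
```

Querying the capability once at context creation and caching it in a boolean, as the patch does with has_half_float_packing, keeps the link-time path free of repeated get_param calls.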
[Mesa-dev] [PATCH 2/6] tgsi: update PK2H/UP2H channel behavior info
---
 src/gallium/auxiliary/tgsi/tgsi_info.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c
index 3b40c3d..c078b6f 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_info.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
@@ -77,10 +77,10 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] =
    { 1, 1, 0, 0, 0, 0, COMP, "DDX", TGSI_OPCODE_DDX },
    { 1, 1, 0, 0, 0, 0, COMP, "DDY", TGSI_OPCODE_DDY },
    { 0, 0, 0, 0, 0, 0, NONE, "KILL", TGSI_OPCODE_KILL },
-   { 1, 1, 0, 0, 0, 0, COMP, "PK2H", TGSI_OPCODE_PK2H },
-   { 1, 1, 0, 0, 0, 0, COMP, "PK2US", TGSI_OPCODE_PK2US },
-   { 1, 1, 0, 0, 0, 0, COMP, "PK4B", TGSI_OPCODE_PK4B },
-   { 1, 1, 0, 0, 0, 0, COMP, "PK4UB", TGSI_OPCODE_PK4UB },
+   { 1, 1, 0, 0, 0, 0, REPL, "PK2H", TGSI_OPCODE_PK2H },
+   { 1, 1, 0, 0, 0, 0, REPL, "PK2US", TGSI_OPCODE_PK2US },
+   { 1, 1, 0, 0, 0, 0, REPL, "PK4B", TGSI_OPCODE_PK4B },
+   { 1, 1, 0, 0, 0, 0, REPL, "PK4UB", TGSI_OPCODE_PK4UB },
    { 0, 1, 0, 0, 0, 1, NONE, "", 44 },      /* removed */
    { 1, 2, 0, 0, 0, 0, COMP, "SEQ", TGSI_OPCODE_SEQ },
    { 0, 1, 0, 0, 0, 1, NONE, "", 46 },      /* removed */
@@ -92,10 +92,10 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] =
    { 1, 2, 1, 0, 0, 0, OTHR, "TEX", TGSI_OPCODE_TEX },
    { 1, 4, 1, 0, 0, 0, OTHR, "TXD", TGSI_OPCODE_TXD },
    { 1, 2, 1, 0, 0, 0, OTHR, "TXP", TGSI_OPCODE_TXP },
-   { 1, 1, 0, 0, 0, 0, COMP, "UP2H", TGSI_OPCODE_UP2H },
-   { 1, 1, 0, 0, 0, 0, COMP, "UP2US", TGSI_OPCODE_UP2US },
-   { 1, 1, 0, 0, 0, 0, COMP, "UP4B", TGSI_OPCODE_UP4B },
-   { 1, 1, 0, 0, 0, 0, COMP, "UP4UB", TGSI_OPCODE_UP4UB },
+   { 1, 1, 0, 0, 0, 0, CHAN, "UP2H", TGSI_OPCODE_UP2H },
+   { 1, 1, 0, 0, 0, 0, CHAN, "UP2US", TGSI_OPCODE_UP2US },
+   { 1, 1, 0, 0, 0, 0, CHAN, "UP4B", TGSI_OPCODE_UP4B },
+   { 1, 1, 0, 0, 0, 0, CHAN, "UP4UB", TGSI_OPCODE_UP4UB },
    { 0, 1, 0, 0, 0, 1, NONE, "", 59 },      /* removed */
    { 0, 1, 0, 0, 0, 1, NONE, "", 60 },      /* removed */
    { 1, 1, 0, 0, 0, 0, COMP, "ARR", TGSI_OPCODE_ARR },
--
2.4.10
[Mesa-dev] [PATCH 1/6] gallium: document PK2H/UP2H
Signed-off-by: Ilia Mirkin
---
 src/gallium/docs/source/tgsi.rst | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index 955ece8..f69998f 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -458,7 +458,9 @@ while DDY is allowed to be the same for the entire 2x2 quad.

 .. opcode:: PK2H - Pack Two 16-bit Floats

-  TBD
+.. math::
+
+  dst.x = f32\_to\_f16(src.x) | f32\_to\_f16(src.y) << 16


 .. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
@@ -615,7 +617,11 @@ This instruction replicates its result.

 .. opcode:: UP2H - Unpack Two 16-Bit Floats

-  TBD
+.. math::
+
+  dst.x = f16\_to\_f32(src0.x \& 0xffff)
+
+  dst.y = f16\_to\_f32(src0.x >> 16)

 .. note::

--
2.4.10
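To make the documented PK2H/UP2H formulas concrete, here is a minimal C sketch of both directions. The helpers f32_to_f16 and f16_to_f32 are hypothetical and deliberately simplified: the float-to-half conversion truncates the mantissa instead of rounding to nearest, turns overflow into infinity, and flushes values too small for a normal half to zero. Real hardware and Mesa's own conversion helpers may behave differently in those cases.

```c
#include <stdint.h>
#include <string.h>

/* Simplified float -> binary16 conversion (ASSUMPTIONS: truncation,
 * overflow to infinity, small values flushed to signed zero). */
static uint16_t f32_to_f16(float f)
{
    uint32_t bits, man;
    uint16_t sign;
    int32_t exp;

    memcpy(&bits, &f, sizeof(bits));
    sign = (uint16_t)((bits >> 16) & 0x8000u);
    man = bits & 0x7fffffu;
    exp = (int32_t)((bits >> 23) & 0xffu) - 127 + 15;

    if (((bits >> 23) & 0xffu) == 0xffu)     /* Inf or NaN */
        return (uint16_t)(sign | 0x7c00u | (man ? 0x200u : 0u));
    if (exp >= 31)                           /* too large: infinity */
        return (uint16_t)(sign | 0x7c00u);
    if (exp <= 0)                            /* too small: signed zero */
        return sign;
    return (uint16_t)(sign | ((uint32_t)exp << 10) | (man >> 13));
}

/* Exact binary16 -> float conversion, including subnormal halves. */
static float f16_to_f32(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1fu;
    uint32_t man = h & 0x3ffu;
    uint32_t bits;
    float f;

    if (exp == 0x1fu) {                      /* Inf or NaN */
        bits = sign | 0x7f800000u | (man << 13);
    } else if (exp != 0) {                   /* normal number */
        bits = sign | ((exp - 15 + 127) << 23) | (man << 13);
    } else if (man == 0) {                   /* signed zero */
        bits = sign;
    } else {                                 /* subnormal: normalize */
        exp = 127 - 15 + 1;
        while (!(man & 0x400u)) {
            man <<= 1;
            exp--;
        }
        bits = sign | (exp << 23) | ((man & 0x3ffu) << 13);
    }
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* dst.x = f32_to_f16(src.x) | f32_to_f16(src.y) << 16 */
static uint32_t pk2h(float x, float y)
{
    return (uint32_t)f32_to_f16(x) | ((uint32_t)f32_to_f16(y) << 16);
}

/* dst.x = f16_to_f32(src & 0xffff), dst.y = f16_to_f32(src >> 16) */
static void up2h(uint32_t src, float *x, float *y)
{
    *x = f16_to_f32((uint16_t)(src & 0xffffu));
    *y = f16_to_f32((uint16_t)(src >> 16));
}
```

For example, pk2h(1.0f, -2.0f) yields 0xc0003c00 (0x3c00 is 1.0 and 0xc000 is -2.0 in binary16), and up2h on that value gives back 1.0 and -2.0.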
[Mesa-dev] [PATCH 1/5] glapi: add ARB_indirect_parameters definitions
Signed-off-by: Ilia Mirkin--- src/mapi/glapi/gen/ARB_indirect_parameters.xml | 30 ++ src/mapi/glapi/gen/Makefile.am | 1 + src/mapi/glapi/gen/gl_API.xml | 6 +- src/mesa/main/extensions_table.h | 1 + src/mesa/main/mtypes.h | 1 + src/mesa/main/tests/dispatch_sanity.cpp| 4 src/mesa/vbo/vbo_exec_array.c | 21 ++ 7 files changed, 63 insertions(+), 1 deletion(-) create mode 100644 src/mapi/glapi/gen/ARB_indirect_parameters.xml diff --git a/src/mapi/glapi/gen/ARB_indirect_parameters.xml b/src/mapi/glapi/gen/ARB_indirect_parameters.xml new file mode 100644 index 000..20de905 --- /dev/null +++ b/src/mapi/glapi/gen/ARB_indirect_parameters.xml @@ -0,0 +1,30 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am index 2da8f7d..900b61a 100644 --- a/src/mapi/glapi/gen/Makefile.am +++ b/src/mapi/glapi/gen/Makefile.am @@ -137,6 +137,7 @@ API_XML = \ ARB_get_texture_sub_image.xml \ ARB_gpu_shader_fp64.xml \ ARB_gpu_shader5.xml \ + ARB_indirect_parameters.xml \ ARB_instanced_arrays.xml \ ARB_internalformat_query.xml \ ARB_invalidate_subdata.xml \ diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml index 21f6293..593ace4 100644 --- a/src/mapi/glapi/gen/gl_API.xml +++ b/src/mapi/glapi/gen/gl_API.xml @@ -8247,7 +8247,11 @@ http://www.w3.org/2001/XInclude"/> - + + +http://www.w3.org/2001/XInclude"/> + + http://www.w3.org/2001/XInclude"/> diff --git a/src/mesa/main/extensions_table.h b/src/mesa/main/extensions_table.h index 789b55a..aeccb01 100644 --- a/src/mesa/main/extensions_table.h +++ b/src/mesa/main/extensions_table.h @@ -70,6 +70,7 @@ EXT(ARB_gpu_shader5 , ARB_gpu_shader5 EXT(ARB_gpu_shader_fp64 , ARB_gpu_shader_fp64 , x , GLC, x , x , 2010) EXT(ARB_half_float_pixel, dummy_true , GLL, GLC, x , x , 2003) EXT(ARB_half_float_vertex , ARB_half_float_vertex , GLL, GLC, x , x , 2008) +EXT(ARB_indirect_parameters , ARB_indirect_parameters , x , GLC, x , x , 2013) 
EXT(ARB_instanced_arrays, ARB_instanced_arrays , GLL, GLC, x , x , 2008) EXT(ARB_internalformat_query, ARB_internalformat_query , GLL, GLC, x , x , 2011) EXT(ARB_invalidate_subdata , dummy_true , GLL, GLC, x , x , 2012) diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 5b9fce8..5cd2e8e 100644 --- a/src/mesa/main/mtypes.h +++ b/src/mesa/main/mtypes.h @@ -3700,6 +3700,7 @@ struct gl_extensions GLboolean ARB_gpu_shader5; GLboolean ARB_gpu_shader_fp64; GLboolean ARB_half_float_vertex; + GLboolean ARB_indirect_parameters; GLboolean ARB_instanced_arrays; GLboolean ARB_internalformat_query; GLboolean ARB_map_buffer_range; diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index d288b1d..7610bcb 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -1844,6 +1844,10 @@ const struct function gl_core_functions_possible[] = { { "glGetQueryBufferObjecti64v", 45, -1 }, { "glGetQueryBufferObjectui64v", 45, -1 }, + /* GL_ARB_indirect_parameters */ + { "glMultiDrawArraysIndirectCountARB", 31, -1 }, + { "glMultiDrawElementsIndirectCountARB", 31, -1 }, + { NULL, 0, -1 } }; diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c index fd29837..0c26bad 100644 --- a/src/mesa/vbo/vbo_exec_array.c +++ b/src/mesa/vbo/vbo_exec_array.c @@ -1825,6 +1825,25 @@ vbo_exec_MultiDrawElementsIndirect(GLenum mode, GLenum type, primcount, stride); } +static void GLAPIENTRY +vbo_exec_MultiDrawArraysIndirectCount(GLenum mode, + GLintptr indirect, + GLintptr drawcount, + GLsizei maxdrawcount, GLsizei stride) +{ + +} + +static void GLAPIENTRY +vbo_exec_MultiDrawElementsIndirectCount(GLenum mode, GLenum type, +GLintptr indirect, +GLintptr drawcount, +GLsizei maxdrawcount, GLsizei stride) +{ + +} + + /** * Initialize the dispatch table with the VBO functions for drawing. */ @@ -1872,6 +1891,8 @@ vbo_initialize_exec_dispatch(const struct gl_context *ctx, if (ctx->API ==
[Mesa-dev] [PATCH 0/5] Add ARB_indirect_parameters support
The nvc0 patch applies on top of some unpublished patches; see
https://github.com/imirkin/mesa/commits/tmp4 for the full thing. The
whole series applies on top of the ARB_multi_draw_indirect patches I
sent earlier (with potential minor modifications).

There is some type confusion between the ARB_indirect_parameters spec
and the Khronos gl.xml/glcorearb.h files; I went with the latter's
definitions.

This passes the relatively simple piglit test I sent.

Ilia Mirkin (5):
  glapi: add ARB_indirect_parameters definitions
  mesa: add parameter buffer, used for ARB_indirect_parameters
  mesa: add support for ARB_indirect_parameters draw functions
  st/mesa: expose ARB_indirect_parameters when the backend driver allows
  nvc0: add ARB_indirect_parameters support

 docs/relnotes/11.2.0.html                          |   1 +
 src/gallium/drivers/nouveau/nvc0/mme/com9097.mme   | 157 +
 src/gallium/drivers/nouveau/nvc0/mme/com9097.mme.h | 125
 src/gallium/drivers/nouveau/nvc0/nvc0_macros.h     |   4 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c     |   4 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c        |  29 +++-
 src/mapi/glapi/gen/ARB_indirect_parameters.xml     |  30
 src/mapi/glapi/gen/Makefile.am                     |   1 +
 src/mapi/glapi/gen/gl_API.xml                      |   6 +-
 src/mesa/main/api_validate.c                       | 115 +++
 src/mesa/main/api_validate.h                       |  16 +++
 src/mesa/main/bufferobj.c                          |  15 ++
 src/mesa/main/extensions_table.h                   |   1 +
 src/mesa/main/get.c                                |   5 +
 src/mesa/main/get_hash_params.py                   |   4 +
 src/mesa/main/mtypes.h                             |   2 +
 src/mesa/main/tests/dispatch_sanity.cpp            |   4
 src/mesa/state_tracker/st_cb_bufferobjects.c       |   1 +
 src/mesa/state_tracker/st_extensions.c             |   1 +
 src/mesa/vbo/vbo_exec_array.c                      | 124
 20 files changed, 638 insertions(+), 7 deletions(-)
 create mode 100644 src/mapi/glapi/gen/ARB_indirect_parameters.xml

--
2.4.10
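For context on what the series implements: with ARB_indirect_parameters the number of draws is read from a buffer bound to the parameter buffer binding point, and maxdrawcount caps it. A minimal CPU-side C sketch of that rule follows. The helper effective_draw_count and the plain byte array standing in for the parameter buffer are hypothetical; in a real driver the count lives in GPU memory and is read by the hardware or a macro, not by the CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return how many draws would actually be issued.
 * `param_buf` models the contents of the parameter buffer, `offset` is
 * the byte offset passed as `drawcount`, and `maxdrawcount` is the cap. */
static int32_t effective_draw_count(const unsigned char *param_buf,
                                    size_t offset, int32_t maxdrawcount)
{
    int32_t count;

    /* memcpy avoids alignment and aliasing problems on the raw bytes */
    memcpy(&count, param_buf + offset, sizeof(count));

    return count < maxdrawcount ? count : maxdrawcount;
}
```

This is also the property a test with one deliberately "bad" trailing draw relies on: if the stored count is 2 and maxdrawcount is 3, the third set of draw parameters must never be consumed.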
[Mesa-dev] [PATCH 4/5] st/mesa: expose ARB_indirect_parameters when the backend driver allows
Signed-off-by: Ilia Mirkin
---
 src/mesa/state_tracker/st_cb_bufferobjects.c | 1 +
 src/mesa/state_tracker/st_extensions.c       | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/mesa/state_tracker/st_cb_bufferobjects.c b/src/mesa/state_tracker/st_cb_bufferobjects.c
index 5d20b26..e775453 100644
--- a/src/mesa/state_tracker/st_cb_bufferobjects.c
+++ b/src/mesa/state_tracker/st_cb_bufferobjects.c
@@ -230,6 +230,7 @@ st_bufferobj_data(struct gl_context *ctx,
       bind = PIPE_BIND_CONSTANT_BUFFER;
       break;
    case GL_DRAW_INDIRECT_BUFFER:
+   case GL_PARAMETER_BUFFER_ARB:
       bind = PIPE_BIND_COMMAND_ARGS_BUFFER;
       break;
    default:
diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c
index 90eb677..3c198ec 100644
--- a/src/mesa/state_tracker/st_extensions.c
+++ b/src/mesa/state_tracker/st_extensions.c
@@ -452,6 +452,7 @@ void st_init_extensions(struct pipe_screen *screen,
       { o(ARB_draw_instanced),               PIPE_CAP_TGSI_INSTANCEID },
       { o(ARB_fragment_program_shadow),      PIPE_CAP_TEXTURE_SHADOW_MAP },
       { o(ARB_framebuffer_object),           PIPE_CAP_MIXED_FRAMEBUFFER_SIZES },
+      { o(ARB_indirect_parameters),          PIPE_CAP_MULTI_DRAW_INDIRECT_PARAMS },
       { o(ARB_instanced_arrays),             PIPE_CAP_VERTEX_ELEMENT_INSTANCE_DIVISOR },
       { o(ARB_occlusion_query),              PIPE_CAP_OCCLUSION_QUERY },
       { o(ARB_occlusion_query2),             PIPE_CAP_OCCLUSION_QUERY },
-- 
2.4.10
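The table entry in this patch relies on a table-driven pattern: each row pairs the offset of an extension flag with a driver capability, and a loop enables the flag when the driver reports the capability. A minimal sketch of that pattern might look like the following. All `sketch_*` names and the two-entry cap enum are stand-ins invented for illustration; only the `o()` offset macro mirrors the name used in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the pipe caps and the gl_extensions struct. */
enum sketch_cap {
   SKETCH_CAP_INSTANCE_DIVISOR,
   SKETCH_CAP_MULTI_DRAW_INDIRECT_PARAMS,
   SKETCH_CAP_COUNT
};

struct sketch_extensions {
   bool ARB_instanced_arrays;
   bool ARB_indirect_parameters;
};

struct sketch_screen {
   int caps[SKETCH_CAP_COUNT];   /* non-zero means "supported" */
};

struct cap_mapping {
   size_t extension_offset;      /* byte offset of the flag in the struct */
   enum sketch_cap cap;
};

#define o(x) offsetof(struct sketch_extensions, x)

static const struct cap_mapping cap_table[] = {
   { o(ARB_indirect_parameters), SKETCH_CAP_MULTI_DRAW_INDIRECT_PARAMS },
   { o(ARB_instanced_arrays),    SKETCH_CAP_INSTANCE_DIVISOR },
};

/* Enable every extension whose mapped capability the screen reports. */
static void
sketch_init_extensions(const struct sketch_screen *screen,
                       struct sketch_extensions *ext)
{
   unsigned char *base = (unsigned char *)ext;

   for (size_t i = 0; i < sizeof(cap_table) / sizeof(cap_table[0]); i++) {
      if (screen->caps[cap_table[i].cap])
         *(bool *)(base + cap_table[i].extension_offset) = true;
   }
}
```

With this layout, exposing a new extension for capable drivers is a one-line table addition, which is what the hunk above does.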
[Mesa-dev] [Bug 72877] Wrong colors with Mesa 9.2 and Mesa 10.0 on PPC Linux systems
https://bugs.freedesktop.org/show_bug.cgi?id=72877

--- Comment #15 from Ilia Mirkin ---
(In reply to Alex Perez from comment #14)
> Ping. I am still experiencing problems with incorrect colors with the very
> latest Mesa, compiled from a fresh git checkout today.

Mesa 11.0.3+ work fine on a PPC G5 with a NV34 GPU. Haven't tested much
else. I don't think this is a "core" issue anymore.
[Mesa-dev] [PATCH 2/5] mesa: add parameter buffer, used for ARB_indirect_parameters
Signed-off-by: Ilia Mirkin
---
 src/mesa/main/bufferobj.c        | 15 +++
 src/mesa/main/get.c              |  5 +
 src/mesa/main/get_hash_params.py |  4
 src/mesa/main/mtypes.h           |  1 +
 4 files changed, 25 insertions(+)

diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c
index 181eb49..342f319 100644
--- a/src/mesa/main/bufferobj.c
+++ b/src/mesa/main/bufferobj.c
@@ -127,6 +127,11 @@ get_buffer_target(struct gl_context *ctx, GLenum target)
          return &ctx->DrawIndirectBuffer;
       }
       break;
+   case GL_PARAMETER_BUFFER_ARB:
+      if (_mesa_has_ARB_indirect_parameters(ctx)) {
+         return &ctx->ParameterBuffer;
+      }
+      break;
    case GL_DISPATCH_INDIRECT_BUFFER:
       if (_mesa_has_compute_shaders(ctx)) {
          return &ctx->DispatchIndirectBuffer;
@@ -866,6 +871,9 @@ _mesa_init_buffer_objects( struct gl_context *ctx )
    _mesa_reference_buffer_object(ctx, &ctx->DrawIndirectBuffer,
                                  ctx->Shared->NullBufferObj);
 
+   _mesa_reference_buffer_object(ctx, &ctx->ParameterBuffer,
+                                 ctx->Shared->NullBufferObj);
+
    _mesa_reference_buffer_object(ctx, &ctx->DispatchIndirectBuffer,
                                  ctx->Shared->NullBufferObj);
 
@@ -913,6 +921,8 @@ _mesa_free_buffer_objects( struct gl_context *ctx )
 
    _mesa_reference_buffer_object(ctx, &ctx->DrawIndirectBuffer, NULL);
 
+   _mesa_reference_buffer_object(ctx, &ctx->ParameterBuffer, NULL);
+
    _mesa_reference_buffer_object(ctx, &ctx->DispatchIndirectBuffer, NULL);
 
    for (i = 0; i < MAX_COMBINED_UNIFORM_BUFFERS; i++) {
@@ -1261,6 +1271,11 @@ _mesa_DeleteBuffers(GLsizei n, const GLuint *ids)
             _mesa_BindBuffer( GL_DRAW_INDIRECT_BUFFER, 0 );
          }
 
+         /* unbind ARB_indirect_parameters binding point */
+         if (ctx->ParameterBuffer == bufObj) {
+            _mesa_BindBuffer(GL_PARAMETER_BUFFER_ARB, 0);
+         }
+
          /* unbind ARB_compute_shader binding point */
          if (ctx->DispatchIndirectBuffer == bufObj) {
             _mesa_BindBuffer(GL_DISPATCH_INDIRECT_BUFFER, 0);
diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index c6a2e5b..95cb18c 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -423,6 +423,7 @@ EXTRA_EXT(ARB_framebuffer_no_attachments);
 EXTRA_EXT(ARB_tessellation_shader);
 EXTRA_EXT(ARB_shader_subroutine);
 EXTRA_EXT(ARB_shader_storage_buffer_object);
+EXTRA_EXT(ARB_indirect_parameters);
 
 static const int
 extra_ARB_color_buffer_float_or_glcore[] = {
@@ -1032,6 +1033,10 @@ find_custom_value(struct gl_context *ctx, const struct value_desc *d, union valu
    case GL_DRAW_INDIRECT_BUFFER_BINDING:
       v->value_int = ctx->DrawIndirectBuffer->Name;
       break;
+   /* GL_ARB_indirect_parameters */
+   case GL_PARAMETER_BUFFER_BINDING_ARB:
+      v->value_int = ctx->ParameterBuffer->Name;
+      break;
    /* GL_ARB_separate_shader_objects */
    case GL_PROGRAM_PIPELINE_BINDING:
       if (ctx->Pipeline.Current) {
diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py
index 7a48ed2..af7a8f4 100644
--- a/src/mesa/main/get_hash_params.py
+++ b/src/mesa/main/get_hash_params.py
@@ -887,6 +887,10 @@ descriptor=[
 # GL_ARB_shader_subroutine
   [ "MAX_SUBROUTINES", "CONST(MAX_SUBROUTINES), extra_ARB_shader_subroutine" ],
   [ "MAX_SUBROUTINE_UNIFORM_LOCATIONS", "CONST(MAX_SUBROUTINE_UNIFORM_LOCATIONS), extra_ARB_shader_subroutine" ],
+
+# GL_ARB_indirect_parameters
+  [ "PARAMETER_BUFFER_BINDING_ARB", "LOC_CUSTOM, TYPE_INT, 0, extra_ARB_indirect_parameters" ],
+
 ]}
 
 ]
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 5cd2e8e..dd52368 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -4349,6 +4349,7 @@ struct gl_context
    struct gl_perf_monitor_state PerfMonitor;
 
    struct gl_buffer_object *DrawIndirectBuffer; /** < GL_ARB_draw_indirect */
+   struct gl_buffer_object *ParameterBuffer; /** < GL_ARB_indirect_parameters */
    struct gl_buffer_object *DispatchIndirectBuffer; /** < GL_ARB_compute_shader */
 
    struct gl_buffer_object *CopyReadBuffer; /**< GL_ARB_copy_buffer */
-- 
2.4.10
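The patch follows the usual recipe for adding a buffer binding point: look up the slot for a target only when the extension is enabled, and drop the binding when the bound buffer is deleted. A stripped-down sketch of those two pieces follows. The `sketch_*` types and functions are hypothetical, and unbinding is simplified to storing NULL, whereas the patch rebinds buffer 0 via _mesa_BindBuffer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for gl_buffer_object / gl_context. */
struct sketch_buffer { unsigned name; };

enum sketch_target {
   SKETCH_DRAW_INDIRECT_BUFFER,
   SKETCH_PARAMETER_BUFFER
};

struct sketch_context {
   bool has_indirect_parameters;
   struct sketch_buffer *draw_indirect_buffer;
   struct sketch_buffer *parameter_buffer;
};

/*
 * Return the address of the binding slot for `target`, or NULL when the
 * target is not available in this context (extension not exposed).
 */
static struct sketch_buffer **
sketch_get_buffer_target(struct sketch_context *ctx, enum sketch_target target)
{
   switch (target) {
   case SKETCH_DRAW_INDIRECT_BUFFER:
      return &ctx->draw_indirect_buffer;
   case SKETCH_PARAMETER_BUFFER:
      if (ctx->has_indirect_parameters)
         return &ctx->parameter_buffer;
      break;
   }
   return NULL;
}

/* On deletion, make sure no binding point still refers to the buffer. */
static void
sketch_delete_buffer(struct sketch_context *ctx, struct sketch_buffer *buf)
{
   if (ctx->draw_indirect_buffer == buf)
      ctx->draw_indirect_buffer = NULL;
   if (ctx->parameter_buffer == buf)
      ctx->parameter_buffer = NULL;
}
```

Returning NULL for a disabled target lets the caller report an invalid-enum style error without each entry point repeating the extension check.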
[Mesa-dev] [PATCH 3/3] llvmpipe: add sse code for fixed position calculation
From: Roland Scheidegger

This is quite a few fewer instructions, although the two 64-bit muls are
still done with scalar C code (they'd need way more shuffles, plus fixup
for the signed mul, so it totally doesn't seem worth it; x86 can do
32x32->64-bit signed scalar muls natively just fine after all, even on
32-bit).
(This still doesn't have a measurable performance impact in reality,
although the profiler seems to say time spent in setup has indeed gone
down by 10% or so overall.)
---
 src/gallium/drivers/llvmpipe/lp_setup_tri.c | 58 +
 1 file changed, 50 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
index cb1d715..fefd1c1 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup_tri.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup_tri.c
@@ -65,11 +65,11 @@ fixed_to_float(int a)
 struct fixed_position {
    int32_t x[4];
    int32_t y[4];
-   int64_t area;
    int32_t dx01;
    int32_t dy01;
    int32_t dx20;
    int32_t dy20;
+   int64_t area;
 };
 
 
@@ -866,29 +866,71 @@ static void retry_triangle_ccw( struct lp_setup_context *setup,
 
 /**
  * Calculate fixed position data for a triangle
+ * It is unfortunate we need to do that here (as we need area
+ * calculated in fixed point), as there's quite some code duplication
+ * to what is done in the jit setup prog.
  */
 static inline void
-calc_fixed_position( struct lp_setup_context *setup,
-                     struct fixed_position* position,
-                     const float (*v0)[4],
-                     const float (*v1)[4],
-                     const float (*v2)[4])
+calc_fixed_position(struct lp_setup_context *setup,
+                    struct fixed_position* position,
+                    const float (*v0)[4],
+                    const float (*v1)[4],
+                    const float (*v2)[4])
 {
+   /*
+    * The rounding may not be quite the same with PIPE_ARCH_SSE
+    * (util_iround right now only does nearest/even on x87,
+    * otherwise nearest/away-from-zero).
+    * Both should be acceptable, I think.
+    */
+#if defined(PIPE_ARCH_SSE)
+   __m128d v0r, v1r, v2r;
+   __m128 vxy0xy2, vxy1xy0;
+   __m128i vxy0xy2i, vxy1xy0i;
+   __m128i dxdy0120, x0x2y0y2, x1x0y1y0, x0120, y0120;
+   __m128 pix_offset = _mm_set1_ps(setup->pixel_offset);
+   __m128 fixed_one = _mm_set1_ps((float)FIXED_ONE);
+   v0r = _mm_load_sd((const double *)v0[0]);
+   v1r = _mm_load_sd((const double *)v1[0]);
+   v2r = _mm_load_sd((const double *)v2[0]);
+   vxy0xy2 = (__m128)_mm_unpacklo_pd(v0r, v2r);
+   vxy1xy0 = (__m128)_mm_unpacklo_pd(v1r, v0r);
+   vxy0xy2 = _mm_sub_ps(vxy0xy2, pix_offset);
+   vxy1xy0 = _mm_sub_ps(vxy1xy0, pix_offset);
+   vxy0xy2 = _mm_mul_ps(vxy0xy2, fixed_one);
+   vxy1xy0 = _mm_mul_ps(vxy1xy0, fixed_one);
+   vxy0xy2i = _mm_cvtps_epi32(vxy0xy2);
+   vxy1xy0i = _mm_cvtps_epi32(vxy1xy0);
+   dxdy0120 = _mm_sub_epi32(vxy0xy2i, vxy1xy0i);
+   _mm_store_si128((__m128i *)&position->dx01, dxdy0120);
+   /*
+    * For the mul, would need some more shuffles, plus emulation
+    * for the signed mul (without sse41), so don't bother.
+    */
+   x0x2y0y2 = _mm_shuffle_epi32(vxy0xy2i, _MM_SHUFFLE(3,1,2,0));
+   x1x0y1y0 = _mm_shuffle_epi32(vxy1xy0i, _MM_SHUFFLE(3,1,2,0));
+   x0120 = _mm_unpacklo_epi32(x0x2y0y2, x1x0y1y0);
+   y0120 = _mm_unpackhi_epi32(x0x2y0y2, x1x0y1y0);
+   _mm_store_si128((__m128i *)&position->x[0], x0120);
+   _mm_store_si128((__m128i *)&position->y[0], y0120);
+
+#else
    position->x[0] = subpixel_snap(v0[0][0] - setup->pixel_offset);
    position->x[1] = subpixel_snap(v1[0][0] - setup->pixel_offset);
    position->x[2] = subpixel_snap(v2[0][0] - setup->pixel_offset);
-   position->x[3] = 0;
+   position->x[3] = 0; // should be unused
 
    position->y[0] = subpixel_snap(v0[0][1] - setup->pixel_offset);
    position->y[1] = subpixel_snap(v1[0][1] - setup->pixel_offset);
    position->y[2] = subpixel_snap(v2[0][1] - setup->pixel_offset);
-   position->y[3] = 0;
+   position->y[3] = 0; // should be unused
 
    position->dx01 = position->x[0] - position->x[1];
    position->dy01 = position->y[0] - position->y[1];
    position->dx20 = position->x[2] - position->x[0];
    position->dy20 = position->y[2] - position->y[0];
+#endif
 
    position->area = IMUL64(position->dx01, position->dy20) -
                     IMUL64(position->dx20, position->dy01);
-- 
2.1.4
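To make the arithmetic easier to follow, here is a small standalone version of the scalar path: snap each coordinate to fixed point, take the two edge deltas, and form the signed area with 64-bit products. This is a sketch under stated assumptions rather than the llvmpipe code: the 8 subpixel bits, the `sketch_*` names, and the nearest/away-from-zero rounding in `sketch_snap` are all chosen for illustration.

```c
#include <stdint.h>

#define SKETCH_FIXED_ORDER 8                       /* assumed subpixel bits */
#define SKETCH_FIXED_ONE   (1 << SKETCH_FIXED_ORDER)

struct sketch_fixed_position {
   int32_t x[3];
   int32_t y[3];
   int32_t dx01, dy01, dx20, dy20;
   int64_t area;
};

/* Convert to fixed point, rounding to nearest with ties away from zero. */
static int32_t
sketch_snap(float v)
{
   float scaled = v * (float)SKETCH_FIXED_ONE;
   return (int32_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* v0, v1, v2 each point at an (x, y) pair. */
static void
sketch_calc_fixed_position(struct sketch_fixed_position *p, float pixel_offset,
                           const float v0[2], const float v1[2],
                           const float v2[2])
{
   p->x[0] = sketch_snap(v0[0] - pixel_offset);
   p->x[1] = sketch_snap(v1[0] - pixel_offset);
   p->x[2] = sketch_snap(v2[0] - pixel_offset);

   p->y[0] = sketch_snap(v0[1] - pixel_offset);
   p->y[1] = sketch_snap(v1[1] - pixel_offset);
   p->y[2] = sketch_snap(v2[1] - pixel_offset);

   p->dx01 = p->x[0] - p->x[1];
   p->dy01 = p->y[0] - p->y[1];
   p->dx20 = p->x[2] - p->x[0];
   p->dy20 = p->y[2] - p->y[0];

   /* Widen before multiplying so the 32x32 products cannot overflow. */
   p->area = (int64_t)p->dx01 * p->dy20 - (int64_t)p->dx20 * p->dy01;
}
```

For the triangle (0,0), (1,0), (0,1) with a zero pixel offset this gives dx01 = -256, dy20 = 256 and area = -65536; the sign of the area encodes the winding.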
[Mesa-dev] [Bug 72877] Wrong colors with Mesa 9.2 and Mesa 10.0 on PPC Linux systems
https://bugs.freedesktop.org/show_bug.cgi?id=72877

--- Comment #14 from Alex Perez ---
Ping. I am still experiencing problems with incorrect colors with the very
latest Mesa, compiled from a fresh git checkout today.
Re: [Mesa-dev] [PATCH 0/5] Add ARB_indirect_parameters support
In this series patches 1-4 are: Reviewed-by: Edward O'CallaghanNo idea what is happening in patch 5 to say anything either way. On 2016-01-03 07:38, Ilia Mirkin wrote: The nvc0 patch applies on top of some unpublished patches, see https://github.com/imirkin/mesa/commits/tmp4 for the full thing. The whole series applies on top of the ARB_multi_draw_indirect patches I sent earlier (with potential minor modifications). There is some type confusion between the ARB_indirect_parameters spec and the Khronos gl.xml/glcorearb.h files, I went with the latter's definitions. This passes the relatively simple piglit test I sent. Ilia Mirkin (5): glapi: add ARB_indirect_parameters definitions mesa: add parameter buffer, used for ARB_indirect_parameters mesa: add support for ARB_indirect_parameters draw functions st/mesa: expose ARB_indirect_parameters when the backend driver allows nvc0: add ARB_indirect_parameters support docs/relnotes/11.2.0.html | 1 + src/gallium/drivers/nouveau/nvc0/mme/com9097.mme | 157 + src/gallium/drivers/nouveau/nvc0/mme/com9097.mme.h | 125 src/gallium/drivers/nouveau/nvc0/nvc0_macros.h | 4 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 4 +- src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c| 29 +++- src/mapi/glapi/gen/ARB_indirect_parameters.xml | 30 src/mapi/glapi/gen/Makefile.am | 1 + src/mapi/glapi/gen/gl_API.xml | 6 +- src/mesa/main/api_validate.c | 115 +++ src/mesa/main/api_validate.h | 16 +++ src/mesa/main/bufferobj.c | 15 ++ src/mesa/main/extensions_table.h | 1 + src/mesa/main/get.c| 5 + src/mesa/main/get_hash_params.py | 4 + src/mesa/main/mtypes.h | 2 + src/mesa/main/tests/dispatch_sanity.cpp| 4 + src/mesa/state_tracker/st_cb_bufferobjects.c | 1 + src/mesa/state_tracker/st_extensions.c | 1 + src/mesa/vbo/vbo_exec_array.c | 124 20 files changed, 638 insertions(+), 7 deletions(-) create mode 100644 src/mapi/glapi/gen/ARB_indirect_parameters.xml ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org 
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/8] tgsi: add ureg support for image decls
There is quite a bit of rename churn happening here at the same time as the bring up of ureg support for image declarations. Would it be possible to split the rename churn out from the actual behavioral changes please? On 2016-01-03 15:37, Ilia Mirkin wrote: Signed-off-by: Ilia Mirkin--- src/gallium/auxiliary/tgsi/tgsi_build.c| 62 + src/gallium/auxiliary/tgsi/tgsi_dump.c | 10 +-- src/gallium/auxiliary/tgsi/tgsi_parse.c| 4 +- src/gallium/auxiliary/tgsi/tgsi_parse.h| 2 +- src/gallium/auxiliary/tgsi/tgsi_strings.c | 4 +- src/gallium/auxiliary/tgsi/tgsi_text.c | 10 +-- src/gallium/auxiliary/tgsi/tgsi_ureg.c | 77 ++ src/gallium/auxiliary/tgsi/tgsi_ureg.h | 7 ++ src/gallium/drivers/ilo/shader/toy_tgsi.c | 8 +-- .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 12 +++- src/gallium/drivers/svga/svga_tgsi_vgpu10.c| 2 + src/gallium/include/pipe/p_shader_tokens.h | 7 +- 12 files changed, 153 insertions(+), 52 deletions(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c index fdb7feb..bb9d0cb 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_build.c +++ b/src/gallium/auxiliary/tgsi/tgsi_build.c @@ -259,36 +259,39 @@ tgsi_build_declaration_semantic( return ds; } -static struct tgsi_declaration_resource -tgsi_default_declaration_resource(void) +static struct tgsi_declaration_image +tgsi_default_declaration_image(void) { - struct tgsi_declaration_resource dr; + struct tgsi_declaration_image di; - dr.Resource = TGSI_TEXTURE_BUFFER; - dr.Raw = 0; - dr.Writable = 0; - dr.Padding = 0; + di.Resource = TGSI_TEXTURE_BUFFER; + di.Raw = 0; + di.Writable = 0; + di.Format = 0; + di.Padding = 0; - return dr; + return di; } -static struct tgsi_declaration_resource -tgsi_build_declaration_resource(unsigned texture, -unsigned raw, -unsigned writable, -struct tgsi_declaration *declaration, -struct tgsi_header *header) +static struct tgsi_declaration_image +tgsi_build_declaration_image(unsigned texture, + unsigned format, + unsigned raw, + unsigned 
writable, + struct tgsi_declaration *declaration, + struct tgsi_header *header) { - struct tgsi_declaration_resource dr; + struct tgsi_declaration_image di; - dr = tgsi_default_declaration_resource(); - dr.Resource = texture; - dr.Raw = raw; - dr.Writable = writable; + di = tgsi_default_declaration_image(); + di.Resource = texture; + di.Format = format; + di.Raw = raw; + di.Writable = writable; declaration_grow(declaration, header); - return dr; + return di; } static struct tgsi_declaration_sampler_view @@ -364,7 +367,7 @@ tgsi_default_full_declaration( void ) full_declaration.Range = tgsi_default_declaration_range(); full_declaration.Semantic = tgsi_default_declaration_semantic(); full_declaration.Interp = tgsi_default_declaration_interp(); - full_declaration.Resource = tgsi_default_declaration_resource(); + full_declaration.Image = tgsi_default_declaration_image(); full_declaration.SamplerView = tgsi_default_declaration_sampler_view(); full_declaration.Array = tgsi_default_declaration_array(); @@ -454,20 +457,21 @@ tgsi_build_full_declaration( header ); } - if (full_decl->Declaration.File == TGSI_FILE_RESOURCE) { - struct tgsi_declaration_resource *dr; + if (full_decl->Declaration.File == TGSI_FILE_IMAGE) { + struct tgsi_declaration_image *di; if (maxsize <= size) { return 0; } - dr = (struct tgsi_declaration_resource *)[size]; + di = (struct tgsi_declaration_image *)[size]; size++; - *dr = tgsi_build_declaration_resource(full_decl->Resource.Resource, -full_decl->Resource.Raw, - full_decl->Resource.Writable, -declaration, -header); + *di = tgsi_build_declaration_image(full_decl->Image.Resource, + full_decl->Image.Format, + full_decl->Image.Raw, + full_decl->Image.Writable, + declaration, + header); } if (full_decl->Declaration.File == TGSI_FILE_SAMPLER_VIEW) { diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c b/src/gallium/auxiliary/tgsi/tgsi_dump.c index e29ffb3..dad3839 100644 ---
Re: [Mesa-dev] [PATCH 0/8] gallium: add shader buffer support
In this series patches 2-8 are: Reviewed-by: Edward O'Callaghanwith some commentary on patch 1. Kind Regards, On 2016-01-03 15:37, Ilia Mirkin wrote: This provides enough support in TGSI to support shader buffers. I do away with the defunct TGSI_FILE_RESOURCE (renaming it into TGSI_FILE_IMAGE to work with pipe_image_view), and add a brand new TGSI_FILE_BUFFER. At the declaration level, this can have an ATOMIC qualifier (and later a SHARED qualifier for compute shaders). I also add memory qualifiers to LOAD/STORE opcodes, which can convey the coherent/volatile/restrict flags as specified in the GLSL. I also modified all of the formerly resource opcodes to work on both buffers and images. For images they will derive the format from the IMAGE declaration, while buffers are format-less by definition. This is still missing a way to implement memory barriers, that will come soon, and is not going to affect anything else I do in this series. For the full series I'm working on, you can look at https://github.com/imirkin/mesa/commits/atomic3 which exposes ARB_shader_atomic_counters and ARB_shader_storage_buffer_objects on nvc0+ (but it won't work on maxwell -- need to add emission of atomic ops and cache control). However this is a nice self-contained chunk to start with. 
Ilia Mirkin (8): tgsi: add ureg support for image decls ureg: add buffer support to ureg tgsi: provide a way to encode memory qualifiers for SSBO tgsi: add a is_store property tgsi: update atomic op docs gallium: add PIPE_SHADER_CAP_MAX_SHADER_BUFFERS gallium: add PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT gallium: add a RESQ opcode to query info about a resource src/gallium/auxiliary/gallivm/lp_bld_limits.h | 1 + src/gallium/auxiliary/tgsi/tgsi_build.c| 112 -- src/gallium/auxiliary/tgsi/tgsi_dump.c | 25 +- src/gallium/auxiliary/tgsi/tgsi_exec.h | 1 + src/gallium/auxiliary/tgsi/tgsi_info.c | 446 ++--- src/gallium/auxiliary/tgsi/tgsi_info.h | 1 + src/gallium/auxiliary/tgsi/tgsi_parse.c| 8 +- src/gallium/auxiliary/tgsi/tgsi_parse.h| 3 +- src/gallium/auxiliary/tgsi/tgsi_strings.c | 12 +- src/gallium/auxiliary/tgsi/tgsi_strings.h | 2 + src/gallium/auxiliary/tgsi/tgsi_text.c | 42 +- src/gallium/auxiliary/tgsi/tgsi_ureg.c | 182 + src/gallium/auxiliary/tgsi/tgsi_ureg.h | 23 ++ src/gallium/docs/source/screen.rst | 8 + src/gallium/docs/source/tgsi.rst | 105 ++--- src/gallium/drivers/freedreno/freedreno_screen.c | 3 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/ilo/shader/toy_tgsi.c | 8 +- src/gallium/drivers/llvmpipe/lp_screen.c | 1 + .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 12 +- src/gallium/drivers/nouveau/nv30/nv30_screen.c | 3 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 2 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 + src/gallium/drivers/r300/r300_screen.c | 3 + src/gallium/drivers/r600/r600_pipe.c | 2 + src/gallium/drivers/radeonsi/si_pipe.c | 3 + src/gallium/drivers/softpipe/sp_screen.c | 1 + src/gallium/drivers/svga/svga_screen.c | 4 + src/gallium/drivers/svga/svga_tgsi_vgpu10.c| 2 + src/gallium/drivers/vc4/vc4_screen.c | 3 + src/gallium/drivers/virgl/virgl_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 2 + src/gallium/include/pipe/p_shader_tokens.h | 28 +- 34 files changed, 
729 insertions(+), 324 deletions(-) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] gallium: document PK2H/UP2H
This series is:
Reviewed-by: Edward O'Callaghan

On 2016-01-03 11:37, Ilia Mirkin wrote:
> Signed-off-by: Ilia Mirkin
> ---
>  src/gallium/docs/source/tgsi.rst | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/docs/source/tgsi.rst
> b/src/gallium/docs/source/tgsi.rst
> index 955ece8..f69998f 100644
> --- a/src/gallium/docs/source/tgsi.rst
> +++ b/src/gallium/docs/source/tgsi.rst
> @@ -458,7 +458,9 @@ while DDY is allowed to be the same for the entire 2x2 quad.
>
>  .. opcode:: PK2H - Pack Two 16-bit Floats
>
> -  TBD
> +.. math::
> +
> +  dst.x = f32\_to\_f16(src.x) | f32\_to\_f16(src.y) << 16
>
>  .. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
>
> @@ -615,7 +617,11 @@ This instruction replicates its result.
>
>  .. opcode:: UP2H - Unpack Two 16-Bit Floats
>
> -  TBD
> +.. math::
> +
> +  dst.x = f16\_to\_f32(src0.x \& 0xffff)
> +
> +  dst.y = f16\_to\_f32(src0.x >> 16)
>
>  .. note::
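The bit layout in the PK2H/UP2H formulas can be checked with a few lines of C. The sketch below covers only the packing and unpacking of two 16-bit halves into one 32-bit word (first operand in the low half, second in the high half). The float-to-half conversion itself is deliberately left out, and the `sketch_*` helper names are made up for this example.

```c
#include <stdint.h>

/* Combine two raw 16-bit values: `lo` in bits 0..15, `hi` in bits 16..31. */
static uint32_t
sketch_pack_2x16(uint16_t lo, uint16_t hi)
{
   return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* Split a 32-bit word back into its low and high 16-bit halves. */
static void
sketch_unpack_2x16(uint32_t packed, uint16_t *lo, uint16_t *hi)
{
   *lo = (uint16_t)(packed & 0xffff);
   *hi = (uint16_t)(packed >> 16);
}
```

As a usage example, 0x3c00 and 0xc000 are the half-precision encodings of 1.0 and -2.0, so packing (1.0, -2.0) would yield 0xc0003c00 once the conversion step is applied.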
Re: [Mesa-dev] [PATCH 1/8] tgsi: add ureg support for image decls
On Sun, Jan 3, 2016 at 2:33 AM,wrote: > There is quite a bit of rename churn happening here at the same time as the > bring up of ureg support for image declarations. > Would it be possible to split the rename churn out from the actual > behavioral changes please? This is almost exclusively a rename. The only other thing is adding the format to the tgsi_declaration_image (formerly tgsi_declaration_resource) and a couple of ureg helpers. I don't think it's really worth splitting apart, although if others feel similarly I can go back and do it. > > > On 2016-01-03 15:37, Ilia Mirkin wrote: >> >> Signed-off-by: Ilia Mirkin >> --- >> src/gallium/auxiliary/tgsi/tgsi_build.c| 62 + >> src/gallium/auxiliary/tgsi/tgsi_dump.c | 10 +-- >> src/gallium/auxiliary/tgsi/tgsi_parse.c| 4 +- >> src/gallium/auxiliary/tgsi/tgsi_parse.h| 2 +- >> src/gallium/auxiliary/tgsi/tgsi_strings.c | 4 +- >> src/gallium/auxiliary/tgsi/tgsi_text.c | 10 +-- >> src/gallium/auxiliary/tgsi/tgsi_ureg.c | 77 >> ++ >> src/gallium/auxiliary/tgsi/tgsi_ureg.h | 7 ++ >> src/gallium/drivers/ilo/shader/toy_tgsi.c | 8 +-- >> .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 12 +++- >> src/gallium/drivers/svga/svga_tgsi_vgpu10.c| 2 + >> src/gallium/include/pipe/p_shader_tokens.h | 7 +- >> 12 files changed, 153 insertions(+), 52 deletions(-) >> >> diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c >> b/src/gallium/auxiliary/tgsi/tgsi_build.c >> index fdb7feb..bb9d0cb 100644 >> --- a/src/gallium/auxiliary/tgsi/tgsi_build.c >> +++ b/src/gallium/auxiliary/tgsi/tgsi_build.c >> @@ -259,36 +259,39 @@ tgsi_build_declaration_semantic( >> return ds; >> } >> >> -static struct tgsi_declaration_resource >> -tgsi_default_declaration_resource(void) >> +static struct tgsi_declaration_image >> +tgsi_default_declaration_image(void) >> { >> - struct tgsi_declaration_resource dr; >> + struct tgsi_declaration_image di; >> >> - dr.Resource = TGSI_TEXTURE_BUFFER; >> - dr.Raw = 0; >> - dr.Writable = 0; >> - dr.Padding = 0; >> + 
di.Resource = TGSI_TEXTURE_BUFFER; >> + di.Raw = 0; >> + di.Writable = 0; >> + di.Format = 0; >> + di.Padding = 0; >> >> - return dr; >> + return di; >> } >> >> -static struct tgsi_declaration_resource >> -tgsi_build_declaration_resource(unsigned texture, >> -unsigned raw, >> -unsigned writable, >> -struct tgsi_declaration *declaration, >> -struct tgsi_header *header) >> +static struct tgsi_declaration_image >> +tgsi_build_declaration_image(unsigned texture, >> + unsigned format, >> + unsigned raw, >> + unsigned writable, >> + struct tgsi_declaration *declaration, >> + struct tgsi_header *header) >> { >> - struct tgsi_declaration_resource dr; >> + struct tgsi_declaration_image di; >> >> - dr = tgsi_default_declaration_resource(); >> - dr.Resource = texture; >> - dr.Raw = raw; >> - dr.Writable = writable; >> + di = tgsi_default_declaration_image(); >> + di.Resource = texture; >> + di.Format = format; >> + di.Raw = raw; >> + di.Writable = writable; >> >> declaration_grow(declaration, header); >> >> - return dr; >> + return di; >> } >> >> static struct tgsi_declaration_sampler_view >> @@ -364,7 +367,7 @@ tgsi_default_full_declaration( void ) >> full_declaration.Range = tgsi_default_declaration_range(); >> full_declaration.Semantic = tgsi_default_declaration_semantic(); >> full_declaration.Interp = tgsi_default_declaration_interp(); >> - full_declaration.Resource = tgsi_default_declaration_resource(); >> + full_declaration.Image = tgsi_default_declaration_image(); >> full_declaration.SamplerView = >> tgsi_default_declaration_sampler_view(); >> full_declaration.Array = tgsi_default_declaration_array(); >> >> @@ -454,20 +457,21 @@ tgsi_build_full_declaration( >> header ); >> } >> >> - if (full_decl->Declaration.File == TGSI_FILE_RESOURCE) { >> - struct tgsi_declaration_resource *dr; >> + if (full_decl->Declaration.File == TGSI_FILE_IMAGE) { >> + struct tgsi_declaration_image *di; >> >>if (maxsize <= size) { >> return 0; >>} >> - dr = (struct tgsi_declaration_resource 
*)[size]; >> + di = (struct tgsi_declaration_image *)[size]; >>size++; >> >> - *dr = tgsi_build_declaration_resource(full_decl->Resource.Resource, >> -full_decl->Resource.Raw, >> -full_decl->Resource.Writable, >> -
[Mesa-dev] [PATCH 6/8] gallium: add PIPE_SHADER_CAP_MAX_SHADER_BUFFERS
Signed-off-by: Ilia Mirkin--- src/gallium/auxiliary/gallivm/lp_bld_limits.h| 1 + src/gallium/auxiliary/tgsi/tgsi_exec.h | 1 + src/gallium/docs/source/screen.rst | 4 src/gallium/drivers/freedreno/freedreno_screen.c | 2 ++ src/gallium/drivers/nouveau/nv30/nv30_screen.c | 2 ++ src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + src/gallium/drivers/r300/r300_screen.c | 2 ++ src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 2 ++ src/gallium/drivers/svga/svga_screen.c | 3 +++ src/gallium/drivers/vc4/vc4_screen.c | 2 ++ src/gallium/include/pipe/p_defines.h | 1 + 13 files changed, 23 insertions(+) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_limits.h b/src/gallium/auxiliary/gallivm/lp_bld_limits.h index ad64ae0..4598db8 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_limits.h +++ b/src/gallium/auxiliary/gallivm/lp_bld_limits.h @@ -136,6 +136,7 @@ gallivm_get_shader_param(enum pipe_shader_cap param) case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED: case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED: case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED: + case PIPE_SHADER_CAP_MAX_SHADER_BUFFERS: return 0; case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT: return 32; diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.h b/src/gallium/auxiliary/tgsi/tgsi_exec.h index f86adce..26fec8e 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_exec.h +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.h @@ -473,6 +473,7 @@ tgsi_exec_get_shader_param(enum pipe_shader_cap param) return 1; case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED: case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED: + case PIPE_SHADER_CAP_MAX_SHADER_BUFFERS: return 0; case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT: return 32; diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst index 41bd0f8..4402809 100644 --- a/src/gallium/docs/source/screen.rst +++ b/src/gallium/docs/source/screen.rst @@ -377,6 +377,10 @@ to be 0. 
of iterations that loops are allowed to have to be unrolled. It is only a hint to state trackers. Whether any loops will be unrolled is not guaranteed. +* ``PIPE_SHADER_CAP_MAX_SHADER_BUFFERS``: Maximum number of memory buffers + (also used to implement atomic counters). Having this be non-0 also + implies support for the ``LOAD``, ``STORE``, and ``ATOM*`` TGSI + opcodes. .. _pipe_compute_cap: diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index 4b6d6af..bf356c4 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -415,6 +415,8 @@ fd_screen_get_shader_param(struct pipe_screen *pscreen, unsigned shader, return PIPE_SHADER_IR_TGSI; case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT: return 32; + case PIPE_SHADER_CAP_MAX_SHADER_BUFFERS: + return 0; } debug_printf("unknown shader param %d\n", param); return 0; diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c b/src/gallium/drivers/nouveau/nv30/nv30_screen.c index 02303bb..3d77f81 100644 --- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c @@ -266,6 +266,7 @@ nv30_screen_get_shader_param(struct pipe_screen *pscreen, unsigned shader, case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED: case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED: case PIPE_SHADER_CAP_TGSI_ANY_INOUT_DECL_RANGE: + case PIPE_SHADER_CAP_MAX_SHADER_BUFFERS: return 0; case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT: return 32; @@ -309,6 +310,7 @@ nv30_screen_get_shader_param(struct pipe_screen *pscreen, unsigned shader, case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED: case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED: case PIPE_SHADER_CAP_TGSI_ANY_INOUT_DECL_RANGE: + case PIPE_SHADER_CAP_MAX_SHADER_BUFFERS: return 0; case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT: return 32; diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c 
b/src/gallium/drivers/nouveau/nv50/nv50_screen.c index b3f2492..aafca71 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c @@ -301,6 +301,7 @@ nv50_screen_get_shader_param(struct pipe_screen *pscreen, unsigned shader, case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED: case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED: case PIPE_SHADER_CAP_TGSI_ANY_INOUT_DECL_RANGE: + case PIPE_SHADER_CAP_MAX_SHADER_BUFFERS: return 0; case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT: return 32; diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
[Mesa-dev] [PATCH 0/8] gallium: add shader buffer support
This provides enough support in TGSI to support shader buffers. I do away with the defunct TGSI_FILE_RESOURCE (renaming it into TGSI_FILE_IMAGE to work with pipe_image_view), and add a brand new TGSI_FILE_BUFFER. At the declaration level, this can have an ATOMIC qualifier (and later a SHARED qualifier for compute shaders). I also add memory qualifiers to LOAD/STORE opcodes, which can convey the coherent/volatile/restrict flags as specified in the GLSL. I also modified all of the formerly resource opcodes to work on both buffers and images. For images they will derive the format from the IMAGE declaration, while buffers are format-less by definition. This is still missing a way to implement memory barriers, that will come soon, and is not going to affect anything else I do in this series. For the full series I'm working on, you can look at https://github.com/imirkin/mesa/commits/atomic3 which exposes ARB_shader_atomic_counters and ARB_shader_storage_buffer_objects on nvc0+ (but it won't work on maxwell -- need to add emission of atomic ops and cache control). However this is a nice self-contained chunk to start with. 
Ilia Mirkin (8):
  tgsi: add ureg support for image decls
  ureg: add buffer support to ureg
  tgsi: provide a way to encode memory qualifiers for SSBO
  tgsi: add a is_store property
  tgsi: update atomic op docs
  gallium: add PIPE_SHADER_CAP_MAX_SHADER_BUFFERS
  gallium: add PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT
  gallium: add a RESQ opcode to query info about a resource

 src/gallium/auxiliary/gallivm/lp_bld_limits.h      |   1 +
 src/gallium/auxiliary/tgsi/tgsi_build.c            | 112 --
 src/gallium/auxiliary/tgsi/tgsi_dump.c             |  25 +-
 src/gallium/auxiliary/tgsi/tgsi_exec.h             |   1 +
 src/gallium/auxiliary/tgsi/tgsi_info.c             | 446 ++---
 src/gallium/auxiliary/tgsi/tgsi_info.h             |   1 +
 src/gallium/auxiliary/tgsi/tgsi_parse.c            |   8 +-
 src/gallium/auxiliary/tgsi/tgsi_parse.h            |   3 +-
 src/gallium/auxiliary/tgsi/tgsi_strings.c          |  12 +-
 src/gallium/auxiliary/tgsi/tgsi_strings.h          |   2 +
 src/gallium/auxiliary/tgsi/tgsi_text.c             |  42 +-
 src/gallium/auxiliary/tgsi/tgsi_ureg.c             | 182 +
 src/gallium/auxiliary/tgsi/tgsi_ureg.h             |  23 ++
 src/gallium/docs/source/screen.rst                 |   8 +
 src/gallium/docs/source/tgsi.rst                   | 105 ++---
 src/gallium/drivers/freedreno/freedreno_screen.c   |   3 +
 src/gallium/drivers/i915/i915_screen.c             |   1 +
 src/gallium/drivers/ilo/ilo_screen.c               |   1 +
 src/gallium/drivers/ilo/shader/toy_tgsi.c          |   8 +-
 src/gallium/drivers/llvmpipe/lp_screen.c           |   1 +
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |  12 +-
 src/gallium/drivers/nouveau/nv30/nv30_screen.c     |   3 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.c     |   2 +
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c     |   2 +
 src/gallium/drivers/r300/r300_screen.c             |   3 +
 src/gallium/drivers/r600/r600_pipe.c               |   2 +
 src/gallium/drivers/radeonsi/si_pipe.c             |   3 +
 src/gallium/drivers/softpipe/sp_screen.c           |   1 +
 src/gallium/drivers/svga/svga_screen.c             |   4 +
 src/gallium/drivers/svga/svga_tgsi_vgpu10.c        |   2 +
 src/gallium/drivers/vc4/vc4_screen.c               |   3 +
 src/gallium/drivers/virgl/virgl_screen.c           |   1 +
 src/gallium/include/pipe/p_defines.h               |   2 +
 src/gallium/include/pipe/p_shader_tokens.h         |  28 +-
 34 files changed, 729 insertions(+), 324 deletions(-)

--
2.4.10
[Mesa-dev] [PATCH 5/8] tgsi: update atomic op docs
Specify that the operation only applies to the x component, not
per-component as previously specified. Per-component operation is
unnecessary for GL and creates additional complications for images,
which need to support these operations as well.

Signed-off-by: Ilia Mirkin
---
 src/gallium/docs/source/tgsi.rst | 93
 1 file changed, 47 insertions(+), 46 deletions(-)

diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index 955ece8..a3151e3 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -2252,11 +2252,11 @@ after lookup.
 Resource Access Opcodes
 ^^^^^^^^^^^^^^^^^^^^^^^

-.. opcode:: LOAD - Fetch data from a shader resource
+.. opcode:: LOAD - Fetch data from a shader buffer or image

   Syntax: ``LOAD dst, resource, address``

-  Example: ``LOAD TEMP[0], RES[0], TEMP[1]``
+  Example: ``LOAD TEMP[0], BUFFER[0], TEMP[1]``

   Using the provided integer address, LOAD fetches data
   from the specified buffer or texture without any
@@ -2280,7 +2280,7 @@ Resource Access Opcodes

   Syntax: ``STORE resource, address, src``

-  Example: ``STORE RES[0], TEMP[0], TEMP[1]``
+  Example: ``STORE BUFFER[0], TEMP[0], TEMP[1]``

   Using the provided integer address, STORE writes data
   to the specified buffer or texture.
@@ -2358,158 +2358,159 @@ These opcodes provide atomic variants of some common arithmetic and
 logical operations. In this context atomicity means that another
 concurrent memory access operation that affects the same memory
 location is guaranteed to be performed strictly before or after the
-entire execution of the atomic operation.
-
-For the moment they're only valid in compute programs.
+entire execution of the atomic operation. The resource may be a buffer
+or an image. In the case of an image, the offset works the same as for
+``LOAD`` and ``STORE``, specified above. These atomic operations may
+only be used with 32-bit integer image formats.

 .. opcode:: ATOMUADD - Atomic integer addition

   Syntax: ``ATOMUADD dst, resource, offset, src``

-  Example: ``ATOMUADD TEMP[0], RES[0], TEMP[1], TEMP[2]``
+  Example: ``ATOMUADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``

-  The following operation is performed atomically on each component:
+  The following operation is performed atomically:

 .. math::

-  dst_i = resource[offset]_i
+  dst_x = resource[offset]

-  resource[offset]_i = dst_i + src_i
+  resource[offset] = dst_x + src_x


 .. opcode:: ATOMXCHG - Atomic exchange

   Syntax: ``ATOMXCHG dst, resource, offset, src``

-  Example: ``ATOMXCHG TEMP[0], RES[0], TEMP[1], TEMP[2]``
+  Example: ``ATOMXCHG TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``

-  The following operation is performed atomically on each component:
+  The following operation is performed atomically:

 .. math::

-  dst_i = resource[offset]_i
+  dst_x = resource[offset]

-  resource[offset]_i = src_i
+  resource[offset] = src_x


 .. opcode:: ATOMCAS - Atomic compare-and-exchange

   Syntax: ``ATOMCAS dst, resource, offset, cmp, src``

-  Example: ``ATOMCAS TEMP[0], RES[0], TEMP[1], TEMP[2], TEMP[3]``
+  Example: ``ATOMCAS TEMP[0], BUFFER[0], TEMP[1], TEMP[2], TEMP[3]``

-  The following operation is performed atomically on each component:
+  The following operation is performed atomically:

 .. math::

-  dst_i = resource[offset]_i
+  dst_x = resource[offset]

-  resource[offset]_i = (dst_i == cmp_i ? src_i : dst_i)
+  resource[offset] = (dst_x == cmp_x ? src_x : dst_x)


 .. opcode:: ATOMAND - Atomic bitwise And

   Syntax: ``ATOMAND dst, resource, offset, src``

-  Example: ``ATOMAND TEMP[0], RES[0], TEMP[1], TEMP[2]``
+  Example: ``ATOMAND TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``

-  The following operation is performed atomically on each component:
+  The following operation is performed atomically:

 .. math::

-  dst_i = resource[offset]_i
+  dst_x = resource[offset]

-  resource[offset]_i = dst_i \& src_i
+  resource[offset] = dst_x \& src_x


 .. opcode:: ATOMOR - Atomic bitwise Or

   Syntax: ``ATOMOR dst, resource, offset, src``

-  Example: ``ATOMOR TEMP[0], RES[0], TEMP[1], TEMP[2]``
+  Example: ``ATOMOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``

-  The following operation is performed atomically on each component:
+  The following operation is performed atomically:

 .. math::

-  dst_i = resource[offset]_i
+  dst_x = resource[offset]

-  resource[offset]_i = dst_i | src_i
+  resource[offset] = dst_x | src_x


 .. opcode:: ATOMXOR - Atomic bitwise Xor

   Syntax: ``ATOMXOR dst, resource, offset, src``

-  Example: ``ATOMXOR TEMP[0], RES[0], TEMP[1], TEMP[2]``
+  Example: ``ATOMXOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``

-  The following operation is performed atomically on each component:
+  The following operation is performed
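As a rough illustration of the revised wording (not part of the patch), the x-only semantics can be modeled with C11 atomics. Everything named here is an assumption made for the sketch: `tgsi_vec4`, `atomuadd_x` and `atomcas_x` are hypothetical, the resource is modeled as an array of `_Atomic uint32_t`, and `offset` counts 32-bit elements rather than following the real addressing rules. The point is only that src.x and cmp.x are consumed and dst.x alone is written.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a four-component TGSI register. */
struct tgsi_vec4 {
   uint32_t x, y, z, w;
};

/* Models ATOMUADD: dst.x = resource[offset]; resource[offset] += src.x.
 * The y/z/w components of dst and src are left alone. */
static void
atomuadd_x(struct tgsi_vec4 *dst, _Atomic uint32_t *resource,
           uint32_t offset, const struct tgsi_vec4 *src)
{
   assert(dst && resource && src);
   dst->x = atomic_fetch_add(&resource[offset], src->x);
}

/* Models ATOMCAS: dst.x = resource[offset];
 * resource[offset] = (dst.x == cmp.x ? src.x : dst.x). */
static void
atomcas_x(struct tgsi_vec4 *dst, _Atomic uint32_t *resource,
          uint32_t offset, const struct tgsi_vec4 *cmp,
          const struct tgsi_vec4 *src)
{
   uint32_t expected = cmp->x;

   assert(dst && resource);
   /* On failure the call stores the current value into 'expected'; on
    * success 'expected' already equals the old value, so either way it
    * ends up holding what was in memory before the operation. */
   atomic_compare_exchange_strong(&resource[offset], &expected, src->x);
   dst->x = expected;
}
```

With per-component semantics the same opcodes would have needed four independent atomics per instruction, which is the complication for images the commit message refers to.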
[Mesa-dev] [PATCH 4/8] tgsi: add a is_store property
Signed-off-by: Ilia Mirkin--- src/gallium/auxiliary/tgsi/tgsi_info.c | 446 - src/gallium/auxiliary/tgsi/tgsi_info.h | 1 + 2 files changed, 224 insertions(+), 223 deletions(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c index 3b40c3d..8a0e9c4 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_info.c +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c @@ -37,231 +37,231 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] = { - { 1, 1, 0, 0, 0, 0, COMP, "ARL", TGSI_OPCODE_ARL }, - { 1, 1, 0, 0, 0, 0, COMP, "MOV", TGSI_OPCODE_MOV }, - { 1, 1, 0, 0, 0, 0, CHAN, "LIT", TGSI_OPCODE_LIT }, - { 1, 1, 0, 0, 0, 0, REPL, "RCP", TGSI_OPCODE_RCP }, - { 1, 1, 0, 0, 0, 0, REPL, "RSQ", TGSI_OPCODE_RSQ }, - { 1, 1, 0, 0, 0, 0, CHAN, "EXP", TGSI_OPCODE_EXP }, - { 1, 1, 0, 0, 0, 0, CHAN, "LOG", TGSI_OPCODE_LOG }, - { 1, 2, 0, 0, 0, 0, COMP, "MUL", TGSI_OPCODE_MUL }, - { 1, 2, 0, 0, 0, 0, COMP, "ADD", TGSI_OPCODE_ADD }, - { 1, 2, 0, 0, 0, 0, REPL, "DP3", TGSI_OPCODE_DP3 }, - { 1, 2, 0, 0, 0, 0, REPL, "DP4", TGSI_OPCODE_DP4 }, - { 1, 2, 0, 0, 0, 0, CHAN, "DST", TGSI_OPCODE_DST }, - { 1, 2, 0, 0, 0, 0, COMP, "MIN", TGSI_OPCODE_MIN }, - { 1, 2, 0, 0, 0, 0, COMP, "MAX", TGSI_OPCODE_MAX }, - { 1, 2, 0, 0, 0, 0, COMP, "SLT", TGSI_OPCODE_SLT }, - { 1, 2, 0, 0, 0, 0, COMP, "SGE", TGSI_OPCODE_SGE }, - { 1, 3, 0, 0, 0, 0, COMP, "MAD", TGSI_OPCODE_MAD }, - { 1, 2, 0, 0, 0, 0, COMP, "SUB", TGSI_OPCODE_SUB }, - { 1, 3, 0, 0, 0, 0, COMP, "LRP", TGSI_OPCODE_LRP }, - { 1, 3, 0, 0, 0, 0, COMP, "FMA", TGSI_OPCODE_FMA }, - { 1, 1, 0, 0, 0, 0, REPL, "SQRT", TGSI_OPCODE_SQRT }, - { 1, 3, 0, 0, 0, 0, REPL, "DP2A", TGSI_OPCODE_DP2A }, - { 0, 0, 0, 0, 0, 0, NONE, "", 22 }, /* removed */ - { 0, 0, 0, 0, 0, 0, NONE, "", 23 }, /* removed */ - { 1, 1, 0, 0, 0, 0, COMP, "FRC", TGSI_OPCODE_FRC }, - { 1, 3, 0, 0, 0, 0, COMP, "CLAMP", TGSI_OPCODE_CLAMP }, - { 1, 1, 0, 0, 0, 0, COMP, "FLR", TGSI_OPCODE_FLR }, - { 1, 1, 0, 0, 0, 0, COMP, "ROUND", TGSI_OPCODE_ROUND 
}, - { 1, 1, 0, 0, 0, 0, REPL, "EX2", TGSI_OPCODE_EX2 }, - { 1, 1, 0, 0, 0, 0, REPL, "LG2", TGSI_OPCODE_LG2 }, - { 1, 2, 0, 0, 0, 0, REPL, "POW", TGSI_OPCODE_POW }, - { 1, 2, 0, 0, 0, 0, COMP, "XPD", TGSI_OPCODE_XPD }, - { 0, 0, 0, 0, 0, 0, NONE, "", 32 }, /* removed */ - { 1, 1, 0, 0, 0, 0, COMP, "ABS", TGSI_OPCODE_ABS }, - { 0, 0, 0, 0, 0, 0, NONE, "", 34 }, /* removed */ - { 1, 2, 0, 0, 0, 0, REPL, "DPH", TGSI_OPCODE_DPH }, - { 1, 1, 0, 0, 0, 0, REPL, "COS", TGSI_OPCODE_COS }, - { 1, 1, 0, 0, 0, 0, COMP, "DDX", TGSI_OPCODE_DDX }, - { 1, 1, 0, 0, 0, 0, COMP, "DDY", TGSI_OPCODE_DDY }, - { 0, 0, 0, 0, 0, 0, NONE, "KILL", TGSI_OPCODE_KILL }, - { 1, 1, 0, 0, 0, 0, COMP, "PK2H", TGSI_OPCODE_PK2H }, - { 1, 1, 0, 0, 0, 0, COMP, "PK2US", TGSI_OPCODE_PK2US }, - { 1, 1, 0, 0, 0, 0, COMP, "PK4B", TGSI_OPCODE_PK4B }, - { 1, 1, 0, 0, 0, 0, COMP, "PK4UB", TGSI_OPCODE_PK4UB }, - { 0, 1, 0, 0, 0, 1, NONE, "", 44 }, /* removed */ - { 1, 2, 0, 0, 0, 0, COMP, "SEQ", TGSI_OPCODE_SEQ }, - { 0, 1, 0, 0, 0, 1, NONE, "", 46 }, /* removed */ - { 1, 2, 0, 0, 0, 0, COMP, "SGT", TGSI_OPCODE_SGT }, - { 1, 1, 0, 0, 0, 0, REPL, "SIN", TGSI_OPCODE_SIN }, - { 1, 2, 0, 0, 0, 0, COMP, "SLE", TGSI_OPCODE_SLE }, - { 1, 2, 0, 0, 0, 0, COMP, "SNE", TGSI_OPCODE_SNE }, - { 0, 1, 0, 0, 0, 1, NONE, "", 51 }, /* removed */ - { 1, 2, 1, 0, 0, 0, OTHR, "TEX", TGSI_OPCODE_TEX }, - { 1, 4, 1, 0, 0, 0, OTHR, "TXD", TGSI_OPCODE_TXD }, - { 1, 2, 1, 0, 0, 0, OTHR, "TXP", TGSI_OPCODE_TXP }, - { 1, 1, 0, 0, 0, 0, COMP, "UP2H", TGSI_OPCODE_UP2H }, - { 1, 1, 0, 0, 0, 0, COMP, "UP2US", TGSI_OPCODE_UP2US }, - { 1, 1, 0, 0, 0, 0, COMP, "UP4B", TGSI_OPCODE_UP4B }, - { 1, 1, 0, 0, 0, 0, COMP, "UP4UB", TGSI_OPCODE_UP4UB }, - { 0, 1, 0, 0, 0, 1, NONE, "", 59 }, /* removed */ - { 0, 1, 0, 0, 0, 1, NONE, "", 60 }, /* removed */ - { 1, 1, 0, 0, 0, 0, COMP, "ARR", TGSI_OPCODE_ARR }, - { 0, 1, 0, 0, 0, 1, NONE, "", 62 }, /* removed */ - { 0, 0, 0, 1, 0, 0, NONE, "CAL", TGSI_OPCODE_CAL }, - { 0, 0, 0, 0, 0, 0, NONE, "RET", 
TGSI_OPCODE_RET }, - { 1, 1, 0, 0, 0, 0, COMP, "SSG", TGSI_OPCODE_SSG }, - { 1, 3, 0, 0, 0, 0, COMP, "CMP", TGSI_OPCODE_CMP }, - { 1, 1, 0, 0, 0, 0, CHAN, "SCS", TGSI_OPCODE_SCS }, - { 1, 2, 1, 0, 0, 0, OTHR, "TXB", TGSI_OPCODE_TXB }, - { 0, 1, 0, 0, 0, 1, NONE, "", 69 }, /* removed */ - { 1, 2, 0, 0, 0, 0, COMP, "DIV", TGSI_OPCODE_DIV }, - { 1, 2, 0, 0, 0, 0, REPL, "DP2", TGSI_OPCODE_DP2 }, - { 1, 2, 1, 0, 0, 0, OTHR, "TXL", TGSI_OPCODE_TXL }, - { 0, 0, 0, 0, 0, 0, NONE, "BRK", TGSI_OPCODE_BRK }, - { 0, 1, 0, 1, 0, 1, NONE, "IF", TGSI_OPCODE_IF }, - { 0, 1, 0, 1, 0, 1, NONE, "UIF", TGSI_OPCODE_UIF }, - { 0, 1, 0, 0, 0, 1, NONE, "", 76 }, /* removed */ - { 0, 0, 0, 1, 1, 1, NONE, "ELSE", TGSI_OPCODE_ELSE }, - { 0,
[Mesa-dev] [PATCH 1/8] tgsi: add ureg support for image decls
Signed-off-by: Ilia Mirkin
---
 src/gallium/auxiliary/tgsi/tgsi_build.c            | 62 +
 src/gallium/auxiliary/tgsi/tgsi_dump.c             | 10 +--
 src/gallium/auxiliary/tgsi/tgsi_parse.c            |  4 +-
 src/gallium/auxiliary/tgsi/tgsi_parse.h            |  2 +-
 src/gallium/auxiliary/tgsi/tgsi_strings.c          |  4 +-
 src/gallium/auxiliary/tgsi/tgsi_text.c             | 10 +--
 src/gallium/auxiliary/tgsi/tgsi_ureg.c             | 77 ++
 src/gallium/auxiliary/tgsi/tgsi_ureg.h             |  7 ++
 src/gallium/drivers/ilo/shader/toy_tgsi.c          |  8 +--
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 12 +++-
 src/gallium/drivers/svga/svga_tgsi_vgpu10.c        |  2 +
 src/gallium/include/pipe/p_shader_tokens.h         |  7 +-
 12 files changed, 153 insertions(+), 52 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c
index fdb7feb..bb9d0cb 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_build.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
@@ -259,36 +259,39 @@ tgsi_build_declaration_semantic(
    return ds;
 }

-static struct tgsi_declaration_resource
-tgsi_default_declaration_resource(void)
+static struct tgsi_declaration_image
+tgsi_default_declaration_image(void)
 {
-   struct tgsi_declaration_resource dr;
+   struct tgsi_declaration_image di;

-   dr.Resource = TGSI_TEXTURE_BUFFER;
-   dr.Raw = 0;
-   dr.Writable = 0;
-   dr.Padding = 0;
+   di.Resource = TGSI_TEXTURE_BUFFER;
+   di.Raw = 0;
+   di.Writable = 0;
+   di.Format = 0;
+   di.Padding = 0;

-   return dr;
+   return di;
 }

-static struct tgsi_declaration_resource
-tgsi_build_declaration_resource(unsigned texture,
-                                unsigned raw,
-                                unsigned writable,
-                                struct tgsi_declaration *declaration,
-                                struct tgsi_header *header)
+static struct tgsi_declaration_image
+tgsi_build_declaration_image(unsigned texture,
+                             unsigned format,
+                             unsigned raw,
+                             unsigned writable,
+                             struct tgsi_declaration *declaration,
+                             struct tgsi_header *header)
 {
-   struct tgsi_declaration_resource dr;
+   struct tgsi_declaration_image di;

-   dr = tgsi_default_declaration_resource();
-   dr.Resource = texture;
-   dr.Raw = raw;
-   dr.Writable = writable;
+   di = tgsi_default_declaration_image();
+   di.Resource = texture;
+   di.Format = format;
+   di.Raw = raw;
+   di.Writable = writable;

    declaration_grow(declaration, header);

-   return dr;
+   return di;
 }

 static struct tgsi_declaration_sampler_view
@@ -364,7 +367,7 @@ tgsi_default_full_declaration( void )
    full_declaration.Range = tgsi_default_declaration_range();
    full_declaration.Semantic = tgsi_default_declaration_semantic();
    full_declaration.Interp = tgsi_default_declaration_interp();
-   full_declaration.Resource = tgsi_default_declaration_resource();
+   full_declaration.Image = tgsi_default_declaration_image();
    full_declaration.SamplerView = tgsi_default_declaration_sampler_view();
    full_declaration.Array = tgsi_default_declaration_array();

@@ -454,20 +457,21 @@ tgsi_build_full_declaration(
          header );
    }

-   if (full_decl->Declaration.File == TGSI_FILE_RESOURCE) {
-      struct tgsi_declaration_resource *dr;
+   if (full_decl->Declaration.File == TGSI_FILE_IMAGE) {
+      struct tgsi_declaration_image *di;

       if (maxsize <= size) {
          return 0;
       }
-      dr = (struct tgsi_declaration_resource *)&tokens[size];
+      di = (struct tgsi_declaration_image *)&tokens[size];
       size++;

-      *dr = tgsi_build_declaration_resource(full_decl->Resource.Resource,
-                                            full_decl->Resource.Raw,
-                                            full_decl->Resource.Writable,
-                                            declaration,
-                                            header);
+      *di = tgsi_build_declaration_image(full_decl->Image.Resource,
+                                         full_decl->Image.Format,
+                                         full_decl->Image.Raw,
+                                         full_decl->Image.Writable,
+                                         declaration,
+                                         header);
    }

    if (full_decl->Declaration.File == TGSI_FILE_SAMPLER_VIEW) {
diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index e29ffb3..dad3839 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
@@ -348,12 +348,14 @@ iter_declaration(
       }
    }

-   if (decl->Declaration.File == TGSI_FILE_RESOURCE) {
+   if (decl->Declaration.File == TGSI_FILE_IMAGE) {
       TXT(", ");
-
[Mesa-dev] [PATCH 3/8] tgsi: provide a way to encode memory qualifiers for SSBO
Each load/store on most hardware can specify what caching to do. Since
SSBO allows individual variables to also have separate caching modes,
allow loads/stores to have the qualifiers instead of attempting to
encode them in declarations.

Signed-off-by: Ilia Mirkin
---
 src/gallium/auxiliary/tgsi/tgsi_build.c    | 50 +++-
 src/gallium/auxiliary/tgsi/tgsi_dump.c     | 10 ++
 src/gallium/auxiliary/tgsi/tgsi_parse.c    |  4 +++
 src/gallium/auxiliary/tgsi/tgsi_parse.h    |  1 +
 src/gallium/auxiliary/tgsi/tgsi_strings.c  |  7
 src/gallium/auxiliary/tgsi/tgsi_strings.h  |  2 ++
 src/gallium/auxiliary/tgsi/tgsi_text.c     | 27 +++
 src/gallium/auxiliary/tgsi/tgsi_ureg.c     | 53 ++
 src/gallium/auxiliary/tgsi/tgsi_ureg.h     | 13
 src/gallium/include/pipe/p_shader_tokens.h | 16 -
 10 files changed, 181 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_build.c b/src/gallium/auxiliary/tgsi/tgsi_build.c
index bb9d0cb..ea20746 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_build.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_build.c
@@ -620,7 +620,8 @@ tgsi_default_instruction( void )
    instruction.NumSrcRegs = 1;
    instruction.Label = 0;
    instruction.Texture = 0;
-   instruction.Padding = 0;
+   instruction.Memory = 0;
+   instruction.Padding = 0;

    return instruction;
 }
@@ -766,6 +767,34 @@ tgsi_build_instruction_texture(
    return instruction_texture;
 }

+static struct tgsi_instruction_memory
+tgsi_default_instruction_memory( void )
+{
+   struct tgsi_instruction_memory instruction_memory;
+
+   instruction_memory.Qualifier = 0;
+   instruction_memory.Padding = 0;
+
+   return instruction_memory;
+}
+
+static struct tgsi_instruction_memory
+tgsi_build_instruction_memory(
+   unsigned qualifier,
+   struct tgsi_token *prev_token,
+   struct tgsi_instruction *instruction,
+   struct tgsi_header *header )
+{
+   struct tgsi_instruction_memory instruction_memory;
+
+   instruction_memory.Qualifier = qualifier;
+   instruction_memory.Padding = 0;
+   instruction->Memory = 1;
+
+   instruction_grow( instruction, header );
+
+   return instruction_memory;
+}

 static struct tgsi_texture_offset
 tgsi_default_texture_offset( void )
@@ -1012,6 +1041,7 @@ tgsi_default_full_instruction( void )
    full_instruction.Predicate = tgsi_default_instruction_predicate();
    full_instruction.Label = tgsi_default_instruction_label();
    full_instruction.Texture = tgsi_default_instruction_texture();
+   full_instruction.Memory = tgsi_default_instruction_memory();
    for( i = 0; i < TGSI_FULL_MAX_TEX_OFFSETS; i++ ) {
       full_instruction.TexOffsets[i] = tgsi_default_texture_offset();
    }
@@ -1123,6 +1153,24 @@ tgsi_build_full_instruction(
          prev_token = (struct tgsi_token *) texture_offset;
       }
    }
+
+   if (full_inst->Instruction.Memory) {
+      struct tgsi_instruction_memory *instruction_memory;
+
+      if( maxsize <= size )
+         return 0;
+      instruction_memory =
+         (struct tgsi_instruction_memory *) &tokens[size];
+      size++;
+
+      *instruction_memory = tgsi_build_instruction_memory(
+         full_inst->Memory.Qualifier,
+         prev_token,
+         instruction,
+         header );
+      prev_token = (struct tgsi_token *) instruction_memory;
+   }
+
    for( i = 0; i < full_inst->Instruction.NumDstRegs; i++ ) {
       const struct tgsi_full_dst_register *reg = &full_inst->Dst[i];
       struct tgsi_dst_register *dst_register;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index de3aae5..2ad29b9 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
@@ -624,6 +624,16 @@ iter_instruction(
       }
    }

+   if (inst->Instruction.Memory) {
+      uint32_t qualifier = inst->Memory.Qualifier;
+      while (qualifier) {
+         int bit = ffs(qualifier) - 1;
+         qualifier &= ~(1U << bit);
+         TXT(", ");
+         ENM(bit, tgsi_memory_names);
+      }
+   }
+
    switch (inst->Instruction.Opcode) {
    case TGSI_OPCODE_IF:
    case TGSI_OPCODE_UIF:
diff --git a/src/gallium/auxiliary/tgsi/tgsi_parse.c b/src/gallium/auxiliary/tgsi/tgsi_parse.c
index 9a52bbb..ae95ebd 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_parse.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_parse.c
@@ -195,6 +195,10 @@ tgsi_parse_token(
          }
       }

+      if (inst->Instruction.Memory) {
+         next_token(ctx, &inst->Memory);
+      }
+
       assert( inst->Instruction.NumDstRegs <= TGSI_FULL_MAX_DST_REGISTERS );

       for (i = 0; i < inst->Instruction.NumDstRegs; i++) {
diff --git a/src/gallium/auxiliary/tgsi/tgsi_parse.h b/src/gallium/auxiliary/tgsi/tgsi_parse.h
index 5ed1a83..4689fb7 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_parse.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_parse.h
@@ -91,6 +91,7 @@ struct tgsi_full_instruction
    struct tgsi_instruction_predicate
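The dump loop in the tgsi_dump.c hunk above walks the qualifier bitmask one set bit at a time. A standalone sketch of that idiom follows. The bit assignments and the `memory_names` table are assumptions (the header hunk that defines the real ones is cut off in this archive), and `dump_memory_qualifiers` is a hypothetical helper that writes into a caller-supplied buffer instead of using the TXT/ENM macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* ffs() (POSIX) */

/* Assumed bit assignments, for illustration only. */
#define MEM_COHERENT (1u << 0)
#define MEM_RESTRICT (1u << 1)
#define MEM_VOLATILE (1u << 2)

static const char *const memory_names[] = {
   "COHERENT", "RESTRICT", "VOLATILE"
};

/* Appends ", NAME" to 'out' for every set bit, lowest bit first. */
static void
dump_memory_qualifiers(uint32_t qualifier, char *out, size_t out_size)
{
   assert(out_size > 0);
   out[0] = '\0';

   while (qualifier) {
      int bit = ffs((int)qualifier) - 1;   /* index of lowest set bit */
      size_t len = strlen(out);

      qualifier &= ~(1u << bit);           /* clear it and continue */
      if ((size_t)bit < sizeof(memory_names) / sizeof(memory_names[0]))
         snprintf(out + len, out_size - len, ", %s", memory_names[bit]);
   }
}
```

Because each iteration clears exactly one bit, the loop runs once per qualifier present rather than once per possible bit.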
[Mesa-dev] [PATCH 7/8] gallium: add PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT
Signed-off-by: Ilia Mirkin--- src/gallium/docs/source/screen.rst | 4 src/gallium/drivers/freedreno/freedreno_screen.c | 1 + src/gallium/drivers/i915/i915_screen.c | 1 + src/gallium/drivers/ilo/ilo_screen.c | 1 + src/gallium/drivers/llvmpipe/lp_screen.c | 1 + src/gallium/drivers/nouveau/nv30/nv30_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 + src/gallium/drivers/r300/r300_screen.c | 1 + src/gallium/drivers/r600/r600_pipe.c | 1 + src/gallium/drivers/radeonsi/si_pipe.c | 1 + src/gallium/drivers/softpipe/sp_screen.c | 1 + src/gallium/drivers/svga/svga_screen.c | 1 + src/gallium/drivers/vc4/vc4_screen.c | 1 + src/gallium/drivers/virgl/virgl_screen.c | 1 + src/gallium/include/pipe/p_defines.h | 1 + 16 files changed, 19 insertions(+) diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst index 4402809..cea6fc0 100644 --- a/src/gallium/docs/source/screen.rst +++ b/src/gallium/docs/source/screen.rst @@ -285,6 +285,10 @@ The integer capabilities: * ``PIPE_CAP_DRAW_PARAMETERS``: Whether ``TGSI_SEMANTIC_BASEVERTEX``, ``TGSI_SEMANTIC_BASEINSTANCE``, and ``TGSI_SEMANTIC_DRAWID`` are supported in vertex shaders. +* ``PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT``: Describes the required + alignment for pipe_shader_buffer::buffer_offset, in bytes. Maximum + value allowed is 256 (for GL conformance). 0 is only allowed if + shader buffers are not supported. .. 
_pipe_capf: diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c index bf356c4..44db5e8 100644 --- a/src/gallium/drivers/freedreno/freedreno_screen.c +++ b/src/gallium/drivers/freedreno/freedreno_screen.c @@ -239,6 +239,7 @@ fd_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS: case PIPE_CAP_CLEAR_TEXTURE: case PIPE_CAP_DRAW_PARAMETERS: + case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT: return 0; case PIPE_CAP_MAX_VIEWPORTS: diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c index 14bd8d7..22d926c 100644 --- a/src/gallium/drivers/i915/i915_screen.c +++ b/src/gallium/drivers/i915/i915_screen.c @@ -255,6 +255,7 @@ i915_get_param(struct pipe_screen *screen, enum pipe_cap cap) case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS: case PIPE_CAP_CLEAR_TEXTURE: case PIPE_CAP_DRAW_PARAMETERS: + case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT: return 0; case PIPE_CAP_MAX_DUAL_SOURCE_RENDER_TARGETS: diff --git a/src/gallium/drivers/ilo/ilo_screen.c b/src/gallium/drivers/ilo/ilo_screen.c index ac29b56..02b6851 100644 --- a/src/gallium/drivers/ilo/ilo_screen.c +++ b/src/gallium/drivers/ilo/ilo_screen.c @@ -477,6 +477,7 @@ ilo_get_param(struct pipe_screen *screen, enum pipe_cap param) case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS: case PIPE_CAP_CLEAR_TEXTURE: case PIPE_CAP_DRAW_PARAMETERS: + case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT: return 0; case PIPE_CAP_VENDOR_ID: diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c index 5352963..6f0041a 100644 --- a/src/gallium/drivers/llvmpipe/lp_screen.c +++ b/src/gallium/drivers/llvmpipe/lp_screen.c @@ -302,6 +302,7 @@ llvmpipe_get_param(struct pipe_screen *screen, enum pipe_cap param) case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS: case PIPE_CAP_CLEAR_TEXTURE: case PIPE_CAP_DRAW_PARAMETERS: + case 
PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT: return 0; } /* should only get here on unhandled cases */ diff --git a/src/gallium/drivers/nouveau/nv30/nv30_screen.c b/src/gallium/drivers/nouveau/nv30/nv30_screen.c index 3d77f81..818ee17 100644 --- a/src/gallium/drivers/nouveau/nv30/nv30_screen.c +++ b/src/gallium/drivers/nouveau/nv30/nv30_screen.c @@ -175,6 +175,7 @@ nv30_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS: case PIPE_CAP_CLEAR_TEXTURE: case PIPE_CAP_DRAW_PARAMETERS: + case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT: return 0; case PIPE_CAP_VENDOR_ID: diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c index aafca71..a1dcfda 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c @@ -218,6 +218,7 @@ nv50_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS: case PIPE_CAP_DRAW_PARAMETERS: + case PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT: return 0;
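For reference, a state tracker consuming this cap might validate a binding offset roughly as below. This is a sketch under assumptions, not code from the series: `shader_buffer_offset_ok` is a hypothetical helper, the alignment is taken as a plain integer already queried from the screen, and a modulo is used because the documentation text does not say the value must be a power of two.

```c
#include <assert.h>
#include <stdbool.h>

/* 'alignment' is the value a driver reported for
 * PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT; 0 means shader buffers are
 * not supported, so no offset is acceptable. */
static bool
shader_buffer_offset_ok(unsigned alignment, unsigned buffer_offset)
{
   /* The docs cap the reported value at 256 for GL conformance. */
   assert(alignment <= 256);

   if (alignment == 0)
      return false;

   return buffer_offset % alignment == 0;
}
```

All drivers touched in this patch return 0 for now, which under this reading means none of them accept shader buffer bindings yet.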
[Mesa-dev] [PATCH 2/8] ureg: add buffer support to ureg
Signed-off-by: Ilia Mirkin
---
 src/gallium/auxiliary/tgsi/tgsi_dump.c     |  5 +++
 src/gallium/auxiliary/tgsi/tgsi_strings.c  |  1 +
 src/gallium/auxiliary/tgsi/tgsi_text.c     |  5 +++
 src/gallium/auxiliary/tgsi/tgsi_ureg.c     | 52 ++
 src/gallium/auxiliary/tgsi/tgsi_ureg.h     |  3 ++
 src/gallium/include/pipe/p_shader_tokens.h |  4 ++-
 6 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c b/src/gallium/auxiliary/tgsi/tgsi_dump.c
index dad3839..de3aae5 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_dump.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c
@@ -359,6 +359,11 @@ iter_declaration(
          TXT(", RAW");
    }

+   if (decl->Declaration.File == TGSI_FILE_BUFFER) {
+      if (decl->Declaration.Atomic)
+         TXT(", ATOMIC");
+   }
+
    if (decl->Declaration.File == TGSI_FILE_SAMPLER_VIEW) {
       TXT(", ");
       ENM(decl->SamplerView.Resource, tgsi_texture_names);
diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c b/src/gallium/auxiliary/tgsi/tgsi_strings.c
index ae30399..c0dd044 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_strings.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c
@@ -56,6 +56,7 @@ static const char *tgsi_file_names[] =
    "SV",
    "IMAGE",
    "SVIEW",
+   "BUFFER",
 };

 const char *tgsi_semantic_names[TGSI_SEMANTIC_COUNT] =
diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c
index a45ab90..d72d843 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_text.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_text.c
@@ -1350,6 +1350,11 @@ static boolean parse_declaration( struct translate_ctx *ctx )
             decl.SamplerView.ReturnTypeX;
          }
          ctx->cur = cur;
+      } else if (file == TGSI_FILE_BUFFER) {
+         if (str_match_nocase_whole(&cur, "ATOMIC")) {
+            decl.Declaration.Atomic = 1;
+            ctx->cur = cur;
+         }
       } else {
          if (str_match_nocase_whole(&cur, "LOCAL")) {
             decl.Declaration.Local = 1;
diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
index ee23df9..6d5092b 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
@@ -165,6 +165,12 @@ struct ureg_program
    } image[PIPE_MAX_SHADER_IMAGES];
    unsigned nr_images;

+   struct {
+      unsigned index;
+      bool atomic;
+   } buffer[PIPE_MAX_SHADER_BUFFERS];
+   unsigned nr_buffers;
+
    struct util_bitmask *free_temps;
    struct util_bitmask *local_temps;
    struct util_bitmask *decl_temps;
@@ -689,6 +695,29 @@ ureg_DECL_image(struct ureg_program *ureg,
    return reg;
 }

+/* Allocate a new buffer.
+ */
+struct ureg_src ureg_DECL_buffer(struct ureg_program *ureg, unsigned nr,
+                                 bool atomic)
+{
+   struct ureg_src reg = ureg_src_register(TGSI_FILE_BUFFER, nr);
+   unsigned i;
+
+   for (i = 0; i < ureg->nr_buffers; i++)
+      if (ureg->buffer[i].index == nr)
+         return reg;
+
+   if (i < PIPE_MAX_SHADER_BUFFERS) {
+      ureg->buffer[i].index = nr;
+      ureg->buffer[i].atomic = atomic;
+      ureg->nr_buffers++;
+      return reg;
+   }
+
+   assert(0);
+   return reg;
+}
+
 static int
 match_or_expand_immediate64( const unsigned *v,
                              int type,
@@ -1546,6 +1575,25 @@ emit_decl_image(struct ureg_program *ureg,
 }

 static void
+emit_decl_buffer(struct ureg_program *ureg,
+                 unsigned index,
+                 bool atomic)
+{
+   union tgsi_any_token *out = get_tokens(ureg, DOMAIN_DECL, 2);
+
+   out[0].value = 0;
+   out[0].decl.Type = TGSI_TOKEN_TYPE_DECLARATION;
+   out[0].decl.NrTokens = 2;
+   out[0].decl.File = TGSI_FILE_BUFFER;
+   out[0].decl.UsageMask = 0xf;
+   out[0].decl.Atomic = atomic;
+
+   out[1].value = 0;
+   out[1].decl_range.First = index;
+   out[1].decl_range.Last = index;
+}
+
+static void
 emit_immediate( struct ureg_program *ureg,
                 const unsigned *v,
                 unsigned type )
@@ -1713,6 +1761,10 @@ static void emit_decls( struct ureg_program *ureg )
                       ureg->image[i].raw);
    }

+   for (i = 0; i < ureg->nr_buffers; i++) {
+      emit_decl_buffer(ureg, ureg->buffer[i].index, ureg->buffer[i].atomic);
+   }
+
    if (ureg->const_decls.nr_constant_ranges) {
       for (i = 0; i < ureg->const_decls.nr_constant_ranges; i++) {
          emit_decl_range(ureg,
diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.h b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
index bba2afb..e25c961 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.h
@@ -335,6 +335,9 @@ ureg_DECL_image(struct ureg_program *ureg,
                 boolean wr,
                 boolean raw);

+struct ureg_src
+ureg_DECL_buffer(struct ureg_program *ureg, unsigned nr, bool atomic);
+
 static inline struct ureg_src
 ureg_imm4f( struct ureg_program *ureg,
             float a,
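The find-or-append pattern used by ureg_DECL_buffer() can be tried in isolation with the reduced model below. It is only a sketch: `buffer_decls`, `decl_buffer` and `MAX_BUFFERS` are hypothetical stand-ins for the ureg_program fields and PIPE_MAX_SHADER_BUFFERS, and it returns the slot index (or -1 when full) instead of a ureg_src so the behaviour is easy to observe.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BUFFERS 4   /* stand-in for PIPE_MAX_SHADER_BUFFERS */

struct buffer_decls {
   struct {
      unsigned index;
      bool atomic;
   } buffer[MAX_BUFFERS];
   unsigned nr_buffers;
};

/* Returns the slot that holds buffer 'nr', appending a new slot if the
 * buffer has not been declared yet. Returns -1 when the table is full
 * (the patch asserts in that case). */
static int
decl_buffer(struct buffer_decls *d, unsigned nr, bool atomic)
{
   unsigned i;

   assert(d->nr_buffers <= MAX_BUFFERS);

   for (i = 0; i < d->nr_buffers; i++)
      if (d->buffer[i].index == nr)
         return (int)i;   /* already declared; 'atomic' is not updated */

   if (i < MAX_BUFFERS) {
      d->buffer[i].index = nr;
      d->buffer[i].atomic = atomic;
      d->nr_buffers++;
      return (int)i;
   }

   return -1;
}
```

One consequence visible in the model, and in the patch as posted, is that declaring the same buffer index twice with different `atomic` values keeps whichever value came first.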
[Mesa-dev] [PATCH 8/8] gallium: add a RESQ opcode to query info about a resource
Signed-off-by: Ilia Mirkin
---
 src/gallium/auxiliary/tgsi/tgsi_info.c     |  2 +-
 src/gallium/docs/source/tgsi.rst           | 12
 src/gallium/include/pipe/p_shader_tokens.h |  1 +
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c
index 8a0e9c4..bdd4688 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_info.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
@@ -142,7 +142,7 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] =
    { 0, 0, 0, 0, 0, 1, 0, NONE, "ENDSUB", TGSI_OPCODE_ENDSUB },
    { 1, 1, 1, 0, 0, 0, 0, OTHR, "TXQ_LZ", TGSI_OPCODE_TXQ_LZ },
    { 1, 1, 1, 0, 0, 0, 0, OTHR, "TXQS", TGSI_OPCODE_TXQS },
-   { 0, 0, 0, 0, 0, 0, 0, NONE, "", 105 }, /* removed */
+   { 1, 1, 0, 0, 0, 0, 0, NONE, "RESQ", TGSI_OPCODE_RESQ },
    { 0, 0, 0, 0, 0, 0, 0, NONE, "", 106 }, /* removed */
    { 0, 0, 0, 0, 0, 0, 0, NONE, "NOP", TGSI_OPCODE_NOP },
    { 1, 2, 0, 0, 0, 0, 0, COMP, "FSEQ", TGSI_OPCODE_FSEQ },
diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
index a3151e3..f4b8c78 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -2299,6 +2299,18 @@ Resource Access Opcodes
   texture arrays and 2D textures. address.w is always
   ignored.

+.. opcode:: RESQ - Query information about a resource
+
+  Syntax: ``RESQ dst, resource``
+
+  Example: ``RESQ TEMP[0], BUFFER[0]``
+
+  Returns information about the buffer or image resource. For buffer
+  resources, the size (in bytes) is returned in the x component. For
+  image resources, .xyz will contain the width/height/layers of the
+  image, while .w will contain the number of samples for multi-sampled
+  images.
+

 .. _threadsyncopcodes:

diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
index 43a5561..f300207 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -411,6 +411,7 @@ struct tgsi_property_data {
 #define TGSI_OPCODE_ENDSUB              102
 #define TGSI_OPCODE_TXQ_LZ              103 /* TXQ for mipmap level 0 */
 #define TGSI_OPCODE_TXQS                104
+#define TGSI_OPCODE_RESQ                105
                                 /* gap */
 #define TGSI_OPCODE_NOP                 107

--
2.4.10
[Mesa-dev] [PATCH 1/6] i965: Add state bit to trigger re-emission of color calculator state.
This will be used on Gen8+ to make sure that the color calculator
state pointers are re-emitted when switching back to the 3D pipeline
after some GPGPU workload due to a hardware workaround. There are
other state bits already defined that could be used to achieve the
same effect but they all cause a ton of unrelated state to be
re-emitted (e.g. BRW_NEW_STATE_BASE_ADDRESS), so just define a new
one; state bits are cheap.
---
 src/mesa/drivers/dri/i965/brw_context.h      | 2 ++
 src/mesa/drivers/dri/i965/brw_state_upload.c | 1 +
 src/mesa/drivers/dri/i965/gen6_cc.c          | 1 +
 3 files changed, 4 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 7b0340f..b80db00 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -221,6 +221,7 @@ enum brw_state_id {
    BRW_STATE_COMPUTE_PROGRAM,
    BRW_STATE_CS_WORK_GROUPS,
    BRW_STATE_URB_SIZE,
+   BRW_STATE_CC_STATE,
    BRW_NUM_STATE_BITS
 };

@@ -309,6 +310,7 @@ enum brw_state_id {
 #define BRW_NEW_COMPUTE_PROGRAM         (1ull << BRW_STATE_COMPUTE_PROGRAM)
 #define BRW_NEW_CS_WORK_GROUPS          (1ull << BRW_STATE_CS_WORK_GROUPS)
 #define BRW_NEW_URB_SIZE                (1ull << BRW_STATE_URB_SIZE)
+#define BRW_NEW_CC_STATE                (1ull << BRW_STATE_CC_STATE)

 struct brw_state_flags {
    /** State update flags signalled by mesa internals */
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 2a671a58d..876e130 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -664,6 +664,7 @@ static struct dirty_bit_map brw_bits[] = {
    DEFINE_BIT(BRW_NEW_COMPUTE_PROGRAM),
    DEFINE_BIT(BRW_NEW_CS_WORK_GROUPS),
    DEFINE_BIT(BRW_NEW_URB_SIZE),
+   DEFINE_BIT(BRW_NEW_CC_STATE),
    {0, 0, 0}
 };

diff --git a/src/mesa/drivers/dri/i965/gen6_cc.c b/src/mesa/drivers/dri/i965/gen6_cc.c
index 3bab8f4..cee139b 100644
--- a/src/mesa/drivers/dri/i965/gen6_cc.c
+++ b/src/mesa/drivers/dri/i965/gen6_cc.c
@@ -298,6 +298,7 @@ const struct brw_tracked_state gen6_color_calc_state = {
       .mesa = _NEW_COLOR |
               _NEW_STENCIL,
       .brw = BRW_NEW_BATCH |
+             BRW_NEW_CC_STATE |
              BRW_NEW_STATE_BASE_ADDRESS,
    },
    .emit = gen6_upload_color_calc_state,
--
2.6.4
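To show why a dedicated bit is cheaper than reusing an existing one, here is a reduced model of the dirty-bit scheme. It is a sketch only: the enum, the `NEW_*` macros, `tracked_state` and `needs_emit` are hypothetical simplifications of the brw_state_id / brw_tracked_state machinery, with just three bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, shortened version of the state id enum. */
enum state_id {
   STATE_URB_SIZE,
   STATE_CC_STATE,
   STATE_BASE_ADDRESS,
   NUM_STATE_BITS
};

/* The flags live in a 64-bit word, so the enum must fit. */
static_assert(NUM_STATE_BITS <= 64, "too many state bits for uint64_t");

#define NEW_URB_SIZE     (1ull << STATE_URB_SIZE)
#define NEW_CC_STATE     (1ull << STATE_CC_STATE)
#define NEW_BASE_ADDRESS (1ull << STATE_BASE_ADDRESS)

/* Each atom lists the bits it depends on. */
struct tracked_state {
   uint64_t dirty;
};

/* An atom is re-emitted when any bit it listens to has been flagged. */
static bool
needs_emit(const struct tracked_state *atom, uint64_t new_driver_state)
{
   return (atom->dirty & new_driver_state) != 0;
}
```

Flagging only `NEW_CC_STATE` wakes up the color calculator atom and nothing else, whereas flagging `NEW_BASE_ADDRESS` would wake every atom that lists it.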
[Mesa-dev] [PATCH 4/6] i965/gen4-5: Emit MI_FLUSH as required prior to switching pipelines.
AFAIK brw_emit_select_pipeline() is only called once during context
init on Gen4-5, at which point the pipeline is likely to be already
idle so it may just happen to work by luck regardless of the MI_FLUSH.
---
 src/mesa/drivers/dri/i965/brw_misc_state.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
index 75540c1..e5af1da 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -914,6 +914,19 @@ brw_emit_select_pipeline(struct brw_context *brw, enum brw_pipeline pipeline)
                                   PIPE_CONTROL_STATE_CACHE_INVALIDATE |
                                   PIPE_CONTROL_INSTRUCTION_INVALIDATE |
                                   PIPE_CONTROL_NO_WRITE);
+
+   } else {
+      /* From "BXML » GT » MI » vol1a GPU Overview » [Instruction]
+       * PIPELINE_SELECT [DevBWR+]":
+       *
+       *   Project: PRE-DEVSNB
+       *
+       *   Software must ensure the current pipeline is flushed via an
+       *   MI_FLUSH or PIPE_CONTROL prior to the execution of PIPELINE_SELECT.
+       */
+      BEGIN_BATCH(1);
+      OUT_BATCH(MI_FLUSH);
+      ADVANCE_BATCH();
    }
 
    /* Select the pipeline */
-- 
2.6.4
[Mesa-dev] [PATCH 3/6] i965/gen6-7: Implement stall and flushes required prior to switching pipelines.
Switching the current pipeline while it's not completely idle or the
read and write caches aren't flushed can lead to corruption.  Fixes
misrendering of at least the following Khronos CTS test:

 ES31-CTS.shader_image_load_store.basic-allTargets-store-fs

The stall and flushes are no longer required on Gen8+.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93323
---
 src/mesa/drivers/dri/i965/brw_misc_state.c | 28
 1 file changed, 28 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
index 7d53d18..75540c1 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -886,6 +886,34 @@ brw_emit_select_pipeline(struct brw_context *brw, enum brw_pipeline pipeline)
 
          brw->ctx.NewDriverState |= BRW_NEW_CC_STATE;
       }
+
+   } else if (brw->gen >= 6) {
+      /* From "BXML » GT » MI » vol1a GPU Overview » [Instruction]
+       * PIPELINE_SELECT [DevBWR+]":
+       *
+       *   Project: DEVSNB+
+       *
+       *   Software must ensure all the write caches are flushed through a
+       *   stalling PIPE_CONTROL command followed by another PIPE_CONTROL
+       *   command to invalidate read only caches prior to programming
+       *   MI_PIPELINE_SELECT command to change the Pipeline Select Mode.
+       */
+      const unsigned dc_flush =
+         brw->gen >= 7 ? PIPE_CONTROL_DATA_CACHE_INVALIDATE : 0;
+
+      brw_emit_pipe_control_flush(brw,
+                                  PIPE_CONTROL_RENDER_TARGET_FLUSH |
+                                  PIPE_CONTROL_DEPTH_CACHE_FLUSH |
+                                  dc_flush |
+                                  PIPE_CONTROL_NO_WRITE |
+                                  PIPE_CONTROL_CS_STALL);
+
+      brw_emit_pipe_control_flush(brw,
+                                  PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
+                                  PIPE_CONTROL_CONST_CACHE_INVALIDATE |
+                                  PIPE_CONTROL_STATE_CACHE_INVALIDATE |
+                                  PIPE_CONTROL_INSTRUCTION_INVALIDATE |
+                                  PIPE_CONTROL_NO_WRITE);
    }
 
    /* Select the pipeline */
-- 
2.6.4
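The two-step flush can be illustrated in isolation. The sketch below only models how the two flag masks are composed per generation. The SKETCH_PC_* bit values are made up for illustration (the real PIPE_CONTROL_* values live in the driver headers), and sketch_pre_select_flushes() is a hypothetical helper, not a function in the driver.

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define SKETCH_PC_RENDER_TARGET_FLUSH    (1u << 0)
#define SKETCH_PC_DEPTH_CACHE_FLUSH      (1u << 1)
#define SKETCH_PC_DATA_CACHE             (1u << 2)
#define SKETCH_PC_NO_WRITE               (1u << 3)
#define SKETCH_PC_CS_STALL               (1u << 4)
#define SKETCH_PC_TEXTURE_INVALIDATE     (1u << 5)
#define SKETCH_PC_CONST_INVALIDATE       (1u << 6)
#define SKETCH_PC_STATE_INVALIDATE       (1u << 7)
#define SKETCH_PC_INSTRUCTION_INVALIDATE (1u << 8)

struct sketch_flushes {
   uint32_t write_flush; /* first PIPE_CONTROL: stalling write-cache flush */
   uint32_t invalidate;  /* second PIPE_CONTROL: read-only cache invalidation */
};

/* Compose the flags of the two PIPE_CONTROLs emitted on Gen6-7 before
 * PIPELINE_SELECT.  The data cache bit is only added on Gen7+.
 */
struct sketch_flushes
sketch_pre_select_flushes(int gen)
{
   const uint32_t dc_flush = gen >= 7 ? SKETCH_PC_DATA_CACHE : 0;
   const struct sketch_flushes f = {
      .write_flush = SKETCH_PC_RENDER_TARGET_FLUSH |
                     SKETCH_PC_DEPTH_CACHE_FLUSH |
                     dc_flush |
                     SKETCH_PC_NO_WRITE |
                     SKETCH_PC_CS_STALL,
      .invalidate = SKETCH_PC_TEXTURE_INVALIDATE |
                    SKETCH_PC_CONST_INVALIDATE |
                    SKETCH_PC_STATE_INVALIDATE |
                    SKETCH_PC_INSTRUCTION_INVALIDATE |
                    SKETCH_PC_NO_WRITE,
   };

   return f;
}
```

Only the first mask carries the CS stall. The second one invalidates the read-only caches once the write caches have been flushed, matching the order the spec text quoted in the patch asks for.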
[Mesa-dev] [PATCH 5/6] i965/gen7: Emit stall and dummy primitive draw after switching to the 3D pipeline.
This hardware bug can supposedly lead to a hang on IVB and VLV.
---
 src/mesa/drivers/dri/i965/brw_misc_state.c | 24
 1 file changed, 24 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
index e5af1da..2263604 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -935,6 +935,30 @@ brw_emit_select_pipeline(struct brw_context *brw, enum brw_pipeline pipeline)
              (brw->gen >= 9 ? (3 << 8) : 0) |
              (pipeline == BRW_COMPUTE_PIPELINE ? 2 : 0));
    ADVANCE_BATCH();
+
+   if (brw->gen == 7 && !brw->is_haswell &&
+       pipeline == BRW_RENDER_PIPELINE) {
+      /* From "BXML » GT » MI » vol1a GPU Overview » [Instruction]
+       * PIPELINE_SELECT [DevBWR+]":
+       *
+       *   Project: DEVIVB, DEVHSW:GT3:A0
+       *
+       *   Software must send a pipe_control with a CS stall and a post sync
+       *   operation and then a dummy DRAW after every MI_SET_CONTEXT and
+       *   after any PIPELINE_SELECT that is enabling 3D mode.
+       */
+      gen7_emit_cs_stall_flush(brw);
+
+      BEGIN_BATCH(7);
+      OUT_BATCH(CMD_3D_PRIM << 16 | (7 - 2));
+      OUT_BATCH(_3DPRIM_POINTLIST);
+      OUT_BATCH(0);
+      OUT_BATCH(0);
+      OUT_BATCH(0);
+      OUT_BATCH(0);
+      OUT_BATCH(0);
+      ADVANCE_BATCH();
+   }
 }
 
 /**
-- 
2.6.4
[Mesa-dev] [PATCH 2/6] i965/gen8+: Invalidate color calc state when switching to the GPGPU pipeline.
This hardware bug can cause a hang on context restore while the
current pipeline is set to GPGPU (BDWGFX HSD 1909593).  In addition to
clearing the valid bit, mark the CC state as dirty to make sure that
the CC indirect state pointer is re-emitted when we switch back to the
3D pipeline.
---
 src/mesa/drivers/dri/i965/brw_misc_state.c | 20
 1 file changed, 20 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
index cf6ba5b..7d53d18 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -868,6 +868,26 @@ brw_emit_select_pipeline(struct brw_context *brw, enum brw_pipeline pipeline)
    const uint32_t _3DSTATE_PIPELINE_SELECT =
       is_965 ? CMD_PIPELINE_SELECT_965 : CMD_PIPELINE_SELECT_GM45;
 
+   if (brw->gen >= 8) {
+      /* From "BXML » GT » MI » vol1a GPU Overview » [Instruction]
+       * PIPELINE_SELECT [DevBWR+]":
+       *
+       *   Project: BDW, SKL
+       *
+       *   Software must clear the COLOR_CALC_STATE Valid field in
+       *   3DSTATE_CC_STATE_POINTERS command prior to send a PIPELINE_SELECT
+       *   with Pipeline Select set to GPGPU.
+       */
+      if (pipeline == BRW_COMPUTE_PIPELINE) {
+         BEGIN_BATCH(2);
+         OUT_BATCH(_3DSTATE_CC_STATE_POINTERS << 16 | (2 - 2));
+         OUT_BATCH(0);
+         ADVANCE_BATCH();
+
+         brw->ctx.NewDriverState |= BRW_NEW_CC_STATE;
+      }
+   }
+
    /* Select the pipeline */
    BEGIN_BATCH(1);
    OUT_BATCH(_3DSTATE_PIPELINE_SELECT << 16 |
-- 
2.6.4
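The "(2 - 2)" in the header DWord above follows the pattern used throughout these patches: the opcode goes in the upper 16 bits and the low bits carry the packet length in DWords, biased by 2. A minimal sketch of that encoding, assuming nothing beyond what the OUT_BATCH() lines show, follows. sketch_cmd_header() and sketch_emit_cc_invalid() are hypothetical helpers, and any opcode passed to them is a placeholder rather than a real command value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a command header DWord: opcode << 16 | (total length - 2). */
uint32_t
sketch_cmd_header(uint32_t opcode, unsigned total_dwords)
{
   /* The length bias means a packet is at least two DWords long. */
   assert(total_dwords >= 2);
   return opcode << 16 | (total_dwords - 2);
}

/* Write a two-DWord packet whose second DWord is zero, which is what
 * the patch emits to leave the COLOR_CALC_STATE pointer marked invalid.
 * Returns the number of DWords written.
 */
size_t
sketch_emit_cc_invalid(uint32_t *batch, uint32_t cc_pointers_opcode)
{
   batch[0] = sketch_cmd_header(cc_pointers_opcode, 2);
   batch[1] = 0; /* no pointer, valid bit clear */
   return 2;
}
```

The same helper would give a length field of 5 for the seven-DWord dummy 3DPRIMITIVE in patch 5/6, matching its "(7 - 2)".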
[Mesa-dev] [PATCH 6/6] i965/gen7.5+: Disable resource streamer during GPGPU workloads.
The RS and hardware binding tables are only supported on the 3D
pipeline and can lead to corruption if left enabled during a GPGPU
workload.  Disable it when switching to the GPGPU (or media) pipeline
and re-enable it when switching back to the 3D pipeline.
---
 src/mesa/drivers/dri/i965/brw_binding_tables.c | 2 +-
 src/mesa/drivers/dri/i965/brw_misc_state.c     | 38 ++
 src/mesa/drivers/dri/i965/brw_state.h          | 1 +
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c b/src/mesa/drivers/dri/i965/brw_binding_tables.c
index 80935cf..5c5aa0e 100644
--- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
+++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
@@ -364,7 +364,7 @@ gen7_disable_hw_binding_tables(struct brw_context *brw)
 /**
  * Enable hardware binding tables and set up the binding table pool.
  */
-static void
+void
 gen7_enable_hw_binding_tables(struct brw_context *brw)
 {
    if (!brw->use_resource_streamer)
diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
index 2263604..7e68838 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -868,6 +868,25 @@ brw_emit_select_pipeline(struct brw_context *brw, enum brw_pipeline pipeline)
    const uint32_t _3DSTATE_PIPELINE_SELECT =
       is_965 ? CMD_PIPELINE_SELECT_965 : CMD_PIPELINE_SELECT_GM45;
 
+   if (brw->use_resource_streamer && pipeline != BRW_RENDER_PIPELINE) {
+      /* From "BXML » GT » MI » vol1a GPU Overview » [Instruction]
+       * PIPELINE_SELECT [DevBWR+]":
+       *
+       *   Project: HSW, BDW, CHV, SKL, BXT
+       *
+       *   Hardware Binding Tables are only supported for 3D workloads. Resource
+       *   streamer must be enabled only for 3D workloads. Resource streamer
+       *   must be disabled for Media and GPGPU workloads.
+       */
+      BEGIN_BATCH(1);
+      OUT_BATCH(MI_RS_CONTROL | 0);
+      ADVANCE_BATCH();
+
+      gen7_disable_hw_binding_tables(brw);
+
+      /* XXX - Disable gather constant pool too when we start using it. */
+   }
+
    if (brw->gen >= 8) {
       /* From "BXML » GT » MI » vol1a GPU Overview » [Instruction]
        * PIPELINE_SELECT [DevBWR+]":
@@ -959,6 +978,25 @@ brw_emit_select_pipeline(struct brw_context *brw, enum brw_pipeline pipeline)
       OUT_BATCH(0);
       ADVANCE_BATCH();
    }
+
+   if (brw->use_resource_streamer && pipeline == BRW_RENDER_PIPELINE) {
+      /* From "BXML » GT » MI » vol1a GPU Overview » [Instruction]
+       * PIPELINE_SELECT [DevBWR+]":
+       *
+       *   Project: HSW, BDW, CHV, SKL, BXT
+       *
+       *   Hardware Binding Tables are only supported for 3D workloads. Resource
+       *   streamer must be enabled only for 3D workloads. Resource streamer
+       *   must be disabled for Media and GPGPU workloads.
+       */
+      BEGIN_BATCH(1);
+      OUT_BATCH(MI_RS_CONTROL | 1);
+      ADVANCE_BATCH();
+
+      gen7_enable_hw_binding_tables(brw);
+
+      /* XXX - Re-enable gather constant pool here. */
+   }
 }
 
 /**
diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h
index d29b997..7d61b7c 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -396,6 +396,7 @@ void gen7_update_binding_table_from_array(struct brw_context *brw,
                                           gl_shader_stage stage,
                                           const uint32_t* binding_table,
                                           int num_surfaces);
+void gen7_enable_hw_binding_tables(struct brw_context *brw);
 void gen7_disable_hw_binding_tables(struct brw_context *brw);
 void gen7_reset_hw_bt_pool_offsets(struct brw_context *brw);
 
-- 
2.6.4
[Mesa-dev] [PATCH 0/6] i965: GPGPU/3D pipeline switching fixes.
The PIPELINE_SELECT command has a number of awkward restrictions we
don't currently take into account while switching between the GPGPU
and 3D pipeline, which in some cases can lead to corruption or hangs.
This series should implement all workarounds mentioned in the hardware
spec ("BXML » GT » MI » vol1a GPU Overview » [Instruction]
PIPELINE_SELECT [DevBWR+]") that seem to be relevant to us.

[PATCH 1/6] i965: Add state bit to trigger re-emission of color calculator state.
[PATCH 2/6] i965/gen8+: Invalidate color calc state when switching to the GPGPU pipeline.
[PATCH 3/6] i965/gen6-7: Implement stall and flushes required prior to switching pipelines.
[PATCH 4/6] i965/gen4-5: Emit MI_FLUSH as required prior to switching pipelines.
[PATCH 5/6] i965/gen7: Emit stall and dummy primitive draw after switching to the 3D pipeline.
[PATCH 6/6] i965/gen7.5+: Disable resource streamer during GPGPU workloads.
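As a reading aid, the order in which the workarounds end up in brw_emit_select_pipeline() once all six patches are applied can be modelled as a list of steps. This is a standalone sketch of the control flow only, reconstructed from the hunks in this series. The sketch_ types and the step names are hypothetical, and no batch commands are emitted.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_pipeline { SKETCH_RENDER_PIPELINE, SKETCH_COMPUTE_PIPELINE };

enum sketch_step {
   SKETCH_DISABLE_RS,           /* 6/6: MI_RS_CONTROL | 0, HW binding tables off */
   SKETCH_CLEAR_CC_VALID,       /* 2/6: zero 3DSTATE_CC_STATE_POINTERS, flag CC dirty */
   SKETCH_STALL_AND_INVALIDATE, /* 3/6: stalling flush, then read-only invalidate */
   SKETCH_MI_FLUSH,             /* 4/6: MI_FLUSH */
   SKETCH_PIPELINE_SELECT,
   SKETCH_STALL_AND_DUMMY_DRAW, /* 5/6: CS stall, then dummy 3DPRIMITIVE */
   SKETCH_ENABLE_RS             /* 6/6: MI_RS_CONTROL | 1, HW binding tables on */
};

/* Hypothetical subset of the fields read from struct brw_context. */
struct sketch_hw {
   int gen;
   bool is_haswell;
   bool use_resource_streamer;
};

#define SKETCH_MAX_STEPS 4

/* Fill `out` with the steps taken for this switch; returns their count. */
size_t
sketch_select_pipeline_steps(const struct sketch_hw *hw,
                             enum sketch_pipeline pipeline,
                             enum sketch_step out[SKETCH_MAX_STEPS])
{
   size_t n = 0;

   if (hw->use_resource_streamer && pipeline != SKETCH_RENDER_PIPELINE)
      out[n++] = SKETCH_DISABLE_RS;

   if (hw->gen >= 8) {
      if (pipeline == SKETCH_COMPUTE_PIPELINE)
         out[n++] = SKETCH_CLEAR_CC_VALID;
   } else if (hw->gen >= 6) {
      out[n++] = SKETCH_STALL_AND_INVALIDATE;
   } else {
      out[n++] = SKETCH_MI_FLUSH;
   }

   out[n++] = SKETCH_PIPELINE_SELECT;

   if (hw->gen == 7 && !hw->is_haswell && pipeline == SKETCH_RENDER_PIPELINE)
      out[n++] = SKETCH_STALL_AND_DUMMY_DRAW;

   if (hw->use_resource_streamer && pipeline == SKETCH_RENDER_PIPELINE)
      out[n++] = SKETCH_ENABLE_RS;

   return n;
}
```

At most four steps can apply to a single switch, since the resource streamer is disabled before PIPELINE_SELECT only when leaving the 3D pipeline and re-enabled after it only when returning to it.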