I would think that the LLVM backend should be able recognize that it's spilling an array of literals.
Marek On Thu, Oct 30, 2014 at 1:43 PM, Marek Olšák <mar...@gmail.com> wrote: > Hi Tom, > > Could you please elaborate on how we should change the TGSI->LLVM translation? > > Marek > > On Fri, Oct 17, 2014 at 4:49 PM, Tom Stellard <t...@stellard.net> wrote: >> On Wed, Oct 15, 2014 at 05:32:11PM -0700, Kenneth Graunke wrote: >>> Consider GLSL code such as: >>> >>> const ivec2 offsets[] = >>> ivec2[](ivec2(-1, -1), ivec2(-1, 0), ivec2(-1, 1), >>> ivec2(0, -1), ivec2(0, 0), ivec2(0, 1), >>> ivec2(1, -1), ivec2(1, 0), ivec2(1, 1)); >>> >>> ivec2 offset = offsets[<non-constant expression>]; >>> >> >> I don't think we necessarily want this pass for radeonsi. We produce >> efficient code for these kinds of arrays in OpenCL, I think we just need to >> change how we translate TGSI to LLVM. >> >> -Tom >> >>> Both i965 and nv50 currently handle this very poorly. On i965, this >>> becomes a pile of MOVs to load the immediate constants into registers, >>> a pile of scratch writes to move the whole array to memory, and one >>> scratch read to actually access the value - effectively the same as if >>> it were a non-constant array. >>> >>> We'd much rather upload large blocks of constant data as uniform data, >>> so drivers can simply upload the data via constbufs, and not have to >>> populate it via shader instructions. >>> >>> This is currently non-optional because both i965 and nouveau benefit >>> from it - we can revisit that if another driver actually benefits. >>> >>> Improves performance in a terrain rendering microbenchmark by about 2x, >>> and cuts the number of instructions in about half. Helps a lot of >>> "Natural Selection 2" shaders, as well as one "HOARD" shader. >>> >>> total instructions in shared programs: 5473459 -> 5471765 (-0.03%) >>> instructions in affected programs: 5880 -> 4186 (-28.81%) >>> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77957 >>> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org> >>> >>> Reviewers: suggestions for better max_array_access handling are welcome. >>> Reviewers: This changes the const-ness of the expression. Is that a >>> problem? >>> --- >>> src/glsl/Makefile.sources | 1 + >>> src/glsl/ir_optimization.h | 1 + >>> src/glsl/linker.cpp | 2 + >>> src/glsl/lower_const_arrays_to_uniforms.cpp | 101 >>> ++++++++++++++++++++++++++++ >>> 4 files changed, 105 insertions(+) >>> create mode 100644 src/glsl/lower_const_arrays_to_uniforms.cpp >>> >>> I've had this patch sitting around since April, and been pondering whether >>> we should improve it somehow. But...it helps certain shaders a ton, and I >>> haven't seen anything hurt by it. So I'm wondering if we should just land >>> it; we can always improve things later. >>> >>> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources >>> index 0c55327..6aed52d 100644 >>> --- a/src/glsl/Makefile.sources >>> +++ b/src/glsl/Makefile.sources >>> @@ -58,6 +58,7 @@ LIBGLSL_FILES = \ >>> $(GLSL_SRCDIR)/loop_analysis.cpp \ >>> $(GLSL_SRCDIR)/loop_controls.cpp \ >>> $(GLSL_SRCDIR)/loop_unroll.cpp \ >>> + $(GLSL_SRCDIR)/lower_const_arrays_to_uniforms.cpp \ >>> $(GLSL_SRCDIR)/lower_clip_distance.cpp \ >>> $(GLSL_SRCDIR)/lower_discard.cpp \ >>> $(GLSL_SRCDIR)/lower_discard_flow.cpp \ >>> diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h >>> index e25857a..34e0b4b 100644 >>> --- a/src/glsl/ir_optimization.h >>> +++ b/src/glsl/ir_optimization.h >>> @@ -114,6 +114,7 @@ bool lower_noise(exec_list *instructions); >>> bool lower_variable_index_to_cond_assign(exec_list *instructions, >>> bool lower_input, bool lower_output, bool lower_temp, bool >>> lower_uniform); >>> bool lower_quadop_vector(exec_list *instructions, bool dont_lower_swz); >>> +bool lower_const_arrays_to_uniforms(exec_list *instructions); >>> bool lower_clip_distance(gl_shader *shader); >>> void lower_output_reads(exec_list *instructions); >>> bool lower_packing_builtins(exec_list *instructions, int op_mask); >>> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp >>> index 47a722d..2a69b78 100644 >>> --- a/src/glsl/linker.cpp >>> +++ b/src/glsl/linker.cpp >>> @@ -2692,6 +2692,8 @@ link_shaders(struct gl_context *ctx, struct >>> gl_shader_program *prog) >>> &ctx->Const.ShaderCompilerOptions[i], >>> ctx->Const.NativeIntegers)) >>> ; >>> + >>> + lower_const_arrays_to_uniforms(prog->_LinkedShaders[i]->ir); >>> } >>> >>> /* Check and validate stream emissions in geometry shaders */ >>> diff --git a/src/glsl/lower_const_arrays_to_uniforms.cpp >>> b/src/glsl/lower_const_arrays_to_uniforms.cpp >>> new file mode 100644 >>> index 0000000..2085086 >>> --- /dev/null >>> +++ b/src/glsl/lower_const_arrays_to_uniforms.cpp >>> @@ -0,0 +1,101 @@ >>> +/* >>> + * Copyright © 2014 Intel Corporation >>> + * >>> + * Permission is hereby granted, free of charge, to any person obtaining a >>> + * copy of this software and associated documentation files (the >>> "Software"), >>> + * to deal in the Software without restriction, including without >>> limitation >>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense, >>> + * and/or sell copies of the Software, and to permit persons to whom the >>> + * Software is furnished to do so, subject to the following conditions: >>> + * >>> + * The above copyright notice and this permission notice (including the >>> next >>> + * paragraph) shall be included in all copies or substantial portions of >>> the >>> + * Software. >>> + * >>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS >>> OR >>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, >>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL >>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR >>> OTHER >>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING >>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER >>> + * DEALINGS IN THE SOFTWARE. >>> + */ >>> + >>> +/** >>> + * \file lower_const_arrays_to_uniforms.cpp >>> + * >>> + * Lower constant arrays to uniform arrays. >>> + * >>> + * Some driver backends (such as i965 and nouveau) don't handle constant >>> arrays >>> + * gracefully, instead treating them as ordinary writable temporary arrays. >>> + * Since arrays can be large, this often means spilling them to scratch >>> memory, >>> + * which usually involves a large number of instructions. >>> + * >>> + * This must be called prior to link_set_uniform_initializers(); we need >>> the >>> + * linker to process our new uniform's constant initializer. >>> + * >>> + * This should be called after optimizations, since those can result in >>> + * splitting and removing arrays that are indexed by constant expressions. >>> + */ >>> +#include "ir.h" >>> +#include "ir_visitor.h" >>> +#include "ir_rvalue_visitor.h" >>> +#include "glsl_types.h" >>> + >>> +namespace { >>> +class lower_const_array_visitor : public ir_rvalue_visitor { >>> +public: >>> + lower_const_array_visitor(exec_list *insts) >>> + { >>> + instructions = insts; >>> + progress = false; >>> + } >>> + >>> + bool run() >>> + { >>> + visit_list_elements(this, instructions); >>> + return progress; >>> + } >>> + >>> + void handle_rvalue(ir_rvalue **rvalue); >>> + >>> +private: >>> + exec_list *instructions; >>> + bool progress; >>> +}; >>> + >>> +void >>> +lower_const_array_visitor::handle_rvalue(ir_rvalue **rvalue) >>> +{ >>> + if (!*rvalue) >>> + return; >>> + >>> + ir_constant *con = (*rvalue)->as_constant(); >>> + if (!con || !con->type->is_array()) >>> + return; >>> + >>> + void *mem_ctx = ralloc_parent(con); >>> + >>> + ir_variable *uni = >>> + new(mem_ctx) ir_variable(con->type, "constarray", ir_var_uniform); >>> + uni->constant_initializer = con; >>> + uni->constant_value = con; >>> + uni->data.has_initializer = true; >>> + uni->data.read_only = true; >>> + /* Assume the whole thing is accessed. */ >>> + uni->data.max_array_access = uni->type->length - 1; >>> + instructions->push_head(uni); >>> + >>> + *rvalue = new(mem_ctx) ir_dereference_variable(uni); >>> + >>> + progress = true; >>> +} >>> + >>> +} /* anonymous namespace */ >>> + >>> +bool >>> +lower_const_arrays_to_uniforms(exec_list *instructions) >>> +{ >>> + lower_const_array_visitor v(instructions); >>> + return v.run(); >>> +} >>> -- >>> 2.1.2 >>> >>> _______________________________________________ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev