Iago Toral Quiroga <ito...@igalia.com> writes:

> This allows us to re-use the results of previous ssbo loads in situations
> that are safe (i.e. when there are no stores, atomic operations or
> memory barriers in between).
>
> This is particularly useful for things like matrix multiplications, where
> for a mat4 buffer variable we cut the number of loads from 16 (4 reads of
> each column) down to 4 (1 read of each column).
>
> The pass can only cache ssbo loads that involve constant blocks and
> offsets, but it could be extended to compare sub-expressions for these
> as well, similar to a CSE pass.
>
> The way the cache works is simple: ssbo loads with constant block/offset
> are added to a cache as they are seen. Stores invalidate cache entries:
> stores with a non-constant offset invalidate all cached loads for the block
> and stores with a non-constant block invalidate all cache entries. There is
> room to improve this by using the actual variable name we are accessing to
> limit the entries that should be invalidated. We also need to invalidate
> cache entries when we exit the block in which they have been defined
> (i.e. inside if/else blocks or loops).
>
> The cache optimization is built as a separate pass, instead of being merged
> into the lower_ubo_reference pass, for a number of reasons:
>
> 1) The way we process assignments in visitors is that the LHS is
> processed before the RHS. This creates a problem for an optimization
> such as this one when we do things like a = a + 1, since we would see the
> store before the read, while the actual execution order is the reverse.
> This could be fixed by re-implementing the logic in the visit_enter
> method for ir_assignment in lower_ubo_reference and then returning
> visit_continue_with_parent.
>
> 2) Some writes/reads need to be split into multiple smaller
> writes/reads, and we need to handle caching for each one. This happens
> deep inside the code that handles the lowering, and some of the
> information we need to do this is not available there. This could also
> be fixed by passing more data into the corresponding functions or by
> making this data available as class members, but the current
> implementation is already complex enough and this would only add to the
> complexity.
>
> 3) We can have ssbo loads in the LHS too (i.e. a[a[0]] = ..). In these
> cases the current code in lower_ubo_reference would see the store before
> the read. Probably fixable, but again it would add more complexity to
> the lowering.
>
> On the other hand, a separate pass that runs after the lowering sees
> all the individual loads and stores in the correct order (so we don't need
> to do any tricks), and it allows us to separate the lowering logic (which
> is already complex) from the caching logic. It also gives us a chance to
> run it after other optimization passes have turned constant expressions
> for block/offset into actual constants, enabling more opportunities for
> caching.
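For reference, here is a minimal GLSL sketch of the mat4 case described
above (the shader stage, block layout and names are mine, made up for
illustration):

   #version 430
   layout(local_size_x = 1) in;

   layout(std430, binding = 0) buffer Data {
      mat4 m;
      vec4 v;
      vec4 result;
   };

   void main()
   {
      /* Per the numbers in the commit message, after lower_ubo_reference
       * each of the four columns of m ends up being read four times here
       * (16 SSBO loads in total).  With the cache, only the first load of
       * each column remains and the other uses copy its result (4 loads).
       */
      result = m * v;
   }
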
Seems like a restricted form of CSE that only handles SSBO loads, and only
the ones with constant arguments.  Why don't we CSE these? (and other
memory access operations like image loads)

> ---
>  src/glsl/Makefile.sources  |   1 +
>  src/glsl/ir_optimization.h |   1 +
>  src/glsl/opt_ssbo_load.cpp | 338 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 340 insertions(+)
>  create mode 100644 src/glsl/opt_ssbo_load.cpp
>
> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index ca87036..73c7514 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -201,6 +201,7 @@ LIBGLSL_FILES = \
>  	opt_noop_swizzle.cpp \
>  	opt_rebalance_tree.cpp \
>  	opt_redundant_jumps.cpp \
> +	opt_ssbo_load.cpp \
>  	opt_structure_splitting.cpp \
>  	opt_swizzle_swizzle.cpp \
>  	opt_tree_grafting.cpp \
> diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
> index ce5c492..26677d7 100644
> --- a/src/glsl/ir_optimization.h
> +++ b/src/glsl/ir_optimization.h
> @@ -125,6 +125,7 @@ bool lower_clip_distance(gl_shader *shader);
>  void lower_output_reads(unsigned stage, exec_list *instructions);
>  bool lower_packing_builtins(exec_list *instructions, int op_mask);
>  void lower_ubo_reference(struct gl_shader *shader, exec_list *instructions);
> +bool opt_ssbo_loads(struct gl_shader *shader, exec_list *instructions);
>  void lower_packed_varyings(void *mem_ctx,
>                             unsigned locations_used, ir_variable_mode mode,
>                             unsigned gs_input_vertices, gl_shader *shader);
> diff --git a/src/glsl/opt_ssbo_load.cpp b/src/glsl/opt_ssbo_load.cpp
> new file mode 100644
> index 0000000..5404907
> --- /dev/null
> +++ b/src/glsl/opt_ssbo_load.cpp
> @@ -0,0 +1,338 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +/**
> + * \file opt_ssbo_load.cpp
> + *
> + * IR optimization pass to reduce the number of SSBO loads by re-using previous
> + * loads when it is safe to do so (i.e. no stores have invalidated the load,
> + * no memory barriers in between, etc)
> + */
> +
> +#include "ir.h"
> +#include "ir_builder.h"
> +#include "program/hash_table.h"
> +
> +using namespace ir_builder;
> +
> +namespace {
> +struct ssbo_load_cache_remove_closure {
> +   struct hash_table *ht;
> +   const char *key_prefix;
> +   unsigned depth;
> +};
> +
> +struct ssbo_load_cache_entry {
> +   ir_dereference *load;
> +   unsigned depth;
> +};
> +
> +class ssbo_load_cache_visitor : public ir_hierarchical_visitor {
> +public:
> +   ssbo_load_cache_visitor(gl_shader *shader)
> +      : current_depth(0), progress(false)
> +   {
> +      mem_ctx = ralloc_parent(shader->ir);
> +      ssbo_load_cache =
> +         hash_table_ctor(0, hash_table_string_hash, hash_table_string_compare);
> +   }
> +
> +   ~ssbo_load_cache_visitor()
> +   {
> +      hash_table_dtor(ssbo_load_cache);
> +   }
> +
> +   virtual ir_visitor_status visit_enter(ir_call *);
> +
> +   char *build_ssbo_load_cache_key(unsigned block_index,
> +                                   unsigned offset);
> +   ir_dereference *ssbo_load_cache_find(ir_rvalue *block_index,
> +                                        ir_rvalue *offset);
> +   void ssbo_load_cache_add(ir_dereference *deref,
> +                            ir_rvalue *block_index,
> +                            ir_rvalue *offset,
> +                            unsigned depth);
> +   void ssbo_load_cache_remove(ir_rvalue *block_index,
> +                               ir_rvalue *offset);
> +   void ssbo_load_cache_remove_by_key_prefix(const char *prefix);
> +   void ssbo_load_cache_remove_all();
> +   void ssbo_load_cache_remove_depth(unsigned depth);
> +
> +   ir_visitor_status visit_enter(ir_if *ir);
> +   ir_visitor_status visit_enter(ir_loop *ir);
> +   ir_visitor_status visit_leave(ir_loop *ir);
> +
> +   bool get_progress() { return this->progress; }
> +
> +private:
> +   void *mem_ctx;
> +   unsigned current_depth;
> +   struct hash_table *ssbo_load_cache;
> +   bool progress;
> +};
> +
> +ir_visitor_status
> +ssbo_load_cache_visitor::visit_enter(ir_call *ir)
> +{
> +   if (!ir->callee->is_intrinsic)
> +      return visit_continue_with_parent;
> +
> +   if (!strcmp(ir->callee_name(), "__intrinsic_load_ssbo")) {
> +      exec_node *param = ir->actual_parameters.get_head();
> +      ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
> +
> +      param = param->get_next();
> +      ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
> +
> +      ir_dereference *cached_load = ssbo_load_cache_find(block, offset);
> +      if (cached_load) {
> +         ir_variable *var = ir->return_deref->variable_referenced();
> +         base_ir->insert_before(assign(var, cached_load->clone(mem_ctx, NULL)));
> +         ir->remove();
> +         this->progress = true;
> +      } else {
> +         ssbo_load_cache_add(ir->return_deref, block, offset,
> +                             this->current_depth);
> +      }
> +   } else if (!strcmp(ir->callee_name(), "__intrinsic_store_ssbo")) {
> +      exec_node *param = ir->actual_parameters.get_head();
> +      ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
> +
> +      param = param->get_next();
> +      ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
> +
> +      ssbo_load_cache_remove(block, offset);
> +   } else if (strstr(ir->callee_name(), "__intrinsic_ssbo_atomic") ==
> +              ir->callee_name()) {
> +      exec_node *param = ir->actual_parameters.get_head();
> +      ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
> +
> +      param = param->get_next();
> +      ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
> +
> +      ssbo_load_cache_remove(block, offset);
> +   } else if (!strcmp(ir->callee_name(), "__intrinsic_memory_barrier")) {
> +      ssbo_load_cache_remove_all();
> +   }
> +
> +   return visit_continue_with_parent;
> +}
> +
> +ir_visitor_status
> +ssbo_load_cache_visitor::visit_enter(ir_if *ir)
> +{
> +   ir->condition->accept(this);
> +
> +   this->current_depth++;
> +
> +   if (!ir->then_instructions.is_empty()) {
> +      visit_list_elements(this, &ir->then_instructions);
> +      ssbo_load_cache_remove_depth(this->current_depth);
> +   }
> +
> +   if (!ir->else_instructions.is_empty()) {
> +      visit_list_elements(this, &ir->else_instructions);
> +      ssbo_load_cache_remove_depth(this->current_depth);
> +   }
> +
> +   this->current_depth--;
> +
> +   return visit_continue_with_parent;
> +}
> +
> +ir_visitor_status
> +ssbo_load_cache_visitor::visit_enter(ir_loop *ir)
> +{
> +   this->current_depth++;
> +   return visit_continue;
> +}
> +
> +ir_visitor_status
> +ssbo_load_cache_visitor::visit_leave(ir_loop *ir)
> +{
> +   ssbo_load_cache_remove_depth(this->current_depth);
> +   this->current_depth--;
> +   return visit_continue;
> +}
> +
> +char *
> +ssbo_load_cache_visitor::build_ssbo_load_cache_key(unsigned block_index,
> +                                                   unsigned offset)
> +{
> +   return ralloc_asprintf(mem_ctx, "%u-%u", block_index, offset);
> +}
> +
> +ir_dereference *
> +ssbo_load_cache_visitor::ssbo_load_cache_find(ir_rvalue *block_index,
> +                                              ir_rvalue *offset)
> +{
> +   ir_constant *const_block_index = block_index->as_constant();
> +   if (!const_block_index)
> +      return NULL;
> +
> +   ir_constant *const_offset = offset->as_constant();
> +   if (!const_offset)
> +      return NULL;
> +
> +   char *cache_key =
> +      build_ssbo_load_cache_key(const_block_index->value.u[0],
> +                                const_offset->value.u[0]);
> +
> +   struct ssbo_load_cache_entry *entry = (struct ssbo_load_cache_entry *)
> +      hash_table_find(ssbo_load_cache, cache_key);
> +   return entry ? entry->load : NULL;
> +}
> +
> +void
> +ssbo_load_cache_visitor::ssbo_load_cache_add(ir_dereference *deref,
> +                                             ir_rvalue *block_index,
> +                                             ir_rvalue *offset,
> +                                             unsigned depth)
> +{
> +   /* We only support caching SSBO loads with constant block and offset */
> +   ir_constant *const_block_index = block_index->as_constant();
> +   if (!const_block_index)
> +      return;
> +
> +   ir_constant *const_offset = offset->as_constant();
> +   if (!const_offset)
> +      return;
> +
> +   char *cache_key =
> +      build_ssbo_load_cache_key(const_block_index->value.u[0],
> +                                const_offset->value.u[0]);
> +
> +   struct ssbo_load_cache_entry *entry = (struct ssbo_load_cache_entry *)
> +      ralloc_size(this->mem_ctx, sizeof(struct ssbo_load_cache_entry));
> +   entry->load = deref;
> +   entry->depth = depth;
> +
> +   hash_table_replace(ssbo_load_cache, entry, cache_key);
> +}
> +
> +static void
> +ssbo_load_cache_remove_callback(const void *key, void *data, void *closure)
> +{
> +   struct ssbo_load_cache_remove_closure *c =
> +      (struct ssbo_load_cache_remove_closure *) closure;
> +
> +   /* If we have a key_prefix, then we only want to delete entries
> +    * with that key prefix. Otherwise, if we have depth > 0 we want
> +    * to delete keys with that depth (or larger). Otherwise (no prefix
> +    * and depth == 0) we want to delete all keys.
> +    */
> +   assert((c->key_prefix && c->depth == 0) ||
> +          (!c->key_prefix && c->depth > 0) ||
> +          (!c->key_prefix && c->depth == 0));
> +
> +   if (c->key_prefix && strstr((char *)key, (char *)c->key_prefix) != key)
> +      return;
> +
> +   if (c->depth > 0) {
> +      struct ssbo_load_cache_entry *entry =
> +         (struct ssbo_load_cache_entry *) data;
> +      if (entry->depth < c->depth)
> +         return;
> +   }
> +
> +   hash_table_remove(c->ht, key);
> +}
> +
> +void
> +ssbo_load_cache_visitor::ssbo_load_cache_remove_by_key_prefix(const char *prefix)
> +{
> +   struct ssbo_load_cache_remove_closure c;
> +   c.ht = ssbo_load_cache;
> +   c.key_prefix = prefix;
> +   c.depth = 0;
> +
> +   hash_table_call_foreach(ssbo_load_cache,
> +                           ssbo_load_cache_remove_callback, &c);
> +}
> +
> +void
> +ssbo_load_cache_visitor::ssbo_load_cache_remove_all()
> +{
> +   struct ssbo_load_cache_remove_closure c;
> +   c.ht = ssbo_load_cache;
> +   c.key_prefix = NULL;
> +   c.depth = 0;
> +
> +   hash_table_call_foreach(ssbo_load_cache,
> +                           ssbo_load_cache_remove_callback, &c);
> +}
> +
> +void
> +ssbo_load_cache_visitor::ssbo_load_cache_remove_depth(unsigned depth)
> +{
> +   struct ssbo_load_cache_remove_closure c;
> +   c.ht = ssbo_load_cache;
> +   c.key_prefix = NULL;
> +   c.depth = depth;
> +
> +   hash_table_call_foreach(ssbo_load_cache,
> +                           ssbo_load_cache_remove_callback, &c);
> +}
> +
> +void
> +ssbo_load_cache_visitor::ssbo_load_cache_remove(ir_rvalue *block_index,
> +                                                ir_rvalue *offset)
> +{
> +   ir_constant *const_block_index = block_index->as_constant();
> +   if (!const_block_index) {
> +      /* If we don't know the block index, then invalidate the entire cache.
> +       * We could try to do better, for example, considering the actual
> +       * field name we are accessing in the SSBO in the keys so we only
> +       * invalidate those. This requires some work though.
> +       */
> +      ssbo_load_cache_remove_all();
> +      return;
> +   }
> +
> +   ir_constant *const_offset = offset->as_constant();
> +   if (!const_offset) {
> +      /* We know the block but not the offset, so invalidate all entries
> +       * for the given block
> +       */
> +      ssbo_load_cache_remove_by_key_prefix(
> +         ralloc_asprintf(mem_ctx, "%u-", const_block_index->value.u[0]));
> +      return;
> +   }
> +
> +   /* We know block and offset, so invalidate that particular load only */
> +   char *cache_key =
> +      build_ssbo_load_cache_key(const_block_index->value.u[0],
> +                                const_offset->value.u[0]);
> +
> +   hash_table_remove(ssbo_load_cache, cache_key);
> +}
> +
> +} /* Unnamed namespace */
> +
> +bool
> +opt_ssbo_loads(struct gl_shader *shader, exec_list *instructions)
> +{
> +   ssbo_load_cache_visitor v(shader);
> +   visit_list_elements(&v, instructions);
> +   return v.get_progress();
> +}
> --
> 1.9.1
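To make the invalidation rules above concrete, here is a hypothetical
shader (the names and the use of memoryBarrier() are mine, not taken from
the patch):

   #version 430
   layout(local_size_x = 1) in;

   layout(std430, binding = 0) buffer Buf {
      float a[16];
   };

   uniform int i;

   void main()
   {
      float x = a[0];  /* constant block/offset: the load is cached       */
      float y = a[0];  /* cache hit: re-uses the result of the load above */
      a[1] = x + y;    /* constant offset: invalidates only that entry    */
      a[i] = y;        /* non-constant offset: invalidates every entry
                        * for this block                                  */
      memoryBarrier(); /* invalidates the whole cache                     */
      x = a[0];        /* has to be loaded again                          */
   }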
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev