On Sat, Apr 2, 2016 at 4:49 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote: > > > On Sat, Apr 2, 2016 at 11:50 AM, Rob Clark <robdcl...@gmail.com> wrote: >> >> On Fri, Apr 1, 2016 at 1:46 PM, Jason Ekstrand <ja...@jlekstrand.net> >> wrote: >> > >> > >> > On Sat, Mar 26, 2016 at 2:02 PM, Rob Clark <robdcl...@gmail.com> wrote: >> >> >> >> From: Rob Clark <robcl...@freedesktop.org> >> >> >> >> A pass to lower complex (struct/array/mat) inputs/outputs to primitive >> >> types. This allows, for example, linking that removes unused >> >> components >> >> of a larger type which is not indirectly accessed. >> >> >> >> In the near term, it is needed for gallium (mesa/st) support for NIR, >> >> since only used components of a type are assigned VBO slots, and we >> >> otherwise have no way to represent that to the driver backend. But it >> >> should be useful for doing shader linking in NIR. >> >> >> >> Signed-off-by: Rob Clark <robcl...@freedesktop.org> >> >> --- >> >> src/compiler/Makefile.sources | 1 + >> >> src/compiler/nir/nir.h | 1 + >> >> src/compiler/nir/nir_lower_io_types.c | 178 >> >> ++++++++++++++++++++++++++++++++++ >> >> 3 files changed, 180 insertions(+) >> >> create mode 100644 src/compiler/nir/nir_lower_io_types.c >> >> >> >> diff --git a/src/compiler/Makefile.sources >> >> b/src/compiler/Makefile.sources >> >> index ae7efbf..8f1ffdc 100644 >> >> --- a/src/compiler/Makefile.sources >> >> +++ b/src/compiler/Makefile.sources >> >> @@ -196,6 +196,7 @@ NIR_FILES = \ >> >> nir/nir_lower_idiv.c \ >> >> nir/nir_lower_indirect_derefs.c \ >> >> nir/nir_lower_io.c \ >> >> + nir/nir_lower_io_types.c \ >> >> nir/nir_lower_outputs_to_temporaries.c \ >> >> nir/nir_lower_passthrough_edgeflags.c \ >> >> nir/nir_lower_phis_to_scalar.c \ >> >> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h >> >> index 65f75bd..7f7c459 100644 >> >> --- a/src/compiler/nir/nir.h >> >> +++ b/src/compiler/nir/nir.h >> >> @@ -2172,6 +2172,7 @@ void nir_lower_io(nir_shader *shader, >> >> nir_src *nir_get_io_offset_src(nir_intrinsic_instr *instr); >> >> nir_src *nir_get_io_vertex_index_src(nir_intrinsic_instr *instr); >> >> >> >> +void nir_lower_io_types(nir_shader *shader, int (*type_size)(const >> >> struct >> >> glsl_type *)); >> >> void nir_lower_vars_to_ssa(nir_shader *shader); >> >> >> >> bool nir_remove_dead_variables(nir_shader *shader); >> >> diff --git a/src/compiler/nir/nir_lower_io_types.c >> >> b/src/compiler/nir/nir_lower_io_types.c >> >> new file mode 100644 >> >> index 0000000..b7d9ebe >> >> --- /dev/null >> >> +++ b/src/compiler/nir/nir_lower_io_types.c >> >> @@ -0,0 +1,178 @@ >> >> +/* >> >> + * Copyright © 2016 Red Hat >> >> + * >> >> + * Permission is hereby granted, free of charge, to any person >> >> obtaining >> >> a >> >> + * copy of this software and associated documentation files (the >> >> "Software"), >> >> + * to deal in the Software without restriction, including without >> >> limitation >> >> + * the rights to use, copy, modify, merge, publish, distribute, >> >> sublicense, >> >> + * and/or sell copies of the Software, and to permit persons to whom >> >> the >> >> + * Software is furnished to do so, subject to the following >> >> conditions: >> >> + * >> >> + * The above copyright notice and this permission notice (including >> >> the >> >> next >> >> + * paragraph) shall be included in all copies or substantial portions >> >> of >> >> the >> >> + * Software. >> >> + * >> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >> >> EXPRESS OR >> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >> >> MERCHANTABILITY, >> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT >> >> SHALL >> >> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES >> >> OR >> >> OTHER >> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, >> >> ARISING FROM, >> >> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER >> >> DEALINGS >> >> IN THE >> >> + * SOFTWARE. >> >> + */ >> >> + >> >> +#include "nir.h" >> >> +#include "nir_builder.h" >> >> + >> >> +/* Lower complex (struct/array/mat) input and output vars to primitive >> >> types >> >> + * (vec4) for linking. All indirect input/output access should >> >> already >> >> be >> >> + * lowered (ie. nir_lower_io_to_temporaries). >> >> + */ >> >> + >> >> +struct lower_io_types_state { >> >> + nir_shader *shader; >> >> + struct exec_list new_ins; >> >> + struct exec_list new_outs; >> >> + int (*type_size)(const struct glsl_type *); >> >> +}; >> >> + >> >> +static nir_variable * >> >> +get_new_var(struct lower_io_types_state *state, nir_variable *var, >> >> unsigned off) >> >> +{ >> >> + struct exec_list *list; >> >> + >> >> + if (var->data.mode == nir_var_shader_in) { >> >> + list = &state->new_ins; >> >> + } else { >> >> + assert(var->data.mode == nir_var_shader_out); >> >> + list = &state->new_outs; >> >> + } >> >> + >> >> + nir_foreach_variable(nvar, list) { >> >> + if (nvar->data.location == (var->data.location + off)) >> >> + return nvar; >> >> + } >> > >> > >> > Doing a linear search here could get expensive. Why not just have an >> > array >> > that maps locations to variables? I think you have a maximum of 64 >> > possible >> > locations (if you disregard tess) so it's not that memory-intensive. >> >> hmm, list of intputs and outputs is pretty small/bounded.. do you >> *really* expect it to be a problem? > > > No, it should be fine. It just bothers me a bit :-P > >> >> >> >> >> + >> >> + /* doesn't already exist, so we need to create a new one: */ >> >> + /* TODO figure out if scalar vs vec, and if >> >> float/int/uint/(double?) >> >> + * do we need to fixup interpolation mode for int vs float >> >> components >> >> + * of a struct, etc.. >> >> + */ >> >> + const struct glsl_type *ntype = glsl_vec4_type(); >> >> + nir_variable *nvar = nir_variable_create(state->shader, >> >> var->data.mode, >> >> + ntype, NULL); >> > >> > >> > Do you want to create a new one or clone? Cloning seems better because >> > that >> > would ensure that interp qualifiers etc. get copied over. >> >> I started out that way.. but decided to avoid it for some reason.. I >> think just to avoid the duplicate ralloc_asprintf() for the name, but >> there might have been another reason.. >> >> > Also, I seem to recall it being possible for outputs to have constant >> > initializers coming out of GLSL IR. This pass doesn't handle that. >> >> hmm, I haven't come across any piglit that triggered this.. if you >> have some ideas for a .shader_test I could try.. > > > I don't know of anything off-hand that triggers it. I just seem to recall > coming across it at some point. I guess one could put in an assert > somewhere and do a piglit run.
I guess if there was something that was failing before in tgsi case, I might have missed it. But set of regressions compared to tgsi case is pretty small and well understood (and mostly related to variable-indexing issues in ir3 which would be solved if I lowered vars to regs) It could ofc be a gap in piglit coverage (or just some test that is skip'd on freedreno if it needs a higher gl level or some extension we don't have).. >> >> >> >> >> + >> >> + nvar->name = ralloc_asprintf(nvar, "%s@%u", var->name, off); >> >> + nvar->data = var->data; >> >> + nvar->data.location += off; >> >> + >> >> + /* nir_variable_create is too clever for it's own good: */ >> >> + exec_node_remove(&nvar->node); >> >> + exec_node_self_link(&nvar->node); /* no delinit() :-( */ >> > >> > >> > This isn't needed. Re-insertion just stomps whatever was there before. >> >> maybe.. tbh I'd kinda prefer if _remove() spent the extra couple >> instructions to _self_link() rather than leaving booby-traps. List >> stuff can fail in spectacular fashion if you get that sort of thing >> wrong and extra couple instructions is lost in the noise. (Plus, >> everything is already in the cache at this point, so you'll have a >> hard time proving to me that you could measure a perf diff..) > > > Sure... I've made that argument on other linked list implementations in the > past. Feel free to send the patch and see what the reaction is :-p heh, sent.. let's see what happens ;-) >> >> > Incidentally, if you had an array, you could just let >> > nir_variable_create be >> > clever. >> > >> >> >> >> + >> >> + exec_list_push_tail(list, &nvar->node); >> >> + >> >> + /* remove existing var from input/output list: */ >> >> + exec_node_remove(&var->node); >> >> + exec_node_self_link(&var->node); >> > >> > >> > This isn't needed either. >> > >> >> >> >> + >> >> + return nvar; >> >> +} >> >> + >> >> +static unsigned >> >> +get_deref_offset(struct lower_io_types_state *state, nir_deref *tail) >> >> +{ >> >> + unsigned offset = 0; >> >> + >> >> + while (tail->child != NULL) { >> >> + const struct glsl_type *parent_type = tail->type; >> >> + tail = tail->child; >> >> + >> >> + if (tail->deref_type == nir_deref_type_array) { >> >> + nir_deref_array *deref_array = nir_deref_as_array(tail); >> >> + >> >> + /* indirect inputs/outputs should already be lowered! */ >> >> + assert(deref_array->deref_array_type == >> >> nir_deref_array_type_direct); >> >> + >> >> + unsigned size = state->type_size(tail->type); >> >> + >> >> + offset += size * deref_array->base_offset; >> >> + } else if (tail->deref_type == nir_deref_type_struct) { >> >> + nir_deref_struct *deref_struct = nir_deref_as_struct(tail); >> >> + >> >> + for (unsigned i = 0; i < deref_struct->index; i++) { >> >> + offset += >> >> state->type_size(glsl_get_struct_field(parent_type, >> >> i)); >> >> + } >> >> + } >> >> + } >> >> + >> >> + return offset; >> >> +} >> >> + >> >> +static bool >> >> +lower_io_types_block(nir_block *block, void *void_state) >> >> +{ >> >> + struct lower_io_types_state *state = void_state; >> >> + >> >> + nir_foreach_instr(block, instr) { >> >> + if (instr->type != nir_instr_type_intrinsic) >> >> + continue; >> >> + >> >> + nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr); >> >> + >> >> + if ((intr->intrinsic != nir_intrinsic_load_var) && >> >> + (intr->intrinsic != nir_intrinsic_store_var)) >> >> + continue; >> > >> > >> > You should say somewhere in the top comment that copies need to also be >> > lowered prior to this pass. >> >> ok, I'll add a comment >> >> >> >> >> + >> >> + nir_variable *var = intr->variables[0]->var; >> >> + >> >> + if ((var->data.mode != nir_var_shader_in) && >> >> + (var->data.mode != nir_var_shader_out)) >> >> + continue; >> >> + >> >> + /* no need to split up if already primitive */ >> >> + if (state->type_size(var->type) == 1) >> >> + continue; >> >> + >> >> + unsigned off = get_deref_offset(state, >> >> &intr->variables[0]->deref); >> >> + nir_variable *nvar = get_new_var(state, var, off); >> >> + >> >> + /* and then re-write the load/store_var deref: */ >> >> + intr->variables[0] = nir_deref_var_create(intr, nvar); >> >> + } >> >> + >> >> + return true; >> >> +} >> >> + >> >> +static void >> >> +lower_io_types_impl(nir_function_impl *impl, struct >> >> lower_io_types_state >> >> *state) >> >> +{ >> >> + nir_foreach_block(impl, lower_io_types_block, state); >> >> + >> >> + nir_metadata_preserve(impl, nir_metadata_block_index | >> >> + nir_metadata_dominance); >> >> +} >> >> + >> >> + >> >> +void >> >> +nir_lower_io_types(nir_shader *shader, int (*type_size)(const struct >> >> glsl_type *)) >> > >> > >> > Just a side-note (not a request) but we really should have a >> > "glsl_type_size_func" typedef somewhere. >> >> not a bad idea.. although maybe I'll punt until gallium and i965 >> type_size fxns are unified (in which case we could drop this arg).. >> >> > Also, what happened to just using nir_type_size_vec4? Incidentally, I >> > think >> > you could also use the newly exposed component_slots function from my >> > series. >> >> After the lower_io patch, I didn't need the type_size fxn from >> freedreno/ir3. That and I punted on sorting out the small deltas >> between the gallium and i965 versions.. you seemed to thing some of >> those diffs might not matter, but I've had this series dragging on for >> long enough that I figured I'd avoid introducing dependencies on >> sorting that out. Not like there are a lot of call-sites for this >> pass, so I figured no real problem w/ dropping the type_size fxn ptr >> in a later patchset. > > > Sure. I don't want to block on that. I do think component_slots is > more-or-less the "correct" thing to do since it uses the exact same units as > var->data.location. yeah, I'll have a look at component_slots stuff.. requiring type_size_vec4 seems a bit awkward in the current patch.. BR, -R > I'm not too worried about the other comments. > > Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net> > >> >> BR, >> -R >> >> >> >> >> +{ >> >> + struct lower_io_types_state state; >> >> + >> >> + /* NOTE: only operates on units of vec4 slots: */ >> >> + assert(type_size(glsl_vec4_type()) == 1); >> >> + >> >> + state.shader = shader; >> >> + exec_list_make_empty(&state.new_ins); >> >> + exec_list_make_empty(&state.new_outs); >> >> + state.type_size = type_size; >> >> + >> >> + nir_foreach_function(shader, function) { >> >> + if (function->impl) >> >> + lower_io_types_impl(function->impl, &state); >> >> + } >> >> + >> >> + /* move new in/out vars to shader's lists: */ >> >> + exec_list_append(&shader->inputs, &state.new_ins); >> >> + exec_list_append(&shader->outputs, &state.new_outs); >> >> +} >> >> -- >> >> 2.5.5 >> >> >> >> _______________________________________________ >> >> mesa-dev mailing list >> >> mesa-dev@lists.freedesktop.org >> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >> > >> > > > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev