Kenneth Graunke <kenn...@whitecape.org> writes: > On Monday, August 22, 2016 5:50:49 PM PDT Francisco Jerez wrote: >> Kenneth Graunke <kenn...@whitecape.org> writes: >> >> > Many GPUs cannot handle GL_KHR_blend_equation_advanced natively, and >> > need to emulate it in the pixel shader. This lowering pass implements >> > all the necessary math for advanced blending. It fetches the existing >> > framebuffer value using the MESA_shader_framebuffer_fetch built-in >> > variables, and the previous commit's state var uniform to select >> > which equation to use. >> > >> > This is done at the GLSL IR level to make it easy for all drivers to >> > implement the GL_KHR_blend_equation_advanced extension and share code. >> > >> > Drivers need to hook up MESA_shader_framebuffer_fetch functionality: >> > 1. Hook up the fb_fetch_output variable >> > 2. Implement BlendBarrier() >> > >> > Then to get KHR_blend_equation_advanced, they simply need to: >> > 3. Disable hardware blending based on ctx->Color._AdvancedBlendEnabled >> > 4. Call this lowering pass. >> > >> > Very little driver specific code should be required. >> > >> > v2: Handle multiple output variables per render target (which may exist >> > due to ARB_enhanced_layouts), and array variables (even with one >> > render target, we might have out vec4 color[1]), and non-vec4 >> > variables (it's easier than finding spec text to justify not >> > handling it). Thanks to Francisco Jerez for the feedback. >> > >> > Signed-off-by: Kenneth Graunke <kenn...@whitecape.org> >> > --- >> > src/compiler/Makefile.sources | 1 + >> > src/compiler/glsl/ir_optimization.h | 1 + >> > .../glsl/lower_blend_equation_advanced.cpp | 557 >> > +++++++++++++++++++++ >> > 3 files changed, 559 insertions(+) >> > create mode 100644 src/compiler/glsl/lower_blend_equation_advanced.cpp >> > >> > diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources >> > index 0ff9b23..fe26132 100644 >> > --- a/src/compiler/Makefile.sources >> > +++ b/src/compiler/Makefile.sources >> > @@ -77,6 +77,7 @@ LIBGLSL_FILES = \ >> > glsl/loop_analysis.h \ >> > glsl/loop_controls.cpp \ >> > glsl/loop_unroll.cpp \ >> > + glsl/lower_blend_equation_advanced.cpp \ >> > glsl/lower_buffer_access.cpp \ >> > glsl/lower_buffer_access.h \ >> > glsl/lower_const_arrays_to_uniforms.cpp \ >> > diff --git a/src/compiler/glsl/ir_optimization.h >> > b/src/compiler/glsl/ir_optimization.h >> > index c29260a..3bd6928 100644 >> > --- a/src/compiler/glsl/ir_optimization.h >> > +++ b/src/compiler/glsl/ir_optimization.h >> > @@ -151,6 +151,7 @@ void optimize_dead_builtin_variables(exec_list >> > *instructions, >> > bool lower_tess_level(gl_linked_shader *shader); >> > >> > bool lower_vertex_id(gl_linked_shader *shader); >> > +bool lower_blend_equation_advanced(gl_linked_shader *shader); >> > >> > bool lower_subroutine(exec_list *instructions, struct >> > _mesa_glsl_parse_state *state); >> > void propagate_invariance(exec_list *instructions); >> > diff --git a/src/compiler/glsl/lower_blend_equation_advanced.cpp >> > b/src/compiler/glsl/lower_blend_equation_advanced.cpp >> > new file mode 100644 >> > index 0000000..5632865 >> > --- /dev/null >> > +++ b/src/compiler/glsl/lower_blend_equation_advanced.cpp >> > @@ -0,0 +1,557 @@ >> > +/* >> > + * Copyright © 2016 Intel Corporation >> > + * >> > + * Permission is hereby granted, free of charge, to any person obtaining a >> > + * copy of this software and associated documentation files (the >> > "Software"), >> > + * to deal in the Software without restriction, including without >> > limitation >> > + * the rights to use, copy, modify, merge, publish, distribute, >> > sublicense, >> > + * and/or sell copies of the Software, and to permit persons to whom the >> > + * Software is furnished to do so, subject to the following conditions: >> > + * >> > + * The above copyright notice and this permission notice (including the >> > next >> > + * paragraph) shall be included in all copies or substantial portions of >> > the >> > + * Software. >> > + * >> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >> > EXPRESS OR >> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >> > MERCHANTABILITY, >> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT >> > SHALL >> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR >> > OTHER >> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING >> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER >> > + * DEALINGS IN THE SOFTWARE. >> > + */ >> > + >> > +#include "ir.h" >> > +#include "ir_builder.h" >> > +#include "ir_optimization.h" >> > +#include "ir_hierarchical_visitor.h" >> > +#include "program/prog_instruction.h" >> > +#include "program/prog_statevars.h" >> > +#include "util/bitscan.h" >> > + >> > +using namespace ir_builder; >> > + >> > +#define imm1(x) new(mem_ctx) ir_constant((float) x, 1) >> > +#define imm3(x) new(mem_ctx) ir_constant((float) x, 3) >> > + >> >> The x argument should probably be between parentheses to avoid surprises >> in the future -- Or even better, make them static inline functions >> instead of macros. > > Good call, I'll add parenthesis. I made them macros rather than static > inlines so you can do imm3(0.5) rather than imm3(mem_ctx, 0.5)...it's > shorter (which is the main point). > > [snip] > >> > + blend_comps[0], blend_comps[1], >> > + blend_comps[2], blend_comps[3]); >> > + } >> > + >> > + ir_function_signature *main = get_main(sh); >> > + ir_factory f(&main->body, mem_ctx); >> > + >> > + ir_variable *result_dest = >> > + calc_blend_result(f, mode, fb, blend_source, sh->info.BlendSupport); >> > + >> > + /* Copy the result back to the original values. It would be simpler >> > + * to demote the program's output variables, and create a new vec4 >> > + * output for our result, but this pass runs before we create the >> > + * ARB_program_interface_query resource list. So we have to leave >> > + * the original outputs in place and use them. >> > + */ >> > + for (int i = 0; i < 4; i++) { >> > + if (!outputs[i]) >> > + continue; >> > + >> > + f.emit(assign(deref_output(outputs[i]), swizzle(result_dest, i, 1), >> > + 1 << i)); >> > + } >> > + >> >> It looks like this relies on the main function having a single exit >> point, which AFAICT may not be the case if there are multiple return >> statements and the lowering pass is called before do_lower_jumps() (as >> is the case in your i965 code). Several options occur to me: >> >> - Rename the shader's main function and generate a new main function >> that calls into the original and implements the advanced blending >> epilogue. This has the advantage that it wouldn't force drivers to >> lower returns if they can support them natively, and is likely to >> result in better code generation than the next option. >> >> - Make the lowering pass re-emit the blending epilogue for each exit >> point of the main function. >> >> - Call do_lower_jumps(lower_main_return=true) manually at the top of >> lower_blend_equation_advanced() if support for any advanced blending >> modes was requested in order to make sure the function has the right >> form. This seems a bit of a hack but I guess it would be the easiest >> solution. > > Ugh. Good point. > > Option #2 seems unfortunate - I'd hate to emit this multiple times. > > In a world where all functions are inlined, options #1 and #3 seem > basically equivalent - function inlining lowers early returns. So > the renamed main would have early returns lowered anyway. > > I think I'll take option #3 and change the function to: > > bool > lower_blend_equation_advanced(struct gl_linked_shader *sh) > { > if (sh->info.BlendSupport == 0) > return false; > > /* Lower early returns in main() so there's a single exit point > * where we can insert our lowering code. > */ > do_lower_jumps(sh->ir, false, false, true, false, false);
I guess option #1 has some slight advantage over option #3 because it doesn't impose return lowering on the driver and it doesn't cause the jump lowering pass to be run multiple times, but #3 is so much simpler that it's hard to justify the additional effort for option #1, so this sounds like a good plan to me. With the comments above taken into account patch is: Reviewed-by: Francisco Jerez <curroje...@riseup.net>
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev