On Tue, Jun 27, 2017 at 6:50 PM, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
> On 27.06.2017 17:07, Marek Olšák wrote:
>>
>> On Tue, Jun 27, 2017 at 9:22 AM, Nicolai Hähnle <nhaeh...@gmail.com>
>> wrote:
>>>
>>> On 27.06.2017 02:14, Marek Olšák wrote:
>>>>
>>>> From: Marek Olšák <marek.ol...@amd.com>
>>>>
>>>> Shader key size: 107 -> 47
>>>
>>> Nice improvement.
>>>
>>>> Divisors of 0 and 1 are encoded in the shader key. Greater instance
>>>> divisors are loaded from a constant buffer.
>>>>
>>>> The shader code doing the division is huge. Is it something we need to
>>>> worry about? Does any app use instance divisors >= 2?
>>>
>>> This reminds me of a certain LLVM improvement that I still need to clear.
>>>
>>> I doubt instance divisors >= 2 are used. As a data point, Vulkan doesn't
>>> support it as a feature at all, IIRC.
>>>
>>> Can we get an optimized monolithic shader variant built for shaders that
>>> have to fetch? This should help if anybody ever triggers this, because
>>
>> We can't get optimized variants if we want to keep the shader key
>> small. If I put all instance divisors into key.opt, it would defeat
>> the effect of this patch.
>
> What I meant is an optimized variant that still loads the instance divisors
> from the constant buffer, but compiles the prolog and main parts together.
> That way, LLVM can potentially schedule some of the division code before
> waiting for the loads of other attributes that are per-vertex. That should
> only require a single bit in the key.opt part.

Like this?

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 63cc746..af3f2a9 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1192,6 +1192,11 @@ static void si_shader_selector_key_vs(struct si_context *sctx,
 	prolog_key->instance_divisor_is_fetched =
 		sctx->vertex_elements->instance_divisor_is_fetched;
 
+	/* Prefer a monolithic shader to allow scheduling divisions around
+	 * VBO loads. */
+	if (prolog_key->instance_divisor_is_fetched)
+		key->opt.prefer_mono = 1;
+
 	unsigned count = MIN2(vs->info.num_inputs,
 			      sctx->vertex_elements->count);
 	memcpy(key->mono.vs_fix_fetch, sctx->vertex_elements->fix_fetch, count);

Marek
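
For readers following the thread, here is a rough standalone sketch of the scheme being discussed: divisors of 0 and 1 are folded into per-attribute bits, divisors >= 2 are only flagged and read from a buffer at fetch time, and any fetched divisor sets a single "prefer monolithic" hint. This is not the radeonsi code. The names divisor_key, build_divisor_key, attrib_index, prefer_mono and MAX_ATTRIBS are made up for illustration, a plain array stands in for the constant buffer, and base instance handling is left out.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ATTRIBS 16 /* hypothetical limit for this sketch */

/* Hypothetical compact key: one bit per attribute instead of a full divisor. */
struct divisor_key {
	uint16_t is_one;     /* bit i set: attribute i has divisor == 1 */
	uint16_t is_fetched; /* bit i set: divisor >= 2, read from a buffer */
};

/* Classify each divisor; only the two bitmasks end up in the key. */
static struct divisor_key
build_divisor_key(const uint32_t *divisors, unsigned count)
{
	struct divisor_key key = {0, 0};

	assert(count <= MAX_ATTRIBS);
	for (unsigned i = 0; i < count; i++) {
		if (divisors[i] == 1)
			key.is_one |= (uint16_t)(1u << i);
		else if (divisors[i] >= 2)
			key.is_fetched |= (uint16_t)(1u << i);
		/* divisor == 0: per-vertex attribute, no bit set */
	}
	return key;
}

/* Index used to fetch attribute i. const_buf stands in for the constant
 * buffer that holds the divisors >= 2. */
static uint32_t
attrib_index(struct divisor_key key, unsigned i, const uint32_t *const_buf,
	     uint32_t vertex_id, uint32_t instance_id)
{
	if (key.is_one & (1u << i))
		return instance_id;
	if (key.is_fetched & (1u << i))
		return instance_id / const_buf[i]; /* the expensive division */
	return vertex_id;
}

/* Mirrors the idea in the diff: any fetched divisor asks for a monolithic
 * build, which costs one bit rather than one divisor per attribute. */
static int
prefer_mono(struct divisor_key key)
{
	return key.is_fetched != 0;
}
```

With divisors {0, 1, 3, 0}, only attribute 2 is marked as fetched, so the key stays two small bitmasks regardless of how large the divisor is, and prefer_mono() is the only extra state needed to request the combined prolog + main compile.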