On Mon, Aug 28, 2017 at 12:50 PM, Nicolai Hähnle <nhaeh...@gmail.com> wrote: > On 25.08.2017 02:57, Marek Olšák wrote: >> >> From: Marek Olšák <marek.ol...@amd.com> >> >> This is still very simple, but it's better than before. >> >> Loosely ported from Vulkan. >> --- >> src/gallium/drivers/radeonsi/si_state.c | 38 >> ++++++++++++++++++++++----------- >> 1 file changed, 26 insertions(+), 12 deletions(-) >> >> diff --git a/src/gallium/drivers/radeonsi/si_state.c >> b/src/gallium/drivers/radeonsi/si_state.c >> index 24e509c..393f960 100644 >> --- a/src/gallium/drivers/radeonsi/si_state.c >> +++ b/src/gallium/drivers/radeonsi/si_state.c >> @@ -4649,40 +4649,54 @@ static void si_init_config(struct si_context >> *sctx) >> /* If this is 0, Bonaire can hang even if GS isn't >> being used. >> * Other chips are unaffected. These are >> suboptimal values, >> * but we don't use on-chip GS. >> */ >> si_pm4_set_reg(pm4, R_028A44_VGT_GS_ONCHIP_CNTL, >> S_028A44_ES_VERTS_PER_SUBGRP(64) | >> S_028A44_GS_PRIMS_PER_SUBGRP(4)); >> } >> si_pm4_set_reg(pm4, R_00B21C_SPI_SHADER_PGM_RSRC3_GS, >> S_00B21C_CU_EN(0xffff)); >> - if (sscreen->b.info.num_good_compute_units / >> - (sscreen->b.info.max_se * >> sscreen->b.info.max_sh_per_se) <= 4) { >> + /* Compute LATE_ALLOC_VS.LIMIT. */ >> + unsigned num_cu_per_sh = >> sscreen->b.info.num_good_compute_units / >> + (sscreen->b.info.max_se * >> + sscreen->b.info.max_sh_per_se); >> + unsigned late_alloc_limit; /* The limit is per SH. */ >> + >> + if (sctx->b.family == CHIP_KABINI) { >> + late_alloc_limit = 0; /* Potential hang on Kabini. >> */ >> + } else if (num_cu_per_sh <= 4) { >> /* Too few available compute units per SH. >> Disallowing >> - * VS to run on CU0 could hurt us more than late >> VS >> + * VS to run on one CU could hurt us more than >> late VS >> * allocation would help. >> * >> - * LATE_ALLOC_VS = 2 is the highest safe number. >> + * 2 is the highest safe number that allows us to >> keep >> + * all CUs enabled. > > > Just to clarify: is the logic here that LATE_ALLOC_VS = 2 means up to 3 late > alloc waves, which means at least one SIMD is always free, even when there's > only a single CU?
Yes according to the Vulkan driver. > > >> */ >> - si_pm4_set_reg(pm4, >> R_00B118_SPI_SHADER_PGM_RSRC3_VS, S_00B118_CU_EN(0xffff)); >> - si_pm4_set_reg(pm4, >> R_00B11C_SPI_SHADER_LATE_ALLOC_VS, S_00B11C_LIMIT(2)); >> + late_alloc_limit = 2; >> } else { >> - /* Set LATE_ALLOC_VS == 31. It should be less than >> - * the number of scratch waves. Limitations: >> - * - VS can't execute on CU0. >> - * - If HS writes outputs to LDS, LS can't execute >> on CU0. >> + /* This is a good initial value. >> + * We shouldn't run into a VS-PS deadlock, because >> it >> + * only allows 1 late_alloc wave per SIMD on >> num_cu - 2. > > > Disabling VS on CU0 should already guarantee that there's no VS-PS deadlock. > So maybe with this change, we can allow VS to run on all CUs? At least the > comment should reflect that. > > On the other hand, Vulkan actually looks at the "always-on CUs", which I > guess is related to power-gating turning off CUs. So as long as we don't > look at that -- and I kind of think we shouldn't, because we care about the > high performance case where all CUs are turned on anyway -- we do need to > keep CU0 disabled for VS to avoid deadlocks when CUs happen to be > power-gated. Well, I can change the comment, but I don't see anything else here that should be changed. > > BTW, I recently realized that "late alloc" simply means that the VS waves > get their export allocation as soon as possible, i.e. as soon as there's > free space in the PC / POS export. So even late alloc waves might end up > never being stalled on exports, depending on the overall latency of the VS > relative to PS. Not important for the driver, but I found it interesting to > know. OK makes sense. Marek _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev