Also, let's assume that shaders are small and that keeping a lot of compiled shader variants around isn't bad.
In this case probably the best solution is to compile early and try to make the shaders as state-invariant as possible. E.g. don't do optimizations like getting rid of the extra exports for the case where we don't need the alpha test, and if a variant only depends on a boolean, have both paths covered by the bytecode and use a bit constant to choose between the two, etc.
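As a minimal sketch of the bit-constant idea (all names here are hypothetical, not actual r600 code): the driver reserves one constant slot for packed state bits, and the single compiled shader branches on it, so toggling e.g. alpha test only re-uploads a constant instead of switching or recompiling shaders:

  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical state bits covered by one state-invariant shader. */
  #define STATE_BIT_ALPHA_TEST (1u << 0)
  #define STATE_BIT_FOG        (1u << 1)

  /* Fold boolean state into a reserved constant slot. The compiled
   * bytecode contains both paths and selects with something like:
   *   if (state_bits & STATE_BIT_ALPHA_TEST) { ...alpha test... }   */
  static uint32_t pack_state_bits(bool alpha_test, bool fog)
  {
      uint32_t bits = 0;
      if (alpha_test)
          bits |= STATE_BIT_ALPHA_TEST;
      if (fog)
          bits |= STATE_BIT_FOG;
      return bits; /* uploaded like any other constant; no recompile */
  }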
As a second step, the driver should create an optimized version of the shader in a background thread once we know all the state that is/was active when the shader is used.
Of course you need a bit of heuristics for this, because sometimes it is better to switch between shader variants, and other times it is better to have one variant covering all the different states and just use bit constants to choose between them.
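A rough sketch of that second step, assuming a pthread worker and placeholder driver types (compile_optimized() stands in for a real backend entry point; none of this is actual Mesa code):

  #include <pthread.h>
  #include <stdlib.h>

  /* Placeholder types; a real driver would use its own structs. */
  struct draw_state { unsigned nr_cbufs; int alpha_test; };
  struct shader_binary;
  struct shader {
      struct shader_binary *generic;            /* state-invariant version */
      struct shader_binary *volatile optimized; /* set by the worker */
  };

  /* Placeholder: stands in for a real backend compile entry point. */
  static struct shader_binary *compile_optimized(struct shader *sh,
                                                 const struct draw_state *st)
  {
      (void)sh; (void)st;
      return NULL; /* a real implementation returns optimized bytecode */
  }

  struct optimize_job { struct shader *sh; struct draw_state state; };

  static void *optimize_worker(void *arg)
  {
      struct optimize_job *job = arg;
      /* Rendering continues with the generic variant; the draw path
       * switches to sh->optimized once it becomes non-NULL. */
      job->sh->optimized = compile_optimized(job->sh, &job->state);
      free(job);
      return NULL;
  }

  static void queue_optimized_compile(struct shader *sh,
                                      const struct draw_state *st)
  {
      struct optimize_job *job = malloc(sizeof(*job));
      pthread_t tid;
      if (!job)
          return;
      job->sh = sh;
      job->state = *st; /* snapshot of the state at first use */
      pthread_create(&tid, NULL, optimize_worker, job);
      pthread_detach(tid);
  }

The heuristic above would then decide, per shader, whether a state change should feed this queue or just flip a bit constant.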
Just some thoughts on this topic,
Christian

PS: My mail server is once more driving me nuts; please ignore the extra copy if you get this mail twice.
On 28.08.2013 02:07, Vadim Girlin wrote:
On 08/28/2013 02:59 AM, Marek Olšák wrote:

First, you won't really see any significant continual difference in frame rate no matter how many shader variants you have, unless you are very CPU-bound. The problem is shader compilation on first use; that's where you get a big hiccup. Try Skyrim, for example: you first have to look around and see every object that's around you, getting unpleasant stuttering, before you can actually go on and play the game. Yes, it's also Wine's fault that it compiles shaders on first use, but we don't have to be as bad as Wine, do we? Valve also reported shader recompilation on first use being a serious issue with open source drivers.

I perfectly understand that deferred compilation is exactly the problem that makes games freeze due to shader compilation on first use when something new appears on the screen, but I don't think we can solve this problem in the *driver* by trying to compile early, because AFAICS the shaders are currently passed to the driver too late anyway, and this happens not only with Wine. E.g. when I run Heaven in a window with "MESA_GLSL=dump R600_DEBUG=ps,vs", so that I can see Heaven's window and console output at the same time, I see that most of the GL dumps happen while Heaven shows the splash screen with the loading progress, but most of the driver's dumps appear on the first frame and a few more times during the benchmark. It looks like compilation is deferred somewhere in the stack before the driver, or am I missing something?

Vadim

Marek

On Tue, Aug 27, 2013 at 11:52 PM, Vadim Girlin <vadimgir...@gmail.com> wrote:

On 08/28/2013 12:43 AM, Marek Olšák wrote:

Shader variants are BAD, BAD, BAD. Have you ever played an AAA game with a Mesa driver that likes to compile shader variants on first use? It's HORRIBLE.

I don't think that shader variants are bad, but it's definitely bad when we are compiling variants that are never used. Currently glxgears compiles 18 ps/vs shaders. In my branch with initial GS support [1] I switched handling of the shaders to deferred compilation, that is, shaders are compiled only before the actual draw. I found later that it's not really required for GS, but IIRC this change results in only 5 shaders being compiled for glxgears instead of 18. It seems most of the useless variants are the result of state changes between creation of the shader state (initial compilation) and the actual draw call.

I had some concerns about increased overhead with those changes, and it's actually noticeable with the drawoverhead demo, but I didn't see any regressions with the few real apps that I tested; e.g. glxgears even showed slightly better performance with these changes. Probably I also implemented it in a not very optimal way (I was mostly concentrating on GS support) and the overhead can be reduced.

One more thing is duplicate shaders. I analyzed shader dumps from Unigine Heaven 3.0 some time ago and found that of about 320 compiled shaders, only about 180 (50%) were unique; the others were duplicates (detected by comparing their bytecode dumps in an automated way). Maybe they had different shader keys (which still resulted in the same bytecode), but I suspect duplicate pipe shaders were also involved. Unfortunately I haven't had time to investigate it more thoroughly since then.

So my point is that we don't really need to eliminate shader variants; first we need to eliminate compilation of unused variants and duplicate shaders.
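As a minimal sketch of such deduplication (hypothetical names, not actual r600 code): hash the compiled bytecode and reuse an existing shader object on a match, with a full memcmp() on a hash hit to guard against collisions before sharing the variant.

  #include <stddef.h>
  #include <stdint.h>

  /* 64-bit FNV-1a over the compiled bytecode; the result would key a
   * lookup table mapping hashes to already-created shader objects. */
  static uint64_t hash_bytecode(const uint32_t *bc, size_t ndwords)
  {
      const uint8_t *bytes = (const uint8_t *)bc;
      uint64_t h = 0xcbf29ce484222325ull; /* FNV offset basis */
      for (size_t i = 0; i < ndwords * sizeof(uint32_t); i++) {
          h ^= bytes[i];
          h *= 0x100000001b3ull;          /* FNV prime */
      }
      return h;
  }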
Also we might want to consider offloading the compilation to separate thread(s) and caching shader binaries between runs.

Vadim

[1] http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-geom-shaders

What the patch does is probably the right solution. At least alpha-test state changes don't cause shader recompilation and re-binding, which also negatively affects performance. Ideally we shouldn't depend on the framebuffer state at all, but we need to emulate the TGSI property FS_COLOR0_WRITES_ALL_CBUFS. I think we should always be fine with key.nr_cbufs forced to 8 for any shader without that property. I expect app developers to do the right thing and not write outputs they don't need.

Marek

On Tue, Aug 27, 2013 at 9:00 PM, Roland Scheidegger <srol...@vmware.com> wrote:

Not that I'm qualified to review r600 code, but couldn't you create different shader variants depending on whether you need alpha test? At least I would assume shader exports aren't free.

Roland

On 27.08.2013 19:56, Vadim Girlin wrote:

We need to export at least one color if the shader writes it, even when nr_cbufs==0.

Signed-off-by: Vadim Girlin <vadimgir...@gmail.com>
---
Tested on evergreen with multiple combinations of backends - no regressions, fixes some tests:

default    - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
default+sb - fixes fb-alphatest-nocolor and fb_alphatest-nocolor-ff
llvm       - fixes about 25 tests related to depth/stencil
llvm+sb    - fixes about 300 tests (llvm's depth/stencil issues and regressions caused by reordering of exports in sb)

With this patch, there are no regressions with default+sb vs default. There is one regression with llvm+sb vs llvm - fs-texturegrad-miplevels; AFAICS it's a problem with the llvm backend uncovered by sb - SET_GRADIENTS_V/H instructions are not placed in the same TEX clause as the corresponding SAMPLE_G.
 src/gallium/drivers/r600/r600_shader.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 300b5c4..f7eab76 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -918,6 +918,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
 	unsigned opcode;
 	int i, j, k, r = 0;
 	int next_pos_base = 60, next_param_base = 0;
+	int max_color_exports = MAX2(key.nr_cbufs, 1);
 	/* Declarations used by llvm code */
 	bool use_llvm = false;
 	bool indirect_gprs;
@@ -1130,7 +1131,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
 		radeon_llvm_ctx.face_gpr = ctx.face_gpr;
 		radeon_llvm_ctx.r600_inputs = ctx.shader->input;
 		radeon_llvm_ctx.r600_outputs = ctx.shader->output;
-		radeon_llvm_ctx.color_buffer_count = MAX2(key.nr_cbufs, 1);
+		radeon_llvm_ctx.color_buffer_count = max_color_exports;
 		radeon_llvm_ctx.chip_class = ctx.bc->chip_class;
 		radeon_llvm_ctx.fs_color_all = shader->fs_write_all && (rscreen->chip_class >= EVERGREEN);
 		radeon_llvm_ctx.stream_outputs = &so;
@@ -1440,7 +1441,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
 		case TGSI_PROCESSOR_FRAGMENT:
 			if (shader->output[i].name == TGSI_SEMANTIC_COLOR) {
 				/* never export more colors than the number of CBs */
-				if (shader->output[i].sid >= key.nr_cbufs) {
+				if (shader->output[i].sid >= max_color_exports) {
 					/* skip export */
 					j--;
 					continue;
@@ -1450,7 +1451,7 @@ static int r600_shader_from_tgsi(struct r600_screen *rscreen,
 				output[j].type = V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_PIXEL;
 				shader->nr_ps_color_exports++;
 				if (shader->fs_write_all && (rscreen->chip_class >= EVERGREEN)) {
-					for (k = 1; k < key.nr_cbufs; k++) {
+					for (k = 1; k < max_color_exports; k++) {
 						j++;
 						memset(&output[j], 0, sizeof(struct r600_bytecode_output));
 						output[j].gpr = shader->output[i].gpr;
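Returning to the earlier suggestion of caching shader binaries between runs, a minimal sketch with hypothetical names (the cache directory, file layout, and keying are all assumptions, not anything Mesa does today): look up a binary keyed by hashes of the shader source and its compile-time key, so a later run can skip compilation entirely.

  #include <inttypes.h>
  #include <stdio.h>

  /* Returns an open file on a cache hit, NULL on a miss (in which
   * case the caller compiles and writes the result back). */
  static FILE *open_cached_binary(uint64_t source_hash, uint64_t key_hash)
  {
      char path[256];
      /* The cache location is an assumption; a real driver would make
       * it configurable and version it against compiler changes. */
      snprintf(path, sizeof(path),
               "/var/tmp/shader-cache/%016" PRIx64 "-%016" PRIx64 ".bin",
               source_hash, key_hash);
      return fopen(path, "rb");
  }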
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev