Re: [Mesa-dev] [PATCH] nv50/ir: remove dnz flag when converting MAD to ADD due to optimizations
yeah, sounds fine. I wasn't 100% sure what the dnz flag does.

With the addition below:
Reviewed-by: Karol Herbst

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 307d8762506..202faf0746a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1094,6 +1094,7 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
          if (imm0.isNegative())
             i->src(t).mod = i->src(t).mod ^ Modifier(NV50_IR_MOD_NEG);
          i->op = OP_ADD;
+         i->dnz = 0;
          i->setSrc(s, i->getSrc(t));
          i->src(s).mod = i->src(t).mod;
       } else

shader:

FRAG
PROPERTY FS_COORD_ORIGIN UPPER_LEFT
PROPERTY MUL_ZERO_WINS 1
DCL IN[0], COLOR, COLOR
DCL IN[1], TEXCOORD[0], PERSPECTIVE
DCL OUT[0], COLOR
DCL OUT[1], COLOR[1]
DCL CONST[0][0..129]
DCL TEMP[0..2]
IMM[0] FLT32 { -0.,-1., 2.,-0.5000}
  0: ADD TEMP[0].x, -CONST[0][112]., IN[1].
  1: CMP TEMP[0], TEMP[0]., IMM[0]., IMM[0].
  2: KILL_IF TEMP[0]
  3: MUL TEMP[0].xyz, CONST[0][0], IN[0]
  4: MOV TEMP[0].w, IN[0].
  5: MUL TEMP[1].xyz, TEMP[0], IMM[0].
  6: MUL OUT[0].w, TEMP[0]., CONST[0][0].
  7: MAD_SAT TEMP[0].w, IN[1]., CONST[0][128]., CONST[0][128].
  8: MUL TEMP[0].w, TEMP[0]., CONST[0][129].
  9: MOV TEMP[2].z, IMM[0].
 10: MAD TEMP[0].xyz, TEMP[2]., -TEMP[0], CONST[0][129]
 11: MAD OUT[0].xyz, TEMP[0]., TEMP[0], TEMP[1]
 12: MOV OUT[1], -IMM[0].wwwy
 13: END

On Sun, Nov 25, 2018 at 3:58 AM Ilia Mirkin wrote:
>
> dnz flag only applies for multiplications (e.g. to make 0 * Infinity
> becomes 0 instead of NaN). Once we optimize a MAD into an ADD, the dnz
> flag no longer makes sense, and upsets the GM107 emitter (since it looks
> at the ftz and dnz flags together).
>
> Signed-off-by: Ilia Mirkin

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nv50/ir: remove dnz flag when converting MAD to ADD due to optimizations
The dnz flag only applies to multiplications (e.g. to make 0 * Infinity
become 0 instead of NaN). Once we optimize a MAD into an ADD, the dnz
flag no longer makes sense, and upsets the GM107 emitter (since it looks
at the ftz and dnz flags together).

Signed-off-by: Ilia Mirkin
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 04d26dcbf53..307d8762506 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -740,6 +740,7 @@ ConstantFolding::expr(Instruction *i,
          // restrictions, so move it into a separate LValue.
          bld.setPosition(i, false);
          i->op = OP_ADD;
+         i->dnz = 0;
          i->setSrc(1, bld.mkMov(bld.getSSA(type), i->getSrc(0), type)->getDef(0));
          i->setSrc(0, i->getSrc(2));
          i->src(0).mod = i->src(2).mod;
@@ -1131,6 +1132,7 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
          i->setSrc(1, i->getSrc(2));
          i->src(1).mod = i->src(2).mod;
          i->setSrc(2, NULL);
+         i->dnz = 0;
          i->op = OP_ADD;
       } else
       if (!isFloatType(i->dType) && !i->subOp && !i->src(t).mod && !i->src(2).mod) {
-- 
2.18.1
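The commit message describes dnz as a multiply-only rule under which 0 * Infinity gives 0 instead of NaN. A minimal sketch of that rule in plain C follows. The helper names mul_dnz, mul_ieee and double_as_add are hypothetical and do not exist in nouveau, and the model assumes the flag means nothing more than "a zero operand wins".

```c
#include <math.h>

/* Hypothetical model of a dnz multiply (not a nouveau function):
 * if either operand is zero the result is zero, even when the other
 * operand is infinite. This is an assumption based on the commit message. */
static float mul_dnz(float a, float b)
{
   if (a == 0.0f || b == 0.0f)
      return 0.0f;
   return a * b;
}

/* Plain IEEE 754 multiply for comparison: 0 * Infinity yields NaN. */
static float mul_ieee(float a, float b)
{
   return a * b;
}

/* x * 2 rewritten as x + x, the kind of MUL-to-ADD conversion the patch
 * touches. No multiplication remains, so a zero-wins rule has nothing
 * left to act on. */
static float double_as_add(float x)
{
   return x + x;
}
```

Under this model the two multiplies only disagree when one operand is zero and the other is not finite, which is why the flag carries no meaning once the instruction has become an ADD.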
Re: [Mesa-dev] [PATCH 2/2] nv50, nvc0: Fix gallium nine regression regarding sampler bindings
On Sun, Nov 25, 2018 at 2:11 AM Ilia Mirkin wrote:
>
> Using this approach, num_samplers will never go down. Also, this
> applies to more than just samplers -- textures, everything else.

but is this a problem? I was checking where num_samplers is used and I
don't see that the actual value makes much of a difference. Sure, we end
up with unused but bound objects, but the original change inside gallium
also stopped caring about this completely.

Maybe I overlooked something, though.

> On Sat, Nov 24, 2018 at 6:04 PM Karol Herbst wrote:
> >
> > The new approach is that samplers don't get unbound even if they won't be used
> > in a draw and we should just leave them be as well.
> >
> > Fixes a regression in multiple windows games using gallium nine and nouveau.
> >
> > Fixes: 4d6fab245eec3880e2a59424a579851f44857ce8
> >    "cso: don't track the number of sampler states bound"
> > Signed-off-by: Karol Herbst
Re: [Mesa-dev] [PATCH 1/2] nv50/ir: don't optimize dnz muls to add
yeah, I was hitting some asserts with a d3d trace. The issue is that we
optimize some MADs/MULs with dnz set into an ADD, but the emitter isn't
able to emit an ADD with the dnz flag. At least for gm107.

example TGSI triggering it (there are more cases though):

VERT
PROPERTY NEXT_SHADER FRAG
PROPERTY MUL_ZERO_WINS 1
DCL IN[0]
DCL IN[1]
DCL IN[2]
DCL OUT[0], POSITION
DCL OUT[1], COLOR
DCL OUT[2].xy, TEXCOORD[0]
DCL CONST[0][0..240]
DCL TEMP[0]
IMM[0] FLT32 {1., 0., 0., 0.}
  0: MUL TEMP[0], CONST[0][1], IN[0].
  1: MAD TEMP[0], IN[0]., CONST[0][0], TEMP[0]
  2: MAD TEMP[0], IN[0]., CONST[0][2], TEMP[0]
  3: MAD TEMP[0], IN[0]., CONST[0][3], TEMP[0]
  4: ADD OUT[0], TEMP[0], CONST[0][240]
  5: MUL OUT[1], CONST[0][4], IN[1]
  6: MAD TEMP[0].xyz, IN[2].xyxw, IMM[0].xxyy, IMM[0].yyxy
  7: DP3 OUT[2].x, TEMP[0], CONST[0][16].xyww
  8: DP3 OUT[2].y, TEMP[0], CONST[0][17].xyww
  9: END

On Sun, Nov 25, 2018 at 2:12 AM Ilia Mirkin wrote:
>
> Can you elaborate as to what the issue is? The dnz flag is set when we
> want to make NaN -> Infinity. Do you have a concrete TGSI program that
> triggers issues?
>
> On Sat, Nov 24, 2018 at 6:04 PM Karol Herbst wrote:
> >
> > fixes asserts with gallium nine
> >
> > Signed-off-by: Karol Herbst
Re: [Mesa-dev] [PATCH 1/2] nv50/ir: don't optimize dnz muls to add
Can you elaborate as to what the issue is? The dnz flag is set when we
want to make NaN -> Infinity. Do you have a concrete TGSI program that
triggers issues?

On Sat, Nov 24, 2018 at 6:04 PM Karol Herbst wrote:
>
> fixes asserts with gallium nine
>
> Signed-off-by: Karol Herbst
Re: [Mesa-dev] [PATCH 2/2] nv50, nvc0: Fix gallium nine regression regarding sampler bindings
Using this approach, num_samplers will never go down. Also, this
applies to more than just samplers -- textures, everything else.

On Sat, Nov 24, 2018 at 6:04 PM Karol Herbst wrote:
>
> The new approach is that samplers don't get unbound even if they won't be used
> in a draw and we should just leave them be as well.
>
> Fixes a regression in multiple windows games using gallium nine and nouveau.
>
> Fixes: 4d6fab245eec3880e2a59424a579851f44857ce8
>    "cso: don't track the number of sampler states bound"
> Signed-off-by: Karol Herbst
[Mesa-dev] [PATCH 1/2] nv50/ir: don't optimize dnz muls to add
fixes asserts with gallium nine

Signed-off-by: Karol Herbst
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 04d26dcbf53..0a284572ede 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -557,6 +557,8 @@ ConstantFolding::expr(Instruction *i,
    case OP_MAD:
    case OP_FMA:
    case OP_MUL:
+      if (i->dnz && i->op != OP_MUL)
+         return;
       if (i->dnz && i->dType == TYPE_F32) {
          if (!isfinite(a->data.f32))
             a->data.f32 = 0.0f;
@@ -1089,7 +1091,7 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
          i->src(0).mod = 0;
          i->setSrc(1, NULL);
       } else
-      if (!i->postFactor && (imm0.isInteger(2) || imm0.isInteger(-2))) {
+      if (!i->postFactor && !i->dnz && (imm0.isInteger(2) || imm0.isInteger(-2))) {
          if (imm0.isNegative())
             i->src(t).mod = i->src(t).mod ^ Modifier(NV50_IR_MOD_NEG);
          i->op = OP_ADD;
@@ -1120,7 +1122,7 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
          if (i->op != OP_CVT)
             i->src(0).mod = 0;
       } else
-      if (i->subOp != NV50_IR_SUBOP_MUL_HIGH &&
+      if (i->subOp != NV50_IR_SUBOP_MUL_HIGH && !i->dnz &&
           (imm0.isInteger(1) || imm0.isInteger(-1))) {
          if (imm0.isNegative())
             i->src(t).mod = i->src(t).mod ^ Modifier(NV50_IR_MOD_NEG);
-- 
2.19.1
[Mesa-dev] [PATCH 0/2] gallium nine fixes for nouveau
Patch 1 fixes some compiler asserts I was running into. Maybe we can just do
those optimizations anyway, but simply drop the dnz flag on the ADD as long as
the instructions aren't marked as being precise.

Patch 2 tries to fix our outstanding issue with bound samplers with nine. I
don't really know if this is the correct fix for it, but it makes sense to me
after reading the commit message of the gallium change. No piglit regressions,
and games are working again using gallium nine and nouveau. Maybe we should
drop that entire num_samplers handling altogether?

Karol Herbst (2):
  nv50/ir: don't optimize dnz muls to add
  nv50,nvc0: Fix gallium nine regression regarding sampler bindings

 .../nouveau/codegen/nv50_ir_peephole.cpp      |  6 ++++--
 src/gallium/drivers/nouveau/nv50/nv50_state.c | 13 ++-----------
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 17 +++++------------
 3 files changed, 11 insertions(+), 25 deletions(-)

-- 
2.19.1
[Mesa-dev] [PATCH 2/2] nv50, nvc0: Fix gallium nine regression regarding sampler bindings
The new approach is that samplers don't get unbound even if they won't be used
in a draw and we should just leave them be as well.

Fixes a regression in multiple windows games using gallium nine and nouveau.

Fixes: 4d6fab245eec3880e2a59424a579851f44857ce8
   "cso: don't track the number of sampler states bound"
Signed-off-by: Karol Herbst
---
 src/gallium/drivers/nouveau/nv50/nv50_state.c | 13 ++-----------
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 17 +++++------------
 2 files changed, 7 insertions(+), 23 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state.c b/src/gallium/drivers/nouveau/nv50/nv50_state.c
index fb4a259ce16..59437b22c9c 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_state.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_state.c
@@ -600,25 +600,16 @@ static inline void
 nv50_stage_sampler_states_bind(struct nv50_context *nv50, int s,
                                unsigned nr, void **hwcso)
 {
-   unsigned i;
-
    assert(nr <= PIPE_MAX_SAMPLERS);
-   for (i = 0; i < nr; ++i) {
+   for (unsigned i = 0; i < nr; ++i) {
       struct nv50_tsc_entry *old = nv50->samplers[s][i];
 
       nv50->samplers[s][i] = nv50_tsc_entry(hwcso[i]);
       if (old)
          nv50_screen_tsc_unlock(nv50->screen, old);
    }
-   assert(nv50->num_samplers[s] <= PIPE_MAX_SAMPLERS);
-   for (; i < nv50->num_samplers[s]; ++i) {
-      if (nv50->samplers[s][i]) {
-         nv50_screen_tsc_unlock(nv50->screen, nv50->samplers[s][i]);
-         nv50->samplers[s][i] = NULL;
-      }
-   }
 
-   nv50->num_samplers[s] = nr;
+   nv50->num_samplers[s] = MAX2(nv50->num_samplers[s], nr);
 
    nv50->dirty_3d |= NV50_NEW_3D_SAMPLERS;
 }
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
index f2393cb27b5..12765af8585 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
@@ -449,10 +449,12 @@ nvc0_sampler_state_delete(struct pipe_context *pipe, void *hwcso)
 {
    unsigned s, i;
 
-   for (s = 0; s < 6; ++s)
+   for (s = 0; s < 6; ++s) {
       for (i = 0; i < nvc0_context(pipe)->num_samplers[s]; ++i)
          if (nvc0_context(pipe)->samplers[s][i] == hwcso)
             nvc0_context(pipe)->samplers[s][i] = NULL;
+      nvc0_context(pipe)->num_samplers[s] = 0;
+   }
 
    nvc0_screen_tsc_free(nvc0_context(pipe)->screen, nv50_tsc_entry(hwcso));
 
@@ -464,9 +466,7 @@ nvc0_stage_sampler_states_bind(struct nvc0_context *nvc0,
                                unsigned s,
                                unsigned nr, void **hwcso)
 {
-   unsigned i;
-
-   for (i = 0; i < nr; ++i) {
+   for (unsigned i = 0; i < nr; ++i) {
       struct nv50_tsc_entry *old = nvc0->samplers[s][i];
 
       if (hwcso[i] == old)
@@ -477,14 +477,7 @@ nvc0_stage_sampler_states_bind(struct nvc0_context *nvc0,
       if (old)
          nvc0_screen_tsc_unlock(nvc0->screen, old);
    }
-   for (; i < nvc0->num_samplers[s]; ++i) {
-      if (nvc0->samplers[s][i]) {
-         nvc0_screen_tsc_unlock(nvc0->screen, nvc0->samplers[s][i]);
-         nvc0->samplers[s][i] = NULL;
-      }
-   }
-
-   nvc0->num_samplers[s] = nr;
+   nvc0->num_samplers[s] = MAX2(nvc0->num_samplers[s], nr);
 }
 
 static void
-- 
2.19.1
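To make the effect of the MAX2 change easier to see, here is a small standalone sketch of a bind function that leaves higher slots alone and only ever raises the count. The struct stage_samplers type, MAX_SAMPLERS and bind_samplers are hypothetical stand-ins rather than the driver's real types, and the TSC locking is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLERS 16   /* illustrative stand-in for PIPE_MAX_SAMPLERS */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical, simplified stand-in for the per-stage sampler state. */
struct stage_samplers {
   void *slot[MAX_SAMPLERS];
   unsigned num;           /* high-water mark of slots bound so far */
};

/* Bind nr samplers starting at slot 0. Slots at index nr and above are
 * left untouched, and num only ever grows. */
static void bind_samplers(struct stage_samplers *st, unsigned nr, void **hwcso)
{
   assert(nr <= MAX_SAMPLERS);
   for (unsigned i = 0; i < nr; ++i)
      st->slot[i] = hwcso[i];
   st->num = MAX2(st->num, nr);
}
```

Binding three samplers and then one leaves num at 3 with slots 1 and 2 still populated, which is the "num_samplers will never go down" behaviour raised in the review.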
[Mesa-dev] [Bug 108245] RADV/Vega: Low mip levels of large BCn textures get corrupted by vkCmdCopyBufferToImage
https://bugs.freedesktop.org/show_bug.cgi?id=108245

--- Comment #6 from Bas Nieuwenhuizen ---
https://patchwork.freedesktop.org/patch/263716/ fixes the testcase (and
hopefully does not regress anything else)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
[Mesa-dev] [PATCH] radv: Clamp gfx9 image view extents to the allocated image extents.
Mirrors AMDVLK. Looks like if we go over the alignment of height
we actually start to change the addressing. Seems like the extra
miplevels actually work with this.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108245
Fixes: f6cc15dccd5 "radv/gfx9: fix block compression texture views. (v2)"
---
 src/amd/vulkan/radv_image.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
index 7492bf48b51..ba8e28f0e23 100644
--- a/src/amd/vulkan/radv_image.c
+++ b/src/amd/vulkan/radv_image.c
@@ -1175,8 +1175,6 @@ radv_image_view_init(struct radv_image_view *iview,
     if (device->physical_device->rad_info.chip_class >= GFX9 &&
         vk_format_is_compressed(image->vk_format) &&
         !vk_format_is_compressed(iview->vk_format)) {
-        unsigned rounded_img_w = util_next_power_of_two(iview->extent.width);
-        unsigned rounded_img_h = util_next_power_of_two(iview->extent.height);
         unsigned lvl_width  = radv_minify(image->info.width , range->baseMipLevel);
         unsigned lvl_height = radv_minify(image->info.height, range->baseMipLevel);
 
@@ -1186,8 +1184,8 @@ radv_image_view_init(struct radv_image_view *iview,
         lvl_width <<= range->baseMipLevel;
         lvl_height <<= range->baseMipLevel;
 
-        iview->extent.width = CLAMP(lvl_width, iview->extent.width, rounded_img_w);
-        iview->extent.height = CLAMP(lvl_height, iview->extent.height, rounded_img_h);
+        iview->extent.width = CLAMP(lvl_width, iview->extent.width, iview->image->surface.u.gfx9.surf_pitch);
+        iview->extent.height = CLAMP(lvl_height, iview->extent.height, iview->image->surface.u.gfx9.surf_height);
     }
 }
 
-- 
2.19.1
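The clamp in this patch can be tried out in isolation with the sketch below. It is only an illustration: minify and clamped_view_extent are hypothetical helpers, the minify behaviour (halve per mip level, never below 1) is an assumption about what radv_minify does, and the block-size conversion that sits between the two hunks is omitted.

```c
#include <stdint.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))
#define MIN2(a, b) ((a) < (b) ? (a) : (b))
/* Same argument order as in the patch: value, lower bound, upper bound. */
#define CLAMP(x, lo, hi) MIN2(MAX2((x), (lo)), (hi))

/* Assumed minify behaviour: halve per mip level, never below 1.
 * levels must be smaller than 32. */
static uint32_t minify(uint32_t n, uint32_t levels)
{
   return n == 0 ? 0 : MAX2(n >> levels, 1u);
}

/* Scale the level size back up to level 0, then keep it between the
 * view extent and the allocated extent (surf_pitch or surf_height in
 * the patch). */
static uint32_t clamped_view_extent(uint32_t image_extent, uint32_t base_level,
                                    uint32_t view_extent,
                                    uint32_t allocated_extent)
{
   uint32_t lvl = minify(image_extent, base_level);

   lvl <<= base_level;
   return CLAMP(lvl, view_extent, allocated_extent);
}
```

With made-up numbers, an image extent of 5 at base level 3 scales back up to 8. An allocated extent of 8 keeps that value, while an allocated extent of 6 caps it at 6.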
[Mesa-dev] [PATCH] radv: Fix opaque metadata descriptor last layer.
We used the layer count which results in an off by one error. Not sure
this really affects anything.

Fixes: f4e499ec791 "radv: add initial non-conformant radv vulkan driver"
---
 src/amd/vulkan/radv_image.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_image.c b/src/amd/vulkan/radv_image.c
index 7492bf48b51..f0b1d31c5bd 100644
--- a/src/amd/vulkan/radv_image.c
+++ b/src/amd/vulkan/radv_image.c
@@ -691,7 +691,7 @@ radv_query_opaque_metadata(struct radv_device *device,
     si_make_texture_descriptor(device, image, false,
                                (VkImageViewType)image->type, image->vk_format,
                                , 0, image->info.levels - 1, 0,
-                               image->info.array_size,
+                               image->info.array_size - 1,
                                image->info.width, image->info.height,
                                image->info.depth,
                                desc, NULL);
-- 
2.19.1
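The off-by-one comes from passing a count where an inclusive last index is expected. A tiny sketch of the difference, using hypothetical helper names that are not part of radv:

```c
#include <assert.h>

/* Hypothetical helper: index of the last layer when an inclusive
 * [first_layer, last_layer] range should cover the whole array. */
static unsigned last_layer_index(unsigned array_size)
{
   assert(array_size > 0);
   return array_size - 1;
}

/* Number of layers addressed by an inclusive range. */
static unsigned layers_in_range(unsigned first_layer, unsigned last_layer)
{
   assert(last_layer >= first_layer);
   return last_layer - first_layer + 1;
}
```

For a six-layer array, the range 0..5 addresses six layers, whereas passing the count itself as the last layer (0..6) would address seven.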
[Mesa-dev] [Bug 108848] 3d image broken in Dragon age: origins
https://bugs.freedesktop.org/show_bug.cgi?id=108848

--- Comment #9 from Axel Davy ---
You can also try to reduce the memory footprint of pulseaudio, see the trick
described here:
https://www.winehq.org/pipermail/wine-devel/2018-November/134954.html
[Mesa-dev] [Bug 108853] OSMesaGetDepthBuffer flipped vertically
https://bugs.freedesktop.org/show_bug.cgi?id=108853

            Bug ID: 108853
           Summary: OSMesaGetDepthBuffer flipped vertically
           Product: Mesa
           Version: 18.2
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Drivers/OSMesa
          Assignee: mesa-dev@lists.freedesktop.org
          Reporter: popi...@basilisk.fr
        QA Contact: mesa-dev@lists.freedesktop.org

When installing OSMesa using

./configure --prefix=$install_dir \
    --enable-opengl --disable-gles1 --disable-gles2 \
    --disable-va --disable-xvmc --disable-vdpau \
    --enable-shared-glapi \
    --disable-texture-float \
    --with-gallium-drivers=swrast \
    --disable-dri --with-dri-drivers= \
    --disable-egl --with-platforms= --disable-gbm \
    --enable-glx --with-platforms=x11 \
    --disable-osmesa --enable-gallium-osmesa

i.e. using the gallium osmesa implementation, the depth buffer returned by
OSMesaGetDepthBuffer is flipped vertically relative to the OSMesa framebuffer.

Using the non-gallium osmesa implementation returns a depth buffer correctly
aligned with the framebuffer.

The gallium-returned depth buffer can be fixed using something like:

fbdepth_t * framebuffer_depth (framebuffer * p)
{
  unsigned int * depth;
  GLint width, height, bytesPerValue;
  OSMesaGetDepthBuffer (p->ctx, &width, &height, &bytesPerValue,
                        (void **)&depth);
  assert (p->width == width && p->height == height && bytesPerValue == 4);
  assert (sizeof(fbdepth_t) == bytesPerValue);
#if GALLIUM
  // fix for bug in gallium/libosmesa
  // the depth buffer is flipped vertically
  for (GLint j = 0; j < height/2; j++)
    for (GLint i = 0; i < width; i++) {
      unsigned int tmp = depth[j*width + i];
      depth[j*width + i] = depth[(height - 1 - j)*width + i];
      depth[(height - 1 - j)*width + i] = tmp;
    }
#endif // GALLIUM
  return depth;
}
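The same workaround can be written as a standalone helper that swaps whole rows instead of single values. This is only a sketch: flip_rows_u32 is a hypothetical name, it assumes a tightly packed buffer of 32-bit depth values (the reporter's bytesPerValue == 4 case), and it does not call into OSMesa at all.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Flip a tightly packed image of 32-bit values vertically, in place.
 * Returns 0 on success, -1 if the temporary row cannot be allocated. */
static int flip_rows_u32(uint32_t *pixels, int width, int height)
{
   size_t row_bytes;
   uint32_t *tmp;

   if (width <= 0 || height <= 1)
      return 0;                      /* nothing to swap */

   row_bytes = (size_t)width * sizeof(uint32_t);
   tmp = malloc(row_bytes);
   if (!tmp)
      return -1;

   /* Swap row j with its mirror row; the middle row of an odd-height
    * image stays where it is. */
   for (int j = 0; j < height / 2; j++) {
      uint32_t *top = pixels + (size_t)j * width;
      uint32_t *bottom = pixels + (size_t)(height - 1 - j) * width;

      memcpy(tmp, top, row_bytes);
      memcpy(top, bottom, row_bytes);
      memcpy(bottom, tmp, row_bytes);
   }

   free(tmp);
   return 0;
}
```

A caller would pass the pointer and dimensions obtained from OSMesaGetDepthBuffer, and only on builds where the flipped result is actually observed.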
[Mesa-dev] [Bug 108848] 3d image broken in Dragon age: origins
https://bugs.freedesktop.org/show_bug.cgi?id=108848

--- Comment #8 from Andre Heider ---
Game works for me with nine, but only after setting LARGE_ADDRESS_AWARE
manually on DAOrigins.exe.

(It seems to work without it on low texture details, but I guess it'll crash
eventually.)
[Mesa-dev] [Bug 108848] 3d image broken in Dragon age: origins
https://bugs.freedesktop.org/show_bug.cgi?id=108848

--- Comment #7 from Axel Davy ---
After looking further, the crash observed seems to be caused by
NineVolumeTexture9 missing checks to properly exit on memory allocation
failures. I'll send a fix for this, but it won't help the game work.

I suspect the problem is, as suggested, that not enough memory is available.
[Mesa-dev] [Bug 108848] 3d image broken in Dragon age: origins
https://bugs.freedesktop.org/show_bug.cgi?id=108848

--- Comment #6 from Axel Davy ---
Is the game 32-bit? One thing you may want to try is to use a tool to make
the exe large address aware.

Possibly you run out of virtual address space (wine restricts the available
space when the exe is not large address aware, and some linux libs take more
space than on windows).
[Mesa-dev] [Bug 108848] 3d image broken in Dragon age: origins
https://bugs.freedesktop.org/show_bug.cgi?id=108848

--- Comment #5 from Axel Davy ---
Mesa needs to be built with --enable-debug to have NINE_DEBUG=all do anything.

I can see, though, that two mmaps are failing in the log with nine.

Getting a new log with NINE_DEBUG=all would help find what causes the failed
mmap, and what arguments were used for the crashing call.
Re: [Mesa-dev] [PATCH 0/7] winsys/amdgpu: slab allocators and tweaks for address translation
Patch #5 and #6 are Reviewed-by: Christian König

For patch #7 I think we really need some testing to see whether it gives us an
improvement. As you noted as well, it is rather unlikely that we have buffers
which are slightly smaller than a power of two.

Christian.

On 24.11.18 at 00:40, Marek Olšák wrote:
> Hi,
>
> This series changes the slab allocation to 3 slab allocators layered on top
> of each other, and increases the max slab entry size to 256 KB and the max
> slab size to 2 MB.
>
> There are also tweaks for faster address translation, though we don't know
> whether it helps anything.
>
> Please review.
>
> Thanks,
> Marek