Re: [Mesa-dev] [PATCH 1/4] r600: use min_dx10/max_dx10 instead of min/max

Jan Vesely Thu, 09 Nov 2017 09:43:36 -0800

On Thu, 2017-11-09 at 18:39 +0100, Nicolai Hähnle wrote:
> On 09.11.2017 18:26, Roland Scheidegger wrote:
> > Am 09.11.2017 um 18:19 schrieb Jan Vesely:
> > > On Thu, 2017-11-09 at 03:58 +0100, srol...@vmware.com wrote:
> > > > From: Roland Scheidegger <srol...@vmware.com>
> > > > 
> > > > I believe this is the safe thing to do, especially ever since the driver
> > > > actually generates NaNs for muls too.
> > > > Albeit since the radeon ISA docs are inaccurate/wrong there, I'm not
> > > > entirely sure what the non-dx10 versions do,
> > > 
> > > non-dx10 version return nan if one of the operands is nan (tested on
> > > Turks).
> > 
> > Yes, I've modified a piglit test and came to the same conclusion. (I
> > will put that up for review shortly.)
> > I don't know why you'd ever want that, though (ieee fmin/fmax also
> > should return non-nan).
> 
> My guess is that DX9-level hardware had that behavior by virtue of just 
> not caring about NaN at all, and the hardware folks were just being 
> conservative in adding a new opcode rather than changing the behavior of 
> the old one. I don't think GCN has the old-style min/max.


Looks like it.
There's v_max_legacy_f32 at least on SI/CI. comment says "D.f =
max(S0.f, S1.f) (DX9 rules for NaN)"

Jan
> 
> Cheers,
> Nicolai
> 
> > 
> > Roland
> > 
> > 
> > > 
> > > Jan
> > > 
> > > >   but (as required by dx10)
> > > > the dx10 versions should pick a non-nan source over a nan source.
> > > > Other drivers presumably do the same (radeonsi, llvmpipe).
> > > > This was shown to make some difference for bug 103544, albeit it is not
> > > > required to fix it.
> > > > ---
> > > >   src/gallium/drivers/r600/r600_shader.c  | 12 ++++++------
> > > >   src/gallium/drivers/r600/sb/sb_expr.cpp |  2 ++
> > > >   2 files changed, 8 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/src/gallium/drivers/r600/r600_shader.c 
> > > > b/src/gallium/drivers/r600/r600_shader.c
> > > > index 188fbc9d47..6a755bb3fd 100644
> > > > --- a/src/gallium/drivers/r600/r600_shader.c
> > > > +++ b/src/gallium/drivers/r600/r600_shader.c
> > > > @@ -8844,8 +8844,8 @@ static const struct r600_shader_tgsi_instruction 
> > > > r600_shader_tgsi_instruction[]
> > > >         [TGSI_OPCODE_DP3]       = { ALU_OP2_DOT4_IEEE, tgsi_dp},
> > > >         [TGSI_OPCODE_DP4]       = { ALU_OP2_DOT4_IEEE, tgsi_dp},
> > > >         [TGSI_OPCODE_DST]       = { ALU_OP0_NOP, tgsi_opdst},
> > > > -       [TGSI_OPCODE_MIN]       = { ALU_OP2_MIN, tgsi_op2},
> > > > -       [TGSI_OPCODE_MAX]       = { ALU_OP2_MAX, tgsi_op2},
> > > > +       [TGSI_OPCODE_MIN]       = { ALU_OP2_MIN_DX10, tgsi_op2},
> > > > +       [TGSI_OPCODE_MAX]       = { ALU_OP2_MAX_DX10, tgsi_op2},
> > > >         [TGSI_OPCODE_SLT]       = { ALU_OP2_SETGT, tgsi_op2_swap},
> > > >         [TGSI_OPCODE_SGE]       = { ALU_OP2_SETGE, tgsi_op2},
> > > >         [TGSI_OPCODE_MAD]       = { ALU_OP3_MULADD_IEEE, tgsi_op3},
> > > > @@ -9042,8 +9042,8 @@ static const struct r600_shader_tgsi_instruction 
> > > > eg_shader_tgsi_instruction[] =
> > > >         [TGSI_OPCODE_DP3]       = { ALU_OP2_DOT4_IEEE, tgsi_dp},
> > > >         [TGSI_OPCODE_DP4]       = { ALU_OP2_DOT4_IEEE, tgsi_dp},
> > > >         [TGSI_OPCODE_DST]       = { ALU_OP0_NOP, tgsi_opdst},
> > > > -       [TGSI_OPCODE_MIN]       = { ALU_OP2_MIN, tgsi_op2},
> > > > -       [TGSI_OPCODE_MAX]       = { ALU_OP2_MAX, tgsi_op2},
> > > > +       [TGSI_OPCODE_MIN]       = { ALU_OP2_MIN_DX10, tgsi_op2},
> > > > +       [TGSI_OPCODE_MAX]       = { ALU_OP2_MAX_DX10, tgsi_op2},
> > > >         [TGSI_OPCODE_SLT]       = { ALU_OP2_SETGT, tgsi_op2_swap},
> > > >         [TGSI_OPCODE_SGE]       = { ALU_OP2_SETGE, tgsi_op2},
> > > >         [TGSI_OPCODE_MAD]       = { ALU_OP3_MULADD_IEEE, tgsi_op3},
> > > > @@ -9265,8 +9265,8 @@ static const struct r600_shader_tgsi_instruction 
> > > > cm_shader_tgsi_instruction[] =
> > > >         [TGSI_OPCODE_DP3]       = { ALU_OP2_DOT4_IEEE, tgsi_dp},
> > > >         [TGSI_OPCODE_DP4]       = { ALU_OP2_DOT4_IEEE, tgsi_dp},
> > > >         [TGSI_OPCODE_DST]       = { ALU_OP0_NOP, tgsi_opdst},
> > > > -       [TGSI_OPCODE_MIN]       = { ALU_OP2_MIN, tgsi_op2},
> > > > -       [TGSI_OPCODE_MAX]       = { ALU_OP2_MAX, tgsi_op2},
> > > > +       [TGSI_OPCODE_MIN]       = { ALU_OP2_MIN_DX10, tgsi_op2},
> > > > +       [TGSI_OPCODE_MAX]       = { ALU_OP2_MAX_DX10, tgsi_op2},
> > > >         [TGSI_OPCODE_SLT]       = { ALU_OP2_SETGT, tgsi_op2_swap},
> > > >         [TGSI_OPCODE_SGE]       = { ALU_OP2_SETGE, tgsi_op2},
> > > >         [TGSI_OPCODE_MAD]       = { ALU_OP3_MULADD_IEEE, tgsi_op3},
> > > > diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp 
> > > > b/src/gallium/drivers/r600/sb/sb_expr.cpp
> > > > index 3dd3a4815b..7a5d62c8e8 100644
> > > > --- a/src/gallium/drivers/r600/sb/sb_expr.cpp
> > > > +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
> > > > @@ -753,7 +753,9 @@ bool expr_handler::fold_alu_op2(alu_node& n) {
> > > >                                 n.bc.src[0].abs == n.bc.src[1].abs) {
> > > >                         switch (n.bc.op) {
> > > >                         case ALU_OP2_MIN: // (MIN x, x) => (MOV x)
> > > > +                       case ALU_OP2_MIN_DX10:
> > > >                         case ALU_OP2_MAX:
> > > > +                       case ALU_OP2_MAX_DX10:
> > > >                                 convert_to_mov(n, v0, n.bc.src[0].neg, 
> > > > n.bc.src[0].abs);
> > > >                                 return fold_alu_op1(n);
> > > >                         case ALU_OP2_ADD:  // (ADD x, x) => (MUL x, 2)
> > 
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > 
> 
>

signature.asc
Description: This is a digitally signed message part

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/4] r600: use min_dx10/max_dx10 instead of min/max

Reply via email to