All on Juniper. But anyway, I've got another solution, with the only drawback that I don't really know what it does due to docs being lackluster/misleading there :-). But that would let us keep the ieee opcodes. And while I don't know what it does, I suspect it's a better idea regardless ;-). hw supported by nouveau drivers might just do the same without anyone even knowing with some luck...
Roland Am 08.11.2017 um 07:20 schrieb Ilia Mirkin: > Actually cayman gets half of it - it gets the abs, but not clamped. I > wonder what happens if you go the other way -- use the IEEE version of > the op for RSQ() (presumably you're not testing this on cayman). > > On Wed, Nov 8, 2017 at 1:18 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote: >> tgsi_rsq appears to ignore the passed-in op and always puts in >> ALU_OP1_RECIPSQRT_CLAMPED anyways. It also sticks an absolute value on >> the RSQ() argument. This only happens for eg, not cayman. (Probably >> why only the rcp_clamped change appeared to be necessary.) >> >> This is odd though, because there's no clamping like that in other >> drivers. The trace you made looks fine on both nvc0 and nv50. >> >> On Tue, Nov 7, 2017 at 11:01 PM, <srol...@vmware.com> wrote: >>> From: Roland Scheidegger <srol...@vmware.com> >>> >>> r600 already used the clamped versions, but for some reason this was >>> different to eg/cayman. >>> (Note that it has been different since essentially forever, 7 years, since >>> df62338c491f2cace1a48f99de78e83b5edd82fd in particular, which changed >>> this for r600 but not eg (cayman wasn't supported back then, but probably >>> copied this from the eg part later). The commit does not mention any reason >>> why this difference should exist.) >>> This seems a bit unfortunate, since it would be nice to use ieee arithmetic, >>> I have no idea what this could potentially break and no idea if it really >>> makes sense going back to legacy-style rcp/rsq... >>> This however prevents misrenderings in This War of Mine since using ieee >>> muls (ce7a045feeef8cad155f1c9aa07f166e146e3d00), albeit strictly speaking >>> only rcp_clamped is necessary for this. It seems likely the root cause is >>> some x * rcp(y) calculation where both x and y evaluate to 0. Albeit it >>> apparently works with other drivers, not sure what's up with that... >>> >>> Bugzilla: >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.freedesktop.org_show-5Fbug.cgi-3Fid-3D103544&d=DwIBaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=FYMO5RZgIVWRluOu5DMEzW2xYcV3A5ZOOioWlZcdonw&s=b2fwORC2yAK7ezpjud4xeebdQRW-632-SZjTJGI-JHk&e= >>> --- >>> src/gallium/drivers/r600/r600_shader.c | 16 ++++++++++++---- >>> 1 file changed, 12 insertions(+), 4 deletions(-) >>> >>> diff --git a/src/gallium/drivers/r600/r600_shader.c >>> b/src/gallium/drivers/r600/r600_shader.c >>> index 6a755bb3fd..62fc4da901 100644 >>> --- a/src/gallium/drivers/r600/r600_shader.c >>> +++ b/src/gallium/drivers/r600/r600_shader.c >>> @@ -9033,8 +9033,12 @@ static const struct r600_shader_tgsi_instruction >>> eg_shader_tgsi_instruction[] = >>> [TGSI_OPCODE_ARL] = { ALU_OP0_NOP, tgsi_eg_arl}, >>> [TGSI_OPCODE_MOV] = { ALU_OP1_MOV, tgsi_op2}, >>> [TGSI_OPCODE_LIT] = { ALU_OP0_NOP, tgsi_lit}, >>> - [TGSI_OPCODE_RCP] = { ALU_OP1_RECIP_IEEE, >>> tgsi_trans_srcx_replicate}, >>> - [TGSI_OPCODE_RSQ] = { ALU_OP1_RECIPSQRT_IEEE, tgsi_rsq}, >>> + /* XXX: >>> + * For state trackers other than OpenGL, we'll want to use >>> + * _RECIP_IEEE/_RECIPSQRT_IEEE instead. >>> + */ >>> + [TGSI_OPCODE_RCP] = { ALU_OP1_RECIP_CLAMPED, >>> tgsi_trans_srcx_replicate}, >>> + [TGSI_OPCODE_RSQ] = { ALU_OP1_RECIPSQRT_CLAMPED, tgsi_rsq}, >>> [TGSI_OPCODE_EXP] = { ALU_OP0_NOP, tgsi_exp}, >>> [TGSI_OPCODE_LOG] = { ALU_OP0_NOP, tgsi_log}, >>> [TGSI_OPCODE_MUL] = { ALU_OP2_MUL_IEEE, tgsi_op2}, >>> @@ -9256,8 +9260,12 @@ static const struct r600_shader_tgsi_instruction >>> cm_shader_tgsi_instruction[] = >>> [TGSI_OPCODE_ARL] = { ALU_OP0_NOP, tgsi_eg_arl}, >>> [TGSI_OPCODE_MOV] = { ALU_OP1_MOV, tgsi_op2}, >>> [TGSI_OPCODE_LIT] = { ALU_OP0_NOP, tgsi_lit}, >>> - [TGSI_OPCODE_RCP] = { ALU_OP1_RECIP_IEEE, >>> cayman_emit_float_instr}, >>> - [TGSI_OPCODE_RSQ] = { ALU_OP1_RECIPSQRT_IEEE, >>> cayman_emit_float_instr}, >>> + /* XXX: >>> + * For state trackers other than OpenGL, we'll want to use >>> + * _RECIP_IEEE/_RECIPSQRT_IEEE instead. >>> + */ >>> + [TGSI_OPCODE_RCP] = { ALU_OP1_RECIP_CLAMPED, >>> cayman_emit_float_instr}, >>> + [TGSI_OPCODE_RSQ] = { ALU_OP1_RECIPSQRT_CLAMPED, >>> cayman_emit_float_instr}, >>> [TGSI_OPCODE_EXP] = { ALU_OP0_NOP, tgsi_exp}, >>> [TGSI_OPCODE_LOG] = { ALU_OP0_NOP, tgsi_log}, >>> [TGSI_OPCODE_MUL] = { ALU_OP2_MUL_IEEE, tgsi_op2}, >>> -- >>> 2.12.3 >>> >>> _______________________________________________ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=DwIBaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=FYMO5RZgIVWRluOu5DMEzW2xYcV3A5ZOOioWlZcdonw&s=BMC4YllECcl4gCLztCjAYp4GShGEolmDf7oSttV0cZA&e= _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev