On Sun, Mar 31, 2013 at 1:08 AM, Vadim Girlin <vadimgir...@gmail.com> wrote: > On 03/30/2013 05:35 AM, Martin Andersson wrote: >> >> I found an issue with the shader compiler for Cayman when I looked >> into why the ext_transform_feedback/order test case caused a GPU stall. >> It turned out the stall was an infinite loop that was the result of broken >> calculation in the shader function. The issue is that Cayman uses the >> tgsi_umad function for UMAD, but that does not work since it does not >> populate the y, z and w slots for UMUL that cayman requires. >> >> This patch implements a cayman_umad. There are some things I'm unsure of >> though. >> >> The UMUL for Cayman is compiled to, as far as I can tell, >> ALU_OP2_MULLO_INT and >> not ALU_OP2_MULLO_UINT. So I do not know if I should use the int or the >> uint >> version in cayman_umad. In the patch I used the uint one, because that >> seemed >> the most logical. > > > Probably the use of MULLO_INT for UMUL on cayman is just a typo, AFAIK > MULLO_UINT should be used.
Ok, I will send a patch for that as well then. > >> >> The add part of UMAD I copied from tgsi_umad and that had a loop around >> the >> variable lasti, but the variable lasti is usally not used in cayman >> specific code. >> > > The only difference with umad on cayman is in the mul part - each MULLO_UINT > should be expanded to 4 slots on cayman. Add part doesn't need any changes. > > >> This is used in tgsi functions. >> int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask); > > > This is used to determine last written vector component from the write mask, > so that if tgsi instruction doesn't write e.g. W component, we don't have to > emit R600 instruction(s) for that component. > > >> >> But in cayman specific code this is used instead. >> int last_slot = (inst->Dst[0].Register.WriteMask & 0x8) ? 4 : 3; > > > This is used for instructions like RECIP_xxx (see the comment at > r600_shader.c:40) that should be expanded to 3 slots with optional 4th slot > if the write to the W component is required, but MULLO_UINT is different - > it should be expanded to 4 instruction slots always. By the way, it seems > cayman_mul_int_instr is incorrect in this regard. > > >> >> It does not work to switch lasti with last_slot, since that makes the loop >> run too >> many times (in my test case lasti is 0 and last_slot is 3). So I just >> removed the >> loop, was that correct or should i resolve that in some other way? > > > No, it's not correct, there should be a loop over the vector components for > addition as well - it should be performed in the same way as on the > pre-cayman chips. In your patch you are only performing the addition for one > component. > > Basically, the only required change for UMAD on cayman is that you need to > expand each one-slot MULLO_xx on pre-cayman into 4 instruction slots on > cayman. Should I keep the cayman_umad function or should I modify tgsi_umad and add the cayman specific part there? > Vadim > > >> >> Martin Andersson (1): >> r600g: Add a Cayman specific version of UMAD >> >> src/gallium/drivers/r600/r600_shader.c | 47 >> +++++++++++++++++++++++++++++++++- >> 1 file changed, 46 insertions(+), 1 deletion(-) >> > //Martin _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev