On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott <cwabbo...@gmail.com> wrote: > On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner <matts...@gmail.com> wrote: >> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott <cwabbo...@gmail.com> wrote: >>> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner <matts...@gmail.com> wrote: >>>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga <ito...@igalia.com> >>>> wrote: >>>>> From: Connor Abbott <connor.w.abb...@intel.com> >>>>> >>>>> It appears that not only math instructions, but also MOV_BYTES or >>>>> any instruction that uses Align1 mode cannot be in the middle >>>>> of a dependency control sequence or the GPU will hang (at least on my >>>>> BDW). This fixes GPU hangs in some fp64 tests. >>>> >>>> I'm pretty surprised by this assessment. Doubtful even. >>>> >>>>> Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> >>>>> --- >>>>> src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 ++++++++++++----- >>>>> 1 file changed, 12 insertions(+), 5 deletions(-) >>>>> >>>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>>>> index 3bcd5cb..bc0a33b 100644 >>>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>>>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const >>>>> vec4_instruction *inst) >>>>> } >>>>> >>>>> /* >>>>> + * Instructions that use Align1 mode cause the GPU to hang when >>>>> inserted >>>>> + * between a NoDDClr and NoDDChk in Align16 mode. Discovered >>>>> empirically. >>>>> + */ >>>>> + >>>>> + if (inst->opcode == VEC4_OPCODE_PACK_BYTES || >>>>> + inst->opcode == VEC4_OPCODE_MOV_BYTES || >>>> >>>> PACK_BYTES sets depctrl itself in the generator, and at the time I >>>> added it I made a test that did >>>> >>>> vec4 foo = vec4(packUnorm4x8(...), >>>> packUnorm4x8(...), >>>> packUnorm4x8(...), >>>> packUnorm4x8(...)) >>>> >>>> and confirmed that it set depctrl properly on the whole sequence. >>>> There could of course be bugs somewhere, but the "hardware doesn't >>>> work if you mix align1 and align16 depctrl" seems unlikely. >>>> >>>> Do you know of a test that this affects? >>> >>> This only affects FP64 tests, since there we use an align1 mov to do >>> double-to-float and float-to-double. However, I tried commenting out >>> emit_nir_code() and just doing essentially: >>> >>> emit(MOV(...))->force_writemask_all = true; >>> emit(VEC4_OPCODE_PACK_BYTES, ...); >>> emit(MOV(...))->force_writemask_all = true; >>> >>> and on my BDW it hanged. In case it's not clear: this isn't about >>> setting depctrl on the instruction itself, it just can't be inside of >>> a depctrl sequence (which we were already disallowing for math >>> instructions anyways). >> >> Very weird. I'll take a look. So I understand, are the MOV >> instructions writing different channels of the same register? And >> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register >> as the MOVs? (I saw your fixup reply) > > Actually, I had them writing the same thing so the second overwrote > the first one.
Err, just to be clear... in the actual test that led to me discovering this, the instructions (not mov's but a bfi and a mov IIRC) were writing different components of the same register. In my hacked-up test to see if align1 in between a depctrl sequence was the problem, I just had them both write to the same thing since it was easier. In any case, I don't think it should matter too much. > The PACK_BYTES/MOV_BYTES/F2D/D2F (I think I tested all > of them, not sure) were operating on completely different registers, > and in the FP64 test that actually hung the GPU they were as well. > Using d2f since it's simpler and I remember what the operands are > (it's just an align1 mov with a dest stride of 2), the test code was > something like: > > mov g50, g51 { no_dd_clear } > d2f g52, g54 > mov g50, g53 { no_dd_check } > > and changing the d2f to a normal align16 mov or commenting it out > prevented the hang. It would be interesting to see if a math > instruction instead of d2f also hangs. > >> >> By the way, the math code is too heavy handed as far as I know. The >> BDW+ docs say that the MATH instruction itself cannot take dependency >> control hints (and empirically earlier platforms seem to have problems >> with this as well, see >> tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a >> math instruction being in the middle of a NoDDC* block. The person who >> implemented the math did the minimal amount of work to fix the >> problem. >> >> The PRM also says: >> >> """ >> Instructions other than send, may use this control as long as >> operations that have different pipeline latencies are not mixed. The >> operations that have longer latencies are: >> >> Opcodes pln, lrp, dp*. >> Operations involving double precision computation. >> Integer DW multiplication where both source operands are DWs. >> """ >> >> I would say that mixing a double-precision operation and something >> else might cause problems, but that seems like we have a different >> problem thus far. > > Yeah, these are all just mov's so I would expect that section to > apply. It still seems like we're not taking it into account, though... _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev