On Thu, Nov 19, 2015 at 3:11 PM, Matt Turner <matts...@gmail.com> wrote: > On Thu, Nov 19, 2015 at 11:35 AM, Connor Abbott <cwabbo...@gmail.com> wrote: >> On Thu, Nov 19, 2015 at 2:07 PM, Connor Abbott <cwabbo...@gmail.com> wrote: >>> On Thu, Nov 19, 2015 at 1:54 PM, Matt Turner <matts...@gmail.com> wrote: >>>> On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott <cwabbo...@gmail.com> wrote: >>>>> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner <matts...@gmail.com> wrote: >>>>>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga <ito...@igalia.com> >>>>>> wrote: >>>>>>> From: Connor Abbott <connor.w.abb...@intel.com> >>>>>>> >>>>>>> It appears that not only math instructions, but also MOV_BYTES or >>>>>>> any instruction that uses Align1 mode cannot be in the middle >>>>>>> of a dependency control sequence or the GPU will hang (at least on my >>>>>>> BDW). This fixes GPU hangs in some fp64 tests. >>>>>> >>>>>> I'm pretty surprised by this assessment. Doubtful even. >>>>>> >>>>>>> Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> >>>>>>> --- >>>>>>> src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 ++++++++++++----- >>>>>>> 1 file changed, 12 insertions(+), 5 deletions(-) >>>>>>> >>>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>>>>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>>>>>> index 3bcd5cb..bc0a33b 100644 >>>>>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>>>>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>>>>>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const >>>>>>> vec4_instruction *inst) >>>>>>> } >>>>>>> >>>>>>> /* >>>>>>> + * Instructions that use Align1 mode cause the GPU to hang when >>>>>>> inserted >>>>>>> + * between a NoDDClr and NoDDChk in Align16 mode. Discovered >>>>>>> empirically. >>>>>>> + */ >>>>>>> + >>>>>>> + if (inst->opcode == VEC4_OPCODE_PACK_BYTES || >>>>>>> + inst->opcode == VEC4_OPCODE_MOV_BYTES || >>>>>> >>>>>> PACK_BYTES sets depctrl itself in the generator, and at the time I >>>>>> added it I made a test that did >>>>>> >>>>>> vec4 foo = vec4(packUnorm4x8(...), >>>>>> packUnorm4x8(...), >>>>>> packUnorm4x8(...), >>>>>> packUnorm4x8(...)) >>>>>> >>>>>> and confirmed that it set depctrl properly on the whole sequence. >>>>>> There could of course be bugs somewhere, but the "hardware doesn't >>>>>> work if you mix align1 and align16 depctrl" seems unlikely. >>>>>> >>>>>> Do you know of a test that this affects? >>>>> >>>>> This only affects FP64 tests, since there we use an align1 mov to do >>>>> double-to-float and float-to-double. However, I tried commenting out >>>>> emit_nir_code() and just doing essentially: >>>>> >>>>> emit(MOV(...))->force_writemask_all = true; >>>>> emit(VEC4_OPCODE_PACK_BYTES, ...); >>>>> emit(MOV(...))->force_writemask_all = true; >>>>> >>>>> and on my BDW it hanged. In case it's not clear: this isn't about >>>>> setting depctrl on the instruction itself, it just can't be inside of >>>>> a depctrl sequence (which we were already disallowing for math >>>>> instructions anyways). >>>> >>>> Very weird. I'll take a look. So I understand, are the MOV >>>> instructions writing different channels of the same register? And >>>> VEC4_OPCODE_PACK_BYTES is writing to a different or the same register >>>> as the MOVs? (I saw your fixup reply) >>> >>> Actually, I had them writing the same thing so the second overwrote >>> the first one. >> >> Err, just to be clear... in the actual test that led to me discovering >> this, the instructions (not mov's but a bfi and a mov IIRC) were >> writing different components of the same register. In my hacked-up >> test to see if align1 in between a depctrl sequence was the problem, I >> just had them both write to the same thing since it was easier. In any >> case, I don't think it should matter too much. > > Interesting. If it was a BFI and a MOV writing different components of > the same register, I can see that failing. Though I haven't measured, > 3-src instructions usually have 2+ extra cycles of latency and with a > 2-cycle issue time, a BFI followed immediately by a MOV would mean > that both of them retired in the same cycle. That would seem bad if > one of them was supposed to signal the scoreboard. > ×
They weren't right next to each other, though -- there was a sequence of 5-6 instructions between them, one of which happens to be an align1 mov which (according to my testing) hangs the GPU in between any depctrl sequence. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev