On Thu, Nov 19, 2015 at 7:31 AM, Connor Abbott <cwabbo...@gmail.com> wrote:
> On Thu, Nov 19, 2015 at 6:40 AM, Matt Turner <matts...@gmail.com> wrote:
>> On Thu, Nov 19, 2015 at 2:05 AM, Iago Toral Quiroga <ito...@igalia.com> 
>> wrote:
>>> From: Connor Abbott <connor.w.abb...@intel.com>
>>>
>>> It appears that not only math instructions, but also MOV_BYTES or
>>> any instruction that uses Align1 mode cannot be in the middle
>>> of a dependency control sequence or the GPU will hang (at least on my
>>> BDW). This fixes GPU hangs in some fp64 tests.
>>
>> I'm pretty surprised by this assessment. Doubtful even.
>>
>>> Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 17 ++++++++++++-----
>>>  1 file changed, 12 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> index 3bcd5cb..bc0a33b 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
>>> @@ -838,6 +838,17 @@ vec4_visitor::is_dep_ctrl_unsafe(const 
>>> vec4_instruction *inst)
>>>     }
>>>
>>>     /*
>>> +    * Instructions that use Align1 mode cause the GPU to hang when inserted
>>> +    * between a NoDDClr and NoDDChk in Align16 mode. Discovered 
>>> empirically.
>>> +    */
>>> +
>>> +   if (inst->opcode == VEC4_OPCODE_PACK_BYTES ||
>>> +       inst->opcode == VEC4_OPCODE_MOV_BYTES ||
>>
>> PACK_BYTES sets depctrl itself in the generator, and at the time I
>> added it I made a test that did
>>
>>   vec4 foo = vec4(packUnorm4x8(...),
>>                   packUnorm4x8(...),
>>                   packUnorm4x8(...),
>>                   packUnorm4x8(...))
>>
>> and confirmed that it set depctrl properly on the whole sequence.
>> There could of course be bugs somewhere, but the "hardware doesn't
>> work if you mix align1 and align16 depctrl" seems unlikely.
>>
>> Do you know of a test that this affects?
>
> This only affects FP64 tests, since there we use an align1 mov to do
> double-to-float and float-to-double. However, I tried commenting out
> emit_nir_code() and just doing essentially:
>
> emit(MOV(...))->force_writemask_all = true;
> emit(VEC4_OPCODE_PACK_BYTES, ...);
> emit(MOV(...))->force_writemask_all = true;
>
> and on my BDW it hanged. In case it's not clear: this isn't about
> setting depctrl on the instruction itself, it just can't be inside of
> a depctrl sequence (which we were already disallowing for math
> instructions anyways).

Very weird. I'll take a look. So I understand, are the MOV
instructions writing different channels of the same register? And
VEC4_OPCODE_PACK_BYTES is writing to a different or the same register
as the MOVs? (I saw your fixup reply)

By the way, the math code is too heavy handed as far as I know. The
BDW+ docs say that the MATH instruction itself cannot take dependency
control hints (and empirically earlier platforms seem to have problems
with this as well, see
tests/shaders/dependency-hints/exp2.shader_test) -- nothing about a
math instruction being in the middle of a NoDDC* block. The person who
implemented the math did the minimal amount of work to fix the
problem.

The PRM also says:

"""
Instructions other than send, may use this control as long as
operations that have different pipeline latencies are not mixed. The
operations that have longer latencies are:

Opcodes pln, lrp, dp*.
Operations involving double precision computation.
Integer DW multiplication where both source operands are DWs.
"""

I would say that mixing a double-precision operation and something
else might cause problems, but that seems like we have a different
problem thus far.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to