Right now we rely on the code at the bottom of brw_set_dest to set the correct 
execution size for anything that does not operate on a full SIMD register 
(dst.width < BRW_EXECUTE_8). However, this won't work with fp64, where operands 
are twice as big and we see instructions with a horizontal width of 4 that 
still require an execution size of 8. We cannot fix this by simply checking the 
type of the operands involved and skip the automatic execsize adjustment when 
they are doubles because we can also operate on doubles as integers (for pack 
and unpack operations for example).

Connor and I have been discussing this and we see two options:

1) We fix all the instructions that do not operate on doubles to set the
correct execution size so they don't depend on the code in brw_set_dest any 
more, then remove that code from brw_set_dest so it does not break fp64. 
However, this involves a lot of changes throughout the driver and across all 
hardware platforms... it looks rather scary, so I think we probably don't want 
to go down this path.

2) Since we only produce double instructions with a width of 4, we can limit 
the change to these instructions. This means that we only automatically adjust 
execsize in brw_set_dest for instructions where dst.width < BRW_EXECUTE_4 and 
we fix driver code that emits non-double instructions with a width of 4 to set 
the execsize they need explicitly. For gen7/8 it seems that only a handful of 
changes are required for this according to piglit.

This RFC series implements 2). As mentioned, piglit seems happy with the 
changes, but it might be a good idea to run this through Jenkins to make sure 
that we are not missing anything.

My original idea was to expand the fix to cover other gens as well, but 
unfortunately, this would involve many more changes. For example, ILK's 
strip-and-fans or clipping modules are full with code that mixes width8 and 
width4 instructions (I have ~10 patches that fix this that involve changing the 
default execution size to 4 in the sf module), gen6 also needs a few changes 
and I would not be surprised if gen4 required more. Gen9 might require changes 
too. Another problem with this is that we don't have gen4/9 hardware available, 
so trying this would also require involvement from other people with access to 
this hardware.

So considering that we are not targetting fp64 support on gens < 7, at least 
for now, it seems that implementing 2) but special-case it for gen7 and gen8 
only, leaving other generations intact is a resoanble way to go for now.

Opinions?

Iago Toral Quiroga (5):
  i965/eu: set correct execution size in brw_NOP
  i965/fs: set execution size for SEND messages in
    generate_uniform_pull_constant_load_gen7
  i965/eu: set execution size for SEND message in
    brw_send_indirect_message
  i965: set correct execsize for MOVS with a width of 4 in
    brw_find_live_channel
  i965: Skip execution size adjustment for instructions of width 4

 src/mesa/drivers/dri/i965/brw_eu_emit.c        | 21 ++++++++++++++++++++-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  2 ++
 2 files changed, 22 insertions(+), 1 deletion(-)

-- 
2.1.4

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to