It's been some time since we sent the first version of the patches, so here is a v2, which adds:
1. Feedback from Curro on v1. I think the only thing missing is the suggestion to change the semantics of the offset() helper in vec4 to match those in the scalar backend. I sent this as a separate series [1] that is still awaiting review. Once that is good to land we should adapt this series accordingly.

2. Adaptations to the sub-register offsets work done by Curro in master.

3. Some rudimentary support for 64-bit spilling. This is quite limited at the moment, since it skips spilling of fp64 data in a number of cases where it is not currently safe to do so. I guess we can look for ways to improve this going forward, but I'd rather do that after we land the bulk of fp64, since the series is already quite big as it is.

4. Avoid scalarizing a number of swizzle combinations that we can support natively.

5. Many other small clean-ups and fixes.

The series is available for testing in the 'i965-fp64-gen7-scalar-vec4-rc2' branch of our GitHub repository [2].

This series implements the bulk of the fp64 align16 backend support and creates the infrastructure to implement vertex attrib 64bit as well, so once this lands in master we plan to send additional series that add VA64 for Haswell, and then fp64 and VA64 for IvyBridge.
[1] https://lists.freedesktop.org/archives/mesa-dev/2016-October/130459.html
[2] https://github.com/Igalia/mesa/tree/i965-fp64-gen7-scalar-vec4-rc2

Connor Abbott (6):
  i965/vec4/nir: simplify glsl_type_for_nir_alu_type()
  i965/vec4/nir: allocate two registers for dvec3/dvec4
  i965/vec4/nir: set the right type for 64-bit registers
  i965/vec4: add support for printing DF immediates
  i965: add brw_vecn_grf()
  i965/vec4: don't constant propagate 64-bit immediates

Iago Toral Quiroga (92):
  i965/vec4/nir: Add bit-size information to types
  i965/vec4/nir: support doubles in ALU operations
  i965/vec4/nir: fix emitting 64-bit immediates
  i965/vec4: add double/float conversion pseudo-opcodes
  i965/vec4: translate d2f/f2d
  i965: fix subnr overflow in suboffset()
  i965/vec4: set correct register regions for 32-bit and 64-bit
  i965/disasm: align16 DF source regions have a width of 2
  i965/vec4: We only support 32-bit integer ALU operations for now
  i965/vec4: add dst_null_df()
  i965/vec4: add VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT opcodes
  i965/vec4: add VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes
  i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW,HIGH}_32BIT
  i965/vec4: don't copy propagate vector opcodes that operate in align1 mode
  i965/vec4: implement double unpacking
  i965/vec4: implement double packing
  i965/vec4/nir: implement double comparisons
  i965/vec4: fix base offset for nir_registers with doubles
  i965/vec4: fix indentation in get_nir_src()
  i965/vec4: fix get_nir_dest() to use DF type for 64-bit destinations
  i965/vec4: make opt_vector_float ignore doubles
  i965/vec4: fix register allocation for 64-bit undef sources
  i965/vec4: Rename DF to/from F generator opcodes
  i965/vec4: add helpers for conversions to/from doubles
  i965/vec4: implement hardware workaround for align16 double to float conversion
  i965/vec4: implement d2i, d2u, i2d and u2d
  i965/vec4: implement d2b
  i965/vec4: implement fsign() for doubles
  i965/vec4: fix optimize predicate for doubles
  i965/vec4: add a helper function to create double immediates
  i965: move exec_size from fs_instruction to backend_instruction
  i965/vec4: fix size_written for doubles
  i965/vec4: fix regs_read() for doubles
  i965/vec4: use the IR's execution size
  i965/vec4: dump the instruction execution size
  i965/vec4: add a horiz_offset() helper
  i965: move the group field from fs_inst to backend_instruction.
  i965/vec4: add a SIMD lowering pass
  i965/vec4: make the generator set correct NibCtrl for SIMD4 DF instructions
  i965/vec4: dump NibCtrl for instructions with execsize != 8
  i965/disasm: print NibCtrl for instructions with execsize < 8
  i965/vec4: teach CSE about exec_size, group and doubles
  i965/vec4: teach cmod propagation about different execution sizes
  i965/vec4: split double-precision bcsel
  i965/vec4: add a scalarization pass for double-precision instructions
  i965/vec4: translate 64-bit swizzles to 32-bit
  i965/vec4: implement access to DF source components Z/W
  i965/disasm: fix subreg for dst in Align16 mode
  i965/vec4: teach register coalescing about 64-bit
  i965/vec4: fix pack_uniform_registers for doubles
  i965/vec4: fix indentation in pack_uniform_registers
  i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands
  i965/vec4/nir: do not emit 64-bit MAD
  i965/vec4: do not emit 64-bit MAD
  i965/vec4: support multiple dispatch widths and groups in the IR builder.
  i965/vec4: Add a shuffle_64bit_data helper
  i965/vec4: Fix UBO loads for 64-bit data
  i965/vec4: Fix SSBO loads for 64-bit data
  i965/vec4: Fix SSBO stores for 64-bit data
  i965/vec4: prevent copy-propagation from values with a different type size
  i965/vec4: Prevent copy propagation from violating pre-gen8 restrictions
  i965/vec4: don't propagate single-precision uniforms into 4-wide instructions
  i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8 platforms
  i965/vec4: Do not use DepCtrl with 64-bit instructions
  i965/vec4: do not split scratch read/write opcodes
  i965/vec4: fix scratch offset for 64bit data
  i965/vec4: fix scratch reads for 64bit data
  i965/vec4: fix scratch writes for 64bit data
  i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64-bit data
  i965/vec4: fix indentation in move_push_constants_to_pull_constants()
  i965/vec4: fix move_push_constants_to_pull_constants() for 64-bit data
  i965/vec4: make emit_pull_constant_load support 64-bit loads
  i965/vec4: fix indentation in lower_attributes_to_hw_regs()
  i965/vec4: fix attribute setup for doubles
  i965/vec4: fix store output for 64-bit types
  i965/vec4/tcs: fix input loading for 64-bit data
  i965/vec4/tcs: fix outputs for 64-bit data
  i965/vec4/tes: fix input loading for 64bit data types
  i965/vec4/tes: fix setup_payload() for 64bit data types
  i965/vec4/tes: consider register offsets during attribute setup
  i965/vec4: dump subnr for FIXED_GRF
  i965/vec4: split instructions that read 64-bit interleaved attributes
  i965/vec4/scalarize_df: do not scalarize swizzles that we can support natively
  i965/vec4/scalarize_df: support more swizzles via vstride=0
  i965/vec4: prevent src/dst hazards during 64-bit register allocation
  i965/vec4: run scalarize_df() after spilling
  i965/vec4: support basic spilling of 64-bit registers
  i965/vec4: avoid spilling of registers that mix 32-bit and 64-bit access
  i965/vec4: prevent spilling of DOUBLE_TO_SINGLE destination
  i965/vec4: adjust spilling costs for 64-bit registers.
  i965/vec4: enable ARB_gpu_shader_fp64 for Haswell
  i965/gen7: expose OpenGL 4.0 on Haswell

Juan A. Suarez Romero (1):
  i965/vec4: handle 32 and 64 bit channels in liveness analysis

Samuel Iglesias Gonsálvez (4):
  i965/nir: double/dvec2 uniforms only need to be padded to a single vec4 slot
  i965/vec4: use the new helper function to create double immediates
  i965/vec4: don't copy propagate misaligned registers
  i965/vec4/gs: fix input loading for 64bit data

 src/mesa/drivers/dri/i965/brw_defines.h            |   6 +
 src/mesa/drivers/dri/i965/brw_disasm.c             |  13 +-
 src/mesa/drivers/dri/i965/brw_ir_fs.h              |  16 -
 src/mesa/drivers/dri/i965/brw_ir_vec4.h            |  47 ++
 src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp     |   3 +-
 src/mesa/drivers/dri/i965/brw_reg.h                |  24 +-
 src/mesa/drivers/dri/i965/brw_shader.cpp           |  12 +
 src/mesa/drivers/dri/i965/brw_shader.h             |  16 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp             | 769 ++++++++++++++++++---
 src/mesa/drivers/dri/i965/brw_vec4.h               |  24 +
 src/mesa/drivers/dri/i965/brw_vec4_builder.h       |  39 +-
 .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp |   4 +-
 .../drivers/dri/i965/brw_vec4_copy_propagation.cpp |  59 ++
 src/mesa/drivers/dri/i965/brw_vec4_cse.cpp         |  33 +-
 .../dri/i965/brw_vec4_dead_code_eliminate.cpp      |  28 +-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp   | 104 +++
 src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp      |  51 +-
 .../drivers/dri/i965/brw_vec4_live_variables.cpp   |  32 +-
 .../drivers/dri/i965/brw_vec4_live_variables.h     |  15 +-
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp         | 611 +++++++++++++---
 .../drivers/dri/i965/brw_vec4_reg_allocate.cpp     |  85 ++-
 src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp         |  65 +-
 src/mesa/drivers/dri/i965/brw_vec4_tes.cpp         |  97 ++-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp     | 168 ++++-
 src/mesa/drivers/dri/i965/intel_extensions.c       |   5 +
 src/mesa/drivers/dri/i965/intel_screen.c           |   2 +-
 26 files changed, 1984 insertions(+), 344 deletions(-)

--
2.7.4