It seems overkill to use LLVM IR for BFE. There are several ways to express BFE(value, offset, bits) exactly, for example:
# if bits == 0: # 0 # else if offset + bits < 32: # (value << (32 - offset - bits)) >> (32 - bits) # else: # value >> offset Which can be simplified to either: # both = offset + bits # if bits == 0: # 0 # else if both < 32: # (value << (32 - both)) >> (32 - bits) # else: # value >> offset Or: # if bits == 0: # 0 # else if offset + bits < 32: # inv_bits = 32 - bits # (value << (inv_bits - offset)) >> inv_bits # else: # value >> offset All 3 variants should use ASHR if the signed version is required. Alternatively, one can use this as well: # if bits == 32: # value # else: # (value >> offset) & ((1 << bits) - 1) And an explicit sign extension should be used for the signed version. The first three are so complex that I doubt it would be easy to recognize them and match them precisely. They are also very unlikely to appear open-coded in real applications, therefore, expanding them and then matching them again seems like a huge waste of time. Apps will likely just use LSHR+AND, which extracts a bitfield, but it's not the same as BFE, so BFE can't be used for that. (if the second operand of AND is a constant expression, then BFE can indeed be used) I agree that we could use LLVM IR for simple instructions like BFM and BFI, but I doubt it's a good idea for complex instructions like BFE, and any small optimization that changes the expression can break the matching. Marek On Tue, Mar 10, 2015 at 3:30 PM, Tom Stellard <t...@stellard.net> wrote: > On Tue, Mar 10, 2015 at 12:42:38PM +0100, Marek Olšák wrote: >> OK. What about patches 8 an 9? >> > > I think the intrinsics in 9 are OK, but 8 should be using LLVM IR. > > -Tom > >> Marek >> >> On Thu, Mar 5, 2015 at 8:30 PM, Tom Stellard <t...@stellard.net> wrote: >> > On Mon, Mar 02, 2015 at 12:54:21PM +0100, Marek Olšák wrote: >> >> From: Marek Olšák <marek.ol...@amd.com> >> >> >> > >> > Hi Marek, >> > >> > After discussing with Matt, I think we should use LLVM IR rather than >> > intrinsics for IBFE and UBFE and then add patterns for them either in >> > the TableGen Files or AMDGPUISelDAGToDAG.cpp. >> > >> > Using intrinsics for BREV and POPC is fine though. >> > >> > -Tom >> > >> >> --- >> >> src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c | 8 ++++++++ >> >> 1 file changed, 8 insertions(+) >> >> >> >> diff --git a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c >> >> b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c >> >> index 385d3ad..034095f 100644 >> >> --- a/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c >> >> +++ b/src/gallium/drivers/radeon/radeon_setup_tgsi_llvm.c >> >> @@ -1293,6 +1293,8 @@ void radeon_llvm_context_init(struct >> >> radeon_llvm_context * ctx) >> >> bld_base->op_actions[TGSI_OPCODE_AND].emit = emit_and; >> >> bld_base->op_actions[TGSI_OPCODE_ARL].emit = emit_arl; >> >> bld_base->op_actions[TGSI_OPCODE_BGNLOOP].emit = bgnloop_emit; >> >> + bld_base->op_actions[TGSI_OPCODE_BREV].emit = >> >> build_tgsi_intrinsic_nomem; >> >> + bld_base->op_actions[TGSI_OPCODE_BREV].intr_name = >> >> "llvm.AMDGPU.brev"; >> >> bld_base->op_actions[TGSI_OPCODE_BRK].emit = brk_emit; >> >> bld_base->op_actions[TGSI_OPCODE_CEIL].emit = >> >> build_tgsi_intrinsic_nomem; >> >> bld_base->op_actions[TGSI_OPCODE_CEIL].intr_name = "ceil"; >> >> @@ -1326,6 +1328,8 @@ void radeon_llvm_context_init(struct >> >> radeon_llvm_context * ctx) >> >> bld_base->op_actions[TGSI_OPCODE_FSNE].emit = emit_fcmp; >> >> bld_base->op_actions[TGSI_OPCODE_IABS].emit = >> >> build_tgsi_intrinsic_nomem; >> >> bld_base->op_actions[TGSI_OPCODE_IABS].intr_name = >> >> "llvm.AMDIL.abs."; >> >> + bld_base->op_actions[TGSI_OPCODE_IBFE].emit = >> >> build_tgsi_intrinsic_nomem; >> >> + bld_base->op_actions[TGSI_OPCODE_IBFE].intr_name = >> >> "llvm.AMDGPU.bfe.i32"; >> >> bld_base->op_actions[TGSI_OPCODE_IDIV].emit = emit_idiv; >> >> bld_base->op_actions[TGSI_OPCODE_IF].emit = if_emit; >> >> bld_base->op_actions[TGSI_OPCODE_UIF].emit = uif_emit; >> >> @@ -1350,6 +1354,8 @@ void radeon_llvm_context_init(struct >> >> radeon_llvm_context * ctx) >> >> bld_base->op_actions[TGSI_OPCODE_MOD].emit = emit_mod; >> >> bld_base->op_actions[TGSI_OPCODE_NOT].emit = emit_not; >> >> bld_base->op_actions[TGSI_OPCODE_OR].emit = emit_or; >> >> + bld_base->op_actions[TGSI_OPCODE_POPC].emit = >> >> build_tgsi_intrinsic_nomem; >> >> + bld_base->op_actions[TGSI_OPCODE_POPC].intr_name = "llvm.ctpop.i32"; >> >> bld_base->op_actions[TGSI_OPCODE_POW].emit = >> >> build_tgsi_intrinsic_nomem; >> >> bld_base->op_actions[TGSI_OPCODE_POW].intr_name = "llvm.pow.f32"; >> >> bld_base->op_actions[TGSI_OPCODE_ROUND].emit = >> >> build_tgsi_intrinsic_nomem; >> >> @@ -1389,6 +1395,8 @@ void radeon_llvm_context_init(struct >> >> radeon_llvm_context * ctx) >> >> bld_base->op_actions[TGSI_OPCODE_TRUNC].emit = >> >> build_tgsi_intrinsic_nomem; >> >> bld_base->op_actions[TGSI_OPCODE_TRUNC].intr_name = >> >> "llvm.AMDGPU.trunc"; >> >> bld_base->op_actions[TGSI_OPCODE_UADD].emit = emit_uadd; >> >> + bld_base->op_actions[TGSI_OPCODE_UBFE].emit = >> >> build_tgsi_intrinsic_nomem; >> >> + bld_base->op_actions[TGSI_OPCODE_UBFE].intr_name = >> >> "llvm.AMDGPU.bfe.u32"; >> >> bld_base->op_actions[TGSI_OPCODE_UDIV].emit = emit_udiv; >> >> bld_base->op_actions[TGSI_OPCODE_UMAX].emit = >> >> build_tgsi_intrinsic_nomem; >> >> bld_base->op_actions[TGSI_OPCODE_UMAX].intr_name = >> >> "llvm.AMDGPU.umax"; >> >> -- >> >> 2.1.0 >> >> >> >> _______________________________________________ >> >> mesa-dev mailing list >> >> mesa-dev@lists.freedesktop.org >> >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev