On 09.11.2016 at 14:55, Nicolai Hähnle wrote:
> On 09.11.2016 14:46, Roland Scheidegger wrote:
>> Reviewed-by: Roland Scheidegger <[email protected]>
>>
>> I'm curious though: for radeonsi, is zext not equivalent to interleaving
>> the low 32 bits of each number with zeros (and hence doing the
>> uninterleave doesn't give you back the low and high bits, respectively)?
>
> radeonsi ends up calling lp_build_mul_32_lohi once for each component of
> the vector, with no vector types involved at all; it's all plain i32 and
> i64. This is better for the CodeGen backend in LLVM.
>
> The code was previously returning a <1 x i32> for each component, and
> that caused the later stages of opcode handling to get confused about
> what to do with the result.

So casting the result to a scalar would have fixed this too?
This is actually sort of a deficiency of lp_build_uninterleave1 - the
lp_build_xx code is usually built to avoid 1xn vector types and use
scalars in that case, but not quite everywhere.

Roland

>
> Cheers,
> Nicolai
>
>>
>> Roland
>>
>> On 09.11.2016 at 12:46, Nicolai Hähnle wrote:
>>> From: Nicolai Hähnle <[email protected]>
>>>
>>> The fix in commit 88f791db75e9f065bac8134e0937e1b76600aa36 was insufficient
>>> for radeonsi because the vector case was not handled properly. It seems
>>> piglit only covers the scalar case, unfortunately.
>>>
>>> Fixes GL45-CTS.shader_bitfield_operation.[iu]mulExtended.*
>>> ---
>>>  src/gallium/auxiliary/gallivm/lp_bld_arit.c | 20 ++++++++++++--------
>>>  1 file changed, 12 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>> index 43ad238..5553cb1 100644
>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
>>> @@ -1230,42 +1230,46 @@ lp_build_mul_32_lohi_cpu(struct lp_build_context *bld,
>>>   * Emits generic code.
>>>   */
>>>  LLVMValueRef
>>>  lp_build_mul_32_lohi(struct lp_build_context *bld,
>>>                       LLVMValueRef a,
>>>                       LLVMValueRef b,
>>>                       LLVMValueRef *res_hi)
>>>  {
>>>     struct gallivm_state *gallivm = bld->gallivm;
>>>     LLVMBuilderRef builder = gallivm->builder;
>>> -   LLVMValueRef tmp;
>>> +   LLVMValueRef tmp, shift, res_lo;
>>>     struct lp_type type_tmp;
>>> -   LLVMTypeRef wide_type, cast_type;
>>> +   LLVMTypeRef wide_type, narrow_type;
>>>
>>>     type_tmp = bld->type;
>>> +   narrow_type = lp_build_vec_type(gallivm, type_tmp);
>>>     type_tmp.width *= 2;
>>>     wide_type = lp_build_vec_type(gallivm, type_tmp);
>>> -   type_tmp = bld->type;
>>> -   type_tmp.length *= 2;
>>> -   cast_type = lp_build_vec_type(gallivm, type_tmp);
>>> +   shift = lp_build_const_vec(gallivm, type_tmp, 32);
>>>
>>>     if (bld->type.sign) {
>>>        a = LLVMBuildSExt(builder, a, wide_type, "");
>>>        b = LLVMBuildSExt(builder, b, wide_type, "");
>>>     } else {
>>>        a = LLVMBuildZExt(builder, a, wide_type, "");
>>>        b = LLVMBuildZExt(builder, b, wide_type, "");
>>>     }
>>>     tmp = LLVMBuildMul(builder, a, b, "");
>>> -   tmp = LLVMBuildBitCast(builder, tmp, cast_type, "");
>>> -   *res_hi = lp_build_uninterleave1(gallivm, bld->type.length * 2, tmp, 1);
>>> -   return lp_build_uninterleave1(gallivm, bld->type.length * 2, tmp, 0);
>>> +
>>> +   res_lo = LLVMBuildTrunc(builder, tmp, narrow_type, "");
>>> +
>>> +   /* Since we truncate anyway, LShr and AShr are equivalent. */
>>> +   tmp = LLVMBuildLShr(builder, tmp, shift, "");
>>> +   *res_hi = LLVMBuildTrunc(builder, tmp, narrow_type, "");
>>> +
>>> +   return res_lo;
>>>  }
>>>
>>>
>>>  /* a * b + c */
>>>  LLVMValueRef
>>>  lp_build_mad(struct lp_build_context *bld,
>>>               LLVMValueRef a,
>>>               LLVMValueRef b,
>>>               LLVMValueRef c)
>>>  {
>>>
>>
_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
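
For readers following the thread, here is a rough scalar model in plain C of what the patched generic path appears to compute for a single component: widen both operands to 64 bits, multiply, truncate for the low half, then shift right by 32 and truncate for the high half. This is only a sketch, not Mesa code. The names mul_32_lohi_u and mul_32_lohi_s are made up for illustration, and the signed variant returns raw bit patterns as uint32_t to sidestep implementation-defined conversions. The assert inside the signed variant checks the claim from the patch comment that a logical and an arithmetic shift agree once the result is truncated to 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the unsigned case (zext, mul, trunc/lshr). */
static uint32_t
mul_32_lohi_u(uint32_t a, uint32_t b, uint32_t *res_hi)
{
   uint64_t wide = (uint64_t)a * (uint64_t)b;   /* zext both, then mul */

   *res_hi = (uint32_t)(wide >> 32);            /* lshr by 32, then trunc */
   return (uint32_t)wide;                       /* trunc gives the low half */
}

/* Hypothetical scalar model of the signed case (sext, mul, trunc/lshr).
 * Results are returned as bit patterns. */
static uint32_t
mul_32_lohi_s(int32_t a, int32_t b, uint32_t *res_hi)
{
   /* sext both, then mul; |a * b| <= 2^62, so this cannot overflow. */
   int64_t wide = (int64_t)a * (int64_t)b;
   uint64_t bits = (uint64_t)wide;

   /* Logical shift: the upper 32 bits of the result are zero. */
   uint64_t lshr = bits >> 32;
   /* Emulated arithmetic shift: the upper 32 bits copy the sign bit. */
   uint64_t ashr = lshr | (wide < 0 ? 0xFFFFFFFF00000000ull : 0);

   /* The two differ only in bits that the truncation discards. */
   assert((uint32_t)lshr == (uint32_t)ashr);

   *res_hi = (uint32_t)lshr;
   return (uint32_t)bits;
}
```

For example, mul_32_lohi_u(0xFFFFFFFF, 0xFFFFFFFF, &hi) returns 1 with hi set to 0xFFFFFFFE, and mul_32_lohi_s(-2, 3, &hi) returns 0xFFFFFFFA with hi set to 0xFFFFFFFF.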
