[PATCH] D80716: [AArch64]: BFloat Load/Store Intrinsics&CodeGen

Luke Geeson via Phabricator via cfe-commits Tue, 02 Jun 2020 08:48:12 -0700

LukeGeeson added a subscriber: pratlucas.
LukeGeeson added inline comments.



================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:10366
+    auto Alignment = CGM.getNaturalPointeeTypeAlignment(
+        E->getArg(0)->IgnoreParenCasts()->getType());
     Ops[0] = Builder.CreateBitCast(Ops[0], llvm::PointerType::getUnqual(VTy));
----------------
simon_tatham wrote:
> What effect is this change of strategy having on the alignment computation, 
> for the already-supported instances of this builtin?
> 
> It //looks// to me as if `__builtin_neon_vld1_v` with (say) a `uint8_t *` 
> pointer argument will now compute `Alignment=1` (the natural alignment of the 
> pointee type), whereas it would previously have computed `Alignment=8` (the 
> size of the whole vector being loaded or stored).
> 
> Is that intentional? Or accidental? Or have I completely misunderstood what's 
> going on?
> 
> (Whichever of the three, some discussion in the commit message would be a 
> good idea, explaining why this change does or does not make a difference, as 
> appropriate.)
Clang was incorrectly assuming that all the pointers from which loads were 
being generated for vld1 intrinsics were aligned according to according to the 
intrinsics result type. This causes alignment faults on the code generated by 
the backend.
This fixes the issue so that alignment is based on the type of the pointer 
provided to as input to the intrinsic.
 
@pratlucas has done some work on this in parallel 
https://reviews.llvm.org/D79721 which has been approved and may overrule this 
particular line of code. 

I shall add a note to the commit message, and tentatively mark this as fixed, 
given it's liable to adopt the work of Lucas.



================
Comment at: clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c:36
+// CHECK32: ret <4 x bfloat> %vld1_lane
+
+bfloat16x8_t test_vld1q_lane_bf16(bfloat16_t const *ptr, bfloat16x8_t src) {
----------------
labrinea wrote:
> CHECK-NEXT or CHECK-DAG are preferable for sequences.
Added to be consistent with the rest of the file (ie no CHECK-NEXT, but 
CHECK32/64)


================
Comment at: clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c:180
+// %.fca.0.2.insert = insertvalue %struct.bfloat16x4x3_t %.fca.0.1.insert, <4 
x bfloat> %vld3_lane.fca.2.extract, 0, 2
+
+bfloat16x8x3_t test_vld3q_lane_bf16(bfloat16_t const *ptr, bfloat16x8x3_t src) 
{
----------------
labrinea wrote:
> where are the check lines?
Added to be consistent with the rest of the file


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80716/new/

https://reviews.llvm.org/D80716



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D80716: [AArch64]: BFloat Load/Store Intrinsics&CodeGen

Reply via email to