Hi Jiangning,
In the example you mentioned the dup instruction is the best pattern because the test case simply returns the result of vget intrinsic as a float. But we cannot guarantee that vset/vget will use the preferred mapping from the ARM manual nor that the preferred mapping is the best code always, because it depends on how the half-precision FP value is used before or after the vget/vset intrinsics calls (which may cause other patterns like UMOV, INS to be applied). What I decided to do: I added a couple of additional patterns that include fp16_to_f32 operation in the vector extract/insert and vector copy expressions. This way I can optimize the test case you mentioned and a couple more cases I could think of. See the updated patches attached. Thanks, Ana. From: Jiangning Liu [mailto:[email protected]] Sent: Wednesday, December 04, 2013 1:21 AM To: Ana Pazos Cc: llvm-commits; [email protected]; Tim Northover Subject: Re: [PATCH]{AArch64] Implemented half-precision vget/vset_lane_f16 intrinsics Hi Ana, Can we generate more optimized code by using instruction DUP Hd,Vn.H[lane] directly. With your patch we are generating two instructions like, umov w0, v0.h[3] fmov s0, w0 Actually they can be combined to a single "dup h0, v0.h[3]". This is what we want for this intrinsic function. What you need to do is by adding one more pattern to match this case. Thanks, -Jiangning 2013/12/4 Ana Pazos <[email protected]> Hi Tim and reviewers, I followed Tim's suggestion to use macros to handle the float16_t storage type. Notes: 1) Since the vget/vset_lane_f16 intrinsics read and write 16 bit data (no FP arithmetic performed), I simply reinterpreted float16_t and the vector of float16_t as i16 data. See the operators defined in NeonEmitter. 2) With this change vget/vset_lane_f16 use to vget/vset_lane_i16 implementation. 3) I added f16_to_f32 pattern because in the vset_lane case the i16 data is moved from GPR to a FP register, and hence the need for this pattern. 4) I added test cases that define float16_t variable in the function body, but do not return such value type as it is not allowed. Let me know if these tests are satisfactory. 5) I did not try to enforce the recommended intrinsic->instruction map from the ARM document. To force those instructions I would have to use builtins and v1i16 type casts so I can create the pattern. Even doing that, in some cases the UMOV, INS patterns defined earlier can prevail over. Thanks, Ana. _______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits -- Thanks, -Jiangning
0002-clang-Implemented-vget-vset_lane_f16-intrinsics.patch
Description: Binary data
0002-llvm-Implemented-vget-vset_lane_f16-intrinsics.patch
Description: Binary data
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
