[PATCH][AArch64] Implemented half-precision vget/vset_lane_f16 intrinsics

Ana Pazos Tue, 03 Dec 2013 12:04:08 -0800

Hi Tim and reviewers,


I followed Tim's suggestion to use macros to handle the float16_t storage
type.

 

Notes:

1)      Since the vget/vset_lane_f16 intrinsics only read and write 16-bit
data (no FP arithmetic is performed), I simply reinterpreted float16_t and
the vector of float16_t as i16 data.

See the operators defined in NeonEmitter.

2)      With this change, vget/vset_lane_f16 reuse the vget/vset_lane_i16
implementation.

3)      I added an f16_to_f32 pattern because in the vset_lane case the i16
data is moved from a GPR to an FP register, which is why this pattern is
needed.

4)      I added test cases that define a float16_t variable in the function
body but do not return a value of that type, since that is not allowed. Let
me know if these tests are satisfactory.

5)      I did not try to enforce the recommended intrinsic->instruction map
from the ARM document. 

To force those instructions I would have to use builtins and v1i16 type
casts so I can create the pattern.

Even then, in some cases the UMOV and INS patterns defined earlier can still
take precedence.
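
To illustrate notes 1 and 2, here is a minimal, portable C sketch of the
idea. It is not the code in the patch or in arm_neon.h: every name below
(f16_bits_t, f16x4_t, i16x4_t, get_lane_i16, set_lane_i16, GET_LANE_F16,
SET_LANE_F16) is a hypothetical stand-in, and the vectors are modelled as
plain structs so the example builds on any host. It only shows that the f16
lane accessors can be macros that reinterpret the bits and delegate to the
i16 versions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real arm_neon.h definitions. */
typedef uint16_t f16_bits_t;                     /* raw storage of one half value */
typedef struct { f16_bits_t lane[4]; } f16x4_t;  /* models float16x4_t */
typedef struct { int16_t lane[4]; } i16x4_t;     /* models int16x4_t */

/* Models of the existing i16 lane accessors. */
static inline int16_t get_lane_i16(i16x4_t v, int n) { return v.lane[n]; }
static inline i16x4_t set_lane_i16(int16_t x, i16x4_t v, int n) {
    v.lane[n] = x;
    return v;
}

/* Bit-preserving reinterpretations: no FP conversion takes place. */
static inline int16_t f16_bits_to_i16(f16_bits_t x) {
    int16_t r; memcpy(&r, &x, sizeof r); return r;
}
static inline f16_bits_t i16_to_f16_bits(int16_t x) {
    f16_bits_t r; memcpy(&r, &x, sizeof r); return r;
}
static inline i16x4_t as_i16x4(f16x4_t v) {
    i16x4_t r; memcpy(&r, &v, sizeof r); return r;
}
static inline f16x4_t as_f16x4(i16x4_t v) {
    f16x4_t r; memcpy(&r, &v, sizeof r); return r;
}

/* The f16 accessors are thin macros over the i16 ones. */
#define GET_LANE_F16(v, n) \
    i16_to_f16_bits(get_lane_i16(as_i16x4(v), (n)))
#define SET_LANE_F16(x, v, n) \
    as_f16x4(set_lane_i16(f16_bits_to_i16((f16_bits_t)(x)), as_i16x4(v), (n)))
```

For example, with lanes holding the half-precision encodings 0x3C00, 0x4000,
0x4200, 0x4400, GET_LANE_F16(v, 2) yields 0x4200 unchanged, and
SET_LANE_F16(0xC000, v, 1) replaces only lane 1.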

 

Thanks,

Ana.

0001-llvm-Implemented-vget-vset_lane_f16-intrinsics.patch
Description: Binary data

0001-clang-Implemented-vget-vset_lane_f16-intrinsics.patch
Description: Binary data

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
