Re: [PATCH]{AArch64] Implemented half-precision vget/vset_lane_f16 intrinsics

Jiangning Liu Thu, 05 Dec 2013 01:17:25 -0800

Ana,

LGTM now!


Thanks,
-Jiangning



2013/12/5 Ana Pazos <[email protected]>

> Hi Jiangning,
>
>
>
> In the example you mentioned the dup instruction is the best pattern
> because the test case simply returns the result of vget intrinsic as a
> float.
>
>
>
> But we cannot guarantee that vset/vget will use the preferred mapping from
> the ARM manual nor that the preferred mapping is the best code always,
> because it depends on how the half-precision FP value is used before or
> after the vget/vset intrinsics calls (which may cause other patterns like
> UMOV, INS to be applied).
>
>
>
> What I decided to do:
>
> I added a couple of additional patterns that include fp16_to_f32 operation
> in the vector extract/insert and vector copy expressions. This way I can
> optimize the test case you mentioned and a couple more cases I could think
> of.
>
>
>
> See the updated patches attached.
>
>
>
> Thanks,
>
> Ana.
>
>
>
>
>
> *From:* Jiangning Liu [mailto:[email protected]]
> *Sent:* Wednesday, December 04, 2013 1:21 AM
> *To:* Ana Pazos
> *Cc:* llvm-commits; [email protected]; Tim Northover
> *Subject:* Re: [PATCH]{AArch64] Implemented half-precision
> vget/vset_lane_f16 intrinsics
>
>
>
> Hi Ana,
>
>
>
> Can we generate more optimized code by using instruction DUP Hd,Vn.H[lane]
>  directly.
>
>
>
> With your patch we are generating two instructions like,
>
>
>
>           umov w0, v0.h[3]
>
>           fmov   s0, w0
>
>
>
> Actually they can be combined to a single "dup h0, v0.h[3]". This is what
> we want for this intrinsic function.
>
>
>
> What you need to do is by adding one more pattern to match this case.
>
>
>
> Thanks,
>
> -Jiangning
>
>
>
>
>
> 2013/12/4 Ana Pazos <[email protected]>
>
> Hi Tim and reviewers,
>
>
>
> I followed Tim’s suggestion to use macros to handle the float16_t storage
> type.
>
>
>
> Notes:
>
> 1)      Since the vget/vset_lane_f16 intrinsics read and write 16 bit
> data (no FP arithmetic performed), I simply reinterpreted float16_t and the
> vector of float16_t as i16 data.
>
> See the operators defined in NeonEmitter.
>
> 2)      With this change vget/vset_lane_f16 use to vget/vset_lane_i16
> implementation.
>
> 3)      I added f16_to_f32 pattern because in the vset_lane case the i16
> data is moved from GPR to a FP register, and hence the need for this
> pattern.
>
> 4)      I added test cases that define float16_t variable in the function
> body, but do not return such value type as it is not allowed. Let me know
> if these tests are satisfactory.
>
> 5)      I did not try to enforce the recommended intrinsic->instruction
> map from the ARM document.
>
> To force those instructions I would have to use builtins and v1i16 type
> casts so I can create the pattern.
>
> Even doing that, in some cases the UMOV, INS patterns defined earlier can
> prevail over.
>
>
>
> Thanks,
>
> Ana.
>
>
> _______________________________________________
> cfe-commits mailing list
> [email protected]
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>
>
>
>
>
> --
>
> Thanks,
>
> -Jiangning
>



-- 
Thanks,
-Jiangning

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH]{AArch64] Implemented half-precision vget/vset_lane_f16 intrinsics

Reply via email to