Ana, LGTM now!
Thanks, -Jiangning 2013/12/5 Ana Pazos <[email protected]> > Hi Jiangning, > > > > In the example you mentioned the dup instruction is the best pattern > because the test case simply returns the result of vget intrinsic as a > float. > > > > But we cannot guarantee that vset/vget will use the preferred mapping from > the ARM manual nor that the preferred mapping is the best code always, > because it depends on how the half-precision FP value is used before or > after the vget/vset intrinsics calls (which may cause other patterns like > UMOV, INS to be applied). > > > > What I decided to do: > > I added a couple of additional patterns that include fp16_to_f32 operation > in the vector extract/insert and vector copy expressions. This way I can > optimize the test case you mentioned and a couple more cases I could think > of. > > > > See the updated patches attached. > > > > Thanks, > > Ana. > > > > > > *From:* Jiangning Liu [mailto:[email protected]] > *Sent:* Wednesday, December 04, 2013 1:21 AM > *To:* Ana Pazos > *Cc:* llvm-commits; [email protected]; Tim Northover > *Subject:* Re: [PATCH]{AArch64] Implemented half-precision > vget/vset_lane_f16 intrinsics > > > > Hi Ana, > > > > Can we generate more optimized code by using instruction DUP Hd,Vn.H[lane] > directly. > > > > With your patch we are generating two instructions like, > > > > umov w0, v0.h[3] > > fmov s0, w0 > > > > Actually they can be combined to a single "dup h0, v0.h[3]". This is what > we want for this intrinsic function. > > > > What you need to do is by adding one more pattern to match this case. > > > > Thanks, > > -Jiangning > > > > > > 2013/12/4 Ana Pazos <[email protected]> > > Hi Tim and reviewers, > > > > I followed Tim’s suggestion to use macros to handle the float16_t storage > type. > > > > Notes: > > 1) Since the vget/vset_lane_f16 intrinsics read and write 16 bit > data (no FP arithmetic performed), I simply reinterpreted float16_t and the > vector of float16_t as i16 data. > > See the operators defined in NeonEmitter. > > 2) With this change vget/vset_lane_f16 use to vget/vset_lane_i16 > implementation. > > 3) I added f16_to_f32 pattern because in the vset_lane case the i16 > data is moved from GPR to a FP register, and hence the need for this > pattern. > > 4) I added test cases that define float16_t variable in the function > body, but do not return such value type as it is not allowed. Let me know > if these tests are satisfactory. > > 5) I did not try to enforce the recommended intrinsic->instruction > map from the ARM document. > > To force those instructions I would have to use builtins and v1i16 type > casts so I can create the pattern. > > Even doing that, in some cases the UMOV, INS patterns defined earlier can > prevail over. > > > > Thanks, > > Ana. > > > _______________________________________________ > cfe-commits mailing list > [email protected] > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits > > > > > > -- > > Thanks, > > -Jiangning > -- Thanks, -Jiangning
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
