On 22 January 2016 at 18:06, Alan Lawrence <alan.lawre...@foss.arm.com> wrote:
> On 20/01/16 21:10, Christophe Lyon wrote:
>>
>> On 19 January 2016 at 15:51, Alan Lawrence <alan.lawre...@foss.arm.com>
>> wrote:
>>>
>>> On 19/01/16 11:15, Christophe Lyon wrote:
>>>
>>>>>> For neon_vdupn, I chose to implement neon_vdup_nv4hf and
>>>>>> neon_vdup_nv8hf instead of updating the VX iterator because I thought
>>>>>> it was not desirable to impact neon_vrev32<mode>.
>>>>>
>>>>> Well, the same instruction will suffice for vrev32'ing vectors of HF
>>>>> just as well as vectors of HI, so I think I'd argue that's harmless
>>>>> enough. To gain the benefit, we'd need to update arm_evpc_neon_vrev
>>>>> with a few new cases, though.
>>>>>
>>>> Since this is more intrusive, I'd rather leave that part for later. OK?
>>>
>>> Sure.
>>>
>>>>>> +#ifdef __ARM_BIG_ENDIAN
>>>>>> +  /* Here, 3 is (4-1) where 4 is the number of lanes.  This is also
>>>>>> +     the right value for vectors with 8 lanes.  */
>>>>>> +#define __arm_lane(__vec, __idx) (__idx ^ 3)
>>>>>> +#else
>>>>>> +#define __arm_lane(__vec, __idx) __idx
>>>>>> +#endif
>>>>>> +
>>>>>
>>>>> Looks right, but sounds... my concern here is that I'm hoping at some
>>>>> point we will move the *other* vget/set_lane intrinsics to use GCC
>>>>> vector extensions too. At which time (unlike __aarch64_lane, which can
>>>>> be used everywhere) this will be the wrong formula. Can we name
>>>>> (and/or comment) it to avoid misleading anyone? The key characteristic
>>>>> seems to be that it is for vectors of 16-bit elements only.
>>>>>
>>>> I'm not sure I follow, here. Looking at the patterns for
>>>> neon_vget_lane<mode>_*internal in neon.md, I can see 2 flavours: one
>>>> for VD, one for VQ2. The latter uses "halfelts".
>>>>
>>>> Do you prefer that I create 2 macros (say __arm_lane and __arm_laneq)
>>>> that would be similar to the aarch64 ones (by computing the number of
>>>> lanes of the input vector), but the "q" one would use half the total
>>>> number of lanes instead?
>>>
>>> That works for me! Sthg like:
>>>
>>> #define __arm_lane(__vec, __idx) NUM_LANES(__vec) - __idx
>>> #define __arm_laneq(__vec, __idx) (__idx & (NUM_LANES(__vec)/2)) +
>>>                                   (NUM_LANES(__vec)/2 - __idx)
>>> // or similarly
>>> #define __arm_laneq(__vec, __idx) (__idx ^ (NUM_LANES(__vec)/2 - 1))
>>>
>>> Alternatively I'd been thinking
>>>
>>> #define __arm_lane_32xN(__idx) __idx ^ 1
>>> #define __arm_lane_16xN(__idx) __idx ^ 3
>>> #define __arm_lane_8xN(__idx) __idx ^ 7
>>>
>>> Bear in mind PR64893 that we had on AArch64 :-(
>>>
>>
>> Here is a new version, based on the comments above.
>> I've also removed the addition of the arm_fp_ok effective target since
>> I added that in my other testsuite patch.
>>
>> OK now?
>
> Yes. I realize my worry about PR64893 doesn't apply here since we pass
> constant lane numbers / vector bounds into __builtin_arm_lane_check.
> Thanks!
>
Thanks, I guess I still have to wait for Kyrill/Ramana's approval.

> --Alan
>
>>
>> Thanks,
>>
>> Christophe
>>
>>> Cheers, Alan