On Mon, Jul 4, 2016 at 3:09 PM, Matthew Wahab <matthew.wa...@foss.arm.com> wrote:
> On 18/05/16 01:58, Joseph Myers wrote:
>> On Tue, 17 May 2016, Matthew Wahab wrote:
>>
>>> As with the VFP FP16 arithmetic instructions, operations on __fp16
>>> values are done by conversion to single-precision. Any new optimization
>>> supported by the instruction descriptions can only apply to code
>>> generated using intrinsics added in this patch series.
>>
>> As with the scalar instructions, I think it is legitimate in most cases to
>> optimize arithmetic via single precision to work direct on __fp16 values
>> (and this would be natural for vectorization of __fp16 arithmetic).
>>
>>> A number of the instructions are modelled as two variants, one using
>>> UNSPEC and the other using RTL operations, with the model used decided
>>> by the funsafe-math-optimizations flag. This follows the
>>> single-precision instructions and is due to the half-precision
>>> operations having the same conditions and restrictions on their use in
>>> optimizations (when they are enabled).
>>
>> (Of course, these restrictions still apply.)
>
> The F16 support generally follows the F32 implementation and, for F32,
> direct arithmetic vector operations are only available when
> unsafe-math-optimizations is enabled. I want to check the behaviour of
> the F16 operations when unsafe-math is enabled so I'll defer to a follow
> up patch the change to use standard names for the vector operations.
>
> There are still some changes from the previous patch:
>
> - Two fma/fmsub patterns *fma<VH:mode>4 and *fmsub<VH:mode>4 are
>   dropped since they just duplicated *fma<VH:mode>4_intrinsic and
>   *fmsub<VH:mode>4_intrinsic.
>
> - Patterns neon_vadd<mode>_unspec and neon_vsub<mode>_unspec are
>   dropped, they were redundant.
>
> - <absneg_str><mode>2_fp16 is renamed to <absneg_str><mode>2. This
>   implements the abs and neg operations which are always safe to use.
>
> - neon_vsqrte<mode> is renamed to neon_vrsqrte<mode>. This is a
>   misspelled intrinsic that wasn't caught in testing because the
>   relevant test case is missing. The intrinsic is fixed here and in
>   other patches and an advsimd-intrinsics test added later in the
>   (updated) series.
>
> - neon_vcvt<sup>_n<mode>: The bounds on the scalar were wrong, the
>   correct range for f16 is 0-17.
>
> - Test armv8_2-fp16-arith-1.c is updated to expect f16 arithmetic
>   instructions rather than f32 and to use the neon command line options.
>
> Tested the series for arm-none-linux-gnueabihf with native bootstrap and
> make check and for arm-none-eabi and armeb-none-eabi with make check on
> an ARMv8.2-A emulator.
>
> Ok for trunk?

OK.

Ramana

> Matthew
>
> 2016-07-04  Matthew Wahab  <matthew.wa...@arm.com>
>
> 	* config/arm/iterators.md (VCVTHI): New.
> 	(NEON_VCMP): Add UNSPEC_VCLT and UNSPEC_VCLE. Fix a long line.
> 	(NEON_VAGLTE): New.
> 	(VFM_LANE_AS): New.
> 	(VH_CVTTO): New.
> 	(V_reg): Add HF, V4HF and V8HF. Fix white-space.
> 	(V_HALF): Add V4HF. Fix white-space.
> 	(V_if_elem): Add HF, V4HF and V8HF. Fix white-space.
> 	(V_s_elem): Likewise.
> 	(V_sz_elem): Fix white-space.
> 	(V_elem_ch): Likewise.
> 	(VH_elem_ch): New.
> 	(scalar_mul_constraint): Add V8HF and V4HF.
> 	(Is_float_mode): Fix white-space.
> 	(Is_d_reg): Fix white-space.
> 	(q): Add HF. Fix white-space.
> 	(float_sup): New.
> 	(float_SUP): New.
> 	(cmp_op_unsp): Add UNSPEC_VCALE and UNSPEC_VCALT.
> 	(neon_vfm_lane_as): New.
> 	* config/arm/neon.md (add<mode>3_fp16): New.
> 	(sub<mode>3_fp16): New.
> 	(mul<mode>3add<mode>_neon): New.
> 	(fma<VH:mode>4_intrinsic): New.
> 	(fmsub<VCVTF:mode>4_intrinsic): Fix white-space.
> 	(fmsub<VH:mode>4_intrinsic): New.
> 	(<absneg_str><mode>2): New.
> 	(neon_v<absneg_str><mode>): New.
> 	(neon_v<fp16_rnd_str><mode>): New.
> 	(neon_vrsqrte<mode>): New.
> 	(neon_vpaddv4hf): New.
> 	(neon_vadd<mode>): New.
> 	(neon_vsub<mode>): New.
> 	(neon_vmulf<mode>): New.
> 	(neon_vfma<VH:mode>): New.
> 	(neon_vfms<VH:mode>): New.
> 	(neon_vc<cmp_op><mode>): New.
> 	(neon_vc<cmp_op><mode>_fp16insn): New.
> 	(neon_vc<cmp_op_unsp><mode>_fp16insn_unspec): New.
> 	(neon_vca<cmp_op><mode>): New.
> 	(neon_vca<cmp_op><mode>_fp16insn): New.
> 	(neon_vca<cmp_op_unsp><mode>_fp16insn_unspec): New.
> 	(neon_vc<cmp_op>z<mode>): New.
> 	(neon_vabd<mode>): New.
> 	(neon_v<maxmin>f<mode>): New.
> 	(neon_vp<maxmin>fv4hf): New.
> 	(neon_<fmaxmin_op><mode>): New.
> 	(neon_vrecps<mode>): New.
> 	(neon_vrsqrts<mode>): New.
> 	(neon_vrecpe<mode>): New (VH variant).
> 	(neon_vdup_lane<mode>_internal): New.
> 	(neon_vdup_lane<mode>): New.
> 	(neon_vcvt<sup><mode>): New (VCVTHI variant).
> 	(neon_vcvt<sup><mode>): New (VH variant).
> 	(neon_vcvt<sup>_n<mode>): New (VH variant).
> 	(neon_vcvt<sup>_n<mode>): New (VCVTHI variant).
> 	(neon_vcvt<vcvth_op><sup><mode>): New.
> 	(neon_vmul_lane<mode>): New.
> 	(neon_vmul_n<mode>): New.
> 	* config/arm/unspecs.md (UNSPEC_VCALE): New.
> 	(UNSPEC_VCALT): New.
> 	(UNSPEC_VFMA_LANE): New.
> 	(UNSPEC_VFMS_LANE): New.
>
> testsuite/
> 2016-07-04  Matthew Wahab  <matthew.wa...@arm.com>
>
> 	* gcc.target/arm/armv8_2-fp16-arith-1.c: Use arm_v8_2a_fp16_neon
> 	options. Add tests for float16x4_t and float16x8_t.
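
For readers unfamiliar with the remark quoted above that operations on
__fp16 values "are done by conversion to single-precision", the following is
a minimal, portable C sketch of that idea: widen each half-precision operand
to float, do the arithmetic in float, then narrow the result. It is not code
from the patch and not how GCC implements it. The helper names
(half_to_float, float_to_half, half_add_via_float) are hypothetical, and the
sketch assumes the IEEE binary16 storage format with round-to-nearest-even
on narrowing.

```c
#include <stdint.h>
#include <string.h>

/* Widen an IEEE binary16 bit pattern to float.  This is always exact. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {                    /* Inf or NaN */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {                 /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {                 /* signed zero */
        bits = sign;
    } else {                               /* subnormal: normalise first */
        uint32_t shift = 0;
        while ((man & 0x400u) == 0) { man <<= 1; shift++; }
        bits = sign | ((113u - shift) << 23) | ((man & 0x3FFu) << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow a float to binary16, rounding to nearest, ties to even. */
uint16_t float_to_half(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t mag  = x & 0x7FFFFFFFu;

    if (mag > 0x7F800000u)                 /* NaN -> a quiet NaN */
        return (uint16_t)(sign | 0x7E00u);
    if (mag >= 0x477FF000u)                /* Inf, or rounds past 65504 */
        return (uint16_t)(sign | 0x7C00u);
    if (mag <= 0x33000000u)                /* at most 2^-25: rounds to zero */
        return (uint16_t)sign;

    int e = (int)(mag >> 23) - 127;        /* unbiased exponent, -25..15 */
    uint32_t m = (mag & 0x007FFFFFu) | 0x00800000u;  /* 24-bit significand */
    int shift = (e >= -14) ? 13 : 13 + (-14 - e);    /* extra for subnormals */
    uint32_t q   = m >> shift;
    uint32_t rem = m & ((1u << shift) - 1u);
    uint32_t tie = 1u << (shift - 1);

    if (rem > tie || (rem == tie && (q & 1u)))
        q++;                               /* round up; carry is handled below */

    /* For normals q still holds the implicit bit (0x400), so adding it to
       (e + 14) << 10 yields the biased exponent and absorbs any carry.  */
    uint32_t base = (e >= -14) ? (uint32_t)(e + 14) << 10 : 0u;
    return (uint16_t)(sign | (base + q));
}

/* Half-precision add performed via single precision. */
uint16_t half_add_via_float(uint16_t a, uint16_t b)
{
    return float_to_half(half_to_float(a) + half_to_float(b));
}
```

With these helpers, half_add_via_float(0x3C00, 0x3C00) returns 0x4000,
i.e. 1.0 + 1.0 = 2.0 in binary16. Instructions that operate directly on
half-precision values avoid the widening and narrowing steps shown here.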