This patch adds the initial support for the 16-bit floating point formats. _Float16 is the IEEE 754 half precision format. __bfloat16 is the Google Brain 16-bit format.
To use _Float16 and __bfloat16, the user has to pass the -mfloat16 option to
enable the support.  In this patch only the machine independent support is
used.  For the types to be usable, the next patch must also be installed; that
patch adds 16-bit floating point support to libgcc.

2025-11-03  Michael Meissner  <[email protected]>

gcc/

	* config/rs6000/float16.md: New file to add basic 16-bit floating
	point support.
	* config/rs6000/predicates.md (easy_fp_constant): Add support for
	HFmode and BFmode constants.
	(fp16_xxspltiw_constant): New predicate.
	* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add support
	for 16-bit floating point types.
	(rs6000_init_builtins): Create the bfloat16_type_node.
	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
	__FLOAT16__ and __BFLOAT16__ if 16-bit floating point is enabled.
	* config/rs6000/rs6000-call.cc (init_cumulative_args): Warn if a
	function returns a 16-bit floating point value unless -Wno-psabi is
	used.
	(rs6000_function_arg): Warn if a 16-bit floating point value is passed
	to a function unless -Wno-psabi is used.
	* config/rs6000/rs6000-protos.h (vec_const_128bit_type): Add mode
	field to detect initializing 16-bit floating constants.
	* config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached): Add
	support for 16-bit floating point.
	(rs6000_modes_tieable_p): Don't allow 16-bit floating point modes to
	tie with other modes.
	(rs6000_debug_reg_global): Add BFmode and HFmode.
	(rs6000_setup_reg_addr_masks): Add support for 16-bit floating point
	types.
	(rs6000_setup_reg_addr_masks): Likewise.
	(rs6000_init_hard_regno_mode_ok): Likewise.
	(rs6000_option_override_internal): Add a check whether -mfloat16 can
	be used.
	(easy_altivec_constant): Add support for 16-bit floating point.
	(xxspltib_constant_p): Likewise.
	(rs6000_expand_vector_init): Likewise.
	(reg_offset_addressing_ok_p): Likewise.
	(rs6000_legitimate_offset_address_p): Likewise.
	(legitimate_lo_sum_address_p): Likewise.
	(rs6000_secondary_reload_simple_move): Likewise.
	(rs6000_preferred_reload_class): Likewise.
	(rs6000_can_change_mode_class): Likewise.
	(rs6000_load_constant_and_splat): Likewise.
	(rs6000_scalar_mode_supported_p): Likewise.
	(rs6000_floatn_mode): Enable _Float16 if -mfloat16.
	(rs6000_opt_masks): Add -mfloat16.
	(constant_fp_to_128bit_vector): Add support for 16-bit floating point.
	(vec_const_128bit_to_bytes): Likewise.
	(constant_generates_xxspltiw): Likewise.
	* config/rs6000/rs6000.h (TARGET_BFLOAT16_HW): New macro.
	(TARGET_FLOAT16_HW): Likewise.
	(TARGET_BFLOAT16_HW_VECTOR): Likewise.
	(TARGET_FLOAT16_HW_VECTOR): Likewise.
	(FP16_SCALAR_MODE_P): Likewise.
	(FP16_HW_SCALAR_MODE_P): Likewise.
	(FP16_VECTOR_MODE_P): Likewise.
	* config/rs6000/rs6000.md (wd): Add BFmode and HFmode.
	* config/rs6000/rs6000.opt (-mfloat16): New option.
---
 gcc/config/rs6000/float16.md        | 124 +++++++++++++++++++
 gcc/config/rs6000/predicates.md     |  26 ++++
 gcc/config/rs6000/rs6000-builtin.cc |  20 +++
 gcc/config/rs6000/rs6000-c.cc       |   6 +
 gcc/config/rs6000/rs6000-call.cc    |  20 +++
 gcc/config/rs6000/rs6000-protos.h   |   1 +
 gcc/config/rs6000/rs6000.cc         | 181 ++++++++++++++++++++++++----
 gcc/config/rs6000/rs6000.h          |  26 ++++
 gcc/config/rs6000/rs6000.md         |   3 +
 gcc/config/rs6000/rs6000.opt        |   4 +
 10 files changed, 388 insertions(+), 23 deletions(-)
 create mode 100644 gcc/config/rs6000/float16.md

diff --git a/gcc/config/rs6000/float16.md b/gcc/config/rs6000/float16.md
new file mode 100644
index 00000000000..fec4bb87fd0
--- /dev/null
+++ b/gcc/config/rs6000/float16.md
@@ -0,0 +1,124 @@
+;; Machine description for IBM RISC System 6000 (POWER) for GNU C compiler
+;; Copyright (C) 1990-2025 Free Software Foundation, Inc.
+;; Contributed by Richard Kenner ([email protected])
+
+;; This file is part of GCC.
+ +;; GCC is free software; you can redistribute it and/or modify it +;; under the terms of the GNU General Public License as published +;; by the Free Software Foundation; either version 3, or (at your +;; option) any later version. + +;; GCC is distributed in the hope that it will be useful, but WITHOUT +;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +;; or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public +;; License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; <http://www.gnu.org/licenses/>. + +;; Support for _Float16 (HFmode) and __bfloat16 (BFmode) + +;; Mode iterator for 16-bit floating point modes both as a scalar and +;; as a vector. +(define_mode_iterator FP16 [(BF "TARGET_FLOAT16") + (HF "TARGET_FLOAT16")]) + +;; Mode attribute giving the vector mode for a 16-bit floating point +;; scalar in both upper and lower case. +(define_mode_attr FP16_VECTOR8 [(BF "V8BF") + (HF "V8HF")]) + +(define_mode_attr fp16_vector8 [(BF "v8bf") + (HF "v8hf")]) + +;; _Float16 and __bfloat16 moves +(define_expand "mov<mode>" + [(set (match_operand:FP16 0 "nonimmediate_operand") + (match_operand:FP16 1 "any_operand"))] + "" +{ + if (MEM_P (operands[0]) && !REG_P (operands[1])) + operands[1] = force_reg (<MODE>mode, operands[1]); +}) + +;; On power10, we can load up HFmode and BFmode constants with xxspltiw +;; or pli. +(define_insn "*mov<mode>_xxspltiw" + [(set (match_operand:FP16 0 "gpc_reg_operand" "=wa,wa,?r,?r") + (match_operand:FP16 1 "fp16_xxspltiw_constant" "j,eP,j,eP"))] + "TARGET_PREFIXED || operands[1] == CONST0_RTX (<MODE>mode)" +{ + rtx op1 = operands[1]; + const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op1); + long real_words[1]; + + if (op1 == CONST0_RTX (<MODE>mode)) + return (!vsx_register_operand (operands[0], <MODE>mode) + ? 
"li %0,0" + : "xxlxor %x0,%x0,%x0"); + + real_to_target (real_words, rtype, <MODE>mode); + operands[2] = GEN_INT (real_words[0]); + return (vsx_register_operand (operands[0], <MODE>mode) + ? "xxspltiw %x0,%2" + : "pli %0,%2"); +} + [(set_attr "type" "veclogical, vecsimple, *, *") + (set_attr "prefixed" "no, yes, no, yes")]) + +(define_insn "*mov<mode>_internal" + [(set (match_operand:FP16 0 "nonimmediate_operand" + "=wa, wa, Z, r, r, + m, r, wa, wa, r") + + (match_operand:FP16 1 "any_operand" + "wa, Z, wa, r, m, + r, wa, r, j, j"))] + "gpc_reg_operand (operands[0], <MODE>mode) + || gpc_reg_operand (operands[1], <MODE>mode)" + "@ + xxlor %x0,%x1,%x1 + lxsihzx %x0,%y1 + stxsihx %x1,%y0 + mr %0,%1 + lhz%U1%X1 %0,%1 + sth%U0%X0 %1,%0 + mfvsrwz %0,%x1 + mtvsrwz %x0,%1 + xxlxor %x0,%x0,%x0 + li %0,0" + [(set_attr "type" "vecsimple, fpload, fpstore, *, load, + store, mtvsr, mfvsr, veclogical, *") + (set_attr "isa" "*, p9v, p9v, *, *, + *, p8v, p8v, p9v, *")]) + +;; Vector duplicate +(define_insn "*vecdup<mode>_reg" + [(set (match_operand:<FP16_VECTOR8> 0 "altivec_register_operand" "=v") + (vec_duplicate:<FP16_VECTOR8> + (match_operand:FP16 1 "altivec_register_operand" "v")))] + "" + "vsplth %0,%1,3" + [(set_attr "type" "vecperm")]) + +(define_insn "*vecdup<mode>_const" + [(set (match_operand:<FP16_VECTOR8> 0 "vsx_register_operand" "=wa,wa") + (vec_duplicate:<FP16_VECTOR8> + (match_operand:FP16 1 "fp16_xxspltiw_constant" "j,eP")))] + "TARGET_PREFIXED || operands[1] == CONST0_RTX (<MODE>mode)" +{ + rtx op1 = operands[1]; + if (op1 == CONST0_RTX (<MODE>mode)) + return "xxlxor %x0,%x0,%x0"; + + const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op1); + long real_words[1]; + + real_to_target (real_words, rtype, <MODE>mode); + operands[2] = GEN_INT (real_words[0]); + return "xxspltiw %x0,%2"; +} + [(set_attr "type" "veclogical,vecperm") + (set_attr "prefixed" "*,yes")]) diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 
647e89afb6a..e9ddc61e3a8 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -601,6 +601,11 @@ (define_predicate "easy_fp_constant" if (TARGET_VSX && op == CONST0_RTX (mode)) return 1; + /* Power9 needs to load HFmode constants from memory, Power10 can use + XXSPLTIW. */ + if (mode == HFmode && !TARGET_POWER10) + return 0; + /* Constants that can be generated with ISA 3.1 instructions are easy. */ vec_const_128bit_type vsx_const; if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const)) @@ -2166,3 +2171,24 @@ (define_predicate "lowpart_subreg_operator" (and (match_code "subreg") (match_test "subreg_lowpart_offset (mode, GET_MODE (SUBREG_REG (op))) == SUBREG_BYTE (op)"))) + +;; Return 1 if this is a 16-bit floating point constant that can be +;; loaded with XXSPLTIW or is 0.0 that can be loaded with XXSPLTIB. +(define_predicate "fp16_xxspltiw_constant" + (match_code "const_double") +{ + if (!FP16_SCALAR_MODE_P (mode)) + return false; + + if (op == CONST0_RTX (mode)) + return true; + + if (!TARGET_PREFIXED) + return false; + + vec_const_128bit_type vsx_const; + if (!vec_const_128bit_to_bytes (op, mode, &vsx_const)) + return false; + + return constant_generates_xxspltiw (&vsx_const); +}) diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc index dfbb7d02157..cdd41f2d6cc 100644 --- a/gcc/config/rs6000/rs6000-builtin.cc +++ b/gcc/config/rs6000/rs6000-builtin.cc @@ -491,6 +491,10 @@ const char *rs6000_type_string (tree type_node) return "voidc*"; else if (type_node == float128_type_node) return "_Float128"; + else if (type_node == float16_type_node) + return "_Float16"; + else if (TARGET_FLOAT16 && type_node == bfloat16_type_node) + return "__bfloat16"; else if (type_node == vector_pair_type_node) return "__vector_pair"; else if (type_node == vector_quad_type_node) @@ -756,6 +760,22 @@ rs6000_init_builtins (void) else ieee128_float_type_node = NULL_TREE; + /* __bfloat16 support. 
*/ + if (TARGET_FLOAT16) + { + if (!bfloat16_type_node) + { + bfloat16_type_node = make_node (REAL_TYPE); + TYPE_PRECISION (bfloat16_type_node) = 16; + SET_TYPE_MODE (bfloat16_type_node, BFmode); + layout_type (bfloat16_type_node); + t = build_qualified_type (bfloat16_type_node, TYPE_QUAL_CONST); + } + + lang_hooks.types.register_builtin_type (bfloat16_type_node, + "__bfloat16"); + } + /* Vector pair and vector quad support. */ vector_pair_type_node = make_node (OPAQUE_TYPE); SET_TYPE_MODE (vector_pair_type_node, OOmode); diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc index 70e6d4b1e6d..b3ace1166f4 100644 --- a/gcc/config/rs6000/rs6000-c.cc +++ b/gcc/config/rs6000/rs6000-c.cc @@ -586,6 +586,12 @@ rs6000_target_modify_macros (bool define_p, if ((flags & OPTION_MASK_FLOAT128_HW) != 0) rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__"); + /* 16-bit floating point support. */ + if ((flags & OPTION_MASK_FLOAT16) != 0) + { + rs6000_define_or_undefine_macro (define_p, "__FLOAT16__"); + rs6000_define_or_undefine_macro (define_p, "__BFLOAT16__"); + } /* Tell the user if we are targeting CELL. */ if (rs6000_cpu == PROCESSOR_CELL) rs6000_define_or_undefine_macro (define_p, "__PPU__"); diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc index 8fe5652442e..41c0d4f7159 100644 --- a/gcc/config/rs6000/rs6000-call.cc +++ b/gcc/config/rs6000/rs6000-call.cc @@ -684,6 +684,18 @@ init_cumulative_args (CUMULATIVE_ARGS *cum, tree fntype, " altivec instructions are disabled, use %qs" " to enable them", "-maltivec"); } + + /* Warn that __bfloat16 and _Float16 might be returned differently in the + future. The issue is currently 16-bit floating point is returned in + floating point register #1 in 16-bit format. We may or may not want to + return it as a scalar 64-bit value. 
*/ + if (fntype && warn_psabi && !cum->libcall) + { + machine_mode ret_mode = TYPE_MODE (TREE_TYPE (fntype)); + if (ret_mode == BFmode || ret_mode == HFmode) + warning (OPT_Wpsabi, "%s might be returned differently in the future", + ret_mode == BFmode ? "__bfloat16" : "_Float16"); + } } @@ -1641,6 +1653,14 @@ rs6000_function_arg (cumulative_args_t cum_v, const function_arg_info &arg) return NULL_RTX; } + /* Warn that _Float16 and __bfloat16 might be passed differently in the + future. The issue is currently 16-bit floating point values are passed in + floating point registers in the native 16-bit format. We may or may not + want to pass the value as a scalar 64-bit value. */ + if (warn_psabi && !cum->libcall && (mode == BFmode || mode == HFmode)) + warning (OPT_Wpsabi, "%s might be passed differently in the future", + mode == BFmode ? "__bfloat16" : "_Float16"); + /* Return a marker to indicate whether CR1 needs to set or clear the bit that V.4 uses to say fp args were passed in registers. Assume that we don't need the marker for software floating point, diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 4619142d197..9bf971370d4 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -250,6 +250,7 @@ typedef struct { bool all_words_same; /* Are the words all equal? */ bool all_half_words_same; /* Are the half words all equal? */ bool all_bytes_same; /* Are the bytes all equal? */ + machine_mode mode; /* Original constant mode. 
*/ } vec_const_128bit_type; extern bool vec_const_128bit_to_bytes (rtx, machine_mode, diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 44ab0d5b4ca..93639c3cbc7 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -1896,7 +1896,8 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode) if (ALTIVEC_REGNO_P (regno)) { - if (GET_MODE_SIZE (mode) < 16 && !reg_addr[mode].scalar_in_vmx_p) + if (GET_MODE_SIZE (mode) < 16 && !reg_addr[mode].scalar_in_vmx_p + && !FP16_SCALAR_MODE_P (mode)) return 0; return ALTIVEC_REGNO_P (last_regno); @@ -1986,7 +1987,8 @@ static bool rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2) { if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode - || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode) + || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode + || FP16_SCALAR_MODE_P (mode1) || FP16_SCALAR_MODE_P (mode2)) return mode1 == mode2; if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1)) @@ -2252,6 +2254,8 @@ rs6000_debug_reg_global (void) DImode, TImode, PTImode, + BFmode, + HFmode, SFmode, DFmode, TFmode, @@ -2632,8 +2636,14 @@ rs6000_setup_reg_addr_masks (void) /* SDmode is special in that we want to access it only via REG+REG addressing on power7 and above, since we want to use the LFIWZX and - STFIWZX instructions to load it. */ - bool indexed_only_p = (m == SDmode && TARGET_NO_SDMODE_STACK); + STFIWZX instructions to load it. + + Never allow offset addressing for 16-bit floating point modes, since + it is expected that 16-bit floating point should always go into the + vector registers and we only have indexed and indirect 16-bit loads to + VSR registers. 
*/ + bool indexed_only_p = ((m == SDmode && TARGET_NO_SDMODE_STACK) + || FP16_SCALAR_MODE_P (m)); any_addr_mask = 0; for (rc = FIRST_RELOAD_REG_CLASS; rc <= LAST_RELOAD_REG_CLASS; rc++) @@ -2682,6 +2692,7 @@ rs6000_setup_reg_addr_masks (void) && !complex_p && (m != E_DFmode || !TARGET_VSX) && (m != E_SFmode || !TARGET_P8_VECTOR) + && !FP16_SCALAR_MODE_P (m) && !small_int_vsx_p) { addr_mask |= RELOAD_REG_PRE_INCDEC; @@ -2935,6 +2946,15 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) rs6000_vector_align[V1TImode] = 128; } + /* _Float16 support. */ + if (TARGET_FLOAT16) + { + rs6000_vector_mem[HFmode] = VECTOR_VSX; + rs6000_vector_mem[BFmode] = VECTOR_VSX; + rs6000_vector_align[HFmode] = 16; + rs6000_vector_align[BFmode] = 16; + } + /* DFmode, see if we want to use the VSX unit. Memory is handled differently, so don't set rs6000_vector_mem. */ if (TARGET_VSX) @@ -3049,6 +3069,14 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) reg_addr[TFmode].reload_load = CODE_FOR_reload_tf_di_load; } + if (TARGET_FLOAT16) + { + reg_addr[HFmode].reload_store = CODE_FOR_reload_hf_di_store; + reg_addr[BFmode].reload_store = CODE_FOR_reload_bf_di_store; + reg_addr[HFmode].reload_load = CODE_FOR_reload_hf_di_load; + reg_addr[BFmode].reload_load = CODE_FOR_reload_bf_di_load; + } + /* Only provide a reload handler for SDmode if lfiwzx/stfiwx are available. */ if (TARGET_NO_SDMODE_STACK) @@ -3149,6 +3177,14 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p) reg_addr[TFmode].reload_load = CODE_FOR_reload_tf_si_load; } + if (TARGET_FLOAT16) + { + reg_addr[HFmode].reload_store = CODE_FOR_reload_hf_si_store; + reg_addr[BFmode].reload_store = CODE_FOR_reload_bf_si_store; + reg_addr[HFmode].reload_load = CODE_FOR_reload_hf_si_load; + reg_addr[BFmode].reload_load = CODE_FOR_reload_bf_si_load; + } + /* Only provide a reload handler for SDmode if lfiwzx/stfiwx are available. 
*/ if (TARGET_NO_SDMODE_STACK) @@ -3888,6 +3924,16 @@ rs6000_option_override_internal (bool global_init_p) } } + /* -mfloat16 needs power8 at a minimum in order to load up 16-bit values into + vector registers via loads/stores from GPRs and then using direct + moves. */ + if (TARGET_FLOAT16 && !TARGET_POWER8) + { + rs6000_isa_flags &= ~OPTION_MASK_FLOAT16; + if (rs6000_isa_flags_explicit & OPTION_MASK_FLOAT16) + error ("%qs requires at least %qs", "-mfloat16", "-mcpu=power8"); + } + /* If hard-float/altivec/vsx were explicitly turned off then don't allow the -mcpu setting to enable options that conflict. */ if ((!TARGET_HARD_FLOAT || !TARGET_ALTIVEC || !TARGET_VSX) @@ -6451,9 +6497,12 @@ easy_altivec_constant (rtx op, machine_mode mode) else if (mode != GET_MODE (op)) return 0; - /* V2DI/V2DF was added with VSX. Only allow 0 and all 1's as easy - constants. */ - if (mode == V2DFmode) + /* V2DI/V2DF was added with VSX. Only allow 0 and all 1's as easy constants. + Likewise, don't handle 16-bit floating point constants here, unless they + are 0.0. */ + if (mode == V2DFmode + || FP16_SCALAR_MODE_P (mode) + || FP16_VECTOR_MODE_P (mode)) return zero_constant (op, mode) ? 8 : 0; else if (mode == V2DImode) @@ -6579,6 +6628,12 @@ xxspltib_constant_p (rtx op, /* Handle (vec_duplicate <constant>). */ if (GET_CODE (op) == VEC_DUPLICATE) { + element = XEXP (op, 0); + + /* For V8BFmode & V8HFmode, the only valid constant for xxspltib is 0.0. */ + if (mode == V8BFmode || mode == V8HFmode) + return element == CONST0_RTX (GET_MODE_INNER (mode)); + if (mode != V16QImode && mode != V8HImode && mode != V4SImode && mode != V2DImode) return false; @@ -6595,6 +6650,20 @@ xxspltib_constant_p (rtx op, /* Handle (const_vector [...]). */ else if (GET_CODE (op) == CONST_VECTOR) { + /* For V8BFmode & V8HFmode, the only valid constant for xxspltib is 0.0. 
*/ + if (mode == V8BFmode || mode == V8HFmode) + { + if (op == CONST0_RTX (mode)) + return true; + + rtx zero = CONST0_RTX (GET_MODE_INNER (mode)); + for (i = 0; i < nunits; i++) + if (CONST_VECTOR_ELT (op, i) != zero) + return false; + + return true; + } + if (mode != V16QImode && mode != V8HImode && mode != V4SImode && mode != V2DImode) return false; @@ -7041,6 +7110,15 @@ rs6000_expand_vector_init (rtx target, rtx vals) return; } + /* Special case splats of 16-bit floating point. */ + if (all_same && FP16_VECTOR_MODE_P (mode)) + { + rtx op0 = force_reg (GET_MODE_INNER (mode), XVECEXP (vals, 0, 0)); + rtx dup = gen_rtx_VEC_DUPLICATE (mode, op0); + emit_insn (gen_rtx_SET (target, dup)); + return; + } + /* Special case initializing vector short/char that are splats if we are on 64-bit systems with direct move. */ if (all_same && TARGET_DIRECT_MOVE_64BIT @@ -8701,6 +8779,13 @@ reg_offset_addressing_ok_p (machine_mode mode) return mode_supports_dq_form (mode); break; + /* For 16-bit floating point types, do not allow offset addressing, since + it is assumed that most of the use will be in vector registers, and we + only have reg+reg addressing for 16-bit modes. */ + case E_BFmode: + case E_HFmode: + return false; + /* The vector pair/quad types support offset addressing if the underlying vectors support offset addressing. */ case E_OOmode: @@ -8991,6 +9076,13 @@ rs6000_legitimate_offset_address_p (machine_mode mode, rtx x, extra = 0; switch (mode) { + /* For 16-bit floating point types, do not allow offset addressing, since + it is assumed that most of the use will be in vector registers, and we + only have reg+reg addressing for 16-bit modes. 
*/ + case E_BFmode: + case E_HFmode: + return false; + case E_DFmode: case E_DDmode: case E_DImode: @@ -9092,6 +9184,11 @@ macho_lo_sum_memory_operand (rtx x, machine_mode mode) static bool legitimate_lo_sum_address_p (machine_mode mode, rtx x, int strict) { + /* For 16-bit floating point types, do not allow offset addressing, since + it is assumed that most of the use will be in vector registers, and we + only have reg+reg addressing for 16-bit modes. */ + if (FP16_SCALAR_MODE_P (mode)) + return false; if (GET_CODE (x) != LO_SUM) return false; if (!REG_P (XEXP (x, 0))) @@ -12688,6 +12785,9 @@ rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type, && ((to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE) || (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE))) { + if (FP16_SCALAR_MODE_P (mode)) + return true; + if (TARGET_POWERPC64) { /* ISA 2.07: MTVSRD or MVFVSRD. */ @@ -13475,6 +13575,11 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass) || mode_supports_dq_form (mode)) return rclass; + /* IEEE 16-bit and bfloat16 don't support offset addressing, but they can + go in any floating point/vector register. */ + if (FP16_SCALAR_MODE_P (mode)) + return rclass; + /* If this is a scalar floating point value and we don't have D-form addressing, prefer the traditional floating point registers so that we can use D-form (register+offset) addressing. */ @@ -13704,6 +13809,9 @@ rs6000_can_change_mode_class (machine_mode from, unsigned from_size = GET_MODE_SIZE (from); unsigned to_size = GET_MODE_SIZE (to); + if (FP16_SCALAR_MODE_P (from) || FP16_SCALAR_MODE_P (to)) + return from_size == to_size; + if (from_size != to_size) { enum reg_class xclass = (TARGET_VSX) ? 
VSX_REGS : FLOAT_REGS; @@ -22990,7 +23098,7 @@ rs6000_load_constant_and_splat (machine_mode mode, REAL_VALUE_TYPE dconst) { rtx reg; - if (mode == SFmode || mode == DFmode) + if (mode == SFmode || mode == DFmode || FP16_SCALAR_MODE_P (mode)) { rtx d = const_double_from_real_value (dconst, mode); reg = force_reg (mode, d); @@ -24317,6 +24425,8 @@ rs6000_scalar_mode_supported_p (scalar_mode mode) return default_decimal_float_supported_p (); else if (TARGET_FLOAT128_TYPE && (mode == KFmode || mode == IFmode)) return true; + else if (FP16_SCALAR_MODE_P (mode)) + return true; else return default_scalar_mode_supported_p (mode); } @@ -24368,6 +24478,9 @@ rs6000_floatn_mode (int n, bool extended) { switch (n) { + case 16: + return TARGET_FLOAT16 ? SFmode : opt_scalar_float_mode (); + case 32: return DFmode; @@ -24389,6 +24502,9 @@ rs6000_floatn_mode (int n, bool extended) { switch (n) { + case 16: + return TARGET_FLOAT16 ? HFmode : opt_scalar_float_mode (); + case 32: return SFmode; @@ -24508,6 +24624,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] = { "fprnd", OPTION_MASK_FPRND, false, true }, { "hard-dfp", OPTION_MASK_DFP, false, true }, { "htm", OPTION_MASK_HTM, false, true }, + { "float16", OPTION_MASK_FLOAT16, false, true }, { "isel", OPTION_MASK_ISEL, false, true }, { "mfcrf", OPTION_MASK_MFCRF, false, true }, { "mfpgpr", 0, false, true }, @@ -28986,24 +29103,37 @@ constant_fp_to_128bit_vector (rtx op, const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op); long real_words[VECTOR_128BIT_WORDS]; - /* Make sure we don't overflow the real_words array and that it is - filled completely. */ - gcc_assert (num_words <= VECTOR_128BIT_WORDS && (bitsize % 32) == 0); - - real_to_target (real_words, rtype, mode); + /* For 16-bit floating point, the constant doesn't fill the whole 32-bit + word. Deal with it here, storing the bytes in big endian fashion. 
*/ + if (FP16_SCALAR_MODE_P (mode)) + { + real_to_target (real_words, rtype, mode); + info->bytes[byte_num] = (unsigned char) (real_words[0] >> 8); + info->bytes[byte_num+1] = (unsigned char) (real_words[0]); + } - /* Iterate over each 32-bit word in the floating point constant. The - real_to_target function puts out words in target endian fashion. We need - to arrange the order so that the bytes are written in big endian order. */ - for (unsigned num = 0; num < num_words; num++) + else { - unsigned endian_num = (BYTES_BIG_ENDIAN - ? num - : num_words - 1 - num); + /* Make sure we don't overflow the real_words array and that it is filled + completely. */ + gcc_assert (num_words <= VECTOR_128BIT_WORDS && (bitsize % 32) == 0); - unsigned uvalue = real_words[endian_num]; - for (int shift = 32 - 8; shift >= 0; shift -= 8) - info->bytes[byte_num++] = (uvalue >> shift) & 0xff; + real_to_target (real_words, rtype, mode); + + /* Iterate over each 32-bit word in the floating point constant. The + real_to_target function puts out words in target endian fashion. We + need to arrange the order so that the bytes are written in big endian + order. */ + for (unsigned num = 0; num < num_words; num++) + { + unsigned endian_num = (BYTES_BIG_ENDIAN + ? num + : num_words - 1 - num); + + unsigned uvalue = real_words[endian_num]; + for (int shift = 32 - 8; shift >= 0; shift -= 8) + info->bytes[byte_num++] = (uvalue >> shift) & 0xff; + } } /* Mark that this constant involves floating point. */ @@ -29042,6 +29172,7 @@ vec_const_128bit_to_bytes (rtx op, return false; /* Set up the bits. */ + info->mode = mode; switch (GET_CODE (op)) { /* Integer constants, default to double word. */ @@ -29269,6 +29400,10 @@ constant_generates_xxspltiw (vec_const_128bit_type *vsx_const) if (!TARGET_SPLAT_WORD_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX) return 0; + /* HFmode/BFmode constants can always use XXSPLTIW. 
*/ + if (FP16_SCALAR_MODE_P (vsx_const->mode)) + return 1; + if (!vsx_const->all_words_same) return 0; diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index c31fd1a7e0f..d3a4c08438d 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -343,6 +343,32 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); || ((MODE) == TDmode) \ || (!TARGET_FLOAT128_TYPE && FLOAT128_IEEE_P (MODE))) +/* Do we have conversion support in hardware for the 16-bit floating point? */ +#define TARGET_BFLOAT16_HW (TARGET_FLOAT16 && TARGET_POWER10) +#define TARGET_FLOAT16_HW (TARGET_FLOAT16 && TARGET_POWER9) + +/* Do we have conversion support in hardware for the 16-bit floating point and + also enable the 16-bit floating point vector optimizations? */ +#define TARGET_BFLOAT16_HW_VECTOR \ + (TARGET_FLOAT16 && TARGET_POWER10 && TARGET_BFLOAT16_VECTOR) + +#define TARGET_FLOAT16_HW_VECTOR \ + (TARGET_FLOAT16 && TARGET_POWER9 && TARGET_FLOAT16_VECTOR) + +/* Is this a valid 16-bit scalar floating point mode? */ +#define FP16_SCALAR_MODE_P(MODE) \ + (TARGET_FLOAT16 && ((MODE) == HFmode || (MODE) == BFmode)) + +/* Is this a valid 16-bit scalar floating point mode that has hardware + conversions? */ +#define FP16_HW_SCALAR_MODE_P(MODE) \ + (((MODE) == HFmode && TARGET_FLOAT16_HW) \ + || ((MODE) == BFmode && TARGET_BFLOAT16_HW)) + +/* Is this a valid 16-bit vector floating point mode? */ +#define FP16_VECTOR_MODE_P(MODE) \ + (TARGET_FLOAT16 && ((MODE) == V8HFmode || (MODE) == V8BFmode)) + /* Return true for floating point that does not use a vector register. 
*/ #define SCALAR_FLOAT_MODE_NOT_VECTOR_P(MODE) \ (SCALAR_FLOAT_MODE_P (MODE) && !FLOAT128_VECTOR_P (MODE)) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index f6cb7d7f481..bc5baf1bdbe 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -714,6 +714,8 @@ (define_code_attr uns [(fix "") ; A generic w/d attribute, for things like cmpw/cmpd. (define_mode_attr wd [(QI "b") (HI "h") + (BF "h") + (HF "h") (SI "w") (DI "d") (V16QI "b") @@ -15891,3 +15893,4 @@ (define_insn "hashchk" (include "htm.md") (include "fusion.md") (include "pcrel-opt.md") +(include "float16.md") diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index 31852e02aa0..3da1219f8ea 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -638,6 +638,10 @@ mieee128-constant Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save Generate (do not generate) code that uses the LXVKQ instruction. +mfloat16 +Target Mask(FLOAT16) Var(rs6000_isa_flags) +Enable or disable 16-bit floating point. + ; Documented parameters -param=rs6000-vect-unroll-limit= -- 2.51.1 -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: [email protected]
