[PATCH 2/10]: Add initial 16-bit floating point support

Michael Meissner Wed, 01 Jul 2026 15:19:08 -0700

This patch adds the initial support for the 16-bit floating point formats.
_Float16 is the IEEE 754 half precision format.  __bfloat16 is the Google Brain
16-bit format.
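
As a rough illustration of the two layouts (not part of the patch), the sketch
below decodes raw 16-bit patterns in portable C.  It assumes that float is the
IEEE 754 binary32 format, and the helper names half_bits_to_float and
bfloat16_bits_to_float are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a __bfloat16 bit pattern to float.  bfloat16 is
   the upper 16 bits of an IEEE 754 binary32 value (1 sign bit, 8 exponent
   bits, 7 fraction bits).  */
static float
bfloat16_bits_to_float (uint16_t bits)
{
  uint32_t word = (uint32_t) bits << 16;
  float value;

  /* Assumption: float is the 32-bit IEEE 754 format.  */
  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&value, &word, sizeof value);
  return value;
}

/* Hypothetical helper: widen a _Float16 bit pattern to float.  IEEE 754 half
   precision has 1 sign bit, 5 exponent bits (bias 15) and 10 fraction
   bits.  */
static float
half_bits_to_float (uint16_t bits)
{
  uint32_t sign = (uint32_t) (bits & 0x8000) << 16;
  uint32_t exp = (bits >> 10) & 0x1f;
  uint32_t frac = bits & 0x3ff;
  uint32_t word;
  float value;

  if (exp == 0)
    {
      /* Zero or subnormal: the magnitude is frac * 2^-24.  */
      value = (float) frac * (1.0f / 16777216.0f);
      return sign ? -value : value;
    }

  /* Infinity and NaN keep an all-ones exponent; normal numbers are rebiased
     from 15 to 127.  The fraction moves up by 13 bits.  */
  if (exp == 0x1f)
    word = sign | 0x7f800000u | (frac << 13);
  else
    word = sign | ((exp + 112) << 23) | (frac << 13);

  memcpy (&value, &word, sizeof value);
  return value;
}
```

For example, the pattern 0x3C00 decodes to 1.0 as _Float16, while 0x3F80
decodes to 1.0 as __bfloat16.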


To use _Float16 or __bfloat16, the user must enable the support with the
-mfloat16 option.
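
A minimal sketch of how user code might guard its use of the new types.  It
relies on the __FLOAT16__ and __BFLOAT16__ macros that this patch defines when
16-bit floating point is enabled.  The function name have_fp16_types is
hypothetical, and the sketch only inspects type sizes because the libgcc
support arrives in the next patch.

```c
/* Hypothetical helper: return 1 if both 16-bit floating point types are
   available in this translation unit, 0 otherwise.  */
static int
have_fp16_types (void)
{
#if defined (__FLOAT16__) && defined (__BFLOAT16__)
  /* Both macros are defined together when -mfloat16 is in effect.  */
  return sizeof (_Float16) == 2 && sizeof (__bfloat16) == 2;
#else
  /* Built without -mfloat16, or for a target without this support.  */
  return 0;
#endif
}
```

When the file is compiled without -mfloat16 the function takes the fallback
branch, so the same source builds either way.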

This patch also contains the configuration options that determine whether
16-bit floating point is supported by default in the compiler.  In previous
versions of the series, these configuration options were in a separate patch
later in the series.

This patch only adds the machine independent support.  To be usable, the next
patch must also be installed.  That patch adds the libgcc support for 16-bit
floating point.

I have committed all of the patches in my backlog (dense math registers, other
-mcpu=future instructions, random bug fixes, support for _Float16 and
__bfloat16, and optimizations for vector logical operations on power10/power11)
into the IBM vendor branch:

        vendors/ibm/gcc-17-future

2026-07-01  Michael Meissner  <[email protected]>

gcc/

        * config.gcc (powerpc*-*-*): Add support for the configuration options
        --with-powerpc-float16 and --with-powerpc-float16-disable-warning.
        * config/rs6000/rs6000-cpus.def (TARGET_16BIT_FLOATING_POINT): New
        macro.
        (ISA_2_7_MASKS_SERVER): Add TARGET_16BIT_FLOATING_POINT.
        (POWERPC_MASKS): Add -mfloat16.
        * config/rs6000/constraints.md (eZ): New constraint for -0.0.
        * config/rs6000/float16.md: New file to add basic 16-bit floating point
        support.
        * config/rs6000/predicates.md (easy_fp_constant): Add support for HFmode
        and BFmode constants.
        (easy_vector_constant): Add support for V8HFmode and V8BFmode to load up
        the vector -0.0 constant.
        (minus_zero_constant): New predicate.
        (fp16_xxspltiw_constant): Likewise.
        * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add support for
        16-bit floating point types.
        (rs6000_init_builtins): Create the bfloat16_type_node if needed.
        * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
        __FLOAT16__ and __BFLOAT16__ if 16-bit floating point is enabled.
        * config/rs6000/rs6000-call.cc (init_cumulative_args): Warn if a
        function returns a 16-bit floating point value unless -Wno-psabi is
        used or if this warning is disabled via the configuration option
        --with-powerpc-float16-disable-warning.
        (rs6000_function_arg): Warn if a 16-bit floating point value is passed
        to a function unless -Wno-psabi is used or if this warning is disabled
        via the configuration option --with-powerpc-float16-disable-warning.
        * config/rs6000/rs6000-protos.h (vec_const_128bit_type): Add mode field
        to detect initializing 16-bit floating constants.
        * config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached): Add
        support for 16-bit floating point.
        (rs6000_modes_tieable_p): Don't allow 16-bit floating point modes to tie
        with other modes.
        (rs6000_debug_reg_global): Add BFmode and HFmode.
        (rs6000_setup_reg_addr_masks): Add support for 16-bit floating point
        types.
        (rs6000_setup_reg_addr_masks): Likewise.
        (rs6000_init_hard_regno_mode_ok): Likewise.
        (rs6000_option_override_internal): Add a check whether -mfloat16 can be
        used.
        (easy_altivec_constant): Add support for 16-bit floating point.
        (xxspltib_constant_p): Likewise.
        (rs6000_expand_vector_init): Likewise.
        (rs6000_expand_vector_set): Likewise.
        (rs6000_expand_vector_extract): Likewise.
        (rs6000_split_vec_extract_var): Likewise.
        (reg_offset_addressing_ok_p): Likewise.
        (rs6000_legitimate_offset_address_p): Likewise.
        (legitimate_lo_sum_address_p): Likewise.
        (rs6000_secondary_reload_simple_move): Likewise.
        (rs6000_preferred_reload_class): Likewise.
        (rs6000_can_change_mode_class): Likewise.
        (rs6000_output_move_128bit): Likewise.
        (rs6000_load_constant_and_splat): Likewise.
        (rs6000_scalar_mode_supported_p): Likewise.
        (rs6000_libgcc_floating_mode_supported_p): Return true for HFmode and
        BFmode if -mfloat16.
        (rs6000_floatn_mode): Enable _Float16 if -mfloat16.
        (rs6000_opt_masks): Add -mfloat16.
        (constant_fp_to_128bit_vector): Add support for 16-bit floating point.
        (vec_const_128bit_to_bytes): Likewise.
        (constant_generates_xxspltiw): Likewise.
        * config/rs6000/rs6000.h (FP16_SCALAR_MODE_P): New macro.
        (FP16_VECTOR_MODE_P): Likewise.
        (TARGET_BFLOAT16_HW): New macro.
        (TARGET_FLOAT16_HW): Likewise.
        (TARGET_BFLOAT16_HW_VECTOR): Likewise.
        (TARGET_FLOAT16_HW_VECTOR): Likewise.
        * config/rs6000/rs6000.md (wd): Add BFmode and HFmode.
        (toplevel): Include float16.md.
        * config/rs6000/rs6000.opt (-mfloat16): New option.
        * doc/invoke.texi (RS/6000 and PowerPC Options): Document -mfloat16.
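
The -0.0 handling mentioned above (the eZ constraint and the
easy_vector_constant change) uses VSPLTISH x,-1 followed by VSLH x,x,x.  The
sketch below emulates a single halfword lane in plain C to show why that
yields the sign bit alone.  The function name minus_zero_lane is hypothetical
and the code is only an illustration, not part of the patch.

```c
#include <stdint.h>

/* Hypothetical helper: emulate one 16-bit lane of VSPLTISH x,-1 followed by
   VSLH x,x,x.  */
static uint16_t
minus_zero_lane (void)
{
  /* VSPLTISH x,-1 sets every halfword lane to 0xFFFF.  */
  uint16_t lane = (uint16_t) -1;

  /* VSLH shifts each lane left by the low 4 bits of the matching lane of the
     shift operand.  The shift operand is the same register, so the count is
     15.  */
  unsigned int shift = lane & 0xf;

  /* Truncating back to 16 bits leaves only the sign bit, which is the bit
     pattern of -0.0 in both _Float16 and __bfloat16.  */
  return (uint16_t) ((uint32_t) lane << shift);
}
```

Both 16-bit formats keep the sign in bit 15, so the same two-instruction
sequence serves V8HFmode and V8BFmode.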

---
 gcc/config.gcc                      |  18 +++
 gcc/config/rs6000/constraints.md    |   5 +
 gcc/config/rs6000/float16.md        | 159 +++++++++++++++++++
 gcc/config/rs6000/predicates.md     |  78 +++++++++
 gcc/config/rs6000/rs6000-builtin.cc |  21 +++
 gcc/config/rs6000/rs6000-c.cc       |   6 +
 gcc/config/rs6000/rs6000-call.cc    |  40 +++++
 gcc/config/rs6000/rs6000-cpus.def   |  13 +-
 gcc/config/rs6000/rs6000-protos.h   |   1 +
 gcc/config/rs6000/rs6000.cc         | 238 +++++++++++++++++++++++++---
 gcc/config/rs6000/rs6000.h          |  20 +++
 gcc/config/rs6000/rs6000.md         |   3 +
 gcc/config/rs6000/rs6000.opt        |   4 +
 gcc/doc/invoke.texi                 |  24 ++-
 14 files changed, 604 insertions(+), 26 deletions(-)
 create mode 100644 gcc/config/rs6000/float16.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index fb4cdc0a475..739ba98b28d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -5934,6 +5934,24 @@ case "${target}" in
                elif test x$with_long_double_format = xibm; then
                    tm_defines="${tm_defines} TARGET_IEEEQUAD_DEFAULT=0"
                fi
+
+               # Test if we should enable 16-bit floating point on the platforms
+               # where we can support __bfloat16 and _Float16.
+               if test x$with_powerpc_float16 = xyes; then
+                   tm_defines="${tm_defines} POWERPC_FLOAT16_DEFAULT=1"
+
+               elif test x$with_powerpc_float16 = xno; then
+                   tm_defines="${tm_defines} POWERPC_FLOAT16_DEFAULT=0"
+               fi
+
+               # Test if we should disable the warning about passing
+               # and returning 16-bit floating point values.
+               if test x$with_powerpc_float16_disable_warning = xyes; then
+                   tm_defines="${tm_defines} POWERPC_FLOAT16_DISABLE_WARNING=1"
+
+               elif test x$with_powerpc_float16_disable_warning = xno; then
+                   tm_defines="${tm_defines} POWERPC_FLOAT16_DISABLE_WARNING=0"
+               fi
                ;;
 
        s390*-*-*)
diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index d0ed47faab8..9b4f2c4ae16 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -219,6 +219,11 @@ (define_constraint "eQ"
   "An IEEE 128-bit constant that can be loaded into VSX registers."
   (match_operand 0 "easy_vector_constant_ieee128"))
 
+;; A negative 0 constant
+(define_constraint "eZ"
+  "A floating point -0.0 constant."
+  (match_operand 0 "minus_zero_constant"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/float16.md b/gcc/config/rs6000/float16.md
new file mode 100644
index 00000000000..967a6f03612
--- /dev/null
+++ b/gcc/config/rs6000/float16.md
@@ -0,0 +1,159 @@
+;; Machine description for IBM RISC System 6000 (POWER) for GNU C compiler
+;; Copyright (C) 1990-2025 Free Software Foundation, Inc.
+;; Contributed by Richard Kenner ([email protected])
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Support for _Float16 (HFmode) and __bfloat16 (BFmode)
+
+;; Mode iterator for 16-bit floating point modes both as a scalar and
+;; as a vector.
+(define_mode_iterator FP16     [BF HF])
+(define_mode_iterator VS_FP16  [BF HF V8BF V8HF])
+(define_mode_iterator VFP16    [V8BF V8HF])
+
+;; Mode attribute giving the vector mode for a 16-bit floating point
+;; scalar in both upper and lower case.
+(define_mode_attr FP16_VECTOR8 [(BF "V8BF")
+                               (HF "V8HF")])
+
+(define_mode_attr fp16_vector8 [(BF "v8bf")
+                               (HF "v8hf")])
+
+;; _Float16 and __bfloat16 moves
+(define_expand "mov<mode>"
+  [(set (match_operand:FP16 0 "nonimmediate_operand")
+       (match_operand:FP16 1 "any_operand"))]
+  "TARGET_FLOAT16"
+{
+  if (MEM_P (operands[0]) && !REG_P (operands[1]))
+    operands[1] = force_reg (<MODE>mode, operands[1]);
+})
+
+;; On power10, we can load up HFmode and BFmode constants with xxspltiw
+;; or pli.
+(define_insn "*mov<mode>_xxspltiw"
+  [(set (match_operand:FP16 0 "gpc_reg_operand"        "=wa,wa,?r,?r")
+       (match_operand:FP16 1 "fp16_xxspltiw_constant"   "j,eP,j, eP"))]
+  "TARGET_FLOAT16
+   && (TARGET_PREFIXED || operands[1] == CONST0_RTX (<MODE>mode))"
+{
+  rtx op1 = operands[1];
+  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op1);
+  long real_words[1];
+
+  if (op1 == CONST0_RTX (<MODE>mode))
+    return (!vsx_register_operand (operands[0], <MODE>mode)
+           ? "li %0,0"
+           : "xxlxor %x0,%x0,%x0");
+
+  real_to_target (real_words, rtype, <MODE>mode);
+  operands[2] = GEN_INT (real_words[0]);
+  return (vsx_register_operand (operands[0], <MODE>mode)
+         ? "xxspltiw %x0,%2"
+         : "pli %0,%2");
+}
+  [(set_attr "type"     "veclogical, vecsimple, *,  *")
+   (set_attr "prefixed" "no,         yes,       no, yes")])
+
+;; Handle creating -0.0 if we don't have XXSPLTIW.  For the scalar
+;; modes, we can't do the gen_lowpart call until after register
+;; allocation.
+(define_split
+  [(set (match_operand:VS_FP16 0 "altivec_register_operand")
+       (match_operand:VS_FP16 1 "minus_zero_constant"))]
+  "TARGET_FLOAT16 && reload_completed"
+  [(const_int 0)]
+{
+  int dest_r = reg_or_subregno (operands[0]);
+  rtx dest = gen_rtx_REG (V8HImode, dest_r);
+  size_t nunits = GET_MODE_NUNITS (V8HFmode);
+
+  rtvec v = rtvec_alloc (nunits);
+  for (size_t i = 0; i < nunits; i++)
+    RTVEC_ELT (v, i) = constm1_rtx;
+
+  rs6000_expand_vector_init (dest, gen_rtx_PARALLEL (V8HImode, v));
+  emit_insn (gen_rtx_SET (dest, gen_rtx_ASHIFT (V8HImode, dest, dest)));
+  DONE;
+})
+
+
+(define_insn "*mov<mode>_internal"
+  [(set (match_operand:FP16 0 "nonimmediate_operand"
+                      "=wa,       wa,       Z,         r,          r,
+                        m,        r,        wa,        wa,         r,
+                        v")
+
+       (match_operand:FP16 1 "any_operand"
+                      "wa,        Z,        wa,        r,          m,
+                       r,         wa,       r,         j,          j,
+                       eZ"))]
+  "TARGET_FLOAT16
+   && (gpc_reg_operand (operands[0], <MODE>mode)
+       || gpc_reg_operand (operands[1], <MODE>mode))"
+  "@
+   xxlor %x0,%x1,%x1
+   lxsihzx %x0,%y1
+   stxsihx %x1,%y0
+   mr %0,%1
+   lhz%U1%X1 %0,%1
+   sth%U0%X0 %1,%0
+   mfvsrwz %0,%x1
+   mtvsrwz %x0,%1
+   xxlxor %x0,%x0,%x0
+   li %0,0
+   #"
+  [(set_attr "type"   "vecsimple, fpload,    fpstore,   *,          load,
+                       store,     mfvsr,     mtvsr,     veclogical, *,
+                       vecperm")
+   (set_attr "isa"    "*,         p9v,       p9v,       *,          *,
+                       *,         p8v,       p8v,       p9v,        *,
+                       *")
+   (set_attr "length" "*,         *,         *,         *,          *,
+                       *,         *,         *,         *,          *,
+                       8")])
+
+;; Vector duplicate
+(define_insn "*vecdup<mode>_reg"
+  [(set (match_operand:<FP16_VECTOR8> 0 "altivec_register_operand" "=v")
+       (vec_duplicate:<FP16_VECTOR8>
+        (match_operand:FP16 1 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT16"
+  "vsplth %0,%1,3"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "*vecdup<mode>_const"
+  [(set (match_operand:<FP16_VECTOR8> 0 "vsx_register_operand" "=wa,wa")
+       (vec_duplicate:<FP16_VECTOR8>
+        (match_operand:FP16 1 "fp16_xxspltiw_constant" "j,eP")))]
+  "TARGET_FLOAT16
+   && (TARGET_PREFIXED || operands[1] == CONST0_RTX (<MODE>mode))"
+{
+  rtx op1 = operands[1];
+  if (op1 == CONST0_RTX (<MODE>mode))
+    return "xxlxor %x0,%x0,%x0";
+
+  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op1);
+  long real_words[1];
+
+  real_to_target (real_words, rtype, <MODE>mode);
+  operands[2] = GEN_INT (real_words[0]);
+  return "xxspltiw %x0,%2";
+}
+  [(set_attr "type" "veclogical,vecperm")
+   (set_attr "prefixed" "*,yes")])
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 54dbc8bcc95..33515ad0eb9 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,11 @@ (define_predicate "easy_fp_constant"
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* If we are on power10, we can use XXSPLTIW to load constants.  On power8
+     and power9, we can use direct move.  */
+  if (FP16_SCALAR_MODE_P (mode))
+    return true;
+
   /* Constants that can be generated with ISA 3.1 instructions are easy.  */
   vec_const_128bit_type vsx_const;
   if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
@@ -744,6 +749,10 @@ (define_predicate "easy_vector_constant"
            return true;
        }
 
+      /* -0.0 can be done as VSPLTISH x,-1 and VSLH x,x,x.  */
+      if (FP16_VECTOR_MODE_P (mode) && minus_zero_constant (op, mode))
+       return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
        return true;
@@ -2171,3 +2180,72 @@ (define_predicate "lowpart_subreg_operator"
 (define_predicate "lxvl_else_operand"
   (and (match_code "const_vector")
        (match_test "op == CONST0_RTX (GET_MODE (op))")))
+
+;; Return 1 if this is a floating point scalar constant that is -0.0 or
+;; a vector floating point constant where each element is -0.0.
+(define_predicate "minus_zero_constant"
+  (match_code "const_double,vec_duplicate,const_vector")
+{
+  if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      op = XEXP (op, 0);
+      if (!CONST_DOUBLE_P (op))
+       return false;
+
+      mode = GET_MODE (op);
+    }
+
+  /* Scalar or vector filled with duplicates.  */
+  if (CONST_DOUBLE_P (op))
+    {
+      if (!SCALAR_FLOAT_MODE_P (mode))
+       return false;
+
+      const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
+      return real_isnegzero (rtype);
+    }
+
+  /* Vector constant, check all elements.  */
+  else if (CONST_VECTOR_P (op))
+    {
+      if (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT)
+       return false;
+
+      size_t nunits = GET_MODE_NUNITS (mode);
+      for (size_t i = 0; i < nunits; i++)
+       {
+         rtx ele = CONST_VECTOR_ELT (op, i);
+         if (!CONST_DOUBLE_P (ele))
+           return false;
+
+         const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (ele);
+         if (!real_isnegzero (rtype))
+           return false;
+       }
+
+      return true;
+    }
+
+  return false;
+})
+
+;; Return 1 if this is a 16-bit floating point constant that can be
+;; loaded with XXSPLTIW or is 0.0 that can be loaded with XXSPLTIB.
+(define_predicate "fp16_xxspltiw_constant"
+  (match_code "const_double")
+{
+  if (!FP16_SCALAR_MODE_P (mode))
+    return false;
+
+  if (op == CONST0_RTX (mode))
+    return true;
+
+  if (!TARGET_PREFIXED)
+    return false;
+
+  vec_const_128bit_type vsx_const;
+  if (!vec_const_128bit_to_bytes (op, mode, &vsx_const))
+    return false;
+
+  return constant_generates_xxspltiw (&vsx_const);
+})
diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
index 541958d38c0..dab6e048269 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -508,6 +508,10 @@ const char *rs6000_type_string (tree type_node)
     return "voidc*";
   else if (type_node == float128_type_node)
     return "_Float128";
+  else if (type_node == float16_type_node)
+    return "_Float16";
+  else if (TARGET_FLOAT16 && type_node == bfloat16_type_node)
+    return "__bfloat16";
   else if (type_node == vector_pair_type_node)
     return "__vector_pair";
   else if (type_node == vector_quad_type_node)
@@ -788,6 +792,23 @@ rs6000_init_builtins (void)
       lang_hooks.types.register_builtin_type (uintPTI_type_internal_node,
                                              "__upti_internal");
     }
+
+  /* __bfloat16 support.  */
+  if (TARGET_FLOAT16)
+    {
+      if (!bfloat16_type_node)
+       {
+         bfloat16_type_node = make_node (REAL_TYPE);
+         TYPE_PRECISION (bfloat16_type_node) = 16;
+         SET_TYPE_MODE (bfloat16_type_node, BFmode);
+         layout_type (bfloat16_type_node);
+         t = build_qualified_type (bfloat16_type_node, TYPE_QUAL_CONST);
+       }
+
+      lang_hooks.types.register_builtin_type (bfloat16_type_node,
+                                             "__bfloat16");
+    }
+
   /* Vector pair and vector quad support.  */
   vector_pair_type_node = make_node (OPAQUE_TYPE);
   SET_TYPE_MODE (vector_pair_type_node, OOmode);
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 3fa7c04a7ce..45b1100c11b 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -583,6 +583,12 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
   if ((flags & OPTION_MASK_FLOAT128_HW) != 0)
     rs6000_define_or_undefine_macro (define_p, "__FLOAT128_HARDWARE__");
 
+  /* 16-bit floating point support.  */
+  if ((flags & OPTION_MASK_FLOAT16) != 0)
+    {
+      rs6000_define_or_undefine_macro (define_p, "__FLOAT16__");
+      rs6000_define_or_undefine_macro (define_p, "__BFLOAT16__");
+    }
   /* Tell the user if we are targeting CELL.  */
   if (rs6000_cpu == PROCESSOR_CELL)
     rs6000_define_or_undefine_macro (define_p, "__PPU__");
diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
index b9b791bfe8a..f3e10d8ba0f 100644
--- a/gcc/config/rs6000/rs6000-call.cc
+++ b/gcc/config/rs6000/rs6000-call.cc
@@ -684,6 +684,28 @@ init_cumulative_args (CUMULATIVE_ARGS *cum, tree fntype,
             " altivec instructions are disabled, use %qs"
             " to enable them", "-maltivec");
     }
+
+#if !POWERPC_FLOAT16_DISABLE_WARNING
+  /* Warn that __bfloat16 and _Float16 might be returned differently in the
+     future.  The issue is currently 16-bit floating point is returned in
+     floating point register #1 in 16-bit format.  We may or may not want to
+     return it as a scalar 64-bit value.  */
+  if (fntype && warn_psabi && !cum->libcall)
+    {
+      static bool warned_about_float16_return = false;
+
+      if (!warned_about_float16_return)
+       {
+         machine_mode ret_mode = TYPE_MODE (TREE_TYPE (fntype));
+
+         warned_about_float16_return = FP16_SCALAR_MODE_P (ret_mode);
+         if (warned_about_float16_return)
+           warning (OPT_Wpsabi,
+                    "%s might be returned differently in the future",
+                    ret_mode == BFmode ? "__bfloat16" : "_Float16");
+       }
+    }
+#endif
 }
 
 
@@ -1641,6 +1663,24 @@ rs6000_function_arg (cumulative_args_t cum_v, const function_arg_info &arg)
       return NULL_RTX;
     }
 
+#if !POWERPC_FLOAT16_DISABLE_WARNING
+  /* Warn that _Float16 and __bfloat16 might be passed differently in the
+     future.  The issue is currently 16-bit floating point values are passed in
+     floating point registers in the native 16-bit format.  We may or may not
+     want to pass the value as a scalar 64-bit value.  */
+  if (warn_psabi && !cum->libcall && FP16_SCALAR_MODE_P (mode))
+    {
+      static bool warned_about_float16_call = false;
+
+      if (!warned_about_float16_call)
+       {
+         warned_about_float16_call = true;
+         warning (OPT_Wpsabi, "%s might be passed differently in the future",
+                  mode == BFmode ? "__bfloat16" : "_Float16");
+       }
+    }
+#endif
+
   /* Return a marker to indicate whether CR1 needs to set or clear the
      bit that V.4 uses to say fp args were passed in registers.
      Assume that we don't need the marker for software floating point,
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index a110860acce..d852c7b5860 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -43,6 +43,15 @@
                                 | OPTION_MASK_ALTIVEC                  \
                                 | OPTION_MASK_VSX)
 
+/* Determine whether to enable 16-bit floating point types on power8 systems
+   and above.  */
+#if POWERPC_FLOAT16_DEFAULT
+#define TARGET_16BIT_FLOATING_POINT    OPTION_MASK_FLOAT16
+
+#else
+#define TARGET_16BIT_FLOATING_POINT    0
+#endif
+
 /* For now, don't provide an embedded version of ISA 2.07.  Do not set power8
    fusion here, instead set it in rs6000.cc if we are tuning for a power8
    system.  */
@@ -52,7 +61,8 @@
                                 | OPTION_MASK_CRYPTO                   \
                                 | OPTION_MASK_EFFICIENT_UNALIGNED_VSX  \
                                 | OPTION_MASK_QUAD_MEMORY              \
-                                | OPTION_MASK_QUAD_MEMORY_ATOMIC)
+                                | OPTION_MASK_QUAD_MEMORY_ATOMIC       \
+                                | TARGET_16BIT_FLOATING_POINT)
 
 /* ISA masks setting fusion options.  */
 #define OTHER_FUSION_MASKS     (OPTION_MASK_P8_FUSION                  \
@@ -122,6 +132,7 @@
                                 | OPTION_MASK_EFFICIENT_UNALIGNED_VSX  \
                                 | OPTION_MASK_FLOAT128_HW              \
                                 | OPTION_MASK_FLOAT128_KEYWORD         \
+                                | OPTION_MASK_FLOAT16                  \
                                 | OPTION_MASK_FPRND                    \
                                 | OPTION_MASK_POWER10                  \
                                 | OPTION_MASK_POWER11                  \
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 09424ebaf97..e507562ab8d 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -250,6 +250,7 @@ typedef struct {
   bool all_words_same;                 /* Are the words all equal?  */
   bool all_half_words_same;            /* Are the half words all equal?  */
   bool all_bytes_same;                 /* Are the bytes all equal?  */
+  machine_mode mode;                   /* Original constant mode.  */
 } vec_const_128bit_type;
 
 extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 84c8a2887bb..0155aee11eb 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1892,7 +1892,8 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
 
       if (ALTIVEC_REGNO_P (regno))
        {
-         if (GET_MODE_SIZE (mode) < 16 && !reg_addr[mode].scalar_in_vmx_p)
+         if (GET_MODE_SIZE (mode) < 16 && !reg_addr[mode].scalar_in_vmx_p
+             && !FP16_SCALAR_MODE_P (mode))
            return 0;
 
          return ALTIVEC_REGNO_P (last_regno);
@@ -1982,7 +1983,8 @@ static bool
 rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 {
   if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
-      || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode)
+      || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode
+      || FP16_SCALAR_MODE_P (mode1) || FP16_SCALAR_MODE_P (mode2))
     return mode1 == mode2;
 
   if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1))
@@ -2247,6 +2249,8 @@ rs6000_debug_reg_global (void)
     DImode,
     TImode,
     PTImode,
+    BFmode,
+    HFmode,
     SFmode,
     DFmode,
     TFmode,
@@ -2625,8 +2629,14 @@ rs6000_setup_reg_addr_masks (void)
 
       /* SDmode is special in that we want to access it only via REG+REG
         addressing on power7 and above, since we want to use the LFIWZX and
-        STFIWZX instructions to load it.  */
-      bool indexed_only_p = (m == SDmode && TARGET_NO_SDMODE_STACK);
+        STFIWZX instructions to load it.
+
+        Never allow offset addressing for 16-bit floating point modes, since
+        it is expected that 16-bit floating point should always go into the
+        vector registers and we only have indexed and indirect 16-bit loads to
+        VSR registers.  */
+      bool indexed_only_p = ((m == SDmode && TARGET_NO_SDMODE_STACK)
+                            || FP16_SCALAR_MODE_P (m));
 
       any_addr_mask = 0;
       for (rc = FIRST_RELOAD_REG_CLASS; rc <= LAST_RELOAD_REG_CLASS; rc++)
@@ -2675,6 +2685,7 @@ rs6000_setup_reg_addr_masks (void)
                  && !complex_p
                  && (m != E_DFmode || !TARGET_VSX)
                  && (m != E_SFmode || !TARGET_P8_VECTOR)
+                 && !FP16_SCALAR_MODE_P (m)
                  && !small_int_vsx_p)
                {
                  addr_mask |= RELOAD_REG_PRE_INCDEC;
@@ -2928,6 +2939,15 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
       rs6000_vector_align[V1TImode] = 128;
     }
 
+  /* _Float16 support.  */
+  if (TARGET_FLOAT16)
+    {
+      rs6000_vector_mem[HFmode] = VECTOR_VSX;
+      rs6000_vector_mem[BFmode] = VECTOR_VSX;
+      rs6000_vector_align[HFmode] = 16;
+      rs6000_vector_align[BFmode] = 16;
+    }
+
   /* DFmode, see if we want to use the VSX unit.  Memory is handled
      differently, so don't set rs6000_vector_mem.  */
   if (TARGET_VSX)
@@ -3042,6 +3062,14 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
              reg_addr[TFmode].reload_load  = CODE_FOR_reload_tf_di_load;
            }
 
+         if (TARGET_FLOAT16)
+           {
+             reg_addr[HFmode].reload_store = CODE_FOR_reload_hf_di_store;
+             reg_addr[BFmode].reload_store = CODE_FOR_reload_bf_di_store;
+             reg_addr[HFmode].reload_load  = CODE_FOR_reload_hf_di_load;
+             reg_addr[BFmode].reload_load  = CODE_FOR_reload_bf_di_load;
+           }
+
          /* Only provide a reload handler for SDmode if lfiwzx/stfiwx are
             available.  */
          if (TARGET_NO_SDMODE_STACK)
@@ -3142,6 +3170,14 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
              reg_addr[TFmode].reload_load  = CODE_FOR_reload_tf_si_load;
            }
 
+         if (TARGET_FLOAT16)
+           {
+             reg_addr[HFmode].reload_store = CODE_FOR_reload_hf_si_store;
+             reg_addr[BFmode].reload_store = CODE_FOR_reload_bf_si_store;
+             reg_addr[HFmode].reload_load  = CODE_FOR_reload_hf_si_load;
+             reg_addr[BFmode].reload_load  = CODE_FOR_reload_bf_si_load;
+           }
+
          /* Only provide a reload handler for SDmode if lfiwzx/stfiwx are
             available.  */
          if (TARGET_NO_SDMODE_STACK)
@@ -3881,6 +3917,19 @@ rs6000_option_override_internal (bool global_init_p)
        }
     }
 
+  /* 16-bit floating point needs 64-bit power8 at a minimum in order to load up
+     16-bit values into vector registers via loads/stores from GPRs and then
+     using direct moves.  Don't allow 16-bit float on big endian systems at the
+     current time.  */
+  if (TARGET_FLOAT16 && (!TARGET_DIRECT_MOVE_64BIT || BYTES_BIG_ENDIAN))
+    {
+      rs6000_isa_flags &= ~OPTION_MASK_FLOAT16;
+      if (rs6000_isa_flags_explicit & OPTION_MASK_FLOAT16)
+       error ("%qs is only available on 64-bit little endian systems "
+              "that use at least %qs",
+              "-mfloat16", "-mcpu=power8");
+    }
+
   /* If hard-float/altivec/vsx were explicitly turned off then don't allow
      the -mcpu setting to enable options that conflict. */
   if ((!TARGET_HARD_FLOAT || !TARGET_ALTIVEC || !TARGET_VSX)
@@ -6462,6 +6511,12 @@ easy_altivec_constant (rtx op, machine_mode mode)
       return 0;
     }
 
+  /* For 16-bit floating point vectors, only allow 0.0 and -0.0 as easy altivec
+     constants.  */
+  if (FP16_VECTOR_MODE_P (mode))
+    return (zero_constant (op, mode) || minus_zero_constant (op, mode)
+           ? 8 : 0);
+
   /* V1TImode is a special container for TImode.  Ignore for now.  */
   else if (mode == V1TImode)
     return 0;
@@ -6569,6 +6624,12 @@ xxspltib_constant_p (rtx op,
   /* Handle (vec_duplicate <constant>).  */
   if (GET_CODE (op) == VEC_DUPLICATE)
     {
+      element = XEXP (op, 0);
+
+      /* For 16-bit floating point, the only valid use of xxspltib is 0.0.  */
+      if (FP16_VECTOR_MODE_P (mode))
+       return element == CONST0_RTX (GET_MODE_INNER (mode));
+
       if (mode != V16QImode && mode != V8HImode && mode != V4SImode
          && mode != V2DImode)
        return false;
@@ -6585,6 +6646,20 @@ xxspltib_constant_p (rtx op,
   /* Handle (const_vector [...]).  */
   else if (GET_CODE (op) == CONST_VECTOR)
     {
+      /* For V8BFmode & V8HFmode, the only valid use of xxspltib is 0.0.  */
+      if (FP16_VECTOR_MODE_P (mode))
+       {
+         if (op == CONST0_RTX (mode))
+           return true;
+
+         rtx zero = CONST0_RTX (GET_MODE_INNER (mode));
+         for (i = 0; i < nunits; i++)
+           if (CONST_VECTOR_ELT (op, i) != zero)
+             return false;
+
+         return true;
+       }
+
       if (mode != V16QImode && mode != V8HImode && mode != V4SImode
          && mode != V2DImode)
        return false;
@@ -7031,6 +7106,15 @@ rs6000_expand_vector_init (rtx target, rtx vals)
       return;
     }
 
+  /* Special case splats of 16-bit floating point.  */
+  if (all_same && FP16_VECTOR_MODE_P (mode))
+    {
+      rtx op0 = force_reg (GET_MODE_INNER (mode), XVECEXP (vals, 0, 0));
+      rtx dup = gen_rtx_VEC_DUPLICATE (mode, op0);
+      emit_insn (gen_rtx_SET (target, dup));
+      return;
+    }
+
   /* Special case initializing vector short/char that are splats if we are on
      64-bit systems with direct move.  */
   if (all_same && TARGET_DIRECT_MOVE_64BIT
@@ -7525,6 +7609,10 @@ rs6000_expand_vector_set (rtx target, rtx val, rtx elt_rtx)
            insn = gen_vsx_set_v4si_p9 (target, target, val, elt_rtx);
          else if (mode == V8HImode)
            insn = gen_vsx_set_v8hi_p9 (target, target, val, elt_rtx);
+         else if (mode == V8HFmode)
+           insn = gen_vsx_set_v8hf_p9 (target, target, val, elt_rtx);
+         else if (mode == V8BFmode)
+           insn = gen_vsx_set_v8bf_p9 (target, target, val, elt_rtx);
          else if (mode == V16QImode)
            insn = gen_vsx_set_v16qi_p9 (target, target, val, elt_rtx);
          else if (mode == V4SFmode)
@@ -7642,6 +7730,22 @@ rs6000_expand_vector_extract (rtx target, rtx vec, rtx elt)
            }
          else
            break;
+       case E_V8HFmode:
+         if (TARGET_DIRECT_MOVE_64BIT)
+           {
+             emit_insn (gen_vsx_extract_v8hf (target, vec, elt));
+             return;
+           }
+         else
+           break;
+       case E_V8BFmode:
+         if (TARGET_DIRECT_MOVE_64BIT)
+           {
+             emit_insn (gen_vsx_extract_v8bf (target, vec, elt));
+             return;
+           }
+         else
+           break;
        case E_V4SImode:
          if (TARGET_DIRECT_MOVE_64BIT)
            {
@@ -7689,6 +7793,14 @@ rs6000_expand_vector_extract (rtx target, rtx vec, rtx elt)
          emit_insn (gen_vsx_extract_v8hi_var (target, vec, elt));
          return;
 
+       case E_V8HFmode:
+         emit_insn (gen_vsx_extract_v8hf_var (target, vec, elt));
+         return;
+
+       case E_V8BFmode:
+         emit_insn (gen_vsx_extract_v8bf_var (target, vec, elt));
+         return;
+
        case E_V16QImode:
          emit_insn (gen_vsx_extract_v16qi_var (target, vec, elt));
          return;
@@ -7966,7 +8078,10 @@ rs6000_split_vec_extract_var (rtx dest, rtx src, rtx element, rtx tmp_gpr,
       /* See if we want to generate VEXTU{B,H,W}{L,R}X if the destination is in
         a general purpose register.  */
       if (TARGET_P9_VECTOR
-         && (mode == V16QImode || mode == V8HImode || mode == V4SImode)
+         && (mode == V16QImode
+             || mode == V8HImode
+             || mode == V4SImode
+             || FP16_VECTOR_MODE_P (mode))
          && INT_REGNO_P (dest_regno)
          && ALTIVEC_REGNO_P (src_regno)
          && INT_REGNO_P (element_regno))
@@ -7979,7 +8094,7 @@ rs6000_split_vec_extract_var (rtx dest, rtx src, rtx element, rtx tmp_gpr,
                       ? gen_vextublx (dest_si, element_si, src)
                       : gen_vextubrx (dest_si, element_si, src));
 
-         else if (mode == V8HImode)
+         else if (mode == V8HImode || FP16_VECTOR_MODE_P (mode))
            {
              rtx tmp_gpr_si = gen_rtx_REG (SImode, REGNO (tmp_gpr));
              emit_insn (gen_ashlsi3 (tmp_gpr_si, element_si, const1_rtx));
@@ -8081,6 +8196,8 @@ rs6000_split_vec_extract_var (rtx dest, rtx src, rtx element, rtx tmp_gpr,
 
        case E_V4SImode:
        case E_V8HImode:
+       case E_V8HFmode:
+       case E_V8BFmode:
        case E_V16QImode:
          {
            rtx tmp_altivec_di = gen_rtx_REG (DImode, REGNO (tmp_altivec));
@@ -8695,6 +8812,13 @@ reg_offset_addressing_ok_p (machine_mode mode)
     case E_XOmode:
       return TARGET_MMA;
 
+      /* For 16-bit floating point types, do not allow offset addressing, since
+        it is assumed that most of the use will be in vector registers, and we
+        only have reg+reg addressing for 16-bit modes.  */
+    case E_BFmode:
+    case E_HFmode:
+      return false;
+
     case E_SDmode:
       /* If we can do direct load/stores of SDmode, restrict it to reg+reg
         addressing for the LFIWZX and STFIWX instructions.  */
@@ -8979,6 +9103,13 @@ rs6000_legitimate_offset_address_p (machine_mode mode, rtx x,
   extra = 0;
   switch (mode)
     {
+      /* For 16-bit floating point types, do not allow offset addressing, since
+        it is assumed that most of the use will be in vector registers, and we
+        only have reg+reg addressing for 16-bit modes.  */
+    case E_BFmode:
+    case E_HFmode:
+      return false;
+
     case E_DFmode:
     case E_DDmode:
     case E_DImode:
@@ -9080,6 +9211,11 @@ macho_lo_sum_memory_operand (rtx x, machine_mode mode)
 static bool
 legitimate_lo_sum_address_p (machine_mode mode, rtx x, int strict)
 {
+  /* For 16-bit floating point types, do not allow offset addressing, since
+     it is assumed that most of the use will be in vector registers, and we
+     only have reg+reg addressing for 16-bit modes.  */
+  if (FP16_SCALAR_MODE_P (mode))
+    return false;
   if (GET_CODE (x) != LO_SUM)
     return false;
   if (!REG_P (XEXP (x, 0)))
@@ -12676,6 +12812,9 @@ rs6000_secondary_reload_simple_move (enum rs6000_reg_type to_type,
       && ((to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)
          || (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)))
     {
+      if (FP16_SCALAR_MODE_P (mode))
+       return true;
+
       if (TARGET_POWERPC64)
        {
          /* ISA 2.07: MTVSRD or MVFVSRD.  */
@@ -13463,6 +13602,11 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
          || mode_supports_dq_form (mode))
        return rclass;
 
+      /* IEEE 16-bit and bfloat16 don't support offset addressing, but they can
+        go in any floating point/vector register.  */
+      if (FP16_SCALAR_MODE_P (mode))
+       return rclass;
+
       /* If this is a scalar floating point value and we don't have D-form
         addressing, prefer the traditional floating point registers so that we
         can use D-form (register+offset) addressing.  */
@@ -13480,6 +13624,16 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
       return rclass;
     }
 
+  /* For 16-bit floating point scalar modes, if we have lxsihzx/stxsihzx from
+     Power9, prefer the vector registers.  On power8, we will need to use GPRs
+     to do load/store.  For 16-bit floating point vector modes, only prefer
+     VSX.  */
+  if (FP16_SCALAR_MODE_P (mode))
+    return TARGET_P9_VECTOR ? VSX_REGS : rclass;
+
+  if (FP16_VECTOR_MODE_P (mode))
+    return VSX_REGS;
+
   if (is_constant || GET_CODE (x) == PLUS)
     {
       if (reg_class_subset_p (GENERAL_REGS, rclass))
@@ -13692,6 +13846,9 @@ rs6000_can_change_mode_class (machine_mode from,
   unsigned from_size = GET_MODE_SIZE (from);
   unsigned to_size = GET_MODE_SIZE (to);
 
+  if (FP16_SCALAR_MODE_P (from) || FP16_SCALAR_MODE_P (to))
+    return from_size == to_size;
+
   if (from_size != to_size)
     {
       enum reg_class xclass = (TARGET_VSX) ? VSX_REGS : FLOAT_REGS;
@@ -13909,7 +14066,8 @@ rs6000_output_move_128bit (rtx operands[])
          else if (TARGET_P9_VECTOR)
            return "lxvx %x0,%y1";
 
-         else if (mode == V16QImode || mode == V8HImode || mode == V4SImode)
+         else if (mode == V16QImode || mode == V8HImode || mode == V4SImode
+                  || FP16_VECTOR_MODE_P (mode))
            return "lxvw4x %x0,%y1";
 
          else
@@ -13947,7 +14105,8 @@ rs6000_output_move_128bit (rtx operands[])
          else if (TARGET_P9_VECTOR)
            return "stxvx %x1,%y0";
 
-         else if (mode == V16QImode || mode == V8HImode || mode == V4SImode)
+         else if (mode == V16QImode || mode == V8HImode || mode == V4SImode
+                  || FP16_VECTOR_MODE_P (mode))
            return "stxvw4x %x1,%y0";
 
          else
@@ -22978,7 +23137,7 @@ rs6000_load_constant_and_splat (machine_mode mode, REAL_VALUE_TYPE dconst)
 {
   rtx reg;
 
-  if (mode == SFmode || mode == DFmode)
+  if (mode == SFmode || mode == DFmode || FP16_SCALAR_MODE_P (mode))
     {
       rtx d = const_double_from_real_value (dconst, mode);
       reg = force_reg (mode, d);
@@ -24305,6 +24464,8 @@ rs6000_scalar_mode_supported_p (scalar_mode mode)
     return default_decimal_float_supported_p ();
   else if (TARGET_FLOAT128_TYPE && (mode == KFmode || mode == IFmode))
     return true;
+  else if (FP16_SCALAR_MODE_P (mode))
+    return true;
   else
     return default_scalar_mode_supported_p (mode);
 }
@@ -24329,6 +24490,10 @@ rs6000_libgcc_floating_mode_supported_p (scalar_float_mode mode)
     case E_KFmode:
       return TARGET_FLOAT128_TYPE && !TARGET_IEEEQUAD;
 
+    case E_BFmode:
+    case E_HFmode:
+      return TARGET_FLOAT16;
+
     default:
       return false;
     }
@@ -24356,6 +24521,9 @@ rs6000_floatn_mode (int n, bool extended)
     {
       switch (n)
        {
+       case 16:
+         return TARGET_FLOAT16 ? SFmode : opt_scalar_float_mode ();
+
        case 32:
          return DFmode;
 
@@ -24377,6 +24545,9 @@ rs6000_floatn_mode (int n, bool extended)
     {
       switch (n)
        {
+       case 16:
+         return TARGET_FLOAT16 ? HFmode : opt_scalar_float_mode ();
+
        case 32:
          return SFmode;
 
@@ -24499,6 +24670,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "future",                  OPTION_MASK_FUTURE,             false, false },
   { "hard-dfp",                        OPTION_MASK_DFP,                false, true  },
   { "htm",                     OPTION_MASK_HTM,                false, true  },
+  { "float16",                 OPTION_MASK_FLOAT16,            false, true  },
   { "isel",                    OPTION_MASK_ISEL,               false, true  },
   { "mfcrf",                   OPTION_MASK_MFCRF,              false, true  },
   { "mfpgpr",                  0,                              false, true  },
@@ -28919,24 +29091,37 @@ constant_fp_to_128bit_vector (rtx op,
   const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
   long real_words[VECTOR_128BIT_WORDS];
 
-  /* Make sure we don't overflow the real_words array and that it is
-     filled completely.  */
-  gcc_assert (num_words <= VECTOR_128BIT_WORDS && (bitsize % 32) == 0);
-
-  real_to_target (real_words, rtype, mode);
+  /* For 16-bit floating point, the constant doesn't fill the whole 32-bit
+     word.  Deal with it here, storing the bytes in big endian fashion.  */
+  if (FP16_SCALAR_MODE_P (mode))
+    {
+      real_to_target (real_words, rtype, mode);
+      info->bytes[byte_num] = (unsigned char) (real_words[0] >> 8);
+      info->bytes[byte_num + 1] = (unsigned char) (real_words[0]);
+    }
 
-  /* Iterate over each 32-bit word in the floating point constant.  The
-     real_to_target function puts out words in target endian fashion.  We need
-     to arrange the order so that the bytes are written in big endian order.  */
-  for (unsigned num = 0; num < num_words; num++)
+  else
     {
-      unsigned endian_num = (BYTES_BIG_ENDIAN
-                            ? num
-                            : num_words - 1 - num);
+      /* Make sure we don't overflow the real_words array and that it is filled
+        completely.  */
+      gcc_assert (num_words <= VECTOR_128BIT_WORDS && (bitsize % 32) == 0);
 
-      unsigned uvalue = real_words[endian_num];
-      for (int shift = 32 - 8; shift >= 0; shift -= 8)
-       info->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+      real_to_target (real_words, rtype, mode);
+
+      /* Iterate over each 32-bit word in the floating point constant.  The
+        real_to_target function puts out words in target endian fashion.  We
+        need to arrange the order so that the bytes are written in big endian
+        order.  */
+      for (unsigned num = 0; num < num_words; num++)
+       {
+         unsigned endian_num = (BYTES_BIG_ENDIAN
+                                ? num
+                                : num_words - 1 - num);
+
+         unsigned uvalue = real_words[endian_num];
+         for (int shift = 32 - 8; shift >= 0; shift -= 8)
+           info->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+       }
     }
 
   /* Mark that this constant involves floating point.  */
@@ -28975,6 +29160,7 @@ vec_const_128bit_to_bytes (rtx op,
     return false;
 
   /* Set up the bits.  */
+  info->mode = mode;
   switch (GET_CODE (op))
     {
       /* Integer constants, default to double word.  */
@@ -29202,6 +29388,10 @@ constant_generates_xxspltiw (vec_const_128bit_type *vsx_const)
   if (!TARGET_SPLAT_WORD_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
     return 0;
 
+  /* HFmode/BFmode constants can always use XXSPLTIW.  */
+  if (FP16_SCALAR_MODE_P (vsx_const->mode))
+    return 1;
+
   if (!vsx_const->all_words_same)
     return 0;
 
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index a5df5a39321..f52a136d488 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -343,6 +343,26 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || ((MODE) == TDmode)                                               \
    || (!TARGET_FLOAT128_TYPE && FLOAT128_IEEE_P (MODE)))
 
+/* Is this a valid 16-bit scalar floating point mode?  */
+#define FP16_SCALAR_MODE_P(MODE)                                       \
+  (TARGET_FLOAT16 && ((MODE) == HFmode || (MODE) == BFmode))
+
+/* Is this a valid 16-bit vector floating point mode?  */
+#define FP16_VECTOR_MODE_P(MODE)                                       \
+  (TARGET_FLOAT16 && ((MODE) == V8HFmode || (MODE) == V8BFmode))
+
+/* Do we have conversion support in hardware for the 16-bit floating point?  */
+#define TARGET_BFLOAT16_HW     (TARGET_FLOAT16 && TARGET_POWER10)
+#define TARGET_FLOAT16_HW      (TARGET_FLOAT16 && TARGET_P9_VECTOR)
+
+/* Do we have conversion support in hardware for the 16-bit floating point and
+   also enable the 16-bit floating point vector optimizations?  */
+#define TARGET_BFLOAT16_HW_VECTOR                                      \
+  (TARGET_FLOAT16 && TARGET_POWER10 && TARGET_BFLOAT16_VECTOR)
+
+#define TARGET_FLOAT16_HW_VECTOR                                       \
+  (TARGET_FLOAT16 && TARGET_POWER9 && TARGET_FLOAT16_VECTOR)
+
 /* Return true for floating point that does not use a vector register.  */
 #define SCALAR_FLOAT_MODE_NOT_VECTOR_P(MODE)                           \
   (SCALAR_FLOAT_MODE_P (MODE) && !FLOAT128_VECTOR_P (MODE))
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 9632da3ebb2..fbcfb02d944 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -718,6 +718,8 @@ (define_code_attr uns [(fix         "")
 ; A generic w/d attribute, for things like cmpw/cmpd.
 (define_mode_attr wd [(QI    "b")
                      (HI    "h")
+                     (BF    "h")
+                     (HF    "h")
                      (SI    "w")
                      (DI    "d")
                      (V16QI "b")
@@ -15895,3 +15897,4 @@ (define_insn "hashchk"
 (include "htm.md")
 (include "fusion.md")
 (include "pcrel-opt.md")
+(include "float16.md")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 2b6ec5222fc..ddbf1c882f1 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -638,6 +638,10 @@ mieee128-constant
 Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
 Generate (do not generate) code that uses the LXVKQ instruction.
 
+mfloat16
+Target Mask(FLOAT16) Var(rs6000_isa_flags)
+Enable or disable 16-bit floating point.
+
 ; Documented parameters
 
 -param=rs6000-vect-unroll-limit=
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8da5f03ccbd..aebfcf30054 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1401,7 +1401,7 @@ See RS/6000 and PowerPC Options.
 -mpointers-to-nested-functions  -msave-toc-indirect  -mpower8-fusion
 -mcrypto  -mhtm  -mquad-memory  -mquad-memory-atomic
 -mcompat-align-parm
--mfloat128  -mfloat128-hardware
+-mfloat128  -mfloat128-hardware  -mfloat16
 -mgnu-attribute
 -mstack-protector-guard=@var{guard}  -mstack-protector-guard-reg=@var{reg}
 -mstack-protector-guard-offset=@var{offset}  -mprefixed
@@ -32025,6 +32025,28 @@ The default for @option{-mfloat128-hardware} is enabled on PowerPC
 Linux systems using the ISA 3.0 instruction set, and disabled on other
 systems.
 
+
+@opindex mfloat16
+@opindex mno-float16
+@item -mfloat16
+@itemx -mno-float16
+Enable/disable both the @code{_Float16} and @code{__bfloat16} keywords
+for using 16-bit floating point.
+
+The @code{_Float16} keyword selects the IEEE 754 half precision format.
+GCC generates either hardware instructions or software emulation for
+it, depending on the target processor.
+
+The @code{__bfloat16} keyword selects the Google Brain 16-bit floating
+point format.  GCC generates either hardware instructions or software
+emulation for it, depending on the target processor.
+
+At the current time, 16-bit floating point support is experimental,
+and support may be changed in future releases.  If you pass or return
+a 16-bit floating point value, GCC will issue a warning that the ABI
+may change in the future unless you use the @option{-Wno-psabi}
+option.
+
 @opindex m32
 @opindex m64
 @item -m32
-- 
2.54.0
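
For readers who want to see the byte layout that the HFmode/BFmode branch of
constant_fp_to_128bit_vector produces, here is a minimal standalone sketch.
It is not part of the patch and does not use any GCC internals.  The helper
names float_to_bf16_bits_truncate and store_fp16_big_endian are hypothetical,
made up for this illustration.  It assumes that float is IEEE binary32 on the
host, and it truncates when forming the bfloat16 pattern, whereas a real
conversion would normally round.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not a GCC function: return the bfloat16 bit pattern
   for F by keeping the upper 16 bits of its single precision encoding.
   Assumes float is IEEE binary32.  This truncates the low mantissa bits;
   it does not round.  */
uint16_t
float_to_bf16_bits_truncate (float f)
{
  uint32_t bits;

  memcpy (&bits, &f, sizeof bits);
  return (uint16_t) (bits >> 16);
}

/* Hypothetical helper, not a GCC function: store the 16-bit pattern VALUE
   into BYTES starting at BYTE_NUM, most significant byte first.  This
   mirrors the two byte assignments in the 16-bit branch of the patch.  */
void
store_fp16_big_endian (unsigned char *bytes, size_t byte_num, uint16_t value)
{
  bytes[byte_num] = (unsigned char) (value >> 8);
  bytes[byte_num + 1] = (unsigned char) value;
}
```

With these helpers, the IEEE half precision pattern for 1.0 (0x3C00) is
stored as the bytes 0x3C, 0x00, and the bfloat16 pattern for 1.0 (0x3F80) as
0x3F, 0x80.  Both formats encode -0.0 as 0x8000.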


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: [email protected]
