[PATCH, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2018-12-13 Thread Kelvin Nilsen


This patch is a refinement of a path first submitted to this list on Nov. 10, 
2018.  This new patch incorporates improvements suggested by 
seg...@gcc.gnu.org.  Two regression observed at the time this patch was 
previously distributed have been resolved as described here: 
https://sourceware.org/bugzilla/show_bug.cgi?id=23937

New D-form instructions available on Power9 introduce new code generation 
options that result in more efficient execution.

This new pass scans existing rtl expressions and replaces them with rtl 
expressions that favor selection of the D-form instructions in contexts for 
which the D-form instructions are preferred.  The new pass runs after the RTL 
loop optimizations since loop unrolling often introduces opportunities for 
beneficial replacements of X-form addressing instructions.

I have built and regression tested this patch on powerpc64le-unknown-linux 
(Power9) target with no regressions.

Is this ok for trunk?

gcc/ChangeLog:

2018-12-13  Kelvin Nilsen  

* config/rs6000/rs6000-p9dform.c: New file.
* config/rs6000/rs6000-passes.def: Add pass_insert_dform after
pass_loop2.
* config/rs6000/rs6000-protos.h
(rs6000_target_supports_dform_offset_p): New prototype.
(make_pass_insert_dform): New prototype.
* config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p):
New function.
* config/rs6000/t-rs6000: Add entry to compile rs6000-p9dform.c.
* config.gcc: Add entry to link new object file rs6000-p9dform.o.

gcc/testsuite/ChangeLog:

2018-12-13  Kelvin Nilsen  

* gcc.target/powerpc/p9-dform-0.c: New test.
* gcc.target/powerpc/p9-dform-1.c: New test.

Index: gcc/config/rs6000/rs6000-p9dform.c
===
--- gcc/config/rs6000/rs6000-p9dform.c  (nonexistent)
+++ gcc/config/rs6000/rs6000-p9dform.c  (working copy)
@@ -0,0 +1,1487 @@
+/* Subroutines used to transform array subscripting expressions into
+   forms that are more amenable to d-form instruction selection for p9
+   little-endian VSX code.
+   Copyright (C) 1991-2018 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "df.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"
+#include "expr.h"
+#include "output.h"
+#include "tree-pass.h"
+#include "rtx-vector-builder.h"
+#include "cfgloop.h"
+
+#include "insn-config.h"
+#include "recog.h"
+
+#include "print-rtl.h"
+#include "tree-pretty-print.h"
+
+#include "genrtl.h"
+
+/* This pass transforms array indexing expressions from a form that
+   favors selection of X-form instructions into a form that favors
+   selection of D-form instructions.
+
+   Showing favor for D-form instructions is especially important when
+   targeting Power9, as the Power9 architecture added a number of new
+   D-form instruction capabilities.
+
+   Consider, for example, the following loop, excerpted from an actual
+   program:
+
+double sacc, x[], y[], z[];
+sacc = 0.00;
+for (unsigned long long int i = 0; i < N; i++) {
+  z[i] = x[i] * y[i];
+  sacc += z[i];
+}
+
+   Compile this program with the following gcc options which enable both
+   vectorization and loop unrolling:
+-m64 -fdump-rtl-all-details -mcpu=power9 -mtune=power9 -funroll-loops -O3
+
+   Without this pass, this loop is represented by the following:
+
+   lxvx:  16
+   addi:   8
+   xvmuldp:8
+   stxvx:  8
+   fmr:8
+   xxpermdi:   8
+   fadd:  16
+   bdnz:   1
+ ___
+ total:   73 instructions
+
+.L3:
+   lxvx 0,29,11
+   lxvx 12,30,11
+   addi 12,11,16
+   addi 0,11,48
+   addi 5,11,64
+   addi 9,11,32
+   addi 6,11,80
+   addi 7,11,96
+   addi 8,11,112
+   lxvx 2,29,12
+   lxvx 3,30,12
+   lxvx 4,29,0
+   lxvx 5,30,0
+   lxvx 10,30,9
+   lxvx 11,29,5
+   xvmuldp 6,0,12
+   lxvx 13,30,5
+   lxvx 8,29,9
+   lxvx 27,29,6
+   lxvx 28,30,6
+   xvmuldp 7,2,3
+   lxvx 29,29,7
+   lxvx 

Re: [RFC][PATCH, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2018-11-16 Thread Segher Boessenkool
On Sat, Nov 10, 2018 at 11:36:28AM -0600, Kelvin Nilsen wrote:
> This new pass scans existing rtl expressions and replaces them with rtl 
> expressions that favor selection of the D-form instructions in contexts for 
> which the D-form instructions are preferred.

(Here would be a good place to say it runs right after the RTL loop opts).

> Both regressions relate to resolution of ifuncs, and I have determined that 
> the toc pointer upon entry into the resolver functions are not valid.  I have 
> not yet determined why this is happening, though I have observed that the 
> same problem seems to occur with certain other versions of the compiler prior 
> to my trunk with patch.  The two failures are:
> 
> FAIL: gcc.dg/attr-ifunc-4.c execution test
> FAIL: gcc.dg/ipa/ipa-pta-19.c execution test

It sounds like those are not new failures.  Would be good to figure out
what causes it of course.

>   * config/rs6000/rs6000-p9indexing.c: New file.

A better name would be good :-)


> +bool
> +rs6000_target_supports_dform_offset_p (bool is_store __attribute__((unused)),

Just say   bool /*is_store*/   or completely delete this if it isn't used?
Shorter name wouldn't hurt either ;-)

> +{
> +  const int max_16bit_signed = (0x7fff);
> +  const int min_16bit_signed = -1 - max_16bit_signed;

Please use HOST_WIDE_INT, not int.

> +  /* available d-form instructions with P1 (the original Power architecture):

We don't actually support that anymore.  But we do support original PowerPC
still, which matches what you say here (POWER used diffent mnemonics).

> +  /* available d-form instructions with PPC (prior to v2.00):

This is the 64-bit instructions (64-bit is not required before ISA 3.0).
Those did not exist before PowerPC either, yup.

> +  if ((byte_offset >= min_16bit_signed)
> +   && (byte_offset <= max_16bit_signed))

  if (IN_RANGE (byte_offset, min_16bit_signed, max_16bit_signed))

> + return true;
> +  else
> + break;

Don't say "return else break" please, just say "return; break;"

> +  if (rs6000_isa_flags & OPTION_MASK_MODULO)
> +{/* ISA 3.0 */

Don't put the "ISA" comment here please, put it somewhere before the "{".

> + case E_DFmode:
> + case E_SFmode:
> +   /* E_DFmode handled by lxsd and stxsd insns.  E_SFmode handled
> +  by lxssp and stxssp insn.  */

These are already handled by {l,st}f[sd] which are base PowerPC, at any
offset (not just multiples of 4)?

> --- gcc/config/rs6000/rs6000-passes.def   (revision 263589)
> +++ gcc/config/rs6000/rs6000-passes.def   (working copy)
> @@ -25,3 +25,4 @@
>   */
>  
>INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
> +  INSERT_PASS_AFTER (pass_loop2, 1, pass_fix_indexing);

This name is not super.

> \ No newline at end of file

And neither is that :-)

> +  /* If this insn is relevant, it belongs to an equivalence class.
> + The equivalence classes are identified by the definitions that
> + define the inputs to this insn.
> +   */

No new line for the comment end please.

> +  unsigned int is_relevant: 1;

A space before the colon, too.

> +static int count_links (struct df_link *def_link)
> +{
> +  if (def_link == NULL) return 0;
> +  else return 1 + count_links (def_link->next);

Each return should start a new line.

> +static int
> +help_hash (int count, struct df_link *def_link) {
> +  int *ids;
> +  int i = 0;
> +
> +  if (count > max_use_links)
> +max_use_links = count;
> +
> +  ids = (int *) alloca (count * sizeof (int));

sizeof *ids; but can't you use a VLA?

> +  while (def_link != NULL) {

{ on a new line, indented.

> +  /* bubble sort to put ids in ascending order. */
> +  for (int end = count - 1; end > 0; end--) {
> +for (int j = 0; j < end; j++) {
> +  if (ids[j] > ids[j+1]) {
> + int swap = ids[j];
> + ids[j] = ids[j+1];
> + ids[j+1] = swap;

std::swap to swap.  And just use qsort?

> +static void
> +find_defs (indexing_web_entry *insn_entry, rtx_insn *insn,
> +struct df_link **insn_base, struct df_link **insn_index)
> +{
> +  unsigned int uid = INSN_UID (insn);
> +  rtx body = PATTERN (insn);
> +  rtx mem;
> +  if ((GET_CODE (body) == SET) && MEM_P (SET_SRC (body)))
> +mem = XEXP (SET_SRC (body), 0);
> +  else if ((GET_CODE (body) == SET) && MEM_P (SET_DEST (body)))
> +mem = XEXP (SET_DEST (body), 0);
> +  else
> +mem = NULL;
> +  /* else, this insn is neither load nor store.  */

Use single_set instead?

> + }
> + }
> + }
> + }
> + }
> + }
> +}
> +}

This means you need some more factoring and/or some early outs.

> +/* Return non-zero if an only if use represents a compile-time constant.  */
> +static int
> +represents_constant_p (df_ref use)

You're returning true or false, so please declare this s "bool".

> +  /* We're only happy with multiple uses if all but one represent
> +  constant values.  

[RFC][PATCH, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2018-11-10 Thread Kelvin Nilsen


New D-form instructions available on Power9 introduce new code generation 
options that result in more efficient execution.

This new pass scans existing rtl expressions and replaces them with rtl 
expressions that favor selection of the D-form instructions in contexts for 
which the D-form instructions are preferred.

I have built and regression tested this patch on powerpc64le-unknown-linux 
(Power9) target with only two regressions.

Both regressions relate to resolution of ifuncs, and I have determined that the 
toc pointer upon entry into the resolver functions are not valid.  I have not 
yet determined why this is happening, though I have observed that the same 
problem seems to occur with certain other versions of the compiler prior to my 
trunk with patch.  The two failures are:

FAIL: gcc.dg/attr-ifunc-4.c execution test
FAIL: gcc.dg/ipa/ipa-pta-19.c execution test


I invite comments and suggestions regarding this draft patch at this time.

gcc/ChangeLog:

2018-11-10  Kelvin Nilsen  

* config.gcc: Add entry to compile new object rs6000-p9indexing.o.
* config/rs6000/rs6000-passes.def: Add pass_fix_indexing after
pass_loop2.
* config/rs6000/t-rs6000: Add entry to compile rs6000-p9indexing.c.
* config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p):
New function.
* config/rs6000/rs6000-protos.h
(rs6000_target_supports_dform_offset_p): New prototype.
(make_pass_fix_indexing): New prototype.
* config/rs6000/rs6000-p9indexing.c: New file.

Index: gcc/config/rs6000/t-rs6000
===
--- gcc/config/rs6000/t-rs6000  (revision 263589)
+++ gcc/config/rs6000/t-rs6000  (working copy)
@@ -35,6 +35,10 @@
$(COMPILE) $<
$(POSTCOMPILE)
 
+rs6000-p9indexing.o: $(srcdir)/config/rs6000/rs6000-p9indexing.c
+   $(COMPILE) $<
+   $(POSTCOMPILE)
+
 $(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
   $(srcdir)/config/rs6000/rs6000-cpus.def
$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 263589)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -47,6 +47,8 @@
 extern bool legitimate_indirect_address_p (rtx, int);
 extern bool legitimate_indexed_address_p (rtx, int);
 extern bool avoiding_indexed_address_p (machine_mode);
+extern bool rs6000_target_supports_dform_offset_p (bool, machine_mode,
+  HOST_WIDE_INT);
 
 extern rtx rs6000_got_register (rtx);
 extern rtx find_addr_reg (rtx);
@@ -244,6 +246,8 @@
 class rtl_opt_pass;
 
 extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
+extern rtl_opt_pass *make_pass_fix_indexing (gcc::context *);
+
 extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
 extern bool rs6000_quadword_masked_address_p (const_rtx exp);
 extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 263589)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -9263,6 +9263,169 @@
   return ret;
 }
 
+/* This function provides an approximation of which d-form addressing
+   expressions are valid on any given target configuration.  This
+   approximation guides optimization choices.  Secondary validation
+   of the addressing mode is performed before code generation.
+
+   Return true iff target has instructions to perform a memory
+   operation at the specified BYTE_OFFSET from an address held
+   in a general purpose register.  if IS_STORE is true, test for
+   availability of a store instruction.  Otherwise, test for
+   availability of a load instruction.  */
+bool
+rs6000_target_supports_dform_offset_p (bool is_store __attribute__((unused)),
+  machine_mode mode,
+  HOST_WIDE_INT byte_offset)
+{
+  const int max_16bit_signed = (0x7fff);
+  const int min_16bit_signed = -1 - max_16bit_signed;
+
+  /* available d-form instructions with P1 (the original Power architecture):
+
+ lbz RT,D(RA) - load byte and zero d-form
+ lhz RT,D(RA) - load half word and zero d-form
+ lha RT,D(RA) - load half word algebraic d-form
+ lwz RT,D(RA) - load word and zero d-form
+ lfs FRT,D(RA) - load floating-point single d-form
+ lfd FRT,D(RA) - load floating-point double d-form
+
+ stb RS,D(RA) - store byte d-form
+ sth RS,D(RA) - store half word d-form
+ stfs FRS,D(RA) - store floating point single d-form
+ stfd FRS,D(RA) - store floating point double d-form
+  */
+
+  /* available d-form instructions with PPC (prior to v2.00):
+ (option mpowerpc "existed in the past" but is now "always on"
+
+ lwa RT,DS(RA) -