date:20181213

[PATCH, middle-end/i386]: Fix PR88474, Inline built-in hypot for -ffast-math

2018-12-13 Thread Uros Bizjak

Attached patch inlines calls to hypotf and hypot using x87 XFmode arithmetic.

2018-12-14  Uros Bizjak  

PR target/88474
* internal-fn.def (HYPOT): New.
* optabs.def (hypot_optab): New.
* config/i386/i386.md (hypot3): New expander.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

There are a couple of one-liners in middle-end part, so patch also
needs middle-end approval.

OK for mainline?

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5e46bdcdd37..537b90c3b4b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15048,6 +15048,32 @@
}
 })
 
+(define_expand "hypot3"
+  [(use (match_operand:MODEF 0 "register_operand"))
+   (use (match_operand:MODEF 1 "general_operand"))
+   (use (match_operand:MODEF 2 "general_operand"))]
+  "TARGET_USE_FANCY_MATH_387
+   && (!(SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
+   || TARGET_MIX_SSE_I387)
+   && flag_finite_math_only
+   && flag_unsafe_math_optimizations"
+{
+  rtx op0 = gen_reg_rtx (XFmode);
+  rtx op1 = gen_reg_rtx (XFmode);
+  rtx op2 = gen_reg_rtx (XFmode);
+
+  emit_insn (gen_extendxf2 (op2, operands[2]));
+  emit_insn (gen_extendxf2 (op1, operands[1]));
+
+  emit_insn (gen_mulxf3 (op1, op1, op1));
+  emit_insn (gen_mulxf3 (op2, op2, op2));
+  emit_insn (gen_addxf3 (op0, op2, op1));
+  emit_insn (gen_sqrtxf2 (op0, op0));
+
+  emit_insn (gen_truncxf2 (operands[0], op0));
+  DONE;
+})
+
 (define_insn "x86_fnstsw_1"
   [(set (match_operand:HI 0 "register_operand" "=a")
(unspec:HI [(reg:CCFP FPSR_REG)] UNSPEC_FNSTSW))]
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index cda314e1121..593eead4ef1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -230,6 +230,7 @@ DEF_INTERNAL_FLT_FLOATN_FN (TRUNC, ECF_CONST, btrunc, unary)
 DEF_INTERNAL_FLT_FN (ATAN2, ECF_CONST, atan2, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (COPYSIGN, ECF_CONST, copysign, binary)
 DEF_INTERNAL_FLT_FN (FMOD, ECF_CONST, fmod, binary)
+DEF_INTERNAL_FLT_FN (HYPOT, ECF_CONST, hypot, binary)
 DEF_INTERNAL_FLT_FN (POW, ECF_CONST, pow, binary)
 DEF_INTERNAL_FLT_FN (REMAINDER, ECF_CONST, remainder, binary)
 DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 5a67f5eed5e..2d039e7682b 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -284,6 +284,7 @@ OPTAB_D (exp2_optab, "exp2$a2")
 OPTAB_D (exp_optab, "exp$a2")
 OPTAB_D (expm1_optab, "expm1$a2")
 OPTAB_D (fmod_optab, "fmod$a3")
+OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")

Re: [PATCH] handle expressions in __builtin_has_attribute (PR 88383)

2018-12-13 Thread Jakub Jelinek

On Thu, Dec 13, 2018 at 09:03:57PM -0700, Martin Sebor wrote:
> It's as good as the design of GCC attributes allows.  Based on

No.

> the manual a function declaration like
> 
>   __attribute__ ((alloc_size (1), malloc))
>   void* allocate (unsigned);
> 
> should have those two attributes applied to it.  Yet, alloc_size
> is actually applied to its type (but not to its decl) while malloc
> to the function's decl but not its type (bug 88397).
> 
> I'm pretty sure most users still expect the following to pass:
> 
>   _Static_assert (__builtin_has_attribute (allocate, alloc_size));
>   _Static_assert (__builtin_has_attribute (allocate, malloc));

Users shouldn't expect this.  If anything, we should document what
attributes are type attributes and which attributes are declaration
attributes.

With the way you're proposing, users could check the type attributes
simply through __typeof/decltype etc., but couldn't test solely the
declaration attributes.

Jakub

Re: [RS6000] PR88311, mlongcall indirections are optimised away

2018-12-13 Thread Segher Boessenkool

On Fri, Dec 14, 2018 at 04:55:37PM +1030, Alan Modra wrote:
> Masking CALL_LONG from the cookie was done in order to simplify and
> correct length attribute calculations for indirect calls at one point
> in my call series tidy when the indirect patterns used alternatives
> "0,n" on the cookie operand.  (Leaving the CALL_LONG in place
> calculated the wrong length for long calls without fp args.)
> 
> This is no longer necessary now that the indirect sysv call patterns
> explicitly test for the fp arg bits in their length attribute
> expressions.  And without the CALL_LONG to disable insns like
> call_value_local_svsv, combine merrily replaces the indirect long call
> sequence with a direct call.  As it should.  This patch reinstates
> the CALL_LONG bit.

Thanks!  Okay for trunk.


Segher


>   PR rtl-optimization/88311
>   * config/rs6000/rs6000.c (rs6000_call_sysv): Do not mask cookie.
>   (rs6000_sibcall_sysv): Likewise.

Re: [PATCH] x86: Don't use get_frame_size to finalize stack frame

2018-12-13 Thread Uros Bizjak

On Thu, Dec 13, 2018 at 6:36 PM H.J. Lu  wrote:
>
> get_frame_size () returns used stack slots during compilation, which
> may be optimized out later.  Since ix86_find_max_used_stack_alignment
> is called by ix86_finalize_stack_frame_flags to check if stack frame
> is required, there is no need to call get_frame_size () which may give
> inaccurate final stack frame size.
>
> Tested on AVX512 machine configured with
>
> --with-arch=native --with-cpu=native
>
> OK for trunk?
>
>
> H.J.
> ---
> gcc/
>
> PR target/88483
> * config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
> use get_frame_size ().
>
> gcc/testsuite/
>
> PR target/88483
> * gcc.target/i386/stackalign/pr88483.c: New test.

LGTM, but you know this part of the compiler better than I.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.c  |  1 -
>  .../gcc.target/i386/stackalign/pr88483.c| 17 +
>  2 files changed, 17 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/stackalign/pr88483.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index caa701fe242..edc8f4f092e 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -12876,7 +12876,6 @@ ix86_finalize_stack_frame_flags (void)
>&& flag_exceptions
>&& cfun->can_throw_non_call_exceptions)
>&& !ix86_frame_pointer_required ()
> -  && get_frame_size () == 0
>&& ix86_nsaved_sseregs () == 0
>&& ix86_varargs_gpr_size + ix86_varargs_fpr_size == 0)
>  {
> diff --git a/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c 
> b/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c
> new file mode 100644
> index 000..5aec8fd4cf6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> +/* { dg-options "-O2 -mavx2" } */
> +
> +struct B
> +{
> +  char a[12];
> +  int b;
> +};
> +
> +struct B
> +f2 (void)
> +{
> +  struct B x = {};
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not 
> "and\[lq\]?\[^\\n\]*-\[0-9\]+,\[^\\n\]*sp" } } */
> --
> 2.19.2
>

[RS6000] PR88311, mlongcall indirections are optimised away

2018-12-13 Thread Alan Modra

Masking CALL_LONG from the cookie was done in order to simplify and
correct length attribute calculations for indirect calls at one point
in my call series tidy when the indirect patterns used alternatives
"0,n" on the cookie operand.  (Leaving the CALL_LONG in place
calculated the wrong length for long calls without fp args.)

This is no longer necessary now that the indirect sysv call patterns
explicitly test for the fp arg bits in their length attribute
expressions.  And without the CALL_LONG to disable insns like
call_value_local_svsv, combine merrily replaces the indirect long call
sequence with a direct call.  As it should.  This patch reinstates
the CALL_LONG bit.

Bootstrapped and regression tested powerpc64le-linux (with
HAVE_AS_PLTSEQ) and powerpc64-linux biarch (without HAVE_AS_PLTSEQ).
OK?

PR rtl-optimization/88311
* config/rs6000/rs6000.c (rs6000_call_sysv): Do not mask cookie.
(rs6000_sibcall_sysv): Likewise.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 88f4f620ae4..c4682fce6cd 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -37978,8 +37978,7 @@ rs6000_call_sysv (rtx value, rtx func_desc, rtx tlsarg, 
rtx cookie)
   if (value != NULL_RTX)
 call[0] = gen_rtx_SET (value, call[0]);
 
-  unsigned int mask = CALL_V4_SET_FP_ARGS | CALL_V4_CLEAR_FP_ARGS;
-  call[1] = gen_rtx_USE (VOIDmode, GEN_INT (INTVAL (cookie) & mask));
+  call[1] = gen_rtx_USE (VOIDmode, cookie);
   call[2] = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNO));
 
   insn = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (3, call));
@@ -38043,8 +38042,7 @@ rs6000_sibcall_sysv (rtx value, rtx func_desc, rtx 
tlsarg, rtx cookie)
   if (value != NULL_RTX)
 call[0] = gen_rtx_SET (value, call[0]);
 
-  unsigned int mask = CALL_V4_SET_FP_ARGS | CALL_V4_CLEAR_FP_ARGS;
-  call[1] = gen_rtx_USE (VOIDmode, GEN_INT (INTVAL (cookie) & mask));
+  call[1] = gen_rtx_USE (VOIDmode, cookie);
   call[2] = simple_return_rtx;
 
   insn = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (3, call));

-- 
Alan Modra
Australia Development Lab, IBM

Re: [PATCH] handle expressions in __builtin_has_attribute (PR 88383)

2018-12-13 Thread Martin Sebor


On 12/13/18 4:39 PM, Jakub Jelinek wrote:

On Thu, Dec 13, 2018 at 02:05:29PM -0700, Martin Sebor wrote:

So how is the builtin defined then?  Is the argument always an expression
and you only return whether its type has the attribute, or do something
different if the expression is of certain kind?


For a DECL the built-in looks at both the DECL_ATTRIBUTES and
the DECL's type's TYPE_ATTRIBUTES.  For expressions, it just
looks at TYPE_ATTRIBUTES of the expression's type.  I'm not
sure that this is necessarily the cleanest design but it's
meant to follow how attributes are applied to DECLs.  Sometimes
they get applied to the DECL itself and other times to its type.
This has been causing confusion to both users and GCC devs such
as myself so at some point I'd like to make this clearer somehow.
Not sure exactly how yet.


But some users of the attribute look for the attribute on TYPE_ATTRIBUTES,
other on DECL_ATTRIBUTES, and the attributes have bools which say to what
they apply.  So, I think the API need to be very clear whether the attribute
is on a decl or on a type.

If it is similar to say sizeof/__alignof__ etc., where it accepts either
a type (for which one can use __typeof/decltype etc.), in which case it
would check the type attributes, or an expression and in that case require
it to be a decl and use decl attributes, one API is fine, but supporting
arbitrary expressions and sometimes looking at type, other times at decl
or both is not a good design.


It's as good as the design of GCC attributes allows.  Based on
the manual a function declaration like

  __attribute__ ((alloc_size (1), malloc))
  void* allocate (unsigned);

should have those two attributes applied to it.  Yet, alloc_size
is actually applied to its type (but not to its decl) while malloc
to the function's decl but not its type (bug 88397).

I'm pretty sure most users still expect the following to pass:

  _Static_assert (__builtin_has_attribute (allocate, alloc_size));
  _Static_assert (__builtin_has_attribute (allocate, malloc));

It wouldn't if the built-in differentiated between decls and types.

Even if/when we make these function attributes (and others like it)
apply to types as I think we should so it's possible to apply them
to function pointers as users expect, there still will be others
that will apply only to the function decls.  The distinction
between when GCC chooses to apply an attribute to a function decl
vs to its type is not intuitive and cannot very well be, and neither
would be an API that queried just a decl attribute or just a type
attribute.  The most common use case is to query whether a function
has been declared with an attribute -- it doesn't matter whether
it's on its unique type or on its decl.  (The others might perhaps
be useful too, but a lot less often.)

Martin

[ping] Fix ICE in pp_cxx_unqualified_id (PR c++/88348)

2018-12-13 Thread Zhouyi Zhou

Hello,

Ping for https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00151.html

By the way, are there any GCC open project for volunteer beginners to

do in spare time.

Thanks in advance for your feedback,

Zhouyi

Re: [PATCH] [PR87012] canonicalize ref type for tmpl arg

2018-12-13 Thread Alexandre Oliva

On Dec  5, 2018, Jason Merrill  wrote:

> I would expect that this same issue would come up with other types; I
> think we want to fix this sooner, when we are figuring out what type
> we want to convert the argument to.

You mean like this?

[PR87012] canonicalize ref type for tmpl arg

When binding an object to a template parameter of reference type, we
take the address of the object and dereference that address.  The type
of the address may still carry (template) typedefs, but
verify_unstripped_args_1 rejects such typedefs other than in the top
level of template arguments.

Canonicalizing the type we want to convert to right after any
substitutions or deductions avoids that issue.

Regstrapped on x86_64- and i686-linux-gnu.  Ok to install?


for  gcc/cp/ChangeLog

PR c++/87012
* pt.c (convert_template_argument): Canonicalize type after
tsubst/deduce.

for  gcc/testsuite/ChangeLog

PR c++/87012
* g++.dg/cpp0x/pr87012.C: New.
---
 gcc/cp/pt.c  |2 ++
 gcc/testsuite/g++.dg/cpp0x/pr87012.C |   11 +++
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr87012.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 72ae7173d92c..0d388c67459a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8018,6 +8018,8 @@ convert_template_argument (tree parm,
   if (invalid_nontype_parm_type_p (t, complain))
return error_mark_node;
 
+  t = canonicalize_type_argument (t, complain);
+
   if (!type_dependent_expression_p (orig_arg)
  && !uses_template_parms (t))
/* We used to call digest_init here.  However, digest_init
diff --git a/gcc/testsuite/g++.dg/cpp0x/pr87012.C 
b/gcc/testsuite/g++.dg/cpp0x/pr87012.C
new file mode 100644
index ..fd3eea47c390
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/pr87012.C
@@ -0,0 +1,11 @@
+// { dg-do compile { target c++11 } }
+
+template
+using ref = T&;
+
+int x;
+
+template class T, T>
+struct X { };
+
+struct Y : X { };



-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe

Re: [PATCH] [PR86823] retain deferred access checks from outside firewall

2018-12-13 Thread Alexandre Oliva

On Dec  6, 2018, Alexandre Oliva  wrote:

> I'm giving your proposed patch a full round of testing along with other
> patches.

[PR86823] retain deferred access checks from outside firewall

We used to preserve deferred access check along with resolved template
ids, but a tentative parsing firewall introduced additional layers of
deferred access checks, so that we don't preserve the checks we
want to any more.

This patch moves the deferred access checks from outside the firewall
into it.

Regstrapped on x86_64- and i686-linux-gnu.  Ok to install?


From: Jason Merrill 
for  gcc/cp/ChangeLog

PR c++/86823
* parser.c (cp_parser_template_id): Rearrange deferred access
checks into the firewall.

From: Alexandre Oliva 
for  gcc/testsuite/ChangeLog

PR c++/86823
* g++.dg/pr86823.C: New.
---
 gcc/cp/parser.c|   10 ++
 gcc/testsuite/g++.dg/pr86823.C |   15 +++
 2 files changed, 21 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr86823.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index adfe09e494dc..0bf0e309a588 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -16182,16 +16182,18 @@ cp_parser_template_id (cp_parser *parser,
   is_declaration,
   tag_type,
   _identifier);
+
+  /* Push any access checks inside the firewall we're about to create.  */
+  vec *checks = get_deferred_access_checks ();
+  pop_deferring_access_checks ();
   if (templ == error_mark_node || is_identifier)
-{
-  pop_deferring_access_checks ();
-  return templ;
-}
+return templ;
 
   /* Since we're going to preserve any side-effects from this parse, set up a
  firewall to protect our callers from cp_parser_commit_to_tentative_parse
  in the template arguments.  */
   tentative_firewall firewall (parser);
+  reopen_deferring_access_checks (checks);
 
   /* If we find the sequence `[:' after a template-name, it's probably
  a digraph-typo for `< ::'. Substitute the tokens and check if we can
diff --git a/gcc/testsuite/g++.dg/pr86823.C b/gcc/testsuite/g++.dg/pr86823.C
new file mode 100644
index ..18914b00aa8d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr86823.C
@@ -0,0 +1,15 @@
+// { dg-do compile }
+
+struct X {
+private:
+  template
+  struct Y {
+int data;
+  };
+public:
+  int value;
+};
+
+int main() {
+  typename X::Y a; // { dg-error "private" }
+}


-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe

Re: [doc,committed] clarify docs for function attribute "const"

2018-12-13 Thread Sandra Loosemore


On 12/13/18 3:52 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00426.html

(I have now committed the @code{const} cleanup mentioned below.)


This patch is OK except for one nit.


 @cindex pointer arguments
 Note that a function that has pointer arguments and examines the data
-pointed to must @emph{not} be declared @code{const}.  Likewise, a
-function that calls a non-@code{const} function usually must not be
-@code{const}.  Because a @code{const} function cannot have any side
-effects it does not make sense for such a function to return @code{void}.
-Declaring such a function is diagnosed.
+pointed to must @emph{not} be declared @code{const} if the pointed-to
+data might change between successive invocations of the function.  In
+general, since a function cannot distinguish data that might change
+from data that cannot, const functions should never be
+pass-by-reference (take pointer or, in C++, reference arguments).


I'd really rather avoid "pass-by-reference" since it adds nothing with 
the clarification about what it means there, too.  How about just:


...const functions should never take pointer or, in C++, reference 
arguments.


OK to commit with that fixed.

-Sandra

Re: [PATCH] Added information about inline assembler in stack calculations (.su files)

2018-12-13 Thread Jeff Law

On 12/7/18 4:35 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Dec 07, 2018 at 07:51:35AM +, Niklas DAHLQUIST wrote:
>> On 12/1/18 1:15 AM, Jeff Law wrote:
>>> One could argue that allocating stack space inside an ASM is a really
>>> bad idea.  Consider things like dwarf debugging and unwind tables.  If
>>> you're allocating stack inside an ASM that stuff is going to be totally
>>> wrong.
>>>
>>> So I think my question before moving forward with something like this is
>>> whether or not it makes sense at all to bother dumping data for a
>>> scenario that we'd probably suggest developers avoid to begin with.
>>
>> The purpose of the patch is to notify when the reported stack usage might be
>> incorrect. Even if it's bad practice to alter stack in asm, there are 
>> use cases
>> in the embedded world that makes sense. A notable common use case is 
>> FreeRTOS
>> task switch using ARM "naked" attribute and inline asm, which reports "0 
>> static",
>> which gives a faulty stack usage. We have considered the other option to
>> report a warning for these cases, but that alternative hasn't appealed 
>> to us.
> 
> Would that work well?  Only warn for naked functions?  It would work
> better for all users that do *not* mess with the stack in their asm ;-)
What I'm questioning is whether or not this is at all useful.  ie, if
I've written a something like task switching in C+asms, then I would
fully expect any  data related to stack usage in that function to be
totally bogus.  Telling me it's bogus in the assembly output really
isn't of value.  It's telling me something I should clearly already know.

And in the common case of an asm that doesn't mess with the stack at
all, the stack usage info is valid and warning me that it may not be is
just a huge annoyance.

jeff

[PATCH] PR fortran 88116,88467 -- Catch array constructor errors

2018-12-13 Thread Steve Kargl

The attached patch has been regression tested on
x86_64-*-freebsd.  It fixes to PR.  Both PR are related
to errors in array constructors.  PR fortran/88116 was not 
reporting an error on type mismatches with nested array
constructors.  PR fortran/88467 was silently accepting
a CHARACTER entity within nested array constructors.

OK to commit?

2018-12-13  Steven G. Kargl  

PR fortran/88116
PR fortran/88467
* array.c (gfc_match_array_constructor): Check return value of
gfc_convert_type().  Skip constructor elements with BT_UNKNOWN,
which need to go through resolution.
* intrinsic.c (gfc_convert_type_warn): Return early if the types
martch (i.e., no conversion is required).
* simplify.c (gfc_convert_constant): Remove a gfc_internal_error,
and return gfc_bad_expr.

2018-12-13  Steven G. Kargl  

PR fortran/88116
* gfortran.dg/pr88116_1.f90: New test.
* gfortran.dg/pr88116_2.f90: Ditto.

PR fortran/88467
* gfortran.dg/pr88467.f90: New test.

-- 
Steve
Index: gcc/fortran/array.c
===
--- gcc/fortran/array.c	(revision 267063)
+++ gcc/fortran/array.c	(working copy)
@@ -1246,7 +1246,9 @@ done:
 	{
 	  c = gfc_constructor_first (head);
 	  for (; c; c = gfc_constructor_next (c))
-	gfc_convert_type (c->expr, , 1);
+	if (!gfc_convert_type (c->expr, , 1)
+		&& c->expr->ts.type != BT_UNKNOWN)
+	  return MATCH_ERROR;
 	}
 }
   else
Index: gcc/fortran/intrinsic.c
===
--- gcc/fortran/intrinsic.c	(revision 267063)
+++ gcc/fortran/intrinsic.c	(working copy)
@@ -5030,6 +5030,13 @@ gfc_convert_type_warn (gfc_expr *expr, gfc_typespec *t
   if (expr->ts.type == BT_UNKNOWN)
 goto bad;
 
+  /* In building an array constructor, gfortran can end up here when no
+ conversion is required for an intrinsic type.  We need to let derive
+ types drop through.  */
+  if (from_ts.type != BT_DERIVED
+  && (from_ts.type == ts->type && from_ts.kind == ts->kind))
+return true;
+
   if (expr->ts.type == BT_DERIVED && ts->type == BT_DERIVED
   && gfc_compare_types (>ts, ts))
 return true;
Index: gcc/fortran/simplify.c
===
--- gcc/fortran/simplify.c	(revision 267063)
+++ gcc/fortran/simplify.c	(working copy)
@@ -8360,7 +8360,7 @@ gfc_convert_constant (gfc_expr *e, bt type, int kind)
 
 default:
 oops:
-  gfc_internal_error ("gfc_convert_constant(): Unexpected type");
+  return _bad_expr;
 }
 
   result = NULL;
Index: gcc/testsuite/gfortran.dg/pr88116_1.f90
===
--- gcc/testsuite/gfortran.dg/pr88116_1.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr88116_1.f90	(working copy)
@@ -0,0 +1,4 @@
+! { dg-do compile }
+program p
+   print *, [integer :: 1, [integer(8) :: 2, ['3']]] ! { dg-error "Can't convert" }
+end
Index: gcc/testsuite/gfortran.dg/pr88116_2.f90
===
--- gcc/testsuite/gfortran.dg/pr88116_2.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr88116_2.f90	(working copy)
@@ -0,0 +1,7 @@
+! { dg-do run }
+program p
+  real :: a(2) = [real :: 1, [integer :: (real(k), k=2,1), 2]]
+  real :: b(1) = [real :: [integer :: (dble(k), k=1,0), 2]]
+  if (a(1) /= 1. .or. a(2) /= 2. .or. b(1) /= 2.) stop 1
+end
+
Index: gcc/testsuite/gfortran.dg/pr88467.f90
===
--- gcc/testsuite/gfortran.dg/pr88467.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr88467.f90	(working copy)
@@ -0,0 +1,4 @@
+! { dg-do compile }
+program foo
+   print *, [integer :: 1, [integer(8) :: 2, '3']] ! { dg-error "Can\'t convert" }
+end program foo

Re: [PATCH] handle function pointers in __builtin_object_size (PR 88372)

2018-12-13 Thread Jeff Law

On 12/6/18 4:01 PM, Martin Sebor wrote:
> On 12/6/18 2:26 PM, Jakub Jelinek wrote:
>> On Thu, Dec 06, 2018 at 01:21:58PM -0700, Martin Sebor wrote:
>>> Bug 88372 - alloc_size attribute is ignored on function pointers
>>> points out that even though the alloc_size attribute is accepted
>>> on function pointers it doesn't have any effect on Object Size
>>> Checking.  The reporter, who is implementing the feature in Clang,
>>> wants to know if by exposing it under the same name they won't be
>>> causing incompatibilities with GCC.
>>>
>>> I don't think it's intentional that GCC doesn't take advantage of
>>> the attribute for Object Size Checking, and certainly not to detect
>>> the same kinds of issues as with other allocation functions (such
>>> as excessive or negative size arguments).  Rather, it's almost
>>> certainly an oversight since GCC does make use of function pointer
>>> attributes in other contexts (e.g., attributes alloc_align and
>>> noreturn).
>>>
>>> As an oversight, I think it's fair to consider it a bug rather
>>> than a request for an enhancement.  Since not handling
>>> the attribute in Object Size Checking has adverse security
>>> implications, I also think this bug should be addressed in GCC
>>> 9.  With that, I submit the attached patch to resolve both
>>> aspects of the problem.
>>
>> This is because alloc_object_size has been written before we had attributes
>>
>> like alloc_size.  The only thing I'm unsure about is whether we should
>> prefer gimple_call_fntype or TREE_TYPE (gimple_call_fndecl ()) if it is a
>> direct call or if we should try to look for alloc_size attribute on both
>> of those if they are different types.  E.g. if somebody does
>>
>> #include 
>>
>> typedef void *(*allocfn) (size_t);
>>
>> static inline void *
>> foo (allocfn fn, size_t sz)
>> {
>>    return fn (sz);
>> }
>>
>> static inline void *
>> bar (size_t sz)
>> {
>>    return foo (malloc, sz);
>> }
>>
>> then I think this patch would no longer treat it as malloc.
>>
>> As this is security relevant, I'd probably look for alloc_size
>> attribute in both gimple_call_fntype and, if gimple_call_fndecl is non-NULL,
>>
>> its TREE_TYPE.
> 
> Thanks for the test case!  I wondered if using fntype would
> always work but couldn't think of when it wouldn't.  I've
> adjusted the function to use both and added the test case.
> 
> While thinking about this it occurred to me that alloc_size
> is only documented as a function attribute but not one that
> applies to pointers or types.  I added documentation for
> these uses to the Common Type and Common Variable sections.
> 
> Martin
> 
> PS Other function attributes that also apply to types and
> variables are only documented in the function section.  They
> should also be mentioned in the other sections.  Which, if
> done in the established style, will result in duplicating
> a lot of text in three places.  I think that suggests that
> we might want to think about structuring these sections of
> the manual differently to avoid the duplication.
> 
> gcc-88372.diff
> 
> PR tree-optimization/88372 - alloc_size attribute is ignored on function 
> pointers
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/88372
>   * calls.c (maybe_warn_alloc_args_overflow): Handle function pointers.
>   * tree-object-size.c (alloc_object_size): Same.  Simplify.
>   * doc/extend.texi (Object Size Checking): Update.
>   (Other Builtins): Add __builtin_object_size.
>   (Common Type Attributes): Add alloc_size.
>   (Common Variable Attributes): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/88372
>   * gcc.dg/Walloc-size-larger-than-18.c: New test.
>   * gcc.dg/builtin-object-size-19.c: Same.
OK
jeff

Re: [ping] Change static chain to r11 on aarch64

2018-12-13 Thread Wilco Dijkstra

Hi Martin,

> One could also argue that it creates a false sense of security
> and diverts resources from properly fixing the real problems
> i.e. the buffer overflows which lets an attacker write to the
> stack in the first place. A program without buffer overflows
> is secure even without an executable stack and a program with
> buffer overflows is still insecure even with a non-executable
> stack.

Avoiding the root cause of the problem is practically impossible, even the
so called "safe" languages didn't turn out to be as safe as claimed after all...

So anything that is simple and free is absolutely worth doing, and far more
useful than mitigations that just slow things down but don't really improve
security. The _chk variants of various string functions are a total waste of
time for example and give a false sense of security. And yet people waste
a large amount of effort optimizing them in assembler code, even creating
ifuncs for them etc...

> I am a bit surprised that the static chain register is not
> always already a fixed part of the ABI.

Why? If every language or extension needed several special ABI registers you'd
run out of registers pretty quickly and the ABI would be too complex to
implement. The this-pointer in C++ doesn't need a special ABI register - 
likewise
a static chain is just an extra argument.

>> If we didn't want to expose the static chain register as an ABI
>> with -fno-trampolines, we could use a helper function which could
>> be made backwards compatible even if one changes the static chain
>> register (it just needs to set all of them!).
>
> Yes, this is a possibility. But I think it could simply be
> fixed as part of the ABI.

Yes we could just pick a register now since this feature won't be used much.

Wilco

Re: [PATCH] Testcase for PR 88297 and minor fixes

2018-12-13 Thread Jeff Law

On 12/6/18 1:13 PM, Michael Ploujnikov wrote:
> Thanks to Martin we now have a test that exercises (cp) cloning
> machinery during the WPA stage of LTO.
> 
> Also, during debugging I found that print_all_lattices would trigger
> an assert if I tried to call it inside decide_whether_version_node.
> 
> Finally I've attached some comment spelling fixes as a bonus.
> 
> 
> Bootstrapping (--with-build-config=bootstrap-lto) and regression testing on 
> x86_64.
> 
> Ok for trunk after tests pass?
Thanks.  I went ahead and installed all 3 patches on the trunk.

jeff

Re: [PATCH] Fix VRP with -fno-delete-null-pointer-checks (PR c/88367)

2018-12-13 Thread Jeff Law

On 12/6/18 1:32 PM, Jakub Jelinek wrote:
>>  For -fno-delete-null-pointer-checks ISTM
>> we should indicate "we don't know anything about the result" of such an
>> operation.
> 
> There are cases where we still know something.  The largest valid object
> that can be supported is half of the address space, so without pointer
> wrapping, positive additions to the pointer shouldn't wrap around and yield
> NULL, negative ones can.  With -fwrapv-pointers anything can happen, sure,
> the only case handled in that case is &[ptr + 0] if ptr is ~[0, 0] then
> &[ptr + 0] is also ~[0, 0].
Yea.  I just didn't figure those were worth worrying about.
jeff

V5 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-13 Thread H.J. Lu

On Thu, Dec 13, 2018 at 12:50 PM Jason Merrill  wrote:
>
> On 9/25/18 11:46 AM, H.J. Lu wrote:
> > On Fri, Aug 31, 2018 at 2:04 PM, Jason Merrill  wrote:
> >> On 07/23/2018 05:24 PM, H.J. Lu wrote:
> >>>
> >>> On Mon, Jun 18, 2018 at 12:26 PM, Joseph Myers 
> >>> wrote:
> 
>  On Mon, 18 Jun 2018, Jason Merrill wrote:
> 
> > On Mon, Jun 18, 2018 at 11:59 AM, Joseph Myers 
> > wrote:
> >>
> >> On Mon, 18 Jun 2018, Jason Merrill wrote:
> >>
>  +  if (TREE_CODE (rhs) == COND_EXPR)
>  +{
>  +  /* Check the THEN path first.  */
>  +  tree op1 = TREE_OPERAND (rhs, 1);
>  +  context = check_address_of_packed_member (type, op1);
> >>>
> >>>
> >>> This should handle the GNU extension of re-using operand 0 if operand
> >>> 1 is omitted.
> >>
> >>
> >> Doesn't that just use a SAVE_EXPR?
> >
> >
> > Hmm, I suppose it does, but many places in the compiler seem to expect
> > that it produces a COND_EXPR with TREE_OPERAND 1 as NULL_TREE.
> 
> 
>  Maybe that's used somewhere inside the C++ front end.  For C a SAVE_EXPR
>  is produced directly.
> >>>
> >>>
> >>> Here is the updated patch.  Changes from the last one:
> >>>
> >>> 1. Handle COMPOUND_EXPR.
> >>> 2. Fixed typos in comments.
> >>> 3. Combined warn_for_pointer_of_packed_member and
> >>> warn_for_address_of_packed_member into
> >>> warn_for_address_or_pointer_of_packed_member.
> >>
> >>
> >>> c.i:4:33: warning: converting a packed ‘struct C *’ pointer increases the
> >>> alignment of ‘long int *’ pointer from 1 to 8 [-Waddress-of-packed-member]
> >>
> >>
> >> I think this would read better as
> >>
> >> c.i:4:33: warning: converting a packed ‘struct C *’ pointer (alignment 1) 
> >> to
> >> ‘long int *’ (alignment 8) may result in an unaligned pointer value
> >> [-Waddress-of-packed-member]
> >
> > Fixed.
> >
> >>> +  while (TREE_CODE (base) == ARRAY_REF)
> >>> +   base = TREE_OPERAND (base, 0);
> >>> +  if (TREE_CODE (base) != COMPONENT_REF)
> >>> +   return NULL_TREE;
> >>
> >>
> >> Are you deliberately not handling the other handled_component_p cases? If
> >> so, there should be a comment.
> >
> > I changed it to
> >
> >   while (handled_component_p (base))
> >  {
> >enum tree_code code = TREE_CODE (base);
> >if (code == COMPONENT_REF)
> >  break;
> >switch (code)
> >  {
> >  case ARRAY_REF:
> >base = TREE_OPERAND (base, 0);
> >break;
> >  default:
> >/* FIXME: Can it ever happen?  */
> >gcc_unreachable ();
> >break;
> >  }
> >  }
> >
> > Is there a testcase to trigger this ICE? I couldn't find one.
>
> You can take the address of an element of complex:
>
>__complex int i;
>int *p = &__real(i);
>
> You may get VIEW_CONVERT_EXPR with location wrappers.

Fixed.  I replaced gcc_unreachable with return NULL_TREE;

> >>> +  /* Check alignment of the object.  */
> >>> +  if (TREE_CODE (object) == COMPONENT_REF)
> >>> +{
> >>> +  field = TREE_OPERAND (object, 1);
> >>> +  if (TREE_CODE (field) == FIELD_DECL && DECL_PACKED (field))
> >>> +   {
> >>> + type_align = TYPE_ALIGN (type);
> >>> + context = DECL_CONTEXT (field);
> >>> + record_align = TYPE_ALIGN (context);
> >>> + if ((record_align % type_align) != 0)
> >>> +   return context;
> >>> +   }
> >>> +}
> >>
> >>
> >> Why doesn't this recurse?  What if you have a packed field three
> >> COMPONENT_REFs down?
> >
> > My patch works on
> > [hjl@gnu-cfl-1 pr51628-4]$ cat x.i
> > struct A { int i; } __attribute__ ((packed));
> > struct B { struct A a; };
> > struct C { struct B b; };
> >
> > extern struct C *p;
> >
> > int* g8 (void) { return >b.a.i; }
> > [hjl@gnu-cfl-1 pr51628-4]$ make x.s
> > /export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc
> > -B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/ -O2
> > -S x.i
> > x.i: In function ‘g8’:
> > x.i:7:25: warning: taking address of packed member of ‘struct A’ may
> > result in an unaligned pointer value [-Waddress-of-packed-member]
> > 7 | int* g8 (void) { return >b.a.i; }
> >| ^
> > [hjl@gnu-cfl-1 pr51628-4]$
> >
> > If it isn't what you had in mind, can you give me a testcase?
>
> In that testcase, 'i' is the top COMPONENT_EXPR.  What I was talking
> about would be more like
>
>   struct A { int i; };
>   struct B { struct A a; };
>   struct C { struct B b __attribute__ ((packed)); };
>
>   extern struct C *p;
>
>   int* g8 (void) { return >b.a.i; }
>

Fixed with a recursive call:

  tree context = check_alignment_of_packed_member (type, field);
  if (context)
return context;

  /* Check alignment of the object.  */
  while (TREE_CODE (object) == COMPONENT_REF)
{

Re: [PATCH 01/10] Fix LRA bug

2018-12-13 Thread Jeff Law

On 12/12/18 4:52 AM, Andrew Stubbs wrote:
> 
> [This is new patch not included in the previously posted patch sets.]
> 
> This patch fixes an ICE building libgfortran/random.c.
> 
> The problem was an adddi3 instruction that had an eliminable frame pointer.
> GCN adddi3 includes a match_scratch, which LRA substitutes with a REG, and
> checks if it can be converted back to a scratch afterwards.  In the meantime,
> the add was converted to a move, meaning that the instruction pattern
> completely changed, thus causing a segfault when the instruction is revisited
> in restore_scratches.
I'm torn here.  There's undocumented rules about this kind of stuff,
largely inherited from the days of "reload".  But I don't actually
recall the specifics of any of those rules.

I don't think there's any inherently wrong with what you've done in LRA.
 My worry is there's other places where this kind of changing the
structure of the underlying insn is going to cause you problems if your
patterns aren't following those undocumented rules.




> 
> 2018-12-12  Andrew Stubbs  
> 
>   gcc/
>   * gcc/lra-int.h (lra_register_new_scratch_op): Add third parameter.
>   * gcc/lra-remat.c (update_scratch_ops): Pass icode to
>   lra_register_new_scratch_op.
>   * gcc/lra.c (struct sloc): Add icode field.
>   (lra_register_new_scratch_op): Add icode parameter.
>   Use icode to skip insns that have changed beyond recognition.
OK.  But be aware we may have to revisit and look more closely what what
you're doing in your port if we stumble over more problems with reload
changing the structure of your insns and causing problems in the process.

jeff

Re: [PATCH] handle expressions in __builtin_has_attribute (PR 88383)

2018-12-13 Thread Jakub Jelinek

On Thu, Dec 13, 2018 at 02:05:29PM -0700, Martin Sebor wrote:
> > So how is the builtin defined then?  Is the argument always an expression
> > and you only return whether its type has the attribute, or do something
> > different if the expression is of certain kind?
> 
> For a DECL the built-in looks at both the DECL_ATTRIBUTES and
> the DECL's type's TYPE_ATTRIBUTES.  For expressions, it just
> looks at TYPE_ATTRIBUTES of the expression's type.  I'm not
> sure that this is necessarily the cleanest design but it's
> meant to follow how attributes are applied to DECLs.  Sometimes
> they get applied to the DECL itself and other times to its type.
> This has been causing confusion to both users and GCC devs such
> as myself so at some point I'd like to make this clearer somehow.
> Not sure exactly how yet.

But some users of the attribute look for the attribute on TYPE_ATTRIBUTES,
other on DECL_ATTRIBUTES, and the attributes have bools which say to what
they apply.  So, I think the API need to be very clear whether the attribute
is on a decl or on a type.

If it is similar to say sizeof/__alignof__ etc., where it accepts either
a type (for which one can use __typeof/decltype etc.), in which case it
would check the type attributes, or an expression and in that case require
it to be a decl and use decl attributes, one API is fine, but supporting
arbitrary expressions and sometimes looking at type, other times at decl
or both is not a good design.

Jakub

Re: [PATCH/wwwdocs] add uses of @code to coding conventions

2018-12-13 Thread Jeff Law

On 12/13/18 4:34 PM, Martin Sebor wrote:
> Attached is the update I propose to add to the Spelling,
> terminology and markup section of the Coding conventions, to
> give guidance for when to use the @code{} Texinfo command.
>  I believe this reflects existing (if somewhat inconsistent)
> practice.  It's not intended to add a new requirement.
> 
OK.  And yes, it's just documenting what's supposed to be standard
practice rather than adding new requirements.


Jeff

Re: [PATCH] Fix ICE due to cross-jumping (PR rtl-optimization/88470)

2018-12-13 Thread Jeff Law

On 12/13/18 4:32 PM, Jakub Jelinek wrote:
> On Thu, Dec 13, 2018 at 04:28:10PM -0700, Jeff Law wrote:
>>> 2018-12-13  Jakub Jelinek  
>>>
>>> PR rtl-optimization/88470
>>> * cfgcleanup.c (outgoing_edges_match): If the function is
>>> shrink-wrapped and bb1 ends with a JUMP_INSN with a single fake
>>> edge to EXIT, return false.
>>>
>>> * gcc.target/i386/pr88470.c: New test.
>> OK.  Are you planning to check the other ICEs in
>> maybe_record_trace_start that are in BZ to see if any are addressed by
>> this change?
> 
> I will, but it is highly unlikely those are the same issues.  Computed
> gotos/non-local jumps are fairly infrequent.  At least one of those is known
> to be a sel-sched bug.
Thanks.  You're probably right, but doesn't hurt to confirm.
jeff

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-13 Thread Jeff Law

On 12/12/18 11:12 AM, Uecker, Martin wrote:
> 
> Hi Jeff,
> 
> thank you. I fixed all the minor issues, but see below.
> 
> 
> Am Montag, den 03.12.2018, 14:56 -0700 schrieb Jeff Law:
>> On 11/4/18 1:48 PM, Uecker, Martin wrote:
>>> Hi Joseph,
>>>
>>> here is a new version of this patch which adds a warning
>>> for targets which do not support -fno-trampolines  and
>>> only runs the test case on architectures where this is
>>> supported. It seems that documentation for this general
>>> feature has improved in the meantime so I only mention
>>> C as supported.
>>>
>>>
>>> Best,
>>> Martin
>>>
>>> diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
>>> index ce2d43f989e..b79f2373c63 100644
>>> --- a/gcc/ada/gcc-interface/trans.c
>>> +++ b/gcc/ada/gcc-interface/trans.c
>>> @@ -1753,7 +1753,8 @@ Attribute_to_gnu (Node_Id gnat_node, tree 
>>> *gnu_result_type_p, int
>>> attribute)
>>>       if ((attribute == Attr_Access
>>>        || attribute == Attr_Unrestricted_Access)
>>>       && targetm.calls.custom_function_descriptors > 0
>>> -     && Can_Use_Internal_Rep (Etype (gnat_node)))
>>> +     && Can_Use_Internal_Rep (Etype (gnat_node))
>>> +  && (flag_trampolines != 1))
>>>     FUNC_ADDR_BY_DESCRIPTOR (gnu_expr) = 1;
>>
>> You've got an extraneous set of parenthesis around your flag_trampolines
>> check.  Please remove them.
>>
>>
>>>  
>>>       /* Otherwise, we need to check that we are not violating the
>>> @@ -4330,7 +4331,8 @@ Call_to_gnu (Node_Id gnat_node, tree 
>>> *gnu_result_type_p, tree gnu_target,
>>>    /* If the access type doesn't require foreign-compatible 
>>> representation,
>>>      be prepared for descriptors.  */
>>>    if (targetm.calls.custom_function_descriptors > 0
>>> -     && Can_Use_Internal_Rep (Etype (Prefix (Name (gnat_node)
>>> +     && Can_Use_Internal_Rep (Etype (Prefix (Name (gnat_node
>>> +  && (flag_trampolines != 1))
>>
>> Similarly here.
>>
>>
>>> diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
>>> index 78e768c2366..ef039560eb9 100644
>>> --- a/gcc/c/c-objc-common.h
>>> +++ b/gcc/c/c-objc-common.h
>>> @@ -110,4 +110,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  
>>>  #undef LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P
>>>  #define LANG_HOOKS_TREE_INLINING_VAR_MOD_TYPE_P c_vla_unspec_p
>>> +
>>> +#undef LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS
>>> +#define LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS true
>>>  #endif /* GCC_C_OBJC_COMMON */
>>
>> I wonder if we even need the lang hook anymore.  ISTM that a front-end
>> that wants to use the function descriptors can just set
>> FUNC_ADDR_BY_DESCRIPTOR and we'd use the function descriptor, else we'll
>> use the trampoline.  Thoughts?
> 
> The lang hook also affects the minimum alignment for function
> pointers via the FUNCTION_ALIGNMENT macro (gcc/default.h). This does
> not appear to change the default alignment on any architecture, but
> it causes a failure in i386/gcc.target/i386/attr-aligned.c when
> requesting a smaller alignment which is then silently ignored.
Ugh.  I didn't see that.

> 
> I am not sure what the best approach is, but my preference
> would be to remove the lang hook and the FUNCTION_ALIGNMENT
> logic which will also fix the test case (the requested
> alignment will be applied).
> 
> I would then instead add a warning (or error?) which triggers
> only with -fno-trampolines if the user requests an alignment
> which is too small for this mechanism to work.
> 
> Does this sound reasonable?
So I'm thinking we should wrap the existing patch as-is for the trunk
(we're well into stage3 after all).  So leave the hook as-is for gcc-9.

We can then tackle removal of the hook, including twiddling
FUNCTION_ALIGNMENT for gcc-10.

Does that sound reasonable to you?

jeff

[PATCH/wwwdocs] add uses of @code to coding conventions

2018-12-13 Thread Martin Sebor


Attached is the update I propose to add to the Spelling,
terminology and markup section of the Coding conventions, to
give guidance for when to use the @code{} Texinfo command.
 I believe this reflects existing (if somewhat inconsistent)
practice.  It's not intended to add a new requirement.

Martin
Index: htdocs/codingconventions.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/codingconventions.html,v
retrieving revision 1.90
diff -u -r1.90 codingconventions.html
--- htdocs/codingconventions.html	1 Dec 2018 17:19:36 -	1.90
+++ htdocs/codingconventions.html	13 Dec 2018 23:34:00 -
@@ -569,6 +569,17 @@
   required if the capital letter is within @code or
   @samp.)
 
+  Use @code for an expression in a program, for the name
+  of a variable or function used in a program, or for a keyword in
+  a programming language.  However, avoid @code in uses
+  of language keywords as adjectives.  For example, appropriate uses
+  of @code are in phrases such as
+  "@code{const}-qualified type", or
+  "@code{asm} statement", or
+  "function returns @code{true}".
+  Examples where @code should be avoided are phrases such as
+  "const variable", "volatile access", or
+  "condition is false."

Re: [PATCH] Fix ICE due to cross-jumping (PR rtl-optimization/88470)

2018-12-13 Thread Jakub Jelinek

On Thu, Dec 13, 2018 at 04:28:10PM -0700, Jeff Law wrote:
> > 2018-12-13  Jakub Jelinek  
> > 
> > PR rtl-optimization/88470
> > * cfgcleanup.c (outgoing_edges_match): If the function is
> > shrink-wrapped and bb1 ends with a JUMP_INSN with a single fake
> > edge to EXIT, return false.
> > 
> > * gcc.target/i386/pr88470.c: New test.
> OK.  Are you planning to check the other ICEs in
> maybe_record_trace_start that are in BZ to see if any are addressed by
> this change?

I will, but it is highly unlikely those are the same issues.  Computed
gotos/non-local jumps are fairly infrequent.  At least one of those is known
to be a sel-sched bug.

Jakub

[C++ PATCH] Sanity check __cxa_* user declarations (PR c++/88482)

2018-12-13 Thread Jakub Jelinek

Hi!

If the user provides his own __cxa_* prototypes and does so incorrectly
(or even worse declares them as variables etc.), we can get various ICEs.

The following patch adds some sanity checking, mainly that they are actually
functions and with a compatible return type and argument type(s).
For __cxa_throw it gives some freedom for the second and third arguments,
but apparently not enough for libitm where it causes
+FAIL: libitm.c++/throwdown.C (test for excess errors)

This is because cxxabi.h has:
  void
  __cxa_throw(void*, std::type_info*, void (_GLIBCXX_CDTOR_CALLABI *) (void *))
  __attribute__((__noreturn__));
and the patch checks that the second argument is a pointer (any kind) and
third parameter is a function pointer, but libitm.h has:
extern void _ITM_cxa_throw (void *obj, void *tinfo, void *dest);
and eh_cpp.cc has:
extern void __cxa_throw (void *, void *, void *) WEAK;
void 
_ITM_cxa_throw (void *obj, void *tinfo, void *dest)
{
  // This used to be instrumented, but does not need to be anymore.
  __cxa_throw (obj, tinfo, dest);
}
Shall we fix libitm to use void (*dest) (void *) instead of void *dest,
or shall I make the verify_library_fn handle both i == 1 and i == 2
the same?

Bootstrapped/regtested on x86_64-linux and i686-linux (with the above
mentioned FAIL), ok for trunk (and with what solution for libitm)?

2018-12-13  Jakub Jelinek  

PR c++/88482
* except.c (verify_library_fn): New function.
(declare_library_fn): Use it.
(do_end_catch): Don't set TREE_NOTHROW on error_mark_node.
(expand_start_catch_block): Don't call initialize_handler_parm
for error_mark_node.
(build_throw): Use verify_library_fn.  Don't crash if
any library fn is error_mark_node.

* g++.dg/eh/builtin5.C: New test.
* g++.dg/eh/builtin6.C: New test.
* g++.dg/eh/builtin7.C: New test.
* g++.dg/eh/builtin8.C: New test.
* g++.dg/eh/builtin9.C: New test.
* g++.dg/eh/builtin10.C: New test.
* g++.dg/eh/builtin11.C: New test.
* g++.dg/parse/crash55.C: Adjust expected diagnostics.

--- gcc/cp/except.c.jj  2018-12-07 00:23:15.008998854 +0100
+++ gcc/cp/except.c 2018-12-13 20:05:23.053122023 +0100
@@ -132,6 +132,49 @@ build_exc_ptr (void)
   1, integer_zero_node);
 }
 
+/* Check that user declared function FN is a function and has return
+   type RTYPE and argument types ARG{1,2,3}TYPE.  */
+
+static bool
+verify_library_fn (tree fn, const char *name, tree rtype,
+  tree arg1type, tree arg2type, tree arg3type)
+{
+  if (TREE_CODE (fn) != FUNCTION_DECL
+  || TREE_CODE (TREE_TYPE (fn)) != FUNCTION_TYPE)
+{
+  bad:
+  error_at (DECL_SOURCE_LOCATION (fn), "%qs declared incorrectly", name);
+  return false;
+}
+  tree fntype = TREE_TYPE (fn);
+  if (!same_type_p (TREE_TYPE (fntype), rtype))
+goto bad;
+  tree targs = TYPE_ARG_TYPES (fntype);
+  tree args[3] = { arg1type, arg2type, arg3type };
+  for (int i = 0; i < 3 && args[i]; i++)
+{
+  if (targs == NULL_TREE)
+   goto bad;
+  if (!same_type_p (TREE_VALUE (targs), args[i]))
+   {
+ if (i == 0)
+   goto bad;
+ /* Be less strict for __cxa_throw last 2 arguments.  */
+ if (i == 1 && TREE_CODE (TREE_VALUE (targs)) != POINTER_TYPE)
+   goto bad;
+ if (i == 2
+ && (TREE_CODE (TREE_VALUE (targs)) != POINTER_TYPE
+ || (TREE_CODE (TREE_TYPE (TREE_VALUE (targs)))
+ != FUNCTION_TYPE)))
+   goto bad;
+   }
+  targs = TREE_CHAIN (targs);
+}
+  if (targs != void_list_node)
+goto bad;
+  return true;
+}
+
 /* Find or declare a function NAME, returning RTYPE, taking a single
parameter PTYPE, with an empty exception specification. ECF are the
library fn flags.  If TM_ECF is non-zero, also find or create a
@@ -161,9 +204,16 @@ declare_library_fn (const char *name, tr
  tree tm_fn = get_global_binding (tm_ident);
  if (!tm_fn)
tm_fn = push_library_fn (tm_ident, type, except, ecf | tm_ecf);
- record_tm_replacement (res, tm_fn);
+ else if (!verify_library_fn (tm_fn, tm_name, rtype, ptype,
+  NULL_TREE, NULL_TREE))
+   tm_fn = error_mark_node;
+ if (tm_fn != error_mark_node)
+   record_tm_replacement (res, tm_fn);
}
 }
+  else if (!verify_library_fn (res, name, rtype, ptype, NULL_TREE, NULL_TREE))
+return error_mark_node;
+
   return res;
 }
 
@@ -236,7 +286,8 @@ do_end_catch (tree type)
 
   tree cleanup = cp_build_function_call_vec (end_catch_fn,
 NULL, tf_warning_or_error);
-  TREE_NOTHROW (cleanup) = dtor_nothrow (type);
+  if (cleanup != error_mark_node)
+TREE_NOTHROW (cleanup) = dtor_nothrow (type);
 
   return cleanup;
 }
@@ -400,7 +451,8 @@ expand_start_catch_block (tree decl)
   &&

Re: [PATCH] Fix ICE due to cross-jumping (PR rtl-optimization/88470)

2018-12-13 Thread Jeff Law

On 12/13/18 3:53 PM, Jakub Jelinek wrote:
> Hi!
> 
> The following testcase ICEs, because we have an indirect jump with
> a single (fake) successor edge to EXIT, one reachable from the body
> of the function after prologue and another one reachable from before the
> prologue (due to shrink-wrapping).
> 
> The patch fixes this by disallowing crossjumping of basic blocks ending in
> such indirect jumps with no (non-fake) successors.
> 
> That condition hits on the following testcases (never during bootstrap):
> gcc/testsuite/gcc.c-torture/compile/20050122-2.c
> gcc/testsuite/gcc.c-torture/execute/920428-2.c
> gcc/testsuite/gcc.c-torture/execute/pr24135.c
> gcc/testsuite/gcc.dg/pr49994-1.c
> gcc/testsuite/gcc.dg/pr79494.c
> gcc/testsuite/gcc.dg/torture/stackalign/nested-4.c
> gcc/testsuite/gcc.dg/torture/stackalign/non-local-goto-3.c
> gcc/testsuite/gcc.dg/torture/stackalign/non-local-goto-5.c
> gcc/testsuite/gcc.target/i386/pr88470.c
> libgomp/testsuite/libgomp.c/pr81687-2.c
> where in most of them it is a non-local jump ending the bb; in a few of
> those testcases there is a crossjumping possibility, but it seems it
> is actually already handled during PRE (e.g. on pr24135.c).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2018-12-13  Jakub Jelinek  
> 
>   PR rtl-optimization/88470
>   * cfgcleanup.c (outgoing_edges_match): If the function is
>   shrink-wrapped and bb1 ends with a JUMP_INSN with a single fake
>   edge to EXIT, return false.
> 
>   * gcc.target/i386/pr88470.c: New test.
OK.  Are you planning to check the other ICEs in
maybe_record_trace_start that are in BZ to see if any are addressed by
this change?

jeff

Re: [PATCH] Fix stmt folding in the inliner (PR tree-optimization/88444)

2018-12-13 Thread Jeff Law

On 12/13/18 3:59 PM, Jakub Jelinek wrote:
> Hi!
> 
> The inliner doesn't want to fold statements immediately, but records them
> and then fold_marked_statements is meant to fold them when inlining is done.
> 
> On the following testcase it doesn't fold some of them though.
> The problem is that it wants to scan only newly added basic blocks (i.e.
> those created during the inlining), but the way it is written only works if
> there are no gaps in the basic_block vector.  If there are, it can fold
> stmts only in some of the basic blocks or none of them.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
> 
> 2018-12-13  Jakub Jelinek  
> 
>   PR tree-optimization/88444
>   * tree-inline.c (fold_marked_statements): Iterate up to
>   last_basic_block_for_fn rather than n_basic_blocks_for_fn.
> 
>   * gcc.dg/tree-ssa/pr88444.c: New test.
OK
jeff

[PATCH] Improve gimplification of constructors with RANGE_EXPRs (PR c++/82294, PR c++/87436)

2018-12-13 Thread Jakub Jelinek

Hi!

With the previously posted patch to use RANGE_EXPRs in build_vec_init,
the following patch attempts to do what the reporters were asking for
- determine if it wouldn't be more efficient to use (perhaps nested) loops
to initialize at runtime some automatic variable over having huge .rodata
constants that are copied over into the automatic variables or used as the
variable itself.

E.g. on the PR87436 testcase, we have RANGE_EXPR x4096 nested in a
RANGE_EXPR x4096 with 24 byte innermost constructor.  Without this patch
we emit 384MB long .rodata constant initializer that is then copied over
in the constructor, with this patch we just have 2 nested loops.

The patch extends categorize_ctor_elements so that it gives next to the
number of all scalars and number of non-zero scalars also something that
allows us to guess how many RANGE_EXPRs are in there, i.e. how large the
runtime code initializing that ctor would be.  It keeps doing the old thing
for all initializers < 64 bytes, I think for those generally using runtime
loops wouldn't be beneficial.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

If the C FE starts using RANGE_EXPRs at some point e.g. for:
struct S { int a, b, c, d; };
void foo (const struct S *);

void
bar (void)
{
  const struct S s[] = { [0 ... 9] = { 0, 1, 2, 3 } };
  foo ([0]);
}
it could benefit from this patch too.

2018-12-13  Jakub Jelinek  

PR c++/82294
PR c++/87436
* expr.h (categorize_ctor_elements): Add p_unique_nz_elts argument.
* expr.c (categorize_ctor_elements_1): Likewise.  Compute it like
p_nz_elts, except don't multiply it by mult.  Adjust recursive call.
Fix up COMPLEX_CST handling.
(categorize_ctor_elements): Add p_unique_nz_elts argument, initialize
it and pass it through to categorize_ctor_elements_1.
(mostly_zeros_p, all_zeros_p): Adjust categorize_ctor_elements callers.
* gimplify.c (gimplify_init_constructor): Likewise.  Don't force
ctor into readonly data section if num_unique_nonzero_elements is
smaller or equal to 1/8 of num_nonzero_elements and size is >= 64
bytes.

* g++.dg/tree-ssa/pr82294.C: New test.
* g++.dg/tree-ssa/pr87436.C: New test.

--- gcc/expr.h.jj   2018-11-27 16:37:40.249419631 +0100
+++ gcc/expr.h  2018-12-13 16:39:24.839050215 +0100
@@ -309,7 +309,8 @@ extern bool can_move_by_pieces (unsigned
 extern unsigned HOST_WIDE_INT highest_pow2_factor (const_tree);
 
 extern bool categorize_ctor_elements (const_tree, HOST_WIDE_INT *,
- HOST_WIDE_INT *, bool *);
+ HOST_WIDE_INT *, HOST_WIDE_INT *,
+ bool *);
 
 extern void expand_operands (tree, tree, rtx, rtx*, rtx*,
 enum expand_modifier);
--- gcc/expr.c.jj   2018-11-27 16:37:40.249419631 +0100
+++ gcc/expr.c  2018-12-13 17:13:47.083545126 +0100
@@ -5945,10 +5945,11 @@ count_type_elements (const_tree type, bo
 
 static bool
 categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
+   HOST_WIDE_INT *p_unique_nz_elts,
HOST_WIDE_INT *p_init_elts, bool *p_complete)
 {
   unsigned HOST_WIDE_INT idx;
-  HOST_WIDE_INT nz_elts, init_elts, num_fields;
+  HOST_WIDE_INT nz_elts, unique_nz_elts, init_elts, num_fields;
   tree value, purpose, elt_type;
 
   /* Whether CTOR is a valid constant initializer, in accordance with what
@@ -5958,6 +5959,7 @@ categorize_ctor_elements_1 (const_tree c
   bool const_p = const_from_elts_p ? true : TREE_STATIC (ctor);
 
   nz_elts = 0;
+  unique_nz_elts = 0;
   init_elts = 0;
   num_fields = 0;
   elt_type = NULL_TREE;
@@ -5982,12 +5984,13 @@ categorize_ctor_elements_1 (const_tree c
{
case CONSTRUCTOR:
  {
-   HOST_WIDE_INT nz = 0, ic = 0;
+   HOST_WIDE_INT nz = 0, unz = 0, ic = 0;
 
-   bool const_elt_p = categorize_ctor_elements_1 (value, , ,
-  p_complete);
+   bool const_elt_p = categorize_ctor_elements_1 (value, , ,
+  , p_complete);
 
nz_elts += mult * nz;
+   unique_nz_elts += unz;
init_elts += mult * ic;
 
if (const_from_elts_p && const_p)
@@ -5999,21 +6002,31 @@ categorize_ctor_elements_1 (const_tree c
case REAL_CST:
case FIXED_CST:
  if (!initializer_zerop (value))
-   nz_elts += mult;
+   {
+ nz_elts += mult;
+ unique_nz_elts++;
+   }
  init_elts += mult;
  break;
 
case STRING_CST:
  nz_elts += mult * TREE_STRING_LENGTH (value);
+ unique_nz_elts += TREE_STRING_LENGTH (value);
  init_elts += mult * TREE_STRING_LENGTH (value);
  break;
 
case COMPLEX_CST:
  if

[C++ PATCH] Use RANGE_EXPRs in build_vec_init (PR c++/82294, PR c++/87436)

2018-12-13 Thread Jakub Jelinek

Hi!

The following patch makes use of RANGE_EXPRs in build_vec_init, instead of
appending many ctor elements.

E.g. on the PR87436 testcase the memory usage goes down from more than 5GB
to a few megabytes and compile time decreases significantly too.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Testcases not included, see follow-up patch.

2018-12-13  Jakub Jelinek  

PR c++/82294
PR c++/87436
* init.c (build_vec_init): Change num_initialized_elts type from int
to HOST_WIDE_INT.  Build a RANGE_EXPR if e needs to be repeated more
than once.

--- gcc/cp/init.c.jj2018-11-13 09:49:33.150035688 +0100
+++ gcc/cp/init.c   2018-12-13 15:08:08.446783069 +0100
@@ -4104,7 +4104,7 @@ build_vec_init (tree base, tree maxindex
   tree compound_stmt;
   int destroy_temps;
   tree try_block = NULL_TREE;
-  int num_initialized_elts = 0;
+  HOST_WIDE_INT num_initialized_elts = 0;
   bool is_global;
   tree obase = base;
   bool xvalue = false;
@@ -4539,10 +4539,13 @@ build_vec_init (tree base, tree maxindex
 
  if (e)
{
- int max = tree_to_shwi (maxindex)+1;
- for (; num_initialized_elts < max; ++num_initialized_elts)
+ HOST_WIDE_INT last = tree_to_shwi (maxindex);
+ if (num_initialized_elts <= last)
{
  tree field = size_int (num_initialized_elts);
+ if (num_initialized_elts != last)
+   field = build2 (RANGE_EXPR, sizetype, field,
+   size_int (last));
  CONSTRUCTOR_APPEND_ELT (const_vec, field, e);
}
}

Jakub

[PATCH] Fix stmt folding in the inliner (PR tree-optimization/88444)

2018-12-13 Thread Jakub Jelinek

Hi!

The inliner doesn't want to fold statements immediately, but records them
and then fold_marked_statements is meant to fold them when inlining is done.

On the following testcase it doesn't fold some of them though.
The problem is that it wants to scan only newly added basic blocks (i.e.
those created during the inlining), but the way it is written only works if
there are no gaps in the basic_block vector.  If there are, it can fold
stmts only in some of the basic blocks or none of them.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-12-13  Jakub Jelinek  

PR tree-optimization/88444
* tree-inline.c (fold_marked_statements): Iterate up to
last_basic_block_for_fn rather than n_basic_blocks_for_fn.

* gcc.dg/tree-ssa/pr88444.c: New test.

--- gcc/tree-inline.c.jj2018-12-07 00:23:15.745986912 +0100
+++ gcc/tree-inline.c   2018-12-13 13:04:30.407956734 +0100
@@ -4906,7 +4906,7 @@ gimple_expand_calls_inline (basic_block
 static void
 fold_marked_statements (int first, hash_set *statements)
 {
-  for (; first < n_basic_blocks_for_fn (cfun); first++)
+  for (; first < last_basic_block_for_fn (cfun); first++)
 if (BASIC_BLOCK_FOR_FN (cfun, first))
   {
 gimple_stmt_iterator gsi;
--- gcc/testsuite/gcc.dg/tree-ssa/pr88444.c.jj  2018-12-13 13:11:39.577990474 
+0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr88444.c 2018-12-13 13:12:45.184925117 
+0100
@@ -0,0 +1,6 @@
+/* PR tree-optimization/88444 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -finline-functions -finline-small-functions 
-fdump-tree-fixup_cfg3" } */
+/* { dg-final { scan-tree-dump-not " = \\(long int\\) 0;" "fixup_cfg3" } } */
+
+#include "../pr88444.c"

Jakub

[PATCH] Fix ICE due to cross-jumping (PR rtl-optimization/88470)

2018-12-13 Thread Jakub Jelinek

Hi!

The following testcase ICEs, because we have an indirect jump with
a single (fake) successor edge to EXIT, one reachable from the body
of the function after prologue and another one reachable from before the
prologue (due to shrink-wrapping).

The patch fixes this by disallowing crossjumping of basic blocks ending in
such indirect jumps with no (non-fake) successors.

That condition hits on the following testcases (never during bootstrap):
gcc/testsuite/gcc.c-torture/compile/20050122-2.c
gcc/testsuite/gcc.c-torture/execute/920428-2.c
gcc/testsuite/gcc.c-torture/execute/pr24135.c
gcc/testsuite/gcc.dg/pr49994-1.c
gcc/testsuite/gcc.dg/pr79494.c
gcc/testsuite/gcc.dg/torture/stackalign/nested-4.c
gcc/testsuite/gcc.dg/torture/stackalign/non-local-goto-3.c
gcc/testsuite/gcc.dg/torture/stackalign/non-local-goto-5.c
gcc/testsuite/gcc.target/i386/pr88470.c
libgomp/testsuite/libgomp.c/pr81687-2.c
where in most of them it is a non-local jump ending the bb; in a few of
those testcases there is a crossjumping possibility, but it seems it
is actually already handled during PRE (e.g. on pr24135.c).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-12-13  Jakub Jelinek  

PR rtl-optimization/88470
* cfgcleanup.c (outgoing_edges_match): If the function is
shrink-wrapped and bb1 ends with a JUMP_INSN with a single fake
edge to EXIT, return false.

* gcc.target/i386/pr88470.c: New test.

--- gcc/cfgcleanup.c.jj 2018-12-03 11:14:52.449764174 +0100
+++ gcc/cfgcleanup.c2018-12-13 10:19:40.453539925 +0100
@@ -1592,10 +1592,13 @@ outgoing_edges_match (int mode, basic_bl
   if (crtl->shrink_wrapped
   && single_succ_p (bb1)
   && single_succ (bb1) == EXIT_BLOCK_PTR_FOR_FN (cfun)
-  && !JUMP_P (BB_END (bb1))
+  && (!JUMP_P (BB_END (bb1))
+ /* Punt if the only successor is a fake edge to exit, the jump
+must be some weird one.  */
+ || (single_succ_edge (bb1)->flags & EDGE_FAKE) != 0)
   && !(CALL_P (BB_END (bb1)) && SIBLING_CALL_P (BB_END (bb1
 return false;
-  
+
   /* If BB1 has only one successor, we may be looking at either an
  unconditional jump, or a fake edge to exit.  */
   if (single_succ_p (bb1)
--- gcc/testsuite/gcc.target/i386/pr88470.c.jj  2018-12-13 10:11:39.144381178 
+0100
+++ gcc/testsuite/gcc.target/i386/pr88470.c 2018-12-13 10:10:04.610921517 
+0100
@@ -0,0 +1,16 @@
+/* PR rtl-optimization/88470 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -maccumulate-outgoing-args -ftrapv -fno-ivopts 
-fno-reorder-blocks-and-partition" } */
+
+void
+foo (long x, long *y)
+{
+  long *a = y - 64, i;
+  for (i = 0; i < x; i++)
+{
+  long v = y[i];
+  *a++ = v;
+}
+  register void **c __asm__ ("di");
+  goto **c;
+}

Jakub

Re: [doc,committed] clarify docs for function attribute "const"

2018-12-13 Thread Martin Sebor


Ping: https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00426.html

(I have now committed the @code{const} cleanup mentioned below.)

On 12/6/18 6:12 PM, Martin Sebor wrote:

On 12/3/18 11:04 PM, Sandra Loosemore wrote:

On 12/3/18 2:47 PM, Martin Sebor wrote:

[snip]

Attached is my proposed update.  The user's email suggested going
into a lot of detail that I'm not sure would be helpful.  I think
it's safer to keep it simple than to try to carefully outline tricky
conditions under which some const or pure functions might get away
with modifying program state.  At the same time, I tried not to
outright prohibit it so I phrased the restrictions in terms of
observable program state (as opposed to reading or writing
"global" variables and such).

I also made a few other changes, like move the example of
the square function from pure to const (since the function is
much more likely const), and made the const restrictions stand
on their own, rather than depending on pure.



Index: /ssd/src/gcc/svn/gcc/doc/extend.texi
===
--- /ssd/src/gcc/svn/gcc/doc/extend.texi    (revision 266760)
+++ /ssd/src/gcc/svn/gcc/doc/extend.texi    (working copy)
@@ -2518,25 +2518,45 @@ are automatically detected and this attribute 
is i

 @item const
 @cindex @code{const} function attribute
 @cindex functions that have no side effects
-Many functions do not examine any values except their arguments, and
-have no effects except to return a value.  Calls to such functions lend
-themselves to optimization such as common subexpression elimination.
-The presence of the @code{const} attribute on a function declaration 
-allows GCC to emit more efficient code for some calls to the 
function. +Calls to functions whose return value is not affected by 
changes to

+the observable state of the program and that have no observable effects
+on such state other than to return a value may lend themselves to
+optimizations such as common subexpression elimination.  Declaring such
+functions with the @code{const} attribute allows GCC to emit more 
efficient

+code for consecutive calls to the function.


I don't think the calls have to be consecutive for GCC to emit more 
efficient code.  I'd kill that last bit.


I added it because I want to make it clear that the attribute
shouldn't be expected to improve the efficiency of any single
call.  It can only improve it for the second (and subsequent)
call with the same argument values as the first.  I've tweaked
the sentence a bit.




+The @code{const} attribute prohibits a function from reading objects
+that affect its return value between successive invocations.  However,
+functions declared with the attribute can safely read objects that do
+not affect their return value, such as non-volatile constants.
 The @code{const} attribute imposes greater restrictions on a function's
-definition than the similar @code{pure} attribute below because it
-additionally prohibits the function from reading memory except for
-constant global variables.  Decorating the same function with
-both the @code{const} and the @code{pure} attribute is diagnosed.
+definition than the similar @code{pure} attribute.  Declaring the same
+function with both the @code{const} and the @code{pure} attribute is
+diagnosed.  Because a const function cannot have any observable side


@code{const}


I actually meant to bring this up separately.

I don't think @code would be appropriate in this context.  It's
not a use of the attribute name but rather a description of
the function.  Analogous to "a const pointer" or "a volatile
access" vs "a @code{const}-qualified pointer or
"the @code{volatile} qualifier."  I think the manual already
uses this style, albeit not completely consistently.  The C
and C++ standards aren't 100% consistent either, but I believe
the editors follow this principle when they think of it.  Unless
you really do think GCC should go the other way around I would
like to add this to the GCC style guidelines and fix
the inconsistencies.

I thought I was consistent in this in my own changes but I see
I was not.  I've changed the @code{const} to just const according
to the above.


+effects it does not make sense for it to return @code{void}.  Declaring
+such a function is diagnosed.

+For example,
+
+@smallexample
+int square (int) __attribute__ ((const));
+@end smallexample
+
+@noindent
+tells GCC that subsequent calls to function @code{square} with the same
+argument value can be replaced by the result of the first call 
regardless

+of the statements in between.
+


I think it would be better to move this example immediately after the 
first paragraph, before you digress about the diagnostics and such like.


Done.




 @cindex pointer arguments
 Note that a function that has pointer arguments and examines the data
-pointed to must @emph{not} be declared @code{const}.  Likewise, a
-function that calls a non-@code{const} function usually must not be

Re: [PATCH] handle expressions in __builtin_has_attribute (PR 88383)

2018-12-13 Thread Martin Sebor


On 12/13/18 12:56 PM, Jakub Jelinek wrote:

On Thu, Dec 13, 2018 at 12:48:18PM -0700, Martin Sebor wrote:

The support is necessary in order to determine the attributes
in expressions such as:

    struct S { __attribute__ ((packed)) int a[32]; };

    extern struct S s;

    _Static_assert (__builtin_has_attribute (s.a, packed));


An example involving types might be a better one:

   typedef __attribute__ ((may_alias)) int* BadInt;

   void f (BadInt *p)
   {
 _Static_assert (__builtin_has_attribute (*p, may_alias));
   }


So how is the builtin defined then?  Is the argument always an expression
and you only return whether its type has the attribute, or do something
different if the expression is of certain kind?


For a DECL the built-in looks at both the DECL_ATTRIBUTES and
the DECL's type's TYPE_ATTRIBUTES.  For expressions, it just
looks at TYPE_ATTRIBUTES of the expression's type.  I'm not
sure that this is necessarily the cleanest design but it's
meant to follow how attributes are applied to DECLs.  Sometimes
they get applied to the DECL itself and other times to its type.
This has been causing confusion to both users and GCC devs such
as myself so at some point I'd like to make this clearer somehow.
Not sure exactly how yet.


Perhaps it would be better have two builtins, one would always take
type from the expression, the other would only accept decls (or perhaps
some syntax for the FIELD_DECLs).


I'm not sure I see a need for two built-ins here.

There is another way to handle the may_alias example: by using
typeof first and then the built-in on the result:

  _Static_assert (__builtin_has_attribute (__typeof__ (*p), may_alias));

so it seems that it would be possible to do what Jeff suggests
and only accept DECLs and TYPEs.  But I'm also not sure I see
an advantage of restricting the built-in like that.

FWIW, I had in mind two uses when I introduced the built-in:
1) a conditional attribute copy (where the condition would be
whether or not the source DECL or TYPE has some attribute), and
2) detecting that attributes have been successfully applied as
expected in tests.  I haven't implemented (1) yet but I'd like
to at some point (if only to avoid hardcoding into GCC the set
of attributes that are excluded from copying).  I have added
a handful of tests that make use of the built-in.  So if changes
should be made to the built-in, I will want to make sure they
don't make these use cases difficult.

Martin

Re: [C++ Patch] PR 84644 ("internal compiler error: in warn_misplaced_attr_for_class_type, at cp/decl.c:4718")

2018-12-13 Thread Jason Merrill


On 10/30/18 9:22 PM, Paolo Carlini wrote:

Hi,

On 30/10/18 21:37, Jason Merrill wrote:

On 10/26/18 2:02 PM, Paolo Carlini wrote:

On 26/10/18 17:18, Jason Merrill wrote:
On Fri, Oct 26, 2018 at 4:52 AM Paolo Carlini 
 wrote:

On 24/10/18 22:41, Jason Merrill wrote:

On 10/15/18 12:45 PM, Paolo Carlini wrote:

 && ((TREE_CODE (declspecs->type) != TYPENAME_TYPE
+   && TREE_CODE (declspecs->type) != DECLTYPE_TYPE
  && MAYBE_CLASS_TYPE_P (declspecs->type))
I would think that the MAYBE_CLASS_TYPE_P here should be 
CLASS_TYPE_P,

and then we can remove the TYPENAME_TYPE check.  Or do we want to
allow template type parameters for some reason?

Indeed, it would be nice to just use OVERLOAD_TYPE_P. However it seems
we at least want to let through TEMPLATE_TYPE_PARMs representing 
'auto'

- otherwise Dodji's check a few lines below which fixed c++/51473
doesn't work anymore - and also BOUND_TEMPLATE_TEMPLATE_PARM, 
otherwise

we regress on template/spec32.C and template/ttp22.C because we don't
diagnose the shadowing anymore. Thus, I would say either we keep on
using MAYBE_CLASS_TYPE_P or we pick what we need, possibly we add a 
comment?

Aha.  I guess the answer is not to restrict that test any more, but
instead to fix the code further down so it gives a proper diagnostic
rather than call warn_misplaced_attr_for_class_type.


I see. Thus something like the below? It passes testing on x86_64-linux.



+  if ((!declared_type || TREE_CODE (declared_type) == DECLTYPE_TYPE)
+  && ! saw_friend && !error_p)
 permerror (input_location, "declaration does not declare 
anything");


I see no reason to make this specific to decltype.  Maybe move this 
diagnostic into the final 'else' block with the other declspec 
diagnostics and not look at declared_type at all?


I'm not sure to fully understand: if we do that we still want to at 
least minimally check that declared_type is null, like we already do, 
and then we simply accept the new testcase. Is that Ok? Because, as I 
probably mentioned at some point, all the other compilers I have at hand 
issue a "does not declare anything" diagnostic, and we likewise do that 
for the legacy __typeof. Not looking into declared_type *at all* doesn't 
work with plain class types and enums, of course. Or you meant something 
entirely different??



+  if (declspecs->attributes && warn_attributes && declared_type
+  && TREE_CODE (declared_type) != DECLTYPE_TYPE)


I think we do want to give a diagnostic about useless attributes, not 
skip it.


Agreed. FWIW the attached tests fine.


The problem here is that the code toward the bottom expects 
"declared_type" to be the tagged type declared by a declaration with no 
declarator, and in this testcase it's ending up as a DECLTYPE_TYPE.


I think once we've checked for 'auto' we don't want declared_type to be 
anything that isn't OVERLOAD_TYPE_P.  We can arrange that either by 
checking for 'auto' first and then changing the code that sets 
declared_type to use OVERLOAD_TYPE_P, or by clearing declared_type after 
checking for 'auto' if it isn't OVERLOAD_TYPE_P.


Jason

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-13 Thread Dimitar Dimitrov

On Thu, Dec 13, 2018 at 8:48:38 EET Segher Boessenkool wrote:
> On Wed, Dec 12, 2018 at 06:26:10PM +0200, Dimitar Dimitrov wrote:
> > I expect that if I mark a HW register as "clobber", compiler would save
> > its
> > contents before executing the asm statement, and after that it would
> > restore its contents. This is the GCC behaviour for all but the SP and
> > PIC registers. That is why I believe that PR52813 is a valid bug.
> 
> It won't do it for *any* fixed registers.  But you do not want to error
> or even warn for some fixed registers, for example the "flags" register
> on x86 is *always* written to by asm.

Yes, you are correct.

> 
> But you never want to warn for non-fixed registers, and e.g.
> PIC_OFFSET_TABLE_REGNUM isn't always a fixed register (when flag_pic is 0
> for example).
I  could not trace how PIC_OFFSET_TABLE_REGNUM on i386 gets marked as fixed 
register. I'll dig more through the source.

> 
> > I'm not sure how GCC could recover if SP is clobbered. If SP is clobbered
> > in such a way that GCC will not notice (e.g. thread switching), then why
> > should GCC know about it in the first place?
> 
> Up until today, GCC has always just ignored it if you claimed to clobber
> the stack pointer.

My point is that the silent ignoring is confusing to users, as shown by 
PR52813. How would you suggest me to proceed:
 - Leave patch as-is.
 - Revert patch. Update documentation to point that clobber marker for fixed 
registers is ignored by GCC. Close PR52813 as invalid.
 - Revert patch. Discuss more broadly and specify behaviour of asm clobber for 
fixed registers (and SP in particular).

Thanks,
Dimitar

patch to fix PR88414

2018-12-13 Thread Vladimir Makarov


The following patch fixes

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88414

The patch was bootstrapped and tested on x86-64.

Committed as rev. 267109.

Index: ChangeLog
===
--- ChangeLog	(revision 267108)
+++ ChangeLog	(working copy)
@@ -1,3 +1,14 @@
+2018-12-13  Vladimir Makarov  
+
+	PR rtl-optimization/88414
+	* lra-int.h (lra_asm_error_p): New.
+	* lra-assigns.c (lra_assign): Check lra_asm_error_p for checking
+	call crossed pseudo assignment correctness.
+	(lra_split_hard_reg_for): Set up lra_asm_error_p.
+	* lra-constraints.c (curr_insn_transform): Ditto.
+	* lra.c (lra_asm_error_p): New.
+	(lra): Reset lra_asm_error_p.
+
 2018-12-13  Jakub Jelinek  
 
 	PR rtl-optimization/88416
Index: lra-int.h
===
--- lra-int.h	(revision 266678)
+++ lra-int.h	(working copy)
@@ -288,6 +288,7 @@ typedef struct lra_insn_recog_data *lra_
 
 extern FILE *lra_dump_file;
 
+extern bool lra_asm_error_p;
 extern bool lra_reg_spill_p;
 
 extern HARD_REG_SET lra_no_alloc_regs;
Index: lra.c
===
--- lra.c	(revision 266678)
+++ lra.c	(working copy)
@@ -2334,6 +2334,9 @@ bitmap_head lra_subreg_reload_pseudos;
 /* File used for output of LRA debug information.  */
 FILE *lra_dump_file;
 
+/* True if we found an asm error.  */
+bool lra_asm_error_p;
+
 /* True if we should try spill into registers of different classes
instead of memory.  */
 bool lra_reg_spill_p;
@@ -2371,7 +2374,8 @@ lra (FILE *f)
   bool live_p, inserted_p;
 
   lra_dump_file = f;
-
+  lra_asm_error_p = false;
+  
   timevar_push (TV_LRA);
 
   /* Make sure that the last insn is a note.  Some subsequent passes
Index: lra-constraints.c
===
--- lra-constraints.c	(revision 267108)
+++ lra-constraints.c	(working copy)
@@ -3940,6 +3940,7 @@ curr_insn_transform (bool check_only_p)
 	fatal_insn ("unable to generate reloads for:", curr_insn);
   error_for_asm (curr_insn,
 		 "inconsistent operand constraints in an %");
+  lra_asm_error_p = true;
   /* Avoid further trouble with this insn.  Don't generate use
 	 pattern here as we could use the insn SP offset.  */
   lra_set_insn_deleted (curr_insn);
Index: lra-assigns.c
===
--- lra-assigns.c	(revision 266678)
+++ lra-assigns.c	(working copy)
@@ -1615,7 +1615,10 @@ lra_assign (bool _p)
   bitmap_initialize (_spilled_pseudos, _obstack);
   create_live_range_start_chains ();
   setup_live_pseudos_and_spill_after_risky_transforms (_spilled_pseudos);
-  if (flag_checking && !flag_ipa_ra)
+  if (! lra_asm_error_p && flag_checking && !flag_ipa_ra)
+/* Check correctness of allocation for call-crossed pseudos but
+   only when there are no asm errors as in the case of errors the
+   asm is removed and it can result in incorrect allocation.  */
 for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
   if (lra_reg_info[i].nrefs != 0 && reg_renumber[i] >= 0
 	  && lra_reg_info[i].call_p
@@ -1783,7 +1786,7 @@ lra_split_hard_reg_for (void)
   insn = lra_insn_recog_data[u]->insn;
   if (asm_noperands (PATTERN (insn)) >= 0)
 	{
-	  asm_p = true;
+	  lra_asm_error_p = asm_p = true;
 	  error_for_asm (insn,
 			 "% operand has impossible constraints");
 	  /* Avoid further trouble with this insn.
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 267108)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2018-12-13  Vladimir Makarov  
+
+	PR rtl-optimization/88414
+	* gcc.target/i386/pr88414.c: New.
+
 2018-12-13  Marek Polacek  
 
 	PR c++/88216 - ICE with class type in non-type template parameter.
Index: testsuite/gcc.target/i386/pr88414.c
===
--- testsuite/gcc.target/i386/pr88414.c	(nonexistent)
+++ testsuite/gcc.target/i386/pr88414.c	(working copy)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -ftrapv" } */
+
+unsigned int
+foo (unsigned int *x, const unsigned int *y, int z, unsigned int w)
+{
+  unsigned int a, b, c, s;
+  int j;
+  j = -z;
+  x -= j;
+  y -= j;
+  a = 0;
+  do
+{
+  asm volatile ("" : "=d" (b), "=d" (c) : "r" (y[j]), "d" (w)); /* { dg-error "'asm' operand has impossible constraints" } */
+  c += a;
+  a = (c < a) + b;
+  s = x[j];
+  c = s + c;
+  a += (c < s);
+  x[j] = c;
+}
+  while (++j != 0);
+  return a;
+}

Re: V4 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-13 Thread Jason Merrill


On 9/25/18 11:46 AM, H.J. Lu wrote:

On Fri, Aug 31, 2018 at 2:04 PM, Jason Merrill  wrote:

On 07/23/2018 05:24 PM, H.J. Lu wrote:


On Mon, Jun 18, 2018 at 12:26 PM, Joseph Myers 
wrote:


On Mon, 18 Jun 2018, Jason Merrill wrote:


On Mon, Jun 18, 2018 at 11:59 AM, Joseph Myers 
wrote:


On Mon, 18 Jun 2018, Jason Merrill wrote:


+  if (TREE_CODE (rhs) == COND_EXPR)
+{
+  /* Check the THEN path first.  */
+  tree op1 = TREE_OPERAND (rhs, 1);
+  context = check_address_of_packed_member (type, op1);



This should handle the GNU extension of re-using operand 0 if operand
1 is omitted.



Doesn't that just use a SAVE_EXPR?



Hmm, I suppose it does, but many places in the compiler seem to expect
that it produces a COND_EXPR with TREE_OPERAND 1 as NULL_TREE.



Maybe that's used somewhere inside the C++ front end.  For C a SAVE_EXPR
is produced directly.



Here is the updated patch.  Changes from the last one:

1. Handle COMPOUND_EXPR.
2. Fixed typos in comments.
3. Combined warn_for_pointer_of_packed_member and
warn_for_address_of_packed_member into
warn_for_address_or_pointer_of_packed_member.




c.i:4:33: warning: converting a packed ‘struct C *’ pointer increases the
alignment of ‘long int *’ pointer from 1 to 8 [-Waddress-of-packed-member]



I think this would read better as

c.i:4:33: warning: converting a packed ‘struct C *’ pointer (alignment 1) to
‘long int *’ (alignment 8) may result in an unaligned pointer value
[-Waddress-of-packed-member]


Fixed.


+  while (TREE_CODE (base) == ARRAY_REF)
+   base = TREE_OPERAND (base, 0);
+  if (TREE_CODE (base) != COMPONENT_REF)
+   return NULL_TREE;



Are you deliberately not handling the other handled_component_p cases? If
so, there should be a comment.


I changed it to

  while (handled_component_p (base))
 {
   enum tree_code code = TREE_CODE (base);
   if (code == COMPONENT_REF)
 break;
   switch (code)
 {
 case ARRAY_REF:
   base = TREE_OPERAND (base, 0);
   break;
 default:
   /* FIXME: Can it ever happen?  */
   gcc_unreachable ();
   break;
 }
 }

Is there a testcase to trigger this ICE? I couldn't find one.


You can take the address of an element of complex:

  __complex int i;
  int *p = &__real(i);

You may get VIEW_CONVERT_EXPR with location wrappers.


+  /* Check alignment of the object.  */
+  if (TREE_CODE (object) == COMPONENT_REF)
+{
+  field = TREE_OPERAND (object, 1);
+  if (TREE_CODE (field) == FIELD_DECL && DECL_PACKED (field))
+   {
+ type_align = TYPE_ALIGN (type);
+ context = DECL_CONTEXT (field);
+ record_align = TYPE_ALIGN (context);
+ if ((record_align % type_align) != 0)
+   return context;
+   }
+}



Why doesn't this recurse?  What if you have a packed field three
COMPONENT_REFs down?


My patch works on
[hjl@gnu-cfl-1 pr51628-4]$ cat x.i
struct A { int i; } __attribute__ ((packed));
struct B { struct A a; };
struct C { struct B b; };

extern struct C *p;

int* g8 (void) { return >b.a.i; }
[hjl@gnu-cfl-1 pr51628-4]$ make x.s
/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/ -O2
-S x.i
x.i: In function ‘g8’:
x.i:7:25: warning: taking address of packed member of ‘struct A’ may
result in an unaligned pointer value [-Waddress-of-packed-member]
7 | int* g8 (void) { return >b.a.i; }
   | ^
[hjl@gnu-cfl-1 pr51628-4]$

If it isn't what you had in mind, can you give me a testcase?


In that testcase, 'i' is the top COMPONENT_EXPR.  What I was talking 
about would be more like


 struct A { int i; };
 struct B { struct A a; };
 struct C { struct B b __attribute__ ((packed)); };

 extern struct C *p;

 int* g8 (void) { return >b.a.i; }

Jason

Re: C++ PATCH for c++/88216, ICE with class type in non-type template parameter

2018-12-13 Thread Jason Merrill


On 12/11/18 4:05 PM, Marek Polacek wrote:

On Tue, Dec 11, 2018 at 10:48:17AM -0500, Jason Merrill wrote:

On 12/10/18 2:52 PM, Marek Polacek wrote:

+  if (processing_template_decl && value_dependent_expression_p (expr))


You don't need to check processing_template_decl before
value_dependent_expression_p.


Ok.
  

I would lean toward checking for value-dependence in
convert_nontype_argument, which already does that a lot.  Enough, actually,
that perhaps we should remember the result in a local variable.


I've moved the call, but I didn't add a local variable for the result,
because I was afraid that, since EXPR is being modified on some of the
codepaths, its value-dependent-ness (is that a term?) may change.


Any change would be a bug, but the patch is OK as is.

Jason

Re: [PATCH] v4: C++: more location wrapper nodes (PR c++/43064, PR c++/43486)

2018-12-13 Thread Jason Merrill


On 12/13/18 3:12 PM, David Malcolm wrote:

On Wed, 2018-12-12 at 15:37 -0500, Jason Merrill wrote:

On 12/7/18 3:13 PM, David Malcolm wrote:

On Tue, 2018-12-04 at 18:31 -0500, Jason Merrill wrote:

On 12/3/18 5:10 PM, Jeff Law wrote:

On 11/19/18 9:51 AM, David Malcolm wrote:


[...]

@@ -1058,6 +1058,9 @@ grokbitfield (const cp_declarator
*declarator,
 return NULL_TREE;
   }
   
+  if (width)

+STRIP_ANY_LOCATION_WRAPPER (width);


Why is this needed?  We should already be reducing width to an
unwrapped
constant value somewhere along the line.


"width" is coming from cp_parser_member_declaration, from:

  /* Get the width of the bitfield.  */
  width = cp_parser_constant_expression (parser, false,
NULL,
 cxx_dialect >=
cxx11);

and currently nothing is unwrapping the value.  We presumably need
to
unwrap (or fold?) it before it is stashed in
DECL_BIT_FIELD_REPRESENTATIVE (value).

Without stripping (or folding) here, we e.g. lose a warning and get
this:
FAIL: g++.dg/abi/empty22.C  -std=gnu++98  (test for warnings,
line 15)


Why does that happen?  check_bitfield_decl ought to handle the
location
wrapper fine.  That's where it gets folded.


The unstripped location wrapper defeats this check for zero in
check_field_decls within cp/class.c:

3555  if (DECL_C_BIT_FIELD (x)
3556  && integer_zerop (DECL_BIT_FIELD_REPRESENTATIVE (x)))
3556/* We don't treat zero-width bitfields as making a class
3557   non-empty.  */


Aha.  I wonder if integer_zerop should look through location wrappers? 
Or alternately, abort if it gets one?


On a tangent, perhaps we also want a macro like TREE_CODE that looks 
through wrappers.  TREE_CODE_WRAPPED?  _NO_WRAP?  Other name ideas?



3558;
3559  else

leading it to erroneously use the "else" clause, which thus erroneously
clears CLASSTYPE_EMPTY_P, leading to the loss of:

g++.dg/abi/empty22.C:15:6: warning: empty class 'dummy' parameter passing
   ABI changes in -fabi-version=12 (GCC 8) [-Wabi]
15 |   fun(d, f); // { dg-warning "empty" }
   |   ~~~^~

check_bitfield_decl is called *after* that check, so the folding there
doesn't help.


@@ -656,6 +656,9 @@ add_capture (tree lambda, tree id, tree
orig_init, bool by_reference_p,
 listmem = make_pack_expansion (member);
 initializer = orig_init;
   }
+
+  STRIP_ANY_LOCATION_WRAPPER (initializer);


Why is this needed?  What cares about the tree codes of the
capture
initializer?


This is used to populate LAMBDA_EXPR_CAPTURE_LIST.  Without
stripping,
we end up with wrapped VAR_DECLs, rather than the VAR_DECLs
themselves,


Sure, that sounds fine.


and this confuses things later on, for example leading to:

   PASS -> FAIL : g++.dg/cpp0x/lambda/lambda-type.C  -std=c++14
(test for excess errors)
   PASS -> FAIL : g++.dg/cpp0x/lambda/lambda-type.C  -std=c++17
(test for excess errors)


Confuses how?


Without stripping, we get extra errors for decltype() on identifiers in
the capture-list, e.g.:

   [i] {
 same_type();
 same_type();
   };

cpp0x/lambda/lambda-type.C: In lambda function:
cpp0x/lambda/lambda-type.C:48:27: error: 'i' is not captured
48 | same_type();
   |   ^
cpp0x/lambda/lambda-type.C:48:39: error: template argument 1 is invalid
48 | same_type();
   |   ^

These occur because, in capture_decltype, when searching for the decl for
"i" here:

   tree cap = value_member (decl, LAMBDA_EXPR_CAPTURE_LIST (lam));

the LAMBDA_EXPR_CAPTURE_LIST contains a wrapper around "i" rather than "i"
itself, and so "i" isn't found by value_member; "cap" thus becomes NULL_TREE,
and thus the error.


Ah.  Just above that I notice,

  /* FIXME do lookup instead of list walk? */

We could make that change, i.e.

tree cap = lookup_name_real (DECL_NAME (decl), /*type*/0, /*nonclass*/1,
 /*block_p=*/true, /*ns*/0, LOOKUP_HIDDEN);
...
if (cap && is_capture_proxy (cap))

Jason

[PATCH] Fix handling of POSIX paths containing a root-name

2018-12-13 Thread Jonathan Wakely


Fix path appending and concatenating to work correctly for a leading
root-name. Check a new macro, SLASHSLASH_IS_ROOT_NAME, instead of making
the behaviour depend directly on __CYGWIN__.

* src/filesystem/std-path.cc (SLASHSLASH_IS_ROOT_NAME): New macro to
control whether interpret paths with two slashes as a root-name.
(path::operator/=(const path&)) [SLASHSLASH_IS_ROOT_NAME]: Add a
root-directory when appending to a root-name.
(path::_M_append(basic_string_view))
[SLASHSLASH_IS_ROOT_NAME]: Likewise.
(path::operator/=(const path&)) [SLASHSLASH_IS_ROOT_NAME]: Likewise.
(path::_M_concat(basic_string_view))
[SLASHSLASH_IS_ROOT_NAME]: Likewise.
(path::lexically_normal()) [SLASHSLASH_IS_ROOT_NAME]: Use += instead
of /= to add a root-directory to the result.
* testsuite/27_io/filesystem/path/decompose/root_directory.cc: Fix
expected result for Cygwin.

Tested x86_64-linux, as committed, and also with the new
SLASHSLASH_IS_ROOT_NAME macro defined to test that version of the
code.

Committed to trunk.


commit b15b89d09ec824a42c897851d9efe2405e92fb69
Author: redi 
Date:   Thu Dec 13 20:34:10 2018 +

Fix handling of POSIX paths containing a root-name

Fix path appending and concatenating to work correctly for a leading
root-name. Check a new macro, SLASHSLASH_IS_ROOT_NAME, instead of making
the behaviour depend directly on __CYGWIN__.

* src/filesystem/std-path.cc (SLASHSLASH_IS_ROOT_NAME): New macro to
control whether interpret paths with two slashes as a root-name.
(path::operator/=(const path&)) [SLASHSLASH_IS_ROOT_NAME]: Add a
root-directory when appending to a root-name.
(path::_M_append(basic_string_view))
[SLASHSLASH_IS_ROOT_NAME]: Likewise.
(path::operator/=(const path&)) [SLASHSLASH_IS_ROOT_NAME]: Likewise.
(path::_M_concat(basic_string_view))
[SLASHSLASH_IS_ROOT_NAME]: Likewise.
(path::lexically_normal()) [SLASHSLASH_IS_ROOT_NAME]: Use += instead
of /= to add a root-directory to the result.
* testsuite/27_io/filesystem/path/decompose/root_directory.cc: Fix
expected result for Cygwin.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267107 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/src/filesystem/std-path.cc 
b/libstdc++-v3/src/filesystem/std-path.cc
index e9c78924b8e..d2520492c03 100644
--- a/libstdc++-v3/src/filesystem/std-path.cc
+++ b/libstdc++-v3/src/filesystem/std-path.cc
@@ -26,6 +26,11 @@
 # define _GLIBCXX_USE_CXX11_ABI 1
 #endif
 
+#ifdef __CYGWIN__
+// Interpret "//x" as a root-name, not root-dir + filename
+# define SLASHSLASH_IS_ROOTNAME 1
+#endif
+
 #include 
 #include 
 #include 
@@ -70,7 +75,7 @@ struct path::_Parser
 // look for root name or root directory
 if (is_dir_sep(input[0]))
   {
-#ifdef __CYGWIN__
+#if SLASHSLASH_IS_ROOTNAME
// look for root name, such as "//foo"
if (len > 2 && input[1] == input[0])
  {
@@ -515,7 +520,7 @@ path::operator/=(const path& __p)
   string_view_type sep;
   if (has_filename())
 sep = { _separator, 1 };  // need to add a separator
-#ifdef __CYGWIN__
+#if SLASHSLASH_IS_ROOTNAME
   else if (_M_type() == _Type::_Root_name) // root-name with no root-dir
 sep = { _separator, 1 };  // need to add a separator
 #endif
@@ -535,6 +540,10 @@ path::operator/=(const path& __p)
 capacity += __p._M_cmpts.size();
   else if (!__p.empty() || !sep.empty())
 capacity += 1;
+#if SLASHSLASH_IS_ROOTNAME
+  if (orig_type == _Type::_Root_name)
+++capacity; // Need to insert root-directory after root-name
+#endif
 
   if (orig_type == _Type::_Multi)
 {
@@ -568,6 +577,14 @@ path::operator/=(const path& __p)
  string_view_type s(_M_pathname.data(), orig_pathlen);
  ::new(output++) _Cmpt(s, orig_type, 0);
  ++_M_cmpts._M_impl->_M_size;
+#if SLASHSLASH_IS_ROOTNAME
+ if (orig_type == _Type::_Root_name)
+   {
+ ::new(output++) _Cmpt(sep, _Type::_Root_dir,
+   orig_pathlen + sep.length());
+ ++_M_cmpts._M_impl->_M_size;
+   }
+#endif
}
 
   if (__p._M_type() == _Type::_Multi)
@@ -668,7 +685,7 @@ path::_M_append(basic_string_view s)
   basic_string_view sep;
   if (has_filename())
 sep = { _separator, 1 };  // need to add a separator
-#ifdef __CYGWIN__
+#if SLASHSLASH_IS_ROOTNAME
   else if (_M_type() == _Type::_Root_name) // root-name with no root-dir
 sep = { _separator, 1 };  // need to add a separator
 #endif
@@ -723,6 +740,11 @@ path::_M_append(basic_string_view s)
   else if (!sep.empty())
 ++capacity;
 
+#if SLASHSLASH_IS_ROOTNAME
+  if (orig_type == _Type::_Root_name)
+++capacity; // Need to insert root-directory after root-name
+#endif
+
   __try
 {

Re: [PATCH] PR libstdc++/71044 optimize std::filesystem::path construction

2018-12-13 Thread Jonathan Wakely


On 12/12/18 17:22 +, Jonathan Wakely wrote:

This new implementation has a smaller footprint than the previous
implementation, due to replacing std::vector<_Cmpt> with a custom pimpl
type that only needs a single pointer. The _M_type enumeration is also
combined with the pimpl type, by using a tagged pointer, reducing
sizeof(path) further still.

Construction and modification of paths is now done more efficiently, by
splitting the input into a stack-based buffer of string_view objects
instead of a dynamically-allocated vector containing strings. Once the
final size is known only a single allocation is needed to reserve space
for it. The append and concat operations no longer require constructing
temporary path objects, nor re-parsing the entire native pathname.

This results in algorithmic improvements to path construction, and
working with large paths is much faster.

PR libstdc++/71044
* include/bits/fs_path.h (path::path(path&&)): Add noexcept when
appropriate. Move _M_cmpts instead of reparsing the native pathname.
(path::operator=(const path&)): Do not define as defaulted.
(path::operator/=, path::append): Call _M_append.
(path::concat): Call _M_concat.
(path::path(string_type, _Type): Change type of first parameter to
basic_string_view.
(path::_M_append(string_type), path::_M_concat(string_type)): New
member functions.
(path::_S_is_dir_sep): Replace with non-member is_dir_sep.
(path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir)
(path::_M_add_filename): Remove.
(path::_M_type()): New member function to replace _M_type data member.
(path::_List): Define new struct type instead of using std::vector.
(path::_Cmpt::_Cmpt(string_type, _Type, size_t)): Change type of
first parameter to basic_string_view.
(path::operator+=(const path&)): Do not define inline.
(path::operator+=(const string_type&)): Call _M_concat.
(path::operator+=(const value_type*)): Likewise.
(path::operator+=(value_type)): Likewise.
(path::operator+=(basic_string_view)): Likewise.
(path::operator/=(const path&)): Do not define inline.
(path::_M_append(path)): Remove.
* python/libstdcxx/v6/printers.py (StdPathPrinter): New printer that
understands the new path::_List type.
* src/filesystem/std-path.cc (is_dir_sep): New function to replace
path::_S_is_dir_sep.
(path::_Parser): New helper class to parse strings as paths.
(path::_List::_Impl): Define container type for path components.
(path::_List): Define members.
(path::operator=(const path&)): Define explicitly, to provide the
strong exception safety guarantee.
(path::operator/=(const path&)): Implement manually by processing
each component of the argument, rather than using _M_split_cmpts
to parse the entire string again.
(path::_M_append(string_type)): Likewise.
(path::operator+=(const path&)): Likewise.
(path::_M_concat(string_type)): Likewise.
(path::remove_filename()): Perform trim directly instead of calling
_M_trim().
(path::_M_split_cmpts()): Rewrite in terms of _Parser class.
(path::_M_trim, path::_M_add_root_name, path::_M_add_root_dir)
(path::_M_add_filename): Remove.

Tested x86_64-linux. I intend to commit this to trunk tomorrow.

There are some other performance improvements that could be made to
the std::filesystem implementation, but this is the main thing I
wanted to do for PR 71044.


Here's an improved patch which I'm commmitting to trunk. The previous
patch added _M_append(string_type) and _M_concat(string_type), which
had the advantage that the argument owned its contents and so would
not be invalidated by modifications to the strings contained in *this.
The new version of the patch uses string_view arguments for those
functions. This avoids allocating a string when appending a NTBS or
an existing string_view, at the expense of some additional complexity
to ensure the contents of the string_view are not used after they
could become invalidated (e.g. if they refer to one of the strings
contained in *this which gets altered by the append/concat operation).

This also fixes some bugs in the MinGW-specific code, which I hadn't
tested yesterday. All 27_io/filesystem/path/* tests pass on
x86_64-w64-mingw32 now.

This puts us in a position where the std::filesystem::path class can
be moved from libstdc++fs.a to the main libstdc++.so and enabled
unconditionally. I'll do that in the next few days. (The directory
iterators and filesystem "operations" like copy_file will still need
to be conditionally defined, depending on whether the target supports
the necessary OS APIs.)

Tested x86_64-linux, committed to trunk.

commit 355eb6608a70864f05c3003ac8a1713b64e0d802
Author: redi 
Date:   Thu Dec 13 11:01:03 2018 +

Re: [C++ PATCH] Fix up __builtin_is_constant_evaluated handling in array type sizes (PR c++/88446)

2018-12-13 Thread Jakub Jelinek

On Thu, Dec 13, 2018 at 03:05:10PM -0500, Jason Merrill wrote:
> > Perhaps use /*pretend_const_required=*/true?
> 
> Please.  And the name of the parameter should change to match the other
> patches.  OK with those changes.

You've already acked a newer version of that patch that has that in
and current trunk uses /*manifestly_const_required=*/true.

Jakub

Re: [C++ PATCH] Fix up __builtin_is_constant_evaluated handling in array type sizes (PR c++/88446)

2018-12-13 Thread Jason Merrill


On 12/11/18 3:35 PM, Marek Polacek wrote:

On Tue, Dec 11, 2018 at 05:39:25PM +0100, Jakub Jelinek wrote:

Hi!

As mentioned in the PR, while we allow VLAs in some contexts in C++ as
an extension, they aren't standard and the standard requires in those spots
constant expressions, thus __builtin_is_constant_evaluated () needs to be
true if those sizes are indeed constant expressions.

Fixed by calling cxx_eval_outermost_constant_expr with
pretend_const_required too in that case.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-12-11  Jakub Jelinek  

PR c++/88446
* cp-tree.h (maybe_constant_value): Add pretend_const_required
argument.
* constexpr.c (maybe_constant_value): Likewise.  If true, don't
cache and call cxx_eval_outermost_constant_expr with true as
pretend_const_required.
* decl.c (compute_array_index_type_loc): Call maybe_constant_value
with true as pretend_const_required.

* g++.dg/cpp2a/is-constant-evaluated3.C: New test.

--- gcc/cp/cp-tree.h.jj 2018-12-07 00:23:15.024998595 +0100
+++ gcc/cp/cp-tree.h2018-12-11 13:55:51.933845503 +0100
@@ -7663,7 +7663,7 @@ extern bool require_rvalue_constant_expr
  extern bool require_potential_rvalue_constant_expression (tree);
  extern tree cxx_constant_value(tree, tree = 
NULL_TREE);
  extern tree cxx_constant_init (tree, tree = NULL_TREE);
-extern tree maybe_constant_value   (tree, tree = NULL_TREE);
+extern tree maybe_constant_value   (tree, tree = NULL_TREE, bool = 
false);
  extern tree maybe_constant_init   (tree, tree = 
NULL_TREE, bool = false);
  extern tree fold_non_dependent_expr   (tree, tsubst_flags_t = 
tf_warning_or_error);
  extern tree fold_simple   (tree);
--- gcc/cp/constexpr.c.jj   2018-12-11 12:01:27.968941683 +0100
+++ gcc/cp/constexpr.c  2018-12-11 13:56:50.382890876 +0100
@@ -5244,7 +5244,7 @@ fold_simple (tree t)
  static GTY((deletable)) hash_map *cv_cache;
  
  tree

-maybe_constant_value (tree t, tree decl)
+maybe_constant_value (tree t, tree decl, bool pretend_const_required)
  {
tree r;
  


Could you please describe the new param in the comment?


--- gcc/cp/decl.c.jj2018-12-07 00:23:15.0 +0100
+++ gcc/cp/decl.c   2018-12-11 13:57:30.779231098 +0100
@@ -9645,7 +9645,10 @@ compute_array_index_type_loc (location_t
{
  size = instantiate_non_dependent_expr_sfinae (size, complain);
  size = build_converted_constant_expr (size_type_node, size, complain);
- size = maybe_constant_value (size);
+ /* Pedantically a constant expression is required here and so
+__builtin_is_constant_evaluated () should fold to true if it
+is successfully folded into a constant.  */
+ size = maybe_constant_value (size, NULL_TREE, true);


Perhaps use /*pretend_const_required=*/true?


Please.  And the name of the parameter should change to match the other 
patches.  OK with those changes.


Jason

Re: [PATCH] handle expressions in __builtin_has_attribute (PR 88383)

2018-12-13 Thread Jakub Jelinek

On Thu, Dec 13, 2018 at 12:48:18PM -0700, Martin Sebor wrote:
> > The support is necessary in order to determine the attributes
> > in expressions such as:
> > 
> >    struct S { __attribute__ ((packed)) int a[32]; };
> > 
> >    extern struct S s;
> > 
> >    _Static_assert (__builtin_has_attribute (s.a, packed));
> 
> An example involving types might be a better one:
> 
>   typedef __attribute__ ((may_alias)) int* BadInt;
> 
>   void f (BadInt *p)
>   {
> _Static_assert (__builtin_has_attribute (*p, may_alias));
>   }

So how is the builtin defined then?  Is the argument always an expression
and you only return whether its type has the attribute, or do something
different if the expression is of certain kind?
Perhaps it would be better have two builtins, one would always take
type from the expression, the other would only accept decls (or perhaps
some syntax for the FIELD_DECLs).

Jakub

Re: [PATCH] handle expressions in __builtin_has_attribute (PR 88383)

2018-12-13 Thread Martin Sebor


On 12/13/18 12:20 PM, Martin Sebor wrote:

On 12/13/18 11:59 AM, Jeff Law wrote:

On 12/5/18 8:55 PM, Martin Sebor wrote:

The __builtin_has_attribute function fails with an ICE when
its first argument is an expression such as INDIRECT_REF, or
many others.  The code simply assumes it's either a type or
a decl.  The attached patch corrects this oversight.

While testing the fix I also noticed the C++ front end expects
the first operand to be a unary expression, which causes most
other kinds of expressions to be rejected.  The patch fixes
that as well.

Finally, while testing the fix even more I realized that
the built-in considers the underlying array itself in ARRAY_REF
expressions rather than its type, which leads to inconsistent
results for overaligned arrays (it's the array itself that's
overaligned, not its elements).  So I fixed that too and
adjusted the one test designed to verify this.

Tested on x86_64-linux.

Martin

gcc-88383.diff

PR c/88383 - ICE calling __builtin_has_attribute on a reference

gcc/c-family/ChangeLog:

PR c/88383
* c-attribs.c (validate_attribute): Handle expressions.
(has_attribute): Handle types referenced by expressions.
Avoid considering array attributes in ARRAY_REF expressions .

gcc/cp/ChangeLog:

PR c/88383
* parser.c (cp_parser_has_attribute_expression): Handle assignment
expressions.

gcc/testsuite/ChangeLog:

PR c/88383
* c-c++-common/builtin-has-attribute-4.c: Adjust expectations.
* c-c++-common/builtin-has-attribute-6.c: New test.

Well, the high level question here is do we want to support this builtin
on expressions at all.  Generally attributes apply to types and decls,
not expressions.

Clearly we shouldn't fault, but my first inclination is that the
attribute applies to types or decls, not expressions.  In that case we
should just be issuing an error.

I could be convinced otherwise, so if you think we should support
passing expressions to this builtin, make your case.


The support is necessary in order to determine the attributes
in expressions such as:

   struct S { __attribute__ ((packed)) int a[32]; };

   extern struct S s;

   _Static_assert (__builtin_has_attribute (s.a, packed));


An example involving types might be a better one:

  typedef __attribute__ ((may_alias)) int* BadInt;

  void f (BadInt *p)
  {
_Static_assert (__builtin_has_attribute (*p, may_alias));
  }

Martin

Re: [PATCH/doc] consistently use @code for const and volatile qualifiers

2018-12-13 Thread Jeff Law

On 12/13/18 12:10 PM, Martin Sebor wrote:
> As we discussed, the manual isn't completely consistent in its
> use of @code for const and volatile qualifiers.  I made a pass
> through extend.texi and added @code wherever it seemed to be
> missing.  This doesn't mean all uses but only those that
> specifically refer to the qualifiers.  In terms like "const
> function" or "volatile object" the terms "const" and "volatile"
> do not specifically refer to the qualifiers and so they don't
> get the @code.  AFAICS, this is in line with existing usage in
> the manual and also in the C and C++ standard (though none is
> 100% consistent).
> 
> While doing this, I also noticed similar inconsistencies in
> the quoting of the values true and false, and in references
> to the asm statement.  There too the manual tends to use @code
> more often than not, so I made similar changes there.  Here
> too I've left unchanged uses of the ordinary English words
> true or false.
> 
> I made no other changes here so the churn is only the result
> of maintaining the 80 characters per line limit.
> 
> If we're happy with this as the general rule of thumb to use
> in these cases I'll propose an update to the Coding Guidelines
> to capture it.
> 
> Martin
> 
> gcc-doc-extend-cvqual-code.diff
> 
> gcc/ChangeLog:
> 
>   * doc/extend.texi: Consistemtly use @code for const and volatile
>   qualifiers and asm statements.
OK.  And definitely add it to the guidelines.  It's easily missed when
making changes as well as during reviews.

jeff

Re: [PATCH] rs6000: Fix AIX aggregate passing fix

2018-12-13 Thread David Edelsohn

On Tue, Dec 4, 2018 at 7:26 PM Segher Boessenkool
 wrote:
>
> David's fix for the AIX aggregate passing from yesterday unfortunately
> also triggers on powerpc64-linux.  This fixes it.
>
> David, looking at this once more, does this not need a "&& type" test
> on AIX?  Before the AGGREGATE_TYPE_P test.  I suspect it only didn't
> crash on AIX because AIX doesn't mind dereferencing address 0?

The APIs for the functions explicitly allow type to be NULL for
libcalls, so type should be tested. AIX allows dereference of NULL
pointer, which explains the lack of symptoms on AIX.

As far as I can tell, adding a test for non-NULL type does not cause
additional incompatibilities.

I am committing the appended patch.

Bootstrapped on powerpc-ibm-aix7.2.0.0.

Thanks, David

* config/rs6000/rs6000.c (rs6000_function_arg): Ensure type is non-NULL.
(rs6000_arg_partial_bytes): Same.

Index: rs6000.c
===
--- rs6000.c(revision 267103)
+++ rs6000.c(working copy)
@@ -11999,7 +11999,8 @@ rs6000_function_arg (cumulative_args_t cum_v, mach
cum->fregno++;

   if (USE_FP_FOR_ARG_P (cum, elt_mode)
- && !(TARGET_AIX && !TARGET_ELF && AGGREGATE_TYPE_P (type)))
+ && !(TARGET_AIX && !TARGET_ELF
+  && type != NULL && AGGREGATE_TYPE_P (type)))
{
  rtx rvec[GP_ARG_NUM_REG + AGGR_ARG_NUM_REG + 1];
  rtx r, off;
@@ -12136,7 +12137,8 @@ rs6000_arg_partial_bytes (cumulative_args_t cum_v,
   align_words = rs6000_parm_start (mode, type, cum->words);

   if (USE_FP_FOR_ARG_P (cum, elt_mode)
-  && !(TARGET_AIX && !TARGET_ELF && AGGREGATE_TYPE_P (type)))
+  && !(TARGET_AIX && !TARGET_ELF
+  && type != NULL && AGGREGATE_TYPE_P (type)))
 {
   unsigned long n_fpreg = (GET_MODE_SIZE (elt_mode) + 7) >> 3;

[PATCH, rs6000] Replace X-form addressing with D-form addressing in new pass for Power9

2018-12-13 Thread Kelvin Nilsen



This patch is a refinement of a path first submitted to this list on Nov. 10, 
2018.  This new patch incorporates improvements suggested by 
seg...@gcc.gnu.org.  Two regression observed at the time this patch was 
previously distributed have been resolved as described here: 
https://sourceware.org/bugzilla/show_bug.cgi?id=23937

New D-form instructions available on Power9 introduce new code generation 
options that result in more efficient execution.

This new pass scans existing rtl expressions and replaces them with rtl 
expressions that favor selection of the D-form instructions in contexts for 
which the D-form instructions are preferred.  The new pass runs after the RTL 
loop optimizations since loop unrolling often introduces opportunities for 
beneficial replacements of X-form addressing instructions.

I have built and regression tested this patch on powerpc64le-unknown-linux 
(Power9) target with no regressions.

Is this ok for trunk?

gcc/ChangeLog:

2018-12-13  Kelvin Nilsen  

* config/rs6000/rs6000-p9dform.c: New file.
* config/rs6000/rs6000-passes.def: Add pass_insert_dform after
pass_loop2.
* config/rs6000/rs6000-protos.h
(rs6000_target_supports_dform_offset_p): New prototype.
(make_pass_insert_dform): New prototype.
* config/rs6000/rs6000.c (rs6000_target_supports_dform_offset_p):
New function.
* config/rs6000/t-rs6000: Add entry to compile rs6000-p9dform.c.
* config.gcc: Add entry to link new object file rs6000-p9dform.o.

gcc/testsuite/ChangeLog:

2018-12-13  Kelvin Nilsen  

* gcc.target/powerpc/p9-dform-0.c: New test.
* gcc.target/powerpc/p9-dform-1.c: New test.

Index: gcc/config/rs6000/rs6000-p9dform.c
===
--- gcc/config/rs6000/rs6000-p9dform.c  (nonexistent)
+++ gcc/config/rs6000/rs6000-p9dform.c  (working copy)
@@ -0,0 +1,1487 @@
+/* Subroutines used to transform array subscripting expressions into
+   forms that are more amenable to d-form instruction selection for p9
+   little-endian VSX code.
+   Copyright (C) 1991-2018 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "df.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"
+#include "expr.h"
+#include "output.h"
+#include "tree-pass.h"
+#include "rtx-vector-builder.h"
+#include "cfgloop.h"
+
+#include "insn-config.h"
+#include "recog.h"
+
+#include "print-rtl.h"
+#include "tree-pretty-print.h"
+
+#include "genrtl.h"
+
+/* This pass transforms array indexing expressions from a form that
+   favors selection of X-form instructions into a form that favors
+   selection of D-form instructions.
+
+   Showing favor for D-form instructions is especially important when
+   targeting Power9, as the Power9 architecture added a number of new
+   D-form instruction capabilities.
+
+   Consider, for example, the following loop, excerpted from an actual
+   program:
+
+double sacc, x[], y[], z[];
+sacc = 0.00;
+for (unsigned long long int i = 0; i < N; i++) {
+  z[i] = x[i] * y[i];
+  sacc += z[i];
+}
+
+   Compile this program with the following gcc options which enable both
+   vectorization and loop unrolling:
+-m64 -fdump-rtl-all-details -mcpu=power9 -mtune=power9 -funroll-loops -O3
+
+   Without this pass, this loop is represented by the following:
+
+   lxvx:  16
+   addi:   8
+   xvmuldp:8
+   stxvx:  8
+   fmr:8
+   xxpermdi:   8
+   fadd:  16
+   bdnz:   1
+ ___
+ total:   73 instructions
+
+.L3:
+   lxvx 0,29,11
+   lxvx 12,30,11
+   addi 12,11,16
+   addi 0,11,48
+   addi 5,11,64
+   addi 9,11,32
+   addi 6,11,80
+   addi 7,11,96
+   addi 8,11,112
+   lxvx 2,29,12
+   lxvx 3,30,12
+   lxvx 4,29,0
+   lxvx 5,30,0
+   lxvx 10,30,9
+   lxvx 11,29,5
+   xvmuldp 6,0,12
+   lxvx 13,30,5
+   lxvx 8,29,9
+   lxvx 27,29,6
+   lxvx 28,30,6
+   xvmuldp 7,2,3
+   lxvx 29,29,7
+   lxvx

[PATCH] v4: C++: more location wrapper nodes (PR c++/43064, PR c++/43486)

2018-12-13 Thread David Malcolm

On Wed, 2018-12-12 at 15:37 -0500, Jason Merrill wrote:
> On 12/7/18 3:13 PM, David Malcolm wrote:
> > On Tue, 2018-12-04 at 18:31 -0500, Jason Merrill wrote:
> > > On 12/3/18 5:10 PM, Jeff Law wrote:
> > > > On 11/19/18 9:51 AM, David Malcolm wrote:
> > 
> > [...]
> > > > @@ -1058,6 +1058,9 @@ grokbitfield (const cp_declarator
> > > > *declarator,
> > > > return NULL_TREE;
> > > >   }
> > > >   
> > > > +  if (width)
> > > > +STRIP_ANY_LOCATION_WRAPPER (width);
> > > 
> > > Why is this needed?  We should already be reducing width to an
> > > unwrapped
> > > constant value somewhere along the line.
> > 
> > "width" is coming from cp_parser_member_declaration, from:
> > 
> >   /* Get the width of the bitfield.  */
> >   width = cp_parser_constant_expression (parser, false,
> > NULL,
> >  cxx_dialect >=
> > cxx11);
> > 
> > and currently nothing is unwrapping the value.  We presumably need
> > to
> > unwrap (or fold?) it before it is stashed in
> >DECL_BIT_FIELD_REPRESENTATIVE (value).
> > 
> > Without stripping (or folding) here, we e.g. lose a warning and get
> > this:
> >FAIL: g++.dg/abi/empty22.C  -std=gnu++98  (test for warnings,
> > line 15)
> 
> Why does that happen?  check_bitfield_decl ought to handle the
> location 
> wrapper fine.  That's where it gets folded.

The unstripped location wrapper defeats this check for zero in
check_field_decls within cp/class.c:

3555  if (DECL_C_BIT_FIELD (x)
3556  && integer_zerop (DECL_BIT_FIELD_REPRESENTATIVE (x)))
3556/* We don't treat zero-width bitfields as making a class
3557   non-empty.  */
3558;
3559  else

leading it to erroneously use the "else" clause, which thus erroneously
clears CLASSTYPE_EMPTY_P, leading to the loss of:

g++.dg/abi/empty22.C:15:6: warning: empty class 'dummy' parameter passing
  ABI changes in -fabi-version=12 (GCC 8) [-Wabi]
   15 |   fun(d, f); // { dg-warning "empty" }
  |   ~~~^~

check_bitfield_decl is called *after* that check, so the folding there
doesn't help.

> > > > @@ -656,6 +656,9 @@ add_capture (tree lambda, tree id, tree
> > > > orig_init, bool by_reference_p,
> > > > listmem = make_pack_expansion (member);
> > > > initializer = orig_init;
> > > >   }
> > > > +
> > > > +  STRIP_ANY_LOCATION_WRAPPER (initializer);
> > > 
> > > Why is this needed?  What cares about the tree codes of the
> > > capture
> > > initializer?
> > 
> > This is used to populate LAMBDA_EXPR_CAPTURE_LIST.  Without
> > stripping,
> > we end up with wrapped VAR_DECLs, rather than the VAR_DECLs
> > themselves,
> 
> Sure, that sounds fine.
> 
> > and this confuses things later on, for example leading to:
> > 
> >   PASS -> FAIL : g++.dg/cpp0x/lambda/lambda-type.C  -std=c++14
> > (test for excess errors)
> >   PASS -> FAIL : g++.dg/cpp0x/lambda/lambda-type.C  -std=c++17
> > (test for excess errors)
> 
> Confuses how?

Without stripping, we get extra errors for decltype() on identifiers in
the capture-list, e.g.:

  [i] {
same_type();
same_type();
  };

cpp0x/lambda/lambda-type.C: In lambda function:
cpp0x/lambda/lambda-type.C:48:27: error: 'i' is not captured
   48 | same_type();
  |   ^
cpp0x/lambda/lambda-type.C:48:39: error: template argument 1 is invalid
   48 | same_type();
  |   ^

These occur because, in capture_decltype, when searching for the decl for
"i" here:

  tree cap = value_member (decl, LAMBDA_EXPR_CAPTURE_LIST (lam));

the LAMBDA_EXPR_CAPTURE_LIST contains a wrapper around "i" rather than "i"
itself, and so "i" isn't found by value_member; "cap" thus becomes NULL_TREE,
and thus the error.

Without the stripping, we also gain:

cpp0x/lambda/lambda-type.C: In lambda function:
cpp0x/lambda/lambda-type.C:52:41: error: invalid use of incomplete type
  'struct same_type'
   52 | same_type();
  | ^
cpp0x/lambda/lambda-type.C:17:8: note: declaration of 'struct same_type'
   17 | struct same_type;
  |^

> 
> Jason

Thanks for the feedback.

Here's version 4 of the patch, due to rebasing it against today's r267082.

As before, successfully bootstrapped & regrtested on x86_64-pc-linux-gnu, in
conjunction with the followup patch:
  [PATCH 2/2] v2: C++: improvements to binary operator diagnostics (PR 
c++/87504)
https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00236.html
updated per your comments (as noted in an earlier version of the 1/2 patch,
they could be reworked to be fully independent, but there's some churn in
the results for -Wtautological-compare introduced by the 1st patch which
the 2nd patch addresses).

OK for trunk?

Thanks
Dave

gcc/c-family/ChangeLog:
PR c++/43064
PR c++/43486
* c-common.c (unsafe_conversion_p): Fold any location wrapper.
(verify_tree): Handle location

Re: [PATCH] handle expressions in __builtin_has_attribute (PR 88383)

2018-12-13 Thread Martin Sebor


On 12/13/18 11:59 AM, Jeff Law wrote:

On 12/5/18 8:55 PM, Martin Sebor wrote:

The __builtin_has_attribute function fails with an ICE when
its first argument is an expression such as INDIRECT_REF, or
many others.  The code simply assumes it's either a type or
a decl.  The attached patch corrects this oversight.

While testing the fix I also noticed the C++ front end expects
the first operand to be a unary expression, which causes most
other kinds of expressions to be rejected.  The patch fixes
that as well.

Finally, while testing the fix even more I realized that
the built-in considers the underlying array itself in ARRAY_REF
expressions rather than its type, which leads to inconsistent
results for overaligned arrays (it's the array itself that's
overaligned, not its elements).  So I fixed that too and
adjusted the one test designed to verify this.

Tested on x86_64-linux.

Martin

gcc-88383.diff

PR c/88383 - ICE calling __builtin_has_attribute on a reference

gcc/c-family/ChangeLog:

PR c/88383
* c-attribs.c (validate_attribute): Handle expressions.
(has_attribute): Handle types referenced by expressions.
Avoid considering array attributes in ARRAY_REF expressions .

gcc/cp/ChangeLog:

PR c/88383
* parser.c (cp_parser_has_attribute_expression): Handle assignment
expressions.

gcc/testsuite/ChangeLog:

PR c/88383
* c-c++-common/builtin-has-attribute-4.c: Adjust expectations.
* c-c++-common/builtin-has-attribute-6.c: New test.

Well, the high level question here is do we want to support this builtin
on expressions at all.  Generally attributes apply to types and decls,
not expressions.

Clearly we shouldn't fault, but my first inclination is that the
attribute applies to types or decls, not expressions.  In that case we
should just be issuing an error.

I could be convinced otherwise, so if you think we should support
passing expressions to this builtin, make your case.


The support is necessary in order to determine the attributes
in expressions such as:

  struct S { __attribute__ ((packed)) int a[32]; };

  extern struct S s;

  _Static_assert (__builtin_has_attribute (s.a, packed));

Martin

[PATCH/doc] consistently use @code for const and volatile qualifiers

2018-12-13 Thread Martin Sebor


As we discussed, the manual isn't completely consistent in its
use of @code for const and volatile qualifiers.  I made a pass
through extend.texi and added @code wherever it seemed to be
missing.  This doesn't mean all uses but only those that
specifically refer to the qualifiers.  In terms like "const
function" or "volatile object" the terms "const" and "volatile"
do not specifically refer to the qualifiers and so they don't
get the @code.  AFAICS, this is in line with existing usage in
the manual and also in the C and C++ standard (though none is
100% consistent).

While doing this, I also noticed similar inconsistencies in
the quoting of the values true and false, and in references
to the asm statement.  There too the manual tends to use @code
more often than not, so I made similar changes there.  Here
too I've left unchanged uses of the ordinary English words
true or false.

I made no other changes here so the churn is only the result
of maintaining the 80 characters per line limit.

If we're happy with this as the general rule of thumb to use
in these cases I'll propose an update to the Coding Guidelines
to capture it.

Martin
gcc/ChangeLog:

	* doc/extend.texi: Consistemtly use @code for const and volatile
	qualifiers and asm statements.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi	(revision 267103)
+++ gcc/doc/extend.texi	(working copy)
@@ -8483,7 +8483,8 @@ All basic @code{asm} blocks are implicitly volatil
 
 @item inline
 If you use the @code{inline} qualifier, then for inlining purposes the size
-of the asm is taken as the smallest size possible (@pxref{Size of an asm}).
+of the @code{asm} statement is taken as the smallest size possible (@pxref{Size
+of an asm}).
 @end table
 
 @subsubheading Parameters
@@ -8629,7 +8630,8 @@ qualifier to disable certain optimizations. @xref{
 
 @item inline
 If you use the @code{inline} qualifier, then for inlining purposes the size
-of the asm is taken as the smallest size possible (@pxref{Size of an asm}).
+of the @code{asm} statement is taken as the smallest size possible
+(@pxref{Size of an asm}).
 
 @item goto
 This qualifier informs the compiler that the @code{asm} statement may 
@@ -8741,7 +8743,7 @@ void DoCheck(uint32_t dwSomeValue)
 The next example shows a case where the optimizers can recognize that the input 
 (@code{dwSomeValue}) never changes during the execution of the function and can 
 therefore move the @code{asm} outside the loop to produce more efficient code. 
-Again, using @code{volatile} disables this type of optimization.
+Again, using the @code{volatile} qualifier disables this type of optimization.
 
 @example
 void do_print(uint32_t dwSomeValue)
@@ -8797,11 +8799,11 @@ GCC's optimizers do not treat this code like the n
 earlier examples. They do not move it out of loops or omit it on the 
 assumption that the result from a previous call is still valid.
 
-Note that the compiler can move even volatile @code{asm} instructions relative 
+Note that the compiler can move even @code{volatile asm} instructions relative
 to other code, including across jump instructions. For example, on many 
 targets there is a system register that controls the rounding mode of 
-floating-point operations. Setting it with a volatile @code{asm}, as in the 
-following PowerPC example, does not work reliably.
+floating-point operations. Setting it with a @code{volatile asm} statement,
+as in the following PowerPC example, does not work reliably.
 
 @example
 asm volatile("mtfsf 255, %0" : : "f" (fpenv));
@@ -8808,9 +8810,10 @@ asm volatile("mtfsf 255, %0" : : "f" (fpenv));
 sum = x + y;
 @end example
 
-The compiler may move the addition back before the volatile @code{asm}. To 
-make it work as expected, add an artificial dependency to the @code{asm} by 
-referencing a variable in the subsequent code, for example: 
+The compiler may move the addition back before the @code{volatile asm}
+statement. To make it work as expected, add an artificial dependency to
+the @code{asm} by referencing a variable in the subsequent code, for
+example:
 
 @example
 asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
@@ -8819,7 +8822,7 @@ sum = x + y;
 
 Under certain circumstances, GCC may duplicate (or remove duplicates of) your 
 assembly code when optimizing. This can lead to unexpected duplicate symbol 
-errors during compilation if your asm code defines symbols or labels. 
+errors during compilation if your @code{asm} code defines symbols or labels. 
 Using @samp{%=} 
 (@pxref{AssemblerTemplate}) may help resolve this problem.
 
@@ -8848,7 +8851,7 @@ that some assembler dialects use semicolons to sta
 Do not expect a sequence of @code{asm} statements to remain perfectly 
 consecutive after compilation, even when you are using the @code{volatile} 
 qualifier. If certain instructions need to remain consecutive in the output, 
-put them in a single multi-instruction asm statement.
+put them in

Re: [PATCH AutoFDO/4]Fix profile count computation/propagation.

2018-12-13 Thread Jeff Law

On 12/10/18 6:36 PM, Bin.Cheng wrote:
> On Thu, Nov 8, 2018 at 6:33 AM Jeff Law  wrote:
>> On 10/31/18 12:34 AM, bin.cheng wrote:
>>> Hi,
>>> This patch fixes AutoFDO breakage on trunk.  The main reason for breakage 
>>> is AutoFDO
>>> relies on standalone edge count computing and propagating profile 
>>> count/probability info
>>> on CFG, but in new infra, edge count is actually computed from probability, 
>>> which leads
>>> to chicken-egg problem and corrupted profile count.  This patch fixes the 
>>> issue by using
>>> explicit edge count.
>>>
>>> There is another issue not touched yet that, in quite common case, profiled 
>>> samples are
>>> not enough and profile info computed for lots of blocks is ZERO.  In the 
>>> future, we may
>>> add some heuristics checking quality of sampled counts and reverting to 
>>> guessed profile
>>> count if necessary.  I think change made in this patch is also needed for 
>>> that.
>>>
>>> Package mysql server is used in test of this patch set.  It can't be 
>>> compiled with autofdo
>>> on trunk, even with compilation issues worked-around, there isn't 
>>> performance improvement.
>>> I local experiments, with this patch set it's improved by 12.3%, 4.3% 
>>> irrespectively for
>>> read-only/write-heavy benchmarks.  Unfortunately,  this patch set was 
>>> written against
>>> GCC 8 branch a while ago, improvement gets worse on trunk and I haven't 
>>> investigated
>>> the reason yet.  I guess there are still other issues which need to be 
>>> fixed in the future.
>>>
>>> Bootstrap and test on x86_64 in patch set.  Is it OK?
>>>
>>> Thanks,
>>> bin
>>> 2018-10-31  Bin Cheng  
>>>
>>>   * auto-profile.c (AFDO_EINFO): New macro.
>>>   (struct edge_info): New structure.
>>>   (is_edge_annotated, set_edge_annotated): Delete.
>>>   (afdo_propagate_edge, afdo_propagate_circuit, afdo_propagate): Remove
>>>   parameter.  Adjust edge count computation and annotation using struct
>>>   edge_info.
>>>   (afdo_calculate_branch_prob): Ditto.
>>>   (afdo_annotate_cfg): Simplify code setting basic block profile count.
>>>
>>>
>>> 0004-Fix-AutoFDO-breakage-after-profile-count-rewriting.patch
>>>
>>> From 6506c12d1b633b6d1bfae839b3633a4f99b3a481 Mon Sep 17 00:00:00 2001
>>> From: chengbin 
>>> Date: Mon, 20 Aug 2018 15:25:02 +0800
>>> Subject: [PATCH 4/4] Fix AutoFDO breakage after profile count rewriting.
>>>
>>> ---
>>>  gcc/auto-profile.c | 190 
>>> ++---
>>>  1 file changed, 95 insertions(+), 95 deletions(-)
>>>
>>> diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
>>> index cde4f41c1d9..ff3ea23d830 100644
>>> --- a/gcc/auto-profile.c
>>> +++ b/gcc/auto-profile.c
>>> @@ -101,6 +101,17 @@ along with GCC; see the file COPYING3.  If not see
>>>  namespace autofdo
>>>  {
>>>
>>> +/* Intermediate edge info used when propagating AutoFDO profile 
>>> information.
>>> +   We can't edge->count() directly since it's computed from edge's 
>>> probability
>>> +   while probability is yet not decided during propagation.  */
>>> +#define AFDO_EINFO(e) ((struct edge_info *) e->aux)
>>> +struct edge_info
>>> +{
>>> +  edge_info () : count (profile_count::zero ().afdo ()), annotated_p 
>>> (false) {}
>>> +  profile_count count;
>>> +  bool annotated_p;
>>> +};
>> edge_info isn't POD, so make it a class rather than a struct.
>>
>> OK with that change assuming it does not have a hard dependency on prior
>> patches in this series.
> Thanks very much for review.  Now that all prerequisite patches are
> approved and committed, I update this one by making edge_info a class
> as suggested.
> Bootstrap and test as before, is it OK?
> 
> Thanks,
> bin
> 
> 2018-12-11  Bin Cheng  
> 
> * auto-profile.c (AFDO_EINFO): New macro.
> (class edge_info): New class.
> (is_edge_annotated, set_edge_annotated): Delete.
> (afdo_propagate_edge, afdo_propagate_circuit, afdo_propagate): Remove
> parameter.  Adjust edge count computation and annotation using class
> edge_info.
> (afdo_calculate_branch_prob, afdo_annotate_cfg): Likewise.
OK
jeff

Re: [PATCH 1/3][GCC] Add new target hook asm_post_cfi_startproc

2018-12-13 Thread Jason Merrill


On 11/5/18 5:18 AM, Sam Tebbs wrote:



On 11/05/2018 07:54 AM, Richard Biener wrote:

On Fri, 2 Nov 2018, Sam Tebbs wrote:


On 11/02/2018 05:28 PM, Sam Tebbs wrote:


Hi all,

This patch adds a new target hook called "asm_post_cfi_startproc". This hook is
intended to be used by the aarch64 backend to emit a directive that enables
support for unwinding frames signed with the pointer authentication B-key. This
hook is triggered after the ".cfi_startproc" directive is emitted in
gcc/dwarf2out.c.

Bootstrapped on aarch64-none-linux-gnu and tested on aarch64-none-elf with no 
regressions.

Ok for trunk?

Can you explain why existing prologue/cfi emission points are not
enough?


I couldn't find any target hooks that were triggered at the
assembly-printing level at the correct point in time (after
.cfi_startproc is emitted), please do point me to one if that is not the
case.

An alternative could have been to implement a new reg_note but that
would have meant adding target-specific code to target-agnostic files
and wouldn't have been as flexible.


And this seems consistent with the other stuff in 
dwarf2out_do_cfi_startproc.  You might amend the documentation to 
mention that the expected use is to add more .cfi_* directives.  OK with 
that change.


Jason

Re: [PATCH] handle expressions in __builtin_has_attribute (PR 88383)

2018-12-13 Thread Jeff Law

On 12/5/18 8:55 PM, Martin Sebor wrote:
> The __builtin_has_attribute function fails with an ICE when
> its first argument is an expression such as INDIRECT_REF, or
> many others.  The code simply assumes it's either a type or
> a decl.  The attached patch corrects this oversight.
> 
> While testing the fix I also noticed the C++ front end expects
> the first operand to be a unary expression, which causes most
> other kinds of expressions to be rejected.  The patch fixes
> that as well.
> 
> Finally, while testing the fix even more I realized that
> the built-in considers the underlying array itself in ARRAY_REF
> expressions rather than its type, which leads to inconsistent
> results for overaligned arrays (it's the array itself that's
> overaligned, not its elements).  So I fixed that too and
> adjusted the one test designed to verify this.
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> gcc-88383.diff
> 
> PR c/88383 - ICE calling __builtin_has_attribute on a reference
> 
> gcc/c-family/ChangeLog:
> 
>   PR c/88383
>   * c-attribs.c (validate_attribute): Handle expressions.
>   (has_attribute): Handle types referenced by expressions.
>   Avoid considering array attributes in ARRAY_REF expressions .
> 
> gcc/cp/ChangeLog:
> 
>   PR c/88383
>   * parser.c (cp_parser_has_attribute_expression): Handle assignment
>   expressions.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR c/88383
>   * c-c++-common/builtin-has-attribute-4.c: Adjust expectations.
>   * c-c++-common/builtin-has-attribute-6.c: New test.
Well, the high level question here is do we want to support this builtin
on expressions at all.  Generally attributes apply to types and decls,
not expressions.

Clearly we shouldn't fault, but my first inclination is that the
attribute applies to types or decls, not expressions.  In that case we
should just be issuing an error.

I could be convinced otherwise, so if you think we should support
passing expressions to this builtin, make your case.

jeff

Re: [PATCH] Fix cleanup_auto_inc_dec on x86 (PR rtl-optimization/88416)

2018-12-13 Thread Jeff Law

On 12/11/18 9:43 AM, Jakub Jelinek wrote:
> Hi!
> 
> As mentioned in the PR, x86 (maybe a couple of other targets) isn't an
> AUTO_INC_DEC target, it doesn't have REG_INC notes nor wants the generic
> code to synthetize any pre/post inc/dec/modify, but does support push/pop
> patterns that use those RTL codes.
> 
> If unlucky enough, as on the following testcase, we can end up with trying
> to propagate such pre/post inc/dec into a DEBUG_INSN, which is invalid.
> 
> As cleanup_auto_inc_dec calls copy_rtx which is pretty much the same
> function as cleanup_auto_inc_dec in the way how it performs deep copy of the
> RTX, except that cleanup_auto_inc_dec also handles the pre/post
> inc/dec/modify, I think the easiest fix is just to remove the special case,
> it shouldn't make it any slower on !AUTO_INC_DEC targets.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2018-12-11  Jakub Jelinek  
> 
>   PR rtl-optimization/88416
>   * valtrack.c (cleanup_auto_inc_dec): Handle pre/post-inc/dec/modify
>   even if !AUTO_INC_DEC.
> 
>   * gcc.target/i386/pr88416.c: New test.
OK
jeff

Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2018-12-13 Thread Jeff Law

On 12/12/18 8:50 PM, bin.cheng wrote:
> Hi,
> 
> Due to ICE and mal-functional bugs, indirect call value profile transformation
> is disabled on GCC-7/8/trunk.  This patch restores the transformation.  The
> main issue is AutoFDO should store cgraph_node's profile_id of callee func in
> the first histogram value's counter, rather than pointer to callee's name 
> string
> as it is now.
> With the patch, some "Indirect call -> direct call" tests pass with autofdo, 
> while
> others are unstable.  I think the instability is caused by poor perf data 
> collected
> during regrets run, and can confirm these tests pass if good perf data could 
> be
> collected in manual experiments.
> 
> Bootstrap and test along with previous patches.  Is it OK?
> 
> FYI, an update about AutoFDO status: 
> All AutoFDO ICEs in regtest are fixed, while several tests still failing fall 
> in below
> three categories:
> 
> Unstable indirect call value profile transformation:
> FAIL: g++.dg/tree-prof/indir-call-prof.C scan-ipa-dump afdo "Indirect call -> 
> direct call.* AA transformation on insn"
> FAIL: g++.dg/tree-prof/morefunc.C scan-ipa-dump-times afdo "Indirect call -> 
> direct call" 2
> FAIL: g++.dg/tree-prof/pr35545.C scan-ipa-dump profile_estimate "Indirect 
> call -> direct call"
> 
> loop peeling case because we don't honor autofdo profile count as reliable:
> FAIL: gcc.dg/tree-prof/peel-1.c scan-tree-dump cunroll "Peeled loop ., 1 
> times"
> 
> cold/hot partition cases:
> FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler foo[._]+cold
> FAIL: gcc.dg/tree-prof/cold_partition_label.c scan-assembler size[ 
> \ta-zA-Z0-0]+foo[._]+cold
> FAIL: gcc.dg/tree-prof/section-attr-1.c scan-assembler .section[\t 
> ]*.text.unlikely[\\n\\r]+[\t ]*.size[\t ]*foo.cold
> FAIL: gcc.dg/tree-prof/section-attr-2.c scan-assembler .section[\t 
> ]*.text.unlikely[\\n\\r]+[\t ]*.size[\t ]*foo.cold
> FAIL: gcc.dg/tree-prof/section-attr-3.c scan-assembler .section[\t 
> ]*.text.unlikely[\\n\\r]+[\t ]*.size[\t ]*foo.cold
> These are more difficult to enable because we can't simply treat autofdo::zero
> count as cold, it's just too many.
> 
> Besides regtest, I run autofdo with kernel/mysql-server, the build and 
> performance
> match expectations now, but I haven't run autofdo with any spec yet.
> 
> Thanks,
> bin
> 
> 2018-12-13  Bin Cheng  
> 
> * auto-profile.c (afdo_indirect_call): Skip generating histogram
> value if we can't find cgraph_node for then indirected callee.  Save
> profile_id of the cgraph_node in histogram value's first counter.
> * value-prof.c (gimple_value_profile_transformations): Don't skip
> for flag_auto_profile.
OK
jeff

Re: [PATCH] Fix split-path-5.c testcase (PR testsuite/88454)

2018-12-13 Thread Jeff Law

On 12/13/18 12:49 AM, Jakub Jelinek wrote:
> Hi!
> 
> On Mon, Dec 10, 2018 at 09:56:46PM -0700, Jeff Law wrote:
>> Note that split-path-5 has the same basic structure.  A half-diamond
>> with a single statement in the middle block that should be trivially
>> if-convertable if profitable.  So I adjusted that testcase.
> 
> The split-path-5.c testcase now fails on powerpc64*, arm*, aarch64* etc.
> targets.
> 
> When looking for the difference, I found out it is a -fsigned-char vs.
> -funsigned-char issue, on -funsigned-char targets we are simply compiling
> something quite different.
> 
> The following patch fixes it, regtested on x86_64-linux (-m64/-m32) and
> tested with cross to aarch64-linux and powerpc64-linux.  Ok for trunk?
> 
> 2018-12-13  Jakub Jelinek  
> 
>   PR testsuite/88454
>   * gcc.dg/tree-ssa/split-path-5.c (__ctype_ptr__): Change type from
>   const char * to const signed char *.
>   (bmhi_init): Change pattern parameter's type the same.  Use
>   __builtin_strlen instead of undeclared strlen.
Yea.  Interesting that we hadn't had problems with it before.  THanks
for taking care of it while I was out.

jeff

Re: [RFC] Handle LHS zero_extracts in DSE

2018-12-13 Thread Jeff Law

On 12/7/18 8:57 AM, Andreas Krebbel wrote:
> Hi,
> 
> debugging a problem with an older GCC (4.6).  I've seen RTXs using
> ZERO_EXTRACT LHS operands on MEMs as in:
> 
> (insn 7 6 0 (set (zero_extract:DI (mem/s/c:QI (plus:DI (reg/f:DI 39 
> virtual-stack-vars)
>   (const_int -5 [0xfffb])) [0+0 S1 A8])
>   (const_int 24 [0x18])
>   (const_int 0 [0]))
>   (subreg:DI (reg:SI 45) 0)) t.c:19 -1
>  (nil))
Yes.  They're essentially a bitfield manipulation.


[ ... ]

> 
> However, I still think that's a legal RTX which needs to be handled by
> DSE correctly. I couldn't find anything preventing that problem from
> occuring also with current GCCs. On the other hand I couldn't trigger
> it with anything in testsuite.I'd look for a target which can do a bitfield 
> manipulation of a memory
object.  So likely old targets nobody cares about :-)  m68k, h8, etc.


> 
> Perhaps something like that would be needed for current GCC:
> 
> diff --git a/gcc/dse.c b/gcc/dse.c
> index 21d166d..d27fb54 100644
> --- a/gcc/dse.c
> +++ b/gcc/dse.c
> @@ -1346,6 +1346,47 @@ record_store (rtx body, bb_info_t bb_info)
> 
>mem = SET_DEST (body);
> 
> +  /* Deal with (zero_extract:XX (mem:QI ...)).  */
> +  if (GET_CODE (mem) == ZERO_EXTRACT && MEM_P (XEXP (mem, 0)))
> +{
> +  rtx size_op = XEXP (mem, 1);
> +  rtx offset_op = XEXP (mem, 2);
> +  enum machine_mode mode = GET_MODE (mem);
> +  scalar_int_mode int_mode;
> +
> +  /* Turn a (zero_extract:XX (mem:QI ...)) into a (mem:BLK with
> +  the proper size and the adjusted address.  */
> +  if (CONST_INT_P (size_op)
> +   && CONST_INT_P (offset_op)
> +   && is_a  (mode, _mode)
> +   && INTVAL (size_op) + INTVAL (offset_op) <= GET_MODE_PRECISION 
> (int_mode))
> + {
> +   int size = INTVAL (size_op);
> +   int offset = INTVAL (offset_op);
> +
> +   mem = copy_rtx (XEXP (mem, 0));
> +   if (BITS_BIG_ENDIAN)
> + mem = adjust_address (mem, BLKmode, offset / BITS_PER_UNIT);
> +   else
> + mem = adjust_address (mem, BLKmode,
> +   (GET_MODE_BITSIZE (int_mode) - size - offset)
> +   / BITS_PER_UNIT);
> +
> +   set_mem_size (mem, (size - 1) / BITS_PER_UNIT + 1);
> + }
> +  else
> + {
> +   if (find_reg_note (insn_info->insn, REG_UNUSED, mem) == NULL)
> + {
> +   /* If the set or clobber is unused, then it does not effect our
> +  ability to get rid of the entire insn.  */
> +   insn_info->cannot_delete = true;
> +   clear_rhs_from_active_local_stores ();
> + }
> +   return 0;
> + }
> +}
Looks sane at a high level.  The trick is exercising it...

jeff

Re: [PATCH AutoFDO]Call update_max_bb_count even if autofdo counts are all zeros

2018-12-13 Thread Jeff Law

On 12/12/18 8:50 PM, bin.cheng wrote:
> Hi,
> This patch calls update_max_bb_count even if autofdo counts are all zeros,
> otherwise it would trigger ICE because of mismatch between basic blocks'
> count (all autofdo::zero) and cfun->cfg->max_count (guessed::non_zero).
> For functions with all autofdo::zero counts, we should improve by restoring
> guessed profile counts, but this maybe not for GCC-9.
> 
> Bootstrap and test on x86_64 along with following patches.  Is it OK?
> 
> Thanks,
> bin
> 
> 2018-12-13  Bin Cheng  
> 
> * auto-profile.c (afdo_annotate_cfg): Call update_max_bb_count even
> if autofdo counts are all zeros.
> 
OK.  Though note there may be some kind of skew between your
auto-profile.c and the trunk.  Your patch has a comment just before the
removed calls to update_max_bb_count that isn't on the trunk.

I'm sure you can adjust accordingly.

jeff

Re: [PATCH] avoid folding snprintf calls with bounds > INT_MAX (PR 87096)

2018-12-13 Thread Jeff Law

On 12/12/18 4:18 PM, Martin Sebor wrote:
> POSIX requires snprintf to fail with EOVERFLOW when the specified
> bound exceeds INT_MAX.  This requirement conflicts with the C
> requirements on valid calls to the function and isn't universally
> implemented (e.g., Glibc doesn't seem to follow it, and GCC has
> historically not paid heed to it either).  Nevertheless, there
> are implementations that do respect it (Solaris being one of
> them), and it seems that GCC should make a tricky situation
> even more treacherous for programmers by returning different
> values from some calls to the function than the library would.
> This is also what bug 87096 requests.  The patch also adds
> a warning that was missing from a subset of these troublesome
> calls.
> 
> The attached patch disables the snprintf constant folding and
> range optimization for calls to it and other related bounded
> functions when the bound is not known not to exceed INT_MAX.
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> gcc-87096.diff
> 
> PR tree-optimization/87096 - Optimised snprintf is not POSIX conformant
> 
> gcc/ChangeLog:
> 
>   PR rtl-optimization/87096
>   * gimple-ssa-sprintf.c (sprintf_dom_walker::handle_gimple_call): Avoid
>   folding calls whose bound may exceed INT_MAX.  Diagnose bound ranges
>   that exceed the limit.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/87096
>   * gcc.dg/tree-ssa/builtin-snprintf-4.c: New test.
OK
jeff

Re: [PATCH, rs6000] Allow libitm to use HTM on newer hw and kernels

2018-12-13 Thread Peter Bergner

On 12/13/18 10:42 AM, Peter Bergner wrote:
> On 12/13/18 2:41 AM, Segher Boessenkool wrote:
>> Or like
>>
>>   unsigned long htm_flags = PPC_FEATURE2_HAS_HTM
>> #ifdef PPC_FEATURE2_HTM_NO_SUSPEND
>>  | PPC_FEATURE2_HTM_NO_SUSPEND
>> #endif
>>  | 0;
>>
>> Okay for trunk with either style.  Thanks!
> 
> Ok, I'll go with this and commit once I've done another round of
> testing.  I'm assuming your backport approval still stands for this
> code too.

Testing was clean everywhere.  Committed everywhere.  Thanks!

Peter

Re: [PATCH] match_asm_constraints: Use copy_rtx where needed (PR88001)

2018-12-13 Thread Jeff Law

On 12/12/18 1:22 PM, Segher Boessenkool wrote:
> The new insn here (temporarily) illegally shares RTL.  This fixes it.
> 
> Tested with an ARC cross, and regstrapped on powerpc64-linux {-m32,-m64}.
> Is this okay for trunk?
> 
> 
> Segher
> 
> 
> 2018-12-12  Segher Boessenkool  
> 
>   PR rtl-optimization/88001
>   * function.c (match_asm_constraints_1): Don't invalidly share RTL.
> 
> ---
>  gcc/function.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/function.c b/gcc/function.c
> index 69523c1..60e96f3 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -6529,7 +6529,7 @@ match_asm_constraints_1 (rtx_insn *insn, rtx *p_sets, 
> int noutputs)
>output_matched[match] = true;
>  
>start_sequence ();
> -  emit_move_insn (output, input);
> +  emit_move_insn (output, copy_rtx (input));
>insns = get_insns ();
>end_sequence ();
>emit_insn_before (insns, insn);
> 
Presumably INPUT  is already referenced by INSN, thus the potential
invalid sharing via the newly introduced move?

OK.

Jeff

Re: [PATCH AutoFDO]Skip generating histogram value for internal call

2018-12-13 Thread Jeff Law

On 12/12/18 8:50 PM, bin.cheng wrote:
> Hi,
> This patch skips generating histogram value for internal function call in 
> autofdo,
> otherwise it would trigger ICE with following patch re-enabling indirect call 
> value
> profile transformation.  I think this patch is actually needed for GCC-6 on 
> which
> indirect call value profile is not disabled.
> 
> Bootstrap and test on x86_64 along with following patches.  Is it OK?
> 
> Thanks,
> bin
> 
> 2018-12-13  Bin Cheng  
> 
> * auto-profile.c (afdo_indirect_call): Skip generating histogram
> value for internal call.
> 
OK
jeff

Re: RFA: PATCH to add pp command to gdbinit.in

2018-12-13 Thread Marek Polacek

On Thu, Dec 13, 2018 at 12:48:37PM -0500, Jason Merrill wrote:
> On Thu, Dec 13, 2018 at 12:26 PM Marek Polacek  wrote:
> > On Thu, Dec 13, 2018 at 10:15:07AM -0700, Jeff Law wrote:
> > > On 12/6/18 1:59 PM, Jason Merrill wrote:
> > > > Since pvt was removed, it's bugged me that to pretty-print a vec I
> > > > needed to write out "call debug($)".  So this patch adds a generic
> > > > command "pp" to print anything handled by a debug overload.
> > > >
> > > > OK for trunk?
> > > >
> > > Seems quite reasonable.  I didn't even know we had a debug method for 
> > > vecs.
> 
> > > I wonder if we should standardize this stuff so that every significant
> > > datastructure has a debug method.
> 
> Yes, I thought that was the idea.
> 
> > Yeah, I think go ahead.  Speaking of which, in the C++ FE, I used to could 
> > do
> > this:
> >
> > (gdb) call debug(parser)
> >
> > but that now prints some crap like [128], and so I have to type
> >
> > (gdb) call cp_debug_parser(stderr,parser)
> >
> > which bothers me very very much.  Anyone know what's up with that?  It used 
> > to
> > work.
> 
> Hmm, it works for me.

Aha -- I had to
(gdb) set overload-resolution on
which I had had set to off for some other reasons.

Marek

Re: RFA: PATCH to add pp command to gdbinit.in

2018-12-13 Thread Jason Merrill

On Thu, Dec 13, 2018 at 12:26 PM Marek Polacek  wrote:
> On Thu, Dec 13, 2018 at 10:15:07AM -0700, Jeff Law wrote:
> > On 12/6/18 1:59 PM, Jason Merrill wrote:
> > > Since pvt was removed, it's bugged me that to pretty-print a vec I
> > > needed to write out "call debug($)".  So this patch adds a generic
> > > command "pp" to print anything handled by a debug overload.
> > >
> > > OK for trunk?
> > >
> > Seems quite reasonable.  I didn't even know we had a debug method for vecs.

> > I wonder if we should standardize this stuff so that every significant
> > datastructure has a debug method.

Yes, I thought that was the idea.

> Yeah, I think go ahead.  Speaking of which, in the C++ FE, I used to could do
> this:
>
> (gdb) call debug(parser)
>
> but that now prints some crap like [128], and so I have to type
>
> (gdb) call cp_debug_parser(stderr,parser)
>
> which bothers me very very much.  Anyone know what's up with that?  It used to
> work.

Hmm, it works for me.

Jason

[PATCH] x86: Don't use get_frame_size to finalize stack frame

2018-12-13 Thread H.J. Lu

get_frame_size () returns used stack slots during compilation, which
may be optimized out later.  Since ix86_find_max_used_stack_alignment
is called by ix86_finalize_stack_frame_flags to check if stack frame
is required, there is no need to call get_frame_size () which may give
inaccurate final stack frame size.

Tested on AVX512 machine configured with

--with-arch=native --with-cpu=native

OK for trunk?


H.J.
---
gcc/

PR target/88483
* config/i386/i386.c (ix86_finalize_stack_frame_flags): Don't
use get_frame_size ().

gcc/testsuite/

PR target/88483
* gcc.target/i386/stackalign/pr88483.c: New test.
---
 gcc/config/i386/i386.c  |  1 -
 .../gcc.target/i386/stackalign/pr88483.c| 17 +
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/stackalign/pr88483.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index caa701fe242..edc8f4f092e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12876,7 +12876,6 @@ ix86_finalize_stack_frame_flags (void)
   && flag_exceptions
   && cfun->can_throw_non_call_exceptions)
   && !ix86_frame_pointer_required ()
-  && get_frame_size () == 0
   && ix86_nsaved_sseregs () == 0
   && ix86_varargs_gpr_size + ix86_varargs_fpr_size == 0)
 {
diff --git a/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c 
b/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c
new file mode 100644
index 000..5aec8fd4cf6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/stackalign/pr88483.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mavx2" } */
+
+struct B
+{
+  char a[12];
+  int b;
+};
+
+struct B
+f2 (void)
+{
+  struct B x = {};
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "and\[lq\]?\[^\\n\]*-\[0-9\]+,\[^\\n\]*sp" 
} } */
-- 
2.19.2

Re: RFA: PATCH to add pp command to gdbinit.in

2018-12-13 Thread Marek Polacek

On Thu, Dec 13, 2018 at 10:15:07AM -0700, Jeff Law wrote:
> On 12/6/18 1:59 PM, Jason Merrill wrote:
> > Since pvt was removed, it's bugged me that to pretty-print a vec I
> > needed to write out "call debug($)".  So this patch adds a generic
> > command "pp" to print anything handled by a debug overload.
> > 
> > OK for trunk?
> > 
> Seems quite reasonable.  I didn't even know we had a debug method for vecs.

Yeah, I think go ahead.  Speaking of which, in the C++ FE, I used to could do
this:

(gdb) call debug(parser)

but that now prints some crap like [128], and so I have to type

(gdb) call cp_debug_parser(stderr,parser)

which bothers me very very much.  Anyone know what's up with that?  It used to
work.

Marek

Re: RFA: PATCH to add pp command to gdbinit.in

2018-12-13 Thread Jeff Law

On 12/6/18 1:59 PM, Jason Merrill wrote:
> Since pvt was removed, it's bugged me that to pretty-print a vec I
> needed to write out "call debug($)".  So this patch adds a generic
> command "pp" to print anything handled by a debug overload.
> 
> OK for trunk?
> 
Seems quite reasonable.  I didn't even know we had a debug method for vecs.

I wonder if we should standardize this stuff so that every significant
datastructure has a debug method.

jeff

Re: [ping] Change static chain to r11 on aarch64

2018-12-13 Thread Uecker, Martin


Hi Wilco,

Am Donnerstag, den 13.12.2018, 16:33 + schrieb Wilco Dijkstra:
> Uecker, Martin wrote:
> > Am Mittwoch, den 12.12.2018, 22:04 + schrieb Wilco Dijkstra:
> > > Hi Martin,
> > > 
> > > > Does a non-executable stack actually improve security?
> > > 
> > > Absolutely, it's like closing your front door rather than just leave it 
> > > open
> > > for anyone.
> > 
> > The question is whether it is like closing the front door
> > while leaving a window open. It makes it harder to
> > exploit a system but does not really prevent it.
> 
> Security is never absolute, it's all about making it harder and more expensive
> for attackers so they go after other, easier targets.

One could also argue that it creates a false sense of security
and diverts resources from properly fixing the real problems
i.e. the buffer overflows which lets an attacker write to the
stack in the first place. A program without buffer overflows
is secure even without an executable stack and a program with
buffer overflows is still insecure even with a non-executable
stack.

> > It was implemented for Ada. But here is a patch to also
> > activate it for C:
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00853.html
> > 
> > With this patch one can use nested functions in C without
> > having an executable stack.
> 
> I tried your patch and it seems to inline the code to load the static chain 
> at every
> indirect callsite. For Ada I don't think that is ABI (IIRC no separate 
> compilation),
> but for C it would create a new ABI.

I am a bit surprised that the static chain register is not
always already a fixed part of the ABI.

> >   but it wouldn't affect the ABI since you can
> > only take the address of a nested function when you're
> > the parent function.
> > But you can pass the address to another function. Without
> > trampolines, this other function needs to call the nested
> > function directly using the right ABI.
> 
> Yes that was a really bad idea - function pointers with a
> descriptor should be explictly  typed to avoid the need to
> use special trampolines.

This is essentially what Apple's block extension does. The
downside is that you cannot pass such pointers to existing code. 

> If we didn't want to expose the static chain register as an ABI
> with -fno-trampolines, we could use a helper function which could
> be made backwards compatible even if one changes the static chain
> register (it just needs to set all of them!).

Yes, this is a possibility. But I think it could simply be
fixed as part of the ABI.

Best,
Martin

Re: [PATCH][RS6000] Fix PR87870: ppc64 generates poor code when loading constants into TImode vars

2018-12-13 Thread Peter Bergner

On 11/16/18 5:29 PM, Segher Boessenkool wrote:
> On Fri, Nov 16, 2018 at 04:26:18PM -0600, Peter Bergner wrote:
>> However, when I made the change below, the length attribute seems a
>> little off.  For *_64bit, we have a length of 4, but for *_32bit, we
>> have a length of 32.  The "4" looks correct for both *_64bit and *_32bit
>> if we're loading an easy_vector_constant into one of the vector regs.
>> For loading a TImode constant into a GPR, then it could be anything from
>> 8 bytes to 40bytes (loading 0xdeadbeefcafebabefacefeedbaad) for
>> -m64.  Since TImode isn't supposrted in -m32 (yet?), who knows, probably
>> it would be 16 bytes to ??? bytes.
>>
>> Should those sizes be updated too?  If so, what should they be?
>> The smallest, average or worst case lengths?  I assume we could use
>> another iterator to separate the vector lengths from the gpr lengths
>> if we need to.
> 
> Worst case.  This is required for correctness.

Ok, I looked into this and the point where we must have correct length info
is in final assembly generation, so very very late.  For the alternatives
I'm changing, we're loading into GPR regs and these alternatives are always
split (split2), so these length values are never used/seen at final assembly
time.

Given the above, I'm guessing we should probably go with the most common
length value (ie, 8 for 64-bit and 16 for 32-bit)?  The following patch
implements that.  Does this seem reasonable to you?

Peter

gcc/
PR target/87870
* config/rs6000/vsx.md (nW): New mode iterator.
(vsx_mov_64bit): Use it.  Remove redundant GPR 0/-1 alternative.
Update length attribute for (, )  alternative.
(vsx_mov_32bit): Likewise.

gcc/testsuite/
PR target/87870
* gcc.target/powerpc/pr87870.c: New test.


Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 265971)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -183,6 +183,18 @@ (define_mode_attr ??r  [(V16QI "??r")
 (TF"??r")
 (TI"r")])
 
+;; A mode attribute used for 128-bit constant values.
+(define_mode_attr nW   [(V16QI "W")
+(V8HI  "W")
+(V4SI  "W")
+(V4SF  "W")
+(V2DI  "W")
+(V2DF  "W")
+(V1TI  "W")
+(KF"W")
+(TF"W")
+(TI"n")])
+
 ;; Same size integer type for floating point data
 (define_mode_attr VSi [(V4SF  "v4si")
   (V2DF  "v2di")
@@ -1193,17 +1205,17 @@ (define_insn_and_split "*xxspltib_
 
 ;;  VSX store  VSX load   VSX move  VSX->GPR   GPR->VSXLQ (GPR)
 ;;  STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIBVSPLTISW
-;;  VSX 0/-1   GPR 0/-1   VMX const GPR const  LVX (VMX)   STVX 
(VMX)
+;;  VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
"=ZwO,  , , r, we,?wQ,
 ?,   ??r,   ??Y,   , wo,v,
-?,*r,v, ??r,   wZ,v")
+?,v, , wZ,v")
 
(match_operand:VSX_M 1 "input_operand" 
", ZwO,   , we,r, r,
 wQ,Y, r, r, wE,jwM,
-?jwM,  jwM,   W, W, v, wZ"))]
+?jwM,  W, ,  v, wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)
&& (register_operand (operands[0], mode) 
@@ -1214,25 +1226,25 @@ (define_insn "vsx_mov_64bit"
   [(set_attr "type"
"vecstore,  vecload,   vecsimple, mffgpr,mftgpr,load,
 store, load,  store, *, vecsimple, 
vecsimple,
-vecsimple, *, *, *, vecstore,  
vecload")
+vecsimple, *, *, vecstore,  vecload")
 
(set_attr "length"
"4, 4, 4, 8, 4, 8,
 8, 8, 8, 8, 4, 4,
-4, 8, 20,20,4, 4")])
+4, 20,8, 4, 4")])
 
 ;;  VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;  XXSPLTIB   VSPLTISW   VSX 0/-1   GPR 0/-1   VMX const  GPR 
const
+;;  XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;  LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
"=ZwO,  , , ??r,   ??Y,   ,
-

Re: [PATCH] Delete powerpcspe

2018-12-13 Thread Jeff Law

On 12/12/18 10:33 AM, Segher Boessenkool wrote:
> On Wed, Dec 12, 2018 at 11:36:29AM +0100, Richard Biener wrote:
>> On Tue, Dec 11, 2018 at 2:37 PM Jeff Law  wrote:
>>> One way to deal with these problems is to create a fake simulator that
>>> always returns success.  That's what my tester does for the embedded
>>> targets.  That allows us to do reliable compile-time tests as well as
>>> the various scan-whatever tests.
>>>
>>> It would be trivial to start sending those results to gcc-testresults.
>>
>> I think it would be more useful if the execute testing would be
>> reported as UNSUPPORTED rather than simply PASS w/o being
>> sure it does.
> 
> Yes.
Yes, but I don't think we've got a reasonable way to do that in the
existing dejagnu framework.


> 
>> But while posting to gcc-testresults is a sign of testing tracking
>> regressions (and progressions!) in bugzilla and caring for those
>> bugs is far more important...
> 
> If results are posted to gcc-testresults then other people can get a
> feel whether the port is detoriating, and at what rate.  If no results
> are posted we just have to assume the worst.  Most people do not have
> the time (or setup) to test it for themselves.
Yup.  I wish I had the time to extract more of the data the tester is
gathering and produce this kind of info.

I have not made it a priority to try and address all the issues I've
seen in the tester.  We have some ports that are incredibly flaky
(epiphany for example), and many that have a lot of failures, but are
stable in their set of failures.

My goal to date has mostly been to identify regressions.  I'm not even
able to keep up with that.  For example s390/s390x have been failing for
about a week with their kernel builds.sparc, i686, aarch64 are
consistently tripping over regressions.  ia64 hasn't worked since we put
in qsort consistency checking, etc etc.

Jeff

Re: [PATCH, rs6000] Allow libitm to use HTM on newer hw and kernels

2018-12-13 Thread Peter Bergner

On 12/13/18 2:41 AM, Segher Boessenkool wrote:
> Or like
> 
>   unsigned long htm_flags = PPC_FEATURE2_HAS_HTM
> #ifdef PPC_FEATURE2_HTM_NO_SUSPEND
>   | PPC_FEATURE2_HTM_NO_SUSPEND
> #endif
>   | 0;
> 
> Okay for trunk with either style.  Thanks!

Ok, I'll go with this and commit once I've done another round of
testing.  I'm assuming your backport approval still stands for this
code too.

Peter

Re: [ping] Change static chain to r11 on aarch64

2018-12-13 Thread Wilco Dijkstra

Hi Martin,

Uecker, Martin wrote:
>Am Mittwoch, den 12.12.2018, 22:04 + schrieb Wilco Dijkstra:
>> Hi Martin,
>> 
>> > Does a non-executable stack actually improve security?
>> 
>> Absolutely, it's like closing your front door rather than just leave it open
>> for anyone.
>
> The question is whether it is like closing the front door
> while leaving a window open. It makes it harder to
> exploit a system but does not really prevent it.

Security is never absolute, it's all about making it harder and more expensive
for attackers so they go after other, easier targets.

> It was implemented for Ada. But here is a patch to also
> activate it for C:
>
> https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00853.html
>
> With this patch one can use nested functions in C without
> having an executable stack.

I tried your patch and it seems to inline the code to load the static chain at 
every
indirect callsite. For Ada I don't think that is ABI (IIRC no separate 
compilation),
but for C it would create a new ABI.

>  but it wouldn't affect the ABI since you can
> only take the address of a nested function when you're
> the parent function.

> But you can pass the address to another function. Without
> trampolines, this other function needs to call the nested
> function directly using the right ABI.

Yes that was a really bad idea - function pointers with a descriptor should be 
explictly 
typed to avoid the need to use special trampolines.

If we didn't want to expose the static chain register as an ABI with 
-fno-trampolines,
we could use a helper function which could be made backwards compatible even
if one changes the static chain register (it just needs to set all of them!).

Wilco

Re: [PATCH] Support AVX512F masked gather loads (PR tree-optimization/88464)

2018-12-13 Thread Jakub Jelinek

On Thu, Dec 13, 2018 at 05:09:17PM +0100, Richard Biener wrote:
> Is there any chance you could look at replacing the i386 code to work with
> the new IFN style used by Aarch64?

Maybe for GCC 10, I'm afraid it is a lot of work mainly due to the weirdnesses 
of
x86 mixed index vs. data sized types, that would be non-fun to represent in
the optabs.

Jakub

Re: [PATCH] Support AVX512F masked gather loads (PR tree-optimization/88464)

2018-12-13 Thread Richard Biener

On December 12, 2018 11:43:10 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>We support either the AVX2 gather loads (128-bit or 256-bit, using
>masks in
>vector registers), both conditional and unconditional, but AVX512F
>gather
>loads only unconditional.  The problem was that tree-vect-stmts.c
>didn't
>have code to deal with the integral masktype argument the builtins
>have,
>where we need to compute it as vector bool first and then VCE.
>
>Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
>for
>trunk?

OK. 

Is there any chance you could look at replacing the i386 code to work with the 
new IFN style used by Aarch64? 

>2018-12-12  Jakub Jelinek  
>
>   PR tree-optimization/88464
>   * tree-vect-stmts.c (vect_build_gather_load_calls): Handle
>INTEGER_TYPE
>   masktype if mask is non-NULL.
>   (vectorizable_load): Don't reject masked gather loads if masktype
>   in the decl is INTEGER_TYPE.
>
>   * gcc.target/i386/avx512f-pr88462-1.c: New test.
>   * gcc.target/i386/avx512f-pr88462-2.c: New test.
>
>--- gcc/tree-vect-stmts.c.jj   2018-11-20 21:39:06.983478940 +0100
>+++ gcc/tree-vect-stmts.c  2018-12-12 20:05:42.980019685 +0100
>@@ -2647,8 +2647,13 @@ vect_build_gather_load_calls (stmt_vec_i
>   tree idxtype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
>  tree masktype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
>   tree scaletype = TREE_VALUE (arglist);
>+  tree real_masktype = masktype;
>   gcc_checking_assert (types_compatible_p (srctype, rettype)
>- && (!mask || types_compatible_p (srctype, masktype)));
>+ && (!mask
>+ || TREE_CODE (masktype) == INTEGER_TYPE
>+ || types_compatible_p (srctype, masktype)));
>+  if (mask && TREE_CODE (masktype) == INTEGER_TYPE)
>+masktype = build_same_sized_truth_vector_type (srctype);
> 
>   tree perm_mask = NULL_TREE;
>   tree mask_perm_mask = NULL_TREE;
>@@ -2763,9 +2768,9 @@ vect_build_gather_load_calls (stmt_vec_i
> mask_op = vec_mask;
> if (!useless_type_conversion_p (masktype, TREE_TYPE (vec_mask)))
>   {
>-gcc_assert
>-  (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask_op)),
>- TYPE_VECTOR_SUBPARTS (masktype)));
>+poly_uint64 sub1 = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask_op));
>+poly_uint64 sub2 = TYPE_VECTOR_SUBPARTS (masktype);
>+gcc_assert (known_eq (sub1, sub2));
> var = vect_get_new_ssa_name (masktype, vect_simple_var);
> mask_op = build1 (VIEW_CONVERT_EXPR, masktype, mask_op);
> gassign *new_stmt
>@@ -2777,8 +2782,33 @@ vect_build_gather_load_calls (stmt_vec_i
> src_op = mask_op;
>   }
> 
>+  tree mask_arg = mask_op;
>+  if (masktype != real_masktype)
>+  {
>+tree utype;
>+if (TYPE_MODE (real_masktype) == TYPE_MODE (masktype))
>+  utype = real_masktype;
>+else
>+  utype = lang_hooks.types.type_for_mode (TYPE_MODE (masktype), 1);
>+var = vect_get_new_ssa_name (utype, vect_scalar_var);
>+mask_arg = build1 (VIEW_CONVERT_EXPR, utype, mask_op);
>+gassign *new_stmt
>+  = gimple_build_assign (var, VIEW_CONVERT_EXPR, mask_arg);
>+vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
>+mask_arg = var;
>+if (!useless_type_conversion_p (real_masktype, utype))
>+  {
>+gcc_assert (TYPE_PRECISION (utype)
>+<= TYPE_PRECISION (real_masktype));
>+var = vect_get_new_ssa_name (real_masktype, vect_scalar_var);
>+new_stmt = gimple_build_assign (var, NOP_EXPR, utype);
>+vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
>+mask_arg = var;
>+  }
>+src_op = build_zero_cst (srctype);
>+  }
>gcall *new_call = gimple_build_call (gs_info->decl, 5, src_op, ptr, op,
>- mask_op, scale);
>+ mask_arg, scale);
> 
>   stmt_vec_info new_stmt_info;
>   if (!useless_type_conversion_p (vectype, rettype))
>@@ -7555,20 +7585,6 @@ vectorizable_load (stmt_vec_info stmt_in
>TYPE_MODE (mask_vectype), true))
>   return false;
>   }
>-  else if (memory_access_type == VMAT_GATHER_SCATTER &&
>gs_info.decl)
>-  {
>-tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gs_info.decl));
>-tree masktype
>-  = TREE_VALUE (TREE_CHAIN (TREE_CHAIN (TREE_CHAIN (arglist;
>-if (TREE_CODE (masktype) == INTEGER_TYPE)
>-  {
>-if (dump_enabled_p ())
>-  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>-   "masked gather with integer mask not"
>-   " supported.");
>-

Re: Add a loop versioning pass

2018-12-13 Thread Richard Biener

On December 12, 2018 7:43:10 PM GMT+01:00, Richard Sandiford 
 wrote:
>Richard Biener  writes:
>> On Thu, Dec 6, 2018 at 2:19 PM Richard Sandiford
>>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and aarch64_be-elf.
>>> Also repeated the performance testing (but haven't yet tried an
>>> LTO variant; will do that over the weekend).
>>
>> Any results?
>
>Sorry, I should've remembered that finding time to run tests is easy,
>finding time to analyse them is hard.
>
>Speed-wise, the impact of the patch for LTO is similar to without,
>with 554.roms_r being the main beneficiary for both AArch64 and x86_64.
>I get a 6.8% improvement on Cortex-A57 with -Ofast -mcpu=native
>-flto=jobserver.
>
>Size-wise, there are three tests that grow by >=2% on x86_64:
>
>549.fotonik3d_r: 5.5%
>548.exchange2_r: 29.5%
>554.roms_r: 39.6%

Uh. With LTO we might have a reasonable guessed profile and you do have a 
optimize_loop_nest_for_speed guard on the transform? 

How does compile time fare with the above benchmarks?

>The roms increase seems fair enough given the benefit, since the
>whole program uses a similar style for handling arrays.
>
>fotonik3d is a mixed bag.  Some versioning conditions are from
>things like:
>
>  subroutine foo(a)
>real :: a(:,:,:)
>a(1,:,:) = ...
>  end subroutine
>
>where we version for the middle dimension having a stride of 1.
>This could be eliminated if we had more semantic information.
>
>Other conditions are for things like:
>
>  subroutine foo(a)
>real :: a(:,:,:)
>a(:,1,:) = ...
>  end subroutine
>
>where the pass really does help, but unfortunately those loops
>aren't hot and might not even be used at all.
>
>Some opportunities are cases in which we avoid gather loads, such as
>in the "mp" loads in the hot __material_mod_MOD_mat_updatee function.
>But mp feeds a gather load itself and these assignments aren't a
>critical part of the loop.
>
>(I'm testing on an AVX-only system, so these are synthesised gathers
>rather than real gathers.)
>
>The growth in 548.exchange2_r comes from reasonable changes to cold
>code.
>The test spends almost all its time in __brute_force_MOD_digits_2,
>which
>contains the same code before and after the patch, but which accounts
>for less than 1% of .text before the patch.
>
>> I've skimmed over the updated patch and it looks
>> a lot better now.
>>
>> +bool
>> +loop_versioning
>> +::find_per_loop_multiplication (address_info ,
>address_term_info )
>> +{
>>
>> is that what coding convention allows?
>
>Not sure we've settled on something, so I improvised.
>
>> For grepping I'd then say we should do
>>
>> bool loop_versioning::
>> find_per_loop_multiplication (...)
>>
>> ;)
>
>Sounds good to me.
>
>> Anywhere else we you use
>>
>> loop_versioning::foo
>>
>> so please stick to that.
>
>Yeah, I used that when I could avoid an argument spilling over 80
>chars,
>but I think I missed that the above function now fits into that
>category,
>Will double-check the others.
>
>> Otherwise OK.
>
>Thanks :-)
>
>> I think I don't see a testcase where we could version both loops in a
>nest
>> so I'm not sure whether the transform works fine when you are only
>> updating SSA form at the very end of the pass.
>
>You mean something like:
>
>  real :: foo(:,:), bar(:)
>
>  do i=1,n
>do j=1,n
>  foo(i,j) = ...
>end do
>bar(i) = ..
>  end do
>
>?  I can add a test if so.

Please. 

>At the moment the pass only looks for direct versioning opportunities
>in inner loops, so the assignment to bar wouldn't be treated as a
>versioning opportunity.  

Ah, OK. 

We should still hoist the version check
>for foo outside both loops, which is tested by loop_versioning_4.f90
>for cases in which the outer loop doesn't have its own array
>arithmetic, but isn't tested for cases like the above.
>
>> There may also be some subtle issues with substitute_and_fold being
>> applied to non-up-to-date SSA form given it folds stmts looking at
>> (single-use!) SSA edges.  The single-use guard might be what saves
>you
>> here (SSA uses in the copies are not yet updated to point to the
>> copied DEFs).
>
>OK.  I was hoping that because we only apply substitute_and_fold
>to new code, there would be no problem with uses elsewhere.

Might be, yes.

>Would it be safer to:
>
>  - version all loops we want to version
>  - update SSA explicitly
>  - apply substitute and fold to all "new" loops

That would be definitely less fishy. But you need to get at the actual 'new' 
SSA names for the replacements as I guess they'll be rewritten? Or maybe those 
are not. 

>?  Could we then get away with returning a 0 TODO at the end?

Yes. 

Richard. 

>Thanks,
>Richard

Re: [committed] Clean up Fortran OpenACC wait clause handling

2018-12-13 Thread Thomas Schwinge

Hi Julian!

On Mon, 3 Dec 2018 16:02:23 +, Julian Brown  wrote:
> On Fri, 30 Nov 2018 21:48:20 +0100
> Thomas Schwinge  wrote:
> > commit 3e3de40a5ab21d72f08071a7a40120dd05608cc1
> > Author: tschwinge 
> > Date:   Fri Nov 30 20:39:18 2018 +
> > 
> > Clean up Fortran OpenACC wait clause handling
> > 
> > "wait" can be deduced from "wait_list".
> > 
> > gcc/fortran/
> > * gfortran.h (struct gfc_omp_clauses): Remove "wait".
> > Adjust all users.
> 
> This appears to conflict with Chung-Lin's uncommitted patch ("Properly
> handle wait clause with no arguments"):
> 
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01973.html
> 
> I'm not sure if such waits have a "wait_list" or not -- I guess not
> though?

Right; that's what I provided an incremental patch for in my review in
, and asked
Chung-Lin (and he agreed) to incorporate.


Grüße
 Thomas

Re: [patch] various OpenACC reduction enhancements - ME and nvptx changes

2018-12-13 Thread Julian Brown

On Tue, 4 Dec 2018 16:55:04 +0100
Tom de Vries  wrote:

> On 04-12-18 13:29, Jakub Jelinek wrote:
> > On Fri, Jun 29, 2018 at 11:19:53AM -0700, Cesar Philippidis wrote:  
> >> The attached patch includes the nvptx and GCC ME reductions
> >> enhancements.
> >>
> >> Is this patch OK for trunk? It bootstrapped / regression tested
> >> cleanly for x86_64 with nvptx offloading.  
> > This is all OpenACC specific code not really shareable with OpenMP,
> > if Thomas (for middle-end) and Tom (for NVPTX backend) are ok with
> > it, it is ok for trunk.
> >   
> 
> Formatting needs to be fixed:
> ...
> There should be exactly one space between function name and
> parenthesis. 160:+  unsigned old_shift = DIM_SIZE(VECTOR);
> ...
> 
> Also, the updated patch does not address my comment about
> probabilities here
> ( https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00325.html ): ...
> > +  /* Create the loop.  */
> > +  post_edge->flags ^= EDGE_TRUE_VALUE | EDGE_FALLTHRU;  
> 
> Edges need probabilities, as in nvptx_lockless_update,
> nvptx_lockfull_update and nvptx_goacc_reduction_init.
> ...

Something like the attached?

Tested alongside other revised patches in the series:

https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00930.html
https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00931.html

(except the lines adding edge probabilities, which I've
smoke-tested but haven't yet gone through a full test cycle).

Thanks,

Julian

ChangeLog

gcc/
* config/nvptx/nvptx.c (nvptx_propagate_unified): New.
(nvptx_split_blocks): Call it for cond_uni insn.
(nvptx_expand_cond_uni): New.
(enum nvptx_builtins): Add NVPTX_BUILTIN_COND_UNI.
(nvptx_init_builtins): Initialize it.
(nvptx_expand_builtin):
(nvptx_generate_vector_shuffle): Change integral SHIFT operand to
tree BITS operand.
(nvptx_vector_reduction): New.
(nvptx_adjust_reduction_type): New.
(nvptx_goacc_reduction_setup): Use it to adjust the type of ref_to_res.
(nvptx_goacc_reduction_init): Don't update LHS if it doesn't exist.
(nvptx_goacc_reduction_fini): Call nvptx_vector_reduction for vector.
Use it to adjust the type of ref_to_res.
(nvptx_goacc_reduction_teardown):
* config/nvptx/nvptx.md (cond_uni): New pattern.

commit 401876d422c4fa7f02c1b899e81568eea6ad7531
Author: Julian Brown 
Date:   Tue Dec 11 13:35:52 2018 -0800

Various OpenACC reduction enhancements - ME and nvptx changes

	gcc/
	* config/nvptx/nvptx.c (nvptx_propagate_unified): New.
	(nvptx_split_blocks): Call it for cond_uni insn.
	(nvptx_expand_cond_uni): New.
	(enum nvptx_builtins): Add NVPTX_BUILTIN_COND_UNI.
	(nvptx_init_builtins): Initialize it.
	(nvptx_expand_builtin):
	(nvptx_generate_vector_shuffle): Change integral SHIFT operand to
	tree BITS operand.
	(nvptx_vector_reduction): New.
	(nvptx_adjust_reduction_type): New.
	(nvptx_goacc_reduction_setup): Use it to adjust the type of ref_to_res.
	(nvptx_goacc_reduction_init): Don't update LHS if it doesn't exist.
	(nvptx_goacc_reduction_fini): Call nvptx_vector_reduction for vector.
	Use it to adjust the type of ref_to_res.
	(nvptx_goacc_reduction_teardown):
	* config/nvptx/nvptx.md (cond_uni): New pattern.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 9903a27..0023dad 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2863,6 +2863,52 @@ nvptx_reorg_uniform_simt ()
 }
 }
 
+/* UNIFIED is a cond_uni insn.  Find the branch insn it affects, and
+   mark that as unified.  We expect to be in a single block.  */
+
+static void
+nvptx_propagate_unified (rtx_insn *unified)
+{
+  rtx_insn *probe = unified;
+  rtx cond_reg = SET_DEST (PATTERN (unified));
+  rtx pat = NULL_RTX;
+
+  /* Find the comparison.  (We could skip this and simply scan to he
+ blocks' terminating branch, if we didn't care for self
+ checking.)  */
+  for (;;)
+{
+  probe = next_real_insn (probe);
+  if (!probe)
+	break;
+  pat = PATTERN (probe);
+
+  if (GET_CODE (pat) == SET
+	  && GET_RTX_CLASS (GET_CODE (SET_SRC (pat))) == RTX_COMPARE
+	  && XEXP (SET_SRC (pat), 0) == cond_reg)
+	break;
+  gcc_assert (NONJUMP_INSN_P (probe));
+}
+  gcc_assert (pat);
+  rtx pred_reg = SET_DEST (pat);
+
+  /* Find the branch.  */
+  do
+probe = NEXT_INSN (probe);
+  while (!JUMP_P (probe));
+
+  pat = PATTERN (probe);
+  rtx itec = XEXP (SET_SRC (pat), 0);
+  gcc_assert (XEXP (itec, 0) == pred_reg);
+
+  /* Mark the branch's condition as unified.  */
+  rtx unspec = gen_rtx_UNSPEC (BImode, gen_rtvec (1, pred_reg),
+			   UNSPEC_BR_UNIFIED);
+  bool ok = validate_change (probe,  (itec, 0), unspec, false);
+
+  gcc_assert (ok);
+}
+
 /* Loop structure of the function.  The entire function is described as
a NULL loop.  */
 
@@ -2964,6 +3010,9 @@ nvptx_split_blocks (bb_insn_map_t *map)
 	continue;

Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-13 Thread Thomas Schwinge

Hi!

On Thu, 13 Dec 2018 23:28:49 +0800, Chung-Lin Tang  
wrote:
> On 2018/12/7 6:26 AM, Julian Brown wrote:
> > On Thu, 6 Dec 2018 22:22:46 +
> > Julian Brown  wrote:
> > 
> >> On Thu, 6 Dec 2018 21:42:14 +0100
> >> Thomas Schwinge  wrote:
> >>
> >>> [...]
> >>> ..., where the "Invalid read of size 8" happens, and which
> >>> eventually would try to "free (tgt)" again, via
> >>> libgomp/target.c:gomp_unmap_tgt:
> >>>
> >>>  attribute_hidden void
> >>>  gomp_unmap_tgt (struct target_mem_desc *tgt)
> >>>  {
> >>>/* Deallocate on target the tgt->tgt_start .. tgt->tgt_end
> >>> region.  */ if (tgt->tgt_end)
> >>>  gomp_free_device_memory (tgt->device_descr, tgt->to_free);
> >>>  
> >>>free (tgt->array);
> >>>free (tgt);
> >>>  }
> >>>
> >>> Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong,
> >>> or something else?
> >>
> >> It might be worth trying this with the refcounting changes in the
> >> attach/detach patch.

Well, which exactly?

> > ...oh, also make sure you have this patch in the series you're testing
> > with:
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01973.html
> > 
> > else your "wait" will be ignored, IIUC.

Thanks, and right, and yes, I got that one included.

> just first asking if you tried Julian's patch during this time, and if so did 
> it do anything different?

I did not test with all the attach/detach patches on top of this one
here.  That's too many changes at once.


Grüße
 Thomas

Re: [PATCH, OpenACC] Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

2018-12-13 Thread Julian Brown

On Fri, 7 Dec 2018 15:05:46 +0100
Jakub Jelinek  wrote:

> On Thu, Dec 06, 2018 at 10:40:41PM +, Julian Brown wrote:
> > +   && (TREE_CODE (inner_type) == REAL_TYPE
> > +   || (!omp_is_reference (var)
> > +   && INTEGRAL_TYPE_P (inner_type))
> > +   || TREE_CODE (inner_type) == INTEGER_TYPE)  
> 
> Not sure I understand the above.  INTEGRAL_TYPE_P is INTEGER_TYPE,
> BOOLEAN_TYPE and ENUMERAL_TYPE, so you want to handle INTEGER_TYPE
> no magger whether var should be passed by reference or not, but
> BOOLEAN_TYPE or ENUMERAL_TYPE only if it is not a reference?
> That is just weird.  Any test to back that up?

I couldn't figure out any reason for the test being written like that
-- specifically, what it was meant to exclude -- but the attached
simplifies it to ANY_INTEGRAL_TYPE_P or FLOAT_TYPE_P, and that seems to
work fine.

> > +   if ((TREE_CODE (inner_type) == REAL_TYPE
> > +|| (!omp_is_reference (var)
> > +&& INTEGRAL_TYPE_P (inner_type))
> > +|| TREE_CODE (inner_type) ==
> > INTEGER_TYPE)  
> 
> Ditto here.

Likewise. Re-tested with offloading to NVPTX. OK?

Thanks for review,

Julian

ChangeLog

gcc/
* omp-low.c (maybe_lookup_field_in_outer_ctx): New function.
(convert_to_firstprivate_int): New function.
(convert_from_firstprivate_int): New function.
(lower_omp_target): Enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC.

libgomp/
* oacc-parallel.c (GOACC_parallel_keyed): Handle
GOMP_MAP_FIRSTPRIVATE_INT host addresses.
* plugin/plugin-nvptx.c (nvptx_exec): Handle GOMP_MAP_FIRSTPRIVATE_INT
host addresses.
* testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
* testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c: New test.
* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: New test.
commit 15114e33ecb6cb687dbdfb30d69d7dcbeeb87fca
Author: Julian Brown 
Date:   Thu Dec 6 04:38:59 2018 -0800

Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

	gcc/
	* omp-low.c (maybe_lookup_field_in_outer_ctx): New function.
	(convert_to_firstprivate_int): New function.
	(convert_from_firstprivate_int): New function.
	(lower_omp_target): Enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC.

	libgomp/
	* oacc-parallel.c (GOACC_parallel_keyed): Handle
	GOMP_MAP_FIRSTPRIVATE_INT host addresses.
	* plugin/plugin-nvptx.c (nvptx_exec): Handle GOMP_MAP_FIRSTPRIVATE_INT
	host addresses.
	* testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c: New test.
	* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: New test.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index b406ce7..adc686c 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3497,6 +3497,19 @@ maybe_lookup_decl_in_outer_ctx (tree decl, omp_context *ctx)
   return t ? t : decl;
 }
 
+/* Returns true if DECL is present inside a field that encloses CTX.  */
+
+static bool
+maybe_lookup_field_in_outer_ctx (tree decl, omp_context *ctx)
+{
+  omp_context *up;
+
+  for (up = ctx->outer; up; up = up->outer)
+if (maybe_lookup_field (decl, up))
+  return true;
+
+  return false;
+}
 
 /* Construct the initialization value for reduction operation OP.  */
 
@@ -9052,6 +9065,87 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 }
 }
 
+/* Helper function for lower_omp_target.  Converts VAR to something that can
+   be represented by a POINTER_SIZED_INT_NODE.  Any new instructions are
+   appended to GS.  This is used to optimize firstprivate variables, so that
+   small types (less precision than POINTER_SIZE) do not require additional
+   data mappings.  */
+
+static tree
+convert_to_firstprivate_int (tree var, gimple_seq *gs)
+{
+  tree type = TREE_TYPE (var), new_type = NULL_TREE;
+  tree tmp = NULL_TREE;
+
+  if (omp_is_reference (var))
+type = TREE_TYPE (type);
+
+  if (INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
+{
+  if (omp_is_reference (var))
+	{
+	  tmp = create_tmp_var (type);
+	  gimplify_assign (tmp, build_simple_mem_ref (var), gs);
+	  var = tmp;
+	}
+
+  return fold_convert (pointer_sized_int_node, var);
+}
+
+  gcc_assert (tree_to_uhwi (TYPE_SIZE (type)) <= POINTER_SIZE);
+
+  new_type = lang_hooks.types.type_for_size (tree_to_uhwi (TYPE_SIZE (type)),
+	 true);
+
+  if (omp_is_reference (var))
+{
+  tmp = create_tmp_var (type);
+  gimplify_assign (tmp, build_simple_mem_ref (var), gs);
+  var = tmp;
+}
+
+  tmp = create_tmp_var (new_type);
+  var = fold_build1 (VIEW_CONVERT_EXPR, new_type, var);
+  gimplify_assign (tmp, var, gs);
+
+  return fold_convert (pointer_sized_int_node, tmp);
+}
+
+/* Like convert_to_firstprivate_int, but restore the original type.  */
+
+static tree
+convert_from_firstprivate_int (tree var, bool is_ref, gimple_seq

Re: [PATCH 0/6, OpenACC, libgomp] Async re-work

2018-12-13 Thread Chung-Lin Tang


On 2018/12/7 6:26 AM, Julian Brown wrote:

On Thu, 6 Dec 2018 22:22:46 +
Julian Brown  wrote:


On Thu, 6 Dec 2018 21:42:14 +0100
Thomas Schwinge  wrote:


[...]
..., where the "Invalid read of size 8" happens, and which
eventually would try to "free (tgt)" again, via
libgomp/target.c:gomp_unmap_tgt:

 attribute_hidden void
 gomp_unmap_tgt (struct target_mem_desc *tgt)
 {
   /* Deallocate on target the tgt->tgt_start .. tgt->tgt_end
region.  */ if (tgt->tgt_end)
 gomp_free_device_memory (tgt->device_descr, tgt->to_free);
 
   free (tgt->array);

   free (tgt);
 }

Is the "free (tgt)" in libgomp/target.c:gomp_unmap_vars_async wrong,
or something else?


It might be worth trying this with the refcounting changes in the
attach/detach patch.


...oh, also make sure you have this patch in the series you're testing
with:

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01973.html

else your "wait" will be ignored, IIUC.

Julian


Hi Thomas,
just first asking if you tried Julian's patch during this time, and if so did 
it do anything different?
(and apologies for missing responding this part for so long :P )

Chung-Lin

Re: [RS6000] Don't pass -many to the assembler

2018-12-13 Thread David Edelsohn

On Thu, Dec 13, 2018 at 5:26 AM Alan Modra  wrote:
>
> On Wed, Nov 14, 2018 at 01:43:57PM +1030, Alan Modra wrote:
> > On Tue, Nov 13, 2018 at 05:17:41AM -0600, Segher Boessenkool wrote:
> > > On Tue, Nov 13, 2018 at 12:02:55PM +1030, Alan Modra wrote:
> > > > OK, fair enough.  Another option is to just disable -many when gcc is
> > > > in development, like we enable checking.
> > >
> > > That is a good plan for GCC 9 at least.
> >
> > Here's the patch.  Bootstrapped etc. powerpc64le-linux with resultant
> > fail of clone2 test as already noted.
>
> Revised again, with a bunch of related issues solved.  Bootstrapped
> etc. powerpc64le-linux with no regressions.  OK to apply mainline?
>
> ---
> I'd like to remove -many from the options passed by default to the
> assembler, on the grounds that a gcc bug in instruction selection (eg.
> emitting a power9 insn for -mcpu=power8) is better found at assembly
> time than run time.
>
> For now, just do this when --enable-checking or gcc is not a release.
>
> In contrast to the previous patch, I haven't changed any of the AIX
> header files in this patch.  So AIX gcc will continue to pass -many to
> their assembler until someone else (David?) makes that change.  This
> patch also emits .machine assembler directives for ELF targets when
> functions are compiled for different cpus via attributes or pragmas.
> That's necessary when the initial -m option passed to the
> assembler doesn't enable the superset of all opcodes emitted, as seen
> by the earlier failure of gcc.target/powerpc/clone2.c (without
> .machine) when building gcc for power8.

The AIX release schedule and numbering of releases to support new
processors is making it impossible to reliably predict which
directives (processor names) will be supported on a system on which
GCC is installed.  It's safer to hide the problem on AIX with the use
of -many.

Thanks, David

Re: [testsuite] Include gdc.test prefix in test names (PR testsuite/88041)

2018-12-13 Thread Rainer Orth

Hi Iain,

>> I'm omitting the changes (also mentioned in the PR) to
>> compilable/ddoc9676a.d and compilable/depsOutput9948.d which contain
>> absolute path names in EXTRA_SOURCES because those need to go upstream
>> first.
>>
>
> This has been committed upstream
> (https://github.com/dlang/dmd/pull/9051) - I just need to trickle it
> down, among other patches that are in my queue.

great, thanks.

> I have no problem however if you go ahead commit the changes with this patch.

Good: that's what I did.

>> Bootstrapped without regressions on i386-pc-solaris2.11 and
>> x86_64-pc-linux-gnu.  Ok for mainline?
>
> OK on my side, just one mental note.
>
>> @@ -360,6 +361,9 @@ proc gdc-do-test { } {
>>  # Initialize `dg'.
>>  dg-init
>>
>> +# Create gdc.test link so test names include that subdir.
>> +catch { file link $subdir . }
>> +
>
> I guess the proper fix would be to split this into three files, right?
>
>   - compilable/compilable.exp
>   - fail_compilation/fail_compilation.exp
>   - runnable/runnable.exp
>
> Then move the common bits into lib/gdc-dmd.exp

While that's possible, we usually only have per-subdir .exp files when
e.g. there are massive/systematic differences in the set of compilation
flags used.  In the case of those three subdirs, you could indeed split
into per-subdir files and a common one, but it's also fine as is.

The $subdir link above (where subdir is gdc.test, the directory below
.../testsuite) only achieves that both (say)
gdc.test/compilable/99bottles.d as passed to dg-runtest and
imports.jsonimport1 (without the prefixe and referenced in the sources
and -I flags) work just the same.  So this will remain necessary even if
you decide to split gdc-test.exp into three .exp files.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [ping] Change static chain to r11 on aarch64

2018-12-13 Thread Segher Boessenkool

On Wed, Dec 12, 2018 at 10:04:11PM +, Wilco Dijkstra wrote:
> Hi Martin,
> 
> > Does a non-executable stack actually improve security?
> 
> Absolutely, it's like closing your front door rather than just leave it open
> for anyone.

On many Linux systems, if you use trampolines anywhere then the whole
stack will be mapped executable during the whole lifetime of the process.
*That* is not so good, of course.  But there is nothing wrong with having
some executable code on the stack, in principle.

> > For the alternative implementation using (custom) function
> > descriptors (-fno-trampolines) the static chain becomes
> > part of the ABI or not?
> 
> I've not seen such an alternative implementation (-fno-trampolines is
> ignored on all targets I tried), but it wouldn't affect the ABI since you can
> only take the address of a nested function when you're the parent function.

Also if you are in a (later) sibling:

===
void *p;
void f(void)
{
void g(void)
{
}

void h(void)
{
p = g;
}

h();
}
===


Segher

Re: [PATCH, OpenACC, 7/8] Multi-dimensional dynamic array support for OpenACC data clauses, libgomp support

2018-12-13 Thread Chung-Lin Tang


On 2018/12/6 10:43 PM, Jakub Jelinek wrote:

On Thu, Dec 06, 2018 at 10:19:43PM +0800, Chung-Lin Tang wrote:

Why do you call the non-contiguous arrays dynamic arrays?  Is that some OpenACC 
term?
I'd also prefix those with gomp_ and it is important to make it clear what
is the ABI type shared with the compiler and what are the internal types.
struct gomp_array_descr would look more natural to me.


Well it's not particularly an OpenACC term, just that non-contiguous arrays are
often multi-dimensional arrays dynamically allocated and created through 
(arrays of) pointers.
Are you strongly opposed to this naming? If so, I can adjust this part.


The way how those arrays are created (and it doesn't have to be dynamically
allocated) doesn't affect their representation.
There are various terms that describe various data structures, like Iliffe
vectors, jagged/ragged arrays, dope vectors.
I guess it depends on what kind of data structures does this new framework
support, if the Iliffe vectors (arrays of pointers), or just flat but
strided arrays, etc.


+  for (i = 0; i < mapnum; i++)
+{
+  int kind = get_kind (short_mapkind, kinds, i);
+  if (GOMP_MAP_DYNAMIC_ARRAY_P (kind & typemask))
+   {
+ da_data_row_num += gomp_dynamic_array_count_rows (hostaddrs[i]);
+ da_info_num += 1;
+   }
+}


I'm not really happy by adding several extra loops which will not do
anything in the case there are no non-contiguous arrays being mapped (for
now always for OpenMP (OpenMP 5 has support for non-contigious target update
to/from though) and guess rarely for OpenACC).
Can't you use some flag bit in flags passed to GOMP_target* etc. and do the
above loop only if the compiler indicated there are any?


I originally strived to not have that loop, but because each row in the last 
dimension
is mapped as its own target_var_desc, we need to count them at this stage to 
allocate
the right number at start. Otherwise a realloc later seems even more ugly...

We currently don't have a suitable flag word argument in GOMP_target*, 
GOACC_parallel*, etc.
I am not sure if such a feature warrants changing the interface.

If you are weary of OpenMP being affected, I can add a condition to restrict 
such processing
to only (pragma_kind == GOMP_MAP_VARS_OPENACC). Is that okay? (at least before 
making any
larger runtime interface adjustments)


That will still cost you doing that loop for OpenACC constructs that don't
have any of these non-contiguous arrays.  GOMP_target_ext has flags
argument, but GOACC_paralel_keyed doesn't.  It has ... and you could perhaps
encode some flag in there.  Or, could these array descriptors be passed
first in the list of vars, so instead of a loop to check for these you could
just check the first one?


Hi Jakub,
I have revised the patch to rename the main struct da_* types into struct 
gomp_array_* and
added more detailed descriptions in the comments (though admittedly the "dynamic 
array" term
is not purged completely).

I have opted for the place-at-start-of-chain route, this should avoid all the 
tests and
additional iterating when such arrays are not used. There's also another 
omp-low.c update in
another patch.

Besides the revised whole patch, I have also attached a v1-v2 diff showing the 
changes in between.
Tested with offloading to ensure no regressions.

Thanks,
Chung-Lin





Index: libgomp/target.c
===
--- libgomp/target.c(revision 267050)
+++ libgomp/target.c(working copy)
@@ -477,6 +477,151 @@ gomp_map_val (struct target_mem_desc *tgt, void **
   return tgt->tgt_start + tgt->list[i].offset;
 }
 
+/* Definitions for data structures describing dynamic, non-contiguous arrays
+   (Note: interfaces with compiler)
+
+   The compiler generates a descriptor for each such array, places the
+   descriptor on stack, and passes the address of the descriptor to the libgomp
+   runtime as a normal map argument. The runtime then processes the array
+   data structure setup, and replaces the argument with the new actual
+   array address for the child function.
+
+   Care must be taken such that the struct field and layout assumptions
+   of struct gomp_array_dim, gomp_array_descr_type inside the compiler
+   be consistant with the below declarations.  */
+
+struct gomp_array_dim {
+  size_t base;
+  size_t length;
+  size_t elem_size;
+  size_t is_array;
+};
+
+struct gomp_array_descr_type {
+  void *ptr;
+  size_t ndims;
+  struct gomp_array_dim dims[];
+};
+
+/* Internal dynamic array info struct, used only here inside the runtime. */
+
+struct da_info
+{
+  struct gomp_array_descr_type *descr;
+  size_t map_index;
+  size_t ptrblock_size;
+  size_t data_row_num;
+  size_t data_row_size;
+};
+
+static size_t
+gomp_dynamic_array_count_rows (struct gomp_array_descr_type *descr)
+{
+  size_t nrows = 1;
+  for (size_t d = 0; d < descr->ndims - 1; d++)
+nrows *= descr->dims[d].length / sizeof (void *);
+

Re: [PATCH, OpenACC, 4/8] Multi-dimensional dynamic array support for OpenACC data clauses, omp-low: dynamic array descriptor creation

2018-12-13 Thread Chung-Lin Tang


On 2018/10/16 8:56 PM, Chung-Lin Tang wrote:

The next two patches are the bulk of the compiler patch in the middle-ends.

The first patch here, implements the creation of dynamic array descriptors to
pass to the runtime, a different way than completely using map-clauses.

Because we support arbitrary number of dimensions, adding more map kind cases
may convolute a lot of the compiler/runtime logic handling the long map 
sequences.

This implementation uses a descriptor struct created on stack, and passes the
pointer to descriptor through to the libgomp runtime, using the exact same 
receiver field
for the dynamic array.

The libgomp runtime then does its stuff to set things up, and properly adjusts 
the device-side
receiver field pointer to the on-device created dynamic array structures. I.e. 
the same receiver
field serves as descriptor address field on the compiler side, and as the 
actual data address
once we get to device code (a pretty important point needed to clarify).


After the prior revising of libgomp/target.c:gomp_map_vars() to test the first 
map for
necessity of this dynamic array processing, this patch correspondingly updates 
scan_omp_target
to reorder related map clauses to the start of the clause chain for OpenACC 
constructs.

Again, besides the whole revised patch, v1-v2 diff also included.

Thanks,
Chung-Lin
Index: gcc/omp-low.c
===
--- gcc/omp-low.c   (revision 267050)
+++ gcc/omp-low.c   (working copy)
@@ -60,6 +60,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "hsa-common.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "tree-hash-traits.h"
 
 /* Lowering of OMP parallel and workshare constructs proceeds in two
phases.  The first phase scans the function looking for OMP statements
@@ -133,6 +134,9 @@ struct omp_context
 
   /* True if this construct can be cancelled.  */
   bool cancellable;
+
+  /* Hash map of dynamic arrays in this context.  */
+  hash_map *dynamic_arrays;
 };
 
 static splay_tree all_contexts;
@@ -839,6 +843,136 @@ omp_copy_decl (tree var, copy_body_data *cb)
   return error_mark_node;
 }
 
+/* Helper function for create_dynamic_array_descr_type(), to append a new field
+   to a record type.  */
+
+static void
+append_field_to_record_type (tree record_type, tree fld_ident, tree fld_type)
+{
+  tree *p, fld = build_decl (UNKNOWN_LOCATION, FIELD_DECL, fld_ident, 
fld_type);
+  DECL_CONTEXT (fld) = record_type;
+
+  for (p = _FIELDS (record_type); *p; p = _CHAIN (*p))
+;
+  *p = fld;
+}
+
+/* Create type for dynamic array descriptor. Returns created type, and
+   returns the number of dimensions in *DIM_NUM.  */
+
+static tree
+create_dynamic_array_descr_type (tree decl, tree dims, int *dim_num)
+{
+  int n = 0;
+  tree da_descr_type, name, x;
+  gcc_assert (TREE_CODE (dims) == TREE_LIST);
+
+  da_descr_type = lang_hooks.types.make_type (RECORD_TYPE);
+  name = create_tmp_var_name (".omp_dynamic_array_descr_type");
+  name = build_decl (UNKNOWN_LOCATION, TYPE_DECL, name, da_descr_type);
+  DECL_ARTIFICIAL (name) = 1;
+  DECL_NAMELESS (name) = 1;
+  TYPE_NAME (da_descr_type) = name;
+  TYPE_ARTIFICIAL (da_descr_type) = 1;
+
+  /* Main starting pointer/array.  */
+  tree main_var_type = TREE_TYPE (decl);
+  if (TREE_CODE (main_var_type) == REFERENCE_TYPE)
+main_var_type = TREE_TYPE (main_var_type);
+  append_field_to_record_type (da_descr_type, DECL_NAME (decl),
+  (TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE
+   ? main_var_type
+   : build_pointer_type (main_var_type)));
+  /* Number of dimensions.  */
+  append_field_to_record_type (da_descr_type, get_identifier ("$dim_num"),
+  sizetype);
+
+  for (x = dims; x; x = TREE_CHAIN (x), n++)
+{
+  char *fldname;
+  /* One for the start index.  */
+  ASM_FORMAT_PRIVATE_NAME (fldname, "$dim_base", n);
+  append_field_to_record_type (da_descr_type, get_identifier (fldname),
+  sizetype);
+  /* One for the length.  */
+  ASM_FORMAT_PRIVATE_NAME (fldname, "$dim_length", n);
+  append_field_to_record_type (da_descr_type, get_identifier (fldname),
+  sizetype);
+  /* One for the element size.  */
+  ASM_FORMAT_PRIVATE_NAME (fldname, "$dim_elem_size", n);
+  append_field_to_record_type (da_descr_type, get_identifier (fldname),
+  sizetype);
+  /* One for is_array flag.  */
+  ASM_FORMAT_PRIVATE_NAME (fldname, "$dim_is_array", n);
+  append_field_to_record_type (da_descr_type, get_identifier (fldname),
+  sizetype);
+}
+
+  layout_type (da_descr_type);
+  *dim_num = n;
+  return da_descr_type;
+}
+
+/* Generate code sequence for initializing dynamic array descriptor.  */
+
+static void

Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-13 Thread Segher Boessenkool

On Wed, Dec 12, 2018 at 06:26:10PM +0200, Dimitar Dimitrov wrote:
> I expect that if I mark a HW register as "clobber", compiler would save its 
> contents before executing the asm statement, and after that it would restore 
> its contents. This is the GCC behaviour for all but the SP and PIC registers. 
> That is why I believe that PR52813 is a valid bug. 

It won't do it for *any* fixed registers.  But you do not want to error
or even warn for some fixed registers, for example the "flags" register
on x86 is *always* written to by asm.

But you never want to warn for non-fixed registers, and e.g.
PIC_OFFSET_TABLE_REGNUM isn't always a fixed register (when flag_pic is 0
for example).

> I'm not sure how GCC could recover if SP is clobbered. If SP is clobbered in 
> such a way that GCC will not notice (e.g. thread switching), then why should 
> GCC know about it in the first place?

Up until today, GCC has always just ignored it if you claimed to clobber
the stack pointer.

Segher

Re: [patch] various OpenACC reduction enhancements - test cases

2018-12-13 Thread Julian Brown

On Tue, 4 Dec 2018 13:59:33 +0100
Jakub Jelinek  wrote:

> On Fri, Jun 29, 2018 at 11:23:21AM -0700, Cesar Philippidis wrote:
> > Attached are the updated reductions tests cases. Again, these have
> > been bootstrapped and regression tested cleanly for x86_64 with
> > nvptx offloading. Is it OK for trunk?  
> 
> If Thomas is ok with this, it is ok for trunk.

Here's a new version to go with the FE patch posted here:

https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00930.html

Thanks,

Julian

ChangeLog

2018-xx-xx  Cesar Philippidis  
Nathan Sidwell  
Julian Brown  

gcc/testsuite/
* c-c++-common/goacc/orphan-reductions-1.c: New test.
* c-c++-common/goacc/reduction-7.c: New test.
* c-c++-common/goacc/routine-4.c: Update.
* g++.dg/goacc/reductions-1.C: New test.
* gcc.dg/goacc/loop-processing-1.c: Update.
* gfortran.dg/goacc/orphan-reductions-1.f90: New test.

libgomp/
* libgomp.oacc-c-c++-common/par-reduction-3.c: New test.
* libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c: New test.
* libgomp.oacc-fortran/reduction-9.f90: New test.
commit 7d445a56d6db96696cec8359e58258d47fa7c9ae
Author: Julian Brown 
Date:   Wed Dec 12 11:11:03 2018 -0800

Various OpenACC reduction enhancements - test cases

2018-xx-xx  Cesar Philippidis  
	Nathan Sidwell  
	Julian Brown  

	gcc/testsuite/
	* c-c++-common/goacc/orphan-reductions-1.c: New test.
	* c-c++-common/goacc/reduction-7.c: New test.
	* c-c++-common/goacc/routine-4.c: Update.
	* g++.dg/goacc/reductions-1.C: New test.
	* gcc.dg/goacc/loop-processing-1.c: Update.
	* gfortran.dg/goacc/orphan-reductions-1.f90: New test.

	libgomp/
	* libgomp.oacc-c-c++-common/par-reduction-3.c: New test.
	* libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c: New test.
	* libgomp.oacc-fortran/reduction-9.f90: New test.

diff --git a/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
new file mode 100644
index 000..b0bd4a7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
@@ -0,0 +1,56 @@
+/* Test orphan reductions.  */
+
+#include 
+
+#pragma acc routine seq
+int
+seq_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop seq reduction(+:sum)
+  for (i = 0; i < n; i++)
+sum = sum + 1;
+
+  return sum;
+}
+
+#pragma acc routine gang
+int
+gang_reduction (int n)
+{
+  int i, s1 = 0, s2 = 0;
+#pragma acc loop gang reduction(+:s1) /* { dg-error "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s1 = s1 + 2;
+
+#pragma acc loop gang reduction(+:s2) /* { dg-error "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s2 = s2 + 2;
+
+
+  return s1 + s2;
+}
+
+#pragma acc routine worker
+int
+worker_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop worker reduction(+:sum)
+  for (i = 0; i < n; i++)
+sum = sum + 3;
+
+  return sum;
+}
+
+#pragma acc routine vector
+int
+vector_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop vector reduction(+:sum)
+  for (i = 0; i < n; i++)
+sum = sum + 4;
+
+  return sum;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-7.c b/gcc/testsuite/c-c++-common/goacc/reduction-7.c
new file mode 100644
index 000..eba1d02
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-7.c
@@ -0,0 +1,111 @@
+/* Exercise invalid reductions on array and struct members.  */
+
+void
+test_parallel ()
+{
+  struct {
+int a;
+float b[5];
+  } s1, s2[10];
+
+  int i;
+  double z[100];
+
+#pragma acc parallel reduction(+:s1.a) /* { dg-error "expected '\\\)' before '\\\.' token" } */
+  for (i = 0; i < 10; i++)
+s1.a += 1;
+
+#pragma acc parallel reduction(+:s1.b[3]) /* { dg-error "expected '\\\)' before '\\\.' token" } */
+  for (i = 0; i < 10; i++)
+s1.b[3] += 1;
+
+#pragma acc parallel reduction(+:s2[2].a) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+s2[2].a += 1;
+
+#pragma acc parallel reduction(+:s2[3].b[4]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+s2[3].b[4] += 1;
+
+#pragma acc parallel reduction(+:z[5]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+z[5] += 1;
+}
+
+void
+test_combined ()
+{
+  struct {
+int a;
+float b[5];
+  } s1, s2[10];
+
+  int i;
+  double z[100];
+
+#pragma acc parallel loop reduction(+:s1.a) /* { dg-error "expected '\\\)' before '\\\.' token" } */
+  for (i = 0; i < 10; i++)
+s1.a += 1;
+
+#pragma acc parallel loop reduction(+:s1.b[3]) /* { dg-error "expected '\\\)' before '\\\.' token" } */
+  for (i = 0; i < 10; i++)
+s1.b[3] += 1;
+
+#pragma acc parallel loop reduction(+:s2[2].a) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  for (i = 0; i < 10; i++)
+s2[2].a += 1;
+
+#pragma acc parallel loop

Re: [patch] various OpenACC reduction enhancements - FE changes

2018-12-13 Thread Julian Brown

On Tue, 4 Dec 2018 13:57:24 +0100
Jakub Jelinek  wrote:

> On Fri, Jun 29, 2018 at 11:22:00AM -0700, Cesar Philippidis wrote:
> > 2018-06-29  Cesar Philippidis  
> > Nathan Sidwell  
> > 
> > gcc/c/
> > * c-parser.c (c_parser_omp_variable_list): New
> > c_omp_region_type argument.  Use it to specialize handling of
> > OMP_CLAUSE_REDUCTION for OpenACC.
> > (c_parser_omp_clause_reduction): Update call to
> > c_parser_omp_variable_list.  Propage OpenACC errors as
> > necessary. (c_parser_oacc_all_clauses): Update call to
> > p_parser_omp_clause_reduction.
> > (c_parser_omp_all_clauses): Likewise.
> > * c-typeck.c (c_finish_omp_clauses): Emit an error on
> > orphan OpenACC gang reductions.
> > 
> > gcc/cp/
> > * parser.c (cp_parser_omp_var_list_no_open):  New
> > c_omp_region_type argument.  Use it to specialize handling of
> > OMP_CLAUSE_REDUCTION for OpenACC.
> > (cp_parser_omp_clause_reduction): Update call to
> > cp_parser_omp_variable_list.  Propage OpenACC errors as
> > necessary. (cp_parser_oacc_all_clauses): Update call to
> > cp_parser_omp_clause_reduction.
> > (cp_parser_omp_all_clauses): Likewise.
> > * semantics.c (finish_omp_clauses): Emit an error on orphan
> > OpenACC gang reductions.
> > 
> > gcc/fortran/
> > * openmp.c (resolve_oacc_loop_blocks): Emit an error on
> > orphan OpenACC gang reductions.
> > * trans-openmp.c (gfc_omp_clause_copy_ctor): Permit
> > reductions.
> > 
> > diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> > index 7a926285f3a..a6f453dae54 100644
> > --- a/gcc/c/c-parser.c
> > +++ b/gcc/c/c-parser.c
> > @@ -965,12 +965,13 @@ class token_pair
> >  
> >/* Like token_pair::require_close, except that tokens will be
> > skipped until the desired token is found.  An error message is
> > still produced
> > - if the next token is not as expected.  */
> > + if the next token is not as expected, unless QUIET is set.  */
> >  
> > -  void skip_until_found_close (c_parser *parser) const
> > +  void skip_until_found_close (c_parser *parser, bool quiet =
> > false) const {
> >  c_parser_skip_until_found (parser, traits_t::close_token_type,
> > -  traits_t::close_gmsgid, m_open_loc);
> > +  quiet ? NULL :
> > traits_t::close_gmsgid,
> > +  m_open_loc);
> >}  
> 
> I don't like these changes, why do you need them?  C++ FE doesn't
> have such changes either, and it is fine to diagnose missing ) even
> if there was some earlier error.  All other spots which require
> matching parens do it the same.  Please leave those out.

I've removed these bits.

>  static
> tree -c_parser_omp_clause_reduction (c_parser *parser, tree
> list) +c_parser_omp_clause_reduction (c_parser *parser, tree
> list, 
>   
> +  enum c_omp_region_type
> ort)  
>
> 
> Note, the signature is now different, it is ok to replace is_omp
> argument with enum c_omp_region_type if you wish.

I've done as you suggest.

> >  {
> >location_t clause_loc = c_parser_peek_token (parser)->location;
> > +  bool seen_error = false;
> > +
> >matching_parens parens;
> >if (parens.require_open (parser))
> >  {
> > @@ -12855,7 +12876,13 @@ c_parser_omp_clause_reduction (c_parser
> > *parser, tree list) tree nl, c;
> >  
> >   nl = c_parser_omp_variable_list (parser, clause_loc,
> > -  OMP_CLAUSE_REDUCTION,
> > list);
> > +  OMP_CLAUSE_REDUCTION,
> > list, ort);
> > + if (c_parser_peek_token (parser)->type !=
> > CPP_CLOSE_PAREN)
> > +   {
> > + seen_error = true;
> > + goto cleanup;
> > +   }
> > +
> >   for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
> > {
> >   tree d = OMP_CLAUSE_DECL (c), type;
> > @@ -12891,7 +12918,8 @@ c_parser_omp_clause_reduction (c_parser
> > *parser, tree list) 
> >   list = nl;
> > }
> > -  parens.skip_until_found_close (parser);
> > +cleanup:
> > +  parens.skip_until_found_close (parser, seen_error);
> >  }
> >return list;
> >  }  
> 
> And the above hunks as well.

Removed.

> > @@ -13998,7 +14026,7 @@ c_parser_oacc_all_clauses (c_parser
> > *parser, omp_clause_mask mask, c_name = "private";
> >   break;
> > case PRAGMA_OACC_CLAUSE_REDUCTION:
> > - clauses = c_parser_omp_clause_reduction (parser,
> > clauses);
> > + clauses = c_parser_omp_clause_reduction (parser,
> > clauses, C_ORT_ACC); c_name = "reduction";
> >   break;
> > case PRAGMA_OACC_CLAUSE_SEQ:
> > @@ -14157,7 +14185,7 @@ c_parser_omp_all_clauses (c_parser *parser,
> > omp_clause_mask mask, c_name = "private";
> >   break;
> > case PRAGMA_OMP_CLAUSE_REDUCTION:
> > - clauses =

[PR c++/87531] Fix second bug

2018-12-13 Thread Nathan Sidwell

This patch addresses the regression caused by the first fix.  For 
reasons that used to make more sense, an overload set of template 
members would present as just a dependent using-decl, if it contained 
any such using decls.  Then we'd defer repeat lookup to instantiation 
time.  Except in a couple of places where we'd really want the 
non-dependent function set.


However, we're now better at doing lookups correctly in more places, and 
this deferring gets in the way.  In this particular case, we need to know if

   name < whatever >
is a template-id-expr, or a less-than operator[*], and we determine this 
by seeing if any members of whatever 'name' found are templates.  This 
breaks if we just present a dependent using decl.


So, this patch removes the frobbing in name lookup and returns the whole 
binding.  We'll have arranged that if there is at least one dependent 
using decl, it'll be first in the overload set.


Then we need to deal with that as an overload member in a couple of 
places.  The finish_id_expr change is so we properly insert an implicit 
'this->' before hand.  That showed I'd neglected to set the using-decl's 
DECL_CONTEXT, hence the change in finish_struct.


get_class_binding_direct's fn_or_type arg can now return to be a bool 
'want_type', as we no longer call it with the 'want_fns's' value. 
That's a cleanup for stage 1. (The only other place we asked for fns is 
getting the dtor, which cannot be a dependent using decl anyway, so the 
request is moot.)


Booted & tested in x86_64-linux, applying to trunk and gcc-8.

nathan

[*] there's a core or evolution paper about interpreting '<' as a 
template-id-expr in more places regardless of the binding of the single 
identifer on its left.


--
Nathan Sidwell
2018-12-13  Nathan Sidwell  

	PR c++/87531
	* class.c (finish_struct): Set DECL_CONTEXT of template assign op.
	* name-lookup.c (get_class_binding_direct): Don't strip using-decl
	of overload here.
	* parser.c (cp_parser_postfix_expression): Cope with using decl in
	overload set.
	* semantics.c (finish_id_expr): Likewise.

	* g++.dg/lookup/pr87531-2.C: New.

Index: gcc/cp/class.c
===
--- gcc/cp/class.c	(revision 267093)
+++ gcc/cp/class.c	(working copy)
@@ -7158,6 +7158,7 @@ finish_struct (tree t, tree attributes)
 	 time.  */
   tree ass_op = build_lang_decl (USING_DECL, assign_op_identifier,
  NULL_TREE);
+  DECL_CONTEXT (ass_op) = t;
   USING_DECL_SCOPE (ass_op) = t;
   DECL_DEPENDENT_P (ass_op) = true;
   DECL_ARTIFICIAL (ass_op) = true;
Index: gcc/cp/name-lookup.c
===
--- gcc/cp/name-lookup.c	(revision 267093)
+++ gcc/cp/name-lookup.c	(working copy)
@@ -1242,17 +1242,6 @@ get_class_binding_direct (tree klass, tr
 	}
   else if (STAT_HACK_P (val))
 	val = STAT_DECL (val);
-
-  if (val && TREE_CODE (val) == OVERLOAD
-	  && TREE_CODE (OVL_FUNCTION (val)) == USING_DECL)
-	{
-	  /* An overload with a dependent USING_DECL.  Does the caller
-	 want the USING_DECL or the functions?  */
-	  if (type_or_fns < 0)
-	val = OVL_CHAIN (val);
-	  else
-	val = OVL_FUNCTION (val);  
-	}
 }
   else
 {
Index: gcc/cp/parser.c
===
--- gcc/cp/parser.c	(revision 267093)
+++ gcc/cp/parser.c	(working copy)
@@ -7240,14 +7240,19 @@ cp_parser_postfix_expression (cp_parser
 		else if (!args->is_empty ()
 			 && is_overloaded_fn (postfix_expression))
 		  {
+		/* We only need to look at the first function,
+		   because all the fns share the attribute we're
+		   concerned with (all member fns or all local
+		   fns).  */
 		tree fn = get_first_fn (postfix_expression);
 		fn = STRIP_TEMPLATE (fn);
 
 		/* Do not do argument dependent lookup if regular
 		   lookup finds a member function or a block-scope
 		   function declaration.  [basic.lookup.argdep]/3  */
-		if (!DECL_FUNCTION_MEMBER_P (fn)
-			&& !DECL_LOCAL_FUNCTION_P (fn))
+		if (!((TREE_CODE (fn) == USING_DECL && DECL_DEPENDENT_P (fn))
+			  || DECL_FUNCTION_MEMBER_P (fn)
+			  || DECL_LOCAL_FUNCTION_P (fn)))
 		  {
 			koenig_p = true;
 			if (!any_type_dependent_arguments_p (args))
Index: gcc/cp/semantics.c
===
--- gcc/cp/semantics.c	(revision 267093)
+++ gcc/cp/semantics.c	(working copy)
@@ -3805,9 +3805,10 @@ finish_id_expression (tree id_expression
 	return error_mark_node;
 
 	  if (!template_arg_p
-	  && TREE_CODE (first_fn) == FUNCTION_DECL
-	  && DECL_FUNCTION_MEMBER_P (first_fn)
-	  && !shared_member_p (decl))
+	  && (TREE_CODE (first_fn) == USING_DECL
+		  || (TREE_CODE (first_fn) == FUNCTION_DECL
+		  && DECL_FUNCTION_MEMBER_P (first_fn)
+		  && !shared_member_p (decl
 	{
 	  /* A set of member functions.  */
 	  decl =

Re: [PATCH] error on missing LTO symbols

2018-12-13 Thread Jakub Jelinek

On Thu, Dec 13, 2018 at 02:31:45PM +0100, Tom de Vries wrote:
> > 2015-07-24  Cesar Philippidis  

Please use current date ;)

> > 
> > gcc/
> > * lto-cgraph.c (input_overwrite_node): Error instead of assert
> > on missing cgraph partitions.
> > (input_varpool_node): Likewise.
> > 
> > 
> > diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> > index d70537d..7e2fc80 100644
> > --- a/gcc/lto-cgraph.c
> > +++ b/gcc/lto-cgraph.c
> > @@ -1218,9 +1218,11 @@ input_overwrite_node (struct lto_file_decl_data
> > *file_data,
> >  LDPR_NUM_KNOWN);
> >node->instrumentation_clone = bp_unpack_value (bp, 1);
> >node->split_part = bp_unpack_value (bp, 1);
> > -  gcc_assert (flag_ltrans
> > - || (!node->in_other_partition
> > - && !node->used_from_other_partition));
> > +
> > +  int success = flag_ltrans || (!node->in_other_partition
> > +   && !node->used_from_other_partition);

I'd use internal_error (or internal_error_no_backtrace ?) here if it isn't
an offloading compiler (so #ifndef ACCEL_COMPILER), or just gcc_assert in
that case?  Richard/Honza, your thoughts on that?
And error if ACCEL_COMPILER is defined.

> > +  if (!success)
> > +error ("Missing %<%s%>", node->name ());

Diagnostics shouldn't start with capital letters.  %<%s%> should be %qs.

The diagnostics for the non-accel case if any can be less verbose, but
perhaps should at least say whether it is a function (above case) or
variable.

For the ACCEL_COMPILER case, I think we should be more verbose, say that
certain function or variable has been referenced in offloaded code but
corresponding function or variable definition hasn't been marked to be
included in the offloaded code.
Would be nice to have a location where it has been referenced if
that can be dug from somewhere.  Or if e.g. from attribute we can figure out
whether it was OpenMP or OpenACC offloading and even suggest how to fix it
in the corresponding language extension.

Jakub

Re: [libgomp, nvptx] Fix libgomp.c/target-5.c compilation

2018-12-13 Thread Jakub Jelinek

On Thu, Dec 13, 2018 at 01:09:25PM +0100, Tom de Vries wrote:
> 2018-12-13  Tom de Vries  
> 
>   * affinity-fmt.c (gomp_print_string): New function, factored out of ...
>   (omp_display_affinity, gomp_display_affinity_thread): ... here, and ...
>   * fortran.c (omp_display_affinity_): ... here.
>   * libgomp.h (gomp_print_string): Declare.
>   * config/nvptx/affinity-fmt.c: New file.  Include affinity-fmt.c,
>   undefining HAVE_GETPID and HAVE_GETHOSTNAME, and mapping fwrite to
>   write.

> +/* The nvptx newlib implementation does not support fwrite, but it does 
> support
> +   write.  Map fwrite to write for size == 1.  */
> +#undef fwrite
> +#define fwrite(ptr, size, nmemb, stream) \
> +  ((size) == 1 ? write (1, (ptr), (nmemb)) : 0)

Why not
  write (1, (ptr), (nmemb) * (size))
I know it doesn't handle the overflow case properly, but the callers are
using 1 and it actually isn't a security threat if it prints fewer chars
than it should.  We don't have anything where we'd rely on fwrite failing
when stuff overflows...

Ok with that change (plus adjust the comment).

Jakub

Re: [PATCH] error on missing LTO symbols

2018-12-13 Thread Tom de Vries

[ adding gcc-patches ]

On 13-12-18 14:30, Tom de Vries wrote:
> [ Patch copy-pasted from here (
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02076.html ). Patch was
> resubmitted here (
> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00043.html ). ]
> 
> Hi,
> 
> this issue just popped up again for PR88460: an openmp test-case problem
> manifests as an ICE. We've seen the same problem for openacc test-cases.
> 
> From my point of view, the main problem with the current state is that
> we don't known what symbol the ICE relates to, and can only find out by
> reproducing the compile in a debugging session and doing "call
> debug_generic_expr (node.decl)" ( see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88460#c0 ), which is
> cumbersome for gcc developers, and not helpful for openmp and openacc users.
> 
> This patch attempts to address that issue by changing the assert into an
> error. [ We could make the error message more specific, by
> distinguishing between the node->in_other_partition and
> node->used_from_other_partition cases. ]
> 
> Is the approach in this patch OK (perhaps conditionalizing it to be only
> effective for openacc and openmp)?
> 
> Alternatively, we could instead print a message ahead of the ICE.
> 
> Can we have some feedback on what would be an acceptable solution here?
> 
> Thanks,
> - Tom
> 
> 2015-07-24  Cesar Philippidis  
> 
>   gcc/
>   * lto-cgraph.c (input_overwrite_node): Error instead of assert
>   on missing cgraph partitions.
>   (input_varpool_node): Likewise.
> 
> 
> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> index d70537d..7e2fc80 100644
> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -1218,9 +1218,11 @@ input_overwrite_node (struct lto_file_decl_data
> *file_data,
>LDPR_NUM_KNOWN);
>node->instrumentation_clone = bp_unpack_value (bp, 1);
>node->split_part = bp_unpack_value (bp, 1);
> -  gcc_assert (flag_ltrans
> -   || (!node->in_other_partition
> -   && !node->used_from_other_partition));
> +
> +  int success = flag_ltrans || (!node->in_other_partition
> + && !node->used_from_other_partition);
> +  if (!success)
> +error ("Missing %<%s%>", node->name ());
>  }
> 
>  /* Return string alias is alias of.  */
> @@ -1432,9 +1434,11 @@ input_varpool_node (struct lto_file_decl_data
> *file_data,
>  node->set_section_for_node (section);
>node->resolution = streamer_read_enum (ib, ld_plugin_symbol_resolution,
>   LDPR_NUM_KNOWN);
> -  gcc_assert (flag_ltrans
> -   || (!node->in_other_partition
> -   && !node->used_from_other_partition));
> +
> +  int success = flag_ltrans || (!node->in_other_partition
> + && !node->used_from_other_partition);
> +  if (!success)
> +error ("Missing %<%s%>", node->name ());
> 
>return node;
>  }
>

[committed] Fix libgomp.c++/for-24.C testcase (PR libgomp/88460)

2018-12-13 Thread Jakub Jelinek

Hi!

This is PR86660 problem repeated on another testcase (copied + modified
from the buggy one before it has been fixed, sorry for that).
Fixed thusly, tested on x86_64-linux without offloading and by Tom
with offloading, committed to trunk.  Thanks Tom for reporting it and
testing it.

2018-12-13  Jakub Jelinek  

PR libgomp/88460
* testsuite/libgomp.c++/for-24.C (results): Include it in
omp declare target region.
(main): Use map (always, tofrom: results) instead of
map (tofrom: results).

--- libgomp/testsuite/libgomp.c++/for-24.C.jj   2018-11-08 18:08:04.181942018 
+0100
+++ libgomp/testsuite/libgomp.c++/for-24.C  2018-12-13 13:39:51.436527177 
+0100
@@ -113,11 +113,9 @@ short c[50];
 int d[1024];
 K e[1089];
 L f[1093];
-#pragma omp end declare target
 
 int results[2000];
 
-#pragma omp declare target
 template 
 void
 baz (I )
@@ -370,58 +368,58 @@ main ()
   f[i].c = 5 * i;
 }
   #pragma omp target update to (a, b, c, d, e, f)
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f1 ();
   check (1);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f2 ();
   check (1);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f3 ();
   check (1);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f4 (J ([14], [1803]));
   check (i >= 14 && i < 1803);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f5 ();
   check (i >= 0 && i < 1024);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f6 (J ([19], [1029]));
   check (i >= 19 && i < 1029);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f7 (J ([15], [1091]));
   check (i >= 15 && i < 1091);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f8 (J ([27], [1037]));
   check (i >= 27 && i < 1037);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f9 (J ([1], [1012]));
   check (i >= 1 && i < 1012);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f10 <0> ();
   check (1);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f11 <1> ();
   check (1);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f12 <2> ();
   check (1);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f13 (J ([24], [1703]));
   check (i >= 24 && i < 1703);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f14 <1024> ();
   check (i >= 0 && i < 1024);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f15 (J ([39], [929]));
   check (i >= 39 && i < 929);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f16 (J ([17], [1071]));
   check (i >= 17 && i < 1071);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f17 <3> (J ([7], [1017]));
   check (i >= 7 && i < 1017);
-  #pragma omp target teams map (tofrom: results)
+  #pragma omp target teams map (always, tofrom: results)
   f18 <5> (J ([121], [1010]));
   check (i >= 121 && i < 1010);
 }

Jakub

Re: [PATCH][AArch64][doc] Clarify -msve-vector-bits=128 behaviour

2018-12-13 Thread Richard Sandiford

Ramana Radhakrishnan  writes:
> On Thu, Dec 13, 2018 at 10:15 AM Richard Sandiford
>  wrote:
>>
>> Thanks for doing this.
>>
>> "Kyrill Tkachov"  writes:
>> > @@ -15716,16 +15716,19 @@ an effect when SVE is enabled.
>> >
>> >  GCC supports two forms of SVE code generation: ``vector-length
>> >  agnostic'' output that works with any size of vector register and
>> > -``vector-length specific'' output that only works when the vector
>> > -registers are a particular size.  Replacing @var{bits} with
>> > -@samp{scalable} selects vector-length agnostic output while
>> > -replacing it with a number selects vector-length specific output.
>> > -The possible lengths in the latter case are: 128, 256, 512, 1024
>> > -and 2048.  @samp{scalable} is the default.
>> > -
>> > -At present, @samp{-msve-vector-bits=128} produces the same output
>> > -as @samp{-msve-vector-bits=scalable}.
>> > -
>> > +``vector-length specific'' output that allows GCC to make assumptions
>> > +about the vector length when it is useful for optimization reasons.
>> > +The possible values of @samp{bits} are: @samp{scalable}, @samp{128},
>> > +@samp{256}, @samp{512}, @samp{1024} and @samp{2048}.
>> > +Specifying @samp{scalable} selects vector-length agnostic
>> > +output.  At present @samp{-msve-vector-bits=128} also generates 
>> > vector-length
>> > +agnostic output.  All other values generate vector-length specific code.
>> > +The behavior of these values may change in future releases and no value 
>> > except
>> > +@samp{scalable} should be relied on for producing code that is portable 
>> > across
>> > +different hardware SVE vector lengths.
>> > +
>> > +The default is @samp{-msve-vector-bits=scalable} which produces
>> > +vector-length agnostic code.
>>
>> Think we should have a comma before "which" in the last sentence.
>> OK with that change.
>
> And backport to GCC-8 ?

Yeah, OK for the backport too.

Re: [PATCH] Fix split-path-5.c testcase (PR testsuite/88454)

2018-12-13 Thread Segher Boessenkool

On Thu, Dec 13, 2018 at 08:49:53AM +0100, Jakub Jelinek wrote:
> On Mon, Dec 10, 2018 at 09:56:46PM -0700, Jeff Law wrote:
> > Note that split-path-5 has the same basic structure.  A half-diamond
> > with a single statement in the middle block that should be trivially
> > if-convertable if profitable.  So I adjusted that testcase.
> 
> The split-path-5.c testcase now fails on powerpc64*, arm*, aarch64* etc.
> targets.
> 
> When looking for the difference, I found out it is a -fsigned-char vs.
> -funsigned-char issue, on -funsigned-char targets we are simply compiling
> something quite different.

Ha :-)

> The following patch fixes it, regtested on x86_64-linux (-m64/-m32) and
> tested with cross to aarch64-linux and powerpc64-linux.  Ok for trunk?

This is an obvious patch, isn't it :-)


Segher


> 2018-12-13  Jakub Jelinek  
> 
>   PR testsuite/88454
>   * gcc.dg/tree-ssa/split-path-5.c (__ctype_ptr__): Change type from
>   const char * to const signed char *.
>   (bmhi_init): Change pattern parameter's type the same.  Use
>   __builtin_strlen instead of undeclared strlen.
> 
> --- gcc/testsuite/gcc.dg/tree-ssa/split-path-5.c.jj   2018-12-11 
> 11:02:09.003065907 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/split-path-5.c  2018-12-13 
> 08:36:26.457533278 +0100
> @@ -1,16 +1,16 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fsplit-paths -fdump-tree-split-paths-details -w" } */
>  
> -const extern char *__ctype_ptr__;
> +const extern signed char *__ctype_ptr__;
>  typedef unsigned char uchar;
>  static int patlen;
>  static int skip[(0x7f * 2 + 1) + 1];
>  static uchar *pat = ((void *) 0);
>  void
> -bmhi_init (const char *pattern)
> +bmhi_init (const signed char *pattern)
>  {
>int i, lastpatchar;
> -  patlen = strlen (pattern);
> +  patlen = __builtin_strlen (pattern);
>for (i = 0; i < patlen; i++)
>  pat[i] = (
>  {

Re: [PATCH] PR libstdc++/80762 avoid ambiguous __constructible_from

2018-12-13 Thread Jonathan Wakely

On 13/12/18 13:10 +0100, Christophe Lyon wrote:

On Thu, 13 Dec 2018 at 12:00, Jonathan Wakely  wrote:

On 13/12/18 08:56 +0100, Christophe Lyon wrote:
>On Wed, 12 Dec 2018 at 17:13, Jonathan Wakely  wrote:
>>
>> Ensure we don't try to instantiate __is_constructible_from,
>> because there are two partial specializations that are equally good
>> matches.
>>
>> PR libstdc++/80762
>> * include/bits/fs_path.h (path::_Path): Use remove_cv_t and is_void.
>> * include/experimental/bits/fs_path.h (path::_Path): Likewise.
>> * testsuite/27_io/filesystem/path/construct/80762.cc: New test.
>> * testsuite/experimental/filesystem/path/construct/80762.cc: New 
test.
>>
>
>Hi Jonathan,
>
>One of the new tests fails on bare-metal/newlib targets (aarch64-elf/arm-eabi):
>FAIL: experimental/filesystem/path/construct/80762.cc (test for excess errors)
>/libstdc++-v3/testsuite/experimental/filesystem/path/construct/80762.cc:20:
>fatal error: experimental/filesystem: No such file or directory
>
>I think there was a similar issue recently, but I don't remember how
>you fixed it?

Like this. Sorry for forgetting it. Committed to trunk.

OK thanks.
Note that only the one in "experimental" failed, the 27_io one did pass.

Because of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86756 :-)

The  header is installed unconditionally, and that test
only needs the header not the library, so it happens to work.

The  header is only installed when
libstdc++fs.a is available.

Aside:
There's actually no reason the filesystem::path type can't be enabled
unconditionally for all targets, as it only relies on portable
features like std::string, but currently the definitions of its member
functions are in the libstdc++fs.a library which is only enabled
conditionally. Later today I plan to move the std::filesystem::path
type into the main libstdc++.so library, so only the tests for
std::experimental::filesystem::path will depend on libstdc++fs.a being
available. Until then, this patch is needed.

Great!

Re: [PATCH] PR libstdc++/80762 avoid ambiguous __constructible_from

2018-12-13 Thread Christophe Lyon

On Thu, 13 Dec 2018 at 12:00, Jonathan Wakely  wrote:
>
> On 13/12/18 08:56 +0100, Christophe Lyon wrote:
> >On Wed, 12 Dec 2018 at 17:13, Jonathan Wakely  wrote:
> >>
> >> Ensure we don't try to instantiate __is_constructible_from,
> >> because there are two partial specializations that are equally good
> >> matches.
> >>
> >> PR libstdc++/80762
> >> * include/bits/fs_path.h (path::_Path): Use remove_cv_t and 
> >> is_void.
> >> * include/experimental/bits/fs_path.h (path::_Path): Likewise.
> >> * testsuite/27_io/filesystem/path/construct/80762.cc: New test.
> >> * testsuite/experimental/filesystem/path/construct/80762.cc: New 
> >> test.
> >>
> >
> >Hi Jonathan,
> >
> >One of the new tests fails on bare-metal/newlib targets 
> >(aarch64-elf/arm-eabi):
> >FAIL: experimental/filesystem/path/construct/80762.cc (test for excess 
> >errors)
> >/libstdc++-v3/testsuite/experimental/filesystem/path/construct/80762.cc:20:
> >fatal error: experimental/filesystem: No such file or directory
> >
> >I think there was a similar issue recently, but I don't remember how
> >you fixed it?
>
> Like this. Sorry for forgetting it. Committed to trunk.
OK thanks.
Note that only the one in "experimental" failed, the 27_io one did pass.

>
> Aside:
> There's actually no reason the filesystem::path type can't be enabled
> unconditionally for all targets, as it only relies on portable
> features like std::string, but currently the definitions of its member
> functions are in the libstdc++fs.a library which is only enabled
> conditionally. Later today I plan to move the std::filesystem::path
> type into the main libstdc++.so library, so only the tests for
> std::experimental::filesystem::path will depend on libstdc++fs.a being
> available. Until then, this patch is needed.
>
Great!

Re: [PATCH] Overload std::distance and std::advance for path::iterator

2018-12-13 Thread Jonathan Wakely


On 12/12/18 16:14 +, Jonathan Wakely wrote:

+void
+test04()
+{
+  std::filesystem::path p = "/a/b/c/d/e/f/g";
+  VERIFY( std::distance(p.begin(), p.end()) == 8);
+  auto it = p.begin();
+  std::advance(it, 1);
+  VERIFY( std::distance(p.begin(), it) == 1 );
+  VERIFY( it->native() == "a" );
+  std::advance(it, 3);
+  VERIFY( std::distance(p.begin(), it) == 4 );
+  VERIFY( it->native() == "d" );
+  std::advance(it, -2);
+  VERIFY( std::distance(p.begin(), it) == 2 );
+  VERIFY( it->native() == "b" );


This fails on Windows because native() returns a wide string.

Fixed like so, committed to trunk.


commit 499376d36d7f792174f9080cbd59e7482f7e18b0
Author: Jonathan Wakely 
Date:   Thu Dec 13 12:07:43 2018 +

Fix test to work when path::native() returns wstring

* testsuite/27_io/filesystem/path/itr/traversal.cc: Fix test for
mingw.

diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/itr/traversal.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/itr/traversal.cc
index 55760e82a9a..d77e613ee53 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/itr/traversal.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/itr/traversal.cc
@@ -144,13 +144,13 @@ test04()
   auto it = p.begin();
   std::advance(it, 1);
   VERIFY( std::distance(p.begin(), it) == 1 );
-  VERIFY( it->native() == "a" );
+  VERIFY( it->string() == "a" );
   std::advance(it, 3);
   VERIFY( std::distance(p.begin(), it) == 4 );
-  VERIFY( it->native() == "d" );
+  VERIFY( it->string() == "d" );
   std::advance(it, -2);
   VERIFY( std::distance(p.begin(), it) == 2 );
-  VERIFY( it->native() == "b" );
+  VERIFY( it->string() == "b" );
 }
 
 int

[Committed 1/2] S/390: Use VEC_INEXACT/VEC_NOINEXACT instead of magic numbers.

2018-12-13 Thread Andreas Krebbel

2018-12-13  Andreas Krebbel  

* config/s390/vx-builtins.md ("vec_ctd_s64", "vec_ctd_u64")
("vec_ctsl", "vec_ctul"): Replace 0 with VEC_NOINEXACT.
("vec_double_s64", "vec_double_u64"): Replace 4 with VEC_INEXACT.
---
 gcc/config/s390/vx-builtins.md | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 1fa5a39..37a64ab 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -1606,7 +1606,7 @@
 (define_expand "vec_ctd_s64"
   [(set (match_operand:V2DF   0 "register_operand" "")
(unspec:V2DF [(match_operand:V2DI 1 "register_operand" "")
- (const_int 4) ; inexact suppressed
+ (const_int VEC_NOINEXACT)
  (const_int VEC_RND_CURRENT)]
 UNSPEC_VEC_VCDGB))
(use (match_operand:QI 2 "const_int_operand" ""))
@@ -1637,7 +1637,7 @@
 (define_expand "vec_ctd_u64"
   [(set (match_operand:V2DF   0 "register_operand" "")
(unspec:V2DF [(match_operand:V2DI 1 "register_operand" "")
- (const_int 4) ; inexact suppressed
+ (const_int VEC_NOINEXACT)
  (const_int VEC_RND_CURRENT)]
 UNSPEC_VEC_VCDLGB))
(use (match_operand:QI 2 "const_int_operand" ""))
@@ -1671,7 +1671,7 @@
 (match_dup 3)))
(set (match_operand:V2DI 0 "register_operand" "")
(unspec:V2DI [(match_dup 4)
- (const_int 4) ; inexact suppressed
+ (const_int VEC_NOINEXACT)
  (const_int VEC_RND_CURRENT)]
 UNSPEC_VEC_VCGDB))]
   "TARGET_VX"
@@ -1704,7 +1704,7 @@
 (match_dup 3)))
(set (match_operand:V2DI 0 "register_operand" "")
(unspec:V2DI [(match_dup 4)
- (const_int 4) ; inexact suppressed
+ (const_int VEC_NOINEXACT)
  (const_int VEC_RND_CURRENT)]
 UNSPEC_VEC_VCLGDB))]
   "TARGET_VX"
@@ -2025,7 +2025,7 @@
 (define_expand "vec_double_s64"
   [(set (match_operand:V2DF   0 "register_operand")
(unspec:V2DF [(match_operand:V2DI 1 "register_operand")
- (const_int 0)  ; inexact suppression disabled
+ (const_int VEC_INEXACT)
  (const_int VEC_RND_CURRENT)]
 UNSPEC_VEC_VCDGB))]
   "TARGET_VX")
@@ -2033,7 +2033,7 @@
 (define_expand "vec_double_u64"
   [(set (match_operand:V2DF   0 "register_operand")
(unspec:V2DF [(match_operand:V2DI 1 "register_operand")
- (const_int 0)  ; inexact suppression disabled
+ (const_int VEC_INEXACT)
  (const_int VEC_RND_CURRENT)]
 UNSPEC_VEC_VCDLGB))]
   "TARGET_VX")
-- 
2.7.4

1 2 >

1 - 100 of 117 matches

Mail list logo