[PATCH] fix and improve strlen conditional handling of merged stores (PR 91183, 91294, 91315)

2019-07-31 Thread Martin Sebor

More extensive testing of the last week's strlen patch for
PR91183 on various non-mainstream targets and with better tests
has exposed a few gaps and a couple of bugs.  The attached patch
addresses all in one change.  I considered splitting it up but
in the end decided the changes were small and sufficiently
isolated that it wasn't necessary.  (If someone feels strongly
otherwise it can be easily split t up.)

The wrong-code bug (PR 91294) is due to handle_store neglecting
to fully consider the case when a single multi-byte store
involving a PHI of two "strings" the same size (so they are merged
into a single int store) but of unequal length.  The function
simply takes the length of the shorter string as the resulting
length when it needs to only set the lower bound of the length
(it does that treating the result as potentially not nul-
terminated).

The gaps are in not handling some MEM_REF forms that come up
in multi-byte assignments (this is the rest of PR 91183 and was
exposed on strictly aligned targets), and in handle_builtin_strlen
discarding the lower bound on a string length instead of exposing
it to downstream passes.  This is PR 91315 that was exposed by
a few cases in the existing tests for PR 91294 failing after
the fix for PR 91294.

Tested on x86_64-linux and spot-checked with a sparc-solaris2.11
cross-compiler.

Martin
PR tree-optimization/91315 - missing strlen lower bound of a string known to be at least N characters
PR tree-optimization/91294 - wrong strlen result of a conditional with an offset
PR tree-optimization/91183 - strlen of a strcpy result with a conditional source not folded

gcc/testsuite/ChangeLog:

	PR tree-optimization/91315
	PR tree-optimization/91294
	PR tree-optimization/91183
	* gcc.dg/strlenopt-44.c: Avoid using length of 1.
	* gcc.dg/strlenopt-70.c: Disable overly optimistic tests.
	* gcc.dg/strlenopt-73.c: New test.
	* gcc.dg/strlenopt-74.c: New test.
	* gcc.dg/strlenopt-75.c: New test.
	* gcc.dg/strlenopt-76.c: New test.
	* gcc.dg/strlenopt-77.c: New test.

gcc/ChangeLog:

	PR tree-optimization/91315
	PR tree-optimization/91294
	PR tree-optimization/91183
	* gimple-fold.c (gimple_fold_builtin_strlen): Add expected argument.
	* tree-ssa-strlen.c (set_strlen_range): Add function argument.
	(maybe_set_strlen_range): Add expected argument.
	(handle_builtin_strlen): Call set_strlen_range.
	(count_nonzero_bytes): Add function arguments.  Handle strinfo
	first.  Handle "single" assignment.
	(handle_store): Set the lower bound of length for multibyte stores
	of unequal lengths.
	* tree-ssa-strlen.h (set_strlen_range): Add function argument.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c	(revision 273914)
+++ gcc/gimple-fold.c	(working copy)
@@ -3751,7 +3751,7 @@ gimple_fold_builtin_strlen (gimple_stmt_iterator *
 
   /* Set the strlen() range to [0, MAXLEN].  */
   if (tree lhs = gimple_call_lhs (stmt))
-set_strlen_range (lhs, maxlen);
+set_strlen_range (lhs, minlen, maxlen);
 
   return false;
 }
Index: gcc/testsuite/gcc.dg/strlenopt-44.c
===
--- gcc/testsuite/gcc.dg/strlenopt-44.c	(revision 273914)
+++ gcc/testsuite/gcc.dg/strlenopt-44.c	(working copy)
@@ -83,7 +83,7 @@ void test_keep (void)
   size_t uchar_max = (unsigned char)-1;
 
   KEEP ("1", 0, UR (1, uchar_max + 1), 1);
-  KEEP ("1\0\3", 1, UR (1, 2), 1);
+  KEEP ("1\0\3", 1, UR (1, 2), 2);
 }
 
 /* { dg-final { scan-tree-dump-times "call_in_true_branch_not_eliminated_" 0 "optimized" } }
Index: gcc/testsuite/gcc.dg/strlenopt-70.c
===
--- gcc/testsuite/gcc.dg/strlenopt-70.c	(revision 273914)
+++ gcc/testsuite/gcc.dg/strlenopt-70.c	(working copy)
@@ -201,14 +201,17 @@ void store_32bit (volatile int i)
   T ("xxx",  uint32_t, 0, I32 ("\1\2\3\0"), == 3);
   T ("xxx",  uint32_t, 0, I32 ("\0\1\2\3"), == 0);
 
-  uint32_t x00332211 = I32 ("123\0");
-  uint32_t x2211 = I32 ("12\0\0");
-  uint32_t x0011 = I32 ("1\0\0\0");
+  uint32_t x123_ = I32 ("123\0");
+  uint32_t x12__ = I32 ("12\0\0");
+  uint32_t x1___ = I32 ("1\0\0\0");
 
-  T ("", uint32_t, 0, i ? x00332211 : x2211, <= 3);
-  T ("", uint32_t, 0, i ? x00332211 : x2211, >= 2);
-  T ("", uint32_t, 0, i ? x00332211 : x0011, <= 3);
-  T ("", uint32_t, 0, i ? x00332211 : x0011, >= 1);
+  // FIXME: Upper bound not implemented yet.
+  /* T ("", uint32_t, 0, i ? x123_ : x12__, <= 3); */
+  T ("", uint32_t, 0, i ? x123_ : x12__, >= 2);
+  T ("", uint32_t, 0, i ? x12__ : x123_, >= 2);
+  /* T ("", uint32_t, 0, i ? x123_ : x1___, <= 3); */
+  T ("", uint32_t, 0, i ? x123_ : x1___, >= 1);
+  T ("", uint32_t, 0, i ? x1___ : x123_, >= 1);
 
   TX ("abcde",  uint32_t, 0, i ? I32 ("1234") : I32 ("1235"), == 5);
   TX ("abcde",  uint32_t, 1, i ? I32 ("1234") : I32 ("1235"), == 5);
@@ -220,7 +223,8 @@ void 

Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-31 Thread Andrew MacLeod

On 7/31/19 4:36 AM, Richard Biener wrote:

On Tue, Jul 30, 2019 at 4:54 PM Andrew MacLeod  wrote:

Everything in this compiler is historical. We did everything we could to
save memory back in the old days, I know, I was there when we did this.
Combining the lattice and the range made sense in that context.  In
fact, you could get rid of the lattice structure entirely with the way
we represent irange... we have varying and undefined values query-able
from the range without anything special.  That seems even more efficient
to me, and that's the way I'd implement it today... no artificial
lattice overlay needed, just the range.

Lattices will eventually go away. I personally don't see any point in us
spinning our wheels re-implementing them when they are slated to be
removed. Until then, there is absolutely nothing wrong with the way it
works right now.  Adding the min and max to the varying node has no
impact on how lattices work, it just allows value_range to interact with
new range code better..

This change is trivial. it actually fixes a few warts. It lets us move
on with other more significant things.

I've been trying to play nice and get agreement, but after a week and a
half it seems like that isn't going to happen. I welcome any further
comments on the patch, but otherwise I will approve the patch.

I will note this is the first time in a very long time that I have had
to exercise my maintainer authority as one of the original architects of
tree-ssa. I have been content to let others review and arrive at a
consensus, but the system is not working this time. Its unclear to me
why you are being so insistent about this trivial matter.  I will
continue to make approvals as necessary going forward, but I still
welcome your input and would prefer to settle on agreement to future
patches.

Fair enough - have fun.

Richard.



In summary, We don't want to do it your way so you wash your hands of it 
and assign all VRP bugs to me stating I am now VRP maintainer.


Aldy started this relatively straightforward submission a month ago (!! 
July 1st!), and we've made virtually no progress since then. You weren't 
holding us up on technical merits, it appeared to be a personal 
preference to not do things our way, despite the explanations and 
cleanups it provided.   I felt I had no choice but to move forward or we 
will not get any code in before stage 1 ends. We're already a month 
behind now, and with vacations looming I fear our initial goals are 
already in serious danger.  Our code certainly won't be in for as long 
as I would have liked.   I still don't know why you were so insistent it 
had to be all your way...    We made a lot of effort to accommodate you 
out of respect for what you do for GCC, it would have been nice to see a 
little in return.


Regardless, you have knowledge in this space which is useful, and we 
welcome your input along the way should you decide to provide it.


Now, on to trying to improve ranges and VRP.

Andrew





[doc PATCH] document variable attribute alias

2019-07-31 Thread Martin Sebor

It was pointed out recently in another forum that GCC doesn't
document attribute alias for variables.  It was also noted in
the same discussion that the semantics of accessing aliases
and their targets can have "surprising" effects because GCC
(as do other compilers) assumes that distinct declarations
with external linkage denote distinct entities.

The attached patch adds text to the manual describing attribute
alias for variables and pointing out the caveat of accessing
the same object using both the alias and the target.

Martin

PS Unlike for function aliases where type mismatches have been
diagnosed by -Wattribute-alias since 8.1, GCC doesn't complain
when the the type of an alias target variable doesn't match that
of its aliases.  Except for top-level qualifiers where it might
make sense to allow the target to be less qualified than
the alias (similarly as with const pointers to non-const objects
or with C++ const references), I assume it's just an oversight.
I will try to look into diagnosing these mismatches as well.
gcc/ChangeLog:

	* doc/extend.texi (Common Variable Attributes): Document alias
	attribute.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi	(revision 273914)
+++ gcc/doc/extend.texi	(working copy)
@@ -6719,6 +6719,32 @@ attributes.
 The following attributes are supported on most targets.
 
 @table @code
+
+@item alias ("@var{target}")
+@cindex @code{alias} variable attribute
+The @code{alias} variable attribute causes the declaration to be emitted
+as an alias for another symbol known as an @dfn{alias target}.  Except
+for top-level qualifiers the alias target must have the same type as
+the alias.  For instance, the following
+
+@smallexample
+int var_target;
+extern int __attribute__ ((alias ("var_target"))) var_alias;
+@end smallexample
+
+defines @code{var_alias} to be an alias for the @code{var_target} variable.
+
+It is an error if the alias target is not defined in the same translation
+unit as the alias.
+
+Note that in the absence of the attribute GCC assumes that distinct
+declarations with external linkage denote distinct objects.  Using both
+the alias and the alias target to access the same object is undefined
+in a translation unit without a declaration of the alias with the attribute.
+
+This attribute requires assembler and object file support, and may not be
+available on all targets.
+
 @cindex @code{aligned} variable attribute
 @item aligned
 @itemx aligned (@var{alignment})


Re: [PATCH] Implement "P0631R4 Math Constants" for C++20

2019-07-31 Thread Segher Boessenkool
On Wed, Jul 31, 2019 at 08:43:50PM +0200, Marc Glisse wrote:
> On Wed, 31 Jul 2019, Jonathan Wakely wrote:
> 
> >So something like the attached patch.  The glibc  says the
> >values I used have enough digits for IEEE quad-precision:
> >
> >/* The above constants are not adequate for computation using `long 
> >double's.
> > Therefore we provide as an extension constants with similar names as a
> > GNU extension.  Provide enough digits for the 128-bit IEEE quad.  */
> >
> >I don't know if we need more accuracy for IBM double double.
> 
> double double has less precision than quad so it should be fine.

The *precision* (as defined in IEEE 754) of double double is much higher
than that of IEEE quad precision float: "the maximum number of
significant digits that can be represented in a format" -- it's just
that most of those bits have to be zero!

Neither double-double is a subset of QP, nor the other way around.  This
is *the* problem we have in rs6000 with these two float formats: GCC
cannot handle if two FP formats are like that.

(The math constants here are fine for double double, of course).


Segher


Re: [PATCH] Fix typo LIBGCCJIT_SYMLINK -> LIBGCCJIT_SONAME_SYMLINK

2019-07-31 Thread Arvind Sankar
Hi any feedback on this?

Thanks.

On Tue, Jul 23, 2019 at 02:49:13PM -0400, Arvind Sankar wrote:
> There seems to be a typo in gcc/jit/Make-lang.in where it references an
> undefined LIBGCCJIT_SYMLINK instead of LIBGCCJIT_SONAME_SYMLINK.
> ---
>  gcc/jit/Make-lang.in | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/jit/Make-lang.in b/gcc/jit/Make-lang.in
> index 660f54d78bd..5cfd6035d8c 100644
> --- a/gcc/jit/Make-lang.in
> +++ b/gcc/jit/Make-lang.in
> @@ -65,7 +65,7 @@ LIBGCCJIT_SONAME_OPTION = \
>-Wl$(COMMA)$(LD_SONAME_OPTION)$(COMMA)$(LIBGCCJIT_SONAME))
>  
>  jit: $(LIBGCCJIT_FILENAME) \
> - $(LIBGCCJIT_SYMLINK) \
> + $(LIBGCCJIT_SONAME_SYMLINK) \
>   $(LIBGCCJIT_LINKER_NAME_SYMLINK) \
>   $(FULL_DRIVER_NAME)
>  
> @@ -301,9 +301,8 @@ jit.uninstall:
>  # We just have to delete files specific to us.
>  
>  jit.mostlyclean:
> - -rm -f $(LIBGCCJIT_FILENAME) $(LIBGCCJIT_SYMLINK)
> + -rm -f $(LIBGCCJIT_FILENAME) $(LIBGCCJIT_SONAME_SYMLINK)
>   -rm -f $(LIBGCCJIT_LINKER_NAME_SYMLINK) $(FULL_DRIVER_NAME)
> - -rm -f $(LIBGCCJIT_SONAME)
>   -rm -f $(jit_OBJS)
>  
>  jit.clean:
> -- 
> 2.21.0


Re: [PATCH v2] RISC-V: Raise error on unexpected ISA string at end.

2019-07-31 Thread Jim Wilson
On Wed, Jul 31, 2019 at 9:41 AM Maxim Blinov  wrote:
> gcc/ChangeLog:
> 2019-07-31  Maxim Blinov  
>
> * common/config/riscv/riscv-common.c: Check -march string ends
> with null.
>
> gcc/testsuite/ChangeLog:
> 2019-07-31  Maxim Blinov  
>
> * gcc.target/riscv/attribute-10.c: New test.

Looks good.  I committed the revised patch.

Jim


Re: [PATCH] Implement "P0631R4 Math Constants" for C++20

2019-07-31 Thread Jonathan Wakely

On 31/07/19 21:16 +0200, Jakub Jelinek wrote:

On Wed, Jul 31, 2019 at 07:58:55PM +0100, Jonathan Wakely wrote:

> Perhaps add __extension__ before the literals too?

Does that do anything for Q literals? I thought I'd tried it
previously and decided it didn't. I'm happy to add it though.


You're right, it works well that way only in C, not in C++, on say:
long double x = 1.234Q;
long double y = __extension__ 1.234Q;
This is pedwarned not when actually parsing the number, but
when tokenizing it, which for C++ we do before parsing for all tokens.

We could add a hack for it, say don't emit the pedwarn if the previous token
is RID_EXTENSION, but it wouldn't be anything close to how we handle
__extension__ during parsing (where pedantic is disabled while parsing the
whole cast expression after it).


Ah yes, I definitely looked into this before.

If I use the Q suffix like so ...

#if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_FLOAT128)
 template<>
   inline constexpr __float128 e_v<__float128>
 = 2.718281828459045235360287471352662498Q;

...
#endif

Then it produces 13 pedwarns with -pedantic -Wsystem-headers:

In file included from num.cc:1:
/home/jwakely/gcc/10/include/c++/10.0.0/numbers:139:7: warning: non-standard 
suffix on floating constant [-Wpedantic]
 139 |   = 2.718281828459045235360287471352662498Q;
 |   ^

Users would get those warnings even if not using the __float128
constants. Admittedly, if they use -pedantic combined with
-std=gnu++2a, and combine that with -Wsystem-headers, they deserve
what they get.

I think it's better to have accurate __float128 constants even if they
warn with certain combinations of warning flags.




Re: [C++ Patch] Improve delete_sanity locations

2019-07-31 Thread David Malcolm
On Wed, 2019-07-31 at 15:14 -0400, Jason Merrill wrote:
> On 7/24/19 5:24 PM, Paolo Carlini wrote:
> > this takes care of the four locations, two warnings and two errors.
> > I'm 
> > also adding the missing complain & tf_warning or tf_error guards,
> > I 
> > don't have a SFINAE testcase failing but since the function takes
> > a 
> > tsubst_flags_t argument I think it's the right thing to do. Tested 
> > x86_64-linux.
> 
> OK.
> 
> > By the way, shall we add to cp-tree.h, at least temporarily, a:
> > 
> > inline location_t
> > cp_expr_loc_or_loc (const_tree t)
> > {
> >return cp_expr_loc_or_loc (t, input_location);
> > }
> > 
> > overload? We could use it in a ton of places.
> 
> I'd call it "cp_expr_loc_or_here".  But David removed
> EXPR_LOC_OR_HERE a 
> few years back, so I'd like him to weigh in.

I don't care for "cp_expr_loc_or_loc".

By "_or_here" do you mean "or input_location"?

Calling it "cp_expr_loc_or_input_location" would spell out what it
does, but would be rather long.

Dave


C++ PATCH for c++/91264 - detect modifying const objects in constexpr

2019-07-31 Thread Marek Polacek
One of the features of constexpr is that it doesn't allow UB; and such UB must
be detected at compile-time.  So running your code in a context that requires
a constant expression should ensure that the code in question is free of UB.
In effect, constexpr can serve as a sanitizer.  E.g. this article describes in
in more detail:


[dcl.type.cv]p4 says "Any attempt to modify a const object during its lifetime
results in undefined behavior." However, as the article above points out, we
aren't detecting that case in constexpr evaluation.

This patch fixes that.  It's not that easy, though, because we have to keep in
mind [class.ctor]p5:
"A constructor can be invoked for a const, volatile or const volatile object.
const and volatile semantics are not applied on an object under construction.
They come into effect when the constructor for the most derived object ends."

I handled this by keeping a hash set which tracks objects under construction.
I considered other options, such as going up call_stack, but that wouldn't
work with trivial constructor/op=.  It was also interesting to find out that
the definition of TREE_HAS_CONSTRUCTOR says "When appearing in a FIELD_DECL,
it means that this field has been duly initialized in its constructor" though
nowhere in the codebase do we set TREE_HAS_CONSTRUCTOR on a FIELD_DECL as far
as I can see.  Unfortunately, using this bit proved useless for my needs here.

Also, be mindful of mutable subobjects.

Does this approach look like an appropriate strategy for tracking objects'
construction?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-07-31  Marek Polacek  

PR c++/91264 - detect modifying const objects in constexpr.
* constexpr.c (objects_under_construction): New hash set.
(new_object_under_construction): New.
(remove_object_under_construction): New.
(object_under_construction_p): New.
(modifying_const_object_error): New.
(cxx_eval_call_expression): Save the objects being constructed
into the hash set.
(modifying_const_object_p): New.
(cxx_eval_store_expression): Detect modifying a const object
during constant expression evaluation.
(cxx_eval_increment_expression): Use a better location when building
up the store.

* g++.dg/cpp1y/constexpr-tracking-const1.C: New test.
* g++.dg/cpp1y/constexpr-tracking-const2.C: New test.
* g++.dg/cpp1y/constexpr-tracking-const3.C: New test.
* g++.dg/cpp1y/constexpr-tracking-const4.C: New test.
* g++.dg/cpp1y/constexpr-tracking-const5.C: New test.
* g++.dg/cpp1y/constexpr-tracking-const6.C: New test.
* g++.dg/cpp1y/constexpr-tracking-const7.C: New test.
* g++.dg/cpp1y/constexpr-tracking-const8.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index c1b8b9b8a5d..e1ee72f0343 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -1575,6 +1575,59 @@ clear_no_implicit_zero (tree ctor)
 }
 }
 
+/* A hash set used for tracking objects under construction.  This is used
+   for detecting modifying constant objects during constant expression
+   evaluation.  */
+static GTY(()) hash_set *objects_under_construction;
+
+/* Add OBJ to the hash set above.  */
+
+static void
+new_object_under_construction (const_tree obj)
+{
+  if (!CLASS_TYPE_P (TREE_TYPE (obj))
+  /* Modifying non-const objects is allowed any time.  */
+  || !CP_TYPE_CONST_P (TREE_TYPE (obj)))
+return;
+
+  if (!objects_under_construction)
+objects_under_construction = hash_set::create_ggc (3);
+  objects_under_construction->add (CONST_CAST_TREE (obj));
+}
+
+/* Remove OBJ from the hash set.  */
+
+static void
+remove_object_under_construction (const_tree obj)
+{
+  if (objects_under_construction)
+objects_under_construction->remove (CONST_CAST_TREE (obj));
+}
+
+/* Return true if OBJ is an object currently under construction.  */
+
+static bool
+object_under_construction_p (const_tree obj)
+{
+  if (!CLASS_TYPE_P (TREE_TYPE (obj)))
+return false;
+
+  return (objects_under_construction
+ && objects_under_construction->contains (CONST_CAST_TREE (obj)));
+}
+
+/* Complain about a const object OBJ being modified in a constant expression.
+   EXPR is the MODIFY_EXPR expression performing the modification.  */
+
+static void
+modifying_const_object_error (tree expr, tree obj)
+{
+  location_t loc = cp_expr_loc_or_loc (expr, input_location);
+  error_at (loc, "modifying a const object %qE is not allowed in "
+   "a constant expression", TREE_OPERAND (expr, 0));
+  inform (location_of (obj), "originally declared % here");
+}
+
 /* Subroutine of cxx_eval_constant_expression.
Evaluate the call expression tree T in the context of OLD_CALL expression
evaluation.  */
@@ -1694,8 +1747,12 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 

Re: [PATCH] Implement "P0631R4 Math Constants" for C++20

2019-07-31 Thread Jakub Jelinek
On Wed, Jul 31, 2019 at 07:58:55PM +0100, Jonathan Wakely wrote:
> > Perhaps add __extension__ before the literals too?
> 
> Does that do anything for Q literals? I thought I'd tried it
> previously and decided it didn't. I'm happy to add it though.

You're right, it works well that way only in C, not in C++, on say:
long double x = 1.234Q;
long double y = __extension__ 1.234Q;
This is pedwarned not when actually parsing the number, but
when tokenizing it, which for C++ we do before parsing for all tokens.

We could add a hack for it, say don't emit the pedwarn if the previous token
is RID_EXTENSION, but it wouldn't be anything close to how we handle
__extension__ during parsing (where pedantic is disabled while parsing the
whole cast expression after it).

Jakub


Re: [PATCH] PR libstdc++/51333 Define recursive_init_error constructor non-inline

2019-07-31 Thread Jonathan Wakely

On 29/07/19 15:27 +0100, Jonathan Wakely wrote:

The recursive_init_error class is defined in a header, with an inline
constructor, but the definition of the vtable and destructor are not
exported from the shared library. With -fkeep-inline-functions the
constructor gets emitted in user code, and requires the (non-exported)
vtable. This fails to link.

As far as I can tell, the recursive_init_error class definition was
moved into  so it could be documented with Doxygen, not for
any technical reason. But now it's there (and documented), somebody
could be relying on it, by catching that type and possibly performing
derived-to-base conversions to the std::exception base class. So the
conservative fix is to leave the class definition in the header but make
the constructor non-inline. This still allows the type to be caught and
still defines its base class. User code can no longer construct objects
of that type, but that's not something we need to support.


Actually they could never construct them, because the vtable wasn't
exported. So making the constructor non-inline has no semantic effect.
The only difference is that throwing a recursive_init_error when
static init fails will not inline the constructor, which is fine.

So I see no reason to delay backporting this, and will do so for the
upcoming 9.2 release.




Re: [C++ Patch] Improve delete_sanity locations

2019-07-31 Thread Jason Merrill

On 7/24/19 5:24 PM, Paolo Carlini wrote:
this takes care of the four locations, two warnings and two errors. I'm 
also adding the missing complain & tf_warning or tf_error guards, I 
don't have a SFINAE testcase failing but since the function takes a 
tsubst_flags_t argument I think it's the right thing to do. Tested 
x86_64-linux.


OK.


By the way, shall we add to cp-tree.h, at least temporarily, a:

inline location_t
cp_expr_loc_or_loc (const_tree t)
{
   return cp_expr_loc_or_loc (t, input_location);
}

overload? We could use it in a ton of places.


I'd call it "cp_expr_loc_or_here".  But David removed EXPR_LOC_OR_HERE a 
few years back, so I'd like him to weigh in.


Jason


[PATCH] Qualify call to prevent ADL

2019-07-31 Thread Jonathan Wakely

* include/std/memory (make_obj_using_allocator): Qualify call to
uses_allocator_construction_args.

Tested x86_64-linux, committed to trunk

Backport to gcc-9-branch to follow.


commit 0c3543005cee13158806fe2909de9920522d2f9e
Author: redi 
Date:   Wed Jul 31 19:08:56 2019 +

Qualify call to prevent ADL

* include/std/memory (make_obj_using_allocator): Qualify call to
uses_allocator_construction_args.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@273945 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/std/memory b/libstdc++-v3/include/std/memory
index 3036802f8c3..0a483d2d8d1 100644
--- a/libstdc++-v3/include/std/memory
+++ b/libstdc++-v3/include/std/memory
@@ -375,8 +375,9 @@ get_pointer_safety() noexcept { return 
pointer_safety::relaxed; }
 inline _Tp
 make_obj_using_allocator(const _Alloc& __a, _Args&&... __args)
 {
-  return std::make_from_tuple<_Tp>(uses_allocator_construction_args<_Tp>(
-   __a, std::forward<_Args>(__args)...));
+  return std::make_from_tuple<_Tp>(
+ std::uses_allocator_construction_args<_Tp>(__a,
+   std::forward<_Args>(__args)...));
 }
 
   template


Re: [PATCH] Implement "P0631R4 Math Constants" for C++20

2019-07-31 Thread Jonathan Wakely

On 31/07/19 20:50 +0200, Jakub Jelinek wrote:

On Wed, Jul 31, 2019 at 07:37:03PM +0100, Jonathan Wakely wrote:

On 31/07/19 19:07 +0200, Marc Glisse wrote:
> On Wed, 31 Jul 2019, Jonathan Wakely wrote:
>
> > The values of the constants are taken from Glibc where the equivalent
> > constant exists, or by rounding the actual constant to the same number
> > of digits as the Glibc constants have.
>
> How does it behave with __float128? I think that with -std=gnu++2a, it
> counts as a floating point type, but the constant first becomes a long
> double and thus loses precision.

Indeed it does:

#if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_FLOAT128)
 template<>
   struct __is_floating_point_helper<__float128>
   : public true_type { };
#endif

We could add explicit specializations for __float128 and use the Q
suffix. That's not valid when __STRICT_ANSI__ is defined (unless
-fext-numeric-literals is also used) but as the snippet above shows,
__float128 isn't a floating-point type when __STRICT_ANSI__ is
defined.


Perhaps add __extension__ before the literals too?


Does that do anything for Q literals? I thought I'd tried it
previously and decided it didn't. I'm happy to add it though.


I don't know if we need more accuracy for IBM double double.


I'd think the *L constants should be good enough.


OK, great.




Re: [C++ PATCH] PR c++/90590 Suppress warning for enumeration value not handled in switch warning

2019-07-31 Thread Jason Merrill

On 7/15/19 9:47 AM, Matthew Beliveau wrote:

Okay I kept the TYPE_MAIN_VARIANT and dropped the accidental new line!
Hopefully this should be fine!


OK, thanks.

Jason


Re: C++ PATCH for c++/89906 (GCC 8 backport)

2019-07-31 Thread Jason Merrill

On 7/12/19 4:16 PM, Marek Polacek wrote:

In order to fix 89906 in GCC 8, we need to backport 86098.
I think the patch is safe to be backported.

Tested x86_64-linux, ok for 8?


OK.

Jason


Re: C++ PATCH to add test for c++/91104

2019-07-31 Thread Jason Merrill

On 7/17/19 1:51 PM, Marek Polacek wrote:

This was a wrong code issue where we printed
2 3 1
1 2 3
instead of
1 2 3
1 2 3
but it was fixed by r271705.  I don't know of a good way to check
the auto... expansion here so I used dg-output.


You could call another function with the expansion and check the 
arguments there?


Jason


Re: [PATCH] Implement "P0631R4 Math Constants" for C++20

2019-07-31 Thread Jakub Jelinek
On Wed, Jul 31, 2019 at 07:37:03PM +0100, Jonathan Wakely wrote:
> On 31/07/19 19:07 +0200, Marc Glisse wrote:
> > On Wed, 31 Jul 2019, Jonathan Wakely wrote:
> > 
> > > The values of the constants are taken from Glibc where the equivalent
> > > constant exists, or by rounding the actual constant to the same number
> > > of digits as the Glibc constants have.
> > 
> > How does it behave with __float128? I think that with -std=gnu++2a, it
> > counts as a floating point type, but the constant first becomes a long
> > double and thus loses precision.
> 
> Indeed it does:
> 
> #if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_FLOAT128)
>  template<>
>struct __is_floating_point_helper<__float128>
>: public true_type { };
> #endif
> 
> We could add explicit specializations for __float128 and use the Q
> suffix. That's not valid when __STRICT_ANSI__ is defined (unless
> -fext-numeric-literals is also used) but as the snippet above shows,
> __float128 isn't a floating-point type when __STRICT_ANSI__ is
> defined.

Perhaps add __extension__ before the literals too?

> I don't know if we need more accuracy for IBM double double.

I'd think the *L constants should be good enough.

Jakub


[C++ PATCH] PR c++/90538 - multiple expansions of capture packs

2019-07-31 Thread Jason Merrill
Previously, with init-capture the type of the closure field was a
DECLTYPE_TYPE of the initializer.  But since each time we tsubst a lambda we
get a different lambda, that meant that if the initializer is a lambda, we'd
end up with different closure types in the field and initializer after
substitution (PR 87322).  We dealt with this by remembering the lambda
instantiation within each pack expansion element, using
local_specialization_stack to separate the elements.  But that broke this
testcase, because it lost lambda capture proxies that also use
local_specializations.

So, this patch removes the local_specializations changes from that patch and
fixes 87322 differently, by giving init-capture fields 'auto' type and doing
deduction later.  There's a bit of a kludge to get the right number of
fields by pretending that 'auto...' uses the parameter packs from the
initializer, but it does the trick.

Tested x86_64-pc-linux-gnu, applying to trunk.

* cp-tree.h (DECLTYPE_FOR_INIT_CAPTURE): Remove.
* lambda.c (add_capture): Copy parameter packs from init.
(lambda_capture_field_type): Always use auto for init-capture.
* pt.c (uses_parameter_packs): Return tree.
(tsubst) [DECLTYPE_TYPE]: Remove init-capture handling.
(gen_elem_of_pack_expansion_instantiation): Don't push
local_specialization_stack.
(prepend_one_capture): New.
(tsubst_lambda_expr): Use it.  Don't touch local_specializations.
(do_auto_deduction): Avoid redundant error.
---
 gcc/cp/cp-tree.h  |  9 +--
 gcc/cp/lambda.c   | 30 +---
 gcc/cp/pt.c   | 73 ++-
 .../g++.dg/cpp0x/lambda/lambda-variadic9.C| 16 
 gcc/testsuite/g++.dg/cpp0x/range-for19.C  |  2 +-
 gcc/testsuite/g++.dg/cpp1y/lambda-init16.C|  2 +-
 gcc/cp/ChangeLog  | 12 +++
 7 files changed, 91 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-variadic9.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c8fa29938e4..c6a1eff9ce1 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -423,7 +423,6 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   LAMBDA_EXPR_MUTABLE_P (in LAMBDA_EXPR)
   DECL_FINAL_P (in FUNCTION_DECL)
   QUALIFIED_NAME_IS_TEMPLATE (in SCOPE_REF)
-  DECLTYPE_FOR_INIT_CAPTURE (in DECLTYPE_TYPE)
   CONSTRUCTOR_IS_DEPENDENT (in CONSTRUCTOR)
   TINFO_USED_TEMPLATE_ID (in TEMPLATE_INFO)
   PACK_EXPANSION_SIZEOF_P (in *_PACK_EXPANSION)
@@ -4484,12 +4483,10 @@ more_aggr_init_expr_args_p (const 
aggr_init_expr_arg_iterator *iter)
   (DECLTYPE_TYPE_CHECK (NODE))->type_common.string_flag
 
 /* These flags indicate that we want different semantics from normal
-   decltype: lambda capture just drops references, init capture
-   uses auto semantics, lambda proxies look through implicit dereference.  */
+   decltype: lambda capture just drops references,
+   lambda proxies look through implicit dereference.  */
 #define DECLTYPE_FOR_LAMBDA_CAPTURE(NODE) \
   TREE_LANG_FLAG_0 (DECLTYPE_TYPE_CHECK (NODE))
-#define DECLTYPE_FOR_INIT_CAPTURE(NODE) \
-  TREE_LANG_FLAG_1 (DECLTYPE_TYPE_CHECK (NODE))
 #define DECLTYPE_FOR_LAMBDA_PROXY(NODE) \
   TREE_LANG_FLAG_2 (DECLTYPE_TYPE_CHECK (NODE))
 #define DECLTYPE_FOR_REF_CAPTURE(NODE) \
@@ -6794,7 +6791,7 @@ extern bool maybe_instantiate_noexcept(tree, 
tsubst_flags_t = tf_warning_or_er
 extern tree instantiate_decl   (tree, bool, bool);
 extern int comp_template_parms (const_tree, const_tree);
 extern bool builtin_pack_fn_p  (tree);
-extern bool uses_parameter_packs(tree);
+extern tree uses_parameter_packs(tree);
 extern bool template_parameter_pack_p   (const_tree);
 extern bool function_parameter_pack_p  (const_tree);
 extern bool function_parameter_expanded_from_pack_p (tree, tree);
diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 758773b8a85..c4fed168269 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -213,16 +213,7 @@ lambda_capture_field_type (tree expr, bool explicit_init_p,
   tree type;
   bool is_this = is_this_parameter (tree_strip_nop_conversions (expr));
 
-  if (!is_this && type_dependent_expression_p (expr))
-{
-  type = cxx_make_type (DECLTYPE_TYPE);
-  DECLTYPE_TYPE_EXPR (type) = expr;
-  DECLTYPE_FOR_LAMBDA_CAPTURE (type) = true;
-  DECLTYPE_FOR_INIT_CAPTURE (type) = explicit_init_p;
-  DECLTYPE_FOR_REF_CAPTURE (type) = by_reference_p;
-  SET_TYPE_STRUCTURAL_EQUALITY (type);
-}
-  else if (!is_this && explicit_init_p)
+  if (!is_this && explicit_init_p)
 {
   tree auto_node = make_auto ();
   
@@ -233,6 +224,14 @@ lambda_capture_field_type (tree expr, bool explicit_init_p,
type = build_reference_type (type);
   type = do_auto_deduction (type, expr, 

[C++ PATCH] Fix copy_node of TEMPLATE_INFO.

2019-07-31 Thread Jason Merrill
build_clone uses copy_node to duplicate the TEMPLATE_INFO for a clone, but
this clears TREE_CHAIN, which was TI_ARGS in a TEMPLATE_INFO.

Tested x86_64-pc-linux-gnu, applying to trunk.

* cp-tree.h (struct tree_template_info): Use tree_base instead of
tree_common.  Add tmpl and args fields.
(TI_TEMPLATE, TI_ARGS): Adjust.
---
 gcc/cp/cp-tree.h | 10 +++---
 gcc/cp/cp-objcp-common.c |  1 -
 gcc/cp/ChangeLog |  7 +++
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 688924cdd12..c8fa29938e4 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -1437,7 +1437,9 @@ typedef struct qualified_typedef_usage_s 
qualified_typedef_usage_t;
   (TREE_LANG_FLAG_1 (TEMPLATE_INFO_CHECK (NODE)))
 
 struct GTY(()) tree_template_info {
-  struct tree_common common;
+  struct tree_base base;
+  tree tmpl;
+  tree args;
   vec *typedefs_needing_access_checking;
 };
 
@@ -3420,8 +3422,10 @@ struct GTY(()) lang_decl {
? (TYPE_LANG_SLOT_1 (NODE) = (VAL)) \
: (DECL_TEMPLATE_INFO (TYPE_NAME (NODE)) = (VAL)))
 
-#define TI_TEMPLATE(NODE) TREE_TYPE (TEMPLATE_INFO_CHECK (NODE))
-#define TI_ARGS(NODE) TREE_CHAIN (TEMPLATE_INFO_CHECK (NODE))
+#define TI_TEMPLATE(NODE) \
+  ((struct tree_template_info*)TEMPLATE_INFO_CHECK (NODE))->tmpl
+#define TI_ARGS(NODE) \
+  ((struct tree_template_info*)TEMPLATE_INFO_CHECK (NODE))->args
 #define TI_PENDING_TEMPLATE_FLAG(NODE) \
   TREE_LANG_FLAG_1 (TEMPLATE_INFO_CHECK (NODE))
 /* For a given TREE_VEC containing a template argument list,
diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
index 21d162e5d0c..4c95180bd4b 100644
--- a/gcc/cp/cp-objcp-common.c
+++ b/gcc/cp/cp-objcp-common.c
@@ -373,7 +373,6 @@ cp_common_init_ts (void)
   MARK_TS_COMMON (TEMPLATE_TYPE_PARM);
   MARK_TS_COMMON (TEMPLATE_PARM_INDEX);
   MARK_TS_COMMON (OVERLOAD);
-  MARK_TS_COMMON (TEMPLATE_INFO);
   MARK_TS_COMMON (TYPENAME_TYPE);
   MARK_TS_COMMON (TYPEOF_TYPE);
   MARK_TS_COMMON (UNDERLYING_TYPE);
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 425eea11a7c..06024eb506d 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,10 @@
+2019-07-31  Jason Merrill  
+
+   Fix copy_node of TEMPLATE_INFO.
+   * cp-tree.h (struct tree_template_info): Use tree_base instead of
+   tree_common.  Add tmpl and args fields.
+   (TI_TEMPLATE, TI_ARGS): Adjust.
+
 2019-07-30  Martin Liska  
 
PR tree-optimization/91270

base-commit: c9b21954f008e76123a4b79d416ce273cb749cc5
-- 
2.21.0



Re: [PATCH] Implement "P0631R4 Math Constants" for C++20

2019-07-31 Thread Marc Glisse

On Wed, 31 Jul 2019, Jonathan Wakely wrote:


So something like the attached patch.  The glibc  says the
values I used have enough digits for IEEE quad-precision:

/* The above constants are not adequate for computation using `long double's.
 Therefore we provide as an extension constants with similar names as a
 GNU extension.  Provide enough digits for the 128-bit IEEE quad.  */

I don't know if we need more accuracy for IBM double double.


double double has less precision than quad so it should be fine.

--
Marc Glisse


Re: [PATCH] Implement "P0631R4 Math Constants" for C++20

2019-07-31 Thread Jonathan Wakely

On 31/07/19 19:07 +0200, Marc Glisse wrote:

On Wed, 31 Jul 2019, Jonathan Wakely wrote:


The values of the constants are taken from Glibc where the equivalent
constant exists, or by rounding the actual constant to the same number
of digits as the Glibc constants have.


How does it behave with __float128? I think that with -std=gnu++2a, it 
counts as a floating point type, but the constant first becomes a long 
double and thus loses precision.


Indeed it does:

#if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_FLOAT128)
 template<>
   struct __is_floating_point_helper<__float128>
   : public true_type { };
#endif

We could add explicit specializations for __float128 and use the Q
suffix. That's not valid when __STRICT_ANSI__ is defined (unless 
-fext-numeric-literals is also used) but as the snippet above shows,

__float128 isn't a floating-point type when __STRICT_ANSI__ is
defined.

So something like the attached patch.  The glibc  says the
values I used have enough digits for IEEE quad-precision:

/* The above constants are not adequate for computation using `long double's.
  Therefore we provide as an extension constants with similar names as a
  GNU extension.  Provide enough digits for the 128-bit IEEE quad.  */

I don't know if we need more accuracy for IBM double double.


diff --git a/libstdc++-v3/include/std/numbers b/libstdc++-v3/include/std/numbers
index b8e38dd8080..314b130fbd5 100644
--- a/libstdc++-v3/include/std/numbers
+++ b/libstdc++-v3/include/std/numbers
@@ -133,6 +133,72 @@ namespace numbers
   inline constexpr double egamma = egamma_v;
   inline constexpr double phi = phi_v;
 
+#if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_FLOAT128)
+  template<>
+inline constexpr __float128 e_v<__float128>
+  = 2.718281828459045235360287471352662498Q;
+
+  /// log_2 e
+  template<>
+inline constexpr __float128 log2e_v<__float128>
+  = 1.442695040888963407359924681001892137Q;
+
+  /// log_10 e
+  template<>
+inline constexpr __float128 log10e_v<__float128>
+  = 0.434294481903251827651128918916605082Q;
+
+  /// pi
+  template<>
+inline constexpr __float128 pi_v<__float128>
+  = 3.141592653589793238462643383279502884Q;
+
+  /// 1/pi
+  template<>
+inline constexpr __float128 inv_pi_v<__float128>
+  = 0.318309886183790671537767526745028724Q;
+
+  /// 1/sqrt(pi)
+  template<>
+inline constexpr __float128 inv_sqrtpi_v<__float128>
+  = 0.564189583547756286948079451560772586Q;
+
+  /// log_e 2
+  template<>
+inline constexpr __float128 ln2_v<__float128>
+  = 0.693147180559945309417232121458176568Q;
+
+  /// log_e 10
+  template<>
+inline constexpr __float128 ln10_v<__float128>
+  = 2.302585092994045684017991454684364208Q;
+
+  /// sqrt(2)
+  template<>
+inline constexpr __float128 sqrt2_v<__float128>
+  = 1.414213562373095048801688724209698079Q;
+
+  /// sqrt(3)
+  template<>
+inline constexpr __float128 sqrt3_v<__float128>
+  = 1.732050807568877293527446341505872367Q;
+
+  /// 1/sqrt(3)
+  template<>
+inline constexpr __float128 inv_sqrt3_v<__float128>
+  = 0.577350269189625764509148780501957456Q;
+
+  /// The Euler-Mascheroni constant
+  template<>
+inline constexpr __float128 egamma_v<__float128>
+  = 0.577215664901532860606512090082402431Q;
+
+  /// The golden ratio, (1+sqrt(5))/2
+  template<>
+inline constexpr __float128 phi_v<__float128>
+  = 1.618033988749894848204586834365638118Q;
+#endif // USE_FLOAT128
+
 } // namespace numbers
 /// @}
 _GLIBCXX_END_NAMESPACE_VERSION


[committed] ipa-devirt: make qsort helpers static

2019-07-31 Thread Alexander Monakov
Hi,

noticed this when checking a qsort patch, applying as obvious.

In the C days, -Wmissing-prototypes would have caught this, but C++ needs a
different warning (-Wmissing-declarations), so we get inevitable bitrot :/

Alexander

* ipa-devirt.c (type_warning_cmp): Make static.
(decl_warning_cmp): Ditto.

--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -3526,7 +3526,7 @@ likely_target_p (struct cgraph_node *n)
 /* Compare type warning records P1 and P2 and choose one with larger count;
helper for qsort.  */

-int
+static int
 type_warning_cmp (const void *p1, const void *p2)
 {
   const odr_type_warn_count *t1 = (const odr_type_warn_count *)p1;
@@ -3542,7 +3542,7 @@ type_warning_cmp (const void *p1, const void *p2)
 /* Compare decl warning records P1 and P2 and choose one with larger count;
helper for qsort.  */

-int
+static int
 decl_warning_cmp (const void *p1, const void *p2)
 {
   const decl_warn_count *t1 = *(const decl_warn_count * const *)p1;



Re: [PATCH, rs6000] Fix PR91050 by adding a DRIVER_SELF_SPECS spec

2019-07-31 Thread Peter Bergner
On 7/26/19 9:28 AM, Iain Sandoe wrote:
> The patch needs one small amendment  to succeed in bootstrap.
> I applied the amended to i686, x86_64 and powerpc Darwin with no apparent
> new problems.  From the Darwin perspective, the patch is OK with the 
> amendment.

Ok, thanks.  For completeness, I have included the final patch that I
committed below, incorporating your and Uros' suggested changes.
Thank you both!



>>> +#define DRIVER_SELF_SPECS \
>>> +  "%{mdejagnu-cpu=*: %>> +   %{mdejagnu-tune=*: %>> +   %{mdejagnu-*: %>> +   SUBTARGET_DRIVER_SELF_SPECS
> 
> NOTE (only a note): most of the driver self-specs that are more than a single 
> line
> seem to be written as comma-separated strings each containing a single spec.
> I am not sure what material difference that makes, and it doesn’t seem to 
> affect
> the functionality of your patch.  I also  asked Joseph on IRC yesterday and 
> he had
> no specific advice.

I see them used both ways, with and without the ','.  Playing around a little,
I see if you use ',' between clauses, then you also need to have each clause 
within
its own set of '"'s.  So "foo", "bar", "baz" etc. is fine, as is "foo bar baz", 
but
"foo, bar, baz" is not ok.  I guess it must have something to do with how we
scan the spec string.  Anyway, since the darwin SUBTARGET_DRIVER_SELF_SPECS
code uses ','s, I rewrote the rs6000.h DRIVER_SELF_SPECS that way too, so that
they're similar.

Peter

PR target/91050
* config/rs6000/rs6000.opt (mdejagnu-cpu=): Delete option.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
use of deleted rs6000_dejagnu_cpu_index variable.
* config/rs6000/rs6000.h (DRIVER_SELF_SPECS): Define.
(SUBTARGET_DRIVER_SELF_SPECS): Likewise.
* config/darwin.h (DRIVER_SELF_SPECS): Rename from this ...
(SUBTARGET_DRIVER_SELF_SPECS): ...to this.
* config/i386/i386.h (DRIVER_SELF_SPECS): Define.
(SUBTARGET_DRIVER_SELF_SPECS): Likewise.

Index: gcc/config/rs6000/rs6000.opt
===
--- gcc/config/rs6000/rs6000.opt(revision 273940)
+++ gcc/config/rs6000/rs6000.opt(revision 273941)
@@ -388,13 +388,6 @@ mtune=
 Target RejectNegative Joined Var(rs6000_tune_index) Init(-1) 
Enum(rs6000_cpu_opt_value) Save
 -mtune=Schedule code for given CPU.
 
-; Only for use in the testsuite.  This simply overrides -mcpu=.  With older
-; versions of Dejagnu the command line arguments you set in RUNTESTFLAGS
-; override those set in the testcases; with this option, the testcase will
-; always win.
-mdejagnu-cpu=
-Target Undocumented RejectNegative Joined Var(rs6000_dejagnu_cpu_index) 
Init(-1) Enum(rs6000_cpu_opt_value) Save
-
 mtraceback=
 Target RejectNegative Joined Enum(rs6000_traceback_type) Var(rs6000_traceback)
 -mtraceback=[full,part,no] Select type of traceback table.
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 273940)
+++ gcc/config/rs6000/rs6000.c  (revision 273941)
@@ -3489,9 +3489,6 @@ rs6000_option_override_internal (bool gl
   /* Don't override by the processor default if given explicitly.  */
   set_masks &= ~rs6000_isa_flags_explicit;
 
-  if (global_init_p && rs6000_dejagnu_cpu_index >= 0)
-rs6000_cpu_index = rs6000_dejagnu_cpu_index;
-
   /* Process the -mcpu= and -mtune= argument.  If the user changed
  the cpu in a target attribute or pragma, but did not specify a tuning
  option, use the cpu for the tuning option rather than the option specified
Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h  (revision 273940)
+++ gcc/config/rs6000/rs6000.h  (revision 273941)
@@ -77,6 +77,20 @@
 #define PPC405_ERRATUM77 0
 #endif
 
+#ifndef SUBTARGET_DRIVER_SELF_SPECS
+# define SUBTARGET_DRIVER_SELF_SPECS ""
+#endif
+
+/* Only for use in the testsuite: -mdejagnu-cpu= simply overrides -mcpu=.
+   With older versions of Dejagnu the command line arguments you set in
+   RUNTESTFLAGS override those set in the testcases; with this option,
+   the testcase will always win.  Ditto for -mdejagnu-tune=.  */
+#define DRIVER_SELF_SPECS \
+  "%{mdejagnu-cpu=*: %

Re: [PATCH] Implement "P0631R4 Math Constants" for C++20

2019-07-31 Thread Marc Glisse

On Wed, 31 Jul 2019, Jonathan Wakely wrote:


The values of the constants are taken from Glibc where the equivalent
constant exists, or by rounding the actual constant to the same number
of digits as the Glibc constants have.


How does it behave with __float128? I think that with -std=gnu++2a, it 
counts as a floating point type, but the constant first becomes a long 
double and thus loses precision.


--
Marc Glisse


Re: [PATCH][AArch64] Fix PR81800

2019-07-31 Thread Wilco Dijkstra
ping

 
PR81800 is about the lrint inline giving spurious FE_INEXACT exceptions.
 The previous change for PR81800 didn't fix this: when lrint is disabled
 in the backend, the midend will simply use llrint.  This actually makes
 things worse since llrint now also ignores FE_INVALID exceptions!
 The fix is to disable lrint/llrint on double if the size of a long is
 smaller (ie. ilp32).
 
 Passes regress and bootstrap on AArch64. OK for commit?
 
 ChangeLog
 2018-11-13  Wilco Dijkstra    
 
     gcc/
     PR target/81800
     * gcc/config/aarch64/aarch64.md (lrint): Disable lrint pattern if GPF
     operand is larger than a long int.
 
     testsuite/
     PR target/81800
     * gcc.target/aarch64/no-inline-lrint_3.c: New test.
 
 --
 
 diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
 index 
5a1894063a1ed2db1cc947c9c449d48808ed96ae..f08cd0930b3fc6527fbca218ad3c464f1ead0103
 100644
 --- a/gcc/config/aarch64/aarch64.md
 +++ b/gcc/config/aarch64/aarch64.md
 @@ -6304,7 +6304,7 @@ (define_expand "lrint2"
    [(match_operand:GPI 0 "register_operand")
     (match_operand:GPF 1 "register_operand")]
    "TARGET_FLOAT
 -   && ((GET_MODE_SIZE (mode) <= GET_MODE_SIZE (mode))
 +   && ((GET_MODE_BITSIZE (mode) <= LONG_TYPE_SIZE)
     || !flag_trapping_math || flag_fp_int_builtin_inexact)"
  {
    rtx cvt = gen_reg_rtx (mode);
 diff --git a/gcc/testsuite/gcc.target/aarch64/no-inline-lrint_3.c 
b/gcc/testsuite/gcc.target/aarch64/no-inline-lrint_3.c
 new file mode 100644
 index 
..ca772cb999e7b6cfbd3f080111d3eb479d43f47b
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/aarch64/no-inline-lrint_3.c
 @@ -0,0 +1,17 @@
 +/* { dg-do compile } */
 +/* { dg-require-effective-target ilp32 } */
 +/* { dg-options "-O3 -fno-math-errno -fno-fp-int-builtin-inexact" } */
 +
 +#define TEST(name, float_type, int_type, fn) void f_##name (float_type x) \
 +{    \
 +  volatile int_type   b = __builtin_##fn (x);    \
 +}
 +
 +TEST (dld, double, long, lrint)
 +TEST (flf, float , long, lrintf)
 +
 +TEST (did, double, int, lrint)
 +TEST (fif, float , int, lrintf)
 +
 +/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\]+, \[d,s\]\[0-9\]+" 2 } 
} */
 +/* { dg-final { scan-assembler-times "bl\tlrint" 2 } } */
 
 

Re: [PATCH][AArch64] Fix symbol offset limit

2019-07-31 Thread Wilco Dijkstra
ping
   
 
In aarch64_classify_symbol symbols are allowed full-range offsets on 
relocations. 
 This means the offset can use all of the +/-4GB offset, leaving no offset 
available
 for the symbol itself.  This results in relocation overflow and link-time 
errors
 for simple expressions like _char + 0xff00.
 
 To avoid this, limit the offset to +/-1MB so that the symbol needs to be 
within a
 3.9GB offset from its references.  For the tiny code model use a 64KB offset, 
allowing
 most of the 1MB range for code/data between the symbol and its references.
 
 Bootstrapped on AArch64, passes regress, OK for commit?
 
 ChangeLog:
 2018-11-09  Wilco Dijkstra  
 
     gcc/
     * config/aarch64/aarch64.c (aarch64_classify_symbol):
     Apply reasonable limit to symbol offsets.
 
     testsuite/
     * gcc.target/aarch64/symbol-range.c (foo): Set new limit.
     * gcc.target/aarch64/symbol-range-tiny.c (foo): Likewise.
 
 --
 
 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
 index 
83453d03095018eddd1801e71ef3836849267444..0023cb37bbae5afe9387840c1bb6b43586d4fac2
 100644
 --- a/gcc/config/aarch64/aarch64.c
 +++ b/gcc/config/aarch64/aarch64.c
 @@ -13047,26 +13047,26 @@ aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
   the offset does not cause overflow of the final address.  But
   we have no way of knowing the address of symbol at compile time
   so we can't accurately say if the distance between the PC and
 -    symbol + offset is outside the addressible range of +/-1M in the
 -    TINY code model.  So we rely on images not being greater than
 -    1M and cap the offset at 1M and anything beyond 1M will have to
 -    be loaded using an alternative mechanism.  Furthermore if the
 -    symbol is a weak reference to something that isn't known to
 -    resolve to a symbol in this module, then force to memory.  */
 +    symbol + offset is outside the addressible range of +/-1MB in the
 +    TINY code model.  So we limit the maximum offset to +/-64KB and
 +    assume the offset to the symbol is not larger than +/-(1MB - 
64KB).
 +    Furthermore force to memory if the symbol is a weak reference to
 +    something that doesn't resolve to a symbol in this module.  */
    if ((SYMBOL_REF_WEAK (x)
     && !aarch64_symbol_binds_local_p (x))
 - || !IN_RANGE (offset, -1048575, 1048575))
 + || !IN_RANGE (offset, -0x1, 0x1))
  return SYMBOL_FORCE_TO_MEM;
 +
    return SYMBOL_TINY_ABSOLUTE;
  
  case AARCH64_CMODEL_SMALL:
    /* Same reasoning as the tiny code model, but the offset cap here is
 -    4G.  */
 +    1MB, allowing +/-3.9GB for the offset to the symbol.  */
    if ((SYMBOL_REF_WEAK (x)
     && !aarch64_symbol_binds_local_p (x))
 - || !IN_RANGE (offset, HOST_WIDE_INT_C (-4294967263),
 -   HOST_WIDE_INT_C (4294967264)))
 + || !IN_RANGE (offset, -0x10, 0x10))
  return SYMBOL_FORCE_TO_MEM;
 +
    return SYMBOL_SMALL_ABSOLUTE;
  
  case AARCH64_CMODEL_TINY_PIC:
 diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c 
b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
 index 
d7e46b059e41f2672b3a1da5506fa8944e752e01..d49ff4dbe5786ef6d343d2b90052c09676dd7fe5
 100644
 --- a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
 +++ b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
 @@ -1,12 +1,12 @@
 -/* { dg-do compile } */
 +/* { dg-do link } */
  /* { dg-options "-O3 -save-temps -mcmodel=tiny" } */
  
 -int fixed_regs[0x0020];
 +char fixed_regs[0x0020];
  
  int
 -foo()
 +main ()
  {
 -  return fixed_regs[0x0008];
 +  return fixed_regs[0x000ff000];
  }
  
  /* { dg-final { scan-assembler-not "adr\tx\[0-9\]+, fixed_regs\\\+" } } */
 diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range.c 
b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
 index 
6574cf4310430b847e77ea56bf8f20ef312d53e4..75c87c12f08004c153efc5192e5cfab566c089db
 100644
 --- a/gcc/testsuite/gcc.target/aarch64/symbol-range.c
 +++ b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
 @@ -1,12 +1,12 @@
 -/* { dg-do compile } */
 +/* { dg-do link } */
  /* { dg-options "-O3 -save-temps -mcmodel=small" } */
  
 -int fixed_regs[0x2ULL];
 +char fixed_regs[0x2ULL];
  
  int
 -foo()
 +main ()
  {
 -  return fixed_regs[0x1ULL];
 +  return fixed_regs[0xf000ULL];
  }
  
  /* { dg-final { scan-assembler-not "adrp\tx\[0-9\]+, fixed_regs\\\+" } } */
 
 

Re: [PATCH][AArch64] Increase default function alignment

2019-07-31 Thread Wilco Dijkstra
ping
   
 
With -mcpu=generic the function alignment is currently 8, however almost all
 supported cores prefer 16 or higher, so increase the default to 16:12.
 This gives ~0.2% performance increase on SPECINT2017, while codesize is 0.12%
 larger.
 
 ChangeLog:
 2019-05-31  Wilco Dijkstra  
 
     * config/aarch64/aarch64.c (generic_tunings): Set function alignment 
to 16.
 
 --
 
 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
 index 
0023cb37bbae5afe9387840c1bb6b43586d4fac2..ed1422af6aab5e3c6eeea37ec57e69b64092a0ab
 100644
 --- a/gcc/config/aarch64/aarch64.c
 +++ b/gcc/config/aarch64/aarch64.c
 @@ -693,7 +693,7 @@ static const struct tune_params generic_tunings =
    4, /* memmov_cost  */
    2, /* issue_rate  */
    (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
 -  "8", /* function_align.  */
 +  "16:12", /* function_align.  */
    "4", /* jump_align.  */
    "8", /* loop_align.  */
    2,   /* int_reassoc_width.  */
 
 

Re: [PATCH][ARM] Remove remaining Neon DImode support

2019-07-31 Thread Wilco Dijkstra
ping
   
 
Remove the remaining Neon adddi3, subdi3 and negdi2 patterns.  As a result
 adddi3, subdi3 and negdi2 can now always be expanded early irrespectively of
 whether Neon is available.  Also expand the extenddi patterns at the same
 time.  Several Neon arch attributes are no longer used and removed.
 
 Code generation is improved in all cases, saving another 400-500 instructions
 from the PR77308 testcase (total improvement is over 1700 instructions with
 -mcpu=cortex-a57 -O2).
 
 Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
 
 ChangeLog:
 2019-07-19  Wilco Dijkstra  
 
     * config/arm/arm.md (neon_for_64bits): Remove.
     (avoid_neon_for_64bits): Remove.
     (arm_adddi3): Always split early.
     (arm_subdi3): Always split early.
     (negdi2): Remove Neon expansion.
     (split zero_extend): Split before reload.
     (split sign_extend): Split before reload.
 ---
 
 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index 
10ed70dac4384354c0a2453c5e51a29108c6c062..6d8a5a54997caee0e6956f01018cb5300a9a07e1
 100644
 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -125,7 +125,7 @@ (define_attr "length" ""
  ; arm_arch6.  "v6t2" for Thumb-2 with arm_arch6 and "v8mb" for ARMv8-M
  ; Baseline.  This attribute is used to compute attribute "enabled",
  ; use type "any" to enable an alternative in all cases.
 -(define_attr "arch" 
"any,a,t,32,t1,t2,v6,nov6,v6t2,v8mb,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2,armv6_or_vfpv3,neon"
 +(define_attr "arch" 
"any,a,t,32,t1,t2,v6,nov6,v6t2,v8mb,iwmmxt,iwmmxt2,armv6_or_vfpv3,neon"
    (const_string "any"))
  
  (define_attr "arch_enabled" "no,yes"
 @@ -168,16 +168,6 @@ (define_attr "arch_enabled" "no,yes"
    (match_test "TARGET_THUMB1 && arm_arch8"))
   (const_string "yes")
  
 -    (and (eq_attr "arch" "avoid_neon_for_64bits")
 - (match_test "TARGET_NEON")
 - (not (match_test "TARGET_PREFER_NEON_64BITS")))
 -    (const_string "yes")
 -
 -    (and (eq_attr "arch" "neon_for_64bits")
 - (match_test "TARGET_NEON")
 - (match_test "TARGET_PREFER_NEON_64BITS"))
 -    (const_string "yes")
 -
   (and (eq_attr "arch" "iwmmxt2")
    (match_test "TARGET_REALLY_IWMMXT2"))
   (const_string "yes")
 @@ -450,13 +440,8 @@ (define_expand "adddi3"
  (clobber (reg:CC CC_REGNUM))])]
    "TARGET_EITHER"
    "
 -  if (TARGET_THUMB1)
 -    {
 -  if (!REG_P (operands[1]))
 -    operands[1] = force_reg (DImode, operands[1]);
 -  if (!REG_P (operands[2]))
 -    operands[2] = force_reg (DImode, operands[2]);
 - }
 +  if (TARGET_THUMB1 && !REG_P (operands[2]))
 +    operands[2] = force_reg (DImode, operands[2]);
    "
  )
  
 @@ -465,9 +450,9 @@ (define_insn_and_split "*arm_adddi3"
  (plus:DI (match_operand:DI 1 "arm_general_register_operand" "%0, 0, 
r, 0, r")
   (match_operand:DI 2 "arm_general_adddi_operand"    "r,  0, 
r, Dd, Dd")))
     (clobber (reg:CC CC_REGNUM))]
 -  "TARGET_32BIT && !TARGET_NEON"
 +  "TARGET_32BIT"
    "#"
 -  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
 +  "TARGET_32BIT"
    [(parallel [(set (reg:CC_C CC_REGNUM)
     (compare:CC_C (plus:SI (match_dup 1) (match_dup 2))
   (match_dup 1)))
 @@ -1290,24 +1275,16 @@ (define_expand "subdi3"
  (clobber (reg:CC CC_REGNUM))])]
    "TARGET_EITHER"
    "
 -  if (TARGET_THUMB1)
 -    {
 -  if (!REG_P (operands[1]))
 -    operands[1] = force_reg (DImode, operands[1]);
 -  if (!REG_P (operands[2]))
 -    operands[2] = force_reg (DImode, operands[2]);
 - } 
 -  "
 -)
 +")
  
  (define_insn_and_split "*arm_subdi3"
    [(set (match_operand:DI   0 "arm_general_register_operand" 
"=,,")
  (minus:DI (match_operand:DI 1 "arm_general_register_operand" "0,r,0")
    (match_operand:DI 2 "arm_general_register_operand" 
"r,0,0")))
     (clobber (reg:CC CC_REGNUM))]
 -  "TARGET_32BIT && !TARGET_NEON"
 +  "TARGET_32BIT"
    "#"  ; "subs\\t%Q0, %Q1, %Q2\;sbc\\t%R0, %R1, %R2"
 -  "&& (!TARGET_IWMMXT || reload_completed)"
 +  "TARGET_32BIT"
    [(parallel [(set (reg:CC CC_REGNUM)
     (compare:CC (match_dup 1) (match_dup 2)))
    (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
 @@ -4164,13 +4141,6 @@ (define_expand "negdi2"
   (neg:DI (match_operand:DI 1 "s_register_operand")))
  (clobber (reg:CC CC_REGNUM))])]
    "TARGET_EITHER"
 -  {
 -    if (TARGET_NEON)
 -  {
 -    emit_insn (gen_negdi2_neon (operands[0], operands[1]));
 -   DONE;
 -  }
 -  }
  )
  
  ;; The constraints here are to prevent a *partial* overlap (where %Q0 == %R1).
 @@ -4182,7 +4152,7 @@ (define_insn_and_split "*negdi2_insn"
    "TARGET_32BIT"
    "#"  ; rsbs %Q0, %Q1, #0; rsc %R0, %R1, #0  (ARM)
  ; negs %Q0, %Q1    

[PATCH v2] RISC-V: Raise error on unexpected ISA string at end.

2019-07-31 Thread Maxim Blinov
Hi Martin, thanks for reviewing.

> I would expect the missing quotes around the option to trigger
> a -Wformat-diag warning.  The %<%s%s> should also be flagged by
> the same warning.  Changing the format string as follows should
> avoid the warnings:
> 
>   error_at (loc, "%<-march=%s%>: unexpected ISA string at end: %qs"

I've made the corresponding changes.
tested with RUNTESTFLAGS="riscv.exp"

Thanks,
Maxim

gcc/ChangeLog:
2019-07-31  Maxim Blinov  

* common/config/riscv/riscv-common.c: Check -march string ends
with null.

gcc/testsuite/ChangeLog:
2019-07-31  Maxim Blinov  

* gcc.target/riscv/attribute-10.c: New test.

---
 gcc/common/config/riscv/riscv-common.c| 7 +++
 gcc/testsuite/gcc.target/riscv/attribute-10.c | 6 ++
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-10.c

diff --git a/gcc/common/config/riscv/riscv-common.c 
b/gcc/common/config/riscv/riscv-common.c
index eeb75717db0..a16d6c5b448 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -513,6 +513,13 @@ riscv_subset_list::parse (const char *arch, location_t loc)
   if (p == NULL)
 goto fail;
 
+  if (*p != '\0')
+{
+  error_at (loc, "%<-march=%s%>: unexpected ISA string at end: %qs",
+   arch, p);
+  goto fail;
+}
+
   return subset_list;
 
 fail:
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-10.c 
b/gcc/testsuite/gcc.target/riscv/attribute-10.c
new file mode 100644
index 000..dd817879a67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/attribute-10.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32im_s_sx_unexpectedstring -mabi=ilp32" } */
+int foo()
+{
+}
+/* { dg-error "unexpected ISA string at end:" "" { target { "riscv*-*-*" } } 0 
} */
-- 
2.20.1



[PATCH] Implement "P0631R4 Math Constants" for C++20

2019-07-31 Thread Jonathan Wakely

The values of the constants are taken from Glibc where the equivalent
constant exists, or by rounding the actual constant to the same number
of digits as the Glibc constants have.

P0631R4 Math Constants
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/precompiled/stdc++.h: Include new header.
* include/std/numbers: New header.
* include/std/version (__cpp_lib_math_constants): Define.
* testsuite/26_numerics/numbers/1.cc: New test.
* testsuite/26_numerics/numbers/2.cc: New test.
* testsuite/26_numerics/numbers/3.cc: New test.
* testsuite/26_numerics/numbers/nonfloat_neg.cc: New test.

Tested x86_64-linux, committed to trunk.

The C++20 status table in the docs is getting out of date, we need to
add the new proposals (like this one) to it.


commit 2612c312d60f09ab1e4124c30856e2bb2b5bd76a
Author: Jonathan Wakely 
Date:   Wed Jul 31 16:43:45 2019 +0100

Implement "P0631R4 Math Constants" for C++20

The values of the constants are taken from Glibc where the equivalent
constant exists, or by rounding the actual constant to the same number
of digits as the Glibc constants have.

P0631R4 Math Constants
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/precompiled/stdc++.h: Include new header.
* include/std/numbers: New header.
* include/std/version (__cpp_lib_math_constants): Define.
* testsuite/26_numerics/numbers/1.cc: New test.
* testsuite/26_numerics/numbers/2.cc: New test.
* testsuite/26_numerics/numbers/3.cc: New test.
* testsuite/26_numerics/numbers/nonfloat_neg.cc: New test.

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 742f2c38ad5..3fe80f32cc4 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -57,6 +57,7 @@ std_headers = \
${std_srcdir}/memory \
${std_srcdir}/memory_resource \
${std_srcdir}/mutex \
+   ${std_srcdir}/numbers \
${std_srcdir}/numeric \
${std_srcdir}/optional \
${std_srcdir}/ostream \
diff --git a/libstdc++-v3/include/precompiled/stdc++.h 
b/libstdc++-v3/include/precompiled/stdc++.h
index 477c5c88b36..d62f64b9f6e 100644
--- a/libstdc++-v3/include/precompiled/stdc++.h
+++ b/libstdc++-v3/include/precompiled/stdc++.h
@@ -136,6 +136,9 @@
 #if __cplusplus > 201703L
 #include 
 // #include 
+// #include 
+#include 
+// #include 
 // #include 
 // #include 
 #include 
diff --git a/libstdc++-v3/include/std/numbers b/libstdc++-v3/include/std/numbers
new file mode 100644
index 000..b8e38dd8080
--- /dev/null
+++ b/libstdc++-v3/include/std/numbers
@@ -0,0 +1,142 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file include/numbers
+ *  This is a Standard C++ Library header.
+ */
+
+#ifndef _GLIBCXX_NUMBERS
+#define _GLIBCXX_NUMBERS 1
+
+#pragma GCC system_header
+
+#if __cplusplus > 201703L
+
+#include 
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+/** @defgroup math_constants Mathematical constants
+ *  @ingroup numerics
+ *  @{
+ */
+
+/// Namespace for mathematical constants
+namespace numbers
+{
+#define __cpp_lib_math_constants 201907L
+
+  /// @cond undoc
+  template
+using _Enable_if_floating = enable_if_t, _Tp>;
+  /// @endcond
+
+  /// e
+  template
+inline constexpr _Tp e_v
+  = _Enable_if_floating<_Tp>(2.718281828459045235360287471352662498L);
+
+  /// log_2 e
+  template
+inline constexpr _Tp log2e_v
+  = _Enable_if_floating<_Tp>(1.442695040888963407359924681001892137L);
+
+  /// log_10 e
+  template
+inline constexpr _Tp log10e_v
+  = _Enable_if_floating<_Tp>(0.434294481903251827651128918916605082L);
+
+  /// pi
+  template
+inline constexpr _Tp pi_v
+  = 

Re: [PATCH][ARM] Cleanup DImode shifts

2019-07-31 Thread Wilco Dijkstra
ping
   
 
Like the logical operations, expand all shifts early rather than only
 sometimes.  The Neon shift expansions are never emitted (not even with
 -fneon-for-64bits), so they are not useful.  So all the late expansions
 and Neon shift patterns can be removed, and shifts are more optimized
 as a result.  Since some extend patterns use Neon DImode shifts, remove
 the Neon extend variants and related splits.
 
 A simple example (relying on [1]) generates the same efficient code after
 this patch with -mfpu=neon and -mfpu=vfp (previously just the fact of
 having Neon enabled resulted inefficient code for no reason).
 
 unsigned long long f(unsigned long long x, unsigned long long y)
 { return x & (y >> 33); }
 
 Before:
     strd    r4, r5, [sp, #-8]!
     lsr r4, r3, #1
     mov r5, #0
     and r1, r1, r5
     and r0, r0, r4
     ldrd    r4, r5, [sp]
     add sp, sp, #8
     bx  lr
 
 After:
     and r0, r0, r3, lsr #1
     mov r1, #0
     bx  lr
 
 Bootstrap and regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
 
 [1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01301.html
 
 ChangeLog:
 2019-07-19  Wilco Dijkstra  
 
     * config/arm/iterators.md (qhs_extenddi_cstr): Update.
     (qhs_extenddi_cstr): Likewise.
     * config/arm/arm.md (ashldi3): Always expand early.
     (ashlsi3): Likewise.
     (ashrsi3): Likewise.
     (zero_extenddi2): Remove Neon variants.
     (extenddi2): Likewise.
     * config/arm/neon.md (ashldi3_neon_noclobber): Remove.
     (signed_shift_di3_neon): Likewise.
     (unsigned_shift_di3_neon): Likewise.
     (ashrdi3_neon_imm_noclobber): Likewise.
     (lshrdi3_neon_imm_noclobber): Likewise.
     (di3_neon): Likewise.
     (split extend): Remove DI extend split patterns.
 
     testsuite/
     * gcc.target/arm/neon-extend-1.c: Remove test.
     * gcc.target/arm/neon-extend-2.c: Remove test.
 ---
 
 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index 
0dba97a4ebeed0c2133936ca662f1c9e86ffc6ba..10ed70dac4384354c0a2453c5e51a29108c6c062
 100644
 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -3601,44 +3601,14 @@ (define_insn "*satsi__shift"
  (define_expand "ashldi3"
    [(set (match_operand:DI    0 "s_register_operand")
  (ashift:DI (match_operand:DI 1 "s_register_operand")
 -   (match_operand:SI 2 "general_operand")))]
 +   (match_operand:SI 2 "reg_or_int_operand")))]
    "TARGET_32BIT"
    "
 -  if (TARGET_NEON)
 -    {
 -  /* Delay the decision whether to use NEON or core-regs until
 -    register allocation.  */
 -  emit_insn (gen_ashldi3_neon (operands[0], operands[1], operands[2]));
 -  DONE;
 -    }
 -  else
 -    {
 -  /* Only the NEON case can handle in-memory shift counts.  */
 -  if (!reg_or_int_operand (operands[2], SImode))
 -    operands[2] = force_reg (SImode, operands[2]);
 -    }
 -
 -  if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT)
 -    ; /* No special preparation statements; expand pattern as above.  */
 -  else
 -    {
 -  rtx scratch1, scratch2;
 -
 -  /* Ideally we should use iwmmxt here if we could know that operands[1]
 - ends up already living in an iwmmxt register. Otherwise it's
 - cheaper to have the alternate code being generated than moving
 - values to iwmmxt regs and back.  */
 -
 -  /* Expand operation using core-registers.
 -    'FAIL' would achieve the same thing, but this is a bit smarter.  */
 -  scratch1 = gen_reg_rtx (SImode);
 -  scratch2 = gen_reg_rtx (SImode);
 -  arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
 -    operands[2], scratch1, scratch2);
 -  DONE;
 -    }
 -  "
 -)
 +  arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
 +    operands[2], gen_reg_rtx (SImode),
 +    gen_reg_rtx (SImode));
 +  DONE;
 +")
  
  (define_expand "ashlsi3"
    [(set (match_operand:SI    0 "s_register_operand")
 @@ -3661,35 +3631,11 @@ (define_expand "ashrdi3"
   (match_operand:SI 2 "reg_or_int_operand")))]
    "TARGET_32BIT"
    "
 -  if (TARGET_NEON)
 -    {
 -  /* Delay the decision whether to use NEON or core-regs until
 -    register allocation.  */
 -  emit_insn (gen_ashrdi3_neon (operands[0], operands[1], operands[2]));
 -  DONE;
 -    }
 -
 -  if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT)
 -    ; /* No special preparation statements; expand pattern as above.  */
 -  else
 -    {
 -  rtx scratch1, scratch2;
 -
 -  /* Ideally we should use iwmmxt here if we could know that operands[1]
 - ends up already living in an iwmmxt register. Otherwise it's
 - cheaper to have the alternate code being generated than moving
 - values 

Re: [PATCH][ARM] Cleanup logical DImode operations

2019-07-31 Thread Wilco Dijkstra
ping
   
 

 Cleanup the logical DImode operations since the current implementation is way
 too complicated.  Thumb-1, Thumb-2, VFP/Neon and iwMMXt all work differently,
 resulting in a bewildering number of expansions, patterns and splits across
 several md files.  All this complexity is counterproductive and results in
 inefficient code.
 
 A much simpler approach is to split these operations early in the expander
 so that optimizations and register allocation are applied on the 32-bit halves.
 Code generation is unchanged on Thumb-1 and Arm/Thumb-2 without Neon or iwMMXt
 (which already expand these instructions early).  With Neon these changes save
 ~1000 instructions from the PR77308 testcase, mostly by significantly reducing
 register pressure and spilling.
 
 Bootstrap & regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57
 
 OK for commit?
 
 ChangeLog:
 2019-07-18  Wilco Dijkstra  
 
     * config/arm/arm.md (split and/eor/ior): Remove Neon check.
     (split not): Add DImode not splitter.
     (anddi3): Remove pattern.
     (anddi3_insn): Likewise.
     (anddi_zesidi_di): Likewise.
     (anddi_sesdi_di): Likewise.
     (anddi_notdi_di): Likewise.
     (anddi_notzesidi_di): Likewise.
     (anddi_notsesidi_di): Likewise.
     (iordi3): Likewise.
     (iordi3_insn): Likewise.
     (iordi_zesidi_di): Likewise.
     (iordi_sesidi_di): Likewise.
     (xordi3): Likewise.
     (xordi3_insn): Likewise.
     (xordi_sesidi_di): Likewise.
     (xordi_zesidi_di): Likewise.
     (one_cmpldi2): Likewise.
     (one_cmpldi2_insn): Likewise.
     * config/arm/constraints.md: Remove De, Df, Dg constraints.
     * config/arm/iwmmxt.md (iwmmxt_iordi3): Remove general register
     alternative.
     (iwmmxt_xordi3): Likewise.
     (iwmmxt_anddi3): Likewise.
     * config/arm/neon.md (orndi3_neon): Remove pattern.
     (anddi_notdi_di): Likewise.
     * config/arm/predicates.md (arm_anddi_operand_neon): Remove.
     (arm_iordi_operand_neon): Likewise.
     (arm_xordi_operand_neon): Likewise.
     * config/arm/thumb2.md(iordi_notdi_di): Remove pattern.
     (iordi_notzesidi_di): Likewise.
     (iordi_notdi_zesidi): Likewise.
     (iordi_notsesidi_di): Likewise.
 
 
 ---
 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index 
8f4a4c26ea849a023f2e63d2efbf327423512dfc..cab59c403b777c37c1e412ab9a69db2c2ec533a2
 100644
 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -2183,19 +2183,16 @@ (define_expand "divdf3"
    "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE"
    "")
   
 -;; Boolean and,ior,xor insns
  
 -;; Split up double word logical operations
 -
 -;; Split up simple DImode logical operations.  Simply perform the logical
 +;; Split DImode and, ior, xor operations.  Simply perform the logical
  ;; operation on the upper and lower halves of the registers.
 +;; This is needed for atomic operations in arm_split_atomic_op.
  (define_split
    [(set (match_operand:DI 0 "s_register_operand" "")
  (match_operator:DI 6 "logical_binary_operator"
    [(match_operand:DI 1 "s_register_operand" "")
     (match_operand:DI 2 "s_register_operand" "")]))]
    "TARGET_32BIT && reload_completed
 -   && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))
     && ! IS_IWMMXT_REGNUM (REGNO (operands[0]))"
    [(set (match_dup 0) (match_op_dup:SI 6 [(match_dup 1) (match_dup 2)]))
     (set (match_dup 3) (match_op_dup:SI 6 [(match_dup 4) (match_dup 5)]))]
 @@ -2210,167 +2207,20 @@ (define_split
    }"
  )
  
 +;; Split DImode not (needed for atomic operations in arm_split_atomic_op).
  (define_split
 -  [(set (match_operand:DI 0 "s_register_operand" "")
 -   (match_operator:DI 6 "logical_binary_operator"
 - [(sign_extend:DI (match_operand:SI 2 "s_register_operand" ""))
 -  (match_operand:DI 1 "s_register_operand" "")]))]
 -  "TARGET_32BIT && reload_completed"
 -  [(set (match_dup 0) (match_op_dup:SI 6 [(match_dup 1) (match_dup 2)]))
 -   (set (match_dup 3) (match_op_dup:SI 6
 -   [(ashiftrt:SI (match_dup 2) (const_int 31))
 -    (match_dup 4)]))]
 -  "
 -  {
 -    operands[3] = gen_highpart (SImode, operands[0]);
 -    operands[0] = gen_lowpart (SImode, operands[0]);
 -    operands[4] = gen_highpart (SImode, operands[1]);
 -    operands[1] = gen_lowpart (SImode, operands[1]);
 -    operands[5] = gen_highpart (SImode, operands[2]);
 -    operands[2] = gen_lowpart (SImode, operands[2]);
 -  }"
 -)
 -
 -;; The zero extend of operand 2 means we can just copy the high part of
 -;; operand1 into operand0.
 -(define_split
 -  [(set (match_operand:DI 0 "s_register_operand" "")
 -   (ior:DI
 - (zero_extend:DI (match_operand:SI 2 "s_register_operand" ""))
 - (match_operand:DI 1 "s_register_operand" "")))]
 -  "TARGET_32BIT && operands[0] != operands[1] && reload_completed"
 

Re: [PATCH] RISC-V: Raise error on unexpected ISA string at end.

2019-07-31 Thread Martin Sebor

On 7/31/19 5:19 AM, Maxim Blinov wrote:

This patch adds the same check that is present in binutils/bfd/elfxx-riscv.c.

Checks that we have reached the end of the string after all the
parsing routines have been run. Without this check, GCC silently
succeeds on erroneous input, and produces an assembly with a truncated
arch attribute.

tested with RUNTESTFLAGS="riscv.exp"

Thanks,
Maxim

gcc/ChangeLog:
2019-07-31  Maxim Blinov  

* common/config/riscv/riscv-common.c: Check -march string ends
with null.

gcc/testsuite/ChangeLog:
2019-07-31  Maxim Blinov  

* gcc.target/riscv/attribute-10.c: New test.

---
  gcc/common/config/riscv/riscv-common.c| 7 +++
  gcc/testsuite/gcc.target/riscv/attribute-10.c | 6 ++
  2 files changed, 13 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-10.c

diff --git a/gcc/common/config/riscv/riscv-common.c 
b/gcc/common/config/riscv/riscv-common.c
index eeb75717db0..64a309241da 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -513,6 +513,13 @@ riscv_subset_list::parse (const char *arch, location_t loc)
if (p == NULL)
  goto fail;
  
+  if (*p != '\0')

+{
+  error_at (loc, "-march=%s: unexpected ISA string at end: %<%s%>",


I would expect the missing quotes around the option to trigger
a -Wformat-diag warning.  The %<%s%s> should also be flagged by
the same warning.  Changing the format string as follows should
avoid the warnings:

  error_at (loc, "%<-march=%s%>: unexpected ISA string at end: %qs"

Thanks
Martin


+   arch, p);
+  goto fail;
+}
+
return subset_list;
  
  fail:

diff --git a/gcc/testsuite/gcc.target/riscv/attribute-10.c 
b/gcc/testsuite/gcc.target/riscv/attribute-10.c
new file mode 100644
index 000..dd817879a67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/attribute-10.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32im_s_sx_unexpectedstring -mabi=ilp32" } */
+int foo()
+{
+}
+/* { dg-error "unexpected ISA string at end:" "" { target { "riscv*-*-*" } } 0 
} */





Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Martin Liška
On 7/31/19 12:01 PM, Martin Liška wrote:
> Anyway, the auto-detection of jobserver seems very ugly to me :/
> I tend to remove the support and start working on a proper implementation
> that can be used not only by make build system.

Can you Jakub test the attached patch please?
Do it reach a ulimit limit on your testing environment?

Thanks,
Martin
>From 2ccefa19d40d136ea04909a2c2b9215ab5362897 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 31 Jul 2019 16:57:45 +0200
Subject: [PATCH] -flto: do not use jobserver automatically

gcc/ChangeLog:

2019-07-31  Martin Liska  

	* lto-wrapper.c (jobserver_active_p): Remove.
	(run_gcc): Only set auto_parallel and do not
	detect jobserverver.
---
 gcc/lto-wrapper.c | 29 +
 1 file changed, 1 insertion(+), 28 deletions(-)

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 353187c6043..9a40a366f2b 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1217,30 +1217,6 @@ init_num_threads (void)
 
 /* FIXME: once using -std=c11, we can use std::thread::hardware_concurrency.  */
 
-/* Return true when a jobserver is running and can accept a job.  */
-
-static bool
-jobserver_active_p (void)
-{
-  const char *makeflags = getenv ("MAKEFLAGS");
-  if (makeflags == NULL)
-return false;
-
-  const char *needle = "--jobserver-auth=";
-  const char *n = strstr (makeflags, needle);
-  if (n == NULL)
-return false;
-
-  int rfd = -1;
-  int wfd = -1;
-
-  return ((sscanf(n, "--jobserver-auth=%d,%d", , ) == 2)
-	  && rfd > 0
-	  && wfd > 0
-	  && fcntl (rfd, F_GETFD) >= 0
-	  && fcntl (wfd, F_GETFD) >= 0);
-}
-
 /* Execute gcc. ARGC is the number of arguments. ARGV contains the arguments. */
 
 static void
@@ -1425,11 +1401,8 @@ run_gcc (unsigned argc, char *argv[])
 }
   else if (!jobserver && parallel)
 {
-  /* If there's no explicit usage of jobserver and
-	 parallel is enabled, then automatically detect
-	 jobserver or number of cores.  */
+  /* Detect number of jobs automatically.  */
   auto_parallel = 1;
-  jobserver = jobserver_active_p ();
 }
 
   if (linker_output)
-- 
2.22.0



Re: [PATCH 1/3] C++20 constexpr lib part 1/3

2019-07-31 Thread Ed Smith-Rowland via gcc-patches

Here is the patch for

* Implement C++20 p0202 - Add constexpr Modifiers to Functions in 
 and  Headers.


* Implement C++20 p1023 - constexpr comparison operators for std::array.

Relative to the last effort it is rebased on more recent trunk and I 
added to .


There's some chance that I'll have to tweak the macros after the draft 
comes in but I'd like to get this moving.?? I've got other chunks of 
constexpr lib coming.?? This passes C++20 testing onx86_64-linux.


Ok?


2019-07-31  Edward Smith-Rowland  <3dw...@verizon.net>

Implement C++20 p0202 - Add Constexpr Modifiers to Functions
in  and  Headers.
Implement C++20 p1023 - constexpr comparison operators for std::array.
* include/bits/algorithmfwd.h (all_of, any_of, binary_search, copy,
copy_backward, copy_if, copy_n, equal_range, fill, find_end,
find_if_not, includes, is_heap, is_heap_until, is_partitioned,
is_permutation, is_sorted, is_sorted_until, iter_swap, lower_bound,
none_of, partition_copy, partition_point, remove, remove_if,
remove_copy, remove_copy_if, replace_copy, replace_copy_if,
reverse_copy, rotate_copy, uunique, upper_bound, adjacent_find, count,
count_if, equal, find, find_first_of, find_if, for_each, generate,
generate_n, lexicographical_compare, merge, mismatch, replace,
replace_if, search, search_n, set_difference, set_intersection,
set_symmetric_difference, set_union, transform, unique_copy):
Mark constexpr.
* include/bits/cpp_type_traits.h (__miter_base): Mark constexpr.
* include/bits/predefined_ops.h (_Iter_less_val::operator(),
_Val_less_iter::operator(), _Iter_equal_to_iter::operator(),
_Iter_equal_to_val::operator(), _Iter_equals_val::operator()):
 Use const ref instead of ref arg;
(_Iter_less_val, __iter_less_val, _Val_less_iter, __val_less_iter,
__iter_equal_to_iter, __iter_equal_to_val, __iter_comp_val,
_Iter_comp_val, _Val_comp_iter, __val_comp_iter, __iter_equals_val,
_Iter_equals_iter, __iter_comp_iter, _Iter_pred, __pred_iter,
_Iter_comp_to_val, __iter_comp_val, _Iter_comp_to_iter,
__iter_comp_iter): Mark constexpr.
* include/bits/stl_algo.h (__find_if, __find_if_not, __find_if_not_n,
__search, __search_n_aux, __search_n, __find_end, find_end, all_of,
none_of, any_of, find_if_not, is_partitioned, partition_point,
__remove_copy_if, remove_copy, remove_copy_if, copy_if, __copy_n,
copy_n, partition_copy, __remove_if, remove, remove_if, __adjacent_find,
__unique, unique, __unique_copy, reverse_copy, rotate_copy,
__unguarded_linear_insert, __insertion_sort, __unguarded_insertion_sort,
__final_insertion_sort, lower_bound, __upper_bound, upper_bound,
__equal_range, equal_range, binary_search, __includes, includes,
__next_permutation, __prev_permutation, __replace_copy_if, replace_copy,
replace_copy_if, __count_if, is_sorted, __is_sorted_until,
is_sorted_until, __is_permutation, is_permutation, for_each, find,
find_if, find_first_of, adjacent_find, count, count_if, search,
search_n, transform, replace, replace_if, generate, generate_n,
unique_copy, __merge, merge, __set_union, set_union, __set_intersection,
set_intersection, __set_difference, set_difference,
__set_symmetric_difference, set_symmetric_difference):  Mark constexpr.
* include/bits/stl_algobase.h (__memmove, __memcmp): New maybe constexpr
wrappers around __builtin_memmove and __builtin_memcmp
respectively;
(__niter_base, __niter_wrap, __copy_m, __copy_move_a, __copy_move_a2,
copy, move, __copy_move_b, __copy_move_backward_a,
__copy_move_backward_a2, copy_backward, move_backward, __fill_a, fill,
__fill_n_a, fill_n, equal, __lc_rai::__newlast1, __lc_rai::__cnd2,
__lexicographical_compare_impl, __lexicographical_compare,
__lexicographical_compare::__lc, __lexicographical_compare_aux,
__lower_bound, lower_bound, equal, __equal4, lexicographical_compare,
__mismatch, mismatch, __is_heap_until, __is_heap, is_heap_until,
is_heap): Mark constexpr.
* include/bits/stl_heap.h (__is_heap_until, __is_heap, is_heap_until,
is_heap): Mark constexpr.
* include/bits/stl_iterator.h (__niter_base, __miter_base): Mark 
constexpr.
* include/std/array: Make comparison ops constexpr.
* include/std/utility: Make exchange constexpr.
* include/std/version (__cpp_lib_constexpr): New macro.
* testsuite/23_containers/array/tuple_interface/get_neg.cc: Adjust.
* testsuite/23_containers/array/tuple_interface/
tuple_element_neg.cc: Adjust.
* testsuite/20_util/exchange/constexpr.cc: New.
* testsuite/23_containers/array/comparison_operators/constexpr.cc: New.
* 

Re: [PATCH] Make BITMAP_WORD more easily configurable

2019-07-31 Thread Martin Sebor

On 7/31/19 4:00 AM, Richard Biener wrote:

On Tue, 30 Jul 2019, Martin Sebor wrote:


On 7/30/19 9:19 AM, Jakub Jelinek wrote:

On Tue, Jul 30, 2019 at 09:16:45AM -0600, Martin Sebor wrote:

Say something like this POC:


We have those already, see ctz_hwi, clz_hwi etc.
The thing is that BITMAP_WORD isn't always the same type as unsigned
HOST_WIDE_INT, and so we need ones with the BITMAP_WORD type.


That sounds like a problem overloading would solve to the benefit
of all clients.  I.e., overload count_trailing_zeros (or whatever
other name) to take integer arguments of all widths (signed and
unsigned) and call the appropriate built-in from each.  That way
we just have to remember one function name and know that it will
be the right choice regardless of the type of the argument.


The problem with clz at least is that you have to make sure
to only allow overloads to exactly match the type of the
operand since semantics change once you do any promotion.
For popcount and ctz that's only an issue for signed arguments,
but still.

And I don't see any optimization of (uint){clzl,popcountl}((ulong)uintvar)
-> {clz,popcount} (uintvar)

unsigned int foo (unsigned int x)
{
   unsigned long y = x;
   unsigned long z = __builtin_popcountl (y);
   return z;
}

calls __popcountdi2 for me.

So if you dislike the macros then we could add
{popcount,clz,ctz}_bitmap_word () functions.

Still having the plumbing in once place looks like an improvement
to me - not sure if your comments were an objection to the original
patch?


Not an objection, just as an encouragement to consider going
the wrapper route you mentioned.  I think that would make
for a more broadly applicable improvement with a more user
friendly interface than what's available today.

Martin


Re: [ARM/FDPIC v5 09/21] [ARM] FDPIC: Add support for taking address of nested function

2019-07-31 Thread Christophe Lyon
On Tue, 16 Jul 2019 at 14:42, Kyrill Tkachov
 wrote:
>
>
> On 7/16/19 12:18 PM, Kyrill Tkachov wrote:
> > Hi Christophe
> >
> > On 5/15/19 1:39 PM, Christophe Lyon wrote:
> > > In FDPIC mode, the trampoline generated to support pointers to nested
> > > functions looks like:
> > >
> > >.word trampoline address
> > >.word trampoline GOT address
> > >ldrr12, [pc, #8]
> > >ldrr9, [pc, #8]
> > >ldr   pc, [pc, #8]
> > >.word static chain value
> > >.word GOT address
> > >.word function's address
> > >
> > > because in FDPIC function pointers are actually pointers to function
> > > descriptors, we have to actually generate a function descriptor for
> > > the trampoline.
> > >
> > > 2019-XX-XX  Christophe Lyon 
> > > Mickaël Guêné 
> > >
> > > gcc/
> > > * config/arm/arm.c (arm_asm_trampoline_template): Add FDPIC
> > > support.
> > > (arm_trampoline_init): Likewise.
> > > (arm_trampoline_init): Likewise.
> > > * config/arm/arm.h (TRAMPOLINE_SIZE): Likewise.
> > >
> > > Change-Id: Idc4d5f629ae4f8d79bdf9623517481d524a0c144
> > >
> > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > > index 40e3f3b..99d13bf 100644
> > > --- a/gcc/config/arm/arm.c
> > > +++ b/gcc/config/arm/arm.c
> > > @@ -3976,13 +3976,50 @@ arm_warn_func_return (tree decl)
> > > .word static chain value
> > > .word function's address
> > > XXX FIXME: When the trampoline returns, r8 will be clobbered.  */
> > > +/* In FDPIC mode, the trampoline looks like:
> > > +  .word trampoline address
> > > +  .word trampoline GOT address
> > > +  ldrr12, [pc, #8] ; #4 for Thumb2
> > > +  ldrr9,  [pc, #8] ; #4 for Thumb2
> > > +  ldr   pc,  [pc, #8] ; #4 for Thumb2
> > > +  .word static chain value
> > > +  .word GOT address
> > > +  .word function's address
> > > +*/
> >
> >
> > I think this comment is not right for Thumb2.
> >
> > These load instructionshave 32-bit encodings, even in Thumb2 (they use
> > high registers).
>
> Andre and Wilco pointed out to me offline that the offset should be #4
> for Arm mode.
>
> The Arm ARM at E1.2.3 says:
>
> PC, the program counter
>
> * When executing an A32 instruction, PC reads as the address of the
> current instruction plus 8.
>
> * When executing a T32 instruction, PC reads as the address of the
> current instruction plus 4.
>

Yes, it looks like the code is right, and the comment is wrong:
- offset 8 for thumb2 mode
- offset 4 for arm mode

Thanks,

Christophe

> Thanks,
>
> Kyrill
>
>
> >
> > Also, please merge this comment with the one above (no separate /**/)
> >
> > >
> > >  static void
> > >  arm_asm_trampoline_template (FILE *f)
> > >  {
> > >fprintf (f, "\t.syntax unified\n");
> > >
> > > -  if (TARGET_ARM)
> > > +  if (TARGET_FDPIC)
> > > +{
> > > +  /* The first two words are a function descriptor pointing to the
> > > +trampoline code just below.  */
> > > +  if (TARGET_ARM)
> > > +   fprintf (f, "\t.arm\n");
> > > +  else if (TARGET_THUMB2)
> > > +   fprintf (f, "\t.thumb\n");
> > > +  else
> > > +   /* Only ARM and Thumb-2 are supported.  */
> > > +   gcc_unreachable ();
> > > +
> > > +  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
> > > +  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
> > > +  /* Trampoline code which sets the static chain register but also
> > > +PIC register before jumping into real code. */
> > > +  asm_fprintf (f, "\tldr\t%r, [%r, #%d]\n",
> > > +  STATIC_CHAIN_REGNUM, PC_REGNUM,
> > > +  TARGET_THUMB2 ? 8 : 4);
> > > +  asm_fprintf (f, "\tldr\t%r, [%r, #%d]\n",
> > > +  PIC_OFFSET_TABLE_REGNUM, PC_REGNUM,
> > > +  TARGET_THUMB2 ? 8 : 4);
> > > +  asm_fprintf (f, "\tldr\t%r, [%r, #%d]\n",
> > > +  PC_REGNUM, PC_REGNUM,
> > > +  TARGET_THUMB2 ? 8 : 4);
> >
> >
> > As above, I think the offset should be 8 for both Arm and Thumb2.
> >
> > Thanks,
> >
> > Kyrill
> >
> >
> > > +  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
> > > +}
> > > +  else if (TARGET_ARM)
> > >  {
> > >fprintf (f, "\t.arm\n");
> > >asm_fprintf (f, "\tldr\t%r, [%r, #0]\n", STATIC_CHAIN_REGNUM,
> > > PC_REGNUM);
> > > @@ -4023,12 +4060,40 @@ arm_trampoline_init (rtx m_tramp, tree fndecl,
> > > rtx chain_value)
> > >emit_block_move (m_tramp, assemble_trampoline_template (),
> > > GEN_INT (TRAMPOLINE_SIZE), BLOCK_OP_NORMAL);
> > >
> > > -  mem = adjust_address (m_tramp, SImode, TARGET_32BIT ? 8 : 12);
> > > -  emit_move_insn (mem, chain_value);
> > > +  if (TARGET_FDPIC)
> > > +{
> > > +  rtx funcdesc = XEXP (DECL_RTL (fndecl), 0);
> > > +

[PATCH] Add Doxygen comments to header

2019-07-31 Thread Jonathan Wakely

* include/std/bit: Add Doxygen comments.

Committed to trunk.

At some point I need to resume working on Doxygen improvements and
switch the Doxygen generation to C++17 mode, so we document all the
new C++17 stuff.


commit c76147753665e66ac8258f9b471c5cb65e28067e
Author: redi 
Date:   Wed Jul 31 14:38:50 2019 +

Add Doxygen comments to  header

* include/std/bit: Add Doxygen comments.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@273938 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/std/bit b/libstdc++-v3/include/std/bit
index f01fcd621e5..914cdfe629a 100644
--- a/libstdc++-v3/include/std/bit
+++ b/libstdc++-v3/include/std/bit
@@ -40,6 +40,17 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+  /**
+   * @defgroup bit_manip Bit manipulation
+   * @ingroup numerics
+   *
+   * Utilities for examining and manipulating individual bits.
+   *
+   * @{
+   */
+
+  /// @cond undoc
+
   template
 constexpr _Tp
 __rotl(_Tp __x, int __s) noexcept
@@ -248,19 +259,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return _Nd - std::__countl_zero(__x);
 }
 
+  /// @endcond
+
 #if __cplusplus > 201703L
 
+  /// @cond undoc
   template
 using _If_is_unsigned_integer
   = enable_if_t<__is_unsigned_integer<_Tp>::value, _Up>;
+  /// @endcond
 
   // [bit.rot], rotating
 
+  /// Rotate `x` to the left by `s` bits.
   template
 [[nodiscard]] constexpr _If_is_unsigned_integer<_Tp>
 rotl(_Tp __x, int __s) noexcept
 { return std::__rotl(__x, __s); }
 
+  /// Rotate `x` to the right by `s` bits.
   template
 [[nodiscard]] constexpr _If_is_unsigned_integer<_Tp>
 rotr(_Tp __x, int __s) noexcept
@@ -268,26 +285,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // [bit.count], counting
 
+  /// The number of contiguous zero bits, starting from the highest bit.
   template
 constexpr _If_is_unsigned_integer<_Tp, int>
 countl_zero(_Tp __x) noexcept
 { return std::__countl_zero(__x); }
 
+  /// The number of contiguous one bits, starting from the highest bit.
   template
 constexpr _If_is_unsigned_integer<_Tp, int>
 countl_one(_Tp __x) noexcept
 { return std::__countl_one(__x); }
 
+  /// The number of contiguous zero bits, starting from the lowest bit.
   template
 constexpr _If_is_unsigned_integer<_Tp, int>
 countr_zero(_Tp __x) noexcept
 { return std::__countr_zero(__x); }
 
+  /// The number of contiguous one bits, starting from the lowest bit.
   template
 constexpr _If_is_unsigned_integer<_Tp, int>
 countr_one(_Tp __x) noexcept
 { return std::__countr_one(__x); }
 
+  /// The number of bits set in `x`.
   template
 constexpr _If_is_unsigned_integer<_Tp, int>
 popcount(_Tp __x) noexcept
@@ -295,21 +317,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // [bit.pow.two], integral powers of 2
 
+  /// True if `x` is a power of two, false otherwise.
   template
 constexpr _If_is_unsigned_integer<_Tp, bool>
 ispow2(_Tp __x) noexcept
 { return std::__ispow2(__x); }
 
+  /// The smallest power-of-two not less than `x`.
   template
 constexpr _If_is_unsigned_integer<_Tp>
 ceil2(_Tp __x) noexcept
 { return std::__ceil2(__x); }
 
+  /// The largest power-of-two not greater than `x`.
   template
 constexpr _If_is_unsigned_integer<_Tp>
 floor2(_Tp __x) noexcept
 { return std::__floor2(__x); }
 
+  /// The smallest integer greater than the base-2 logarithm of `x`.
   template
 constexpr _If_is_unsigned_integer<_Tp>
 log2p1(_Tp __x) noexcept
@@ -326,6 +352,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   };
 #endif // C++2a
 
+  /// @}
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 


Re: [C PATCH] Fix C error-recovery (PR c/91192)

2019-07-31 Thread Marek Polacek
On Tue, Jul 30, 2019 at 09:22:00AM +0200, Jakub Jelinek wrote:
> Hi!
> 
> Neither c_expr_sizeof_expr nor c_expr_sizeof_type bother with filling up
> the locations in c_expr struct they return.  Normally, this isn't a problem,
> as the sole caller of those calls set_c_expr_source_range.  It doesn't call
> it though if we reach CPP_EOF while parsing the sizeof expression.
> Later on when the callers access the location info, it can randomly segfault
> during error-recovery.  The testcase is too obscure with too many errors to
> include IMHO though, and as it only ICEs randomly, I'm not including it.

Makes sense.

> The fix is simple, just initialize the locations to something, doesn't
> matter much exactly to what, this patch uses a range from start to start.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok, thanks.

> 2019-07-30  Jakub Jelinek  
> 
>   PR c/91192
>   * c-parser.c (c_parser_sizeof_expression): Call set_c_expr_source_range
>   even if finish is UNKNOWN_LOCATION, just use start as finish in that
>   case.
> 
> --- gcc/c/c-parser.c.jj   2019-07-19 20:53:42.121228422 +0200
> +++ gcc/c/c-parser.c  2019-07-29 16:54:43.046562282 +0200
> @@ -7477,8 +7477,9 @@ c_parser_sizeof_expression (c_parser *pa
>   error_at (expr_loc, "% applied to a bit-field");
>result = c_expr_sizeof_expr (expr_loc, expr);
>  }
> -  if (finish != UNKNOWN_LOCATION)
> -set_c_expr_source_range (, start, finish);
> +  if (finish == UNKNOWN_LOCATION)
> +finish = start;
> +  set_c_expr_source_range (, start, finish);
>return result;
>  }

Marek


Re: [patch] Add NetBSD/hppa target

2019-07-31 Thread John David Anglin
Committed as revision 273933.

Dave

On 2019-06-14 11:44 a.m., co...@sdf.org wrote:
> This adds netbsd/hppa support. I tested it on the shiny new QEMU-git
> which can now boot NetBSD too :-)
>
> Files are very similar to the linux code.
>
> Please let me know if any changes need to be made.
>
> Matt Thomas 
> Nick Hudson 
> Matthew Green 
> Maya Rashish 
>
> gcc/ChangeLog:
>   config.gcc (hppa*-*-netbsd*): New target.
>   config/pa/pa-netbsd.h: New file.
>   config/pa/pa32-netbsd.h: New file.
>
> libgcc/ChangeLog:
>   config.host (hppa*-*-netbsd*): New case.
>   config/pa/t-netbsd: New file.
>
> ---
>  gcc/config.gcc  |   8 +++
>  gcc/config/pa/pa-netbsd.h   | 137 
>  gcc/config/pa/pa32-netbsd.h |  37 ++
>  libgcc/config.host  |   3 +
>  libgcc/config/pa/t-netbsd   |   9 +++
>  5 files changed, 194 insertions(+)
>  create mode 100644 gcc/config/pa/pa-netbsd.h
>  create mode 100644 gcc/config/pa/pa32-netbsd.h
>  create mode 100644 libgcc/config/pa/t-netbsd
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 76bb316942d..ba93bb41ec8 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1481,6 +1481,14 @@ hppa*-*-openbsd*)
>   gas=yes
>   gnu_ld=yes
>   ;;
> +hppa*-*-netbsd*)
> + target_cpu_default="MASK_PA_11|MASK_NO_SPACE_REGS"
> + tm_file="${tm_file} dbxelf.h elfos.h ${nbsd_tm_file} \
> +  pa/pa-netbsd.h pa/pa32-regs.h pa/pa32-netbsd.h"
> + tmake_file="${tmake_file}"
> + tm_defines="${tm_defines} CHAR_FAST8=1 SHORT_FAST16=1"
> + extra_options="${extra_options} netbsd.opt netbsd-elf.opt"
> + ;;
>  hppa[12]*-*-hpux10*)
>   case ${target} in
>   hppa1.1-*-* | hppa2*-*-*)
> diff --git a/gcc/config/pa/pa-netbsd.h b/gcc/config/pa/pa-netbsd.h
> new file mode 100644
> index 000..88790987561
> --- /dev/null
> +++ b/gcc/config/pa/pa-netbsd.h
> @@ -0,0 +1,137 @@
> +/* Definitions for PA_RISC with ELF format
> +   Copyright (C) 1999-2019 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +
> +#undef TARGET_OS_CPP_BUILTINS
> +#define TARGET_OS_CPP_BUILTINS() \
> +  do \
> +{\
> + NETBSD_OS_CPP_BUILTINS_ELF();   \
> + builtin_assert ("machine=bigendian");   \
> +}\
> +  while (0)
> +
> +#undef CPP_SPEC
> +#define CPP_SPEC NETBSD_CPP_SPEC
> +
> +#undef ASM_SPEC
> +#define ASM_SPEC \
> +  "%{v:-V} %{n} %{T} %{Ym,*} %{Yd,*} %{Wa,*:%*}"
> +
> +#undef EXTRA_SPECS
> +#define EXTRA_SPECS NETBSD_SUBTARGET_EXTRA_SPECS
> +#undef SUBTARGET_EXTRA_SPECS
> +
> +#define NETBSD_ENTRY_POINT "__start"
> +
> +#undef LINK_SPEC
> +#define LINK_SPEC NETBSD_LINK_SPEC_ELF
> +
> +/* NetBSD profiling functions don't need gcc to allocate counters.  */
> +#define NO_DEFERRED_PROFILE_COUNTERS 1
> +
> +/* Define the strings used for the special svr4 .type and .size directives.
> +   These strings generally do not vary from one system running svr4 to
> +   another, but if a given system (e.g. m88k running svr) needs to use
> +   different pseudo-op names for these, they may be overridden in the
> +   file which includes this one.  */
> +
> +#undef STRING_ASM_OP
> +#define STRING_ASM_OP   "\t.stringz\t"
> +
> +#define TEXT_SECTION_ASM_OP "\t.text"
> +#define DATA_SECTION_ASM_OP "\t.data"
> +#define BSS_SECTION_ASM_OP "\t.section\t.bss"
> +
> +#define TARGET_ASM_FILE_START pa_linux_file_start
> +
> +/* We want local labels to start with period if made with asm_fprintf.  */
> +#undef LOCAL_LABEL_PREFIX
> +#define LOCAL_LABEL_PREFIX "."
> +
> +/* Define these to generate the Linux/ELF/SysV style of internal
> +   labels all the time - i.e. to be compatible with
> +   ASM_GENERATE_INTERNAL_LABEL in .  Compare these with the
> +   ones in pa.h and note the lack of dollar signs in these.  FIXME:
> +   shouldn't we fix pa.h to use ASM_GENERATE_INTERNAL_LABEL instead? */
> +
> +#undef ASM_OUTPUT_ADDR_VEC_ELT
> +#define ASM_OUTPUT_ADDR_VEC_ELT(FILE, VALUE) \
> +  fprintf (FILE, "\t.word .L%d\n", VALUE)
> +
> +#undef ASM_OUTPUT_ADDR_DIFF_ELT
> +#define ASM_OUTPUT_ADDR_DIFF_ELT(FILE, BODY, VALUE, REL) \
> +  fprintf (FILE, "\t.word .L%d-.L%d\n", VALUE, REL)
> +
> +/* Use the default.  */
> +#undef 

Re: [PATCH] Do not emit __gnu_lto_v1 symbol.

2019-07-31 Thread Georg-Johann Lay

Martin Liška schrieb:

On 7/29/19 3:46 PM, Georg-Johann Lay wrote:

Hi Martin.

In SVN r273662 you changed a comment in avr.c from __gnu_lto_v1 to 
__gnu_lto_slim without changing the relevant code:


Yes.


  /* __gnu_lto_slim is just a marker for the linker injected by toplev.c.
 There is no need to trigger __do_clear_bss code for them.  */

  if (!STR_PREFIX_P (name, "__gnu_lto"))
avr_need_clear_bss_p = true;


Because now, the only symbol that starts with __gnu_lto is __gnu_lto_slim. 
That's why I adjusted
the comment only to reflect reality.


Oops, sorry for the noise.  Must have had a glitch in my eyes...

Johann



Re: [PATCH] Implement x86 reduc_plus_scal_v8qi (PR tree-optimization/91201)

2019-07-31 Thread Uros Bizjak
On Wed, Jul 31, 2019 at 11:30 AM Jakub Jelinek  wrote:
>
> Hi!
>
> On Wed, Jul 31, 2019 at 10:51:22AM +0200, Uros Bizjak wrote:
> > OK.
>
> Thanks.  This follow-up implements the same for mmx with sse for V8QImode,
> the testcase shows that it is useful too.  The difference is quite large:
>
> -   movq$0, -72(%rsp)
> -   movl$bytes, %eax
> movqbytes(%rip), %xmm0
> +   movl$bytes, %eax
> +   pxor%xmm2, %xmm2
> .p2align 4,,10
> .p2align 3
>  .L2:
> movdqa  %xmm0, %xmm1
> movq8(%rax), %xmm0
> -   movq-72(%rsp), %xmm2
> addq$8, %rax
> paddb   %xmm0, %xmm1
> paddb   %xmm0, %xmm2
> movq%xmm1, -8(%rax)
> -   movq%xmm2, -72(%rsp)
> cmpq$bytes+1016, %rax
> jne .L2
> -   movq-72(%rsp), %rcx
> -   movzbl  -72(%rsp), %eax
> -   movzbl  %ch, %edx
> -   addl%edx, %eax
> -   movq%rcx, %rdx
> -   shrq$16, %rdx
> -   addl%edx, %eax
> -   movq%rcx, %rdx
> -   shrq$24, %rdx
> -   addl%edx, %eax
> -   movq%rcx, %rdx
> -   shrq$32, %rdx
> -   addl%edx, %eax
> -   movq%rcx, %rdx
> -   shrq$40, %rdx
> -   addl%edx, %eax
> -   movq%rcx, %rdx
> -   shrq$48, %rdx
> -   addl%eax, %edx
> -   movq%rcx, %rax
> -   shrq$56, %rax
> -   addl%edx, %eax
> +   pxor%xmm0, %xmm0
> +   movdqa  %xmm2, %xmm3
> +   psadbw  %xmm0, %xmm3
> +   movq%xmm3, %rax

Excellent!

IIRC, there are quite some (integer) named patterns that can be
implemented using TARGET_MMX_WITH_SSE. I'm not at my keyboard right
now, but it looks that horizontal adds can be implemented using the
same approach. I'm glad that TARGET_MMX_WITH_SSE opens such noticeable
optimization opportunities.

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2019-07-31  Jakub Jelinek  
>
> PR tree-optimization/91201
> * config/i386/mmx.md (reduc_plus_scal_v8qi): New expander.
>
> * gcc.target/i386/sse2-pr91201-2.c: New test.

OK.

Thanks,
Uros.

> --- gcc/config/i386/mmx.md.jj   2019-07-20 08:35:05.720255567 +0200
> +++ gcc/config/i386/mmx.md  2019-07-31 08:43:23.054776025 +0200
> @@ -1897,6 +1897,21 @@ (define_insn "mmx_psadbw"
> (set_attr "type" "mmxshft,sseiadd,sseiadd")
> (set_attr "mode" "DI,TI,TI")])
>
> +(define_expand "reduc_plus_scal_v8qi"
> + [(plus:V8QI
> +(match_operand:QI 0 "register_operand")
> +(match_operand:V8QI 1 "register_operand"))]
> + "TARGET_MMX_WITH_SSE"
> +{
> +  rtx tmp = gen_reg_rtx (V8QImode);
> +  emit_move_insn (tmp, CONST0_RTX (V8QImode));
> +  rtx tmp2 = gen_reg_rtx (V1DImode);
> +  emit_insn (gen_mmx_psadbw (tmp2, operands[1], tmp));
> +  tmp2 = gen_lowpart (V8QImode, tmp2);
> +  emit_insn (gen_vec_extractv8qiqi (operands[0], tmp2, const0_rtx));
> +  DONE;
> +})
> +
>  (define_insn_and_split "mmx_pmovmskb"
>[(set (match_operand:SI 0 "register_operand" "=r,r")
> (unspec:SI [(match_operand:V8QI 1 "register_operand" "y,x")]
> --- gcc/testsuite/gcc.target/i386/sse2-pr91201-2.c.jj   2019-07-31 
> 08:45:19.553086849 +0200
> +++ gcc/testsuite/gcc.target/i386/sse2-pr91201-2.c  2019-07-31 
> 08:46:52.556738334 +0200
> @@ -0,0 +1,21 @@
> +/* PR tree-optimization/91201 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O3 -msse2 -mno-sse3" } */
> +/* { dg-final { scan-assembler "\tpsadbw\t" } } */
> +
> +unsigned char bytes[1024];
> +
> +unsigned char
> +sum (void)
> +{
> +  unsigned char r = 0;
> +  unsigned char *p = (unsigned char *) bytes;
> +  int n;
> +
> +  for (n = 8; n < sizeof (bytes); ++n)
> +{
> +  p[n - 8] += p[n];
> +  r += p[n];
> +}
> +  return r;
> +}
>
>
> Jakub


[PATCH] Fix PR91280

2019-07-31 Thread Richard Biener


There's some longer latent issue with PTAs handling of
offsetted MEM_REFs in get_constraint_for_component_ref when there
are subfields involved.  The following solves this by handling
offsetted MEM_REFs manually.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-07-31  Richard Biener  

PR tree-optimization/91280
* tree-ssa-structalias.c (get_constraint_for_component_ref):
Decompose MEM_REF manually for offset handling.

* g++.dg/torture/pr91280.C: New testcase.

Index: gcc/tree-ssa-structalias.c
===
--- gcc/tree-ssa-structalias.c  (revision 273930)
+++ gcc/tree-ssa-structalias.c  (working copy)
@@ -3289,9 +3289,29 @@ get_constraint_for_component_ref (tree t
   return;
 }
 
-  /* Pretend to take the address of the base, we'll take care of
- adding the required subset of sub-fields below.  */
-  get_constraint_for_1 (t, results, true, lhs_p);
+  /* Avoid creating pointer-offset constraints, so handle MEM_REF
+ offsets directly.  Pretend to take the address of the base,
+ we'll take care of adding the required subset of sub-fields below.  */
+  if (TREE_CODE (t) == MEM_REF
+  && !integer_zerop (TREE_OPERAND (t, 0)))
+{
+  poly_offset_int off = mem_ref_offset (t);
+  off <<= LOG2_BITS_PER_UNIT;
+  off += bitpos;
+  poly_int64 off_hwi;
+  if (off.to_shwi (_hwi))
+   bitpos = off_hwi;
+  else
+   {
+ bitpos = 0;
+ bitmaxsize = -1;
+   }
+  get_constraint_for_1 (TREE_OPERAND (t, 0), results, false, lhs_p);
+  do_deref (results);
+}
+  else
+get_constraint_for_1 (t, results, true, lhs_p);
+
   /* Strip off nothing_id.  */
   if (results->length () == 2)
 {
Index: gcc/testsuite/g++.dg/torture/pr91280.C
===
--- gcc/testsuite/g++.dg/torture/pr91280.C  (nonexistent)
+++ gcc/testsuite/g++.dg/torture/pr91280.C  (working copy)
@@ -0,0 +1,223 @@
+// { dg-do compile }
+
+enum { Aligned, RowMajor };
+enum { ReadOnlyAccessors };
+template  struct K {
+  enum { value };
+};
+template  struct traits;
+template  struct traits : traits {};
+struct A {
+  enum { has_write_access, value };
+};
+template  class array {
+public:
+  int operator[](unsigned long p1) { return values[p1]; }
+  int values[n];
+};
+template  struct I;
+template  class = I> class M;
+template  class J;
+template  class N;
+template  class D;
+template  class TensorContractionOp;
+template  class TensorChippingOp;
+class C;
+template 
+struct K> {
+  static const long value = NumDims;
+};
+template 
+struct traits> {
+  typedef IndexType_ Index;
+};
+template  class MakePointer_>
+struct traits>
+: traits {};
+template  struct B { typedef T type; };
+template  class N {
+public:
+  typedef typename traits::Index Index;
+  D m_fn1();
+  template 
+  TensorContractionOp
+  m_fn2(OtherDerived, Dimensions);
+  template  TensorChippingOp<1, Derived> m_fn3(Index);
+};
+template 
+class N : public N {
+public:
+  template  C m_fn4(DeviceType);
+};
+template  struct TensorEvaluator;
+template 
+struct TensorEvaluator, Device> {
+  TensorEvaluator(D, Device);
+};
+template  class D {
+public:
+  typedef typename B::type Nested;
+};
+template 
+struct traits<
+TensorEvaluator,
+Device_>> {
+  typedef Indices_ Indices;
+  typedef LeftArgType_ LeftArgType;
+  typedef RightArgType_ RightArgType;
+  typedef OutputKernelType_ OutputKernelType;
+  typedef Device_ Device;
+};
+template 
+class TensorContractionOp {
+public:
+  typedef typename B::type Nested;
+  typename LhsXprType::Nested m_fn5();
+  typename RhsXprType::Nested m_fn6();
+};
+template  struct TensorContractionEvaluatorBase {
+  typedef typename traits::LeftArgType LeftArgType;
+  typedef typename traits::RightArgType RightArgType;
+  typedef typename traits::Device Device;
+  TensorContractionEvaluatorBase(
+  TensorContractionOp::Indices, LeftArgType,
+  RightArgType,
+  typename traits::OutputKernelType>
+  p1,
+  Device p2)
+  : m_leftImpl(p1.m_fn6(), p2), m_rightImpl(p1.m_fn5(), p2) {
+long nocontract_idx;
+for (int i;; i++) {
+  bool contracting;
+  if (contracting) {
+if (nocontract_idx < K::value)
+  m_j_size = m_j_strides[nocontract_idx];
+nocontract_idx++;
+  }
+}
+  }
+  array m_j_strides;
+  long m_j_size;
+  TensorEvaluator m_leftImpl;
+  TensorEvaluator m_rightImpl;
+};
+template 
+struct TensorEvaluator<
+const TensorContractionOp,
+Device>
+: TensorContractionEvaluatorBase,
+  Device>> {
+  typedef TensorEvaluator Self;
+  typedef TensorContractionEvaluatorBase Base;
+  TensorEvaluator(
+  TensorContractionOp
+  p1,
+  Device p2)
+  : Base(p1, p2) {}
+};
+template 
+struct traits> : traits {};
+template 
+class 

Re: [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

2019-07-31 Thread Richard Earnshaw (lists)




On 30/07/2019 21:51, Bernd Edlinger wrote:

+/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft 
-mabi=aapcs -O3" } */


This isn't going to work as-is, we test many combinations of the 
compiler, either with explicit dejagnu settings or with the compiler 
defaults and the dejagnu settings can't generally be overridden this way.


For -marm you require an effective-target of arm_arm_ok.  For ldrd, it 
should be enough to just require an effective-target of 
arm_ldrd_strd_ok, then you can .


I don't think we really care about any ABIs other than aapcs, so I'd 
just leave that off.  And as for setting the float-abi, I don't see 
anything in the tests that would require that, so that can probably be 
omitted as well.


I think with all this, you can then write something like

/* { dg-require-effective-target arm_arm_ok && arm_ldrd_strd_ok } */
/* { dg-options "-marm -mno-unaligned-access -O3 } */

But I haven't tested that, so you might need to fiddle with it a bit, 
especially the effective-target rule.


R.


[PATCH] Fix PR91293

2019-07-31 Thread Richard Biener


The following fixes wrong-code when SLP ends up swapping operands
of a reduction stmt, confusing code-generation later.  Fixed by
not doing such swapping.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2019-07-31  Richard Biener  

PR tree-optimization/91293
* tree-vect-slp.c (vect_build_slp_tree_2): Do not swap operands
of reduction stmts.

* gcc.dg/vect/pr91293-1.c: New testcase.
* gcc.dg/vect/pr91293-2.c: Likewise.
* gcc.dg/vect/pr91293-3.c: Likewise.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 273930)
+++ gcc/tree-vect-slp.c (working copy)
@@ -1296,6 +1296,9 @@ vect_build_slp_tree_2 (vec_info *vinfo,
  && nops == 2
  && oprnds_info[1]->first_dt == vect_internal_def
  && is_gimple_assign (stmt_info->stmt)
+ /* Swapping operands for reductions breaks assumptions later on.  */
+ && STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def
+ && STMT_VINFO_DEF_TYPE (stmt_info) != vect_double_reduction_def
  /* Do so only if the number of not successful permutes was nor more
 than a cut-ff as re-trying the recursive match on
 possibly each level of the tree would expose exponential
Index: gcc/testsuite/gcc.dg/vect/pr91293-1.c
===
--- gcc/testsuite/gcc.dg/vect/pr91293-1.c   (nonexistent)
+++ gcc/testsuite/gcc.dg/vect/pr91293-1.c   (working copy)
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-additional-options "-msse4.1" { target { sse4_runtime } } } */
+
+long long a;
+unsigned b, c;
+int d = 62;
+void e(long long *f, int p2) { *f = p2; }
+int main()
+{
+  for (int g = 2; g <= d; g++)
+{
+  b += g + 4;
+  c += 5 - g;
+}
+  e(, b);
+  if (a != 2196)
+__builtin_abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/vect/pr91293-2.c
===
--- gcc/testsuite/gcc.dg/vect/pr91293-2.c   (nonexistent)
+++ gcc/testsuite/gcc.dg/vect/pr91293-2.c   (working copy)
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-additional-options "-msse4.1" { target { sse4_runtime } } } */
+
+long long a;
+unsigned b, c;
+int d = 62;
+void e(long long *f, int p2) { *f = p2; }
+int main()
+{
+  for (int g = 2; g <= d; g++)
+{
+  c += 5 - g;
+  b += g + 4;
+}
+  e(, b);
+  if (a != 2196)
+__builtin_abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/vect/pr91293-3.c
===
--- gcc/testsuite/gcc.dg/vect/pr91293-3.c   (nonexistent)
+++ gcc/testsuite/gcc.dg/vect/pr91293-3.c   (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-additional-options "-msse4.1" { target { sse4_runtime } } } */
+
+long long a;
+unsigned b, c;
+int d = 62;
+void e(long long *f, int p2) { *f = p2; }
+int xx = 5, yy = 4;
+int main()
+{
+  for (int g = 2; g <= d; g++)
+{
+  c += xx - g;
+  b += yy + g;
+}
+  e(, b);
+  if (a != 2196)
+__builtin_abort ();
+  return 0;
+}


Re: [PATCH] Come up with json::integer_number and use it in GCOV.

2019-07-31 Thread David Malcolm
On Wed, 2019-07-31 at 10:42 +0200, Martin Liška wrote:
> Hi.
> 
> As seen here:
> https://github.com/RPGillespie6/fastcov/pull/18/files/f184dd8b6fc14e0
> 75dfc0ea93de7be5c96298ddb#r308735088
> 
> GCOV uses json::number for branch count, line count and similar.
> However, the json library
> uses a double as an internal representation for numbers. That's no
> desirable in case
> of GCOV and so that I would like to come up with integer_number type.
> David would you be fine with that?

Thanks for the patch; overall I'm broadly in favor of the idea, but I
think the patch needs a little work.

The JSON standard has a definition for numbers that allows for both
integers and floats, and says this about interoperability:
 
https://tools.ietf.org/html/rfc7159#section-6
https://tools.ietf.org/html/rfc8259#section-6

"This specification allows implementations to set limits on the range
   and precision of numbers accepted.  Since software that implements
   IEEE 754 binary64 (double precision) numbers [IEEE754] is generally
   available and widely used, good interoperability can be achieved by
   implementations that expect no more precision or range than these
   provide, in the sense that implementations will approximate JSON
   numbers within the expected precision."

Hence I chose "double" as the representation.  But, yeah, it's a pain
when dealing with large integers.

[see also "Parsing JSON is a Minefield"
  http://seriot.ch/parsing_json.php#22 ]

Looking at your patch, you convert some places to integer_number, and
some to float_number.

It looks to me like you converted the gcov places you were concerned
about to integer, and kept the "status quo" as floats for the other
ones.  But in all of the places I can see, I think integers make more
sense than floats.

I think we want to preserve the capability to emit floating point json
values, but I suspect we want to use integer values everywhere we're
currently emitting json; it's probably worth going through them line by
line...

> diff --git a/gcc/diagnostic-format-json.cc b/gcc/diagnostic-format-json.cc
> index 53c3b630b1c..2802da8a0a6 100644
> --- a/gcc/diagnostic-format-json.cc
> +++ b/gcc/diagnostic-format-json.cc
> @@ -48,8 +48,8 @@ json_from_expanded_location (location_t loc)
>json::object *result = new json::object ();
>if (exploc.file)
>  result->set ("file", new json::string (exploc.file));
> -  result->set ("line", new json::number (exploc.line));
> -  result->set ("column", new json::number (exploc.column));
> +  result->set ("line", new json::float_number (exploc.line));
> +  result->set ("column", new json::float_number (exploc.column));

These should be integers.


[...snip gcov hunks...]

> diff --git a/gcc/json.cc b/gcc/json.cc
> index 512e2e513b9..bec6fc53cc8 100644
> --- a/gcc/json.cc
> +++ b/gcc/json.cc
> @@ -154,18 +154,31 @@ array::append (value *v)
>m_elements.safe_push (v);
>  }
>  
> -/* class json::number, a subclass of json::value, wrapping a double.  */
> +/* class json::float_number, a subclass of json::value, wrapping a double.  
> */

Would it make more sense to call this "double_number"?  (am not sure)


> -/* Implementation of json::value::print for json::number.  */
> +/* Implementation of json::value::print for json::float_number.  */
>  
>  void
> -number::print (pretty_printer *pp) const
> +float_number::print (pretty_printer *pp) const
>  {
>char tmp[1024];
>snprintf (tmp, sizeof (tmp), "%g", m_value);
>pp_string (pp, tmp);
>  }
>  
> +/* class json::integer_number, a subclass of json::value, wrapping a long.  
> */

Likewise, would it make more sense to call this "long"?

An idea I had reading your patch: would a template be appropriate here,
something like:

template 
class number : public value
{
  enum kind get_kind () const FINAL OVERRIDE;
  T get () const { return m_value; }
  /* etc */
  
  T m_value;
};

with specializations for "double" and "long"?
(hence json::number json::number, and enum values in the
json_kind discriminator>).

I think that a template might be overthinking things, though;
the approach in your patch has the virtue of simplicity.

Is "long" always wide enough for all the integer values we might want
to express on the host?

[...snip...]
 
> -/* Subclass of value for numbers.  */
> +/* Subclass of value for floats.  */

I'd write "floating-point numbers" here (given that JSON standard
talks about "numbers".

[...]

> +/* Subclass of value for integers.  */

Likewise "integer-valued numbers" here, or somesuch.

[...]

> diff --git a/gcc/optinfo-emit-json.cc b/gcc/optinfo-emit-json.cc
> index 1cfcdfe8948..87cc940133a 100644
> --- a/gcc/optinfo-emit-json.cc
> +++ b/gcc/optinfo-emit-json.cc
> @@ -181,7 +181,7 @@ optrecord_json_writer::impl_location_to_json 
> (dump_impl_location_t loc)
>  {
>json::object *obj = new json::object ();
>obj->set ("file", new json::string (loc.m_file));
> -  obj->set ("line", new json::number (loc.m_line));
> +  obj->set ("line", new 

Re: [PATCH 4/5, OpenACC] Allow optional arguments to be used in the use_device OpenACC clause

2019-07-31 Thread Jakub Jelinek
On Mon, Jul 29, 2019 at 10:00:53PM +0100, Kwok Cheung Yeung wrote:
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -11636,9 +11636,40 @@ lower_omp_target (gimple_stmt_iterator *gsi_p,
> omp_context *ctx)
> tkind = GOMP_MAP_FIRSTPRIVATE_INT;
>   type = TREE_TYPE (ovar);
>   if (TREE_CODE (type) == ARRAY_TYPE)
> -   var = build_fold_addr_expr (var);
> +   {
> + var = build_fold_addr_expr (var);
> + gimplify_assign (x, var, );
> +   }
>   else
> {
> + tree opt_arg_label;
> + bool optional_arg_p
> +   = TREE_CODE (TREE_TYPE (ovar)) == POINTER_TYPE
> + && omp_is_optional_argument (ovar);

This should be also wrapped in ()s for emacs, like:

bool optional_arg_p
  = (TREE_CODE (TREE_TYPE (ovar)) == POINTER_TYPE
 && omp_is_optional_argument (ovar));

> +
> + if (optional_arg_p)
> +   {
> + tree null_label
> +   = create_artificial_label (UNKNOWN_LOCATION);
> + tree notnull_label
> +   = create_artificial_label (UNKNOWN_LOCATION);
> + opt_arg_label
> +   = create_artificial_label (UNKNOWN_LOCATION);
> + tree new_x = copy_node (x);

Please use unshare_expr instead of copy_node here.

Otherwise LGTM.

On the OpenMP side this isn't sufficient, because it only
handles mapping clauses, not the data sharing, so I guess I'll need to go
through all data sharing clauses on all constructs, write testcases and see
if what OpenMP spec says (just a general rule):
"If a list item that appears in a directive or clause is an optional dummy 
argument that is not present,
the directive or clause for that list item is ignored.

If the variable referenced inside a construct is an optional dummy argument 
that is not present, any
explicitly determined, implicitly determined, or predetermined data-sharing and 
data-mapping
attribute rules for that variable are ignored. Otherwise, if the variable is an 
optional dummy
argument that is present, it is present inside the construct."
is handled right.

Jakub


Re: [PATCH 2/5, OpenACC] Support Fortran optional arguments in the firstprivate clause

2019-07-31 Thread Jakub Jelinek
On Mon, Jul 29, 2019 at 09:55:52PM +0100, Kwok Cheung Yeung wrote:
> +/* True if OpenMP should treat this DECL as an optional argument.  */
> +
> +bool
> +gfc_omp_is_optional_argument (const_tree decl)
> +{
> +  return DECL_LANG_SPECIFIC (decl)
> +  && GFC_DECL_OPTIONAL_ARGUMENT (decl);

I think this ought to be written as:
  return (DECL_LANG_SPECIFIC (decl)
  && GFC_DECL_OPTIONAL_ARGUMENT (decl));
because otherwise for emacs users (not me) emacs mishandles it.
Also, I wonder if it shouldn't start with TREE_CODE (decl) == PARM_DECL
check, anything else isn't a dummy argument and so can't be optional.

> +}
> +
>  /* True if OpenMP should privatize what this DECL points to rather
> than the DECL itself.  */
> 

Otherwise LGTM.

Jakub


[committed, amdgcn] Remove expcnt waits.

2019-07-31 Thread Andrew Stubbs

I've committed this patch to correct a misunderstanding of the ISA.

The ISA documentation implies that "s_waitcnt expcnt(0)" should be used 
after memory writes, but apparently it really only means this for GDS 
writes (and pixel exports, I think).


The patch simply removes most of these instruction uses. They were 
basically only ever no-ops.


However, in a couple of cases there is an exposed-pipeline issue that 
needs to be resolved with an actual "nop", which we no longer have. The 
patch also takes care of adding these, where appropriate. (As it 
happens, the cmpswap instruction will now get both s_waitcnt and nop, 
which is unnecessary, but that's because I plan to add proper scheduling 
for all the s_waitcnt instructions in the near future, and I don't want 
this detail to get forgotten.)


Andrew Stubbs
Mentor Graphics / CodeSourcery
Remove amdgcn expcnt waits.

2019-07-31  Andrew Stubbs  

	gcc/
	* config/gcn/gcn-valu.md
	(scatter_insn_1offset): Remove s_waitcnt.
	(scatter_insn_1offset_ds): Likewise.
	(scatter_insn_2offsets): Likewise.
	* config/gcn/gcn.c (gcn_md_reorg): Add delayeduse and reads to
	struct ilist. Add nops for delayeduse insns.
	* config/gcn/gcn.md (delayeduse): New attribute.
	(*movbi): Remove s_waitcnt from stores.
	(*mov_insn): Likewise.
	(*movti_insn): Likewise. Add delayeduse attribute.
	(sync_compare_and_swap_insn): Add delayeduse attribute.
	(atomic_store): Remove or adjust s_waitcnt.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 3b41bb37071..4d25eedfc74 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -863,15 +863,12 @@
 if (AS_FLAT_P (as))
   {
 	if (TARGET_GCN5_PLUS)
-	  sprintf (buf, "flat_store%%s2\t%%0, %%2 offset:%%1%s\;"
-		   "s_waitcnt\texpcnt(0)", glc);
+	  sprintf (buf, "flat_store%%s2\t%%0, %%2 offset:%%1%s", glc);
 	else
-	  sprintf (buf, "flat_store%%s2\t%%0, %%2%s\;s_waitcnt\texpcnt(0)",
-		   glc);
+	  sprintf (buf, "flat_store%%s2\t%%0, %%2%s", glc);
   }
 else if (AS_GLOBAL_P (as))
-  sprintf (buf, "global_store%%s2\t%%0, %%2, off offset:%%1%s\;"
-	   "s_waitcnt\texpcnt(0)", glc);
+  sprintf (buf, "global_store%%s2\t%%0, %%2, off offset:%%1%s", glc);
 else
   gcc_unreachable ();
 
@@ -895,7 +892,7 @@
   {
 addr_space_t as = INTVAL (operands[3]);
 static char buf[200];
-sprintf (buf, "ds_write%%b2\t%%0, %%2 offset:%%1%s\;s_waitcnt\texpcnt(0)",
+sprintf (buf, "ds_write%%b2\t%%0, %%2 offset:%%1%s",
 	 (AS_GDS_P (as) ? " gds" : ""));
 return buf;
   }
@@ -929,8 +926,8 @@
 	/* Work around assembler bug in which a 64-bit register is expected,
 	but a 32-bit value would be correct.  */
 	int reg = REGNO (operands[1]) - FIRST_VGPR_REG;
-	sprintf (buf, "global_store%%s3\tv[%d:%d], %%3, %%0 offset:%%2%s\;"
-		  "s_waitcnt\texpcnt(0)", reg, reg + 1, glc);
+	sprintf (buf, "global_store%%s3\tv[%d:%d], %%3, %%0 offset:%%2%s",
+		 reg, reg + 1, glc);
   }
 else
   gcc_unreachable ();
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index c5e069fd266..3ddcc6a2eb3 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -4506,7 +4506,9 @@ gcn_md_reorg (void)
   {
 rtx_insn *insn;
 attr_unit unit;
+attr_delayeduse delayeduse;
 HARD_REG_SET writes;
+HARD_REG_SET reads;
 int age;
   } back[max_waits];
   int oldest = 0;
@@ -4525,6 +4527,7 @@ gcn_md_reorg (void)
 
   attr_type itype = get_attr_type (insn);
   attr_unit iunit = get_attr_unit (insn);
+  attr_delayeduse idelayeduse = get_attr_delayeduse (insn);
   HARD_REG_SET ireads, iwrites;
   CLEAR_HARD_REG_SET (ireads);
   CLEAR_HARD_REG_SET (iwrites);
@@ -4600,6 +4603,14 @@ gcn_md_reorg (void)
 		  (regs, reg_class_contents[(int) VGPR_REGS]))
 		nops_rqd = 2 - prev_insn->age;
 	}
+
+	  /* Store that requires input registers are not overwritten by
+	 following instruction.  */
+	  if ((prev_insn->age + nops_rqd) < 1
+	  && prev_insn->delayeduse == DELAYEDUSE_YES
+	  && ((hard_reg_set_intersect_p
+		   (prev_insn->reads, iwrites
+	nops_rqd = 1 - prev_insn->age;
 	}
 
   /* Insert the required number of NOPs.  */
@@ -4627,7 +4638,9 @@ gcn_md_reorg (void)
   /* Track the current instruction as a previous instruction.  */
   back[oldest].insn = insn;
   back[oldest].unit = iunit;
+  back[oldest].delayeduse = idelayeduse;
   COPY_HARD_REG_SET (back[oldest].writes, iwrites);
+  COPY_HARD_REG_SET (back[oldest].reads, ireads);
   back[oldest].age = 0;
   oldest = (oldest + 1) % max_waits;
 
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 45ffc3e8553..926d0120930 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -285,6 +285,11 @@
 
 (define_attr "laneselect" "yes,no" (const_string "no"))
 
+; Identify instructions that require a "Manually Inserted Wait State" if
+; their inputs are overwritten by subsequent instructions.
+
+(define_attr 

Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Jan Hubicka
> diff --git a/gcc/gcc.c b/gcc/gcc.c
> index a4323eb146e..e43f46246ad 100644
> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -8268,6 +8268,44 @@ driver::maybe_run_linker (const char *argv0) const
>  {
>int tmp = execution_count;
>  
> +  /* Detect jobserver and drop it if it's not working.  */
> +  const char *makeflags = env.get ("MAKEFLAGS");
> +  if (makeflags != NULL)
> + {
> +   const char *needle = "--jobserver-auth=";
> +   const char *n = strstr (makeflags, needle);
> +   if (n != NULL)
> + {
> +   int rfd = -1;
> +   int wfd = -1;
> +
> +   bool jobserver
> + = ((sscanf(n, "--jobserver-auth=%d,%d", , ) == 2)
> +&& rfd > 0
> +&& wfd > 0
> +&& fcntl (rfd, F_GETFD) >= 0
> +&& fcntl (wfd, F_GETFD) >= 0);

Yes, I suppose this is the test whether file descriptors are open that
should work in our context.

Honza
> +
> +   /* Drop the jobserver if it's not working now.  */
> +   if (!jobserver)
> + {
> +   unsigned offset = n - makeflags;
> +   char *dup = xstrdup (makeflags);
> +   dup[offset] = '\0';
> +
> +   const char *space = strchr (makeflags + offset, ' ');
> +   if (space != NULL)
> + strcpy (dup + offset, space);
> +   xputenv (concat ("MAKEFLAGS=", dup, NULL));
> +
> +   // TODO: remove
> +   FILE *f = fopen ("/tmp/111x", "a"); 
> +   fprintf (f, "dropping MAKEFLAGS\n"); 
> +   fclose (f); 
> + }
> + }
> + }
> +
>if (! have_c)
>   {
>  #if HAVE_LTO_PLUGIN > 0
> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
> index 353187c6043..10b74d84af7 100644
> --- a/gcc/lto-wrapper.c
> +++ b/gcc/lto-wrapper.c
> @@ -1430,6 +1430,13 @@ run_gcc (unsigned argc, char *argv[])
>jobserver or number of cores.  */
>auto_parallel = 1;
>jobserver = jobserver_active_p ();
> +
> +FILE *f = fopen ("/tmp/111x", "a"); 
> +fprintf (f, "%d %s\n", jobserver, getenv ("MAKEFLAGS")); 
> +fclose (f); 
> +char buf[1024]; 
> +sprintf (buf, "ls -l /proc/%d/fd >> /tmp/111x", getpid ()); 
> +system (buf); 
>  }
>  
>if (linker_output)



Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-07-31 Thread Richard Biener
On Sat, 27 Jul 2019, Uros Bizjak wrote:

> On Sat, Jul 27, 2019 at 12:07 PM Uros Bizjak  wrote:
> 
> > > How would one write smaxsi3 as a splitter to be split after
> > > reload in the case LRA assigned the GPR alternative?  Is it
> > > even worth doing?  Even the SSE reg alternative can be split
> > > to remove the not needed CC clobber.
> > >
> > > Finally I'm unsure about the add where I needed to place
> > > the SSE alternative before the 2nd op memory one since it
> > > otherwise gets the same cost and wins.
> > >
> > > So - how to go forward with this?
> >
> > Sorry to come a bit late to the discussion.
> >
> > We are aware of CMOV issue for quite some time, but the issue is not
> > understood yet in detail (I was hoping for Intel people to look at
> > this). However, you demonstrated that using PMAX and PMIN  instead of
> > scalar CMOV can bring us big gains, and this thread now deals on how
> > to best implement PMAX/PMIN for scalar code.
> >
> > I think that the way to go forward is with STV infrastructure.
> > Currently, the implementation only deals with DImode on SSE2 32bit
> > targets, but I see no issues on using STV pass also for SImode (on
> > 32bit and 64bit targets). There are actually two STV passes, the first
> > one (currently run on 64bit targets) is run before cse2, and the
> > second (which currently runs on 32bit SSE2 only) is run after combine
> > and before split1 pass. The second pass is interesting to us.
> >
> > The base idea of the second STV pass (for 32bit targets!) is that we
> > introduce a DImode _doubleword instructons that otherwise do not exist
> > with integer registers. Now, the passes up to and including combine
> > pass can use these instructions to simplify and optimize the insn
> > flow. Later, based on cost analysis, STV pass either converts the
> > _doubleword instructions to a real vector ones (e.g. V2DImode
> > patterns) or leaves them intact, and a follow-up split pass splits
> > them into scalar SImode instruction pairs. STV pass also takes care to
> > move and preload values from their scalar form to a vector
> > representation (using SUBREGs). Please note that all this happens on
> > pseudos, and register allocator will later simply use scalar (integer)
> > registers in scalar patterns and vector registers with vector insn
> > patterns.
> >
> > Your approach to amend existing scalar SImode patterns with vector
> > registers will introduce no end of problems. Register allocator will
> > do funny things during register pressure, where values will take a
> > trip to a vector register before being stored to memory (and vice
> > versa, you already found some of them). Current RA simply can't
> > distinguish clearly between two register sets.
> >
> > So, my advice would be to use STV pass also for SImode values, on
> > 64bit and 32bit targets. On both targets, we will be able to use
> > instructions that operate on vector register set, and for 32bit
> > targets (and to some extent on 64bit targets), we would perhaps be
> > able to relax register pressure in a kind of controlled way.
> >
> > So, to demonstrate the benefits of existing STV pass, it should be
> > relatively easy to introduce 64bit max/min pattern on 32bit target to
> > handle 64bit values. For 32bit values, the pass should be re-run to
> > convert SImode scalar operations to vector operations in a controlled
> > way, based on various cost functions.

I've looked at STV before trying to use RA to solve the issue but
quickly stepped away because of its structure which seems to be
tied to particular modes, duplicating things for TImode and DImode
so it looked like I have to write up everything again for SImode...

It really should be possible to run the pass once, handling a set
of modes rather than re-running it for the SImode case I am after.
See also a recent PR about STV slowness and tendency to hog memory
because it seems to enable every DF problem that is around...

> Please find attached patch to see STV in action. The compilation will
> crash due to non-existing V2DImode SMAX insn, but in the _.268r.stv2
> dump, you will be able to see chain building, cost calculation and
> conversion insertion.

So you unconditionally add a smaxdi3 pattern - indeed this looks
necessary even when going the STV route.  The actual regression
for the testcase could also be solved by turing the smaxsi3
back into a compare and jump rather than a conditional move sequence.
So I wonder how you'd do that given that there's pass_if_after_reload
after pass_split_after_reload and I'm not sure we can split
as late as pass_split_before_sched2 (there's also a split _after_
sched2 on x86 it seems).

So how would you go implement {s,u}{min,max}{si,di}3 for the
case STV doesn't end up doing any transform?

You could save me some guesswork here if you can come up with
a reasonably complete final set of patterns (ok, I only care
about smaxsi3) so I can have a look at the STV approach again
(you may remember I simply "split" 

[PATCH] RISC-V: Raise error on unexpected ISA string at end.

2019-07-31 Thread Maxim Blinov
This patch adds the same check that is present in binutils/bfd/elfxx-riscv.c.

Checks that we have reached the end of the string after all the
parsing routines have been run. Without this check, GCC silently
succeeds on erroneous input, and produces an assembly with a truncated
arch attribute.

tested with RUNTESTFLAGS="riscv.exp"

Thanks,
Maxim

gcc/ChangeLog:
2019-07-31  Maxim Blinov  

* common/config/riscv/riscv-common.c: Check -march string ends
with null.

gcc/testsuite/ChangeLog:
2019-07-31  Maxim Blinov  

* gcc.target/riscv/attribute-10.c: New test.

---
 gcc/common/config/riscv/riscv-common.c| 7 +++
 gcc/testsuite/gcc.target/riscv/attribute-10.c | 6 ++
 2 files changed, 13 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-10.c

diff --git a/gcc/common/config/riscv/riscv-common.c 
b/gcc/common/config/riscv/riscv-common.c
index eeb75717db0..64a309241da 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -513,6 +513,13 @@ riscv_subset_list::parse (const char *arch, location_t loc)
   if (p == NULL)
 goto fail;
 
+  if (*p != '\0')
+{
+  error_at (loc, "-march=%s: unexpected ISA string at end: %<%s%>",
+   arch, p);
+  goto fail;
+}
+
   return subset_list;
 
 fail:
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-10.c 
b/gcc/testsuite/gcc.target/riscv/attribute-10.c
new file mode 100644
index 000..dd817879a67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/attribute-10.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32im_s_sx_unexpectedstring -mabi=ilp32" } */
+int foo()
+{
+}
+/* { dg-error "unexpected ISA string at end:" "" { target { "riscv*-*-*" } } 0 
} */
-- 
2.20.1



[PATCH] Make EVRP release unused value-ranges, cleanup lattice swapping

2019-07-31 Thread Richard Biener


This is the last bit I had in my tree - as seen elsewhere you
eventually run into testcases triggering issues, so this fixes
EVRP not releasing any of its temporary ranges it pushes to
the lattice.  It also cleans up the lattice entry swapping
"const correctness" issue by adding a new swap_vr_value
method to the lattice.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-07-31  Richard Biener  

* vr-values.h (vr_values::swap_vr_value): New.
(vr_values::free_value_range): likewise.
* vr-values.c (vr_values::swap_vr_value): Implement.
* gimple-ssa-evrp-analyze.h (evrp_range_analyzer::pop_value_range):
Do not return a range or take a var.
(evrp_range_analyzer::stack): Change back to recording a non-const
value_range *.
* gimple-ssa-evrp-analyze.c
(evrp_range_analyzer::record_ranges_from_stmt): Free unused
value-range.
(evrp_range_analyzer::pop_to_marker): Adjust.
(evrp_range_analyzer::push_value_range): Use new swap_vr_value.
(evrp_range_analyzer::pop_value_range): Likewise.  Free the
no longer needed value-range.

Index: gcc/vr-values.c
===
--- gcc/vr-values.c (revision 273923)
+++ gcc/vr-values.c (working copy)
@@ -4315,6 +4315,8 @@ vr_values::simplify_stmt_using_ranges (g
   return false;
 }
 
+/* Set the lattice entry for VAR to VR.  */
+
 void
 vr_values::set_vr_value (tree var, value_range *vr)
 {
@@ -4323,3 +4325,13 @@ vr_values::set_vr_value (tree var, value
   vr_value[SSA_NAME_VERSION (var)] = vr;
 }
 
+/* Swap the lattice entry for VAR with VR and return the old entry.  */
+
+value_range *
+vr_values::swap_vr_value (tree var, value_range *vr)
+{
+  if (SSA_NAME_VERSION (var) >= num_vr_values)
+return NULL;
+  std::swap (vr_value[SSA_NAME_VERSION (var)], vr);
+  return vr;
+}
Index: gcc/vr-values.h
===
--- gcc/vr-values.h (revision 273923)
+++ gcc/vr-values.h (working copy)
@@ -41,8 +41,9 @@ class vr_values
   ~vr_values (void);
 
   const value_range *get_value_range (const_tree);
-
   void set_vr_value (tree, value_range *);
+  value_range *swap_vr_value (tree, value_range *);
+
   void set_def_to_varying (const_tree);
   void set_defs_to_varying (gimple *);
   bool update_value_range (const_tree, value_range *);
@@ -68,6 +69,8 @@ class vr_values
   /* Allocate a new value_range object.  */
   value_range *allocate_value_range (void)
 { return vrp_value_range_pool.allocate (); }
+  void free_value_range (value_range *vr)
+{ vrp_value_range_pool.remove (vr); }
 
   /* */
   void cleanup_edges_and_switches (void);
Index: gcc/gimple-ssa-evrp-analyze.c
===
--- gcc/gimple-ssa-evrp-analyze.c   (revision 273923)
+++ gcc/gimple-ssa-evrp-analyze.c   (working copy)
@@ -214,7 +214,10 @@ evrp_range_analyzer::record_ranges_from_
old_vr->max ());
  tem.intersect (vrs[i].second);
  if (tem.equal_p (*old_vr))
-   continue;
+   {
+ vr_values->free_value_range (vrs[i].second);
+ continue;
+   }
  push_value_range (vrs[i].first, vrs[i].second);
  if (is_fallthru
  && m_update_global_ranges
@@ -393,7 +396,7 @@ evrp_range_analyzer::pop_to_marker (void
 {
   gcc_checking_assert (!stack.is_empty ());
   while (stack.last ().first != NULL_TREE)
-pop_value_range (stack.last ().first);
+pop_value_range ();
   stack.pop ();
 }
 
@@ -421,17 +424,18 @@ evrp_range_analyzer::push_value_range (t
   dump_value_range (dump_file, vr);
   fprintf (dump_file, "\n");
 }
-  stack.safe_push (std::make_pair (var, get_value_range (var)));
-  vr_values->set_vr_value (var, vr);
+  value_range *old_vr = vr_values->swap_vr_value (var, vr);
+  stack.safe_push (std::make_pair (var, old_vr));
 }
 
-/* Pop the Value Range from the vrp_stack and update VAR with it.  */
+/* Pop a Value Range from the vrp_stack.  */
 
-const value_range *
-evrp_range_analyzer::pop_value_range (tree var)
+void
+evrp_range_analyzer::pop_value_range ()
 {
-  const value_range *vr = stack.last ().second;
-  gcc_checking_assert (var == stack.last ().first);
+  std::pair e = stack.pop ();
+  tree var = e.first;
+  value_range *vr = e.second;
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "popping range for ");
@@ -440,9 +444,9 @@ evrp_range_analyzer::pop_value_range (tr
   dump_value_range (dump_file, vr);
   fprintf (dump_file, "\n");
 }
-  /* We saved off a lattice entry, now give it back - it can now
- be modified again, thus the const casting.  */
-  vr_values->set_vr_value (var, const_cast  (vr));
-  stack.pop ();
-  return vr;
+  /* We saved off a lattice entry, now 

Re: [patch] Extend GIMPLE store merging to throwing stores

2019-07-31 Thread Richard Biener
On Fri, Jul 26, 2019 at 12:56 PM Eric Botcazou  wrote:
>
> Hi,
>
> one of the effects of -fnon-call-exceptions is that the memory accesses are
> considered trapping by default, i.e. unless you can prove otherwise.  If, in
> addition to this, the code is covered by an exception handler, such memory
> accesses are the sources of an EH edge, which means that they end their basic
> block.  This thwarts a fair number of optimizations, especially the local ones
> among which a very badly affected example is GIMPLE store merging.
>
> So the attached patch adds (minimal) support for merging throwing stores when
> -fnon-call-exceptions is enabled and there are active exception handlers in
> the function.  The idea is straightforward (to record the index number of the
> landing pad and merge consecutive stores with the same number) so the meat of
> the patch is technicalities required to first streamline the CFG, then detect
> the candidates throwing stores and finally properly update the CFG at the end.
>
> We have real-world examples for which this helps a lot in Ada, typically when
> you're inlining an elaboration routine (constructor in Ada parlance) into a
> region covered by an exception handler.
>
> Tested on x86_64-suse-linux, OK for the mainline?

+/* Return the index number of the landing pad for STMT, if any.  */
+
+static int
+lp_nr_for_store (gimple *stmt)
+{
+  if (!cfun->can_throw_non_call_exceptions || !cfun->eh)
+return 0;
+
+  if (!stmt_could_throw_p (cfun, stmt))
+return 0;
+
+  return lookup_stmt_eh_lp (stmt);
+}

Did you add the wrapper as compile-time optimization?  That is,
I don't see why simply calling lookup_stmt_eh_lp wouldn't work?

+  /* If the function can throw and catch non-call exceptions, we'll be trying
+ to merge stores across different basic blocks so we need to first unsplit
+ the EH edges in order to streamline the CFG of the function.  */
+  if (cfun->can_throw_non_call_exceptions && cfun->eh)
+{
+  free_dominance_info (CDI_DOMINATORS);
+  maybe_remove_unreachable_handlers ();
+  changed = unsplit_all_eh ();
+  if (changed)
+   delete_unreachable_blocks ();
+}

uh, can unsplitting really result in unreachable blocks or does it
merely forget to delete forwarders it made unreachable?  Removing
unreachable handlers is also to make things match better?  Just
wondering how much of this work we could delay to the first
store-merging opportunity with EH we find (but I don't care too much
about -fnon-call-exceptions).

To isolate the details above maybe move this piece into a helper
in tree-eh.c so you also can avoid exporting unsplit_all_eh?

Otherwise looks OK to me.

Thanks,
Richard.

>
> 2019-07-26  Eric Botcazou  
>
> * tree-eh.h (unsplit_all_eh): Declare.
> * tree-eh.c (maybe_remove_unreachable_handlers): Detect more cases.
> (unsplit_all_eh): Make public.
> * gimple-ssa-store-merging.c: Include cfganal.h cfgcleanup.h except.h.
> (struct store_immediate_info): Add lp_nr field.
> (store_immediate_info::store_immediate_info): Add NR2 parameter and
> initialize lp_nr with it.
> (struct merged_store_group): Add lp_nr and only_constants fields.
> (merged_store_group::merged_store_group): Initialize them.
> (merged_store_group::can_be_merged_into): Deal with them.
> (pass_store_merging): Rename terminate_and_release_chain into
> terminate_and_process_chain.
> (pass_store_merging::terminate_and_process_all_chains): Adjust to 
> above
> renaming and remove useless assertions.
> (pass_store_merging::terminate_all_aliasing_chains): Small tweak.
> (stmts_may_clobber_ref_p): Be prepared for different basic blocks.
> (imm_store_chain_info::coalesce_immediate_stores): Use only_constants
> instead of always recomputing it and compare lp_nr.
> (imm_store_chain_info::output_merged_store): If the group is in an
> active EH region, register new stores if they can throw.  Moreover,
> if the insertion has created new basic blocks, adjust the PHI nodes
> of the post landing pad.
> (imm_store_chain_info::output_merged_stores): If the original stores
> are in an active EH region, deregister them.
> (lhs_valid_for_store_merging_p): Prettify.
> (adjust_bit_pos): New function extracted from...
> (mem_valid_for_store_merging): ...here.  Use it for the base address
> and also for the offset if it is the addition of a constant.
> (lp_nr_for_store): New function.
> (pass_store_merging::process_store): Change return type to bool.
> Call lp_nr_for_store to initialize the store info.  Propagate the
> return status of various called functions to the return value.
> (store_valid_for_store_merging_p): New predicate.
> (enum basic_block_status): New enumeration.
> (get_status_for_store_merging): 

Re: [PATCH] Mark necessary 2nd and later args for delete op.

2019-07-31 Thread Richard Biener
On Wed, Jul 31, 2019 at 10:37 AM Martin Liška  wrote:
>
> On 7/30/19 4:00 PM, Marc Glisse wrote:
> > On Tue, 30 Jul 2019, Martin Liška wrote:
> >
> >> Ah, that's bad, both of them need a care:
> >
> > Yes, that makes more sense to me, thanks.
> >
> >>> I tried to experiment to understand, but it is complicated because 
> >>> including  disables the optimization:
> >>>
> >>> #include 
> >>> void fn1() {
> >>> char*p=new char;
> >>> delete p;
> >>> }
> >>>
> >>> This ICEs with -O -std=c++17:
> >>>
> >>> int a = 64;
> >>> std::align_val_t b{64};
> >>> void fn1() {
> >>>   void *s = operator new(a,b);
> >>>   operator delete(s,8+*(unsigned long*)s,b);
> >>> }
> >>>
> >>>
> >>
> >> I can't see it on current master. Can you?
> >
> > Yes. I just did a clean build to make sure.
> >
>
> That's addressed in a patch I'm attaching.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK.

> Thanks,
> Martin


Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Martin Liška
On 7/31/19 11:12 AM, Jan Hubicka wrote:
>> and see that in some cases jobserver is 0 and in other cases jobserver is 1.
>> Examples of both:
>> 0 kw -j2 --jobserver-auth=3,6 -- 
>> total 0
>> lrwx--. 1 jakub jakub 64 Jul 31 10:41 0 -> /dev/pts/4
>> l-wx--. 1 jakub jakub 64 Jul 31 10:41 1 -> pipe:[75026106]
>> lr-x--. 1 jakub jakub 64 Jul 31 10:41 10 -> 
>> /usr/src/gcc/obj/gcc/libgcc_s.so
>> lr-x--. 1 jakub jakub 64 Jul 31 10:41 13 -> /usr/lib64/libc.so
>> lr-x--. 1 jakub jakub 64 Jul 31 10:41 18 -> 
>> /usr/src/gcc/obj/gcc/libgcc_s.so
>> l-wx--. 1 jakub jakub 64 Jul 31 10:41 2 -> /tmp/ccX4y4r3.le
>> l-wx--. 1 jakub jakub 64 Jul 31 10:41 9 -> pipe:[46167136]
>> 1 kw -j2 --jobserver-auth=3,6 -- 
>> total 0
>> lrwx--. 1 jakub jakub 64 Jul 31 10:41 0 -> /dev/pts/4
>> l-wx--. 1 jakub jakub 64 Jul 31 10:41 1 -> /tmp/cchSvmBt
>> lr-x--. 1 jakub jakub 64 Jul 31 10:41 10 -> /usr/lib64/crtn.o
>> lrwx--. 1 jakub jakub 64 Jul 31 10:41 2 -> /dev/pts/4
>> lr-x--. 1 jakub jakub 64 Jul 31 10:41 3 -> /usr/lib64/crt1.o
>> lr-x--. 1 jakub jakub 64 Jul 31 10:41 4 -> /usr/lib64/crti.o
>> lr-x--. 1 jakub jakub 64 Jul 31 10:41 5 -> 
>> /usr/src/gcc/obj/gcc/crtbegin.o
>> lr-x--. 1 jakub jakub 64 Jul 31 10:41 6 -> 
>> /usr/src/gcc/obj/gcc/testsuite/gcc/c_lto_20081125_0.o
>> lr-x--. 1 jakub jakub 64 Jul 31 10:41 7 -> 
>> /usr/src/gcc/obj/gcc/testsuite/gcc/c_lto_20081125_1.o
>> lr-x--. 1 jakub jakub 64 Jul 31 10:41 8 -> /usr/src/gcc/obj/gcc/crtend.o
>> l-wx--. 1 jakub jakub 64 Jul 31 10:41 9 -> pipe:[46167136]
>>
>> so, if one is lucky enough and at least one of the two file descriptors in
>> MAKEFLAGS --jobserver-auth= is closed, it will work fine, but if they are
>> open, even when they have nothing to do with make, it will fail miserably.
> 
> I see, so the problem is that lto-wrapper is executed via plugin from
> gold which already used the file descriptors for something else?
> 
> This seems a bit of problem since gold opens it before we get to
> lto-plugin (most probably) so i do not see how to reliably detect
> presence of make server.

Thank you Jakub for debugging!

> 
> I think the gcc binary can look for jobserv environment, detect jobserv
> and if it is not available remove jobserver option from it prior going
> to ld.  I think this works since I think link-time gcc can only be
> executed via gcc/g++/gfortran... wrappers.

Do you mean something like what I'm attaching?

Anyway, the auto-detection of jobserver seems very ugly to me :/
I tend to remove the support and start working on a proper implementation
that can be used not only by make build system.

Martin

> 
> (unlike lto plugin invoked implicitly for ar/nm etc).
> 
> Honza
>>
>> As a partial workaround, I wonder if jobserver_active_p couldn't check (with
>> fstat/fstat64) if the file descriptors are pipes.  It will still not be 100%
>> reliable, as when you are unlucky the file descriptor could be some
>> completely unrelated pipe, but at least better than the current state.
>>
>>  Jakub

diff --git a/gcc/gcc.c b/gcc/gcc.c
index a4323eb146e..e43f46246ad 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -8268,6 +8268,44 @@ driver::maybe_run_linker (const char *argv0) const
 {
   int tmp = execution_count;
 
+  /* Detect jobserver and drop it if it's not working.  */
+  const char *makeflags = env.get ("MAKEFLAGS");
+  if (makeflags != NULL)
+	{
+	  const char *needle = "--jobserver-auth=";
+	  const char *n = strstr (makeflags, needle);
+	  if (n != NULL)
+	{
+	  int rfd = -1;
+	  int wfd = -1;
+
+	  bool jobserver
+		= ((sscanf(n, "--jobserver-auth=%d,%d", , ) == 2)
+		   && rfd > 0
+		   && wfd > 0
+		   && fcntl (rfd, F_GETFD) >= 0
+		   && fcntl (wfd, F_GETFD) >= 0);
+
+	  /* Drop the jobserver if it's not working now.  */
+	  if (!jobserver)
+		{
+		  unsigned offset = n - makeflags;
+		  char *dup = xstrdup (makeflags);
+		  dup[offset] = '\0';
+
+		  const char *space = strchr (makeflags + offset, ' ');
+		  if (space != NULL)
+		strcpy (dup + offset, space);
+		  xputenv (concat ("MAKEFLAGS=", dup, NULL));
+
+		  // TODO: remove
+		  FILE *f = fopen ("/tmp/111x", "a"); 
+		  fprintf (f, "dropping MAKEFLAGS\n"); 
+		  fclose (f); 
+		}
+	}
+	}
+
   if (! have_c)
 	{
 #if HAVE_LTO_PLUGIN > 0
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 353187c6043..10b74d84af7 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1430,6 +1430,13 @@ run_gcc (unsigned argc, char *argv[])
 	 jobserver or number of cores.  */
   auto_parallel = 1;
   jobserver = jobserver_active_p ();
+
+FILE *f = fopen ("/tmp/111x", "a"); 
+fprintf (f, "%d %s\n", jobserver, getenv ("MAKEFLAGS")); 
+fclose (f); 
+char buf[1024]; 
+sprintf (buf, "ls -l /proc/%d/fd >> /tmp/111x", getpid ()); 
+system (buf); 
 }
 
   if (linker_output)


Re: [PATCH] Make BITMAP_WORD more easily configurable

2019-07-31 Thread Richard Biener
On Tue, 30 Jul 2019, Martin Sebor wrote:

> On 7/30/19 9:19 AM, Jakub Jelinek wrote:
> > On Tue, Jul 30, 2019 at 09:16:45AM -0600, Martin Sebor wrote:
> > > Say something like this POC:
> > 
> > We have those already, see ctz_hwi, clz_hwi etc.
> > The thing is that BITMAP_WORD isn't always the same type as unsigned
> > HOST_WIDE_INT, and so we need ones with the BITMAP_WORD type.
> 
> That sounds like a problem overloading would solve to the benefit
> of all clients.  I.e., overload count_trailing_zeros (or whatever
> other name) to take integer arguments of all widths (signed and
> unsigned) and call the appropriate built-in from each.  That way
> we just have to remember one function name and know that it will
> be the right choice regardless of the type of the argument.

The problem with clz at least is that you have to make sure
to only allow overloads to exactly match the type of the
operand since semantics change once you do any promotion.
For popcount and ctz that's only an issue for signed arguments,
but still.

And I don't see any optimization of (uint){clzl,popcountl}((ulong)uintvar)
-> {clz,popcount} (uintvar)

unsigned int foo (unsigned int x)
{
  unsigned long y = x;
  unsigned long z = __builtin_popcountl (y);
  return z;
}

calls __popcountdi2 for me.

So if you dislike the macros then we could add 
{popcount,clz,ctz}_bitmap_word () functions.

Still having the plumbing in once place looks like an improvement
to me - not sure if your comments were an objection to the original
patch?

Thanks,
Richard.


Re: [PATCH] Remove also 2nd argument for unused delete operator (PR tree-optimization/91270).

2019-07-31 Thread Richard Biener
On Tue, Jul 30, 2019 at 3:39 PM Martin Liška  wrote:
>
> On 7/30/19 3:09 PM, Marc Glisse wrote:
> > On Tue, 30 Jul 2019, Martin Liška wrote:
> >
> >> On 7/30/19 1:35 PM, Marc Glisse wrote:
> >>> +  /* Some delete operators have size as 2nd argument.  */
> >>>
> >>> Some delete operators have 3 arguments. From cp/decl.c:
> >>>
> >>> /* operator delete (void *, size_t, align_val_t); */
> >>>
> >>
> >> Yep, I know. The patch I installed expects at least 2 arguments:
> >>
> >> + /* Some delete operators have size as 2nd argument.  */
> >> + if (is_delete_operator && gimple_call_num_args (stmt) >= 
> >> 2)
> >
> > True, I guess I am a bit confused why the second argument (which could be 
> > either size or alignment) needs special handling (mark_operand_necessary) 
> > while the third one does not (it is usually a constant).
>
> Ah, that's bad, both of them need a care:
>
> diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
> index bec13cd5930..80d5f5c30f7 100644
> --- a/gcc/tree-ssa-dce.c
> +++ b/gcc/tree-ssa-dce.c
> @@ -824,13 +824,16 @@ propagate_necessity (bool aggressive)
>|| DECL_FUNCTION_CODE (def_callee) == 
> BUILT_IN_CALLOC))
>   || DECL_IS_REPLACEABLE_OPERATOR_NEW_P (def_callee)))
> {
> - /* Some delete operators have size as 2nd argument.  */
> + /* Delete operators can have alignment and (or) size as next
> +arguments.  When being a SSA_NAME, they must be marked
> +as necessary.  */
>   if (is_delete_operator && gimple_call_num_args (stmt) >= 2)
> -   {
> - tree size_argument = gimple_call_arg (stmt, 1);
> - if (TREE_CODE (size_argument) == SSA_NAME)
> -   mark_operand_necessary (size_argument);
> -   }
> +   for (unsigned i = 1; i < gimple_call_num_args (stmt); i++)
> + {
> +   tree arg = gimple_call_arg (stmt, i);
> +   if (TREE_CODE (arg) == SSA_NAME)
> + mark_operand_necessary (arg);
> + }
>
>   continue;
> }

Pre-approved.

> >
> > I tried to experiment to understand, but it is complicated because 
> > including  disables the optimization:
> >
> > #include 
> > void fn1() {
> > char*p=new char;
> > delete p;
> > }
> >
> > This ICEs with -O -std=c++17:
> >
> > int a = 64;
> > std::align_val_t b{64};
> > void fn1() {
> >   void *s = operator new(a,b);
> >   operator delete(s,8+*(unsigned long*)s,b);
> > }
> >
> >
>
> I can't see it on current master. Can you?
>
> Martin
>


[PATCH] Implement x86 reduc_plus_scal_v8qi (PR tree-optimization/91201)

2019-07-31 Thread Jakub Jelinek
Hi!

On Wed, Jul 31, 2019 at 10:51:22AM +0200, Uros Bizjak wrote:
> OK.

Thanks.  This follow-up implements the same for mmx with sse for V8QImode,
the testcase shows that it is useful too.  The difference is quite large:

-   movq$0, -72(%rsp)
-   movl$bytes, %eax
movqbytes(%rip), %xmm0
+   movl$bytes, %eax
+   pxor%xmm2, %xmm2
.p2align 4,,10
.p2align 3
 .L2:
movdqa  %xmm0, %xmm1
movq8(%rax), %xmm0
-   movq-72(%rsp), %xmm2
addq$8, %rax
paddb   %xmm0, %xmm1
paddb   %xmm0, %xmm2
movq%xmm1, -8(%rax)
-   movq%xmm2, -72(%rsp)
cmpq$bytes+1016, %rax
jne .L2
-   movq-72(%rsp), %rcx
-   movzbl  -72(%rsp), %eax
-   movzbl  %ch, %edx
-   addl%edx, %eax
-   movq%rcx, %rdx
-   shrq$16, %rdx
-   addl%edx, %eax
-   movq%rcx, %rdx
-   shrq$24, %rdx
-   addl%edx, %eax
-   movq%rcx, %rdx
-   shrq$32, %rdx
-   addl%edx, %eax
-   movq%rcx, %rdx
-   shrq$40, %rdx
-   addl%edx, %eax
-   movq%rcx, %rdx
-   shrq$48, %rdx
-   addl%eax, %edx
-   movq%rcx, %rax
-   shrq$56, %rax
-   addl%edx, %eax
+   pxor%xmm0, %xmm0
+   movdqa  %xmm2, %xmm3
+   psadbw  %xmm0, %xmm3
+   movq%xmm3, %rax

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-07-31  Jakub Jelinek  

PR tree-optimization/91201
* config/i386/mmx.md (reduc_plus_scal_v8qi): New expander.

* gcc.target/i386/sse2-pr91201-2.c: New test.

--- gcc/config/i386/mmx.md.jj   2019-07-20 08:35:05.720255567 +0200
+++ gcc/config/i386/mmx.md  2019-07-31 08:43:23.054776025 +0200
@@ -1897,6 +1897,21 @@ (define_insn "mmx_psadbw"
(set_attr "type" "mmxshft,sseiadd,sseiadd")
(set_attr "mode" "DI,TI,TI")])
 
+(define_expand "reduc_plus_scal_v8qi"
+ [(plus:V8QI
+(match_operand:QI 0 "register_operand")
+(match_operand:V8QI 1 "register_operand"))]
+ "TARGET_MMX_WITH_SSE"
+{
+  rtx tmp = gen_reg_rtx (V8QImode);
+  emit_move_insn (tmp, CONST0_RTX (V8QImode));
+  rtx tmp2 = gen_reg_rtx (V1DImode);
+  emit_insn (gen_mmx_psadbw (tmp2, operands[1], tmp));
+  tmp2 = gen_lowpart (V8QImode, tmp2);
+  emit_insn (gen_vec_extractv8qiqi (operands[0], tmp2, const0_rtx));
+  DONE;
+})
+
 (define_insn_and_split "mmx_pmovmskb"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
(unspec:SI [(match_operand:V8QI 1 "register_operand" "y,x")]
--- gcc/testsuite/gcc.target/i386/sse2-pr91201-2.c.jj   2019-07-31 
08:45:19.553086849 +0200
+++ gcc/testsuite/gcc.target/i386/sse2-pr91201-2.c  2019-07-31 
08:46:52.556738334 +0200
@@ -0,0 +1,21 @@
+/* PR tree-optimization/91201 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O3 -msse2 -mno-sse3" } */
+/* { dg-final { scan-assembler "\tpsadbw\t" } } */
+
+unsigned char bytes[1024];
+
+unsigned char
+sum (void)
+{
+  unsigned char r = 0;
+  unsigned char *p = (unsigned char *) bytes;
+  int n;
+
+  for (n = 8; n < sizeof (bytes); ++n)
+{
+  p[n - 8] += p[n];
+  r += p[n];
+}
+  return r;
+}


Jakub


[PATCH] Check -shared is available for pr87906's tesstcase

2019-07-31 Thread Kito Cheng
gcc/testsuite/ChangeLog:

* g++.dg/lto/pr87906_0.C: Add dg-require-effective-target shared check.
---
 gcc/testsuite/g++.dg/lto/pr87906_0.C | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/g++.dg/lto/pr87906_0.C 
b/gcc/testsuite/g++.dg/lto/pr87906_0.C
index 434d9fbe142..6a04cd5c6f0 100644
--- a/gcc/testsuite/g++.dg/lto/pr87906_0.C
+++ b/gcc/testsuite/g++.dg/lto/pr87906_0.C
@@ -1,5 +1,6 @@
 // { dg-lto-do link }
 // { dg-require-effective-target fpic }
+// { dg-require-effective-target shared }
 // { dg-lto-options { { -O -fPIC -flto } } }
 // { dg-extra-ld-options "-shared -nostdlib" }
 
-- 
2.17.1



Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Jan Hubicka
> On Wed, Jul 31, 2019 at 11:12:15AM +0200, Jan Hubicka wrote:
> > > so, if one is lucky enough and at least one of the two file descriptors in
> > > MAKEFLAGS --jobserver-auth= is closed, it will work fine, but if they are
> > > open, even when they have nothing to do with make, it will fail miserably.
> > 
> > I see, so the problem is that lto-wrapper is executed via plugin from
> > gold which already used the file descriptors for something else?
> 
> I don't think I'm using gold, I have it installed, but /usr/bin/ld points to
> ld.bfd.

Particular linker implementation probably does not make much pracitcal
difference.  It seems natural that linker opens couple files prior
asking plugin doing its job. 

Honza


Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Jakub Jelinek
On Wed, Jul 31, 2019 at 11:12:15AM +0200, Jan Hubicka wrote:
> > so, if one is lucky enough and at least one of the two file descriptors in
> > MAKEFLAGS --jobserver-auth= is closed, it will work fine, but if they are
> > open, even when they have nothing to do with make, it will fail miserably.
> 
> I see, so the problem is that lto-wrapper is executed via plugin from
> gold which already used the file descriptors for something else?

I don't think I'm using gold, I have it installed, but /usr/bin/ld points to
ld.bfd.

Jakub


Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Jan Hubicka
> and see that in some cases jobserver is 0 and in other cases jobserver is 1.
> Examples of both:
> 0 kw -j2 --jobserver-auth=3,6 -- 
> total 0
> lrwx--. 1 jakub jakub 64 Jul 31 10:41 0 -> /dev/pts/4
> l-wx--. 1 jakub jakub 64 Jul 31 10:41 1 -> pipe:[75026106]
> lr-x--. 1 jakub jakub 64 Jul 31 10:41 10 -> 
> /usr/src/gcc/obj/gcc/libgcc_s.so
> lr-x--. 1 jakub jakub 64 Jul 31 10:41 13 -> /usr/lib64/libc.so
> lr-x--. 1 jakub jakub 64 Jul 31 10:41 18 -> 
> /usr/src/gcc/obj/gcc/libgcc_s.so
> l-wx--. 1 jakub jakub 64 Jul 31 10:41 2 -> /tmp/ccX4y4r3.le
> l-wx--. 1 jakub jakub 64 Jul 31 10:41 9 -> pipe:[46167136]
> 1 kw -j2 --jobserver-auth=3,6 -- 
> total 0
> lrwx--. 1 jakub jakub 64 Jul 31 10:41 0 -> /dev/pts/4
> l-wx--. 1 jakub jakub 64 Jul 31 10:41 1 -> /tmp/cchSvmBt
> lr-x--. 1 jakub jakub 64 Jul 31 10:41 10 -> /usr/lib64/crtn.o
> lrwx--. 1 jakub jakub 64 Jul 31 10:41 2 -> /dev/pts/4
> lr-x--. 1 jakub jakub 64 Jul 31 10:41 3 -> /usr/lib64/crt1.o
> lr-x--. 1 jakub jakub 64 Jul 31 10:41 4 -> /usr/lib64/crti.o
> lr-x--. 1 jakub jakub 64 Jul 31 10:41 5 -> /usr/src/gcc/obj/gcc/crtbegin.o
> lr-x--. 1 jakub jakub 64 Jul 31 10:41 6 -> 
> /usr/src/gcc/obj/gcc/testsuite/gcc/c_lto_20081125_0.o
> lr-x--. 1 jakub jakub 64 Jul 31 10:41 7 -> 
> /usr/src/gcc/obj/gcc/testsuite/gcc/c_lto_20081125_1.o
> lr-x--. 1 jakub jakub 64 Jul 31 10:41 8 -> /usr/src/gcc/obj/gcc/crtend.o
> l-wx--. 1 jakub jakub 64 Jul 31 10:41 9 -> pipe:[46167136]
> 
> so, if one is lucky enough and at least one of the two file descriptors in
> MAKEFLAGS --jobserver-auth= is closed, it will work fine, but if they are
> open, even when they have nothing to do with make, it will fail miserably.

I see, so the problem is that lto-wrapper is executed via plugin from
gold which already used the file descriptors for something else?

This seems a bit of problem since gold opens it before we get to
lto-plugin (most probably) so i do not see how to reliably detect
presence of make server.

I think the gcc binary can look for jobserv environment, detect jobserv
and if it is not available remove jobserver option from it prior going
to ld.  I think this works since I think link-time gcc can only be
executed via gcc/g++/gfortran... wrappers.

(unlike lto plugin invoked implicitly for ar/nm etc).

Honza
> 
> As a partial workaround, I wonder if jobserver_active_p couldn't check (with
> fstat/fstat64) if the file descriptors are pipes.  It will still not be 100%
> reliable, as when you are unlucky the file descriptor could be some
> completely unrelated pipe, but at least better than the current state.
> 
>   Jakub


Re: [Arm][CMSE]Add warn_unused_return attribute to cmse functions

2019-07-31 Thread Kyrill Tkachov

Hi Joel,

On 7/17/19 12:19 PM, Joel Hutton wrote:

At present it is possible to call the CMSE functions for checking
addresses (such as cmse_check_address_range) and  forget to check/use
the return value. This patch makes the interfaces more robust against
programmer error by marking these functions with the warn_unused_result
attribute. With this set, any use of these functions that does not use
the result will produce a warning.

This produces a warning on default warn levels when the result of the
cmse functions is not used.

For the following function:
void foo()
{
 int *data;
 cmse_check_address_range((int*)data, 0, 0);
}
The following warning is emitted:
warning: ignoring return value of 'cmse_check_address_range' declared
with attribute 'warn_unused_result' [-Wunused-result]
 6 |  cmse_check_address_range((int*)data, 0, 0);
    |  ^~

gcc/ChangeLog:

2019-07-10  Joel Hutton  

 * config/arm/arm_cmse.h (cmse_nonsecure_caller): Add
warn_unused_result attribute.
 (cmse_check_address_range): Add warn_unused_result attribute.

libgcc/ChangeLog:

2019-07-10  Joel Hutton  

 * config/arm/cmse.c (cmse_check_address_range): Add
warn_unused_result attribute.

2019-07-10  Joel Hutton  

 * gcc.target/arm/cmse/cmse-17.c: New test.



Thanks for the patch. Approved and applied on your behalf as r273924.

For the future, it would help speed up review to CC the relevant 
maintainers on your patch submissions. You can find them in the 
MAINTAINERS file in the source tree.


Thanks,

Kyrill



Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Jakub Jelinek
On Wed, Jul 31, 2019 at 10:21:16AM +0200, Martin Liška wrote:
> On 7/31/19 10:08 AM, Jan Hubicka wrote:
> > Hi,
> > 
> >> We do not detect jobserver because of Dejagnu is not using it.
> >> And yes, we default to -flto= in LTO tests now. That's
> >> what is causing issues right now.
> > 
> > Why the error messages are 
> > make[4]: *** write jobserver: Bad file descriptor.  Stop.
> > make[4]: *** Waiting for unfinished jobs
> > make[4]: *** write jobserver: Bad file descriptor.  Stop.
> > It seems to me that it is internal make from lto-wrapper trying to get
> > jobserver access?
> 
> Hard to guess. Can you Jakub debug that? I don't see the error message.

I can easily reproduce even with
make -j2 -k check-gcc RUNTESTFLAGS=lto.exp=*20081125*

I've added a hack:
--- lto-wrapper.c.jj2019-07-31 09:46:39.361331843 +0200
+++ lto-wrapper.c   2019-07-31 10:40:57.970302110 +0200
@@ -1430,6 +1430,12 @@ run_gcc (unsigned argc, char *argv[])
 jobserver or number of cores.  */
   auto_parallel = 1;
   jobserver = jobserver_active_p ();
+FILE *f = fopen ("/tmp/111x", "a");
+fprintf (f, "%d %s\n", jobserver, getenv ("MAKEFLAGS"));
+fclose (f);
+char buf[1024];
+sprintf (buf, "ls -l /proc/%d/fd >> /tmp/111x", getpid ());
+system (buf);
 }
 
   if (linker_output)

and see that in some cases jobserver is 0 and in other cases jobserver is 1.
Examples of both:
0 kw -j2 --jobserver-auth=3,6 -- 
total 0
lrwx--. 1 jakub jakub 64 Jul 31 10:41 0 -> /dev/pts/4
l-wx--. 1 jakub jakub 64 Jul 31 10:41 1 -> pipe:[75026106]
lr-x--. 1 jakub jakub 64 Jul 31 10:41 10 -> /usr/src/gcc/obj/gcc/libgcc_s.so
lr-x--. 1 jakub jakub 64 Jul 31 10:41 13 -> /usr/lib64/libc.so
lr-x--. 1 jakub jakub 64 Jul 31 10:41 18 -> /usr/src/gcc/obj/gcc/libgcc_s.so
l-wx--. 1 jakub jakub 64 Jul 31 10:41 2 -> /tmp/ccX4y4r3.le
l-wx--. 1 jakub jakub 64 Jul 31 10:41 9 -> pipe:[46167136]
1 kw -j2 --jobserver-auth=3,6 -- 
total 0
lrwx--. 1 jakub jakub 64 Jul 31 10:41 0 -> /dev/pts/4
l-wx--. 1 jakub jakub 64 Jul 31 10:41 1 -> /tmp/cchSvmBt
lr-x--. 1 jakub jakub 64 Jul 31 10:41 10 -> /usr/lib64/crtn.o
lrwx--. 1 jakub jakub 64 Jul 31 10:41 2 -> /dev/pts/4
lr-x--. 1 jakub jakub 64 Jul 31 10:41 3 -> /usr/lib64/crt1.o
lr-x--. 1 jakub jakub 64 Jul 31 10:41 4 -> /usr/lib64/crti.o
lr-x--. 1 jakub jakub 64 Jul 31 10:41 5 -> /usr/src/gcc/obj/gcc/crtbegin.o
lr-x--. 1 jakub jakub 64 Jul 31 10:41 6 -> 
/usr/src/gcc/obj/gcc/testsuite/gcc/c_lto_20081125_0.o
lr-x--. 1 jakub jakub 64 Jul 31 10:41 7 -> 
/usr/src/gcc/obj/gcc/testsuite/gcc/c_lto_20081125_1.o
lr-x--. 1 jakub jakub 64 Jul 31 10:41 8 -> /usr/src/gcc/obj/gcc/crtend.o
l-wx--. 1 jakub jakub 64 Jul 31 10:41 9 -> pipe:[46167136]

so, if one is lucky enough and at least one of the two file descriptors in
MAKEFLAGS --jobserver-auth= is closed, it will work fine, but if they are
open, even when they have nothing to do with make, it will fail miserably.

As a partial workaround, I wonder if jobserver_active_p couldn't check (with
fstat/fstat64) if the file descriptors are pipes.  It will still not be 100%
reliable, as when you are unlucky the file descriptor could be some
completely unrelated pipe, but at least better than the current state.

Jakub


Re: [PATCH] Implement x86 reduc_plus_scal_v{16,32,64}qi (PR tree-optimization/91201)

2019-07-31 Thread Uros Bizjak
On Wed, Jul 31, 2019 at 9:10 AM Jakub Jelinek  wrote:
>
> Hi!
>
> As mentioned in the PR, we can use psadbw to shorten the final reductions to
> scalar for 8-bit elements.  E.g. for -mavx2 the difference is:
> -   vmovdqa %xmm1, %xmm0
> -   vextracti128$0x1, %ymm1, %xmm1
> -   vpaddb  %xmm1, %xmm0, %xmm0
> -   vpsrldq $8, %xmm0, %xmm1
> -   vpaddb  %xmm1, %xmm0, %xmm0
> -   vpsrldq $4, %xmm0, %xmm1
> -   vpaddb  %xmm1, %xmm0, %xmm0
> -   vpsrldq $2, %xmm0, %xmm1
> -   vpaddb  %xmm1, %xmm0, %xmm0
> -   vpsrldq $1, %xmm0, %xmm1
> -   vpaddb  %xmm1, %xmm0, %xmm0
> +   vextracti128$0x1, %ymm1, %xmm0
> +   vpaddb  %xmm1, %xmm0, %xmm1
> +   vpsrldq $8, %xmm1, %xmm0
> +   vpaddb  %xmm0, %xmm1, %xmm1
> +   vpxor   %xmm0, %xmm0, %xmm0
> +   vpsadbw %xmm0, %xmm1, %xmm0
> vpextrb $0, %xmm0, %eax
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2019-07-31  Jakub Jelinek  
>
> PR tree-optimization/91201
> * config/i386/sse.md (reduc_plus_scal_v16qi): New expander.
> (REDUC_PLUS_MODE): Add V32QImode for TARGET_AVX and V64QImode for
> TARGET_AVX512F.
> (reduc_plus_scal_): Improve formatting by introducing
> a temporary.
>
> * gcc.target/i386/sse2-pr91201.c: New test.
> * gcc.target/i386/avx2-pr91201.c: New test.
> * gcc.target/i386/avx512bw-pr91201.c: New test.

OK.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2019-07-30 12:19:45.999490854 +0200
> +++ gcc/config/i386/sse.md  2019-07-30 12:19:55.379352735 +0200
> @@ -2728,9 +2728,30 @@ (define_expand "reduc_plus_scal_"
>DONE;
>  })
>
> +(define_expand "reduc_plus_scal_v16qi"
> + [(plus:V16QI
> +(match_operand:QI 0 "register_operand")
> +(match_operand:V16QI 1 "register_operand"))]
> + "TARGET_SSE2"
> +{
> +  rtx tmp = gen_reg_rtx (V1TImode);
> +  emit_insn (gen_sse2_lshrv1ti3 (tmp, gen_lowpart (V1TImode, operands[1]),
> +GEN_INT (64)));
> +  rtx tmp2 = gen_reg_rtx (V16QImode);
> +  emit_insn (gen_addv16qi3 (tmp2, operands[1], gen_lowpart (V16QImode, 
> tmp)));
> +  rtx tmp3 = gen_reg_rtx (V16QImode);
> +  emit_move_insn (tmp3, CONST0_RTX (V16QImode));
> +  rtx tmp4 = gen_reg_rtx (V2DImode);
> +  emit_insn (gen_sse2_psadbw (tmp4, tmp2, tmp3));
> +  tmp4 = gen_lowpart (V16QImode, tmp4);
> +  emit_insn (gen_vec_extractv16qiqi (operands[0], tmp4, const0_rtx));
> +  DONE;
> +})
> +
>  (define_mode_iterator REDUC_PLUS_MODE
>   [(V4DF "TARGET_AVX") (V8SF "TARGET_AVX")
> -  (V8DF "TARGET_AVX512F") (V16SF "TARGET_AVX512F")])
> +  (V8DF "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
> +  (V32QI "TARGET_AVX") (V64QI "TARGET_AVX512F")])
>
>  (define_expand "reduc_plus_scal_"
>   [(plus:REDUC_PLUS_MODE
> @@ -2741,8 +2762,8 @@ (define_expand "reduc_plus_scal_"
>rtx tmp = gen_reg_rtx (mode);
>emit_insn (gen_vec_extract_hi_ (tmp, operands[1]));
>rtx tmp2 = gen_reg_rtx (mode);
> -  emit_insn (gen_add3
> -(tmp2, tmp, gen_lowpart (mode, operands[1])));
> +  rtx tmp3 = gen_lowpart (mode, operands[1]);
> +  emit_insn (gen_add3 (tmp2, tmp, tmp3));
>emit_insn (gen_reduc_plus_scal_ (operands[0], tmp2));
>DONE;
>  })
> --- gcc/testsuite/gcc.target/i386/sse2-pr91201.c.jj 2019-07-30 
> 12:23:48.930913778 +0200
> +++ gcc/testsuite/gcc.target/i386/sse2-pr91201.c2019-07-30 
> 12:23:45.518964018 +0200
> @@ -0,0 +1,18 @@
> +/* PR tree-optimization/91201 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -msse2 -mno-sse3" } */
> +/* { dg-final { scan-assembler "\tpsadbw\t" } } */
> +
> +unsigned char bytes[1024];
> +
> +unsigned char
> +sum (void)
> +{
> +  unsigned char r = 0;
> +  unsigned char *p = (unsigned char *) bytes;
> +  int n;
> +
> +  for (n = 0; n < sizeof (bytes); ++n)
> +r += p[n];
> +  return r;
> +}
> --- gcc/testsuite/gcc.target/i386/avx2-pr91201.c.jj 2019-07-30 
> 12:24:05.199674228 +0200
> +++ gcc/testsuite/gcc.target/i386/avx2-pr91201.c2019-07-30 
> 12:24:34.544242142 +0200
> @@ -0,0 +1,6 @@
> +/* PR tree-optimization/91201 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx2 -mno-avx512f" } */
> +/* { dg-final { scan-assembler "\tvpsadbw\t" } } */
> +
> +#include "sse2-pr91201.c"
> --- gcc/testsuite/gcc.target/i386/avx512bw-pr91201.c.jj 2019-07-30 
> 12:24:50.079013395 +0200
> +++ gcc/testsuite/gcc.target/i386/avx512bw-pr91201.c2019-07-30 
> 12:25:10.685709971 +0200
> @@ -0,0 +1,6 @@
> +/* PR tree-optimization/91201 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mavx512bw -mprefer-vector-width=512" } */
> +/* { dg-final { scan-assembler "\tvpsadbw\t" } } */
> +
> +#include "sse2-pr91201.c"
>
> Jakub


Re: [PATCH] i386: Roundeven expansion for SSE4.1+

2019-07-31 Thread Uros Bizjak
On Wed, Jul 31, 2019 at 7:51 AM Tejas Joshi  wrote:
>
> Hi.
>
> > > * gcc.target/i386/avx-vround-roundeven-1.c: New test.
> > > * gcc.target/i386/avx-vround-roundeven-2.c: New test.
> >
> > roundss and roundsd are sse4_1 instructions, also please change tests
> > to use -O2:
>
> I have made the following changes you suggested and changed the file names to:
>
> * gcc.target/i386/sse4_1-round-roundeven-1.c: New test.
> * gcc.target/i386/sse4_1-round-roundeven-2.c: New test.

+++ b/gcc/testsuite/gcc.target/i386/sse4_1-round-roundeven-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mavx" } */
+/* { dg-options "-msse4.1" } */

-O2 -msse4.1

OK with the above change.

Thanks,
Uros.


[PATCH] Come up with json::integer_number and use it in GCOV.

2019-07-31 Thread Martin Liška
Hi.

As seen here:
https://github.com/RPGillespie6/fastcov/pull/18/files/f184dd8b6fc14e075dfc0ea93de7be5c96298ddb#r308735088

GCOV uses json::number for branch count, line count and similar. However, the 
json library
uses a double as an internal representation for numbers. That's no desirable in 
case
of GCOV and so that I would like to come up with integer_number type.
David would you be fine with that?

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-07-31  Martin Liska  

* diagnostic-format-json.cc (json_from_expanded_location):
Use json::float_number.
* gcov.c (output_intermediate_json_line): Use new
json::integer_number.
(output_json_intermediate_file): Likewise.
* json.cc (number::print): Move to ...
(float_number::print): ... this.
(integer_number::print): New.
(test_writing_numbers): Move to ...
(test_writing_float_numbers): ... this.
(test_writing_integer_numbers): New.
(json_cc_tests): Register test_writing_integer_numbers.
* json.h (class value): Add forward declaration
for float_number and integer_number.
(enum kind): Add JSON_INTEGER and JSON_FLOAT.
(class number): Move to ...
(class float_number): ... this.
(class integer_number): New.
* optinfo-emit-json.cc (optrecord_json_writer::impl_location_to_json):
Use json::float_number.
(optrecord_json_writer::location_to_json): Likewise.
(optrecord_json_writer::profile_count_to_json): Likewise.
(optrecord_json_writer::pass_to_json): Likewise.
---
 gcc/diagnostic-format-json.cc |  4 ++--
 gcc/gcov.c| 23 +++-
 gcc/json.cc   | 41 ---
 gcc/json.h| 35 --
 gcc/optinfo-emit-json.cc  | 10 -
 5 files changed, 81 insertions(+), 32 deletions(-)


diff --git a/gcc/diagnostic-format-json.cc b/gcc/diagnostic-format-json.cc
index 53c3b630b1c..2802da8a0a6 100644
--- a/gcc/diagnostic-format-json.cc
+++ b/gcc/diagnostic-format-json.cc
@@ -48,8 +48,8 @@ json_from_expanded_location (location_t loc)
   json::object *result = new json::object ();
   if (exploc.file)
 result->set ("file", new json::string (exploc.file));
-  result->set ("line", new json::number (exploc.line));
-  result->set ("column", new json::number (exploc.column));
+  result->set ("line", new json::float_number (exploc.line));
+  result->set ("column", new json::float_number (exploc.column));
   return result;
 }
 
diff --git a/gcc/gcov.c b/gcc/gcov.c
index c65b7153765..ef006d125a2 100644
--- a/gcc/gcov.c
+++ b/gcc/gcov.c
@@ -1061,10 +1061,10 @@ output_intermediate_json_line (json::array *object,
 return;
 
   json::object *lineo = new json::object ();
-  lineo->set ("line_number", new json::number (line_num));
+  lineo->set ("line_number", new json::integer_number (line_num));
   if (function_name != NULL)
 lineo->set ("function_name", new json::string (function_name));
-  lineo->set ("count", new json::number (line->count));
+  lineo->set ("count", new json::integer_number (line->count));
   lineo->set ("unexecuted_block",
 	  new json::literal (line->has_unexecuted_block));
 
@@ -1079,7 +1079,7 @@ output_intermediate_json_line (json::array *object,
 	if (!(*it)->is_unconditional && !(*it)->is_call_non_return)
 	  {
 	json::object *branch = new json::object ();
-	branch->set ("count", new json::number ((*it)->count));
+	branch->set ("count", new json::integer_number ((*it)->count));
 	branch->set ("throw", new json::literal ((*it)->is_throw));
 	branch->set ("fallthrough",
 			 new json::literal ((*it)->fall_through));
@@ -1138,16 +1138,19 @@ output_json_intermediate_file (json::array *json_files, source_info *src)
   function->set ("name", new json::string ((*it)->m_name));
   function->set ("demangled_name",
 		 new json::string ((*it)->get_demangled_name ()));
-  function->set ("start_line", new json::number ((*it)->start_line));
-  function->set ("start_column", new json::number ((*it)->start_column));
-  function->set ("end_line", new json::number ((*it)->end_line));
-  function->set ("end_column", new json::number ((*it)->end_column));
+  function->set ("start_line",
+		 new json::integer_number ((*it)->start_line));
+  function->set ("start_column",
+		 new json::integer_number ((*it)->start_column));
+  function->set ("end_line", new json::integer_number ((*it)->end_line));
+  function->set ("end_column",
+		 new json::integer_number ((*it)->end_column));
   function->set ("blocks",
-		 new json::number ((*it)->get_block_count ()));
+		 new json::integer_number ((*it)->get_block_count ()));
   function->set ("blocks_executed",
-		 new json::number ((*it)->blocks_executed));

[PATCH] Mark necessary 2nd and later args for delete op.

2019-07-31 Thread Martin Liška
On 7/30/19 4:00 PM, Marc Glisse wrote:
> On Tue, 30 Jul 2019, Martin Liška wrote:
> 
>> Ah, that's bad, both of them need a care:
> 
> Yes, that makes more sense to me, thanks.
> 
>>> I tried to experiment to understand, but it is complicated because 
>>> including  disables the optimization:
>>>
>>> #include 
>>> void fn1() {
>>>     char*p=new char;
>>>     delete p;
>>> }
>>>
>>> This ICEs with -O -std=c++17:
>>>
>>> int a = 64;
>>> std::align_val_t b{64};
>>> void fn1() {
>>>   void *s = operator new(a,b);
>>>   operator delete(s,8+*(unsigned long*)s,b);
>>> }
>>>
>>>
>>
>> I can't see it on current master. Can you?
> 
> Yes. I just did a clean build to make sure.
> 

That's addressed in a patch I'm attaching.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
>From cd975aaccbbed3c03bbd246e0802c94693a0962d Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 30 Jul 2019 15:40:24 +0200
Subject: [PATCH] Mark necessary 2nd and later args for delete op.

gcc/ChangeLog:

2019-07-30  Martin Liska  

	* tree-ssa-dce.c (propagate_necessity): Delete operator can
	have size and (or) alignment as 2nd and later arguments.
	Mark all of them as necessary.
---
 gcc/tree-ssa-dce.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index bec13cd5930..80d5f5c30f7 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -824,13 +824,16 @@ propagate_necessity (bool aggressive)
 			   || DECL_FUNCTION_CODE (def_callee) == BUILT_IN_CALLOC))
 		  || DECL_IS_REPLACEABLE_OPERATOR_NEW_P (def_callee)))
 		{
-		  /* Some delete operators have size as 2nd argument.  */
+		  /* Delete operators can have alignment and (or) size as next
+		 arguments.  When being a SSA_NAME, they must be marked
+		 as necessary.  */
 		  if (is_delete_operator && gimple_call_num_args (stmt) >= 2)
-		{
-		  tree size_argument = gimple_call_arg (stmt, 1);
-		  if (TREE_CODE (size_argument) == SSA_NAME)
-			mark_operand_necessary (size_argument);
-		}
+		for (unsigned i = 1; i < gimple_call_num_args (stmt); i++)
+		  {
+			tree arg = gimple_call_arg (stmt, i);
+			if (TREE_CODE (arg) == SSA_NAME)
+			  mark_operand_necessary (arg);
+		  }
 
 		  continue;
 		}
-- 
2.22.0



Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-31 Thread Richard Biener
On Tue, Jul 30, 2019 at 4:54 PM Andrew MacLeod  wrote:
>
> On 7/30/19 4:55 AM, Richard Biener wrote:
> > On Fri, Jul 26, 2019 at 4:32 PM Andrew MacLeod  wrote:
> >> On 7/25/19 11:37 PM, Jeff Law wrote:
> >>> On 7/24/19 12:33 PM, Richard Biener wrote:
>  On July 24, 2019 8:18:57 PM GMT+02:00, Jeff Law 
>  wrote:
> > On 7/24/19 11:00 AM, Richard Biener wrote: [ Big snip, ignore
> > missing reply attributions... ]
> >
> >>> it. But I'd claim that if callers are required not to change
> >>> these ranges, then the callers are fundamentally broken.  I'm
> >>> not sure what the "sanitization" is really buying you here.
> >>> Can you point to something specific?
> >>>
>  But you lose the sanitizing that nobody can change it and the
> changed info leaks to other SSA vars.
> 
>  As said, fix all callers to deal with NULL.
> 
>  But I argue the current code is exactly optimal and safe.
> >>> ANd I'd argue that it's just plain broken and that the
> >>> sanitization you're referring to points to something broken
> >>> elsewhere,  higher up in the callers.
> >> Another option is to make get_value_range return by value and
> >> the only way to change the lattice to call an appropriate set
> >> function. I think we already do the latter in all cases (but we
> >> use get_value_range in the setter) and returning by reference is
> >> just eliding the copy.
> > OK, so what I think you're getting at (and please correct me if
> > I'm wrong) is that once the lattice values are set, you don't want
> > something changing the recorded ranges underneath?
> >
> > ISTM the way to enforce that is to embed the concept in the class
> > and enforce it by not allowing direct manipulation of range by the
> > clients. So a client that wants this behavior somehow tells the
> > class that ranges are "set in stone" and from that point the
> > setters don't allow changing the underlying ranges.
>  Yes. You'll see that nearly all callers do
> 
>  Value_range vr = *get_value_range (name);
> 
>  Modify
> 
>  Update_value_range (name, ) ;
> 
>  And returning by reference was mostly an optimization. We _did_ have
>  callers Changing the range in place and the const varying catched
>  those.
> 
>  When returning by value we can return individual VARYINGs not in the
>  lattice if we decide that's what we want.
> 
> > I just want to make sure we're on the same page WRT why you think
> > the constant varying range object is useful.
>  As said it's an optimization. We do not want to reallocate the
>  lattice. And we want lattice updating to happen in a controlled
>  manner, so returning a pointer into the lattice is bad design at this
>  point.
> >>> But I would claim that the current state is dreadful.  Consider that
> >>> when gimple-fold asks for a new SSA_NAME, it could get a recycled one,
> >>> in which case we get a real range.  Or if it doens't get a recycled
> >>> name, then we get the const varying node.  The inconsistently is silly
> >>> when we can just reallocate the underlying object.
> >>>
> >>> Between recycling of SSA_NAMEs and allocating a bit of additional space
> >>> (say rounding up to some reasonable boundary) I'd bet you'd never be
> >>> able to measure the reallocation in practice.
> >>>
> >> I annotated the patch yesterday to actually log reallocations and ran a
> >> couple of bootstraps...
> >>
> >> If we don't add any extra space in the vector initially (as it is
> >> allocated today) , it is re-sized a total of 201 times.  Of those, 93
> >> get back the same pointer so no resize actually happens.
> >>
> >> IF we use the patch as I initially propose, where we add 10% to the
> >> vector to start, we re-size 10 times, all from either 18 to 25 , or 8 to
> >> 14,so very small numbers of ssaname functions, where the 10% doesnt
> >> really do much.  Those were all re-allocations but one.   The initial
> >> resize does seem to help prevent any larger reallocations,
> >>
> >> THat doesn't seem like that bad of a thing over all, and we could tweak
> >> the initial size a bit more if it would help? to deal with the small
> >> cases, we could make it num_names+10%+10 or something even.
> >>
> >> I feel it would be safer.. It seems to me the CONST solution cannot be
> >> disabled.. ie, even a non-checking production compiler would go boom if
> >> it triggered.
> >>
> >> As for addressing the issue that the CONST approach is trying to
> >> resolve, Could we change all the set/update routines to do something like
> >>
> >> gcc_checking_assert (new_range->varying_p ()  || !values_propagated);
> >>
> >> that way we'll trigger an assert if we try to change any value to
> >> something other than varying when values_propagated is set?
> > I think the constness is appropriately addressed by my recent API 

Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Martin Liška
On 7/31/19 10:08 AM, Jan Hubicka wrote:
> Hi,
> 
>> We do not detect jobserver because of Dejagnu is not using it.
>> And yes, we default to -flto= in LTO tests now. That's
>> what is causing issues right now.
> 
> Why the error messages are 
> make[4]: *** write jobserver: Bad file descriptor.  Stop.
> make[4]: *** Waiting for unfinished jobs
> make[4]: *** write jobserver: Bad file descriptor.  Stop.
> It seems to me that it is internal make from lto-wrapper trying to get
> jobserver access?

Hard to guess. Can you Jakub debug that? I don't see the error message.

> 
>> Works for me.
> 
> We probably also can give it a meaning controlling the default parameter
> of -flto.
> I.e. setenv GCC_LTO_PARALLELISM 
> which we will default to in case only -flto is passed to command line.

Interesting idea, I like it, but:
- if we detect that jobserver will not work, GCC_LTO_PARALLELISM=jobserver is 
useless
- auto equals now to 'unset GCC_LTO_PARALLELISM'

So the only interesting value is numthreads. Then I would recommend to use
GCC_LTO_MAX_AUTO_THREADS?

> 
> Make's jobserver is quite nice and easy to use. If it supported named
> pipe and there was a small library which allowed to initialize jobserver
> and connect to it, perhaps other tools needing parallelism during build
> (GCC, ninja, llvm, ...) would be able to use common protocol.

Can be a nice GSoC project for next year?

Martin

> 
> Honza
> 



Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Jan Hubicka
Hi,

> We do not detect jobserver because of Dejagnu is not using it.
> And yes, we default to -flto= in LTO tests now. That's
> what is causing issues right now.

Why the error messages are 
make[4]: *** write jobserver: Bad file descriptor.  Stop.
make[4]: *** Waiting for unfinished jobs
make[4]: *** write jobserver: Bad file descriptor.  Stop.
It seems to me that it is internal make from lto-wrapper trying to get
jobserver access?

> Works for me.

We probably also can give it a meaning controlling the default parameter
of -flto.
I.e. setenv GCC_LTO_PARALLELISM 
which we will default to in case only -flto is passed to command line.

Make's jobserver is quite nice and easy to use. If it supported named
pipe and there was a small library which allowed to initialize jobserver
and connect to it, perhaps other tools needing parallelism during build
(GCC, ninja, llvm, ...) would be able to use common protocol.

Honza


Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Jakub Jelinek
On Wed, Jul 31, 2019 at 09:50:47AM +0200, Martin Liška wrote:
> > Why doesn't the jobserver work in the tests?  Is that because of missing +
> > somewhere in the Makefiles or is something unsetting MFLAGS or MAKEFLAGS
> > env vars?
> 
> Is Dejagnu using make internally to invoke individual tests?

No.  But as I said, the same check-gcc causes many make goals running
simultaneously, as you can see in the testsuite/gcc{,1,2,3,...} directories,
where each runtest invoked by those goals uses
gcc_parallel_test_run_p and O_EXCL files to decide which of the runtests
will perform each test.

Jakub


[PATCH] PR91178, use tail-recursion for vn_reference_maybe_forwprop_address

2019-07-31 Thread Richard Biener


Otherwise a sequence of _2 = _1 + 4; _3 = _2 + 4; ... can blow the
stack.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-07-31  Richard Biener  

PR tree-optimization/91178
* tree-ssa-sccvn.c (vn_reference_maybe_forwprop_address):
Use tail-recursion.

* gcc.dg/torture/pr91178-2.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 273920)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -1249,113 +1250,123 @@ static bool
 vn_reference_maybe_forwprop_address (vec *ops,
 unsigned int *i_p)
 {
-  unsigned int i = *i_p;
-  vn_reference_op_t op = &(*ops)[i];
-  vn_reference_op_t mem_op = &(*ops)[i - 1];
-  gimple *def_stmt;
-  enum tree_code code;
-  poly_offset_int off;
-
-  def_stmt = SSA_NAME_DEF_STMT (op->op0);
-  if (!is_gimple_assign (def_stmt))
-return false;
-
-  code = gimple_assign_rhs_code (def_stmt);
-  if (code != ADDR_EXPR
-  && code != POINTER_PLUS_EXPR)
-return false;
-
-  off = poly_offset_int::from (wi::to_poly_wide (mem_op->op0), SIGNED);
+  bool changed = false;
+  vn_reference_op_t op;
 
-  /* The only thing we have to do is from  add the offset
- from .foo.bar to the preceding MEM_REF offset and replace the
- address with   */
-  if (code == ADDR_EXPR)
+  do
 {
-  tree addr, addr_base;
-  poly_int64 addr_offset;
+  unsigned int i = *i_p;
+  op = &(*ops)[i];
+  vn_reference_op_t mem_op = &(*ops)[i - 1];
+  gimple *def_stmt;
+  enum tree_code code;
+  poly_offset_int off;
+
+  def_stmt = SSA_NAME_DEF_STMT (op->op0);
+  if (!is_gimple_assign (def_stmt))
+   return changed;
+
+  code = gimple_assign_rhs_code (def_stmt);
+  if (code != ADDR_EXPR
+ && code != POINTER_PLUS_EXPR)
+   return changed;
+
+  off = poly_offset_int::from (wi::to_poly_wide (mem_op->op0), SIGNED);
+
+  /* The only thing we have to do is from  add the offset
+from .foo.bar to the preceding MEM_REF offset and replace the
+address with   */
+  if (code == ADDR_EXPR)
+   {
+ tree addr, addr_base;
+ poly_int64 addr_offset;
+
+ addr = gimple_assign_rhs1 (def_stmt);
+ addr_base = get_addr_base_and_unit_offset (TREE_OPERAND (addr, 0),
+_offset);
+ /* If that didn't work because the address isn't invariant propagate
+the reference tree from the address operation in case the current
+dereference isn't offsetted.  */
+ if (!addr_base
+ && *i_p == ops->length () - 1
+ && known_eq (off, 0)
+ /* This makes us disable this transform for PRE where the
+reference ops might be also used for code insertion which
+is invalid.  */
+ && default_vn_walk_kind == VN_WALKREWRITE)
+   {
+ auto_vec tem;
+ copy_reference_ops_from_ref (TREE_OPERAND (addr, 0), );
+ /* Make sure to preserve TBAA info.  The only objects not
+wrapped in MEM_REFs that can have their address taken are
+STRING_CSTs.  */
+ if (tem.length () >= 2
+ && tem[tem.length () - 2].opcode == MEM_REF)
+   {
+ vn_reference_op_t new_mem_op = [tem.length () - 2];
+ new_mem_op->op0
+ = wide_int_to_tree (TREE_TYPE (mem_op->op0),
+ wi::to_poly_wide (new_mem_op->op0));
+   }
+ else
+   gcc_assert (tem.last ().opcode == STRING_CST);
+ ops->pop ();
+ ops->pop ();
+ ops->safe_splice (tem);
+ --*i_p;
+ return true;
+   }
+ if (!addr_base
+ || TREE_CODE (addr_base) != MEM_REF
+ || (TREE_CODE (TREE_OPERAND (addr_base, 0)) == SSA_NAME
+ && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (TREE_OPERAND (addr_base,
+   0
+   return changed;
+
+ off += addr_offset;
+ off += mem_ref_offset (addr_base);
+ op->op0 = TREE_OPERAND (addr_base, 0);
+   }
+  else
+   {
+ tree ptr, ptroff;
+ ptr = gimple_assign_rhs1 (def_stmt);
+ ptroff = gimple_assign_rhs2 (def_stmt);
+ if (TREE_CODE (ptr) != SSA_NAME
+ || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (ptr)
+ /* Make sure to not endlessly recurse.
+See gcc.dg/tree-ssa/20040408-1.c for an example.  Can easily
+happen when we value-number a PHI to its backedge value.  */
+ || SSA_VAL (ptr) == op->op0
+ || !poly_int_tree_p (ptroff))
+   return changed;
 
-  addr = gimple_assign_rhs1 

[committed] Fix distribute parallel for with private class iterator (PR middle-end/91301)

2019-07-31 Thread Jakub Jelinek
Hi!

This undoes the splitting of private clauses to parallel instead of
innermost.  I should one day change the clause splitting to match exactly
what was agreed on in OpenMP 5.0, without the exceptions because we've done
something differently in the past, but I'm afraid it will be a lot of work.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
queued for backporting.

2019-07-31  Jakub Jelinek  

PR middle-end/91301
* gimplify.c (gimplify_omp_for): If for class iterator on
distribute parallel for there is no data sharing clause
on inner_for_stmt, look for private clause on combined
parallel too and if found, move it to inner_for_stmt.

* testsuite/libgomp.c++/for-27.C: New test.

--- gcc/gimplify.c.jj   2019-07-30 17:21:17.189694558 +0200
+++ gcc/gimplify.c  2019-07-30 18:56:54.342267818 +0200
@@ -10668,6 +10668,22 @@ gimplify_omp_for (tree *expr_p, gimple_s
  && OMP_CLAUSE_DECL (*pc) == orig_decl)
break;
if (*pc == NULL_TREE)
+ {
+   tree *spc;
+   for (spc = _PARALLEL_CLAUSES (*data[1]);
+*spc; spc = _CLAUSE_CHAIN (*spc))
+ if (OMP_CLAUSE_CODE (*spc) == OMP_CLAUSE_PRIVATE
+ && OMP_CLAUSE_DECL (*spc) == orig_decl)
+   break;
+   if (*spc)
+ {
+   tree c = *spc;
+   *spc = OMP_CLAUSE_CHAIN (c);
+   OMP_CLAUSE_CHAIN (c) = NULL_TREE;
+   *pc = c;
+ }
+ }
+   if (*pc == NULL_TREE)
  ;
else if (OMP_CLAUSE_CODE (*pc) == OMP_CLAUSE_PRIVATE)
  {
--- libgomp/testsuite/libgomp.c++/for-27.C.jj   2019-07-30 19:26:45.321254401 
+0200
+++ libgomp/testsuite/libgomp.c++/for-27.C  2019-07-30 19:25:27.455384720 
+0200
@@ -0,0 +1,169 @@
+// { dg-do run }
+
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+extern "C" void abort ();
+
+int a[2000];
+
+template 
+class I
+{
+public:
+  typedef ptrdiff_t difference_type;
+  I ();
+  ~I ();
+  I (T *);
+  I (const I &);
+  T  * ();
+  T *operator -> ();
+  T  [] (const difference_type &) const;
+  I  = (const I &);
+  I  ++ ();
+  I operator ++ (int);
+  I  -- ();
+  I operator -- (int);
+  I  += (const difference_type &);
+  I  -= (const difference_type &);
+  I operator + (const difference_type &) const;
+  I operator - (const difference_type &) const;
+  template  friend bool operator == (I &, I &);
+  template  friend bool operator == (const I &, const I &);
+  template  friend bool operator < (I &, I &);
+  template  friend bool operator < (const I &, const I &);
+  template  friend bool operator <= (I &, I &);
+  template  friend bool operator <= (const I &, const I &);
+  template  friend bool operator > (I &, I &);
+  template  friend bool operator > (const I &, const I &);
+  template  friend bool operator >= (I &, I &);
+  template  friend bool operator >= (const I &, const I &);
+  template  friend typename I::difference_type operator - (I 
&, I &);
+  template  friend typename I::difference_type operator - 
(const I &, const I &);
+  template  friend I operator + (typename I::difference_type 
, const I &);
+private:
+  T *p;
+};
+template  I::I () : p (0) {}
+template  I::~I () {}
+template  I::I (T *x) : p (x) {}
+template  I::I (const I ) : p (x.p) {}
+template  T ::operator * () { return *p; }
+template  T *I::operator -> () { return p; }
+template  T ::operator [] (const difference_type ) const { 
return p[x]; }
+template  I ::operator = (const I ) { p = x.p; return 
*this; }
+template  I ::operator ++ () { ++p; return *this; }
+template  I I::operator ++ (int) { return I (p++); }
+template  I ::operator -- () { --p; return *this; }
+template  I I::operator -- (int) { return I (p--); }
+template  I ::operator += (const difference_type ) { p 
+= x; return *this; }
+template  I ::operator -= (const difference_type ) { p 
-= x; return *this; }
+template  I I::operator + (const difference_type ) const { 
return I (p + x); }
+template  I I::operator - (const difference_type ) const { 
return I (p - x); }
+template  bool operator == (I , I ) { return x.p == y.p; 
}
+template  bool operator == (const I , const I ) { return 
x.p == y.p; }
+template  bool operator != (I , I ) { return !(x == y); }
+template  bool operator != (const I , const I ) { return 
!(x == y); }
+template  bool operator < (I , I ) { return x.p < y.p; }
+template  bool operator < (const I , const I ) { return 
x.p < y.p; }
+template  bool operator <= (I , I ) { return x.p <= y.p; 
}
+template  bool operator <= (const I , const I ) { return 
x.p <= y.p; }
+template  bool operator > (I , I ) { return x.p > y.p; }
+template  bool operator > (const I , const I ) { return 
x.p > y.p; }
+template  bool operator >= (I , I ) { return x.p >= y.p; 
}
+template  bool operator >= (const I , const I ) { return 
x.p >= y.p; }

Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Martin Liška
On 7/31/19 9:34 AM, Jakub Jelinek wrote:
> On Wed, Jul 31, 2019 at 09:20:40AM +0200, Martin Liška wrote:
>> One possible solution will be to adjust lto.exp:
>>
>>set LTO_OPTIONS [list \
>>{-O0 -flto -flto-partition=none -fuse-linker-plugin} \
>>{-O2 -flto -flto-partition=none -fuse-linker-plugin 
>> -fno-fat-lto-objects } \
>>{-O0 -flto -flto-partition=1to1 -fno-use-linker-plugin } \
>>{-O2 -flto -flto-partition=1to1 -fno-use-linker-plugin } \
>>
>> and replace all -flto with -flto=1. But still, many individual tests set 
>> -flto by themselves.
>> Another solution would be to disable the auto-detection with an environment 
>> variable:
>>
>> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
>> index 353187c6043..bb6fb2b53ff 100644
>> --- a/gcc/lto-wrapper.c
>> +++ b/gcc/lto-wrapper.c
>> @@ -1423,7 +1423,7 @@ run_gcc (unsigned argc, char *argv[])
>>auto_parallel = 0;
>>parallel = 0;
>>  }
>> -  else if (!jobserver && parallel)
>> +  else if (!jobserver && parallel && !getenv ("LTO_NO_AUTO_PARALLEL"))
>>  {
>>/* If there's no explicit usage of jobserver and
>>   parallel is enabled, then automatically detect
>> diff --git a/gcc/testsuite/lib/lto.exp b/gcc/testsuite/lib/lto.exp
>> index 25c934731df..e303894e0b0 100644
>> --- a/gcc/testsuite/lib/lto.exp
>> +++ b/gcc/testsuite/lib/lto.exp
>> @@ -209,6 +209,8 @@ proc lto_init { args } {
>>]
>>  }
>>  }
>> +
>> +setenv LTO_NO_AUTO_PARALLEL 1
>>  }
>>  
>>  #
>>
>> Can you Jakub test the patch or the s/-flto/-flto=1 solutions please?
> 
> Neither will work very well, we have thousands of -flto tests outside of
> lto.exp, e.g. dg-torture.exp (or libgomp and others) cycle through various
> options including -flto etc.
> 
> Some env var would be useful I guess, though shouldn't it have GCC in the
> name?

Works for me.

>  I mean, if we run into these fork-bomb problems in gcc, won't other
> projects run into those as well?

Right now, we build whole openSUSE distribution with -flto=N (N based on # of 
threads)
and we haven't seen issues related to fork-bomb.

> 
> Why doesn't the jobserver work in the tests?  Is that because of missing +
> somewhere in the Makefiles or is something unsetting MFLAGS or MAKEFLAGS
> env vars?

Is Dejagnu using make internally to invoke individual tests?

> 
> Though, as I said on IRC, I think we might run out of filedescriptors when
> using jobserver too, say if on 64 thread machine one does make -j64 -k check
> and each test simultaneously tries to create 64 partitions, that would be
> 4096 connections to the jobserver, right?
> 
>   Jakub
> 



Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Martin Liška
On 7/31/19 9:40 AM, Jan Hubicka wrote:
>> Neither will work very well, we have thousands of -flto tests outside of
>> lto.exp, e.g. dg-torture.exp (or libgomp and others) cycle through various
>> options including -flto etc.
>>
>> Some env var would be useful I guess, though shouldn't it have GCC in the
>> name?  I mean, if we run into these fork-bomb problems in gcc, won't other
>> projects run into those as well?
>>
>> Why doesn't the jobserver work in the tests?  Is that because of missing +
>> somewhere in the Makefiles or is something unsetting MFLAGS or MAKEFLAGS
>> env vars?
> 
> Main trouble with make's jobserver is that it works by 
> 1) defining environment variable saying which file descriptior to
> connect to
> 2) keeping the file descriptor open upon invoking "+" prefixed lines
> 
> Adding "+" to GCC invocation is wrong since it breaks dry run (we do not
> want to link at that time) but it is only way to access the jobserver.
> If "+" is not present, make will keep the environment vairable but will
> close file descriptors prior exec.
> 
> Make developers said that this is because some old prorams misbehave
> when you exec them with more than 3 file descriptors open. I tried to
> negotiate for named pipe which would solve this and it would make it
> easy to connect to outermost jobserver from anything invoked form
> toplevel make, but they was worried about systems w/o named pipes.

Yes a more generic approach would be welcome as other build systems
would also be able to utilize it.

> 
> I wonder why we do not detect jobserv as unavailable in this case and do
> not default to -flto=?

We do not detect jobserver because of Dejagnu is not using it.
And yes, we default to -flto= in LTO tests now. That's
what is causing issues right now.

> Is it because dejagnu machinery actually opens some other file
> descriptor that gets same ID and executes us with it?
> 
> Or does LTO wrapper open something prior accessing jobserver?
> 
>>
>> Though, as I said on IRC, I think we might run out of filedescriptors when
>> using jobserver too, say if on 64 thread machine one does make -j64 -k check
>> and each test simultaneously tries to create 64 partitions, that would be
>> 4096 connections to the jobserver, right?
> 
> Only WPA process connects to jobserver (which is 1 per linker
> invokation), so I think this should be safe.

Yes and it's only about a quick fcntl. But as mentioned, Dejagnu is not passing
us jobserver, so we don't do it right now.

Martin

> 
> Honza
>>
>>  Jakub



Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Jan Hubicka
> Neither will work very well, we have thousands of -flto tests outside of
> lto.exp, e.g. dg-torture.exp (or libgomp and others) cycle through various
> options including -flto etc.
> 
> Some env var would be useful I guess, though shouldn't it have GCC in the
> name?  I mean, if we run into these fork-bomb problems in gcc, won't other
> projects run into those as well?
> 
> Why doesn't the jobserver work in the tests?  Is that because of missing +
> somewhere in the Makefiles or is something unsetting MFLAGS or MAKEFLAGS
> env vars?

Main trouble with make's jobserver is that it works by 
1) defining environment variable saying which file descriptior to
connect to
2) keeping the file descriptor open upon invoking "+" prefixed lines

Adding "+" to GCC invocation is wrong since it breaks dry run (we do not
want to link at that time) but it is only way to access the jobserver.
If "+" is not present, make will keep the environment vairable but will
close file descriptors prior exec.

Make developers said that this is because some old prorams misbehave
when you exec them with more than 3 file descriptors open. I tried to
negotiate for named pipe which would solve this and it would make it
easy to connect to outermost jobserver from anything invoked form
toplevel make, but they was worried about systems w/o named pipes.

I wonder why we do not detect jobserv as unavailable in this case and do
not default to -flto=?
Is it because dejagnu machinery actually opens some other file
descriptor that gets same ID and executes us with it?

Or does LTO wrapper open something prior accessing jobserver?

> 
> Though, as I said on IRC, I think we might run out of filedescriptors when
> using jobserver too, say if on 64 thread machine one does make -j64 -k check
> and each test simultaneously tries to create 64 partitions, that would be
> 4096 connections to the jobserver, right?

Only WPA process connects to jobserver (which is 1 per linker
invokation), so I think this should be safe.

Honza
> 
>   Jakub


Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Jakub Jelinek
On Wed, Jul 31, 2019 at 09:20:40AM +0200, Martin Liška wrote:
> One possible solution will be to adjust lto.exp:
> 
> set LTO_OPTIONS [list \
> {-O0 -flto -flto-partition=none -fuse-linker-plugin} \
> {-O2 -flto -flto-partition=none -fuse-linker-plugin 
> -fno-fat-lto-objects } \
> {-O0 -flto -flto-partition=1to1 -fno-use-linker-plugin } \
> {-O2 -flto -flto-partition=1to1 -fno-use-linker-plugin } \
> 
> and replace all -flto with -flto=1. But still, many individual tests set 
> -flto by themselves.
> Another solution would be to disable the auto-detection with an environment 
> variable:
> 
> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
> index 353187c6043..bb6fb2b53ff 100644
> --- a/gcc/lto-wrapper.c
> +++ b/gcc/lto-wrapper.c
> @@ -1423,7 +1423,7 @@ run_gcc (unsigned argc, char *argv[])
>auto_parallel = 0;
>parallel = 0;
>  }
> -  else if (!jobserver && parallel)
> +  else if (!jobserver && parallel && !getenv ("LTO_NO_AUTO_PARALLEL"))
>  {
>/* If there's no explicit usage of jobserver and
>parallel is enabled, then automatically detect
> diff --git a/gcc/testsuite/lib/lto.exp b/gcc/testsuite/lib/lto.exp
> index 25c934731df..e303894e0b0 100644
> --- a/gcc/testsuite/lib/lto.exp
> +++ b/gcc/testsuite/lib/lto.exp
> @@ -209,6 +209,8 @@ proc lto_init { args } {
> ]
>   }
>  }
> +
> +setenv LTO_NO_AUTO_PARALLEL 1
>  }
>  
>  #
> 
> Can you Jakub test the patch or the s/-flto/-flto=1 solutions please?

Neither will work very well, we have thousands of -flto tests outside of
lto.exp, e.g. dg-torture.exp (or libgomp and others) cycle through various
options including -flto etc.

Some env var would be useful I guess, though shouldn't it have GCC in the
name?  I mean, if we run into these fork-bomb problems in gcc, won't other
projects run into those as well?

Why doesn't the jobserver work in the tests?  Is that because of missing +
somewhere in the Makefiles or is something unsetting MFLAGS or MAKEFLAGS
env vars?

Though, as I said on IRC, I think we might run out of filedescriptors when
using jobserver too, say if on 64 thread machine one does make -j64 -k check
and each test simultaneously tries to create 64 partitions, that would be
4096 connections to the jobserver, right?

Jakub


Re: [PATCH] Fix simd attribute handling on aarch64 (version 2)

2019-07-31 Thread Richard Sandiford
Steve Ellcey  writes:
> On Tue, 2019-07-30 at 14:31 +0100, Richard Sandiford wrote:
>> 
>> > -
>> > -  tree new_type = build_distinct_type_copy (TREE_TYPE (node-
>> > >decl));
>> > -  TYPE_ARG_TYPES (new_type) = new_reversed;
>> 
>> I think you still need this line, just applied to the existing type
>> rather than a new one.
>> 
>> > -  TREE_TYPE (node->decl) = new_type;
>> > -
>> >adjustments.release ();
>> >  }
>
> OK, I fixed that and retested with no regressions.
>
> Steve Ellcey
> sell...@marvell.com
>
>
> 2019-07-30  Steve Ellcey  
>
>   * omp-simd-clone.c (simd_clone_adjust_return_type): Remove call to
>   build_distinct_type_copy.
>   (simd_clone_adjust_argument_types): Ditto.
>   (simd_clone_adjust): Call build_distinct_type_copy here.
>   (expand_simd_clones): Ditto.
>
> 2019-07-30  Steve Ellcey  
>
>   * gcc.target/aarch64/simd_pcs_attribute.c: New test.
>   * gcc.target/aarch64/simd_pcs_attribute-2.c: Ditto.
>   * gcc.target/aarch64/simd_pcs_attribute-3.c: Ditto.

OK if there are no objections in 48 hours.

Thanks,
Richard


Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-07-31 Thread Feng Xue OS
Thanks for these comments.

Feng


From: Michael Matz 
Sent: Tuesday, July 30, 2019 1:59:04 AM
To: Feng Xue OS
Cc: Richard Biener; gcc-patches@gcc.gnu.org
Subject: Re: Ping agian: [PATCH V2] Loop split upon semi-invariant condition 
(PR tree-optimization/89134)

Hello Feng,

first, sorry for the terrible delay in reviewing, but here is one now :)

Generally I do like the idea of the transformation, and the basic building
blocks seem to be sound.  But I dislike it being a separate pass, so
please integrate the code you have written into the existing loop split
pass.  Most building blocks can be used as is, except the main driver.

The existing loop-split code uses loop->aux as binary marker and analyses
and transforms loops in one go, you're doing it separately.  That
separation makes sense for you, so the existing code should be changed to
also do that separately.  Some info for the existing loop-split analysis
needs to be stored in the info struct then as well, which can be done if
you add some fields in yours.  Some splitting-out of code from the
existing main driver is probably needed (basically the parts that
determine eligibility and which cond statement to split).

The two routines that actually split the loops (those that call
loop_version) also have an overlap, so maybe something more can be
commonized between the two (ultimately the way of splitting in both
variants is different, so somewhere they'll do something else, but still
some parts are common).

So, with these general remarks, some more concrete ones about the patch:

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index eaef4cd63d2..0427fede3d6 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11352,6 +11352,14 @@ The maximum number of branches unswitched in a 
> single loop.
>  @item lim-expensive
>  The minimum cost of an expensive expression in the loop invariant motion.
>
> +@item max-cond-loop-split-insns
> +The maximum number of insns to be increased due to loop split on
> +semi-invariant condition statement.

"to be increased" --> "to be generated" (or "added")

> +@item min-cond-loop-split-prob
> +The minimum threshold for probability of semi-invaraint condition
> +statement to trigger loop split.

typo, semi-invariant
I think somewhere in the docs your definition of semi-invariant needs
to be stated in some form (can be short, doesn't need to reproduce the
diagram or such), so don't just replicate the short info from the
params.def file.

> diff --git a/gcc/params.def b/gcc/params.def
> index 0db60951413..5384f7d1c4d 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>  "The maximum number of unswitchings in a single loop.",
>  3, 0, 0)
>
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> +   "max-cond-loop-split-insns",
> +   "The maximum number of insns to be increased due to loop split on "
> +   "semi-invariant condition statement.",
> +   100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> +   "min-cond-loop-split-prob",
> +   "The minimum threshold for probability of semi-invaraint condition "
> +   "statement to trigger loop split.",

Same typo: "semi-invariant".

> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 999c9a30366..7239d0cfb00 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> -/* This file implements loop splitting, i.e. transformation of loops like
> +/* This file implements two kind of loop splitting.

kind_s_, plural

> +   One transformation of loops like:
>
> for (i = 0; i < 100; i++)
>   {
> @@ -670,6 +674,782 @@ tree_ssa_split_loops (void)
>return 0;
>  }
>
> +
> +/* Another transformation of loops like:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> + {
> +   if (expr (a_1, a_2, ..., a_n))
> + a_j = ...;  // change at least one a_j
> +   else
> + S;  // not change any a_j
> + }

You should mention that 'expr' needs to be pure, i.e. once it
becomes false and the inputs don't change, that it remains false.

> +
> +   into:
> +
> +   for (i = INIT (); CHECK (i); i = NEXT ())
> + {
> +   if (expr (a_1, a_2, ..., a_n))
> + a_j = ...;
> +   else
> + {
> +   S;
> +   i = NEXT ();
> +   break;
> + }
> + }
> +
> +   for (; CHECK (i); i = NEXT ())
> + {
> +   S;
> + }
> +
> +   */
> +
...
> +/* Determine when conditional statement never transfers execution to one of 
> its
> +   branch, whether we can remove the branch's leading basic block (BRANCH_BB)
> +   and those basic blocks dominated by BRANCH_BB.  */
> +
> +static bool
> +branch_removable_p (basic_block branch_bb)
> +{
> +  if (single_pred_p (branch_bb))
> +return true;
> +
> +  

Re: [PATCH] Deduce automatically number of cores for -flto option.

2019-07-31 Thread Martin Liška
On 7/31/19 1:32 AM, Jakub Jelinek wrote:
> This broke a lot of tests.

Whoops.

> The logs show
> spawn -ignore SIGHUP /home/jakub/src/gcc/obj31/gcc/xgcc 
> -B/home/jakub/src/gcc/obj31/gcc/ c_lto_pr83954_0.o c_lto_pr83954_1.o 
> -fno-diagnostics-show-
> caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never -O2 -flto 
> -flto-partition=1to1 -o gcc-dg-lto-pr83954-31.exe
> make[4]: *** write jobserver: Bad file descriptor.  Stop.
> make[4]: *** Waiting for unfinished jobs
> make[4]: *** write jobserver: Bad file descriptor.  Stop.
> lto-wrapper: fatal error: make returned 2 exit status
> compilation terminated.
> collect2: fatal error: lto-wrapper returned 1 exit status
> compilation terminated.
> compiler exited with status 1
> FAIL: gcc.dg/lto/pr83954 c_lto_pr83954_0.o-c_lto_pr83954_1.o link, -O2 -flto 
> -flto-partition=1to1 
> and similar, e.g. for x86_64-linux it was following regressions, on
> i686-linux also some tests in libgomp etc.
> Is -flto now really using all available CPUs for each compilation?

Yes, I can confirm that linking happens in N processed for each LTO test now.
It's caused by fact that current Dejagnu machinery does not pass a make 
jobserver
to gcc command invocations. On the other hand, our LTO tests are so tiny that we
should have always very low number of partitions.

>  Without
> jobserver that would like a fork-bomb, say if I have a CPU with 32 threads
> and do make check -j32, does that mean there are 1024 lto1s?
> Judging from http://gcc.gnu.org/ml/gcc-testresults/2019-07/msg03610.html
> I'm not alone.

One possible solution will be to adjust lto.exp:

  set LTO_OPTIONS [list \
  {-O0 -flto -flto-partition=none -fuse-linker-plugin} \
  {-O2 -flto -flto-partition=none -fuse-linker-plugin 
-fno-fat-lto-objects } \
  {-O0 -flto -flto-partition=1to1 -fno-use-linker-plugin } \
  {-O2 -flto -flto-partition=1to1 -fno-use-linker-plugin } \

and replace all -flto with -flto=1. But still, many individual tests set -flto 
by themselves.
Another solution would be to disable the auto-detection with an environment 
variable:

diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 353187c6043..bb6fb2b53ff 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1423,7 +1423,7 @@ run_gcc (unsigned argc, char *argv[])
   auto_parallel = 0;
   parallel = 0;
 }
-  else if (!jobserver && parallel)
+  else if (!jobserver && parallel && !getenv ("LTO_NO_AUTO_PARALLEL"))
 {
   /* If there's no explicit usage of jobserver and
 parallel is enabled, then automatically detect
diff --git a/gcc/testsuite/lib/lto.exp b/gcc/testsuite/lib/lto.exp
index 25c934731df..e303894e0b0 100644
--- a/gcc/testsuite/lib/lto.exp
+++ b/gcc/testsuite/lib/lto.exp
@@ -209,6 +209,8 @@ proc lto_init { args } {
  ]
}
 }
+
+setenv LTO_NO_AUTO_PARALLEL 1
 }
 
 #

Can you Jakub test the patch or the s/-flto/-flto=1 solutions please?
Thanks,
Martin


[PATCH] Implement x86 reduc_plus_scal_v{16,32,64}qi (PR tree-optimization/91201)

2019-07-31 Thread Jakub Jelinek
Hi!

As mentioned in the PR, we can use psadbw to shorten the final reductions to
scalar for 8-bit elements.  E.g. for -mavx2 the difference is:
-   vmovdqa %xmm1, %xmm0
-   vextracti128$0x1, %ymm1, %xmm1
-   vpaddb  %xmm1, %xmm0, %xmm0
-   vpsrldq $8, %xmm0, %xmm1
-   vpaddb  %xmm1, %xmm0, %xmm0
-   vpsrldq $4, %xmm0, %xmm1
-   vpaddb  %xmm1, %xmm0, %xmm0
-   vpsrldq $2, %xmm0, %xmm1
-   vpaddb  %xmm1, %xmm0, %xmm0
-   vpsrldq $1, %xmm0, %xmm1
-   vpaddb  %xmm1, %xmm0, %xmm0
+   vextracti128$0x1, %ymm1, %xmm0
+   vpaddb  %xmm1, %xmm0, %xmm1
+   vpsrldq $8, %xmm1, %xmm0
+   vpaddb  %xmm0, %xmm1, %xmm1
+   vpxor   %xmm0, %xmm0, %xmm0
+   vpsadbw %xmm0, %xmm1, %xmm0
vpextrb $0, %xmm0, %eax
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-07-31  Jakub Jelinek  

PR tree-optimization/91201
* config/i386/sse.md (reduc_plus_scal_v16qi): New expander.
(REDUC_PLUS_MODE): Add V32QImode for TARGET_AVX and V64QImode for
TARGET_AVX512F.
(reduc_plus_scal_): Improve formatting by introducing
a temporary.

* gcc.target/i386/sse2-pr91201.c: New test.
* gcc.target/i386/avx2-pr91201.c: New test.
* gcc.target/i386/avx512bw-pr91201.c: New test.

--- gcc/config/i386/sse.md.jj   2019-07-30 12:19:45.999490854 +0200
+++ gcc/config/i386/sse.md  2019-07-30 12:19:55.379352735 +0200
@@ -2728,9 +2728,30 @@ (define_expand "reduc_plus_scal_"
   DONE;
 })
 
+(define_expand "reduc_plus_scal_v16qi"
+ [(plus:V16QI
+(match_operand:QI 0 "register_operand")
+(match_operand:V16QI 1 "register_operand"))]
+ "TARGET_SSE2"
+{
+  rtx tmp = gen_reg_rtx (V1TImode);
+  emit_insn (gen_sse2_lshrv1ti3 (tmp, gen_lowpart (V1TImode, operands[1]),
+GEN_INT (64)));
+  rtx tmp2 = gen_reg_rtx (V16QImode);
+  emit_insn (gen_addv16qi3 (tmp2, operands[1], gen_lowpart (V16QImode, tmp)));
+  rtx tmp3 = gen_reg_rtx (V16QImode);
+  emit_move_insn (tmp3, CONST0_RTX (V16QImode));
+  rtx tmp4 = gen_reg_rtx (V2DImode);
+  emit_insn (gen_sse2_psadbw (tmp4, tmp2, tmp3));
+  tmp4 = gen_lowpart (V16QImode, tmp4);
+  emit_insn (gen_vec_extractv16qiqi (operands[0], tmp4, const0_rtx));
+  DONE;
+})
+
 (define_mode_iterator REDUC_PLUS_MODE
  [(V4DF "TARGET_AVX") (V8SF "TARGET_AVX")
-  (V8DF "TARGET_AVX512F") (V16SF "TARGET_AVX512F")])
+  (V8DF "TARGET_AVX512F") (V16SF "TARGET_AVX512F")
+  (V32QI "TARGET_AVX") (V64QI "TARGET_AVX512F")])
 
 (define_expand "reduc_plus_scal_"
  [(plus:REDUC_PLUS_MODE
@@ -2741,8 +2762,8 @@ (define_expand "reduc_plus_scal_"
   rtx tmp = gen_reg_rtx (mode);
   emit_insn (gen_vec_extract_hi_ (tmp, operands[1]));
   rtx tmp2 = gen_reg_rtx (mode);
-  emit_insn (gen_add3
-(tmp2, tmp, gen_lowpart (mode, operands[1])));
+  rtx tmp3 = gen_lowpart (mode, operands[1]);
+  emit_insn (gen_add3 (tmp2, tmp, tmp3));
   emit_insn (gen_reduc_plus_scal_ (operands[0], tmp2));
   DONE;
 })
--- gcc/testsuite/gcc.target/i386/sse2-pr91201.c.jj 2019-07-30 
12:23:48.930913778 +0200
+++ gcc/testsuite/gcc.target/i386/sse2-pr91201.c2019-07-30 
12:23:45.518964018 +0200
@@ -0,0 +1,18 @@
+/* PR tree-optimization/91201 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -msse2 -mno-sse3" } */
+/* { dg-final { scan-assembler "\tpsadbw\t" } } */
+
+unsigned char bytes[1024];
+
+unsigned char
+sum (void)
+{
+  unsigned char r = 0;
+  unsigned char *p = (unsigned char *) bytes;
+  int n;
+
+  for (n = 0; n < sizeof (bytes); ++n)
+r += p[n];
+  return r;
+}
--- gcc/testsuite/gcc.target/i386/avx2-pr91201.c.jj 2019-07-30 
12:24:05.199674228 +0200
+++ gcc/testsuite/gcc.target/i386/avx2-pr91201.c2019-07-30 
12:24:34.544242142 +0200
@@ -0,0 +1,6 @@
+/* PR tree-optimization/91201 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -mno-avx512f" } */
+/* { dg-final { scan-assembler "\tvpsadbw\t" } } */
+
+#include "sse2-pr91201.c"
--- gcc/testsuite/gcc.target/i386/avx512bw-pr91201.c.jj 2019-07-30 
12:24:50.079013395 +0200
+++ gcc/testsuite/gcc.target/i386/avx512bw-pr91201.c2019-07-30 
12:25:10.685709971 +0200
@@ -0,0 +1,6 @@
+/* PR tree-optimization/91201 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx512bw -mprefer-vector-width=512" } */
+/* { dg-final { scan-assembler "\tvpsadbw\t" } } */
+
+#include "sse2-pr91201.c"

Jakub