date:20170522

[Bug libgcc/80037] Bad .eh_frame data in crtend.o

2017-05-22 Thread stilor at att dot net

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80037

Alexey Neyman  changed:

   What|Removed |Added

 CC||stilor at att dot net

--- Comment #2 from Alexey Neyman  ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80848 is a duplicate of this bug.
I have confirmed that this patch solves that issue.

[Bug target/80725] [7/8 Regression] s390x ICE on alsa-lib

2017-05-22 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80725

--- Comment #4 from Jakub Jelinek  ---
1), if it works, looks easiest to me.  3) if Vlad would prefer that way.

[Bug middle-end/80809] Multi-free error for variable size array used within OpenMP task

2017-05-22 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80809

--- Comment #6 from Jakub Jelinek  ---
Fixed on the trunk so far.

[Bug target/80848] /crtend.o(.eh_frame); no .eh_frame_hdr table will be created

2017-05-22 Thread stilor at att dot net

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80848

--- Comment #2 from Alexey Neyman  ---
It seems to be a duplicate of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80037 that has a proposed patch.

[Bug target/80848] /crtend.o(.eh_frame); no .eh_frame_hdr table will be created

2017-05-22 Thread stilor at att dot net

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80848

Alexey Neyman  changed:

   What|Removed |Added

 CC||stilor at att dot net

--- Comment #1 from Alexey Neyman  ---
I looked at this issue when the problem was reported in the crosstool-NG
tracker [1]. The issue started with the commit 66035fd [2]; the link at [3]
explains in more detail the problem that commit tried to solve.

After that change, BFD trips over this check [4]:

 724  if (hdr_length == 0)
 725 {
 726   /* A zero-length CIE should only be found at the end of
 727  the section.  */
 728   REQUIRE ((bfd_size_type) (buf - ehbuf) == sec->size);
 729   ENSURE_NO_RELOCS (buf);
 730   sec_info->count++;
 731   break;
 732 }

The crtend.o generated by the build has the following in .eh_frame:


alphaev4-unknown-linux-gnu-readelf -wf
.build/alphaev4-unknown-linux-gnu/buildtools/lib/gcc/alphaev4-unknown-linux-gnu/4.8.4/crtend.o
Contents of the .eh_frame section:

 ZERO terminator


0004 0010  CIE
  Version:   1
  Augmentation:  "zR"
  Code alignment factor: 4
  Data alignment factor: -8
  Return address column: 26
  Augmentation data: 1b

  DW_CFA_def_cfa_register: r30
  DW_CFA_nop

0018 0024 0018 FDE cie=0004
pc=..0070
  DW_CFA_advance_loc: 20 to 0014
  DW_CFA_def_cfa_offset: 32
  DW_CFA_advance_loc: 12 to 0020
  DW_CFA_offset: r9 at cfa-24
  DW_CFA_advance_loc: 8 to 0028
  DW_CFA_offset: r10 at cfa-16
  DW_CFA_advance_loc: 8 to 0030
  DW_CFA_offset: r26 at cfa-32
  DW_CFA_advance_loc: 60 to 006c
  DW_CFA_restore: r10
  DW_CFA_restore: r9
  DW_CFA_restore: r26
  DW_CFA_def_cfa_offset: 0
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop


I.e., the zero terminator is indeed not the last record. GCC produces the
following assembly when compiling crtstuff.c into crtend.o (skipping irrelevant
parts):


.section .eh_frame,"a",@progbits
.align 2
.type   __FRAME_END__, @object
__FRAME_END__:
.zero   4
...
__do_global_ctors_aux:
$LFB9:
.cfi_startproc
ldah $29,0($27) !gpdisp!1
lda $29,0($29)  !gpdisp!1


That is, GCC generates both the explicit zero terminator as well as the CFI
instructions that make the assembler generate additional CIE/FDE records.
Adding -fno-asynchronous-unwind-tables and/or -fno-exceptions has no effect.
Adding -fno-dwarf2-cfi-asm makes GCC emit a second .eh_frame section fragment
with explicitly generated DWARF2 bytecode. In both cases, the result is an
invalid .eh_frame section with the zero terminator record not being the last. In other
words, there is no way to prevent GCC from emitting any additional content into
the .eh_frame section, aside from the terminator.
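
To make the failing condition concrete, here is a minimal C sketch of the placement rule that the BFD check quoted above enforces: a zero-length record is accepted only as the last four bytes of the section. The function name eh_frame_terminator_ok is hypothetical. The sketch assumes 32-bit length fields in host byte order and rejects the 64-bit extended-length form. It is not BFD's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Walk the length-prefixed records of an .eh_frame-like buffer.
   Returns 1 if every record fits and a zero-length terminator, if present,
   occupies the last four bytes; returns 0 otherwise.
   Assumption: 32-bit length fields in host byte order. */
int eh_frame_terminator_ok(const unsigned char *sec, size_t size)
{
    size_t off = 0;

    while (off + 4 <= size) {
        uint32_t len;
        memcpy(&len, sec + off, sizeof len);
        off += 4;

        if (len == 0)                 /* zero terminator */
            return off == size;       /* must be the final record */

        /* Extended (64-bit) lengths are not handled in this sketch. */
        if (len == 0xffffffffu || len > size - off)
            return 0;

        off += len;                   /* skip the CIE/FDE body */
    }
    return off == size;               /* no terminator, no trailing bytes */
}
```

For the crtend.o dump above, the terminator sits at offset 0 and is followed by CIE and FDE records, so a check of this kind would report failure.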

- So, the question is, how should this be fixed?
- Was [2] a correct change to begin with?
- Should alpha (in addition to [2], or instead of [2]) implement a custom
"alpha_except_unwind_info" that will return UI_NONE if exceptions are disabled
(and then use that while compiling crtend.o)?
- Or if CFI is desired for __do_global_ctors_aux, perhaps compile this routine
separately into, say, crtend1.o; then compile the data-only section terminators
into crtend2.o; and then do a `ld -r` of both crtend[12].o into crtend.o?

[1] https://github.com/crosstool-ng/crosstool-ng/issues/719
[2]
https://github.com/gcc-mirror/gcc/commit/66035fd81f6fb8dff84e0c64d52ed041450fdebc
[3] https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01680.html
[4]
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/elf-eh-frame.c;h=52ba9c62138bb7d2c8901d961ba322dbfe23e220;hb=HEAD#l724

[Bug rtl-optimization/80474] ipa-cp wrongly adding LO(symbol) twice

2017-05-22 Thread jan.smets at nokia dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80474

--- Comment #7 from Jan Smets  ---
My apologies, somehow I dropped the -mno-abicalls along the road.

$ git status
HEAD detached at gcc-6_3_0-release

configure --target=mips64-linux-gnuabi64 --enable-languages=c (with
binutils/gmp/etc all already present)

make all-gcc -jN

 ./gcc/xgcc --version
xgcc (GCC) 6.3.0

./gcc/xgcc -B ./gcc  -O2 -fno-reorder-blocks -march=mips2 
-fno-inline-small-functions -mabi=32 -c /tmp/test.c -S -dA -dP  -o -
-mno-abicalls

The output still matches the above output.

Thanks

[Bug c++/80858] When trying to copy std::unordered_map illegally, error message doesn't tell what's wrong

2017-05-22 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80858

--- Comment #4 from sgunderson at bigfoot dot com ---
I think this should work as reduction:

struct Empty {
};

template<class T>
struct A
{
    A& operator=(const A&)
    {
        T t(3);
        return *this;
    }
};

class B
{
    A<Empty> a;
};

int main(void)
{
    B b1, b2;
    b1 = b2;
}


The error is attributed to the line with “class B”, without ever mentioning the
“b1 = b2;” line.

[Bug middle-end/66313] Unsafe factorization of a*b+a*c

2017-05-22 Thread babokin at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66313

--- Comment #15 from Dmitry Babokin  ---
The bug is almost 2 years old. I consider it quite important, as false
positives make UBSAN unusable on any large codebase.
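
As an illustration of why the factorization named in the bug title is unsafe, the following C sketch compares the overflow behaviour of a*b + a*c and a*(b + c). It uses GCC's __builtin_add_overflow and __builtin_mul_overflow, so the sketch itself never performs a signed overflow. The function names are hypothetical and the inputs are chosen for illustration; they are not taken from the bug's test case.

```c
#include <limits.h>

/* Returns 1 if evaluating a*b + a*c as written would overflow int. */
int unfactored_overflows(int a, int b, int c)
{
    int ab, ac, sum;
    return __builtin_mul_overflow(a, b, &ab)
        || __builtin_mul_overflow(a, c, &ac)
        || __builtin_add_overflow(ab, ac, &sum);
}

/* Returns 1 if evaluating the factored form a*(b + c) would overflow int. */
int factored_overflows(int a, int b, int c)
{
    int bc, prod;
    return __builtin_add_overflow(b, c, &bc)
        || __builtin_mul_overflow(a, bc, &prod);
}
```

With a = 0, b = INT_MAX and c = 1, the expression as written is well defined (both products are 0), while the factored form overflows when computing b + c. A sanitizer report on the factored form would therefore blame code that never overflowed as written.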

[Bug other/80803] libgo appears to be miscompiled on powerpc64le since r247923

2017-05-22 Thread ian at airs dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80803

Ian Lance Taylor  changed:

   What|Removed |Added

 CC||ian at airs dot com

--- Comment #5 from Ian Lance Taylor  ---
To reproduce:
make GOTESTFLAGS=--keep net/check

My apologies if I omitted the "/check" before.

Yes, you have identified the point in the libgo Makefile that produces this,
but the point is that the test program built by `make net/check` (and preserved
if you use `GOTESTFLAGS=--keep`) is somehow miscompiled.

[Bug target/80718] GCC generates slow code for offsettable vec_duplicate

2017-05-22 Thread meissner at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80718

--- Comment #3 from Michael Meissner  ---
Author: meissner
Date: Mon May 22 22:44:45 2017
New Revision: 248352

URL: https://gcc.gnu.org/viewcvs?rev=248352&root=gcc&view=rev
Log:
[gcc]
2017-05-22  Michael Meissner  

PR target/80718
* config/rs6000/vsx.md (vsx_splat_<mode>, VSX_D iterator): Split
V2DF/V2DI splat into two separate patterns, one that handles
registers, and the other that only handles memory.  Drop support
for splatting from a GPR on ISA 2.07 and then splitting the
splat into direct move and splat.
(vsx_splat_<mode>_reg): Likewise.
(vsx_splat_<mode>_mem): Likewise.

[gcc/testsuite]
2017-05-22  Michael Meissner  

PR target/80718
* gcc.target/powerpc/pr80718.c: New test.


Added:
trunk/gcc/testsuite/gcc.target/powerpc/pr80718.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/rs6000/vsx.md
trunk/gcc/testsuite/ChangeLog

[Bug target/80834] PowerPC gcc -mcpu=power9 seems to turn off vectorization that -mcpu=power8 enables

2017-05-22 Thread meissner at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80834

--- Comment #3 from Michael Meissner  ---
It looks like -fvect-cost-model=unlimited re-enables vectorization with
-mcpu=power9.  It also enables some vectorization with -mcpu=power7.

[Bug target/80861] New: ARM (VFPv3): Inefficient float-to-char conversion goes through memory

2017-05-22 Thread gergo.barany at inria dot fr

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80861

Bug ID: 80861
   Summary: ARM (VFPv3): Inefficient float-to-char conversion goes
through memory
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gergo.barany at inria dot fr
  Target Milestone: ---

Created attachment 41407
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41407&action=edit
Input C file for triggering the bug

Consider the attached code:

$ cat tst.c
char fn1(float p1) {
  return (char) p1;
}

GCC from trunk from two weeks ago generates this code on ARM:

$ gcc tst.c -O3 -S -o -
.arch armv7-a
.eabi_attribute 28, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file   "tst.c"
.text
.align  2
.global fn1
.syntax unified
.arm
.fpu vfpv3-d16
.type   fn1, %function
fn1:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvt.u32.f32 s15, s0
sub sp, sp, #8
vstr.32 s15, [sp, #4]   @ int
ldrb r0, [sp, #4]       @ zero_extendqisi2
add sp, sp, #8
@ sp needed
bx  lr
.size   fn1, .-fn1
.ident  "GCC: (GNU) 8.0.0 20170510 (experimental)"


Going through memory for the int-to-char truncation after the float-to-int
conversion (vcvt) is excessive. For comparison, this is the entire code
generated by Clang:

@ BB#0:
vcvt.u32.f32 s0, s0
vmov r0, s0
bx  lr

And this is what CompCert produces for the core of the function (stack
manipulation code omitted):

vcvt.u32.f32 s12, s0
vmov r0, s12
and r0, r0, #255
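
For reference, the register-only sequence can be written out at the C level as a two-step conversion: float to 32-bit unsigned, then a mask of the low byte. The sketch below is only an illustration under assumptions. The function names are hypothetical, and unsigned char is used explicitly because plain char is unsigned on ARM EABI but signed on many other ABIs. Both functions agree only for inputs whose truncated value lies in 0..255; outside that range the direct conversion is undefined behaviour in C.

```c
#include <stdint.h>

/* Direct conversion, equivalent to fn1 in the report when char is unsigned. */
unsigned char conv_direct(float p1)
{
    return (unsigned char) p1;
}

/* Two-step form: convert to a 32-bit unsigned value, then keep the low
   byte with an AND, mirroring the vcvt / vmov / and sequence above. */
unsigned char conv_via_u32(float p1)
{
    uint32_t wide = (uint32_t) p1;
    return (unsigned char) (wide & 0xffu);
}
```

Neither form needs a stack slot: the truncation to a byte is a plain integer mask once the value is in a core register.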


My GCC version:

Target: armv7a-eabihf
Configured with: --target=armv7a-eabihf --with-arch=armv7-a
--with-fpu=vfpv3-d16 --with-float-abi=hard --with-float=hard
Thread model: single
gcc version 8.0.0 20170510 (experimental) (GCC)

[Bug bootstrap/80860] New: AIX Bootstrap failure

2017-05-22 Thread dje at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80860

Bug ID: 80860
   Summary: AIX Bootstrap failure
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dje at gcc dot gnu.org
  Target Milestone: ---
Target: powerpc-ibm-aix

/nasfarm/edelsohn/src/src/gcc/c-family/c-cppbuiltin.c: In function 'void
builtin_define_float_constants(const char*, const char*, const char*, const
char*, tree)':
/nasfarm/edelsohn/src/src/gcc/c-family/c-cppbuiltin.c:310:1: internal compiler
error: in maybe_record_trace_start, at dwarf2cfi.c:2330

Started Monday 22 May.

[Bug c++/80859] New: Performance Problems with OpenMP 4.5 support

2017-05-22 Thread thorstenkurth at me dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859

Bug ID: 80859
   Summary: Performance Problems with OpenMP 4.5 support
   Product: gcc
   Version: 6.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: thorstenkurth at me dot com
  Target Milestone: ---

Dear Sir/Madam,

I am working on the Cori HPC system, a Cray XC-40 with Intel Xeon Phi 7250. I
probably found a performance "bug" when using the OpenMP 4.5 target directives.
It seems to me that the GNU compiler generates unnecessary move and push
instructions when a "#pragma omp target" region is present but no offloading
is used.

I have attached a test case to illustrate that problem. Please compile the
nested_test_omp_4dot5.x in the directory (don't be confused by the name, I am
not using nested OpenMP here). Then go into the corresponding .cpp file and
comment out the target-related directives (target teams and distribute),
compile again and then compare the assembly code. The code with the target
directives has more pushes and moves than the one without. I think I also placed
the output of that process in the directory already, the files ending in .as.

The performance overhead is marginal here but I am currently working on a
Department of Energy performance portability project and I am exploring the
usefulness of OpenMP 4.5. The code we are retargeting is a Geometric Multigrid
in the BoxLib/AMReX framework and there the overhead is significant. I could
observe as much as 10x slowdown accumulated throughout the app. This code is
bigger and thus I do not want to demonstrate that here but I could send you an
invitation to the github repo if requested. In my opinion, if no offloading is
used, the compiler should just ignore the target region statements and just
default to plain OpenMP. 
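
The kind of comparison described above can be sketched in C as the same loop written once with plain OpenMP and once with OpenMP 4.5 target directives. This is a hypothetical stand-in, not the attached test case: the function names, the loop body, and the exact combined directive are assumptions made for illustration. Without -fopenmp the pragmas are ignored and both functions are ordinary serial loops.

```c
#include <stddef.h>

/* Plain OpenMP: parallel loop on the host. */
void scale_plain(double *a, size_t n, double s)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i)
        a[i] *= s;
}

/* OpenMP 4.5 target version of the same loop. With no offload device
   configured, the region is expected to execute on the host. */
void scale_target(double *a, size_t n, double s)
{
    #pragma omp target teams distribute parallel for map(tofrom: a[0:n])
    for (size_t i = 0; i < n; ++i)
        a[i] *= s;
}
```

Compiling both with -fopenmp -S and comparing the generated assembly is one way to observe the extra pushes and moves mentioned in the report.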

Please let me know what you think.

Best Regards
Thorsten Kurth
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory

[Bug c++/80835] Reading a member of an atomic can load just that member, not the whole struct

2017-05-22 Thread peter at cordes dot ca

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80835

--- Comment #5 from Peter Cordes  ---
Previous godbolt link was supposed to be: https://godbolt.org/g/78kIAl
which includes the CAS functions.

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-22 Thread peter at cordes dot ca

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833

--- Comment #6 from Peter Cordes  ---
(In reply to Richard Biener from comment #5)
> There's some related bugs.  I think there is no part of the compiler that
> specifically tries to avoid store forwarding issues.

Ideally the compiler would keep track of how stores were done (or likely done
for code that's not visible), but without that:

For data coming from integer registers, a pure ALU strategy (with movd/q and
punpck or pinsrd/q) should be a win on all CPUs over narrow-store -> wide-load. 
 Except maybe in setup for long loops where small code / fewer uops is a win,
or if there's other work that hides the latency of a store-forwarding stall.

The other exception is maybe -mtune=atom without SSE4.  (But ALU isn't bad
there, so adding a special-case just for old in-order Atom might not make
sense.)
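
The two strategies can be contrasted at the C level. The sketch below is illustrative only: the function names are hypothetical, and an optimizing compiler is free to turn either function into the other, so it shows the data flow rather than guaranteed code generation. The store/reload variant also assumes a little-endian target for the halves to land in the same positions as in the ALU variant.

```c
#include <stdint.h>
#include <string.h>

/* Narrow-store -> wide-load: two 32-bit stores followed by one 64-bit
   load of the same bytes, the pattern that can miss store forwarding. */
uint64_t combine_store_reload(uint32_t lo, uint32_t hi)
{
    uint32_t halves[2] = { lo, hi };      /* two narrow stores */
    uint64_t wide;
    memcpy(&wide, halves, sizeof wide);   /* one wide load */
    return wide;
}

/* Pure ALU: build the 64-bit value with shift and OR, which corresponds
   to the movd + punpck / pinsrd style of combining in registers. */
uint64_t combine_alu(uint32_t lo, uint32_t hi)
{
    return ((uint64_t) hi << 32) | lo;
}
```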

---

For data starting in memory, a simple heuristic might be: vector loads wider
than a single integer reg are good for arrays, or for anything other than
scalar locals / args on the stack that happen to be contiguous.

We need to make sure such a heuristic never results in auto-vectorizing with
movd/pinsrd loads from arrays, instead of movdqu.  However, it might be
appropriate to use a movd/pinsrd strategy for _mm_set_epi32, even if the data
happens to be contiguous in memory.  In that case, the programmer can use a
_mm_loadu_si128 (and a struct or array to ensure adjacency).

It's less clear what to do about int64_t in 32-bit mode, though, without a good
mechanism to track how it was recently written.  Always using movd/pinsrd for
locals / args is not horrible, but would suck for structs in memory if the
programmer is assuming that they'll get an efficient MOVQ/MOVHPS.


A function that takes a read-write int64_t *arg might often get called right
after the pointed-to data is written.  In 32-bit code, we need it in integer
registers to do anything but copy it.  If we're just copying it somewhere else,
hopefully a store-forwarding stall isn't part of the critical path.  I'm not
sure how long it takes for a store to complete, and no longer need to be
forwarded.  The store buffer can't commit stores to L1 until they retire (and
then it has to go in-order to preserve x86 memory ordering), so even passing a
pointer on the stack (store/reload with successful forwarding) probably isn't
nearly enough latency for a pair of stores in the caller to be actually
committed to L1.

A store-forwarding "stall" doesn't actually stall the whole pipeline, or even
unrelated memory ops, AFAIK.  My understanding is that it just adds latency to
the affected load while out-of-order execution continues as normal.  There may
be some throughput limitations on how many failed-store-forwarding loads can be
in flight at once: I think it works by scanning the store buffer for all
overlapping stores, if the last store that wrote any of the bytes isn't able to
use the forwarding fast-case (either because of sub-alignment restrictions or
only partial overlap).  It doesn't have to drain the store buffer, though.

Obviously every uarch can have its own quirks, but this seems the most likely
explanation for a latency penalty that's a constant number of cycles.

AFAIK, the store-forwarding stall penalty can't start until the load-address is
ready, since AFAIK no major x86 CPUs do address-prediction for loads.  So the 6
+ 10c latency for an SSE load on SnB with failed store-forwarding would be from
when the address becomes ready to when the value becomes ready.  I might be
mistaken, though.  Maybe it helps if the store executed several cycles before
the load-address was ready, so 32-bit code using a MOVQ xmm load on an int64_t*
won't suffer as badly if it got the address from a stack arg, and did some
other work before the load.

[Bug rtl-optimization/80474] ipa-cp wrongly adding LO(symbol) twice

2017-05-22 Thread ebotcazou at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80474

Eric Botcazou  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed|2017-04-21 00:00:00 |2017-05-22
 Ever confirmed|0   |1

--- Comment #6 from Eric Botcazou  ---
I cannot reproduce, please post the configure line of the 6.3.1 compiler and
make sure to use pristine FSF sources.

[Bug c++/80835] Reading a member of an atomic can load just that member, not the whole struct

2017-05-22 Thread peter at cordes dot ca

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80835

--- Comment #4 from Peter Cordes  ---
Thanks for correcting my mistake in tagging this bug, but this got me thinking
it's not just a C++ issue.

This also applies to GNU C __atomic_load_n(), and ISO C11 stdatomic code like

#include <stdatomic.h>
#include <stdint.h>
uint32_t load(atomic_uint_fast64_t *p) {  // https://godbolt.org/g/CXuiPO
  return *p;
}

With -m32, it's definitely useful to only do a 32-bit load.  With -m64, that
might let us fold the load as a memory operand, like `add (%rdi), %eax` or
something.

In 64-bit code, it just compiles to movq (%rdi),%rax; ret.  But if we'd wanted
the high half, it would have been a load and shift instead of just a 32-bit
load.  Or shift+zero-extend if we'd wanted bytes 1 through 4 with (*p) >> 8. 
We know that a 64-bit load of *p doesn't cross a cache-line boundary (otherwise
it wouldn't be atomic), so neither will any load that's fully contained by it.  

This optimization should be disabled for `volatile`, if that's supposed to make
stdatomic usable for MMIO registers.
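
The three access patterns discussed above look like this in C11. This is a small sketch with hypothetical function names. Each function performs a full-width atomic load in the source and then narrows the result; the proposed optimization would let the compiler replace that with a single narrower load that is fully contained in the 8-byte object.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Low half: candidate for a plain 32-bit load. */
uint32_t load_low(atomic_uint_fast64_t *p)
{
    return (uint32_t) atomic_load(p);
}

/* High half: currently a 64-bit load plus a shift. */
uint32_t load_high(atomic_uint_fast64_t *p)
{
    return (uint32_t) (atomic_load(p) >> 32);
}

/* Bytes 1 through 4: shift by 8, then truncate to 32 bits. */
uint32_t load_bytes_1_to_4(atomic_uint_fast64_t *p)
{
    return (uint32_t) (atomic_load(p) >> 8);
}
```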

--

It's common to use a load as setup for a CAS.  For cmpxchg8/16b, it's most
efficient to use two separate loads as setup for cmpxchg8b.  gcc even does this
for us with code like *p |= 3, but we can't get that behaviour if we write a
CAS loop ourselves.  This makes the un-contended case slower, even if it's just
an xmm->int delay for an 8 byte load, not a function-call + CMPXCHG16B.


void cas_compiler(atomic_uint_fast64_t *p) {
  *p |= 3;  // separate 32-bit loads before loop
}
#gcc8 snapshot 20170522 -m32, and clang does the same
... intro, including pushing 4 regs, two of which aren't used :(
movl    (%esi), %eax    #* p, tmp90
movl    4(%esi), %edx   #,
... (loop with MOV, OR, and CMPXCHG8B)


void cas_explicit(atomic_uint_fast64_t *p) {
  // AFAIK, it would be data-race UB to cast to regular uint64_t* for a
  // non-atomic load
  uint_fast64_t expect = atomic_load_explicit(p, memory_order_relaxed);
  _Bool done = 0;
  do {
    uint_fast64_t desired = expect | 3;
    done = atomic_compare_exchange_weak(p, &expect, desired);
  } while (!done);
}
... similar setup, but also reserve some stack space
movq    (%esi), %xmm0   #* p, tmp102
movq    %xmm0, (%esp)   # tmp102,
movl    (%esp), %eax    #, expect
movl    4(%esp), %edx   #, expect
...  (then the same loop)

I think it's legal to split an atomic load into two halves if the value can't
escape and is only feeding the old/new args of a CAS:

If the cmpxchg8/16b succeeds, that means the "expected" value was there in
memory.  Seeing it earlier than it was actually there because of tearing
between two previous values is indistinguishable from a possible memory
ordering for a full-width atomic load.  We could have got the same result if
everything in this thread had happened after the store that made the value seen
by cmpxchg8/16b globally visible.  So this also requires that there are no
other synchronization points between the load and the CAS.


For 16-byte objects in 64-bit code, this saves a CMPXCHG16B-load, so it's about
half the cost in the no-contention case.

For 8-byte objects with -m32, it's smaller, but probably not totally irrelevant
in the un-contended case.  An SSE2 load and xmm->int (via ALU or store/reload)
might be the same order of magnitude in cost as lock cmpxchg8b on AMD
Bulldozer.  As for the latency chain for operations on a single atomic
variable, lock cmpxchg8b has ~42 cycle latency on Piledriver (according to
Agner Fog), while an extra xmm->int has about 8c latency beyond directly
loading into integer regs.  (The throughput costs of lock cmpxchg8b are vastly
higher, though: 18 m-ops instead of 3 for movd/pextrd.)

[Bug middle-end/80809] Multi-free error for variable size array used within OpenMP task

2017-05-22 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80809

--- Comment #5 from Jakub Jelinek  ---
Author: jakub
Date: Mon May 22 18:54:54 2017
New Revision: 248346

URL: https://gcc.gnu.org/viewcvs?rev=248346&root=gcc&view=rev
Log:
PR middle-end/80809
* omp-low.c (finish_taskreg_remap): New function.
(finish_taskreg_scan): If unit size of ctx->record_type
is non-constant, unshare the size expression and replace
decls in it with possible outer var refs.

* testsuite/libgomp.c/pr80809-2.c: New test.
* testsuite/libgomp.c/pr80809-3.c: New test.

Added:
trunk/libgomp/testsuite/libgomp.c/pr80809-2.c
trunk/libgomp/testsuite/libgomp.c/pr80809-3.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/omp-low.c
trunk/libgomp/ChangeLog

[Bug middle-end/80809] Multi-free error for variable size array used within OpenMP task

2017-05-22 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80809

--- Comment #4 from Jakub Jelinek  ---
Author: jakub
Date: Mon May 22 18:54:05 2017
New Revision: 248345

URL: https://gcc.gnu.org/viewcvs?rev=248345&root=gcc&view=rev
Log:
PR middle-end/80809
* gimplify.c (omp_add_variable): For GOVD_DEBUG_PRIVATE use
GOVD_SHARED rather than GOVD_PRIVATE with it.
(gimplify_adjust_omp_clauses_1, gimplify_adjust_omp_clauses): Expect
GOVD_SHARED rather than GOVD_PRIVATE with GOVD_DEBUG_PRIVATE.

* testsuite/libgomp.c/pr80809-1.c: New test.

Added:
trunk/libgomp/testsuite/libgomp.c/pr80809-1.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimplify.c
trunk/libgomp/ChangeLog

[Bug middle-end/80853] [6/7/8 Regression] OpenMP ICE in build_outer_var_ref with array reduction

2017-05-22 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80853

--- Comment #2 from Jakub Jelinek  ---
Author: jakub
Date: Mon May 22 18:51:54 2017
New Revision: 248344

URL: https://gcc.gnu.org/viewcvs?rev=248344&root=gcc&view=rev
Log:
PR middle-end/80853
* omp-low.c (lower_reduction_clauses): Pass OMP_CLAUSE_PRIVATE
as last argument to build_outer_var_ref for pointer bases of array
section reductions.

* testsuite/libgomp.c/pr80853.c: New test.

Added:
trunk/libgomp/testsuite/libgomp.c/pr80853.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/omp-low.c
trunk/libgomp/ChangeLog

[Bug other/80803] libgo appears to be miscompiled on powerpc64le since r247923

2017-05-22 Thread wschmidt at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80803

--- Comment #4 from Bill Schmidt  ---
And gotest is just a bash script, so "something that it invokes" is the
problem...

[Bug other/80803] libgo appears to be miscompiled on powerpc64le since r247923

2017-05-22 Thread wschmidt at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80803

--- Comment #3 from Bill Schmidt  ---
Looks like this arises from this code in libgo/Makefile.in:

  if $(SHELL) $(srcdir)/testsuite/gotest --goarch=$(GOARCH) --goos=$(GOOS) --basedir=$(srcdir) --srcdir=$(srcdir)/go/$(@D) --pkgpath="$(@D)" --pkgfiles="$$files" $(GOTESTFLAGS) >>$@-testlog 2>&1; then \
echo "PASS: $(@D)" >> $@-testlog; \
echo "PASS: $(@D)"; \
echo "PASS: $(@D)" > $@-testsum; \
  else \
echo "FAIL: $(@D)" >> $@-testlog; \
cat $@-testlog; \
echo "FAIL: $(@D)" > $@-testsum; \
exit 1; \
  fi; \

gotest must be spewing garbage into net/check-testlog when $@=net/check, so
presumably gotest (or something it invokes) is miscompiled.  It must return
failure to invoke "cat $@-testlog;".

[Bug c++/80858] When trying to copy std::unordered_map illegally, error message doesn't tell what's wrong

2017-05-22 Thread redi at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80858

Jonathan Wakely  changed:

   What|Removed |Added

   Keywords||diagnostic
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-05-22
 Ever confirmed|0   |1

--- Comment #3 from Jonathan Wakely  ---
Confirmed. I'll try to reduce the example.

[Bug c++/80858] When trying to copy std::unordered_map illegally, error message doesn't tell what's wrong

2017-05-22 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80858

--- Comment #2 from sgunderson at bigfoot dot com ---
Yes, I mean that the error message isn't clear (and it's basically the same
error message in 4.8, so no regression).

I don't think I understand the difficulties involved. Doesn't the error come as
a direct result of my copying? If I do this with e.g. std::vector, I get a much
clearer error message, which directly points to the line in question:

[…]
/usr/include/c++/6/bits/vector.tcc:195:19:   required from ‘std::vector<_Tp, _Alloc>& std::vector<_Tp, _Alloc>::operator=(const std::vector<_Tp, _Alloc>&) [with _Tp = std::unique_ptr; _Alloc = std::allocator]’
test.cc:7:7:   required from here

My main gripe is that with unordered_map, the error traceback stops in the
internal details of _Hashtable:

/usr/include/c++/7/bits/unordered_map.h:101:11:   required from here

In the real-world case in question, I eventually had to go into unordered_map.h
 (yes, in /usr/include) and replace “= default;” with “= delete;” to figure out
who was calling the copy constructor.

[Bug other/70268] add option -ffile-prefix-map to map one directory name (old) to another (new) in __FILE__, __BASE_FILE__ and __builtin_FILE()

2017-05-22 Thread tom.rini at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70268

Tom Rini  changed:

   What|Removed |Added

 CC||tom.rini at gmail dot com

--- Comment #8 from Tom Rini  ---
This would also be useful in cases where we care about the size of our outputs
but may have cases where file paths are included (critical debug type messages
for example).

[Bug fortran/80766] [7/8 Regression] [OOP] ICE with type-bound procedure returning an array

2017-05-22 Thread janus at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80766

--- Comment #11 from janus at gcc dot gnu.org ---
Author: janus
Date: Mon May 22 17:08:24 2017
New Revision: 248341

URL: https://gcc.gnu.org/viewcvs?rev=248341&root=gcc&view=rev
Log:
2017-05-22  Janus Weil  

PR fortran/80766
* resolve.c (resolve_fl_derived): Make sure that vtype symbols are
properly resolved.

2017-05-22  Janus Weil  

PR fortran/80766
* gfortran.dg/typebound_call_28.f90: New test.

Added:
trunk/gcc/testsuite/gfortran.dg/typebound_call_28.f90
Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/resolve.c
trunk/gcc/testsuite/ChangeLog

[Bug libfortran/78379] Processor-specific versions for matmul

2017-05-22 Thread tkoenig at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379

--- Comment #32 from Thomas Koenig  ---
Created attachment 41406
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41406&action=edit
Additional files for the previous patch

Here are the new files for the patch.

[Bug libfortran/78379] Processor-specific versions for matmul

2017-05-22 Thread tkoenig at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379

Thomas Koenig  changed:

   What|Removed |Added

  Attachment #40120|0   |1
is obsolete||
 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |tkoenig at gcc dot gnu.org

--- Comment #31 from Thomas Koenig  ---
Created attachment 41405
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41405&action=edit
Patch for AMD

Here's a proposed patch for AMDs. This does AVX128 and FMA
when both are available, or AVX128 and FMA4, or nothing.

Rationale is that AVX128 alone does not do a lot for
AMD processors.

The new files will come as a separate attachment.

[Bug other/80803] libgo appears to be miscompiled on powerpc64le since r247923

2017-05-22 Thread wschmidt at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80803

--- Comment #2 from Bill Schmidt  ---
I've verified that this only happens with a bootstrapped compiler.  A one-pass
build does not produce the problem.  The output from "cat net/check-testlog"
for such a build  is:

PASS
PASS: net

Ian, I am not having luck using `make GOTESTFLAGS=--keep net` to reproduce.  I
guess that I need some flavor of `make check`, but so far my attempts haven't
managed to save the a.out file.  Any recommendations?  I am not sure what
command is supposed to build the a.out in question, so I'm at a bit of a
standstill.

[Bug c++/80858] When trying to copy std::unordered_map illegally, error message doesn't tell what's wrong

2017-05-22 Thread redi at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80858

--- Comment #1 from Jonathan Wakely  ---
(In reply to sgunderson from comment #0)
> Using gcc version 7.1.0 (Debian 7.1.0-5) (but the error goes back to at
> least 4.8, and amazingly, also in Clang), on this piece of code, simplified
> from a much bigger test case:

What error goes back to 4.8?

As you say, it's correct that the code doesn't compile. Do you mean the fact
that the error message isn't very clear, or something else?

In practice it's quite difficult to check the copyability (which would be
needed to produce a more helpful diagnostic) because it depends on the
allocator as well as the value_type.

[Bug c++/80835] Reading a member of an atomic can load just that member, not the whole struct

2017-05-22 Thread redi at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80835

Jonathan Wakely  changed:

   What|Removed |Added

  Component|libstdc++   |c++
   Severity|normal  |enhancement

[Bug c/80116] Warn about macros expanding to multiple statements

2017-05-22 Thread mpolacek at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80116

--- Comment #3 from Marek Polacek  ---
A testcase:

#define SWAP(x, y) \
  tmp = x; \
  x = y; \
  y = tmp

int x, y, tmp;

void
fn1 (void)
{
  if (x)
    SWAP(x, y); // warn
}

void
fn2 (void)
{
  SWAP(x, y);
}

void
fn3 (void)
{
  if (x)
    {
      SWAP(x, y);
    }
}

void
fn4 (void)
{
  if (x)
    x = 10;
  else
    SWAP(x, y); // warn
}
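
For context, the "warn" cases matter because only the first statement of the expansion is governed by the if or else; the remaining statements run unconditionally. The usual way to write such a macro so that it behaves as one statement is the do/while (0) wrapper. The sketch below is an illustration with a hypothetical macro name, SWAP_SAFE; it is not part of the proposed warning or its testsuite.

```c
/* The whole expansion is a single statement, so it can be used safely
   as the body of an unbraced if or else. */
#define SWAP_SAFE(x, y) \
  do                    \
    {                   \
      tmp = (x);        \
      (x) = (y);        \
      (y) = tmp;        \
    }                   \
  while (0)

int x, y, tmp;

void
swap_if_x (void)
{
  if (x)
    SWAP_SAFE (x, y);   /* all three assignments are guarded by the if */
}
```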

[Bug rtl-optimization/79801] Disable ira.c:add_store_equivs for some targets?

2017-05-22 Thread pthaugen at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79801

Pat Haugen  changed:

   What|Removed |Added

 CC||pthaugen at gcc dot gnu.org

--- Comment #1 from Pat Haugen  ---
I ran a comparison on CPU2006. The only benchmark possibly outside the noise
range was 470.lbm with a 1.9% degradation.

[Bug target/80817] [missed optimization][x86] relaxed atomics

2017-05-22 Thread amonakov at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80817

--- Comment #4 from Alexander Monakov  ---
On 32-bit x86 manipulating 64-bit integers, let alone atomically, is going to
be inconvenient. The emitted code could have been shorter, instead of


        movl    (%esp), %eax
        movl    4(%esp), %edx
        addl    $1, %eax
        adcl    $0, %edx
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)

it would be sufficient to emit

        addl    $1, (%esp)
        adcl    $0, 4(%esp)

(it seems stack slots holding the loaded value have been made volatile,
wrongly?), and with -msse2 it could have used SSE load/add/store, but that
needs enhancements in the STV pass I guess.
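
The test.cpp being compiled is not shown in this thread. A function of roughly the following shape is assumed here purely for illustration (`increment_relaxed` and `increment_n_times` are made-up names): a relaxed 64-bit load, an add, and a relaxed store, which on 32-bit x86 has to be split into the low/high halves seen in the addl/adcl pair.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Relaxed load, add, relaxed store.  This is not an atomic
// read-modify-write: concurrent callers may lose increments.
void increment_relaxed(std::atomic<std::uint64_t>& counter)
{
  std::uint64_t value = counter.load(std::memory_order_relaxed);
  counter.store(value + 1, std::memory_order_relaxed);
}

// Hypothetical single-threaded driver used to check the arithmetic.
std::uint64_t increment_n_times(std::uint64_t start, int n)
{
  std::atomic<std::uint64_t> counter{start};
  for (int i = 0; i < n; ++i)
    increment_relaxed(counter);
  return counter.load(std::memory_order_relaxed);
}
```

Starting from 0xFFFFFFFF exercises the carry from the low to the high 32-bit half, which is the case the adcl instruction handles.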

[Bug libstdc++/80835] Reading a member of an atomic can load just that member, not the whole struct

2017-05-22 Thread peter at cordes dot ca

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80835

--- Comment #3 from Peter Cordes  ---
(In reply to Jonathan Wakely from comment #2)
> You've reported this against libstdc++

I had to take a guess at the right component, based on a couple other
std::atomic bugs I looked at.  Apparently I picked wrong, if libstdc++ really
is just the headers and not compiler internals at all.  Can someone please mark
the appropriate component?  I mostly just look at compiler asm output, not the
real internals.

[Bug testsuite/80759] gcc.target/x86_64/abi/ms-sysv FAILs

2017-05-22 Thread ro at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80759

--- Comment #15 from Rainer Orth  ---
Created attachment 41404
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41404&action=edit
Switch ms-sysv to more regular dg functions

[Bug testsuite/80759] gcc.target/x86_64/abi/ms-sysv FAILs

2017-05-22 Thread ro at CeBiTec dot Uni-Bielefeld.DE

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80759

--- Comment #14 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #9 from Daniel Santos  ---
[...]
>> sure, though there's no need at all (except for the .struct part) to do
>> the testing on Solaris.  I believe there are ready-made Solaris/x86
>> VirtualBox images, though.
>
> I've found a few, so going to try them out when I get some time.  Oracle even
> has something on their downloads.  I haven't used Solaris since the early
> aughts.

As I said, you can try that if you're really motivated, but the problems
at hand are mostly not Solaris-related at all.

>> For the multilib problem, you can easily
>> configure gcc for i686-pc-linux-gnu with --enable-targets=all on a
>> Linux/x86_64 box (with a few necessary 32-bit development packages
>> added), so the default multilib is non-x86_64, while the x86_64 multilib
>> is only used with -m64.
>
> Hmm, I seem to be having problems getting this to work.  Would I configure
> with --target=i686-pc-linux-gnu --enable-targets=all --enable-multilib?

You need to make certain to have the necessary 32-bit libraries and
headers.  Apart from that, configure --target=i686-pc-linux-gnu
--enable-targets=all should be enough, together with CC='gcc -m32'
CXX='g++ -m32'.  I don't pass --enable-multilib, this happens by default.

>> However, I still don't understand why you are jumping through all these
>> hoops in ms-sysv.exp doing the compilations etc. manually rather than
>> just relying on dg-runtest or similar.  This would avoid all this
>> multilib trouble nicely, and massively reduce ms-sysv.exp.
>
> Well quite frankly because dg-runtest, et al. don't offer support for tests
> that use code generators.  The generated headers using the default options are

Why would they need to?  You just generate the headers in advance and
then invoke dg-runtest to compile and run the ms-sysv.c test proper.

> between 4.4 and 6 MiB in size and there are more things that need to be tested
> (-fsplit-stack to name one) that isn't tested now.  I would also like to add a
> feature where defining an environment variable generates more comprehensive
> tests that I wouldn't want to run for every test (as it could take hours with
> --enable-checking=all,rtl).

I'd strongly suggest only invoking the basic tests during a regular
testsuite run and control additional test modes with an environment
variable as you suggest.

> The most behaviorally similar test currently in the tree is
> gcc/testsuite/gcc.dg/compat/struct-layout-1.exp, which builds a generator
> (using remote_exec), runs the generator (remote_exec again) to generate
> sources for all tests and then builds and runs each test (using
> compat-execute).

Right, but the test executions proper are done with
${tool}_target_compile, which also underlies dg-test/dg-runtest.

> Calls to remote_exec are not automatically parallelized.  I don't fully
> understand how the gcc/testsuite/lib/compat.exp library works, but I'm
> guessing that calls to compat-execute are parallelized by dejagnu.
>
> The scheme that struct-layout-1 uses builds the generator and creates sources
> for all of the tests in job directory (i.e.,
> gcc/testsuite/gcc{,1,2,3,4,5,6,etc.}/gcc.dg-struct-layout-1).  They take up
> 1.21 MiB per job, so -j48 results in 58 MiB of space usage.  My generator and
> generated sources are larger, and currently take about 11.65 MiB per job, so
> -j48 would eat 559 MiB of disk space, even though there are only 6 tests at
> the moment.  This could be mitigated if there was a way to build and run the
> generator only once and have the output go to a directory shared across jobs,
> but I'm not yet aware of any such existing mechanism.

I can't help but get the feeling that you're doing very much premature
optimization here.  If I run the test sequentially on an Opteron 8435,
it takes less than 3 hours.  Investing very much work and complexity to
parallelize this (starting with invoking the generator only once instead
of once per parallel execution, which saves ca. 9 seconds each) doesn't
seem a good tradeoff to me.

> So if you have some better ideas on how to accomplish this then please do
> present them.  Or maybe I'm misunderstanding something about the way
> dg-runtest, gcc_target_compile, etc. work in relation to parallelism?  My
> understanding is that if I use them in succession for a single test run (i.e.,
> build the generator, run the generator, build & run the test) that they could
> end up being run on different jobs and then fail.

I'd forget about the parallelism for the moment (unless I'm missing
something) and get the basics working sequentially.  The attached patch
(on top of your last one) switches the whole thing to dg-runtest.  This
works sequentially, apart from the compilation failure on Solaris I
reported last. I haven't tried what happens if you try it in parallel,
but if it causes problems, parallelism can easily be disabled with

[Bug c++/80858] New: When trying to copy std::unordered_map illegally, error message doesn't tell what's wrong

2017-05-22 Thread sgunderson at bigfoot dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80858

Bug ID: 80858
   Summary: When trying to copy std::unordered_map illegally,
error message doesn't tell what's wrong
   Product: gcc
   Version: 7.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sgunderson at bigfoot dot com
  Target Milestone: ---

Using gcc version 7.1.0 (Debian 7.1.0-5) (but the error goes back to at least
4.8, and amazingly, also in Clang), on this piece of code, simplified from a
much bigger test case:

#include <memory>
#include <unordered_map>

int main(void)
{
  std::unordered_map a, b;
  a = b;
}

The code is wrong, and GCC correctly rejects it, but the error message is less
than helpful, since it doesn't mention the line with the assignment on, or
really anything hinting at who asked the copy constructor to be invoked:

$ g++-7 -c test.cc
In file included from
/usr/include/x86_64-linux-gnu/c++/7/bits/c++allocator.h:33:0,
 from /usr/include/c++/7/bits/allocator.h:46,
 from /usr/include/c++/7/memory:63,
 from test.cc:1:
/usr/include/c++/7/ext/new_allocator.h: In instantiation of ‘void
__gnu_cxx::new_allocator<_Tp>::construct(_Up*, _Args&& ...) [with _Up =
std::pair >; _Args = {const std::pair > >&}; _Tp = std::pair >]’:
/usr/include/c++/7/bits/alloc_traits.h:475:4:   required from ‘static void
std::allocator_traits
>::construct(std::allocator_traits >::allocator_type&,
_Up*, _Args&& ...) [with _Up = std::pair >;
_Args = {const std::pair > >&}; _Tp = std::pair
>; std::allocator_traits >::allocator_type =
std::allocator >]’
/usr/include/c++/7/bits/hashtable_policy.h:2066:37:   required from
‘std::__detail::_Hashtable_alloc<_NodeAlloc>::__node_type*
std::__detail::_Hashtable_alloc<_NodeAlloc>::_M_allocate_node(_Args&& ...)
[with _Args = {const std::pair > >&}; _NodeAlloc =
std::allocator, false> >;
std::__detail::_Hashtable_alloc<_NodeAlloc>::__node_type =
std::__detail::_Hash_node, false>]’
/usr/include/c++/7/bits/hashtable.h:1023:54:   required from
‘std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash,
_RehashPolicy, _Traits>::operator=(const std::_Hashtable<_Key, _Value, _Alloc,
_ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>&):: [with _Key = int; _Value = std::pair >; _Alloc = std::allocator >; _ExtractKey = std::__detail::_Select1st; _Equal =
std::equal_to; _H1 = std::hash; _H2 =
std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash;
_RehashPolicy = std::__detail::_Prime_rehash_policy; _Traits =
std::__detail::_Hashtable_traits; std::_Hashtable<_Key,
_Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy,
_Traits>::__node_type = std::__detail::_Hash_node, false>; typename _Traits::__hash_cached =
std::integral_constant]’
/usr/include/c++/7/bits/hashtable.h:1022:9:   required from ‘struct
std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash,
_RehashPolicy, _Traits>::operator=(const std::_Hashtable<_Key, _Value, _Alloc,
_ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>&) [with _Key =
int; _Value = std::pair >; _Alloc =
std::allocator >; _ExtractKey =
std::__detail::_Select1st; _Equal = std::equal_to; _H1 = std::hash;
_H2 = std::__detail::_Mod_range_hashing; _Hash =
std::__detail::_Default_ranged_hash; _RehashPolicy =
std::__detail::_Prime_rehash_policy; _Traits =
std::__detail::_Hashtable_traits]::’
/usr/include/c++/7/bits/hashtable.h:1021:14:   required from
‘std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash,
_RehashPolicy, _Traits>& std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey,
_Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::operator=(const
std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash,
_RehashPolicy, _Traits>&) [with _Key = int; _Value = std::pair >; _Alloc = std::allocator >; _ExtractKey = std::__detail::_Select1st; _Equal =
std::equal_to; _H1 = std::hash; _H2 =
std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash;
_RehashPolicy = std::__detail::_Prime_rehash_policy; _Traits =
std::__detail::_Hashtable_traits]’
/usr/include/c++/7/bits/unordered_map.h:101:11:   required from here
/usr/include/c++/7/ext/new_allocator.h:136:4: error: use of deleted function
‘std::pair<_T1, _T2>::pair(const std::pair<_T1, _T2>&) [with _T1 = const int;
_T2 = std::unique_ptr]’
  { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
^~
In file included from

[Bug c++/80856] [7/8 Regression] ICE from template local overload resolution

2017-05-22 Thread trippels at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80856

Markus Trippelsdorf  changed:

   What|Removed |Added

 CC||jason at gcc dot gnu.org

--- Comment #1 from Markus Trippelsdorf  ---
Started with r236221:

commit eee80116a8d0e277930da56aaf57a3d7e880bc86
Author: jason 
Date:   Fri May 13 19:18:35 2016 +

Fix type-dependence and the current instantiation.

PR c++/10200
PR c++/69753

[Bug c++/80857] New: slow compare_exchange_weak with unintegral type

2017-05-22 Thread sv_91 at inbox dot ru

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80857

Bug ID: 80857
   Summary: slow compare_exchange_weak with unintegral type
   Product: gcc
   Version: 7.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sv_91 at inbox dot ru
  Target Milestone: ---

Created attachment 41403
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41403&action=edit
example

Compared to gcc 6.2.0, function func2 runs slower:

gcc 6.2.0
Result 499500. Time 110
Result 499500. Time 110
gcc 7.1.0
Result 499500. Time 98
Result 499500. Time 154

Build options:
-m64 -Wextra -Wall -Werror -Wpedantic -Wformat-security -fno-builtin-malloc
-fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -pthread -std=c++17
-DNDEBUG -Ofast -funroll-loops -fomit-frame-pointer -Wno-misleading-indentation
-g -mfpmath=sse


Function func2 calls the function

template<typename T>
inline void atomic_fetch_add(std::atomic<T>& obj, const T& arg) noexcept {
    T current = obj;
    while (!obj.compare_exchange_weak(current, current + arg));
}

where T == std::chrono::milliseconds
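
The attachment is not reproduced in this message. As a rough, self-contained sketch of the pattern being measured (the names `cas_add` and `accumulate_ms` are invented for this illustration and the attached example may well differ), a compare-exchange loop on a std::atomic holding std::chrono::milliseconds could look like this:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// CAS-loop addition for types that have no native fetch_add.
template <typename T>
inline void cas_add(std::atomic<T>& obj, const T& arg) noexcept
{
  T current = obj.load();
  // On failure compare_exchange_weak refreshes `current`, so the sum is
  // recomputed from the latest value on the next iteration.
  while (!obj.compare_exchange_weak(current, current + arg))
    {
    }
}

// Hypothetical driver: add `step_ms` milliseconds `steps` times.
long long accumulate_ms(long long step_ms, int steps)
{
  std::atomic<std::chrono::milliseconds> total{std::chrono::milliseconds{0}};
  for (int i = 0; i < steps; ++i)
    cas_add(total, std::chrono::milliseconds{step_ms});
  return total.load().count();
}
```

Because std::chrono::milliseconds is a class type rather than an integer, std::atomic cannot offer fetch_add for it and the loop above is the usual fallback.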

[Bug target/80725] [7/8 Regression] s390x ICE on alsa-lib

2017-05-22 Thread krebbel at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80725

--- Comment #3 from Andreas Krebbel  ---
process_address_1 does not do any address reloading because it only checks
whether the first constraint letter is an extra address constraint. In our
case ("a,ZR") it is a register constraint, hence there is no reloading to make
the address valid.

Before adding ZR the reload was issued as usual for register constraints and
not as part of address reloading. With the broken ZR constraint added, there
was a constraint which magically also matched the FPR, hence no reload is
generated anymore.

LRA needs a way to recognize what is supposed to be an address to trigger the
address reloading. Reload as well as LRA rely on the first constraint letter to
be an address constraint for that.

I see 3 potential fixes:

1. Since ZR is an extra address constraint, swapping the two alternatives in
the pattern makes the problem disappear.

2. Constraint 'a' needs to become an extra address constraint as well.

3. LRA/reload could check if *any* of the constraints is an extra address
constraint.

If it isn't 3 we probably should document that the address constraints must
come first. Or perhaps that all the constraints for addresses need to be
address constraints.
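
The difference between the current check and option 3 can be sketched with a toy model. This is not GCC code: every function name below is hypothetical, and the classification (only "ZR" counts as an extra address constraint, "a" as a register constraint) is hard-coded just for the illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Toy classification, assumed for this sketch only.
bool is_extra_address_constraint(const std::string& alt)
{
  return alt == "ZR";
}

// Split a constraint string such as "a,ZR" into its alternatives.
std::vector<std::string> split_alternatives(const std::string& constraints)
{
  std::vector<std::string> alts;
  std::istringstream in(constraints);
  std::string alt;
  while (std::getline(in, alt, ','))
    alts.push_back(alt);
  return alts;
}

// Behaviour described above: only the first alternative is inspected.
bool first_alternative_is_address(const std::string& constraints)
{
  const std::vector<std::string> alts = split_alternatives(constraints);
  return !alts.empty() && is_extra_address_constraint(alts.front());
}

// Option 3: any alternative may mark the operand as an address.
bool any_alternative_is_address(const std::string& constraints)
{
  for (const std::string& alt : split_alternatives(constraints))
    if (is_extra_address_constraint(alt))
      return true;
  return false;
}
```

In this model "a,ZR" is missed by the first-alternative check but caught by the any-alternative check, while the swapped "ZR,a" from option 1 is caught by both.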

[Bug c++/80856] [7/8 Regression] ICE from template local overload resolution

2017-05-22 Thread trippels at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80856

Markus Trippelsdorf  changed:

   What|Removed |Added

   Keywords||ice-on-valid-code
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-05-22
 CC||trippels at gcc dot gnu.org
Summary|ICE from template local |[7/8 Regression] ICE from
   |overload resolution |template local overload
   ||resolution
 Ever confirmed|0   |1

[Bug testsuite/80759] gcc.target/x86_64/abi/ms-sysv FAILs

2017-05-22 Thread ro at CeBiTec dot Uni-Bielefeld.DE

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80759

--- Comment #13 from ro at CeBiTec dot Uni-Bielefeld.DE  ---
> --- Comment #10 from Daniel Santos  ---
[...]
> Anyway, if you can test it again for me and let me know what you think I would
> appreciate it.  I've got some other code formatting changes I want to send
> with it, but I separated it out from this patch to simplify reading.  I'll
> post the second patch anyway though.

Just a quick note with first test results, more later:

* With the patch, a sequential test works on i686-pc-linux-gnu (both
  multilibs).

* On i386-pc-solaris2.12 with /bin/as, do-test.S fails to assemble:

spawn /var/gcc/regression/trunk/12-gcc/build/gcc/xgcc
-B/var/gcc/regression/trunk/12-gcc/build/gcc/
-I/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv
-I/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv -m64
-fno-diagnostics-show-caret -fdiagnostics-color=never -O2 -Wall -c -o
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/do-test.o
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S
Assembler: 
"/var/tmp//ccTucmac.s", line 3 : Multiply defined label: "regs_to_mem"

.global regs_to_mem; .type regs_to_mem,@function; regs_to_mem:
regs_to_mem:

"/var/tmp//ccTucmac.s", line 25 : Multiply defined label: "mem_to_regs"

.global mem_to_regs; .type mem_to_regs,@function; mem_to_regs:
mem_to_regs:

"/var/tmp//ccTucmac.s", line 45 : Symbol "regs_to_mem" already has a
size

.size regs_to_mem,.-regs_to_mem

WARNING: Could not assemble
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/do-test.S

  The first two can be avoided by removing the explicit function labels
  which are already covered by ELFFN_BEGIN.  The last error is due to a
  wrong call to FUNC_END: the second call should be for mem_to_regs, not
  regs_to_mem.

* On i386-pc-solaris2.12 with gas 2.28, this error doesn't happen, but I
  get

WARNING: Could not build
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c.
FAIL: gcc.target/x86_64/abi/ms-sysv CFLAGS+="-O2" generator_args="-p0-5"
PASS: gcc.target/x86_64/abi/ms-sysv CFLAGS+="-O0 -g3" generator_args="-p0-5 --omit-rbp-clobbers"
WARNING: Could not build
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c.
FAIL: gcc.target/x86_64/abi/ms-sysv CFLAGS+="-mcall-ms2sysv-xlogues -O2" generator_args="-p0-5"
WARNING: Link failed.
FAIL: gcc.target/x86_64/abi/ms-sysv CFLAGS+="-mcall-ms2sys

  The first two instances of ms-sysv.c fail to compile:

In file included from
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c:149:0:
/var/gcc/regression/trunk/12-gcc-gas/build/gcc/testsuite/gcc/ms-sysv/ms-sysv-generated.h:
In function 'msabi_02_0':
/var/gcc/regression/trunk/12-gcc-gas/build/gcc/testsuite/gcc/ms-sysv/ms-sysv-generated.h:205:1:
error: bp cannot be used in asm here

__attribute__ ((noinline, ms_abi)) long msabi_02_0 ()
{
  __asm__ __volatile__ ("" :::"rbp");
  return sysv_0_noinfo ();
}

WARNING: Could not build
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/x86_64/abi/ms-sysv/ms-sysv.c.

  The last instance of ms-sysv.exe doesn't link:

Undefined   first referenced
 symbol in file
__resms64f_12  
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/ms-sysv.o
__resms64f_13  
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/ms-sysv.o
__resms64f_14  
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/ms-sysv.o
__resms64f_15  
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/ms-sysv.o
__resms64f_16  
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/ms-sysv.o
__resms64f_17  
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/ms-sysv.o
__savms64f_12  
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/ms-sysv.o
__savms64f_13  
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/ms-sysv.o
__savms64f_14  
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/ms-sysv.o
__savms64f_15  
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/ms-sysv.o
__savms64f_16  
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/ms-sysv.o
__savms64f_17  
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/ms-sysv.o
__resms64fx_12 
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/ms-sysv.o
__resms64fx_13 
/var/gcc/regression/trunk/12-gcc/build/gcc/testsuite/gcc/ms-sysv/ms-sysv.o
__resms64fx_14

[Bug c++/80856] New: ICE from template local overload resolution

2017-05-22 Thread joshkel at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80856

Bug ID: 80856
   Summary: ICE from template local overload resolution
   Product: gcc
   Version: 7.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: joshkel at gmail dot com
  Target Milestone: ---

Created attachment 41402
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41402&action=edit
Test case

Compiling the attached source file with no flags ("g++-7
EngineReportWriter.cpp") results in an internal compiler error:

EngineReportWriter.cpp: In function ‘T WrapToCycle(T) [with T = int]’:
EngineReportWriter.cpp:2:10: internal compiler error: Segmentation fault
 inline T WrapToCycle(T degrees)
  ^~~
0x86f4b4a crash_signal
../../src/gcc/toplev.c:337
0x84d576b useless_type_conversion_p(tree_node*, tree_node*)
../../src/gcc/gimple-expr.c:85
0x83b4b8f types_compatible_p
../../src/gcc/gimple-expr.h:66
0x83b4b8f gimple_check_call_args
../../src/gcc/cgraph.c:3733
0x83b4b8f gimple_check_call_matching_types(gimple*, tree_node*, bool)
../../src/gcc/cgraph.c:3783
0x83b6304 symbol_table::create_edge(cgraph_node*, cgraph_node*, gcall*, long
long, int, bool)
../../src/gcc/cgraph.c:864
0x83b6529 cgraph_node::create_edge(cgraph_node*, gcall*, long long, int)
../../src/gcc/cgraph.c:900
0x83bbb06 execute
../../src/gcc/cgraphbuild.c:337
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


The bug appears to be triggered by the combination of a template plus local
function declarations plus overload resolution.  If I move the local scope
function declarations to global scope, or if I change the function declarations
so that overload resolution isn't necessary, then the bug goes away.

This is on Ubuntu 14.04 x86, running GCC 7.1.0 from the Ubuntu Toolchain Test
PPA (https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/test).

Previous versions of GCC (including GCC 6.3.0 from the Ubuntu Toolchain Test
PPA) are unaffected.

GCC 7.1 on godbolt.org is also reporting errors: https://godbolt.org/g/eAYhqd

[Bug target/80817] [missed optimization][x86] relaxed atomics

2017-05-22 Thread Joost.VandeVondele at mat dot ethz.ch

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80817

--- Comment #3 from Joost VandeVondele  ---
If I compile with -m32

gcc -std=c++11 -m32 -S -O3  test.cpp

I get 

        .cfi_startproc
        subl    $12, %esp
        .cfi_def_cfa_offset 16
        movl    16(%esp), %ecx
        fildq   (%ecx)
        fistpq  (%esp)
        movl    (%esp), %eax
        movl    4(%esp), %edx
        addl    $1, %eax
        adcl    $0, %edx
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        fildq   (%esp)
        fistpq  (%ecx)
        addl    $12, %esp
        .cfi_def_cfa_offset 4
        ret
        .cfi_endproc


Is the above expected?  This causes a measurable slowdown in the piece of code
I'm looking at.

[Bug c++/79759] [concepts] ICE in tsubst, at cp/pt.c:13509

2017-05-22 Thread tom at honermann dot net

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79759

--- Comment #2 from Tom Honermann  ---
This looks to be directly related to the following reports:
- Bug 80746 - [concepts] ICE evaluating constraints for concepts with dependent
template parameters
- Bug 67147 - [concepts] ICE on checking concept with default template
arguments

[Bug target/80855] [nvptx] missing sorry("target cannot support label values"

2017-05-22 Thread vries at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80855

--- Comment #1 from Tom de Vries  ---
Created attachment 41401
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41401&action=edit
tentative patch

Using this tentative patch, we get:
...
test.c: In function ‘main’:
test.c:9:11: sorry, unimplemented: target cannot support label values
 void *ptr = &&L1;
   ^~~
...

[Bug target/80855] New: [nvptx] missing sorry("target cannot support label values"

2017-05-22 Thread vries at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80855

Bug ID: 80855
   Summary: [nvptx] missing sorry("target cannot support label
values"
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

When compiling this example:
...
int
main (void)
{
  goto L2;
 L1:
  return 0;
 L2:
  {
    void *ptr = &&L1;
    goto *ptr;
  }
}
...
we see:
...
test.c: In function ‘main’:
test.c:2:1: sorry, unimplemented: indirect jumps are not available on this
target
 main (void)
 ^~~~
...

But when we comment out the 'goto *ptr', we see:
...
ptxas test.o, line 25; error   : Arguments mismatch for instruction 'mov'
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
...

We want a similar 'sorry' message for label values.

[Bug libstdc++/80835] Reading a member of an atomic can load just that member, not the whole struct

2017-05-22 Thread redi at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80835

--- Comment #2 from Jonathan Wakely  ---
You've reported this against libstdc++ but there's no way the library code can
possibly transform p->load(std::memory_order_acquire).ptr into loading a
subobject. std::atomic::load has no idea what you plan to do with the returned
value.

[Bug c++/80830] [8 Regression] ICE in tsubst_copy, at cp/pt.c:14569

2017-05-22 Thread nathan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80830

--- Comment #5 from Nathan Sidwell  ---
Author: nathan
Date: Mon May 22 12:05:41 2017
New Revision: 248329

URL: https://gcc.gnu.org/viewcvs?rev=248329&root=gcc&view=rev
Log:
gcc/testsuite/
PR c++/80830
* g++.dg/lookup/friend20.C: New testcase.

Added:
branches/c++-modules/gcc/testsuite/g++.dg/lookup/friend20.C
Modified:
branches/c++-modules/ChangeLog.modules

[Bug middle-end/80809] Multi-free error for variable size array used within OpenMP task

2017-05-22 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80809

--- Comment #3 from Jakub Jelinek  ---
Created attachment 41400
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41400&action=edit
gcc8-pr80809.patch

Fix for the incorrect implicit determination.  The bug was that we turned the
(implicit in this case, but could be explicit too) shared clause on the
parallel for the VLA into a private clause (to privatize the variable to hold
debug info in it; the actual VLA is then what a firstprivate pointer points
to) too early.  We need to consider it shared while determining data sharing
in inner constructs, and only turn it into a private clause when we are done
with that.

The other bug is with the incorrect VLA size, I'll work on it next.

[Bug target/74563] [6 regression] Classic MIPS16 (non-MIPS16e) function return broken

2017-05-22 Thread ma...@linux-mips.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74563

Maciej W. Rozycki  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||ma...@linux-mips.org
   Assignee|unassigned at gcc dot gnu.org  |matthew.fortune at imgtec dot com

--- Comment #11 from Maciej W. Rozycki  ---
Matthew,

Can you please take care of the backport?

Maciej

[Bug libfortran/80850] Sourced allocate() fails to allocate a pointer

2017-05-22 Thread tkoenig at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80850

Thomas Koenig  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2017-05-22
 CC||tkoenig at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Thomas Koenig  ---
Does the problem still persist with 7.1?

Also, please try reducing the problem to something we can manage,
after having tried the usual debugging steps.

Running the executable under valgrind might help with pinpointing
a problem either in the source code or in gfortran.

[Bug c++/80830] [8 Regression] ICE in tsubst_copy, at cp/pt.c:14569

2017-05-22 Thread nathan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80830

Nathan Sidwell  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |nathan at gcc dot gnu.org

--- Comment #4 from Nathan Sidwell  ---
This compiles fine on my modules branch, so I guess addressed by some of the
symbol table patches I have yet to port.  Will add testcase there and recheck
once porting is complete.

[Bug c++/80841] Fails to match template specialization with polymorphic non-type template argument

2017-05-22 Thread cipherjason at hotmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80841

--- Comment #4 from Jason Bell  ---
(In reply to Daniel Krügler from comment #3)
> (In reply to Jason Bell from comment #2)
> > Thanks that's a good reduced example.  I've changed it slightly so it works
> > with constexpr input.
> 
> But that is just adding additional complexity: The constexpr declaration is
> unnecessary when you provide an argument of reference type to a non-type
> template of reference type.

I agree that should be the case, but with my GCC compiler (6.3.1) in C++14 mode
I get an error from that... "error: the value of ‘input’ is not usable in a
constant expression" even though it's used as a reference.  If I static cast it
this problem goes away, but then I get different behaviour on Clang with
C++14...  So the constexpr version seems a bit more portable for testing this
particular issue.

[Bug tree-optimization/80854] New: hot path is slowed down when the cold return path is merged into it

2017-05-22 Thread nsz at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80854

Bug ID: 80854
   Summary: hot path is slowed down when the cold return path is
merged into it
   Product: gcc
   Version: 7.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

i see suboptimal code gen for

float foo (float x)
{
  if (__builtin_expect (x > 0, 0))
    if (x>2) return 0;
  return x*x;
}

because the return path merge causes extra register move in the hot path
https://godbolt.org/g/AZxxrR

x86_64:

foo:
        pxor    %xmm1, %xmm1
        ucomiss %xmm1, %xmm0
        ja      .L8
.L2:
        movaps  %xmm0, %xmm1  // extra reg move
        mulss   %xmm0, %xmm1
.L1:
        movaps  %xmm1, %xmm0  // extra reg move
        ret
.L8:
        ucomiss .LC1(%rip), %xmm0
        jbe     .L2
        jmp     .L1           // need not jmp back
.LC1:
        .long   1073741824


aarch64:

foo:
        fcmpe   s0, #0.0
        bgt     .L8
.L2:
        fmul    s1, s0, s0
.L1:
        fmov    s0, s1        // extra reg move
        ret
        .p2align 3
.L8:
        fmov    s2, 2.0e+0
        movi    v1.2s, #0
        fcmpe   s0, s2
        ble     .L2
        b       .L1           // need not jmp back

i wonder if gcc could do better if there is information about hot/cold paths
(by not merging the hot/cold return paths in some cases).

[Bug middle-end/80853] [6/7/8 Regression] OpenMP ICE in build_outer_var_ref with array reduction

2017-05-22 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80853

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2017-05-22
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Jakub Jelinek  ---
Created attachment 41399
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41399&action=edit
gcc8-pr80853.patch

Untested fix.

[Bug middle-end/80853] New: [6/7/8 Regression] OpenMP ICE in build_outer_var_ref with array reduction

2017-05-22 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80853

Bug ID: 80853
   Summary: [6/7/8 Regression] OpenMP ICE in build_outer_var_ref
with array reduction
   Product: gcc
   Version: 7.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

void
foo (int *p)
{
  #pragma omp for reduction(+:p[:4])
  for (int i = 0; i < 64; i++)
    {
      p[0] += i;
      p[1] += i / 2;
      p[2] += 2 * i;
      p[3] += 3 * i;
    }
}

ICEs with -fopenmp in both C and C++ starting with GCC 6 (when array reduction
support was introduced).  For array reductions, if the array section base
is a pointer, we don't really need the pointer to be shared (and don't actually
check that, unlike e.g. when the array section is array based); it is enough if
what the pointer points to is shared.

[Bug libgomp/80822] libgomp incorrect affinity when OMP_PLACES=threads

2017-05-22 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80822

--- Comment #2 from Jakub Jelinek  ---
Note, the OpenMP description of OMP_PLACES=threads is:
"Each place corresponds to a single hardware thread on the target machine."
which is what GCC implements.  Perhaps ICC implements the same, but orders in
the list differently?  For spread and close the algorithm is pretty well
defined on what it should do with a particular OMP_PLACES list.

[Bug libgomp/80822] libgomp incorrect affinity when OMP_PLACES=threads

2017-05-22 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80822

--- Comment #1 from Jakub Jelinek  ---
What do you get with OMP_PLACES=threads OMP_DISPLAY_ENV=verbose (with both
libgomp and Intel libomp)?

[Bug c++/80841] Fails to match template specialization with polymorphic non-type template argument

2017-05-22 Thread daniel.kruegler at googlemail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80841

--- Comment #3 from Daniel Krügler  ---
(In reply to Jason Bell from comment #2)
> Thanks that's a good reduced example.  I've changed it slightly so it works
> with constexpr input.

But that is just adding additional complexity: The constexpr declaration is
unnecessary when you provide an argument of reference type to a non-type
template of reference type.

[Bug c++/67054] Constructor inheritance with non-default constructible members

2017-05-22 Thread db0451 at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67054

DB  changed:

   What|Removed |Added

 CC||db0451 at gmail dot com

--- Comment #3 from DB  ---
Still occurs in 4.8 to 7.1, according to this duplicate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80851

Also, could we perhaps get a more useful title for this one? It describes what
is involved in the problem, but not what the problem is.

[Bug target/80808] [7/8 Regression] gnupg miscompilation on arm starting with r241660

2017-05-22 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80808

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #4 from Jakub Jelinek  ---
You're right, that fixes it.  So not a GCC bug, but a bug in gnupg, which uses a
hopelessly obsolete longlong.h.
This has been fixed in https://gcc.gnu.org/ml/gcc-patches/2005-10/msg00546.html
and committed as r106491.  Current gmp longlong.h seems to be fine too.

[Bug c++/80851] All versions that support C++11 are confused by combination of inherited constructors with member initializer that captures this

2017-05-22 Thread db0451 at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80851

DB  changed:

   What|Removed |Added

 CC||db0451 at gmail dot com

--- Comment #1 from DB  ---
This is presumably a duplicate of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67054

unless the OP has evidence that capturing *this is relevant to whether or not
the problem occurs

[Bug c/80852] Optimisation fails to recognise sum computed by loop

2017-05-22 Thread drraph at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80852

--- Comment #2 from Raphael C  ---
You are quite right. This code shows the same issue:

int foo(int num) {
    int a = 0;
    for (int x = 0; x < num; x+=2) {
        a += x;
    }
    return a;
}

[Bug sanitizer/78204] ‘no_sanitize’ attribute directive ignored [-Wattributes]

2017-05-22 Thread marxin at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78204

Martin Liška  changed:

   What|Removed |Added

   Target Milestone|--- |8.0

--- Comment #5 from Martin Liška  ---
> 
> It looks like GCC does not support this feature in any compilers (GCC 4
> through 7). Is that correct?
> 
> (I'm trying to get some macros tuned based on Clang and GCC versions).

No, it doesn't. But I've got a patch that can be merged for GCC 8.

[Bug target/80833] 32-bit x86 causes store-forwarding stalls for int64_t -> xmm

2017-05-22 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-05-22
 Ever confirmed|0   |1

--- Comment #5 from Richard Biener  ---
There are some related bugs.  I think there is no part of the compiler that
specifically tries to avoid store forwarding issues.

[Bug c/80832] GCC_COLORS

2017-05-22 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80832

Richard Biener  changed:

   What|Removed |Added

   Keywords||documentation
   Severity|normal  |enhancement

[Bug c++/80831] [6/7/8 Regression] ICE: Segmentation fault with -fsyntax-only

2017-05-22 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80831

Richard Biener  changed:

   What|Removed |Added

   Keywords||ice-on-valid-code
   Priority|P3  |P2
   Target Milestone|--- |6.4

[Bug c++/80830] [8 Regression] ICE in tsubst_copy, at cp/pt.c:14569

2017-05-22 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80830

Richard Biener  changed:

   What|Removed |Added

   Keywords||ice-on-valid-code
   Priority|P3  |P1
Version|7.0 |8.0

[Bug target/80808] [7/8 Regression] gnupg miscompilation on arm starting with r241660

2017-05-22 Thread pinskia at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80808

--- Comment #3 from Andrew Pinski  ---
Can you try adding "flags" as a clobber on those inline asms?  I suspect that
might be causing something here.

[Bug target/80808] [7/8 Regression] gnupg miscompilation on arm starting with r241660

2017-05-22 Thread jakub at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80808

--- Comment #2 from Jakub Jelinek  ---
Smaller testcase that still works at -O1 and fails at -O2 (+ -march=armv7-a
-mfpu=vfpv3-d16 -mfloat-abi=hard in both cases):

static __attribute__ ((noinline, noclone)) unsigned
mpihelp_divrem (unsigned *qp, int qextra_limbs,
unsigned *np, int nsize,
unsigned *dp, int dsize)
{
  unsigned most_significant_q_limb = 0;
  switch (dsize)
{
case 0:
  return 1 / dsize;
case 2:
  {
int i;
unsigned n1, n0, n2, d1, d0;
np += nsize - 2;
d1 = dp[1];
d0 = dp[0];
n1 = np[1];
n0 = np[0];
if (n1 >= d1 && (n1 > d1 || n0 >= d0))
  {
__asm__ ("subs %1, %4, %5\n" "sbc  %0, %2, %3"
 : "=r" (n1), "=&r" (n0)
 : "r" (n1), "rI" (d1), "r" (n0), "rI" (d0));
most_significant_q_limb = 1;
  }
for (i = qextra_limbs + nsize - 2 - 1; i >= 0; i--)
  {
unsigned q;
unsigned r;
if (i >= qextra_limbs)
  np--;
else
  np[0] = 0;
if (n1 == d1)
  {
q = ~(unsigned) 0;
r = n0 + d1;
if (r < d1)
  {
__asm__ ("adds %1, %4, %5\n" "adc  %0, %2, %3"
 : "=r" (n1), "=&r" (n0)
 : "%r" (r - d0), "rI" (0),
   "%r" (np[0]), "rI" (d0));
qp[i] = q;
continue;
  }
n1 = d0 - (d0 != 0 ? 1 : 0);
n0 = -d0;
  }
else
  {
do
  {
unsigned __d1, __d0, __q1, __q0, __r1, __r0, __m;
__d1 = (d1 >> ((8 * (4)) / 2));
__d0 = (d1 & ((1U << ((8 * (4)) / 2)) - 1));
__r1 = (n1) % __d1;
__q1 = (n1) / __d1;
__m = (unsigned) __q1 *__d0;
__r1 = __r1 * (1U << ((8 * (4)) / 2))
   | ((unsigned) (n0) >> ((8 * (4)) / 2));
if (__r1 < __m)
  {
__q1--, __r1 += (d1);
if (__r1 >= (d1))
  if (__r1 < __m)
__q1--, __r1 += (d1);
  }
__r1 -= __m;
__r0 = __r1 % __d1;
__q0 = __r1 / __d1;
__m = (unsigned) __q0 *__d0;
__r0 = __r0 * (1U << ((8 * (4)) / 2))
   | (n0 & (((unsigned) 1 << ((8 * (4)) / 2)) - 1));
if (__r0 < __m)
  {
__q0--, __r0 += (d1);
if (__r0 >= (d1))
  if (__r0 < __m)
__q0--, __r0 += (d1);
  }
__r0 -= __m;
q = (unsigned) __q1 * (1U << ((8 * (4)) / 2)) | __q0;
r = __r0;
  }
while (0);
__asm__ ("umull %r1, %r0, %r2, %r3"
 : "=&r" (n1), "=r" (n0)
 : "r" (d0), "r" (q):"r0", "r1");
  }
n2 = np[0];
  q_test:
if (n1 > r || (n1 == r && n0 > n2))
  {
q--;
__asm__ ("subs %1, %4, %5\n" "sbc  %0, %2, %3"
 : "=r" (n1), "=&r" (n0)
 : "r" (n1), "rI" (0), "r" (n0), "rI" (d0));
r += d1;
if (r >= d1)
  goto q_test;
  }
qp[i] = q;
__asm__ ("subs %1, %4, %5\n" "sbc  %0, %2, %3"
 : "=r" (n1), "=&r" (n0)
 : "r" (r), "rI" (n1), "r" (n2), "rI" (n0));
  }
np[1] = n1;
np[0] = n0;
  }
  break;
default:
  __builtin_abort ();
}
  return most_significant_q_limb;
}

int
main ()
{
  unsigned qp[1];
  unsigned np[3] = { 0xdaafeaa6, 0x0e77816a, 1 };
  unsigned dp[2] = { 0x6816ec64, 0xb9d5666d };
  volatile int l = 0;
  unsigned ret = mpihelp_divrem (qp + l, 0 + l, np + l, 3 + l, dp + l, 2 + l);
  if (ret != 0 || qp[0] != 1 || np[0] != 0x7298fe42 || np[1] != 0x54a21afd)
__builtin_abort ();
  return 0;
}

[Bug tree-optimization/80842] [7/8 Regression] ICE at -O3 on x86_64-linux-gnu in "set_lattice_value"

2017-05-22 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80842

--- Comment #3 from Richard Biener  ---
Hmm, goes CONSTANT -> UNDEFINED.  We've been here before I think.

Visiting statement:
_7 = (int) f.7_6;
which is likely UNDEFINED
Lattice value changed to UNDEFINED.  Adding SSA edges to worklist.
marking stmt to be not simulated again

Visiting statement:
_5 = (char) _4;
which is likely CONSTANT
Lattice value changed to CONSTANT 0x0 (0x3).  Adding SSA edges to worklist.

Visiting statement:
_8 = (int) _5;
which is likely CONSTANT
Lattice value changed to CONSTANT 0x0 (0x3).  Adding SSA edges to worklist.

Visiting statement:
_14 = _7 * _8;
which is likely CONSTANT
Lattice value changed to VARYING.  Adding SSA edges to worklist.

Visiting statement:
_15 = _14 / _8;
which is likely CONSTANT
Lattice value changed to CONSTANT 0x0 (0xff).  Adding SSA edges to worklist.

Visiting statement:
_5 = (char) _4;
which is likely CONSTANT
Lattice value changed to VARYING.  Adding SSA edges to worklist.
ssa_edge_worklist: adding SSA use in _8 = (int) _5;

Simulating statement: _8 = (int) _5;

Visiting statement:
_8 = (int) _5;
which is likely CONSTANT
Lattice value changed to VARYING.  Adding SSA edges to worklist.
ssa_edge_worklist: adding SSA use in _15 = _14 / _8;
ssa_edge_worklist: adding SSA use in _18 = _8 & _17;

Simulating statement: _15 = _14 / _8;

Visiting statement:
_15 = _14 / _8;
which is likely CONSTANT
Applying pattern match.pd:397, gimple-match.c:9897
Match-and-simplified _14 / _8 to _7

[Bug c++/80841] Fails to match template specialization with polymorphic non-type template argument

2017-05-22 Thread cipherjason at hotmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80841

--- Comment #2 from Jason Bell  ---
Thanks that's a good reduced example.  I've changed it slightly so it works
with constexpr input.

//#
template 
struct A {};

template 
struct B {};

template 
struct B> 
{
  using result = T;
};

static constexpr double input = 1.;

int main() {
  using result1 = typename B>::result; // OK
  using result2 = typename B>::result;
// OK
  using result3 = typename B>::result;
// Error
}
//#

I've noticed that it works in Clang if using --std=c++14 but not with
--std=c++1z (same as my previous example).

[Bug c/80852] Optimisation missed for loop with condition that is always true

2017-05-22 Thread glisse at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80852

--- Comment #1 from Marc Glisse  ---
I don't think the test on x%2 is relevant, gcc eliminates it without problem. I
thought gcc had some code to recognize 1+2+3+...+n and similar patterns, but
apparently not this one.

[Bug tree-optimization/80842] [7/8 Regression] ICE at -O3 on x86_64-linux-gnu in "set_lattice_value"

2017-05-22 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80842

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Target Milestone|--- |7.2

--- Comment #2 from Richard Biener  ---
Mine.

[Bug c/80852] New: Optimisation missed for loop with condition that is always true

2017-05-22 Thread drraph at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80852

Bug ID: 80852
   Summary: Optimisation missed for loop with condition that is
always true
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: drraph at gmail dot com
  Target Milestone: ---

Consider this (slightly odd) code:

int foo(int num) {
    int a = 0;
    for (int x = 0; x < num; x+=2) {
        if (!(x % 2)) {
            a += x;
        }
    }
    return a;
}

Note that the condition !(x % 2) is always true.

In clang and -O3 -march=core-avx2 you get:

foo(int): # @square(int)
test    edi, edi
jle .LBB0_1
add edi, -1
mov eax, edi
shr eax
lea ecx, [rax - 1]
imul    ecx, eax
and ecx, -2
and edi, -2
add edi, ecx
mov eax, edi
ret
.LBB0_1:
xor edi, edi
mov eax, edi
ret

This is clever as it avoids looping altogether.

gcc however doesn't know this trick and you get:

foo(int):
test    edi, edi
jle .L7
lea eax, [rdi-1]
mov ecx, eax
shr ecx
add ecx, 1
cmp eax, 17
jbe .L8
mov edx, ecx
vmovdqa ymm1, YMMWORD PTR .LC0[rip]
xor eax, eax
vpxor   xmm0, xmm0, xmm0
vmovdqa ymm2, YMMWORD PTR .LC1[rip]
shr edx, 3
.L5:
add eax, 1
vpaddd  ymm0, ymm0, ymm1
vpaddd  ymm1, ymm1, ymm2
cmp eax, edx
jb  .L5
vpxor   xmm1, xmm1, xmm1
mov esi, ecx
vperm2i128  ymm2, ymm0, ymm1, 33
and esi, -8
vpaddd  ymm0, ymm0, ymm2
lea edx, [rsi+rsi]
vperm2i128  ymm2, ymm0, ymm1, 33
vpalignr    ymm2, ymm2, ymm0, 8
vpaddd  ymm0, ymm0, ymm2
vperm2i128  ymm1, ymm0, ymm1, 33
vpalignr    ymm1, ymm1, ymm0, 4
vpaddd  ymm0, ymm0, ymm1
vmovd   eax, xmm0
cmp ecx, esi
je  .L12
vzeroupper
.L3:
lea ecx, [rdx+2]
add eax, edx
cmp edi, ecx
jle .L10
add eax, ecx
lea ecx, [rdx+4]
cmp ecx, edi
jge .L10
add eax, ecx
lea ecx, [rdx+6]
cmp edi, ecx
jle .L10
add eax, ecx
lea ecx, [rdx+8]
cmp edi, ecx
jle .L10
add eax, ecx
lea ecx, [rdx+10]
cmp edi, ecx
jle .L10
add eax, ecx
lea ecx, [rdx+12]
cmp edi, ecx
jle .L10
add eax, ecx
lea ecx, [rdx+14]
cmp edi, ecx
jle .L10
add eax, ecx
add edx, 16
lea ecx, [rax+rdx]
cmp edi, edx
cmovg   eax, ecx
ret
.L7:
xor eax, eax
.L10:
ret
.L12:
vzeroupper
ret
.L8:
xor edx, edx
xor eax, eax
jmp .L3
.LC0:
.long   0
.long   2
.long   4
.long   6
.long   8
.long   10
.long   12
.long   14
.LC1:
.long   16
.long   16
.long   16
.long   16
.long   16
.long   16
.long   16
.long   16

[Bug c++/80851] New: All versions that support C++11 are confused by combination of inherited constructors with member initializer that captures this

2017-05-22 Thread devgs at ukr dot net

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80851

Bug ID: 80851
   Summary: All versions that support C++11 are confused by
combination of inherited constructors with member
initializer that captures this
   Product: gcc
   Version: 7.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: devgs at ukr dot net
  Target Milestone: ---

Tested with:
4.8.1 - 7.1

Test case:

struct BaseFooWrapper
{
    BaseFooWrapper(int qux)
    { }
};

struct Foo
{
    Foo(BaseFooWrapper & foo)
        : foo(foo)
    { }

    BaseFooWrapper & foo;
};

struct SomeFooWrapper : public BaseFooWrapper
{
    using BaseFooWrapper::BaseFooWrapper;


    Foo foo{*this};
};

int main()
{
    SomeFooWrapper wrapped_foo(1);
    return 0;
}



Error output:


#1 with x86-64 gcc 7.1
: In function 'int main()':
:29:33: error: use of deleted function
'SomeFooWrapper::SomeFooWrapper(int) [inherited from BaseFooWrapper]'
 SomeFooWrapper wrapped_foo(1);
 ^
:20:27: note: 'SomeFooWrapper::SomeFooWrapper(int) [inherited from
BaseFooWrapper]' is implicitly deleted because the default definition would be
ill-formed:
 using BaseFooWrapper::BaseFooWrapper;
   ^~
:20:27: error: no matching function for call to 'Foo::Foo()'
:11:5: note: candidate: Foo::Foo(BaseFooWrapper&)
 Foo(BaseFooWrapper & foo)
 ^~~
:11:5: note:   candidate expects 1 argument, 0 provided
:9:8: note: candidate: constexpr Foo::Foo(const Foo&)
 struct Foo
^~~
:9:8: note:   candidate expects 1 argument, 0 provided
:9:8: note: candidate: constexpr Foo::Foo(Foo&&)
:9:8: note:   candidate expects 1 argument, 0 provided
