: target
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Created attachment 31399
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31399&action=edit
Patch
Trying to compile a function with an xop multiversion fails with a "No
dispatcher found for xop"
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033
Allan Jensen linux at carewolf dot com changed:
What|Removed |Added
CC||linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033
--- Comment #32 from Allan Jensen linux at carewolf dot com 2013-02-17
15:23:49 UTC ---
(In reply to comment #31)
(In reply to comment #30)
Another example is binary operators between scalar and vectors. In C the
scalar
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53460
Bug #: 53460
Summary: Internal compiler error: in calc_dfs_tree, at
dominance.c:395
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53460
--- Comment #1 from Allan Jensen linux at carewolf dot com 2012-05-23
15:34:35 UTC ---
Created attachment 27481
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27481
FontFastPath.ii.gz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53460
--- Comment #2 from Allan Jensen linux at carewolf dot com 2012-05-23
15:37:32 UTC ---
It appears I am not allowed to make more than one attachment, so you will have
to make do with one example. Here is the console output:
Using built-in specs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48026
Allan Jensen linux at carewolf dot com changed:
What|Removed |Added
CC||linux at carewolf
: libgcc
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
After recently trying to build Qt with -O3, I found one of our tests failing.
After investigating I narrowed it down to qregion.cpp and the flag
-finline-functions (using -O2 -finline-functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429
--- Comment #1 from Allan Jensen linux at carewolf dot com ---
Created attachment 32268
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32268&action=edit
qregion.cpp intermediate compiled with G++ 4.4 (working)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429
--- Comment #2 from Allan Jensen linux at carewolf dot com ---
Created attachment 32269
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32269&action=edit
qregion.cpp intermediate compiled with gcc 4.8
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429
--- Comment #3 from Allan Jensen linux at carewolf dot com ---
Created attachment 32270
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32270&action=edit
qregion.cpp assembler compiled with gcc 4.8
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429
--- Comment #4 from Allan Jensen linux at carewolf dot com ---
Created attachment 32271
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32271&action=edit
qregion.cpp assembler compiled with gcc 4.4
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429
--- Comment #6 from Allan Jensen linux at carewolf dot com ---
(In reply to Richard Biener from comment #5)
Can you identify the inlined call? Is it
if (pSLL && y == pSLL->scanline) {
loadAET(&AET, pSLL
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429
--- Comment #8 from Allan Jensen linux at carewolf dot com ---
Created attachment 32303
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32303&action=edit
Reduced test
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429
--- Comment #9 from Allan Jensen linux at carewolf dot com ---
Created attachment 32304
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32304&action=edit
Reduced test assembler
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429
--- Comment #10 from Allan Jensen linux at carewolf dot com ---
I have uploaded a reduced test. Compiled with -O0 or -O1 it outputs 180,
compiled with -O2 or higher it outputs 179.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429
--- Comment #11 from Allan Jensen linux at carewolf dot com ---
Note that to run it, it links against Qt5Core.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429
--- Comment #13 from Allan Jensen linux at carewolf dot com ---
(In reply to Andrew Pinski from comment #12)
tmpPtBlock->pts = reinterpret_cast<QPoint
*>(tmpPtBlock->data);
Does this not violate C/C++ aliasing rules later on?
I
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429
--- Comment #24 from Allan Jensen linux at carewolf dot com ---
I just tested the latest subversion head of gcc 4.9 and can confirm it fixes
the original problem (tst_qregion in Qt 5.2.1 compiled with -O3).
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60429
--- Comment #25 from Allan Jensen linux at carewolf dot com ---
Will it be backported to 4.8?
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Created attachment 32567
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32567&action=edit
Test case
If you compile the attached program with -O0 and -mlzcnt on x86, it will
produce wrong results
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60788
--- Comment #1 from Allan Jensen linux at carewolf dot com ---
Sorry. The optimization has nothing to do with it; it just causes the constant
expressions used for testing to be evaluated at compile time.
The real issue is that the lzcnt
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60788
--- Comment #3 from Allan Jensen linux at carewolf dot com ---
Sorry for the confusion. I thought Intel had added it from Ivy Bridge, but it
was Haswell.
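A minimal sketch of why the results differ, assuming the commonly documented
behavior that a CPU without the LZCNT extension decodes the lzcnt encoding as
bsr. The function names are hypothetical; they only model the two instructions
in portable C and are not taken from the attached test case.

```c
#include <assert.h>
#include <stdint.h>

/* Models lzcnt: the number of leading zero bits in a 32-bit value.
   A zero input yields 32. */
unsigned model_lzcnt32(uint32_t x)
{
    unsigned count = 0;
    for (uint32_t bit = UINT32_C(1) << 31; bit != 0 && (x & bit) == 0; bit >>= 1)
        ++count;
    return count;
}

/* Models bsr: the index of the highest set bit. The hardware result is
   undefined for a zero input, so this sketch rejects it. */
unsigned model_bsr32(uint32_t x)
{
    assert(x != 0);
    unsigned index = 31;
    while ((x & (UINT32_C(1) << index)) == 0)
        --index;
    return index;
}
```

For any nonzero input the two models are related by
model_bsr32(x) == 31 - model_lzcnt32(x), so code that expects a leading-zero
count but gets a bit index sees wrong values for most inputs.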
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64806
--- Comment #3 from Allan Jensen linux at carewolf dot com ---
I refer to this:
/* Handle arch= if specified. For priority, set it to be 1 more than
the best instruction set the processor can handle. For instance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64806
Allan Jensen linux at carewolf dot com changed:
What|Removed |Added
CC||linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65492
--- Comment #10 from Allan Jensen linux at carewolf dot com ---
Just to make things more complicated, I just tried the test on a Haswell, and
surprisingly disabling if-convert or tree-vectorize on -O3 has no effect on
performance, but activating
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
After investigating a loop using SSE intrinsics that was significantly faster
in clang than in gcc, I discovered gcc had the same performance as clang in
-O2, and only performed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65492
--- Comment #1 from Allan Jensen linux at carewolf dot com ---
Created attachment 35070
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35070&action=edit
main
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65492
--- Comment #2 from Allan Jensen linux at carewolf dot com ---
Created attachment 35071
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35071&action=edit
vector union test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65492
--- Comment #3 from Allan Jensen linux at carewolf dot com ---
The -O3 regression seems to go back a long way, but has become smaller over
time.
With gcc 4.6 and older it runs at 3.1s with -O3, and still at 1.8s with -O2.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65492
--- Comment #8 from Allan Jensen linux at carewolf dot com ---
You can remove the branches in the inner loop and still reproduce the issue.
There were no branches in the original code; I only added them to the reduced
case because I was using
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65492
--- Comment #9 from Allan Jensen linux at carewolf dot com ---
Looking at the assembler, it does indeed appear that the only difference is
loop unrolling and if-conversion.
After testing on another machine (an old Phenom II as opposed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65492
--- Comment #11 from Allan Jensen linux at carewolf dot com ---
Issues with slow cmov have been seen in several bug reports:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073
https://gcc.gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65492
--- Comment #12 from Allan Jensen linux at carewolf dot com ---
I have a very crude fix for this.
First though, according to comments in tree-if-conv.c and earlier bugs on the
issue, if-conversion is supposed to be conditional. It performed
: lto
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
When trying to build QtWebkit with LTO I get the internal error:
lto1: internal compiler error: in should_move_die_to_comdat, at
dwarf2out.c:6846
Note: I do not actually expect an LTO debug build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65274
--- Comment #2 from Allan Jensen linux at carewolf dot com ---
Yes, it appears to complete the link-time compilation when using GCC trunk.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65211
--- Comment #2 from Allan Jensen linux at carewolf dot com ---
Created attachment 34873
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34873&action=edit
Intermediate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65211
Allan Jensen linux at carewolf dot com changed:
What|Removed |Added
Attachment #34873|0 |1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65211
--- Comment #1 from Allan Jensen linux at carewolf dot com ---
Created attachment 34872
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34872&action=edit
Assembler intermediate
It is the movdqa (%rdx), %xmm1 instruction on line 19
++
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Created attachment 34871
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34871&action=edit
C++ source
A specific combination of local typedef inside a templated function causes gcc
to lose
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65211
--- Comment #4 from Allan Jensen linux at carewolf dot com ---
Note that either removing the template argument or moving the typedef out of
the function solves the issue and makes gcc use an unaligned load.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67351
--- Comment #1 from Allan Jensen linux at carewolf dot com ---
Created attachment 36254
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36254&action=edit
Compiled test assembler
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
Created attachment 36253
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36253&action=edit
Test
Gcc will expand and detect field setting on 32-bit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67351
Allan Jensen changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68793
--- Comment #3 from Allan Jensen ---
Created attachment 36959
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36959&action=edit
neon-test-no-split-wide-types.s
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68793
--- Comment #6 from Allan Jensen ---
I mean the neon64 case, not 32-bit.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68793
--- Comment #1 from Allan Jensen ---
Created attachment 36957
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36957&action=edit
neon-test.cpp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68793
--- Comment #2 from Allan Jensen ---
Created attachment 36958
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36958&action=edit
neon-test-split-wide-types.s
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
Enabling the optimization 'split-wide-types' causes worse code for NEON
intrinsics than disabling it, and it is enabled by default by -O1.
It is triggered by multi-register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68793
--- Comment #5 from Allan Jensen ---
The test-case uses C++11 initialization. I haven't tested gcc 6, so if you say
it is solved, I would trust you.
Note the 32-bit case is also suboptimal in both cases (not affected by
split-wide-types). Is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68793
Allan Jensen changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51509
Allan Jensen changed:
What|Removed |Added
CC||linux at carewolf dot com
--- Comment #6
: other
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
The intrinsics _mm_loadl_epi64 and _mm_storel_epi64 trigger UBSan warnings on
unaligned access because the intrinsic definitions in emmintrin.h are using
__m64
: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
We have been running into several issues with the tautological compare warning
in qtdeclarative, first there was https
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
Created attachment 39774
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39774&action=edit
Example that triggers the pointl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65274
--- Comment #4 from Allan Jensen ---
It works now.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77902
Allan Jensen changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118
--- Comment #3 from Allan Jensen ---
Or r217608
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118
--- Comment #2 from Allan Jensen ---
I believe this to be fixed by r239889
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118
--- Comment #4 from Allan Jensen ---
Created attachment 40130
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40130&action=edit
Proposed patch
On closer inspection, we are only almost there; two minor changes are still
needed (testing patch).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118
Allan Jensen changed:
What|Removed |Added
Attachment #40130|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667
Allan Jensen changed:
What|Removed |Added
CC||linux at carewolf dot com
--- Comment #3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78563
--- Comment #1 from Allan Jensen ---
Created attachment 40177
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40177&action=edit
Test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667
--- Comment #4 from Allan Jensen ---
(In reply to Allan Jensen from comment #3)
> Gcc 5 and 6 produces code with pmovzx when compiling the example with -O3
> -msse4.1
>
> I assume this can be closed.
Note that, as comment 1 says, it will not use
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
Unpack patterns with a zero constant are neither folded nor recognized as a
pmovzx instruction.
SSE2 code:
_mm_unpacklo_epi32(X, _mm_setzero_si128())
GCC code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78394
Allan Jensen changed:
What|Removed |Added
Attachment #40064|0 |1
is obsolete|
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
Created attachment 40064
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40064&action=edit
maybe_uninitialized.cpp
Compiling with -Og produces a number of uni
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63319
Allan Jensen changed:
What|Removed |Added
CC||linux at carewolf dot com
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77902
--- Comment #1 from Allan Jensen ---
Further experimentation shows that GCC can sometimes reason about the remaining
range but does so inconsistently.
For instance, this example also fails:
int result = 0;
for (; count >= 4; count -= 4)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77902
--- Comment #2 from Allan Jensen ---
While this has been the case in both GCC 5 and GCC 6, it appears that both
failing cases previously mentioned already produce the best-case result when
using a fairly recent GCC 7.
gcc version 7.0.0 20160923
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47754
--- Comment #8 from Allan Jensen ---
Note this happens with -mavx2, but not with -march=haswell. It appears the
tuning is a bit too pessimistic when avx2 is enabled on generic x64.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47754
Allan Jensen changed:
What|Removed |Added
CC||linux at carewolf dot com
--- Comment #7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47754
--- Comment #10 from Allan Jensen ---
No, I mean it triggers when you compile with -mavx2; it is solved with
-march=haswell. It appears the issue is the tune flag
X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL is set for all processors that support
avx2,
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
Created attachment 40295
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40295&action=edit
Test
In gcc 7 when not optimizing for sp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762
--- Comment #3 from Allan Jensen ---
Created attachment 40298
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40298&action=edit
Test compiled with gcc 6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762
--- Comment #1 from Allan Jensen ---
Created attachment 40296
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40296&action=edit
Test compiled with -mavx2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762
--- Comment #2 from Allan Jensen ---
Created attachment 40297
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40297&action=edit
Test compiled with -march=haswell
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47754
--- Comment #11 from Allan Jensen ---
I think the issue I noted is completely separate from this one, so I opened
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762 to deal with it.
I think this one could probably be closed though.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59874
Allan Jensen changed:
What|Removed |Added
CC||linux at carewolf dot com
--- Comment #5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66970
Allan Jensen changed:
What|Removed |Added
CC||linux at carewolf dot com
--- Comment #5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59874
--- Comment #15 from Allan Jensen ---
Yes, the patch works and it also evaluates at compile time.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59874
--- Comment #8 from Allan Jensen ---
Thanks, that looks good. I will test it when I have a chance. I am changing the
Qt sources to not assume the presence of __builtin_clzs when __BMI__ is
defined. It can use __builtin_clz() and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70118
Allan Jensen changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762
--- Comment #10 from Allan Jensen ---
That would solve the problem, but also leave the behavior as Sandy Bridge only
(Nehalem didn't have AVX).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762
--- Comment #11 from Allan Jensen ---
Btw, did you benchmark store splitting on AMD? It is also enabled for BDVER and
ZNVER1.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78762
--- Comment #13 from Allan Jensen ---
The question is whether the unaligned store is still slow on Excavator and
Ryzen, which support AVX2. As far as I understand, the Bulldozer architectures
just prefer split AVX because it was basically emulating
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
The intrinsics for x86 SIMD shuffle instructions could be redeclared using
__builtin_shuffle. This would help folding and better
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
Created attachment 41100
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41100&action=edit
icf.cc
Several functions that produce identical assembler are not merged by ipa-icf. I
h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80040
--- Comment #2 from Allan Jensen ---
Created attachment 40973
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40973&action=edit
Assembler output from gcc 6
Easier to compare
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80040
--- Comment #1 from Allan Jensen ---
Created attachment 40972
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40972&action=edit
Assembler output
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
Created attachment 40971
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40971&action=edit
Example
The intrinsics _mm_testz_si128 and _mm_testc_si128 both map to the exact same
instruct
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
Created attachment 41610
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41610&action=edit
bswap-issue.cc
While writing a big-endian bitfield accessor I noticed that bs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81174
Allan Jensen changed:
What|Removed |Added
Version|6.1.1 |7.1.0
--- Comment #1 from Allan Jensen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82426
--- Comment #3 from Allan Jensen ---
Note that its ability to do this at all with -Os appears to be new in gcc 7
: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
Created attachment 42299
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42299&action=edit
vectslp.cpp
The attached example is a simple matrix multiplicat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82426
--- Comment #1 from Allan Jensen ---
Created attachment 42300
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42300&action=edit
Assembler output with -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82426
--- Comment #2 from Allan Jensen ---
Created attachment 42301
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42301&action=edit
Assembler output with -Os -ftree-slp-vectorize
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
If a vector initialization is using elements from only a single vector source,
it will be optimized as a shuffle, but if it is using elements from two
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85692
--- Comment #1 from Allan Jensen ---
Created attachment 44084
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44084&action=edit
construct.cc
Motivating examples. Compile with -msse4.1 for the second case.
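A minimal sketch of the two kinds of vector initialization this report
distinguishes, written with GCC's generic vector extensions. The type name v4si
and the function names are hypothetical and are not taken from construct.cc.
__builtin_shuffle is GCC-specific, so a clang fallback is included as an
assumption about portability.

```c
/* Four 32-bit integers in one 16-byte vector (GCC vector extension). */
typedef int v4si __attribute__((vector_size(16)));

/* Initialization that draws every element from a single source vector. */
v4si reverse_one_source(v4si a)
{
    return (v4si){a[3], a[2], a[1], a[0]};
}

/* Initialization that draws elements from two source vectors. */
v4si interleave_two_sources(v4si a, v4si b)
{
    return (v4si){a[0], b[0], a[1], b[1]};
}

/* The same two-source interleave written as an explicit shuffle.
   Mask indices 0..3 select from a, indices 4..7 select from b. */
v4si interleave_with_shuffle(v4si a, v4si b)
{
#if defined(__clang__)
    return __builtin_shufflevector(a, b, 0, 4, 1, 5);
#else
    const v4si mask = {0, 4, 1, 5};
    return __builtin_shuffle(a, b, mask);
#endif
}
```

Comparing the assembler generated for interleave_two_sources and
interleave_with_shuffle is one way to check whether the compiler recognizes
the two-source initialization as a shuffle.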
: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linux at carewolf dot com
Target Milestone: ---
Created attachment 44030
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44030&action=edit
strmod.cpp
Many simple loops using modulo naively can be optimized