https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87077
Bug ID: 87077
Summary: missed optimization for horizontal add for x86 SSE
Product: gcc
Version: 7.0.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: trashyankes at wp dot pl
Target Milestone: ---
During some experiments with toy programs I found that GCC does not generate
any horizontal-add instructions for xmm registers.
Some benchmark code:
http://quick-bench.com/HhZPnOtb9SYYK8z4IMKb_XAWYCI
If I'm not mistaken, both functions do the same work, and the hand-written one
is faster. IIRC, `_mm_hadd_ps` is considered a slow way to do this, but it is
still faster than the standard function.
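
A minimal sketch of the kind of comparison meant here, assuming a sum of four
floats. This is not the exact code behind the benchmark link; the names
`sum4_scalar` and `sum4_hadd` are made up for illustration, and the non-x86
fallback exists only so the sketch builds everywhere.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <pmmintrin.h>  /* SSE3 intrinsics, including _mm_hadd_ps */
#define SKETCH_HAVE_SSE3 1
#endif

/* Plain version: adds the four elements left to right. */
float sum4_scalar(const float *v)
{
    return v[0] + v[1] + v[2] + v[3];
}

#ifdef SKETCH_HAVE_SSE3
/* Hand-written version using two horizontal adds.
 * The target attribute (GCC/Clang) enables SSE3 for this function only,
 * so the file builds without -msse3. */
__attribute__((target("sse3")))
float sum4_hadd(const float *v)
{
    __m128 x = _mm_loadu_ps(v);  /* [a, b, c, d] */
    x = _mm_hadd_ps(x, x);       /* [a+b, c+d, a+b, c+d] */
    x = _mm_hadd_ps(x, x);       /* [(a+b)+(c+d), ...] in every lane */
    return _mm_cvtss_f32(x);     /* extract lane 0 */
}
#else
/* Assumed fallback for non-x86 targets: same result as the plain version. */
float sum4_hadd(const float *v)
{
    return sum4_scalar(v);
}
#endif
```

Note that the two versions add in a different order: `((a+b)+c)+d` versus
`(a+b)+(c+d)`. For general floating-point inputs the results can differ in the
last bits, which may affect whether the compiler is allowed to make this
transformation without relaxed floating-point options.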
Is my finding correct, or am I simply missing some important detail that
explains why GCC does not do this?