https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122573
Bug ID: 122573
Summary: C++ missed invariant motion vs. vectorization
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
In the following C++ testcase the m_column?[?] loads are not hoisted out of
the loop. This causes a high VF and, with AVX512 masked epilogues, high
pressure on the compare unit of Zen4/5, leading to a slowdown when numPixels
is low.

I'm not sure invariant motion would be valid here, but the vectorizer already
does runtime alias checking against the this->m_column accesses, and it
detects the loads as invariant. Possibly SLP discovery could treat them as
such, ignoring that they are not "grouped accesses".

struct S {
  void apply(const void * inImg, void * outImg, long numPixels) const;
  float m_column1[4];
  float m_column2[4];
  float m_column3[4];
  float m_column4[4];
};

void S::apply(const void * inImg, void * outImg, long numPixels) const
{
  const float * in = (const float *)inImg;
  float * out = (float *)outImg;
  for (long idx = 0; idx < numPixels; ++idx)
    {
      const float r = in[0];
      const float g = in[1];
      const float b = in[2];
      const float a = in[3];
      out[0] = r*m_column1[0]
               + g*m_column2[0]
               + b*m_column3[0]
               + a*m_column4[0];
      out[1] = r*m_column1[1]
               + g*m_column2[1]
               + b*m_column3[1]
               + a*m_column4[1];
      out[2] = r*m_column1[2]
               + g*m_column2[2]
               + b*m_column3[2]
               + a*m_column4[2];
      out[3] = r*m_column1[3]
               + g*m_column2[3]
               + b*m_column3[3]
               + a*m_column4[3];
      in += 4;
      out += 4;
    }
}
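
For illustration only, here is roughly what the loop would look like with the
invariant loads moved out by hand. This is a sketch, not a proposed fix.
apply_hoisted, the local arrays c1..c4 and the inner k loop are hypothetical
and not part of the testcase. The sketch assumes outImg never overlaps *this,
which is the property the compiler cannot prove statically: a float store
through out may alias the float members of S.

```cpp
// Sketch only: same data layout as S in the testcase, with a hypothetical
// apply_hoisted() member that performs the invariant motion by hand.
struct S {
  void apply_hoisted(const void * inImg, void * outImg, long numPixels) const;
  float m_column1[4];
  float m_column2[4];
  float m_column3[4];
  float m_column4[4];
};

void S::apply_hoisted(const void * inImg, void * outImg, long numPixels) const
{
  const float * in = (const float *)inImg;
  float * out = (float *)outImg;

  // Load the 16 coefficients once, before the loop.
  // ASSUMPTION: outImg does not overlap *this.  If it did, the original
  // loop would re-read coefficients modified by the stores to out[],
  // while this version keeps using the values copied here.
  float c1[4], c2[4], c3[4], c4[4];
  for (int k = 0; k < 4; ++k)
    {
      c1[k] = m_column1[k];
      c2[k] = m_column2[k];
      c3[k] = m_column3[k];
      c4[k] = m_column4[k];
    }

  for (long idx = 0; idx < numPixels; ++idx)
    {
      // As in the testcase, read the whole input pixel before storing.
      const float r = in[0];
      const float g = in[1];
      const float b = in[2];
      const float a = in[3];
      for (int k = 0; k < 4; ++k)
        out[k] = r*c1[k] + g*c2[k] + b*c3[k] + a*c4[k];
      in += 4;
      out += 4;
    }
}
```

When outImg is disjoint from the S object, this computes the same values as
S::apply in the testcase. The two can only differ when a store through out
lands on one of the m_column arrays.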