https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121908
Bug ID: 121908
Summary: Hot loop in xz not vectorized
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rdapp at gcc dot gnu.org
CC: jeffreyalaw at gmail dot com, rguenth at gcc dot gnu.org,
tamar.christina at arm dot com
Target Milestone: ---
The following is a simplified example of bt_skip_func in 557.xz that can be
vectorized. It's a search loop that looks for (dis)similarities in an array
and, depending on the input, we're seeing double-digit improvements when
vectorized.
#define uint8_t unsigned char
#define uint32_t unsigned int
int foo (const uint8_t *const cur, uint32_t n)
{
  uint32_t i = 15;

  while (i++ != n)
    if (cur[i] != cur[i - 15])
      break;

  return i;
}
We give up analyzing the DRs (data references) because n may be < 15, in
which case i would have to wrap around before it can reach n:
Creating dr for *_2
analyze_innermost: bla2.c:15:12: missed: failed: evolution of base is not
affine.
base_address:
offset from base address:
constant offset from base address:
step:
base alignment: 0
base misalignment: 0
offset alignment: 0
step alignment: 0
base_object: *_2
Creating dr for *_6
analyze_innermost: bla2.c:15:22: missed: failed: evolution of base is not
affine.
base_address:
offset from base address:
constant offset from base address:
step:
base alignment: 0
base misalignment: 0
offset alignment: 0
step alignment: 0
base_object: *_6
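To make the wrap-around case concrete, here is a hand-versioned sketch of the
loop above. It assumes n >= 15 is the condition that rules out wrapping (with
the post-increment, n == 15 exits immediately). The names foo_ref and
foo_versioned are made up for illustration, and this is only the scalar shape
a vectorizer could work on, not what GCC emits.

```c
#include <stdint.h>

/* Reference: the loop from the report, with a uint32_t return type.  */
uint32_t foo_ref (const uint8_t *cur, uint32_t n)
{
  uint32_t i = 15;

  while (i++ != n)
    if (cur[i] != cur[i - 15])
      break;

  return i;
}

/* Hypothetical hand-versioned variant (illustration only).  */
uint32_t foo_versioned (const uint8_t *cur, uint32_t n)
{
  if (n >= 15)
    {
      /* i takes the values 16..n, so the trip count n - 15 is known
         up front and neither i nor i - 15 can wrap.  */
      uint32_t niters = n - 15;

      for (uint32_t k = 0; k < niters; k++)
        {
          uint32_t i = 16 + k;
          if (cur[i] != cur[i - 15])
            return i;
        }

      /* No mismatch: the original loop leaves with i == n + 1.  */
      return n + 1;
    }

  /* n < 15: i would have to wrap before reaching n, keep the original.  */
  return foo_ref (cur, n);
}
```

In the n >= 15 version both accesses are plain affine functions of k, which is
the property the failed DR analysis above is missing.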
A more complex example, closer to the real loop, is:
#define uint32_t unsigned int
#define uint8_t unsigned char
int foo (const uint8_t *const cur, uint32_t len, uint32_t len_limit,
         uint32_t pos, uint32_t cur_match)
{
  const uint32_t delta = pos - cur_match;
  const uint8_t *pb = cur - delta;

  while (++len != len_limit)
    if (pb[len] != cur[len])
      break;

  return len;
}
My idea was to "version"/partition the loop or vectorization along len >
len_limit and len <= len_limit.
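A source-level sketch of what such versioning could look like for the second
example follows. The guard len < len_limit is my reading of the no-wrap
condition: with the pre-increment, len == len_limit also runs past the limit
and would need to wrap. The function names match_len_ref and
match_len_versioned are hypothetical, and the sketch only shows the scalar
split, not any vector code.

```c
#include <stdint.h>

/* Reference: the loop from the report, with a uint32_t return type.  */
uint32_t match_len_ref (const uint8_t *cur, uint32_t len, uint32_t len_limit,
                        uint32_t pos, uint32_t cur_match)
{
  const uint32_t delta = pos - cur_match;
  const uint8_t *pb = cur - delta;

  while (++len != len_limit)
    if (pb[len] != cur[len])
      break;

  return len;
}

/* Hypothetical versioned variant (illustration only).  */
uint32_t match_len_versioned (const uint8_t *cur, uint32_t len,
                              uint32_t len_limit, uint32_t pos,
                              uint32_t cur_match)
{
  const uint32_t delta = pos - cur_match;
  const uint8_t *pb = cur - delta;

  if (len < len_limit)
    {
      /* The index runs over len + 1 .. len_limit - 1 without wrapping,
         so the trip count is known before the loop starts.  */
      uint32_t niters = len_limit - len - 1;

      for (uint32_t k = 0; k < niters; k++)
        {
          uint32_t i = len + 1 + k;
          if (pb[i] != cur[i])
            return i;
        }

      /* No mismatch: the original loop leaves with len == len_limit.  */
      return len_limit;
    }

  /* len >= len_limit: the counter may wrap, keep the original loop.  */
  return match_len_ref (cur, len, len_limit, pos, cur_match);
}
```

The counted path is the one a vectorizer with early-break support could
target; the fallback keeps the original semantics for the wrapping inputs.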