https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89670
Bug ID: 89670
Summary: __builtin_ctz(_mm256_movemask_epi8(foo)) assumed to be <31 ?
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: joern at purestorage dot com
Target Milestone: ---
Created attachment 45945
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45945&action=edit
matchlen testcase extracted from lz compressor
I ran across this while working on an LZ compression library. One way of
calculating the match length is through vector comparison, movemask and ctz.
It is useful because it covers up to 32 equal bytes without a branch.
If all 32 bytes match, the true match length might be much longer than 32, so
naturally the code contains a branch:
if (ml == 32) {
/* calculate actual match length */
}
That branch was optimized away, which surprised me a bit. I have reduced the
problem to the attached testcase. The testcase seems to work fine with gcc 4.8,
but fails with 4.9, 5, 6, 7 and 8. It also fails with clang 3.5, 3.8, 4.0, 6.0
and 7, fwiw.
The system is an old Debian unstable; the compilers are from Debian.
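
For reference, the vector-compare, movemask and ctz technique described above
can be sketched roughly as follows. This is not the attached testcase: the
function name matchlen32 is hypothetical, and the sketch assumes the mismatch
mask is formed by inverting the movemask of a byte-wise equality compare. It
tests for a zero mask before calling __builtin_ctz, because the GCC
documentation states that the result of __builtin_ctz is undefined when its
argument is 0.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper, not the attached testcase: returns how many leading
   bytes (0..32) of a and b are equal. Both must be readable for 32 bytes. */
__attribute__((target("avx2")))
static unsigned matchlen32(const uint8_t *a, const uint8_t *b)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* Bit i of eq is set when a[i] == b[i]. */
    uint32_t eq = (uint32_t)_mm256_movemask_epi8(_mm256_cmpeq_epi8(va, vb));

    /* Bit i of diff is set when a[i] != b[i]. */
    uint32_t diff = ~eq;

    /* __builtin_ctz(0) is undefined, so handle "all 32 bytes equal"
       explicitly instead of expecting ctz to return 32. */
    if (diff == 0)
        return 32;

    /* Index of the first mismatching byte, in the range 0..31. */
    return (unsigned)__builtin_ctz(diff);
}
```

A caller would then extend the match only when matchlen32 returns 32. The
target attribute lets the function be compiled without -mavx2, but it must
only be called on a CPU that supports AVX2.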