https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
--- Comment #15 from Vineet Gupta <vgupta at synopsys dot com> --- (In reply to Linus Torvalds from comment #14) > (In reply to Vineet Gupta from comment #13) > > Sorry the workaround proposed by Alexander doesn't seem to cure it (patch > > attached), outcome is the same > > Vineet - it's not the ldd/std that is necessarily buggy, it's the earlier > tests of the address that guard that vectorized path. > > So your quoted parts of the code generation aren't necessarily the > problematic ones. /me slaps myself. How can I be so stupid. > Did you actually test the code and check whether it has the same issue? > Maybe it changed the address limit guards before that ldd/std? The problem is is indeed gone. I need to analyze the assembly fully how it prevents the bad case. e.g. I'm still not comfortable seeing the loop entered with following and it doing 8 byte ldd/std when we know it should only do 2 at a time. r21 = 0xbf178036 (pre-increment so 0x3e will be first src) r22 = 0xbf1780b2 LPC = 4 80d9a360: lp 12 ;80d9a36c <inflate_fast+0x2c0> 80d9a364: ldd.a r18r19,[r21,8] 80d9a368: std.ab r18r19,[r22,8] > I also sent you a separate patch to test if just upgrading to a newer > version of the zlib code helps. Although that may be buggy for other > reasons, it's not like I actually tested the end result.. But it would be > interesting to hear if that one works for you (again, ldd/std might be a > valid end result of trying to vectorize that code assuming the aliasing > tests are done correctly in the vectorized loop headers). Thx for that. And this seems to boot as well.