https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

--- Comment #15 from Vineet Gupta <vgupta at synopsys dot com> ---
(In reply to Linus Torvalds from comment #14)
> (In reply to Vineet Gupta from comment #13)
> > Sorry the workaround proposed by Alexander doesn't seem to cure it (patch
> > attached), outcome is the same
> 
> Vineet - it's not the ldd/std that is necessarily buggy, it's the earlier
> tests of the address that guard that vectorized path. 
> 
> So your quoted parts of the code generation aren't necessarily the
> problematic ones.

/me slaps myself. How can I be so stupid.

> Did you actually test the code and check whether it has the same issue?
> Maybe it changed the address limit guards before that ldd/std?

The problem is is indeed gone. I need to analyze the assembly fully how it
prevents the bad case. e.g. I'm still not comfortable seeing the loop entered
with following and it doing 8 byte ldd/std when we know it should only do 2 at
a time.

r21 = 0xbf178036  (pre-increment so 0x3e will be first src)
r22 = 0xbf1780b2
LPC = 4

80d9a360:       lp      12      ;80d9a36c <inflate_fast+0x2c0>
80d9a364:       ldd.a   r18r19,[r21,8]
80d9a368:       std.ab  r18r19,[r22,8]

> I also sent you a separate patch to test if just upgrading to a newer
> version of the zlib code helps. Although that may be buggy for other
> reasons, it's not like I actually tested the end result.. But it would be
> interesting to hear if that one works for you (again, ldd/std might be a
> valid end result of trying to vectorize that code assuming the aliasing
> tests are done correctly in the vectorized loop headers).

Thx for that. And this seems to boot as well.

Reply via email to