https://bugs.kde.org/show_bug.cgi?id=401828

--- Comment #9 from Will Schmidt <will_schm...@vnet.ibm.com> ---
(In reply to Julian Seward from comment #8)
> I'm confused about the top level diagnosis here.  I see two possibilities:
> 
> (1) If the test program, when run directly (meaning, not on V) produces
>     different results depending on compiler version and/or optimisation
>     level, then the test program is buggy.
> 
> (2) If (1) isn't the case, and instead, the test program produces different
>     results when run directly vs when run on V, then V is buggy.

^ this one (option 2).  :-)

The testcase fails only when it is built with a newer gcc and run under
valgrind.

> Note that "run directly" really does mean run directly on the machine,
> and not merely "whatever is in the .stdout.exp file".

Correct.  While running this down I iterated on debugging with the test run
both standalone and under valgrind.


> Will: have you definitely excluded (1) as a possibility?  My initial 
> assumption here was that this problem is (1), but from the comments above
> it's hard to be sure either way.

Yup.  Comment 1 should have included that detail, but there was a lot of
associated subsequent noise to help confuse things.


I have previously spoken to Michael Meissner to confirm the behavior of the GCC
change.  This explanation is based on his comments and, in hindsight, fits and
explains what we were seeing.

The xscvdpsp, xscvdpspn, and xscvdpuxws instructions each convert
double-precision values to single-precision values and write the result into
bits 0-31 of the 128-bit target register.  To get the value into the normal
position for a scalar register it needed to be right-shifted 32 bits, so gcc
always did that.
For Power9 we (toolchain) requested that hardware put the result into the
second 32-bit section so we could avoid the shift.  It was then that we realized
hardware was already putting the result into *both* of those 32-bit sections,
so the compiler removed the (redundant) shift.

Valgrind emulation only wrote the result to the first 32-bit section, so the
issue was exposed when GCC dropped doing that redundant shift.

The proposed patch duplicates the result into the second 32-bit section of the
target register.
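
To make the failure mode concrete, here is a minimal C sketch of the behavior
described above.  It is a hypothetical model, not Valgrind or GCC code: the
type vsr_model and all function names are invented for illustration, and
zeroing the unused words is an assumption of the model.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model, not Valgrind code: a 128-bit VSX register as four
   32-bit words. word[0] is the first 32-bit section, word[1] the second. */
typedef struct { uint32_t word[4]; } vsr_model;

/* Bit pattern of d after rounding to single precision. */
static uint32_t single_bits(double d) {
    float f = (float)d;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Emulation before the patch: result only in the first section.
   Zeroing the remaining words is an assumption of this model. */
static vsr_model emulate_old(double d) {
    vsr_model r = { { single_bits(d), 0, 0, 0 } };
    return r;
}

/* Emulation after the patch, matching the hardware behavior described
   above: the result is duplicated into the second section. */
static vsr_model emulate_patched(double d) {
    vsr_model r = emulate_old(d);
    r.word[1] = r.word[0];
    return r;
}

/* Older GCC: shift the register right by 32 bits, then use the
   second section. */
static uint32_t read_with_shift(vsr_model r) {
    vsr_model s = { { 0, r.word[0], r.word[1], r.word[2] } };
    return s.word[1];
}

/* Newer GCC: no shift, use the second section directly. */
static uint32_t read_without_shift(vsr_model r) {
    return r.word[1];
}
```

In this model, read_with_shift() returns the converted value for both
emulate_old() and emulate_patched(), while read_without_shift() returns it
only for emulate_patched().  That matches the observation that the testcase
fails only with a newer gcc under valgrind.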
