On 2021/12/21 10:19, Xionghu Luo via Gcc-patches wrote:
> 
> 
> On 2021/12/21 09:32, David Edelsohn wrote:
>> On Mon, Dec 20, 2021 at 6:55 PM Segher Boessenkool
>> <seg...@kernel.crashing.org> wrote:
>>>
>>> On Mon, Dec 20, 2021 at 11:45:45AM -0500, David Edelsohn wrote:
>>>> On Mon, Dec 20, 2021 at 3:24 AM Xionghu Luo <luo...@linux.ibm.com> wrote:
>>>>> These four UNSPECS seems could be replaced with native RTL, and why
>>>>> "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))"
>>>>> in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT):
>>>>>
>>>>>   This bit is sticky; that is, once set to 1 it
>>>>>   remains set to 1 until it is set to 0 by an
>>>>>   mtvscr instruction.
>>>>>
>>>>> The RTL pattern set it to 0 but final ASM doesn't present it?  And why
>>>>> not use Clobber VSCR_REGNO instead?
>>>>
>>>> The design came from the early implementation of Altivec:
>>>>
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html
> 
> Thanks, I constructed a testcase for this,
> 
> cat vadds.c
> #include <stdio.h>
> #include <altivec.h>
> vector signed int foo1 (vector signed int a, vector signed int c) {
>   vector signed int ret = vec_vaddsws (a, c);
>   union {vector unsigned short v; unsigned short s[8];} u;
>   u.v = vec_mfvscr();
>   printf("foo1: vscr == { %d,%d,%d,%d,%d,%d,%d,%d }\n", u.s[0], u.s[1],
> u.s[2],
>          u.s[3], u.s[4], u.s[5], u.s[6], u.s[7]);
>   return a;
> }
> 
> vector signed int foo2 (vector signed int a, vector signed int c) {
>   vector signed int ret = vec_vaddsws (a, c);
>   union {vector unsigned short v; unsigned short s[8];} u;
>   u.v = vec_mfvscr();
>   printf("foo2: vscr == { %d,%d,%d,%d,%d,%d,%d,%d }\n", u.s[0], u.s[1],
> u.s[2],
>          u.s[3], u.s[4], u.s[5], u.s[6], u.s[7]);
>   return ret;
> }
> 
> int main ()
> {
>   vector signed int a = {0x12345678, 0x22345678, 0x32345678, 0x42345678};
>   vector signed int c = {0x12345678, 0x22345678, 0x32345678, 0x42345678};
>   vector signed int b1 = foo1 (a, c);
>   vector signed int b2 = foo2 (a, c);
>   return b1[0]+b2[0];
> }
> 
> The output is:
> 
> ./a.out
> foo1: vscr == { 0,0,0,0,0,0,0,0 }
> foo2: vscr == { 1,0,0,0,0,0,0,0 }
> 
> 
> So is this expected behavior?  Seems doesn't meet with Aldy's
> description...  (foo1's temp is optimized away so no vaddsws produced)


Just realized that foo1's "ret" is optimized way in gimple already, so
no such RTL and vaddsws generated in foo1, only foo2's vaddsws will set
VSCR explicitly.

> 
> " foo()
> {
>   vector int result;
> 
>   result = vec_adds(blah, blah);
>   __check_for_saturation__
> }
> 
> gcc, will optimize away vec_adds() because the result is a local
> variable unused later.  then when we check the saturation bit in VSCR,
> we get wrong results.
> 
> this patch explains to gcc all about VSCR, and adds it as a global
> register as well."
> 
> 
>>>>
>>>> If one later checks for saturation (reads VSCR), one needs a
>>>> corresponding SET of the value.  It's set in an architecture-specific
>>>> manner that isn't described to GCC, but it's set, not just clobbered
>>>> and in an undefined state.
>>>
>>> Well.  RTL clobber and set do exactly the same thing, except with
>>> clobber it is not specified *what* value is set.  All bits are set, all
>>> bits are defined.  There is no (direct) way in RTL to say
>>> "undetermined".
>>>
>>> An RTL clobber would work just fine afaics?
>>
>> I don't know about the original intention from Aldy, but if one were
>> looking at an RTL dump and the code used the saturation bit from VSCR,
>> it might be confusing to see a CLOBBER instead of a SET.  The SET
>> documents that VSCR_REGNO is assigned a specific value; GCC doesn't
>> know about the semantics, but it's not some undefined bit pattern.
>> CLOBBER implies a trash value or a value that one will not query
>> later, i.e., one would want to SET the register to a specific value
>> before using it.

Agree that SET is better than CLOBBER.  Thank you!

>>
>>>
>>>> The RTL does not describe that VSCR is set to the value 0.  The
>>>> (const_int 0) is not the value set.  You can think of the (const_int
>>>> 0) as a dummy RTL argument to the VSCR UNSPEC.  UNSPEC requires at
>>>> least one argument and the pattern doesn't try to express the
>>>> argument, so it uses a dummy RTL constant.
>>>
>>> Yup.  Traditionally (pc) was used for this.  Nowadays (const_int 0) is
>>> not really more expensive anymore, and many people find it clearer (but
>>> not in this case it seems :-) ).
>>>
>>>> It's part of a PARALLEL
>>>> and the plus or minus already expresses the data dependency of the
>>>> pattern on the input operands.
>>>
>>> But they do not describe any dependency on vscr, or output to it.  This
>>> is the same problem we have with fpscr (most FP insns use some of its
>>> fields, most set some, but there is no way to cleanly express that).
>>
>> It describes that VSCR_REGNO is set, an output. It doesn't describe
>> how it is set nor inform the compiler that the value depends on the
>> input operands in some complicated way unknown to the compiler, but
>> the compiler cannot do anything useful with the additional
>> information.
>>
>>>
>>> Explicit clobbers like this help one side of the issue.  For vscr, other
>>> than the sat bit there is only the nj bit, and we just ignore that :-)
>>>
>>>> This patch is okay.  Thanks for updating the machine description and
>>>> for cleaning up the formatting.
>>>
>>> x2.  Thanks!
>>>
>>>
>>> Segher
> 

-- 
Thanks,
Xionghu

Reply via email to