Hi Carl,

on 2023/5/25 23:59, Carl Love wrote:
> Peter, Kewen:
> 
> On Thu, 2023-05-25 at 13:28 +0800, Kewen.Lin wrote:
>> on 2023/5/24 23:20, Carl Love wrote:
>>> On Wed, 2023-05-24 at 13:32 +0800, Kewen.Lin wrote:
>>>> on 2023/5/24 06:30, Peter Bergner wrote:
>>>>> On 5/23/23 12:24 AM, Kewen.Lin wrote:
>>>>>> on 2023/5/23 01:31, Carl Love wrote:
>>>>>>> The builtins were requested for use in GLibC.  As of
>>>>>>> version
>>>>>>> 2.31 they
>>>>>>> were added as inline asm.  They requested a builtin so the
>>>>>>> asm
>>>>>>> could be
>>>>>>> removed.
>>>>>>
>>>>>> So IMHO we also want the similar support for mffscrn, that is
>>>>>> to
>>>>>> make
>>>>>> use of mffscrn and mffscrni on Power9 and later, but falls
>>>>>> back
>>>>>> to 
>>>>>> __builtin_set_fpscr_rn + mffs similar on older platforms.
>>>>>
>>>>> So __builtin_set_fpscr_rn everything we want (sets the RN bits)
>>>>> and
>>>>> uses mffscrn/mffscrni on P9 and later and uses older insns on
>>>>> pre-
>>>>> P9.
>>>>> The only problem is we don't return the current FPSCR bits, as
>>>>> the
>>>>> bif
>>>>> is defined to return void.
>>>>
>>>> Yes.
>>>>
>>>>> Crazy idea, but could we extend the built-in
>>>>> with an overload that returns the FPSCR bits?  
>>>>
>>>> So you agree that we should make this proposed new bif handle
>>>> pre-P9
>>>> just
>>>> like some other existing bifs. :)  I think extending it is good
>>>> and
>>>> doable,
>>>> but the only concern here is the bif name
>>>> "__builtin_set_fpscr_rn",
>>>> which
>>>> matches the existing behavior (only set rounding) but doesn't
>>>> match
>>>> the
>>>> proposed extending behavior (set rounding and get some env bits
>>>> back).
>>>> Maybe it's not a big deal if the documentation clarify it well.
>>>
>>> Extending the builtin to pre Power 9 is straight forward and I
>>> agree
>>> would make good sense to do.
>>>
>>> I am a bit concerned on how to extend __builtin_set_fpscr_rn to add
>>> the
>>> new functionality.  Peter suggests overloading the builtin to
>>> either
>>> return void or returns FPSCR bits.  It is my understanding that the
>>> return value for a given builtin had to be the same, i.e. you can't
>>> overload the return value. Maybe you can with Bill's new
>>> infrastructure?  I recall having problems trying to overload the
>>> return
>>> value in the past and Bill said you couldn't do it.  I play with
>>> this
>>> and see if I can overload the return value.
>>
>> Your understanding on that we fail to overload this for just
>> different
>> return types is correct.  But previously I interpreted the extending
>> proposal as to extend
>>
>>   void __builtin_set_fpscr_rn (int);
>>
>> to 
>>
>>   void __builtin_set_fpscr_rn (int, double*);
>>
>> The related address taken and store here can be optimized out
>> normally.
> 
> I don't think that is correct.   The current definition of the builtin

Just to clarify: I didn't meant to suggest it, just explained why I
thought overloading is doable previously as I interpreted it with one
extended argument. :)

> is:
> 
>      void __builtin_set_fpscr_rn (int);
> 
> The proposal by Peter was to change the return type to double, i.e.
> 
>      double __builtin_set_fpscr_rn (int);
> 
> Peter also said the following:
> 
>    The built-in machinery can see that the usage is expecting a return
>    value or not and for the pre-P9 code, can skip generating the ending
>    mffs if we don't want the return value.
> 
> Which I don't think we want.  The mffscrn and mffscrni instructions
> return the contents of the control bits in the FPSCR, that is, bits
> 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), are placed
> into the corresponding bits in register FRT. All other bits in register
> FRT are set to 0.  
> 
> The instructions also updates the current RN field of the FPSCR with
> the new RN supplied the second argument of the instruction.  So, the
> instructions update the RN field just like the __builtin_set_fpscr_rn. 
> So, we can use the existing __builtin_set_fpscr_rn to update the RN for
> all ISAs, we just need to have __builtin_set_fpscr_rn always return a
> double with the desired fields from the FPSCR (the current RN).  This
> will then emulate the behavior of the mffscrn and mffscrni
> instructions.  The current uses of __builtin_set_fpscr_rn will just
> ignore the return value which is not a problem.  The return value can
> be stored in the places were the user is currently using the inline asm
> for the mffscrn and mffscrni instructions.
> 
> The __builtin_set_fpscr_rn builtin is currently using the mffscrn and
> mffscrni on Power 9 and throwing away the result from the instruction. 
> We just need to change __builtin_set_fpscr_rn to return the value
> instead.  For the pre Power 9 code, the builtin will need to read the
> full FPSCR, mask of the desired fields and return the fields.
> 
> So, there is no need for the builtin to have to determine if the user
> is storing the result of the __builtin_set_fpscr_rn.  The RN bits will
> always be updated by the __builtin_set_fpscr_rn builtin and the
> existing fields of the FPSCR will always be returned by the builtin.

Yeah, I agree, even with pre-P9 code when the returned value is unused,
I'd expect DCE can eliminate the part for the FPSCR bits reading and
masking, it's just like before (only setting RN bits).

The only concern I mentioned before is the built-in name doesn't clearly
match what it does (with extending, it returns something instead) since
it's only saying "set" and setting RN bits, the return value is easily
misunderstood as returning old RN bits, the documentation has to explain
and note it well.

Looking forward to Segher's opinion on this.

BR,
Kewen

> 
> Please let me know if you agree.  I think I have this sorted out
> correctly.  Thanks.
> 
>                         Carl 
> 


Reply via email to