>>>>>>> The existing 'sysret_rip' selftest asserts that 'regs->r11 ==
>>>>>>> regs->flags'. This check relies on the behavior of the SYSCALL
>>>>>>> instruction on legacy x86_64, which saves 'RFLAGS' into 'R11'.
>>>>>>> However, on systems with FRED (Flexible Return and Event Delivery)
>>>>>>> enabled, instead of using registers, all state is saved onto the stack.
>>>>>>> Consequently, 'R11' retains its userspace value, causing the assertion
>>>>>>> to fail.
>>>>>>> Fix this by detecting if FRED is enabled and skipping the register
>>>>>>> assertion in that case. The detection is done by checking if the RPL
>>>>>>> bits of the GS selector are preserved after a hardware exception.
>>>>>>> IDT (via IRET) clears the RPL bits of NULL selectors, while FRED (via
>>>>>>> ERETU) preserves them.
>>>>>> I don't really like this. I think we have two credible choices:
>>>>>> 1. Define the Linux ABI to be that, on FRED systems, SYSCALL preserves
>>>>>> R11 and RCX on entry and exit. And update the test to actually test
>>>>>> this.
>>>>>> 2. Define the Linux ABI to be what it has been for quite a few years:
>>>>>> SYSCALL entry copies RFLAGS to R11 and RIP to RCX and SYSCALL exit
>>>>>> preserves all registers.
>>>>>> I'm in favor of #2. People love making new programming languages and
>>>>>> runtimes and inline asm and, these days, vibe coded crap. And it's
>>>>>> *easier* to emit a SYSCALL and forget to tell the compiler / code
>>>>>> generator that RCX and R11 are clobbered than it is to remember that
>>>>>> they're clobbered. And it's easy to test on FRED (well, not really,
>>>>>> but it hopefully will be some day) and it's easy to publish one's
>>>>>> code, and then everyone is a bit screwed when the resulting program
>>>>>> crashes sometimes on non-FRED systems. And it will be miserable to
>>>>>> debug.
>>>>>> (It's *really* *really* easy to screw this up in a way that sort of
>>>>>> works even on non-FRED: RCX and R11 are usually clobbered across
>>>>>> function calls, so one can get into a situation in which one's
>>>>>> generated code usually doesn't require that SYSCALL preserve one of
>>>>>> these registers until an inlining decision changes or some code gets
>>>>>> reordered, and then it will start failing. And making the failure
>>>>>> depend on hardware details is just nasty.)
>>>>>> So I think we should add the ~2 lines of code to fix the SYSCALL entry
>>>>>> on FRED to match non-FRED.
>>>>> Yes; I'm afraid I have to concur. Preserving the clobber on entry for
>>>>> FRED systems is by far the safest choice.
>>>>> Aside from this selftest, fancy debuggers and anything that can transfer
>>>>> userspace state between machines might be 'surprised'.
>>>> Thanks Andy and Peter.
>>>> Indeed, making the selftest branch on FRED vs. non-FRED behavior
>>>> is not a good practice. The selftest should validate ABI consistency.
>>>> I agree with Andy's option #2, so this should be fixed in the FRED
>>>> syscall entry implementation.
>>>> Li Xin, does this direction look right to you? I can assist with
>>>> validation and keep the selftest aligned with the agreed ABI.
>>> Yes, consistency should take precedence over hardware-specific variations.
>>> I would like to hear from Andrew Cooper and hpa before we do it.
>> Per Andy’s suggestion, the change would be:
>> diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
>> index 88c757ac8ccd..a19898747a2c 100644
>> --- a/arch/x86/entry/entry_fred.c
>> +++ b/arch/x86/entry/entry_fred.c
>> @@ -79,6 +79,9 @@ static __always_inline void fred_other(struct pt_regs *regs)
>>  {
>>  	/* The compiler can fold these conditions into a single test */
>>  	if (likely(regs->fred_ss.vector == FRED_SYSCALL && regs->fred_ss.l)) {
>> +		regs->cx = regs->ip;
>> +		regs->r11 = regs->flags;
>> +
>>  		regs->orig_ax = regs->ax;
>>  		regs->ax = -ENOSYS;
>>  		do_syscall_64(regs, regs->orig_ax);
>> It adds 4 extra MOVs on this hot path, but I don’t see that as a problem here.
>
> We discussed this over a year ago, and at that point agreed that preserving
> the registers was the desired behavior. Why has this changed now?
Yes, that is technically simpler and cleaner.
The question Andy raised is whether the RCX/R11 clobbering behavior is an
established architectural contract or an implementation detail that
software ignores.
Either answer is hard to prove.
I think Andy and PeterZ want to be on the safer side, i.e., to treat this
clobbering behavior as established.
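
For anyone who wants to see what a given machine actually does, here is a
minimal userspace sketch, not taken from the selftests. It assumes x86_64
Linux and GCC/Clang extended asm. The names probe_getpid, struct
probe_result, PROBE_RCX and PROBE_R11 are made up for illustration. It loads
known values into RCX and R11, executes SYSCALL, and reads both registers
back. Listing them as in/out operands also tells the compiler they may
change, which is the part that is easy to forget.

```c
#include <sys/syscall.h>
#include <unistd.h>

#if !defined(__x86_64__)
#error "This sketch assumes x86_64 Linux."
#endif

/* Arbitrary marker values (hypothetical, chosen only to be recognizable). */
#define PROBE_RCX 0x1111111111111111UL
#define PROBE_R11 0x2222222222222222UL

/* Hypothetical result type: what userspace sees after SYSCALL returns. */
struct probe_result {
	long ret;		/* syscall return value (RAX) */
	unsigned long rcx;	/* RCX after the syscall */
	unsigned long r11;	/* R11 after the syscall */
};

/*
 * Issue getpid() via a raw SYSCALL with marker values in RCX and R11.
 * RCX and R11 are read/write operands, so the compiler will not assume
 * they survive the instruction.
 */
static struct probe_result probe_getpid(void)
{
	struct probe_result res;
	register unsigned long r11 __asm__("r11") = PROBE_R11;
	unsigned long rcx = PROBE_RCX;
	long rax = SYS_getpid;

	__asm__ volatile("syscall"
			 : "+a"(rax), "+c"(rcx), "+r"(r11)
			 :
			 : "memory", "cc");

	res.ret = rax;
	res.rcx = rcx;
	res.r11 = r11;
	return res;
}

/* Nonzero if both marker values came back unchanged. */
static int probe_preserved(const struct probe_result *res)
{
	return res->rcx == PROBE_RCX && res->r11 == PROBE_R11;
}
```

With the legacy SYSCALL entry described above, I would expect
probe_preserved() to return 0, since RCX holds the return RIP and R11 holds
RFLAGS. On a FRED kernel without the entry change I would expect it to
return 1, but I have not verified that on FRED hardware.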