Michael Ellerman <m...@ellerman.id.au> writes:
> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm....@kernel.org> writes:
>> From: Nathan Lynch <nath...@linux.ibm.com>
>>
>> The kernel can handle retrying RTAS function calls in response to
>> -2/990x in the sys_rtas() handler instead of relaying the intermediate
>> status to user space.
>
> This looks good in general.
>
> One query ...
>
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 47a2aa43d7d4..c330a22ccc70 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -1798,7 +1798,6 @@ static bool block_rtas_call(int token, int nargs,
>>  /* We assume to be passed big endian arguments */
>>  SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>>  {
>> -    struct pin_cookie cookie;
>>      struct rtas_args args;
>>      unsigned long flags;
>>      char *buff_copy, *errbuf = NULL;
>> @@ -1866,20 +1865,25 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, 
>> uargs)
>>  
>>      buff_copy = get_errorlog_buffer();
>>  
>> -    raw_spin_lock_irqsave(&rtas_lock, flags);
>> -    cookie = lockdep_pin_lock(&rtas_lock);
>> +    do {
>> +            struct pin_cookie cookie;
>>  
>> -    rtas_args = args;
>> -    do_enter_rtas(&rtas_args);
>> -    args = rtas_args;
>> +            raw_spin_lock_irqsave(&rtas_lock, flags);
>> +            cookie = lockdep_pin_lock(&rtas_lock);
>>  
>> -    /* A -1 return code indicates that the last command couldn't
>> -       be completed due to a hardware error. */
>> -    if (be32_to_cpu(args.rets[0]) == -1)
>> -            errbuf = __fetch_rtas_last_error(buff_copy);
>> +            rtas_args = args;
>> +            do_enter_rtas(&rtas_args);
>> +            args = rtas_args;
>>  
>> -    lockdep_unpin_lock(&rtas_lock, cookie);
>> -    raw_spin_unlock_irqrestore(&rtas_lock, flags);
>> +            /*
>> +             * Handle error record retrieval before releasing the lock.
>> +             */
>> +            if (be32_to_cpu(args.rets[0]) == -1)
>> +                    errbuf = __fetch_rtas_last_error(buff_copy);
>> +
>> +            lockdep_unpin_lock(&rtas_lock, cookie);
>> +            raw_spin_unlock_irqrestore(&rtas_lock, flags);
>> +    } while (rtas_busy_delay(be32_to_cpu(args.rets[0])));
>
> rtas_busy_delay_early() has the successive_ext_delays case that will
> break out eventually. But if we keep getting plain RTAS_BUSY back from
> RTAS I *think* this loop will never terminate?

Yes, but if this happens, then there is a serious bug in Linux or
RTAS. The only time I've seen something like that on PowerVM is when
Linux corrupted internal RTAS state by not serializing calls correctly.

rtas_busy_delay_early() has a bail-out heuristic, not for RTAS_BUSY, but
for extended delay statuses (990x), which I suspect happen rarely (if
ever) that early. That's there in order to allow boot to proceed and
hopefully get useful messages out in a truly unexpected circumstance.

That said...

> To avoid that, and just as good manners, I think we should have a
> fatal_signal_pending() check, and if that returns true we bail out of
> the syscall with -EINTR ?

That probably makes sense. In its current state, I could see
this patch preventing or delaying OS shutdown in situations where it
wouldn't have occurred before.

I think I would want the bailout condition in this case to be
(fatal_signal_pending() && retries > some_threshold), to reduce the
likelihood of non-"stuck" operations from being left unfinished. And it
should dump a stack trace.

Reply via email to