Hi James,

Sorry for the late reply.

On 2020/4/14 22:53, James Morse wrote:
> Hi Xie,
> 
> On 14/04/2020 13:39, Xie XiuQi wrote:
>> On 2020/4/14 20:16, James Morse wrote:
>>> On 14/04/2020 11:59, Mark Rutland wrote:
>>>> On Fri, Apr 10, 2020 at 09:52:45AM +0800, Xie XiuQi wrote:
>>>>> We should panic even if panic_on_oops is not set, when we can't recover
>>>>> from a synchronous external abort in kernel context.
>>>
>>> Hmm, fault-from-kernel-context doesn't mean the fault affects the kernel.
>>> If the kernel is reading or writing from user-space memory for a syscall,
>>> it's the user-space memory that is affected. This thread can't make
>>> progress, so we kill it.
>>> If it's a kernel thread or we were in irq context, we panic().
>>>
>>> I don't think you really want all faults that happen as a result of a 
>>> kernel access to be
>>> fatal!
> 
>> Yes, you're right. I just want to fix a hang hit during RAS error injection
>> testing.
>>
>> panic_on_oops is not set in the kernel image for testing. When receiving a 
>> sea in kernel
>> context, the PE trap in do_exit(), and can't return any more.
> 
> trap? gets trapped, (or gets stuck, to prevent confusion with the
> architecture's use of the word 'trap'!)
> 
> 
>> I analyzed the source code; the call trace might look like this:
>>    do_mem_abort
>>      -> arm64_notify_die
>>         -> die                    # kernel context, call die() directly;
>>            -> do_exit             # kernel process context, call do_exit(SIGSEGV);
>>               -> do_task_dead()   # call do_task_dead(), and this core hangs;
> 
> Thanks for the trace. This describes a corrected error in your I-cache, that 
> occurred
> while the kernel was executing a kernel thread. These shouldn't be fatal, 
> because it was
> corrected ... but the kernel doesn't know that because it doesn't know how to 
> parse those
> records.
> 
> There are two things wrong here:
> 1. it locks up while trying to kill the thread.
> 2. it tried to kill the thread in the first place!
> 
> For 1, does your l1l2_inject module take any spinlocks or tinker with the 
> pre-empt counter?
> 
> I suspect this is some rarely-tested path in do_task_dead() that sleeps, but 
> can't from
> your l1l2_inject module because the pre-empt counter has been raised.
> 
> CONFIG_DEBUG_ATOMIC_SLEEP may point at the function to blame.
> 
> It may be accessing some thread data that kthreads don't have, taking a 
> second exception
> and blocking on the die_lock. LOCKDEP should catch this one.
> 
> We should fix this one first.
> 

I analyzed the l1l2_inject module: the error is injected from a work item
running on a kworker thread. do_sea() tries to kill that kworker thread, so
the work items queued on this kworker are blocked.
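
For reference, the path I have in mind is roughly the tail of die() in
arch/arm64/kernel/traps.c (a simplified sketch from around v5.6, trimmed for
this mail, not a verbatim copy):

	void die(const char *str, struct pt_regs *regs, int err)
	{
		...
		if (in_interrupt())
			panic("%s: Fatal exception in interrupt", str);
		if (panic_on_oops)
			panic("%s: Fatal exception", str);
		...
		/*
		 * panic_on_oops is unset and the SEA was taken from the
		 * injection kworker (process context, not irq), so neither
		 * panic() above fires and we fall through to do_exit(SIGSEGV).
		 * For a kernel thread this ends in do_task_dead(), which never
		 * returns, so the work items queued behind the dying worker
		 * stay blocked.
		 */
		if (ret != NOTIFY_STOP)
			do_exit(SIGSEGV);
	}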

The panic_on_oops option is usually set in production environments, so if
someone unsets it for testing, they do so at their own risk. Is that right?

> 
> 2 is a bit more complicated. Today, this is fatal because the arch code 
> assumes this was
> probably a memory error, and if it returns to user-space it can't avoid 
> getting stuck in a
> loop until the scheduled memory_failure() work runs. Instead it 
> unconditionally signals
> the process.
> 
> [0] fixes this up for memory errors. But in this case it will assume all the 
> work has been
> done by APEI, (or will be before we get back to user-space). APEI ignored the 
> processor
> error you fed it, because it doesn't know what they are, they are just 
> printed out.
> 
> This is fine for corrected errors, but we are reliant on your firmware
> describing uncorrected errors with a 'fatal' severity... which might be too
> heavy a hammer. (Ideally that would mean 'uncontained', and the kernel
> should handle, or detect it can't, any other error...)
> 
> We can discuss the right thing to do here when support for parsing these 'arm 
> processor
> errors' is posted.
> (If you think I need to do something different in [0] because of this, please 
> shout!)

For errors which could be recoverable or corrected, do_sea() should not kill
the process or call die() unconditionally. It would be better to detect this
situation.
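
To make that concrete, something along these lines is what I mean (an
illustrative sketch only, not a patch: it assumes do_sea() keeps using
apei_claim_sea(), and ghes_sea_error_is_corrected() is a made-up helper
standing in for "APEI parsed the CPER record and found the error was
corrected or already recovered", which would need the 'arm processor error'
parsing you mention):

	static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
	{
		const struct fault_info *inf = esr_to_fault_info(esr);

		/*
		 * If firmware-first handling claimed the error and the record
		 * says it was corrected, there is nothing here for the arch
		 * code to kill; just return without signalling or calling die().
		 */
		if (apei_claim_sea(regs) == 0 && ghes_sea_error_is_corrected())
			return 0;

		/* Otherwise fall back to the existing path. */
		arm64_notify_die(inf->name, regs, inf->sig, inf->code,
				 (void __user *)addr, esr);
		return 0;
	}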

> 
> 
>> [  387.740609] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>> [  387.748837] {1}[Hardware Error]: event severity: recoverable
>> [  387.754470] {1}[Hardware Error]:  Error 0, type: recoverable
> 
>> [  387.760103] {1}[Hardware Error]:   section_type: ARM processor error
> 
> et voila! Case 2. Linux doesn't handle these 'ARM processor error' things, 
> because it
> doesn't know what they are. It just prints them out.
> 
> 
>> [  387.766425] {1}[Hardware Error]:   MIDR: 0x00000000481fd010
>> [  387.771972] {1}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x0000000081080000
>> [  387.780628] {1}[Hardware Error]:   error affinity level: 0
>> [  387.786088] {1}[Hardware Error]:   running state: 0x1
>> [  387.791115] {1}[Hardware Error]:   Power State Coordination Interface state: 0
>> [  387.798301] {1}[Hardware Error]:   Error info structure 0:
>> [  387.803761] {1}[Hardware Error]:   num errors: 1
>> [  387.808356] {1}[Hardware Error]:    error_type: 0, cache error
>> [  387.814160] {1}[Hardware Error]:    error_info: 0x0000000024400017
>> [  387.820311] {1}[Hardware Error]:     transaction type: Instruction
>> [  387.826461] {1}[Hardware Error]:     operation type: Generic error (type cannot be determined)
>> [  387.835031] {1}[Hardware Error]:     cache level: 1
> 
>> [  387.839878] {1}[Hardware Error]:     the error has been corrected
> 
> As this is corrected, I think the bug is a deadlock somewhere in 
> do_task_dead().
> 
> 
>> [  387.845942] {1}[Hardware Error]:    physical fault address: 0x00000027caf50770
> 
> (and your firmware gives you the physical address, excellent, the kernel can 
> do something
> with this!)
> 
> 
>> [  388.021037] Call trace:
>> [  388.023475]  lsu_inj_ue+0x58/0x70 [l1l2_inject]
>> [  388.029019]  error_inject+0x64/0xb0 [l1l2_inject]
>> [  388.033707]  process_one_work+0x158/0x4b8
>> [  388.037699]  worker_thread+0x50/0x498
>> [  388.041348]  kthread+0xfc/0x128
>> [  388.044480]  ret_from_fork+0x10/0x1c
>> [  388.048042] Code: b2790000 d519f780 f9800020 d5033f9f (58001001)
>> [  388.054109] ---[ end trace 39d51c21b0e42ba6 ]---
>>
>> core 0 is hung up here.
> 
> DEBUG_ATOMIC_SLEEP and maybe LOCKDEP should help you pin down where the 
> kernel is getting
> stuck. It looks like a bug in the core code.
> 
> 
> Thanks,
> 
> James
> 
> [0] 
> https://lore.kernel.org/linux-acpi/[email protected]/
> .
> 
