On 06/03/17 12:54, Alexey Kardashevskiy wrote:
> On 06/03/17 10:22, Gavin Shan wrote:
>> On Fri, Mar 03, 2017 at 04:59:11PM +1100, Alexey Kardashevskiy wrote:
>>> On 03/03/17 15:47, Russell Currey wrote:
>>>> eeh_handle_special_event() is called when an EEH event is detected but
>>>> can't be narrowed down to a specific PE. This function looks through
>>>> every PE to find one in an erroneous state, then calls the regular event
>>>> handler eeh_handle_normal_event() once it knows which PE has an error.
>>>>
>>>> However, if eeh_handle_normal_event() found that the PE cannot possibly
>>>> be recovered, it will remove the PE and associated devices. This leads
>>>> to a use after free in eeh_handle_special_event() as it attempts to clear
>>>> the "recovering" state on the PE after eeh_handle_normal_event() returns.
>>>>
>>>> Thus, make sure the PE is valid when attempting to clear state in
>>>> eeh_handle_special_event().
>>>>
>>>> Cc: <sta...@vger.kernel.org> #3.10+
>>>> Reported-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>>>> Signed-off-by: Russell Currey <rus...@russell.cc>
>>>> ---
>>>>  arch/powerpc/kernel/eeh_driver.c | 13 +++++++++++++
>>>>  1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/kernel/eeh_driver.c
>>>> b/arch/powerpc/kernel/eeh_driver.c
>>>> index b94887165a10..492397298a2a 100644
>>>> --- a/arch/powerpc/kernel/eeh_driver.c
>>>> +++ b/arch/powerpc/kernel/eeh_driver.c
>>>> @@ -983,6 +983,19 @@ static void eeh_handle_special_event(void)
>>>>  		if (rc == EEH_NEXT_ERR_FROZEN_PE ||
>>>>  		    rc == EEH_NEXT_ERR_FENCED_PHB) {
>>>>  			eeh_handle_normal_event(pe);
>>>> +
>>>> +			/*
>>>> +			 * eeh_handle_normal_event() can free the PE if it
>>>> +			 * determines that the PE cannot possibly be recovered.
>>>> +			 * Make sure the PE still exists before changing its
>>>> +			 * state.
>>>> +			 */
>>>> +			if (!pe || (pe->type & EEH_PE_INVALID)
>>>> +			    || (pe->state & EEH_PE_REMOVED)) {
>>>
>>>
>>> The bug is that pe becomes stale after eeh_handle_normal_event() returned
>>> and dereferencing it afterwards is broken.
>>>
>>
>> Correct, it won't cause a kernel crash as @pe is dereferencing linear mapped
>> area whose address is always valid.
>
> Dereferencing pe would not crash but dereferencing any pointer from the
> pnv_ioda_pe struct would (as it would be random stuff or a poison).
>
>
>> I think the proper fix would be to use
>> eeh_handle_normal_event() to indicate the @pe has been released and don't
>> access it any more.
>
> Correct. The problem is that the callstack from my other reply is a bit too
> long to make a trivial patch :)
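
To make the suggestion above concrete, here is a minimal userspace sketch of
the pattern: the handler reports whether it released the object, and the
caller skips the state clear when it did. Everything named demo_* is a
hypothetical stand-in invented for illustration. It is not the kernel's
struct eeh_pe or the real eeh_handle_normal_event() signature, and it
ignores the long call stack that makes the real change non-trivial.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the EEH_PE_RECOVERING state bit. */
#define DEMO_PE_RECOVERING 0x1u

/* Hypothetical stand-in for struct eeh_pe; not the kernel definition. */
struct demo_pe {
	unsigned int state;
	bool recoverable;
};

/*
 * Stand-in for eeh_handle_normal_event(). Returns true if the PE was
 * released, in which case the caller must not touch it again.
 */
static bool demo_handle_normal_event(struct demo_pe *pe)
{
	assert(pe != NULL);
	pe->state |= DEMO_PE_RECOVERING;
	if (!pe->recoverable) {
		free(pe);
		return true;	/* pe is stale from here on */
	}
	return false;
}

/*
 * Stand-in for the loop body in eeh_handle_special_event(). Returns the
 * PE with RECOVERING cleared if it survived, or NULL if it was released
 * (or could not be allocated in the first place).
 */
static struct demo_pe *demo_handle_special_event(bool recoverable)
{
	struct demo_pe *pe = calloc(1, sizeof(*pe));

	if (!pe)
		return NULL;
	pe->recoverable = recoverable;

	if (demo_handle_normal_event(pe))
		return NULL;	/* released: skip the state clear */

	pe->state &= ~DEMO_PE_RECOVERING;
	return pe;
}
```

The point of returning the status is that the caller never has to inspect
the possibly freed PE to decide whether it is still valid, which is what the
pe->type and pe->state checks in the patch end up doing.
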

Any update on this?

>
>
>
>>>
>>>
>>>> +				pr_warn("EEH: not clearing state on bad PE\n");
>>
>> A message like this isn't meaningful, so there is no need to have it.
>> Messages with the "EEH:" prefix are informative messages. We definitely
>> don't need this one here. However, the message might not be needed in the
>> next revision.
>>
>>>> +				continue;
>>>> +			}
>>>> +
>>>>  			eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
>>>>  		} else {
>>>>  			pci_lock_rescan_remove();
>>>>
>>
>> Thanks,
>> Gavin
>>
>
>

--
Alexey