Hi Boris, Sorry for the delay.
>-----Original Message----- >From: Borislav Petkov [mailto:[email protected]] >Sent: 11 September 2020 17:48 >To: Shiju Jose <[email protected]> >Cc: [email protected]; [email protected]; >[email protected]; [email protected]; [email protected]; >Linuxarm <[email protected]>; Robert Richter <[email protected]> >Subject: Re: [PATCH 1/1] EDAC/ghes: Fix for NULL pointer dereference in >ghes_edac_register() > >On Thu, Aug 27, 2020 at 02:02:27PM +0000, Shiju Jose wrote: >> I tested with your changes and it fixes the issue. I will send v2. > >Btw, I don't know how it managed to work on your machine because even >with this patch, it isn't all fixed because num_dimms needs to be cleared too, >see here: I debug with adding more logs. I found that in our platform hw->num_dimms was 32 when called ghes_edac_register() second time when probe a new ghes instance, the check !(hw->num_dimms % 16) in the enumerate_dimms() passed and it allocated memory for hw->dimms. Thus it did not fail with NULL pointer dereference in ghes_edac_register(). With the your new fix hw->num_dimms reset to 0. > >--- >From: Borislav Petkov <[email protected]> >Date: Fri, 11 Sep 2020 12:55:55 +0200 >Subject: [PATCH] EDAC/ghes: Clear scanned data on unload > >Commit > > b972fdba8665 ("EDAC/ghes: Fix NULL pointer dereference in >ghes_edac_register()") > >didn't clear all the information from the scanned system and, more >specifically, left ghes_hw.num_dimms to its previous value. On a second load >(CONFIG_DEBUG_TEST_DRIVER_REMOVE=y), the driver would use the >leftover num_dimms value which is not 0 and thus the 0 check in >enumerate_dimms() will get bypassed and it would go directly to the pointer >deref: > > d = &hw->dimms[hw->num_dimms]; > >which is, of course, NULL: > > #PF: supervisor write access in kernel mode > #PF: error_code(0x0002) - not-present page > PGD 0 P4D 0 > Oops: 0002 [#1] PREEMPT SMP > CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #7 > Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 >08/29/2018 > RIP: 0010:enumerate_dimms.cold+0x7b/0x375 > >Reset the whole ghes_hw on driver unregister so that no stale values are >used on a second system scan. > >Fixes: b972fdba8665 ("EDAC/ghes: Fix NULL pointer dereference in >ghes_edac_register()") >Cc: Shiju Jose <[email protected]> >Signed-off-by: Borislav Petkov <[email protected]> >--- > drivers/edac/ghes_edac.c | 1 + > 1 file changed, 1 insertion(+) > >diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index >a6b9c0b2a15c..eb6034a6fbbb 100644 >--- a/drivers/edac/ghes_edac.c >+++ b/drivers/edac/ghes_edac.c >@@ -632,6 +632,7 @@ void ghes_edac_unregister(struct ghes *ghes) > mutex_lock(&ghes_reg_mutex); > > system_scanned = false; >+ memset(&ghes_hw, 0, sizeof(struct ghes_hw_desc)); > > if (!refcount_dec_and_test(&ghes_refcount)) > goto unlock; >-- >2.21.0 > >-- >Regards/Gruss, > Boris. > >https://people.kernel.org/tglx/notes-about-netiquette Thanks, Shiju

