Jane Chu wrote: > On 8/1/2022 9:44 AM, Dan Williams wrote: > > Jane Chu wrote: > >> With Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine > >> poison granularity") that changed nfit_handle_mce() callback to report > >> badrange according to 1ULL << MCI_MISC_ADDR_LSB(mce->misc), it's been > >> discovered that the mce->misc LSB field is 0x1000 bytes, hence injecting > >> 2 back-to-back poisons and the driver ends up logging 8 badblocks, > >> because 0x1000 bytes is 8 512-byte. > >> > >> Dan Williams noticed that apei_mce_report_mem_error() hardcode > >> the LSB field to PAGE_SHIFT instead of consulting the input > >> struct cper_sec_mem_err record. So change to rely on hardware whenever > >> support is available. > >> > >> Link: > >> https://lore.kernel.org/r/7ed50fd8-521e-cade-77b1-738b8bfb8...@oracle.com > >> > >> Reviewed-by: Dan Williams <dan.j.willi...@intel.com> > >> Signed-off-by: Jane Chu <jane....@oracle.com> > >> --- > >> arch/x86/kernel/cpu/mce/apei.c | 14 +++++++++++++- > >> 1 file changed, 13 insertions(+), 1 deletion(-) > >> > >> diff --git a/arch/x86/kernel/cpu/mce/apei.c > >> b/arch/x86/kernel/cpu/mce/apei.c > >> index 717192915f28..2c7ea0ba9dd7 100644 > >> --- a/arch/x86/kernel/cpu/mce/apei.c > >> +++ b/arch/x86/kernel/cpu/mce/apei.c > >> @@ -29,15 +29,27 @@ > >> void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err > >> *mem_err) > >> { > >> struct mce m; > >> + int lsb = PAGE_SHIFT; > >> > >> if (!(mem_err->validation_bits & CPER_MEM_VALID_PA)) > >> return; > >> > >> + /* > >> + * Even if the ->validation_bits are set for address mask, > >> + * to be extra safe, check and reject an error radius '0', > >> + * and fallback to the default page size. > >> + */ > >> + if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK) { > >> + lsb = __ffs64(mem_err->physical_addr_mask); > >> + if (lsb == 1) > > > > This was the reason I recommended hweight64 and min_not_zero() as > > hweight64 does not have the undefined behavior. However, an even better > > option is to just do: > > > > find_first_bit(&mem_err->physical_addr_mask, PAGE_SHIFT) > > > > ...as that trims the result to the PAGE_SHIFT max and handles the zero > > case. > > Thanks Dan! However it looks like find_first_bit() could call into > __ffs(x) which has the same limitation as __ffs64(x), as Tony pointed out.
Not quite, no. __ffs() behavior is *undefined* if the input is zero. find_first_bit() is *defined* and returns @size is the input is zero. Which is the behavior this wants to default to PAGE_SHIFT in the absence of any smaller granularity information.