On 7/14/2022 5:58 PM, Dan Williams wrote:
[..]
>>>
>>>>> However, the ARS engine likely can return the precise error ranges so I
>>>>> think the fix is to just use the address range indicated by 1UL <<
>>>>> MCI_MISC_ADDR_LSB(mce->misc) to filter the results from a short ARS
>>>>> scrub request to ask the device for the precise error list.
>>>>
>>>> You mean for nfit_handle_mce() callback to issue a short ARS per each
>>>> poison report over a 4K range
>>>
>>> Over a L1_CACHE_BYTES range...
>>>
[..]
>>>
>>> For the badrange tracking, no. So this would just be a check to say
>>> "Yes, CPU I see you think the whole 4K is gone, but lets double check
>>> with more precise information for what gets placed in the badrange
>>> tracking".
>>
>> Okay, process-wise, this is what I am seeing -
>>
>> - for each poison, nfit_handle_mce() issues a short ARS given (addr,
>> 64bytes)
>
> Why would the short-ARS be performed over a 64-byte span when the MCE
> reported a 4K aligned event?

Because you said so, see above. :) Yes, the 4K range as reported by the
MCE makes sense.

>
>> - and short ARS returns to say that's actually (addr, 256bytes),
>> - and then nvdimm_bus_add_badrange() logs the poison in (addr, 512bytes)
>> anyway.
>
> Right, I am reacting to the fact that the patch is picking 512 as an
> arbitrary blast radius. It's ok to expand the blast radius from
> hardware when, for example, recording a 64-byte MCE in badrange which
> only understands 512 byte records, but it's not ok to take a 4K MCE and
> trim it to 512 bytes without asking hardware for a more precise report.

Agreed.

>
> Recall that the NFIT driver supports platforms that may not offer ARS.
> In that case the 4K MCE from the CPU is all that the driver gets and
> there is no data source for a more precise answer.
>
> So the ask is to avoid trimming the blast radius of MCE reports unless
> and until a short-ARS says otherwise.
>

What happens to a short ARS request on a platform that doesn't support
ARS? Does it fail with -EOPNOTSUPP?

>> The precise badrange from short ARS is lost in the process, given the
>> time spent visiting the BIOS, what's the gain?
>
> Generic support for not under-recording poison on platforms that do not
> support ARS.
>
>> Could we defer the precise badrange until there is consumer of the
>> information?
>
> Ideally the consumer is immediate and this precise information can make
> it to the filesystem which might be able to make a better decision about
> what data got clobbered.
>
> See dax_notify_failure() infrastructure currently in linux-next that can
> convey poison events to filesystems. That might be a path to start
> tracking and reporting precise failure information to address the
> constraints of the badrange implementation.

Yes, I'm aware of dax_notify_failure(), but I would appreciate it if you
could elaborate on how that code path could be leveraged for a precise
badrange implementation.

My understanding is that dax_notify_failure() is in the path of a
synchronous fault accompanied by SIGBUS with BUS_MCEERR_AR. But a
badrange could be recorded without the poison being consumed, even
without a DAX filesystem in the picture.

thanks,
-jane
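
To make the blast-radius arithmetic in this thread concrete, here is a minimal userspace sketch in C. It is not the NFIT driver code: `struct span`, `mce_blast_radius()`, `expand_to_granule()` and `span_to_record()` are hypothetical names invented for illustration, and `MISC_ADDR_LSB()` only assumes that the reported granularity sits in the low 6 bits of the MISC value, as the `1UL << MCI_MISC_ADDR_LSB(mce->misc)` expression quoted above suggests. The 512-byte granule comes from the "512 byte records" remark above. The idea sketched is: record the full MCE-reported span unless a short ARS result (passed in by the caller, possibly absent) overlaps it, and only ever widen to the 512-byte granule, never trim below what the hardware reported.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the least significant valid address bit is held in the
 * low 6 bits of the MISC value reported with the MCE. */
#define MISC_ADDR_LSB(misc) ((unsigned)((misc) & 0x3f))

/* Badrange record size as described in the thread. */
#define BADRANGE_GRANULE 512ULL

/* Hypothetical helper type: a byte range [start, start + len). */
struct span {
	uint64_t start;
	uint64_t len;
};

/* Span the CPU reports as bad: (1 << lsb) bytes, aligned down. */
static struct span mce_blast_radius(uint64_t addr, uint64_t misc)
{
	uint64_t size = 1ULL << MISC_ADDR_LSB(misc);
	struct span s = { addr & ~(size - 1), size };

	return s;
}

/* Widen a span outward to 512-byte boundaries; it never shrinks. */
static struct span expand_to_granule(struct span s)
{
	uint64_t mask = BADRANGE_GRANULE - 1;
	uint64_t start, end;
	struct span out;

	assert(s.len != 0);
	start = s.start & ~mask;
	end = (s.start + s.len + mask) & ~mask;
	out.start = start;
	out.len = end - start;
	return out;
}

/*
 * Decide what to record. With no ARS answer (ars == NULL), or one that
 * does not overlap the MCE span, keep the whole MCE span. Otherwise
 * record only the overlap, widened to the badrange granule.
 */
static struct span span_to_record(struct span mce, const struct span *ars)
{
	if (ars && ars->len) {
		uint64_t mce_end = mce.start + mce.len;
		uint64_t ars_end = ars->start + ars->len;
		uint64_t lo = mce.start > ars->start ? mce.start : ars->start;
		uint64_t hi = mce_end < ars_end ? mce_end : ars_end;

		if (lo < hi) {
			struct span hit = { lo, hi - lo };

			return expand_to_granule(hit);
		}
	}
	return expand_to_granule(mce);
}
```

With an LSB of 12 the MCE span is 4K, and passing `NULL` for the ARS result keeps all 4K, which corresponds to the no-ARS platform case raised above. Passing a 256-byte ARS span inside that 4K yields a single 512-byte record.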