On 2020-04-06 12:17:22 Mon, Nicholas Piggin wrote:
> Ganesh's on April 4, 2020 11:05 pm:
> > On 4/3/20 7:38 AM, Nicholas Piggin wrote:
> > 
> >> Ganesh Goudar's on March 30, 2020 5:12 pm:
> >>> From: Santosh S <sant...@fossix.org>
> >>>
> >>> Introduce notification chain which lets know about uncorrected memory
> >>> errors(UE). This would help prospective users in pmem or nvdimm subsystem
> >>> to track bad blocks for better handling of persistent memory allocations.
> >>>
> >>> Signed-off-by: Santosh S <sant...@fossix.org>
> >>> Signed-off-by: Ganesh Goudar <ganes...@linux.ibm.com>
> >> Do you have any such users yet? It would be good to refer to an example
> >> user and give a brief description of what it does in its notifier.
> > 
> > Santosh has sent a patch which uses this notification.
> > https://patchwork.ozlabs.org/patch/1265062/
> 
> Okay. So these things are asynchronous after the machine check. I guess
> that's the design of it and memory offlining does something similar by
> the looks, but how do you prevent the memory being allocated for 
> something else before the notifiers run?

We can't. This race even exists today when we call memory_failure(). If
the same memory is allocated again then we may hit another mce on same
address when touched until the subsystem that has resistered for
notification has completed handling the notified address.

Thanks,
-Mahesh.

> 
> >>> @@ -263,6 +277,7 @@ static void machine_process_ue_event(struct 
> >>> work_struct *work)
> >>>           while (__this_cpu_read(mce_ue_count) > 0) {
> >>>                   index = __this_cpu_read(mce_ue_count) - 1;
> >>>                   evt = this_cpu_ptr(&mce_ue_event_queue[index]);
> >>> +         blocking_notifier_call_chain(&mce_notifier_list, 0, evt);
> >> Can we really use a blocking notifier here? I'm not sure that we can.
> > 
> > I think we can, do you see any problem?
> 
> No it looks okay after better look, sorry for the noise.
> 
> Thanks,
> Nick

-- 
Mahesh J Salgaonkar

Reply via email to