Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-31 Thread Aili Yao
On Wed, 31 Mar 2021 08:44:53 +0200 David Hildenbrand wrote:
> On 31.03.21 06:32, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Wed, Mar 31, 2021 at 10:43:36AM +0800, Aili Yao wrote:
> >> On Wed, 31 Mar 2021 01:52:59 + HORIGUCHI NAOYA(堀口 直也)
> >> wrote:
> >>> On Fri, Mar 26, 2021 at 03:22:49PM

Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-31 Thread David Hildenbrand
On 31.03.21 08:53, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Mar 31, 2021 at 07:07:39AM +0100, Matthew Wilcox wrote:
> > On Wed, Mar 31, 2021 at 01:52:59AM +, HORIGUCHI NAOYA(堀口 直也) wrote:
> > > If we successfully unmapped but failed in truncate_error_page() for example,
> > > the processes mapping the page

Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-31 Thread 堀口 直也
On Wed, Mar 31, 2021 at 07:07:39AM +0100, Matthew Wilcox wrote:
> On Wed, Mar 31, 2021 at 01:52:59AM +, HORIGUCHI NAOYA(堀口 直也) wrote:
> > If we successfully unmapped but failed in truncate_error_page() for example,
> > the processes mapping the page would get -EFAULT as expected. But even in

Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-31 Thread David Hildenbrand
On 31.03.21 06:32, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Wed, Mar 31, 2021 at 10:43:36AM +0800, Aili Yao wrote:
> > On Wed, 31 Mar 2021 01:52:59 + HORIGUCHI NAOYA(堀口 直也) wrote:
> > > On Fri, Mar 26, 2021 at 03:22:49PM +0100, David Hildenbrand wrote:
> > > > On 26.03.21 15:09, David Hildenbrand wrote:
> > > > > On

Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-31 Thread Matthew Wilcox
On Wed, Mar 31, 2021 at 01:52:59AM +, HORIGUCHI NAOYA(堀口 直也) wrote:
> If we successfully unmapped but failed in truncate_error_page() for example,
> the processes mapping the page would get -EFAULT as expected. But even in
> this case, other processes could reach the error page via page cache
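The failure mode described in the quoted message can be made concrete with a small user-space model: unmapping succeeds, removal from the page cache fails, so accesses through the old mapping fail while a fresh page-cache lookup still finds the poisoned page. Everything here (`struct cache_model_page`, `model_memory_failure()` and the two access helpers) is hypothetical and heavily simplified; it is not kernel code and only mirrors the reasoning in the thread.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page state; NOT the kernel's struct page. */
struct cache_model_page {
    bool hwpoison;      /* page has been marked as poisoned */
    bool mapped;        /* still mapped into some process */
    bool in_page_cache; /* still reachable through the page cache */
};

/* Model of memory-failure handling: unmapping is assumed to succeed,
 * while dropping the page from the page cache (the step the thread
 * associates with truncate_error_page()) may fail. */
void model_memory_failure(struct cache_model_page *p, bool truncate_ok)
{
    p->hwpoison = true;
    p->mapped = false;
    if (truncate_ok)
        p->in_page_cache = false;
}

/* Access through an existing mapping: once unmapped, report -EFAULT. */
int model_access_mapped(const struct cache_model_page *p)
{
    return p->mapped ? 0 : -EFAULT;
}

/* Lookup through the page cache: returns the page whenever it is still
 * cached, regardless of its poison state. */
const struct cache_model_page *
model_page_cache_lookup(const struct cache_model_page *p)
{
    return p->in_page_cache ? p : NULL;
}
```

With `truncate_ok` set to false, `model_access_mapped()` returns `-EFAULT`, yet `model_page_cache_lookup()` still hands back a page whose `hwpoison` flag is set. In this model, any path that reaches pages other than through the old mapping has to check the poison flag itself.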

Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-30 Thread 堀口 直也
On Wed, Mar 31, 2021 at 10:43:36AM +0800, Aili Yao wrote:
> On Wed, 31 Mar 2021 01:52:59 + HORIGUCHI NAOYA(堀口 直也)
> wrote:
> > On Fri, Mar 26, 2021 at 03:22:49PM +0100, David Hildenbrand wrote:
> > > On 26.03.21 15:09, David Hildenbrand wrote:
> > > > On 22.03.21 12:33, Aili Yao wrote:

Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-30 Thread Aili Yao
On Wed, 31 Mar 2021 01:52:59 + HORIGUCHI NAOYA(堀口 直也) wrote:
> On Fri, Mar 26, 2021 at 03:22:49PM +0100, David Hildenbrand wrote:
> > On 26.03.21 15:09, David Hildenbrand wrote:
> > > On 22.03.21 12:33, Aili Yao wrote:
> > > > When we do coredump for user process signal, this may be one

Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-30 Thread 堀口 直也
On Fri, Mar 26, 2021 at 03:22:49PM +0100, David Hildenbrand wrote:
> On 26.03.21 15:09, David Hildenbrand wrote:
> > On 22.03.21 12:33, Aili Yao wrote:
> > > When we do coredump for user process signal, this may be one SIGBUS signal
> > > with BUS_MCEERR_AR or BUS_MCEERR_AO code, which means this

Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-26 Thread David Hildenbrand
On 26.03.21 15:09, David Hildenbrand wrote:
> On 22.03.21 12:33, Aili Yao wrote:
> > When we do coredump for user process signal, this may be one SIGBUS signal
> > with BUS_MCEERR_AR or BUS_MCEERR_AO code, which means this
> > signal is resulted from ECC memory fail like SRAR or SRAO, we expect the memory

Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-26 Thread David Hildenbrand
On 22.03.21 12:33, Aili Yao wrote:
> When we do coredump for user process signal, this may be one SIGBUS signal
> with BUS_MCEERR_AR or BUS_MCEERR_AO code, which means this
> signal is resulted from ECC memory fail like SRAR or SRAO, we expect the memory recovery work is finished correctly, then the
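For context on the signal codes quoted above, a user-space program can tell a hardware memory error apart from other SIGBUS causes by inspecting `si_code`. The helper names `is_memory_error_sigbus()` and `sigbus_kind()` are hypothetical; the constants are the ones Linux/glibc expose in `<signal.h>`. This is a sketch, assuming glibc on Linux, with `#ifdef` guards so it still compiles where the constants are absent.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* on glibc, guarantees BUS_MCEERR_* are visible */
#endif
#include <signal.h>

/* Return nonzero if a SIGBUS si_code reports a hardware memory error.
 * BUS_MCEERR_AR: poisoned memory was consumed, action required (SRAR).
 * BUS_MCEERR_AO: poison was detected but not consumed, action optional (SRAO). */
int is_memory_error_sigbus(int si_code)
{
#ifdef BUS_MCEERR_AR
    if (si_code == BUS_MCEERR_AR)
        return 1;
#endif
#ifdef BUS_MCEERR_AO
    if (si_code == BUS_MCEERR_AO)
        return 1;
#endif
    return 0;
}

/* Short human-readable label, e.g. for logging from a SIGBUS handler. */
const char *sigbus_kind(int si_code)
{
#ifdef BUS_MCEERR_AR
    if (si_code == BUS_MCEERR_AR)
        return "hardware memory error, action required";
#endif
#ifdef BUS_MCEERR_AO
    if (si_code == BUS_MCEERR_AO)
        return "hardware memory error, action optional";
#endif
    return "other SIGBUS";
}
```

In a handler installed with `sigaction()` and `SA_SIGINFO`, one would pass `info->si_code` to these helpers.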

[PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-22 Thread Aili Yao
When a core dump is taken for a user process because of a signal, that signal may be a SIGBUS with code BUS_MCEERR_AR or BUS_MCEERR_AO, which means it resulted from an ECC memory failure such as SRAR or SRAO. We expect the memory recovery work to have finished correctly, so that get_dump_page() will not return the
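The behavior the patch description asks for can be modelled in user space: when the dump path looks up a page that has been marked hwpoisoned, it should get nothing back and leave a hole instead of reading the bad memory. In the kernel, a NULL from get_dump_page() already makes the core dump code skip that page rather than copy it. The names below (`struct dump_model_page`, `model_get_dump_page()`, `model_dump()`) are hypothetical; this is a simulation of the idea, not the patch itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page of process memory; NOT struct page. */
struct dump_model_page {
    bool hwpoison;          /* set once memory failure marked the page */
    unsigned char data[16]; /* contents a dumper would copy */
};

/* Hypothetical analogue of get_dump_page(): returns NULL for an
 * out-of-range index or a poisoned page, so the caller never reads it. */
struct dump_model_page *model_get_dump_page(struct dump_model_page *pages,
                                            size_t n, size_t idx)
{
    if (idx >= n || pages[idx].hwpoison)
        return NULL;
    return &pages[idx];
}

/* Walk all pages the way a core dumper would: count the pages it would
 * copy, and count a hole for each page the lookup refuses to return. */
size_t model_dump(struct dump_model_page *pages, size_t n, size_t *holes)
{
    size_t copied = 0;

    *holes = 0;
    for (size_t i = 0; i < n; i++) {
        if (model_get_dump_page(pages, n, i))
            copied++;
        else
            (*holes)++;
    }
    return copied;
}
```

For three pages with the middle one poisoned, `model_dump()` reports two copied pages and one hole; the poisoned page's `data` is never touched.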