Re: [PATCH v2 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address

2021-04-18 Thread Aili Yao
> Here's the v2 of 3/3. > > Aili, could you test with it? > > Thanks, > Naoya Horiguchi > I tested this v2 version. In my test, these patches worked as expected and the previous issues didn't happen again. Test-by: Aili Yao Thanks, Aili Yao > - > From: Naoya Ho

Re: [PATCH v1 0/3] mm,hwpoison: fix sending SIGBUS for Action Required MCE

2021-04-16 Thread Aili Yao
lease test and > let me have some feedback? > > Thanks, > Naoya Horiguchi > > [1]: > https://lore.kernel.org/linux-mm/20210331192540.2141052f@alex-virtual-machine/ > --- > Summary: > > Aili Yao (1): > mm,hwpoison: return -EHWPOISON when page already >

Re: [RFC 0/4] Fix machine check recovery for copy_from_user

2021-04-09 Thread Aili Yao
this page has the swap/poison signature, so > the > page is not freed for re-use. > > -Tony Oh, yes. Sorry for my rudeness and misunderstanding; I just couldn't control my emotions and got confused by some other things. Thanks! Aili Yao

Re: [RFC 0/4] Fix machine check recovery for copy_from_user

2021-04-07 Thread Aili Yao
turned by your patch, the user process may check the return value and, on error, exit. The error page will then be freed and may be allocated to another process or to the kernel itself; the code that initializes it will trigger an SRAO. If it's used by the kernel, we may do noth

Re: [PATCH v7] mm/gup: check page hwpoison status for memory recovery failures.

2021-04-07 Thread Aili Yao
On Wed, 7 Apr 2021 01:54:28 + HORIGUCHI NAOYA(堀口 直也) wrote: > On Tue, Apr 06, 2021 at 10:41:23AM +0800, Aili Yao wrote: > > When we call get_user_pages() to pin user page in memory, there may be > > hwpoison page, currently, we just handle the normal case that memory

[PATCH v7] mm/gup: check page hwpoison status for memory recovery failures.

2021-04-05 Thread Aili Yao
. Changes since v6: - Fix wrong page pointer check in follow_trans_huge_pmd(); Signed-off-by: Aili Yao Cc: David Hildenbrand Cc: Matthew Wilcox Cc: Naoya Horiguchi Cc: Oscar Salvador Cc: Mike Kravetz Cc: Andrew Morton Cc: sta...@vger.kernel.org --- mm/gup.c | 27

[PATCH v6] mm/gup: check page hwpoison status for memory recovery failures.

2021-04-05 Thread Aili Yao
. Signed-off-by: Aili Yao Cc: David Hildenbrand Cc: Matthew Wilcox Cc: Naoya Horiguchi Cc: Oscar Salvador Cc: Mike Kravetz Cc: Andrew Morton Cc: sta...@vger.kernel.org --- mm/gup.c | 27 +++ mm/huge_memory.c | 9 +++-- mm/hugetlb.c | 8 +++- mm

Re: [PATCH v3] mm,hwpoison: return -EHWPOISON when page already poisoned

2021-04-05 Thread Aili Yao
tual address will be > available in MCE handler. > > Anyway I'll try to write a patch for this. Yeah, the previous patch didn't address the multiple virtual address issue. If there is a way to fix that, that would be great! -- Thanks! Aili Yao

Re: [PATCH v3] mm,hwpoison: return -EHWPOISON when page already poisoned

2021-04-01 Thread Aili Yao
On Thu, 1 Apr 2021 08:33:20 -0700 "Luck, Tony" wrote: > On Wed, Mar 31, 2021 at 07:25:40PM +0800, Aili Yao wrote: > > When the page is already poisoned, another memory_failure() call in the > > same page now returns 0, meaning OK. For nested memory mce handling, this >

Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-31 Thread Aili Yao
On Wed, 31 Mar 2021 08:44:53 +0200 David Hildenbrand wrote: > On 31.03.21 06:32, HORIGUCHI NAOYA(堀口 直也) wrote: > > On Wed, Mar 31, 2021 at 10:43:36AM +0800, Aili Yao wrote: > >> On Wed, 31 Mar 2021 01:52:59 + HORIGUCHI NAOYA(堀口 直也) > >> wrote: > >>

[PATCH v3] mm,hwpoison: return -EHWPOISON when page already poisoned

2021-03-31 Thread Aili Yao
nt to process a memory error which has already been processed. This behavior seems reasonable. Signed-off-by: Aili Yao --- mm/memory-failure.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 24210c9bd843..5cd42144b67c 100644

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-31 Thread Aili Yao
on this topic, but I noticed today that I made a stupid mistake: EHWPOISON has already been declared, so we had better return EHWPOISON for this case. Really sorry for this! As the patch is still under review, I will post a new version. If I change this, may I add your review tag here, please? -- Thanks! Aili Yao

Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-31 Thread Aili Yao
On Wed, 31 Mar 2021 08:44:53 +0200 David Hildenbrand wrote: > On 31.03.21 06:32, HORIGUCHI NAOYA(堀口 直也) wrote: > > On Wed, Mar 31, 2021 at 10:43:36AM +0800, Aili Yao wrote: > >> On Wed, 31 Mar 2021 01:52:59 + HORIGUCHI NAOYA(堀口 直也) > >> wrote: > >>

Re: [PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-30 Thread Aili Yao
On Wed, 31 Mar 2021 01:52:59 + HORIGUCHI NAOYA(堀口 直也) wrote: > On Fri, Mar 26, 2021 at 03:22:49PM +0100, David Hildenbrand wrote: > > On 26.03.21 15:09, David Hildenbrand wrote: > > > On 22.03.21 12:33, Aili Yao wrote: > > > > When we do coredump for user p

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-03-24 Thread Aili Yao
On Wed, 24 Mar 2021 10:59:50 +0800 Aili Yao wrote: > On Wed, 24 Feb 2021 10:39:21 +0800 > Aili Yao wrote: > > > On Tue, 23 Feb 2021 16:12:43 + > > "Luck, Tony" wrote: > > > > > > What I think is qemu has not an easy to get the MCE sig

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-03-23 Thread Aili Yao
On Wed, 24 Feb 2021 10:39:21 +0800 Aili Yao wrote: > On Tue, 23 Feb 2021 16:12:43 + > "Luck, Tony" wrote: > > > > What I think is qemu has not an easy to get the MCE signature from host > > > or currently no methods for this > > > So qemu t

[PATCH v5] mm/gup: check page hwposion status for coredump.

2021-03-22 Thread Aili Yao
-by: Aili Yao Cc: David Hildenbrand Cc: Matthew Wilcox Cc: Naoya Horiguchi Cc: Oscar Salvador Cc: Mike Kravetz Cc: Aili Yao Cc: sta...@vger.kernel.org Signed-off-by: Andrew Morton --- mm/gup.c | 4 mm/internal.h | 20 2 files changed, 24 insertions(+) diff

Re: [PATCH v3] mm/gup: check page posion status for coredump.

2021-03-21 Thread Aili Yao
On Sat, 20 Mar 2021 00:35:16 + Matthew Wilcox wrote: > On Fri, Mar 19, 2021 at 10:44:37AM +0800, Aili Yao wrote: > > +++ b/mm/gup.c > > @@ -1536,6 +1536,10 @@ struct page *get_dump_page(unsigned long addr) > > FOLL_FORCE |

[PATCH v3] mm/gup: check page posion status for coredump.

2021-03-18 Thread Aili Yao
status in get_dump_page(), and if TRUE, return NULL. There may be other scenarios where it is also better to check the poison status and not panic, so make a wrapper for this check, thanks to David's suggestion. Signed-off-by: Aili Yao --- mm/gup.c | 4 mm/internal.h | 21

Re: [PATCH] mm/gup: check page posion status for coredump.

2021-03-17 Thread Aili Yao
take_page_off_buddy(struct page *page); > #else > PAGEFLAG_FALSE(HWPoison) > #define __PG_HWPOISON 0 > #endif > > so there's no need for this > if (IS_ENABLED(CONFIG_MEMORY_FAILURE) > check, as it simply turns into > > if (PageHuge(page) && 0) > else if (0) > > and the compiler can optimise it all away. Yes, You are right, I will modify this later. Thanks for correction -- Thanks! Aili Yao

[PATCH v2] mm/gup: check page posion status for coredump.

2021-03-17 Thread Aili Yao
status in get_dump_page(), and if TRUE, return NULL. There may be other scenarios where it is also better to check the poison status and not panic, so make a wrapper for this check, as suggested by David Hildenbrand. Signed-off-by: Aili Yao --- mm/gup.c | 4 mm/internal.h | 21

Re: [PATCH] mm/gup: check page posion status for coredump.

2021-03-17 Thread Aili Yao
return (ret == 1) ? page : NULL; > > } > > #endif /* CONFIG_ELF_CORE */ > > > > Yes, other places may have the same requirement as coredump, so it's better to make a wrapper for this. But I am not familiar with those specific scenarios, so this patch only covers the coredump case. I will post a v2 patch for this. -- Thanks! Aili Yao

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-17 Thread Aili Yao
read and further operation. For the process, it seems it has a chance to proceed. If just an error code is returned, the process may or may not care, and it may not process the error correctly. It seems the worst case here is that the process will touch the poison page again, trigger another MCE and

[PATCH] mm/gup: check page posion status for coredump.

2021-03-17 Thread Aili Yao
status in get_dump_page(), and if TRUE, return NULL. Signed-off-by: Aili Yao --- mm/gup.c | 8 1 file changed, 8 insertions(+) diff --git a/mm/gup.c b/mm/gup.c index e4c224c..499a496 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1536,6 +1536,14 @@ struct page *get_dump_page(unsigned long addr

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-17 Thread Aili Yao
nfo and other unclean modifications. Post a clean one. Thanks Aili Yao >From 2289276ba943cdcddbf3b5b2cdbcaff78690e2e8 Mon Sep 17 00:00:00 2001 From: Aili Yao Date: Wed, 17 Mar 2021 16:12:41 +0800 Subject: [PATCH] fix invalid SIGBUS address for recovery fail Walk the current process pages and co

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-17 Thread Aili Yao
ng. > I can't find a way to fix this; maybe the virtual address is contained in a related register, but this is really beyond my knowledge. This is a v2 RFC patch that adds support for THP and 1G huge page errors. Thanks Aili Yao >From 31b685609610b3b06c8fd98d866913dbfeb7e159 Mon Sep 17 00:00:0

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-16 Thread Aili Yao
al); > + else if (val & PM_HWPOISON) > + pfn = PM_SWAP_OFFSET(val); > else > pfn = 0; > > @@ -742,7 +745,7 @@ static void walk_vma(unsigned long index, unsigned long > count) > pfn = pagemap_pfn(buf[i]); > if (pfn) > walk_pfn(index + i, pfn, 1, buf[i]); > - if (buf[i] & PM_SWAP) > + else if (buf[i] & PM_SWAP) > walk_swap(index + i, buf[i]); > } > -- Thanks! Aili Yao

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-11 Thread Aili Yao
ut we may lose the correct page shift? And for the copyin case, we don't need to call set_mce_nospec()? -- Thanks! Aili Yao

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-11 Thread Aili Yao
On Thu, 11 Mar 2021 08:55:30 + HORIGUCHI NAOYA(堀口 直也) wrote: > On Wed, Mar 10, 2021 at 02:10:42PM +0800, Aili Yao wrote: > > On Fri, 5 Mar 2021 15:55:25 + > > "Luck, Tony" wrote: > > > > > > From the walk, it seems we have got the virtual

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-10 Thread Aili Yao
On Wed, 10 Mar 2021 17:28:12 -0800 Andy Lutomirski wrote: > On Wed, Mar 10, 2021 at 5:19 PM Aili Yao wrote: > > > > On Mon, 8 Mar 2021 11:00:28 -0800 > > Andy Lutomirski wrote: > > > > > > On Mar 8, 2021, at 10:31 AM, Luck, Tony wrote: > > >

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-10 Thread Aili Yao
ig_mceerr() instead of force_sig_mceerr(); if the process wants to ignore the SIGBUS, it will ignore it, or it can also process the SIGBUS? -- Thanks! Aili Yao

Re: [PATCH v2] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-10 Thread Aili Yao
On Tue, 9 Mar 2021 08:28:24 + HORIGUCHI NAOYA(堀口 直也) wrote: > On Tue, Mar 09, 2021 at 02:35:34PM +0800, Aili Yao wrote: > > When the page is already poisoned, another memory_failure() call in the > > same page now returns 0, meaning OK. For nested memory mce handling, this

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-09 Thread Aili Yao
, and if you really think the pfn with SIGBUS is not proper, I think the following patch may be one way. I copied your abandoned code with a small modification, and it now passes my simple test. This is also an RFC version, only valid if you think the pfn with SIGBUS is not right. Thanks!

Re: [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races

2021-03-08 Thread Aili Yao
rom memory_failure()'s concurrency issue, > so I'm still expecting that your patch is to be merged. Maybe do you want > to update it based on the discussion (if it's concluded)? > > Thanks, > Naoya Horiguchi I have submitted a v2 patch, and please help review. Thanks! -- Thanks! Aili Yao

[PATCH v2] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-08 Thread Aili Yao
nt to process a memory error which has already been processed. This behavior seems reasonable. Signed-off-by: Aili Yao --- mm/memory-failure.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 24210c9bd843..b6bc77460ee1 100644 --- a/mm

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-08 Thread Aili Yao
On Tue, 9 Mar 2021 10:14:52 +0800 Aili Yao wrote: > On Mon, 8 Mar 2021 18:31:07 + > "Luck, Tony" wrote: > > > > Can you point me at that SIGBUS code in a current kernel? > > > > It is in kill_me_maybe(). mce_vaddr is setup when we disassemble w

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-08 Thread Aili Yao
at the address in siginfo. > > -Tony Is the kill action for this scenario in memory_failure()? -- Thanks! Aili Yao

Re: [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races

2021-03-08 Thread Aili Yao
_kflags & MCE_IN_KERNEL_COPYIN)) { set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page); sync_core(); return; } We place set_mce_nospec() here for a reason; please see commit fd0e786d9d09024f67b. 2. When memory_failure() returns 0, we may return to the user process, which may re-execute the instruction that triggered the previous fault. This behavior implicitly depends on the related pte having been set correctly; if it was not, it will lead to an infinite loop again. -- Thanks! Aili Yao

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-08 Thread Aili Yao
question: > > When programs use read(2), write(2) as ways to check if memory is valid, > > does it really want to check if the user page the program provided is > > valid, not the destination or disk space valid? > > They may well be trying to see if their memory is valid. Thanks for your reply; I don't know what to do. With the current code, if a user program writes to a block device (maybe as a test) and its user page turns out to be corrupt during the kernel copy, the process is killed with a SIGBUS. And for the page fault case in this thread, an error is returned to the process. -- Thanks! Aili Yao

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-04 Thread Aili Yao
On Fri, 5 Mar 2021 09:30:16 +0800 Aili Yao wrote: > On Thu, 4 Mar 2021 15:57:20 -0800 > "Luck, Tony" wrote: > > > On Thu, Mar 04, 2021 at 02:45:24PM +0800, Aili Yao wrote: > > > > > if your methods works, should it be like this? > > >

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-04 Thread Aili Yao
On Thu, 4 Mar 2021 15:57:20 -0800 "Luck, Tony" wrote: > On Thu, Mar 04, 2021 at 02:45:24PM +0800, Aili Yao wrote: > > > > if your methods works, should it be like this? > > > > > > > > 1582 pteval = > > > >

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-03 Thread Aili Yao
On Thu, 4 Mar 2021 12:19:41 +0800 Aili Yao wrote: > On Thu, 4 Mar 2021 10:16:53 +0800 > Aili Yao wrote: > > > On Wed, 3 Mar 2021 15:41:35 + > > "Luck, Tony" wrote: > > > > > > For error address with sigbus, i think this is not an issue

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-03 Thread Aili Yao
On Thu, 4 Mar 2021 10:16:53 +0800 Aili Yao wrote: > On Wed, 3 Mar 2021 15:41:35 + > "Luck, Tony" wrote: > > > > For error address with sigbus, i think this is not an issue resulted by > > > the patch i post, before my patch, the issue is already there

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-03 Thread Aili Yao
counter(page)); 1590 set_pte_at(mm, address, pvmw.pte, pteval); 1591 } the page fault check if it's a poison page using is_hwpoison_entry(), -- Thanks! Aili Yao

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-03 Thread Aili Yao
correctly? If this is the proper action, the original poison flow in the current code for read and write needs to change too. -- Thanks! Aili Yao

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-03 Thread Aili Yao
On Wed, 3 Mar 2021 20:24:02 +0800 Aili Yao wrote: > On Mon, 1 Mar 2021 11:09:36 -0800 > Andy Lutomirski wrote: > > > > On Mar 1, 2021, at 11:02 AM, Luck, Tony wrote: > > > > > > > > >> > > >> Some programs may use read(

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-03 Thread Aili Yao
Hi tony: > On Tue, 2 Mar 2021 19:39:53 -0800 > "Luck, Tony" wrote: > > > On Fri, Feb 26, 2021 at 10:59:15AM +0800, Aili Yao wrote: > > > Hi naoya, tony: > > > > > > > > > > Idea for what we should do next ... Now that x86 is

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-03 Thread Aili Yao
On Tue, 2 Mar 2021 19:39:53 -0800 "Luck, Tony" wrote: > On Fri, Feb 26, 2021 at 10:59:15AM +0800, Aili Yao wrote: > > Hi naoya, tony: > > > > > > > > Idea for what we should do next ... Now that x86 is calling > > > > memory_failu

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-03-02 Thread Aili Yao
On Fri, 26 Feb 2021 09:58:37 -0800 "Luck, Tony" wrote: > On Fri, Feb 26, 2021 at 10:52:50AM +0800, Aili Yao wrote: > > Hi naoya,Oscar,david: > > > > > > > We could use some negative value (error code) to report the reported > > > > cas

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-03-01 Thread Aili Yao
e the SIGSEGV with SIGBUS for the hwpoison case? I think SIGBUS is more accurate for the error. Normally, for a poison access, control shouldn't return to the process; an exit would be good, or I think we need another code path for this. This is the legacy way to process a user poison access error, like other poison code branches in the kernel. Thanks! Aili Yao

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-02-25 Thread Aili Yao
k for this issue. Does this change the result that the process should be killed? Or is there something else that still needs to be considered? Thanks! Aili Yao

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-02-25 Thread Aili Yao
HWPOISON, but other options are fine if justified well. > > -EHWPOISON seems like a good fit. > I am OK with the -EHWPOISON error code, but I have one doubt here: when we return this -EHWPOISON error code, does this mean we have to add a new error code to error-base.h or errno.h? Is this easy to do? Thanks! Aili Yao

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-02-24 Thread Aili Yao
On Tue, 23 Feb 2021 08:42:59 -0800 "Luck, Tony" wrote: > On Tue, Feb 23, 2021 at 07:33:46AM -0800, Andy Lutomirski wrote: > > > > > On Feb 23, 2021, at 4:44 AM, Aili Yao wrote: > > > > > > On Fri, 5 Feb 2021 17:01:35 +0800 > > > Ai

Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-02-24 Thread Aili Yao
> > > > For other cases which care the return value of memory_failure() should > > check why they want to process a memory error which have already been > > processed. This behavior seems reasonable. > > > > In kill_me_maybe, log the fact about the memory may not r

[PATCH] mm,hwpoison: return -EBUSY when page already poisoned

2021-02-23 Thread Aili Yao
effect. Other callers which care about the return value of memory_failure() should check why they want to process a memory error which has already been processed. This behavior seems reasonable. In kill_me_maybe(), log the fact that the memory may not be recovered, and kill the related process. Signed-o

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-23 Thread Aili Yao
y SRAR is triggered, RIPV will always be set, then it's the job of qemu to set the RIPV instead. Or, if SRAR is triggered with RIPV cleared, the same issue will be true for the host. And I think it's better for the VM to know the real RIPV value; it needs more work in qemu and the kernel, if possible. Thanks Aili Yao

Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-02-23 Thread Aili Yao
On Fri, 5 Feb 2021 17:01:35 +0800 Aili Yao wrote: > When one page is already hwpoisoned by MCE AO action, processes may not > be killed, processes mapping this page may make a syscall include this > page and result to trigger a VM_FAULT_HWPOISON fault, as it's in kernel > mode it

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-23 Thread Aili Yao
On Tue, 23 Feb 2021 11:05:38 +0100 Borislav Petkov wrote: > On Tue, Feb 23, 2021 at 05:56:40PM +0800, Aili Yao wrote: > > What i inject is AR error, and I don't see MCG_STATUS_RIPV flag. > > Then keep debugging qemu to figure out why that is. > What I think is qemu has

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-23 Thread Aili Yao
On Tue, 23 Feb 2021 10:43:00 +0100 Borislav Petkov wrote: > On Tue, Feb 23, 2021 at 10:27:55AM +0800, Aili Yao wrote: > > When Guest access one address with UE error, it will exit guest mode, > > the host will do the recovery job, and then one SIGBUS is send to > > the VCP

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-22 Thread Aili Yao
On Mon, 22 Feb 2021 13:45:50 +0100 Borislav Petkov wrote: > On Mon, Feb 22, 2021 at 08:35:49PM +0800, Aili Yao wrote: > > Guest VM, the qemu has no way to know the RIPV value, so always get it > > cleared. > > What does that mean? > > The guest VM will get the

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-22 Thread Aili Yao
On Mon, 22 Feb 2021 13:22:41 +0100 Borislav Petkov wrote: > On Mon, Feb 22, 2021 at 08:17:23PM +0800, Aili Yao wrote: > > AR (Action Required) flag, bit 55 - Indicates (when set) that MCA > > error code specific recovery action must be... > > Give me the *exact* MCE signa

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-22 Thread Aili Yao
On Mon, 22 Feb 2021 19:21:46 +0800 Aili Yao wrote: > On Mon, 22 Feb 2021 11:22:06 +0100 > Borislav Petkov wrote: > > > On Mon, Feb 22, 2021 at 06:08:19PM +0800, Aili Yao wrote: > > > So why would intel provide this MCG_STATUS_RIPV flag, it's better to > > >

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-22 Thread Aili Yao
On Mon, 22 Feb 2021 11:22:06 +0100 Borislav Petkov wrote: > On Mon, Feb 22, 2021 at 06:08:19PM +0800, Aili Yao wrote: > > So why would intel provide this MCG_STATUS_RIPV flag, it's better to > > remove it as it will never be set, and all the related logic for this > >

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-22 Thread Aili Yao
On Mon, 22 Feb 2021 11:03:56 +0100 Borislav Petkov wrote: > On Mon, Feb 22, 2021 at 05:31:09PM +0800, Aili Yao wrote: > > you can inject a memory UE to a VM, it should always be MCG_STATUS_RIPV 0. > > So the signature you injected is not something the hardware would > g

Re: [PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-22 Thread Aili Yao
On Mon, 22 Feb 2021 10:24:03 +0100 Borislav Petkov wrote: > On Mon, Feb 22, 2021 at 11:50:07AM +0800, Aili Yao wrote: > > From commit b2f9d678e28c ("x86/mce: Check for faults tagged in > > EXTABLE_CLASS_FAULT exception table entries"), When there is a > >

[PATCH v2] x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-21 Thread Aili Yao
tep in kernel will touch the error page again, which results in a fatal error. We need to poison the page and then kill current in the memory-failure module. So fix it using the original checking method. Signed-off-by: Aili Yao --- arch/x86/kernel/cpu/mce/core.c | 5 - 1 file changed, 4 insertions(+

x86/mce: fix wrong no-return-ip logic in do_machine_check()

2021-02-21 Thread Aili Yao
el will touch the error page again, which results in a fatal error. We need to poison the page and then kill current in the memory-failure module. So fix it using the original checking method. Signed-off-by: Aili Yao --- arch/x86/kernel/cpu/mce/core.c | 7 --- 1 file changed, 4 insertions(+)

[PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-02-05 Thread Aili Yao
to user code. This is not sufficient, we should send a SIGBUS to the process and log the info to console, as we can't trust the process will handle the error correctly. Suggested-by: Feng Yang Signed-off-by: Aili Yao --- arch/x86/mm/fault.c | 62 + 1

Re: [PATCH v2] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-02-04 Thread Aili Yao
On Thu, 4 Feb 2021 07:25:55 + HORIGUCHI NAOYA(堀口 直也) wrote: > Hi Aili, > > On Mon, Feb 01, 2021 at 04:17:49PM +0800, Aili Yao wrote: > > When one page is already hwpoisoned by AO action, process may not be > > killed, the process mapping this page may make a syscall

Re: [PATCH v2] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-02-01 Thread Aili Yao
On Mon, 1 Feb 2021 08:58:27 -0800 Andy Lutomirski wrote: > On Mon, Feb 1, 2021 at 12:17 AM Aili Yao wrote: > > > > When one page is already hwpoisoned by AO action, process may not be > > killed, the process mapping this page may make a syscall include this > >

[PATCH v2] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

2021-02-01 Thread Aili Yao
to user process. This is not sufficient, we should send a SIGBUS to the process and log the info to console, as we can't trust the process will handle the error correctly. Suggested-by: Feng Yang Signed-off-by: Aili Yao --- arch/x86/mm/fault.c | 34 +++--- 1 file changed

Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

2021-01-31 Thread Aili Yao
Do you mean the force_sig_mceerr and force_sig_fault difference? I see a hwpoison related comment there, but it's better to follow the usual way force_sig_mceerr, I will modify this in a v2 patch. Or something other, you may post a better one. Thanks -- Best Regards! Aili Yao

Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

2021-01-28 Thread Aili Yao
On Thu, 28 Jan 2021 09:43:52 -0800 "Luck, Tony" wrote: > On Thu, Jan 28, 2021 at 07:43:26PM +0800, Aili Yao wrote: > > when one page is already hwpoisoned by AO action, process may not be > > killed, the process mapping this page may make a syscall include this >

[PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

2021-01-28 Thread Aili Yao
to user process. This is not sufficient, we should send a SIGBUS to the process and log the info to console, as we can't trust the process will handle the error correctly. Suggested-by: Feng Yang Signed-off-by: Aili Yao --- arch/x86/mm/fault.c | 16 1 file changed, 16 insertions