On Thu, 4 Mar 2021 12:19:41 +0800
Aili Yao <yaoa...@kingsoft.com> wrote:

> On Thu, 4 Mar 2021 10:16:53 +0800
> Aili Yao <yaoa...@kingsoft.com> wrote:
> 
> > On Wed, 3 Mar 2021 15:41:35 +0000
> > "Luck, Tony" <tony.l...@intel.com> wrote:
> >   
> > > > As for the error address reported with SIGBUS, I don't think this issue
> > > > was introduced by the patch I posted; it was already there before my
> > > > patch. I don't see a practical way to get the correct address, for the
> > > > same reason: we don't know whether the page is still mapped when we get
> > > > to kill_me_maybe(). In some cases we may find it, but there are a lot of
> > > > concurrency issues to consider, and if that fails we have to fall back
> > > > to the error branch anyway, so keeping the current code may be the
> > > > easier option.
> > > >
> > > 
> > > My RFC patch from yesterday removes the uncertainty about whether the
> > > page is there or not. After it walks the page tables we know that the
> > > poison page isn't mapped (note that patch is RFC for a reason ... I'm
> > > 90% sure that it should do a bit more than just clear the PRESENT bit).
> > > 
> > > So perhaps memory_failure() has queued a SIGBUS for this task; if so, we
> > > take it when we return from kill_me_maybe().
> 
> And when this happens, the process will receive a SIGBUS with the AO
> (action optional) code rather than AR (action required). Is that proper?
> 
> > > If not, we will return to user mode and re-execute the failing
> > > instruction ... but because the page is unmapped we will take a #PF.
> > 
> > Got it; I had some wrong ideas here.
> > 
> >   
> > > The x86 page fault handler will see that the page for this physical
> > > address is marked HWPOISON, and it will send the SIGBUS (just like it
> > > does if the page had been removed by an earlier UCNA/SRAO error).
> > 
> > If your method works, should it look like this?
> > 
> >         pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
> >         if (PageHuge(page)) {
> >                 hugetlb_count_sub(compound_nr(page), mm);
> >                 set_huge_swap_pte_at(mm, address,
> >                                      pvmw.pte, pteval,
> >                                      vma_mmu_pagesize(vma));
> >         } else {
> >                 dec_mm_counter(mm, mm_counter(page));
> >                 set_pte_at(mm, address, pvmw.pte, pteval);
> >         }
> > 
> > The page fault path then checks whether it's a poisoned page using
> > is_hwpoison_entry().
> >   
> 
> And if it works, do we need some locking before we call walk_page_range()?
> If we do take a lock, do we need to handle the case where waiting for it
> is interrupted, as other places do?
> 

And another thing: do we need a call to flush_tlb_page(vma, address) to make
the PTE changes take effect?

-- 
Thanks!
Aili Yao
