On 03/04/2018 01:57, David Rientjes wrote: > On Tue, 13 Mar 2018, Laurent Dufour wrote: > >> When dealing with speculative page fault handler, we may race with VMA >> being split or merged. In this case the vma->vm_start and vm->vm_end >> fields may not match the address the page fault is occurring. >> >> This can only happens when the VMA is split but in that case, the >> anon_vma pointer of the new VMA will be the same as the original one, >> because in __split_vma the new->anon_vma is set to src->anon_vma when >> *new = *vma. >> >> So even if the VMA boundaries are not correct, the anon_vma pointer is >> still valid. >> >> If the VMA has been merged, then the VMA in which it has been merged >> must have the same anon_vma pointer otherwise the merge can't be done. >> >> So in all the case we know that the anon_vma is valid, since we have >> checked before starting the speculative page fault that the anon_vma >> pointer is valid for this VMA and since there is an anon_vma this >> means that at one time a page has been backed and that before the VMA >> is cleaned, the page table lock would have to be grab to clean the >> PTE, and the anon_vma field is checked once the PTE is locked. >> >> This patch introduce a new __page_add_new_anon_rmap() service which >> doesn't check for the VMA boundaries, and create a new inline one >> which do the check. >> >> When called from a page fault handler, if this is not a speculative one, >> there is a guarantee that vm_start and vm_end match the faulting address, >> so this check is useless. In the context of the speculative page fault >> handler, this check may be wrong but anon_vma is still valid as explained >> above. >> >> Signed-off-by: Laurent Dufour <lduf...@linux.vnet.ibm.com> > > I'm indifferent on this: it could be argued both sides that the new > function and its variant for a simple VM_BUG_ON() isn't worth it and it > would should rather be done in the callers of page_add_new_anon_rmap(). > It feels like it would be better left to the caller and add a comment to > page_add_anon_rmap() itself in mm/rmap.c.
Well there are 11 calls to page_add_new_anon_rmap() which will need to be impacted and future ones too. By introducing __page_add_new_anon_rmap() my goal was to make clear that this call is *special* and that calling it is not the usual way. This also implies that most of the time the check is done (when build with the right config) and that we will not miss some.