On Mon, May 24, 2021 at 11:27:21PM +1000, Alistair Popple wrote:
> Currently if copy_nonpresent_pte() returns a non-zero value it is
> assumed to be a swap entry which requires further processing outside the
> loop in copy_pte_range() after dropping locks. This prevents other
> values being returned to signal conditions such as failure which a
> subsequent change requires.
> 
> Instead make copy_nonpresent_pte() return an error code if further
> processing is required and read the value for the swap entry in the main
> loop under the ptl.
> 
> Signed-off-by: Alistair Popple <apop...@nvidia.com>
> 
> ---
> 
> v9:
> 
> New for v9 to allow device exclusive handling to occur in
> copy_nonpresent_pte().
> ---
>  mm/memory.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 2fb455c365c2..e061cfa18c11 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -718,7 +718,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct 
> mm_struct *src_mm,
>  
>       if (likely(!non_swap_entry(entry))) {
>               if (swap_duplicate(entry) < 0)
> -                     return entry.val;
> +                     return -EAGAIN;
>  
>               /* make sure dst_mm is on swapoff's mmlist. */
>               if (unlikely(list_empty(&dst_mm->mmlist))) {
> @@ -974,11 +974,13 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct 
> vm_area_struct *src_vma,
>                       continue;
>               }
>               if (unlikely(!pte_present(*src_pte))) {
> -                     entry.val = copy_nonpresent_pte(dst_mm, src_mm,
> -                                                     dst_pte, src_pte,
> -                                                     src_vma, addr, rss);
> -                     if (entry.val)
> +                     ret = copy_nonpresent_pte(dst_mm, src_mm,
> +                                             dst_pte, src_pte,
> +                                             src_vma, addr, rss);
> +                     if (ret == -EAGAIN) {
> +                             entry = pte_to_swp_entry(*src_pte);
>                               break;
> +                     }
>                       progress += 8;
>                       continue;
>               }

Note that -EAGAIN was previously used by copy_present_page() for early cow
use.  Here later although we check entry.val first:

        if (entry.val) {
                if (add_swap_count_continuation(entry, GFP_KERNEL) < 0) {
                        ret = -ENOMEM;
                        goto out;
                }
                entry.val = 0;
        } else if (ret) {
                WARN_ON_ONCE(ret != -EAGAIN);
                prealloc = page_copy_prealloc(src_mm, src_vma, addr);
                if (!prealloc)
                        return -ENOMEM;
                /* We've captured and resolved the error. Reset, try again. */
                ret = 0;
        }

We didn't reset "ret" in entry.val case (maybe we should?). Then in the next
round of "goto again" if "ret" is unluckily untouched, it could reach the 2nd
if check, and I think it could cause an unexpected page_copy_prealloc().

-- 
Peter Xu

Reply via email to