On Fri, 11 Aug 2017 17:28:29 -0400 [email protected] wrote:

> From: Rik van Riel <[email protected]>
> 
> Introduce MADV_WIPEONFORK semantics, which result in a VMA being
> empty in the child process after fork. This differs from MADV_DONTFORK
> in one important way.
> 
> If a child process accesses memory that was MADV_WIPEONFORK, it
> will get zeroes. The address ranges are still valid, they are just empty.
> 
> If a child process accesses memory that was MADV_DONTFORK, it will
> get a segmentation fault, since those address ranges are no longer
> valid in the child after fork.
> 
> Since MADV_DONTFORK also seems to be used to allow very large
> programs to fork in systems with strict memory overcommit restrictions,
> changing the semantics of MADV_DONTFORK might break existing programs.
> 
> MADV_WIPEONFORK only works on private, anonymous VMAs.
> 
> The use case is libraries that store or cache information, and
> want to know that they need to regenerate it in the child process
> after fork.
> 
> Examples of this would be:
> - systemd/pulseaudio API checks (fail after fork)
>   (replacing a getpid check, which is too slow without a PID cache)
> - PKCS#11 API reinitialization check (mandated by specification)
> - glibc's upcoming PRNG (reseed after fork)
> - OpenSSL PRNG (reseed after fork)
> 
> The security benefits of a forking server having a re-inialized
> PRNG in every child process are pretty obvious. However, due to
> libraries having all kinds of internal state, and programs getting
> compiled with many different versions of each library, it is
> unreasonable to expect calling programs to re-initialize everything
> manually after fork.
> 
> A further complication is the proliferation of clone flags,
> programs bypassing glibc's functions to call clone directly,
> and programs calling unshare, causing the glibc pthread_atfork
> hook to not get called.
> 
> It would be better to have the kernel take care of this automatically.

I'll add "The patch also adds MADV_KEEPONFORK, to undo the effects of a
prior MADV_WIPEONFORK." here.

I guess it isn't worth mentioning that these things can cause VMA
merges and splits. 

> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -80,6 +80,17 @@ static long madvise_behavior(struct vm_area_struct *vma,
>               }
>               new_flags &= ~VM_DONTCOPY;
>               break;
> +     case MADV_WIPEONFORK:
> +             /* MADV_WIPEONFORK is only supported on anonymous memory. */
> +             if (vma->vm_file || vma->vm_flags & VM_SHARED) {
> +                     error = -EINVAL;
> +                     goto out;
> +             }
> +             new_flags |= VM_WIPEONFORK;
> +             break;
> +     case MADV_KEEPONFORK:
> +             new_flags &= ~VM_WIPEONFORK;
> +             break;
>       case MADV_DONTDUMP:
>               new_flags |= VM_DONTDUMP;
>               break;

It seems odd to permit MADV_KEEPONFORK against other-than-anon vmas?

Reply via email to