Joerg Schad created MESOS-4998:
----------------------------------
Summary: Problematic fork/clone performance at high load.
Key: MESOS-4998
URL: https://issues.apache.org/jira/browse/MESOS-4998
Project: Mesos
Issue Type: Epic
Reporter: Joerg Schad
Assignee: Joerg Schad
Creating a new subprocess in Mesos involves forking/cloning a new process. In
most cases (executors, perf, ...) the parent of the new process is the
agent/slave process. This can lead to problematic behavior, especially when
creating several new processes at the same time.
The problem here is that a normal fork() (or the clone syscall used by
libprocess) provides a copy-on-write (COW) view of the parent's address space
until the child execs its new binary. Note that during the time between fork
and exec Mesos performs several setup actions, such as placing the new
processes in systemd units or assigning them to the freezer cgroup.
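
To make the window between fork and exec concrete, here is a minimal sketch in
Python (not Mesos or libprocess code). The spawn() helper and its setup
callback are hypothetical names introduced for illustration; the callback
merely stands in for the systemd/cgroup setup mentioned above.

```python
import os
import sys


def spawn(argv, setup=None):
    """Fork, run an optional setup step in the child, then exec argv.

    Returns the child's exit code, or -1 if it did not exit normally.
    """
    pid = os.fork()
    if pid == 0:
        # Child: until exec succeeds, this process still shares the
        # parent's pages copy-on-write.
        try:
            if setup is not None:
                setup()  # hypothetical stand-in for cgroup/systemd setup
            os.execvp(argv[0], argv)  # replaces the address space
        finally:
            # Only reached if setup or exec raised an exception.
            os._exit(127)
    # Parent: wait for the child and report how it ended.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    code = spawn([sys.executable, "-c", "raise SystemExit(7)"])
    print("child exit code:", code)
```

The longer the setup step takes, the longer parent and child keep sharing
pages copy-on-write.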
This COW property of the address space implies that existing memory is marked
as read-only, so the first write to each page triggers a page fault and a copy
of that page. Note that this behavior also applies to the parent process, and
hence any write there is very costly until the child has exec'd.
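
The effect on the parent can be observed with a small sketch like the one
below. This is not the linked benchmark; it is an illustrative Python script
under stated assumptions: a Unix-like system, a private anonymous mapping as
the test buffer, and ru_minflt from getrusage() as the fault counter. All
helper names are hypothetical. Exact counts depend on the kernel and on
transparent huge page settings, so no specific numbers are claimed.

```python
import mmap
import os
import resource

PAGE = mmap.PAGESIZE


def make_buffer(pages):
    """Private anonymous mapping, so fork() shares it copy-on-write."""
    return mmap.mmap(-1, pages * PAGE,
                     flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)


def minor_faults():
    """Minor page faults of this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch(buf):
    """Write one byte per page so that every page is dirtied."""
    for off in range(0, len(buf), PAGE):
        buf[off] = 1


def faults_while_writing(buf, with_child):
    """Count the parent's minor faults while it rewrites buf, optionally
    with a forked child that is still alive and has not exec'd."""
    r, w = os.pipe()
    pid = None
    if with_child:
        pid = os.fork()
        if pid == 0:
            os.close(w)
            os.read(r, 1)  # blocks until the parent closes its write end
            os._exit(0)
    before = minor_faults()
    touch(buf)
    after = minor_faults()
    os.close(w)  # EOF on the pipe releases the child
    os.close(r)
    if pid is not None:
        os.waitpid(pid, 0)
    return after - before


if __name__ == "__main__":
    buf = make_buffer(4096)
    touch(buf)  # make all pages resident before measuring
    print("no child:  ", faults_while_writing(buf, with_child=False))
    print("with child:", faults_while_writing(buf, with_child=True))
```

The child blocks on a pipe instead of exec'ing, which imitates a long setup
phase between fork and exec while the parent keeps writing.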
We simulated the number of page faults incurred when forking/cloning new
processes with this benchmark:
https://github.com/joerg84/forking-benchmark
Results can be seen here:
https://docs.google.com/presentation/d/1SUjKAVHdrutLPpFJy3Q1yhinG5FOMw3HbbEdzuhZ7A8