Re: Proposing a deterministic simulation tool for Mesos master and allocator debugging and testing

Neil Conway Mon, 05 Oct 2015 16:50:52 -0700

On Mon, Oct 5, 2015 at 3:20 PM, Maged Michael <maged.mich...@gmail.com> wrote:
> I have in mind three options.
> (1) Text translation of Mesos source code. E.g., "process::Future"
> into, say, "sim::process::Future".
> - Pros: Does not require any changes to any Mesos or libprocess code.
> Replace only what needs to be replaced in libprocess for simulation.
> - Cons: Fragile.
> (2) Integrate the simulation mode with the libprocess code.
> - Pros: Robust. Add only what needs to be added to libprocess for
> simulation. Partially reuse some data structures from regular-mode
> libprocess.
> - Cons: Might get in the way of the development and bug fixes in the
> regular libprocess code.
> (3) Changes to Mesos makefiles to use alternative simulation-oriented
> libprocess code.
> - Pros: Robust.
> - Cons: Might need to create a lot of stubs that redirect to the
> regular-mode (i.e., not for simulation) libprocess code that doesn't
> need any change under simulation.


My vote is for #2, with the caveat that we might have the code live in
a separate Git repo/branch for a period of time until it has matured.
If the simulator requires drastic (architectural) changes to
libprocess, then merging the changes into mainline Mesos might be
tricky -- but it might be easier to figure that out once we're closer
to an MVP.

> As an example of what I have in mind, this is a sketch of sim::process::dispatch.
>
> template<class T, class... Args>
> // Let R be an abbreviation of typename result_of<T::*method(Args...)>::type
> sim::process::Future<R>
> dispatch(
>        const sim::process::Process<T>& pid,
>        R (T::*method)(Args...),
>        Args... args)
> {
>     /* Still running in the context of the parent simulated thread -
> the same C++/OS thread as the simulator. */
>     <context switch to the simulator and back to allow event
> interleaving> /* e.g., setjmp/longjmp */
>     // create a promise
>     std::shared_ptr<sim::process::Promise<R>> prom(new
> sim::process::Promise<R>());
>     <create a function object fn initialized with T::method and args>
>     <associate prom with fn> // e.g., a map structure
>     <enqueue fn in pid's structure>
>     return prom->future();
>     /* The dispatched function will start running when at some point
> later the simulator decides to switch to the child thread (pid) when
> pid is ready to run fn. */
> }

I wonder how much of what is happening here (e.g., during the
setjmp/longjmp) could be implemented by instead modifying the
libprocess event queuing/dispatching logic. For example, suppose Mesos
is running on two CPUs (and let's ignore network I/O + clock for now).
If you want to explore all possible schedules, you could start by
capturing the non-deterministic choices that are made when the
processing threads (a) send messages concurrently and (b) choose new
processes to run from the run queue. Does that sound like a feasible
approach?
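
To make "capturing the non-deterministic choices" a bit more concrete,
here is a minimal sketch. ChoiceSource and runAll are hypothetical names,
not libprocess APIs, and the run queue is just a vector of integer ids.
The assumption is that every scheduling decision can be funnelled through
a single choose() call, which either records the decision or replays it
from an earlier trace.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (not part of libprocess): every non-deterministic
// decision made by the scheduler goes through choose().
class ChoiceSource {
public:
  // Record mode: decisions come from a seeded PRNG and are logged.
  explicit ChoiceSource(unsigned seed) : rng_(seed), replaying_(false) {}

  // Replay mode: decisions are read back from a previously logged trace.
  explicit ChoiceSource(std::vector<std::size_t> trace)
    : trace_(std::move(trace)), replaying_(true) {}

  // Pick one of n alternatives (n > 0), e.g. which runnable process to
  // resume next, or which of several concurrent sends is enqueued first.
  std::size_t choose(std::size_t n) {
    assert(n > 0);
    std::size_t pick = 0;
    if (replaying_) {
      assert(next_ < trace_.size());
      pick = trace_[next_++];
      assert(pick < n);
    } else {
      pick = std::uniform_int_distribution<std::size_t>(0, n - 1)(rng_);
      trace_.push_back(pick);
    }
    return pick;
  }

  const std::vector<std::size_t>& trace() const { return trace_; }

private:
  std::mt19937 rng_;
  std::vector<std::size_t> trace_;
  std::size_t next_ = 0;
  bool replaying_;
};

// Drain a toy run queue of process ids in the order chosen by `choices`.
std::vector<int> runAll(std::vector<int> runnable, ChoiceSource& choices) {
  std::vector<int> order;
  while (!runnable.empty()) {
    const std::size_t i = choices.choose(runnable.size());
    order.push_back(runnable[i]);
    runnable.erase(runnable.begin() + static_cast<std::ptrdiff_t>(i));
  }
  return order;
}
```

Running runAll once with a seeded ChoiceSource and then again with a
ChoiceSource built from the first run's trace() should yield the same
order, which is the replay property we would want from the real thing.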

Other suggestions:

* To make what you're suggesting concrete, it would be great if you
started with a VERY minimal prototype -- say, a test program that
creates three libprocess processes and has them exchange messages. The
order in which messages will be sent/received is non-deterministic [1]
-- can we build a simulator that (a) can explore all possible
schedules and (b) can replay the schedule chosen by a previous
simulation run?
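
As a rough illustration of "explore all schedules, then replay one", the
sketch below uses a deliberately simplified model that is my assumption,
not how libprocess works: each (sender, receiver) pair is a FIFO channel
with its messages already enqueued, and a schedule is the sequence of
channel indices whose head message is delivered at each step. Channel,
Schedule, allSchedules and replay are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

using Channel = std::deque<std::string>;    // pending messages, FIFO
using Schedule = std::vector<std::size_t>;  // channel delivered at each step

// Depth-first enumeration of every delivery order that respects the
// per-channel FIFO order.
void explore(std::vector<Channel>& channels,
             Schedule& prefix,
             std::vector<Schedule>& out) {
  bool delivered = false;
  for (std::size_t c = 0; c < channels.size(); ++c) {
    if (channels[c].empty()) continue;
    delivered = true;
    std::string head = channels[c].front();
    channels[c].pop_front();        // deliver the head of channel c
    prefix.push_back(c);
    explore(channels, prefix, out);
    prefix.pop_back();              // undo the delivery and try the next one
    channels[c].push_front(head);
  }
  if (!delivered) out.push_back(prefix);  // all channels drained
}

std::vector<Schedule> allSchedules(std::vector<Channel> channels) {
  std::vector<Schedule> out;
  Schedule prefix;
  explore(channels, prefix, out);
  return out;
}

// Replay one schedule and return the messages in delivery order.
std::vector<std::string> replay(std::vector<Channel> channels,
                                const Schedule& schedule) {
  std::vector<std::string> log;
  for (std::size_t c : schedule) {
    assert(!channels.at(c).empty());
    log.push_back(channels[c].front());
    channels[c].pop_front();
  }
  return log;
}
```

With three channels holding two, one and one messages respectively, this
enumerates 4!/2! = 12 schedules. A real prototype would also have to
handle messages that are sent in reaction to a delivery, which this
static model ignores.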

* For a more interesting but still somewhat-tractable example, the
replicated log (src/log) might be a good place to start. It is fairly
decoupled from the rest of Mesos and involves a bunch of interesting
concurrency. If you set up a test program that creates N log replicas
(in a single OS process) and then explores the possible interleavings
of the messages exchanged between them, that would be a pretty cool
result! There's also a bunch of Paxos-specific invariants that you can
check for (e.g., once the value of a position is agreed-to by a quorum
of replicas, that value will eventually appear at that position in all
sufficiently connected log replicas).
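
For a sense of what an invariant check might look like after each
simulated step, here is a small sketch. Note that it checks the simpler
safety counterpart of the property above (no two replicas hold different
values at the same position), not the "eventually appears" part. The
LearnedLog model and replicasAgree are hypothetical and unrelated to the
actual src/log data structures.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: each replica maps a log position to the value it
// has learned at that position.
using LearnedLog = std::map<std::uint64_t, std::string>;

// Returns true if no two replicas have learned different values for the
// same log position.
bool replicasAgree(const std::vector<LearnedLog>& replicas) {
  std::map<std::uint64_t, std::string> chosen;
  for (const LearnedLog& log : replicas) {
    for (const auto& entry : log) {
      // insert() leaves an existing value untouched and reports it.
      auto inserted = chosen.insert(entry);
      if (!inserted.second && inserted.first->second != entry.second) {
        return false;  // two replicas disagree at this position
      }
    }
  }
  return true;
}
```

A simulator could call a check like this after every delivered message
and report the schedule prefix that led to the first violation.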

Neil

[1] Although note that not all message schedules are possible: for
example, message schedules can't violate causal dependencies. That is, if
process P1 sends M1 and then M2 to P2, P2 can't see <M2,M1> (it might
see only <>, <M1>, or <M2> if P2 is remote). Actually, that suggests
to me we probably want to distinguish between local and remote message
sends in the simulator: the former will never be dropped.
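
A small sketch of the rule in [1], assuming a single sender and a single
receiver: local sends are delivered in full and in order, while remote
sends may lose any subset of messages but never reorder the rest.
Sequence and observable are hypothetical names used only for
illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Sequence = std::vector<std::string>;

// All sequences the receiver may observe when one sender sends `sent`
// in order. Local: exactly `sent`. Remote: any order-preserving
// subsequence of `sent`, since messages may be dropped but not reordered.
std::vector<Sequence> observable(const Sequence& sent, bool remote) {
  if (!remote) {
    return std::vector<Sequence>(1, sent);
  }
  std::vector<Sequence> result;
  const std::size_t n = sent.size();
  // Each bit of `mask` says whether the corresponding message arrived.
  for (std::size_t mask = 0; mask < (std::size_t{1} << n); ++mask) {
    Sequence seen;
    for (std::size_t i = 0; i < n; ++i) {
      if (mask & (std::size_t{1} << i)) seen.push_back(sent[i]);
    }
    result.push_back(seen);
  }
  return result;
}
```

For sent = <M1,M2> and a remote receiver this yields <>, <M1>, <M2> and
<M1,M2>, and never <M2,M1>.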
