On Mon, Oct 5, 2015 at 3:20 PM, Maged Michael <maged.mich...@gmail.com> wrote:
> I have in mind three options.
> (1) Text translation of Mesos source code. E.g., "process::Future"
> into, say, "sim::process::Future".
> - Pros: Does not require any changes to any Mesos or libprocess code.
> Replace only what needs to be replaced in libprocess for simulation.
> - Cons: Fragile.
> (2) Integrate the simulation mode with the libprocess code.
> - Pros: Robust. Add only what needs to be added to libprocess for
> simulation. Partially reuse some data structures from regular-mode
> libprocess.
> - Cons: Might get in the way of the development and bug fixes in the
> regular libprocess code.
> (3) Changes to Mesos makefiles to use alternative simulation-oriented
> libprocess code.
> - Pros: Robust.
> - Cons: Might need to create a lot of stubs that redirect to the
> regular-mode (i.e., not for simulation) libprocess code that doesn't
> need any change under simulation.

My vote is for #2, with the caveat that we might have the code live in
a separate Git repo/branch for a period of time until it has matured.
If the simulator requires drastic (architectural) changes to
libprocess, then merging the changes into mainline Mesos might be
tricky -- but it might be easier to figure that out once we're closer
to an MVP.

> As an example of what I have in mind, this is a sketch of
> sim::process::dispatch.
>
> template<class T, class... Args>
> // Let R be an abbreviation of typename result_of<T::*method(Args...)>::type
> sim::process::Future<R>
> dispatch(
>     const sim::process::Process<T>& pid,
>     R (T::*method)(Args...),
>     Args... args)
> {
>   /* Still running in the context of the parent simulated thread -
>      the same C++/OS thread as the simulator. */
>   <context switch to the simulator and back to allow event
>    interleaving> /* e.g., setjmp/longjmp */
>   // create a promise
>   std::shared_ptr<sim::process::Promise<R>> prom(new
>       sim::process::Promise<R>());
>   <create a function object fn initialized with T::method and args>
>   <associate prom with fn> // e.g., a map structure
>   <enqueue fn in pid's structure>
>   return prom->future();
>   /* The dispatched function will start running when at some point
>      later the simulator decides to switch to the child thread (pid) when
>      pid is ready to run fn. */
> }

I wonder how much of what is happening here (e.g., during the
setjmp/longjmp) could be implemented by instead modifying the
libprocess event queuing/dispatching logic. For example, suppose Mesos
is running on two CPUs (and let's ignore network I/O + clock for now).
If you want to explore all possible schedules, you could start by
capturing the non-deterministic choices that are made when the
processing threads (a) send messages concurrently and (b) choose new
processes to run from the run queue. Does that sound like a feasible
approach?
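
To make "capturing the non-deterministic choices" a bit more concrete, here is a rough sketch of the idea, not a proposal for actual libprocess code. Everything in it is made up for illustration: `ChoiceTrace`, `runOnce`, and `exploreAll` are hypothetical names, the "processes" are just counters of remaining steps, and a single-threaded loop stands in for the real run queue. The only point is that if every scheduling decision goes through one `choose()` call, the simulator can enumerate all decision sequences depth-first, and any one recorded sequence identifies a schedule.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper (not part of libprocess): records every
// non-deterministic choice so that runs can be enumerated systematically.
class ChoiceTrace {
public:
  // Returns an index in [0, n). Recorded picks are replayed first; a
  // choice point not seen before defaults to option 0.
  std::size_t choose(std::size_t n) {
    assert(n > 0);
    if (pos_ == picks_.size()) {
      picks_.push_back(0);
      limits_.push_back(n);
    }
    return picks_[pos_++];
  }

  // Moves to the next unexplored schedule (depth-first). Returns false
  // once every combination of choices has been tried.
  bool next() {
    pos_ = 0;
    while (!picks_.empty() && picks_.back() + 1 == limits_.back()) {
      picks_.pop_back();
      limits_.pop_back();
    }
    if (picks_.empty()) return false;
    ++picks_.back();
    return true;
  }

private:
  std::vector<std::size_t> picks_;   // option taken at each choice point
  std::vector<std::size_t> limits_;  // number of options at each point
  std::size_t pos_ = 0;              // next choice point in this run
};

// Runs one schedule. steps[i] is the number of steps simulated process i
// still has to execute; the trace decides which runnable process goes next.
std::vector<int> runOnce(std::vector<int> steps, ChoiceTrace& trace) {
  std::vector<int> order;
  for (;;) {
    std::vector<int> runnable;
    for (std::size_t i = 0; i < steps.size(); ++i) {
      if (steps[i] > 0) runnable.push_back(static_cast<int>(i));
    }
    if (runnable.empty()) return order;
    const int p = runnable[trace.choose(runnable.size())];
    --steps[p];
    order.push_back(p);
  }
}

// Explores every interleaving and returns the distinct execution orders.
std::set<std::vector<int>> exploreAll(const std::vector<int>& steps) {
  std::set<std::vector<int>> seen;
  ChoiceTrace trace;
  do {
    seen.insert(runOnce(steps, trace));
  } while (trace.next());
  return seen;
}
```

For two simulated processes with two steps each, `exploreAll({2, 2})` visits the six possible interleavings. In a real simulator the same choice point would presumably sit wherever the run queue picks the next process and wherever concurrent sends are ordered.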

Other suggestions:

* To make what you're suggesting concrete, it would be great if you
  started with a VERY minimal prototype -- say, a test program that
  creates three libprocess processes and has them exchange messages.
  The order in which messages will be sent/received is
  non-deterministic [1] -- can we build a simulator that (a) can
  explore all possible schedules and (b) can replay the schedule
  chosen by a previous simulation run?

* For a more interesting but still somewhat-tractable example, the
  replicated log (src/log) might be a good place to start. It is
  fairly decoupled from the rest of Mesos and involves a bunch of
  interesting concurrency. If you set up a test program that creates N
  log replicas (in a single OS process) and then explores the possible
  interleavings of the messages exchanged between them, that would be
  a pretty cool result! There are also a bunch of Paxos-specific
  invariants that you can check for (e.g., once the value of a
  position is agreed to by a quorum of replicas, that value will
  eventually appear at that position in all sufficiently connected log
  replicas).

Neil

[1] Although note that not all message schedules are possible: for
example, message schedules can't violate causal dependencies. That is,
if process P1 sends M1 and then M2 to P2, P2 can't see <M2,M1> (if P2
is remote, it might instead see only <>, <M1>, or <M2>). Actually,
that suggests to me we probably want to distinguish between local and
remote message sends in the simulator: the former will never be
dropped.
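
The delivery rule in [1] can be written down as a small sketch. This is an illustration under assumptions, not libprocess behavior taken from the code: `possibleDeliveries` and `Trace` are hypothetical names, messages are plain strings, and I am assuming that a remote receiver may also see the full sequence <M1,M2> in addition to the lossy cases listed in the footnote. A local send yields exactly the sent sequence; a remote send yields every order-preserving subsequence, so <M2,M1> never appears.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Trace = std::vector<std::string>;

// Hypothetical helper: lists every sequence the receiver could observe
// when one sender emits `sent` in order.
//  - local:  nothing is dropped or reordered, so only `sent` is possible.
//  - remote: any message may be dropped, but the survivors keep the
//            sender's order (no causal violation).
std::vector<Trace> possibleDeliveries(const Trace& sent, bool remote) {
  if (!remote) return std::vector<Trace>(1, sent);

  assert(sent.size() < 20);  // keep the 2^n enumeration small
  std::vector<Trace> out;
  const std::size_t n = sent.size();
  for (std::size_t mask = 0; mask < (std::size_t{1} << n); ++mask) {
    Trace seen;
    for (std::size_t i = 0; i < n; ++i) {
      // Bit i set means message i was delivered rather than dropped.
      if (mask & (std::size_t{1} << i)) seen.push_back(sent[i]);
    }
    out.push_back(seen);
  }
  return out;
}
```

A simulator could use a function like this as an oracle: any explored schedule whose per-sender view at a receiver is not in this set would indicate a bug in the simulator itself rather than in the code under test.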