On Tue, Oct 6, 2015 at 2:27 AM, Timothy Chen <tnac...@gmail.com> wrote:
> I wonder if, for testing the allocator and the allocation choices, the
> easier way might be to extract the Allocator and write a
> framework/standalone tool just around that?

I noticed that some of the master's actions affect the outcome of the
allocation policy. For example, the master is responsible for rescinding
timed-out offers without any influence from the allocator. Testing the
allocator directly (bypassing the master) might therefore result in
different allocation behavior. For example, a framework policy of holding
on to offered resources without fear of the offer timing out might seem
like a good policy if the test bypasses the master.

Does this make sense or did I misunderstand your comment?

> Tim

--Maged

>
> On Mon, Oct 5, 2015 at 4:49 PM, Neil Conway <neil.con...@gmail.com> wrote:
>> On Mon, Oct 5, 2015 at 3:20 PM, Maged Michael <maged.mich...@gmail.com>
>> wrote:
>>> I have in mind three options.
>>> (1) Text translation of Mesos source code. E.g., "process::Future"
>>> into, say, "sim::process::Future".
>>> - Pros: Does not require any changes to any Mesos or libprocess code.
>>> Replace only what needs to be replaced in libprocess for simulation.
>>> - Cons: Fragile.
>>> (2) Integrate the simulation mode with the libprocess code.
>>> - Pros: Robust. Add only what needs to be added to libprocess for
>>> simulation. Partially reuse some data structures from regular-mode
>>> libprocess.
>>> - Cons: Might get in the way of the development and bug fixes in the
>>> regular libprocess code.
>>> (3) Changes to Mesos makefiles to use alternative simulation-oriented
>>> libprocess code.
>>> - Pros: Robust.
>>> - Cons: Might need to create a lot of stubs that redirect to the
>>> regular-mode (i.e., not for simulation) libprocess code that doesn't
>>> need any change under simulation.
>>
>> My vote is for #2, with the caveat that we might have the code live in
>> a separate Git repo/branch for a period of time until it has matured.
>> If the simulator requires drastic (architectural) changes to
>> libprocess, then merging the changes into mainline Mesos might be
>> tricky -- but it might be easier to figure that out once we're closer
>> to an MVP.
>>
>>> As an example of what I have in mind, this is a sketch of
>>> sim::process::dispatch.
>>>
>>> template<class T, class... Args>
>>> // Let R be an abbreviation of typename result_of<T::*method(Args...)>::type
>>> sim::process::Future<R>
>>> dispatch(
>>>     const sim::process::Process<T>& pid,
>>>     R (T::*method)(Args...),
>>>     Args... args)
>>> {
>>>   /* Still running in the context of the parent simulated thread -
>>>      the same C++/OS thread as the simulator. */
>>>   <context switch to the simulator and back to allow event
>>>    interleaving> /* e.g., setjmp/longjmp */
>>>   // create a promise
>>>   std::shared_ptr<sim::process::Promise<R>> prom(new
>>>       sim::process::Promise<R>());
>>>   <create a function object fn initialized with T::method and args>
>>>   <associate prom with fn> // e.g., a map structure
>>>   <enqueue fn in pid's structure>
>>>   return prom->future();
>>>   /* The dispatched function will start running when at some point
>>>      later the simulator decides to switch to the child thread (pid) when
>>>      pid is ready to run fn. */
>>> }
>>
>> I wonder how much of what is happening here (e.g., during the
>> setjmp/longjmp) could be implemented by instead modifying the
>> libprocess event queuing/dispatching logic. For example, suppose Mesos
>> is running on two CPUs (and let's ignore network I/O + clock for now).
>> If you want to explore all possible schedules, you could start by
>> capturing the non-deterministic choices that are made when the
>> processing threads (a) send messages concurrently (b) choose new
>> processes to run from the run queue. Does that sound like a feasible
>> approach?
>>
>> Other suggestions:
>>
>> * To make what you're suggesting concrete, it would be great if you
>> started with a VERY minimal prototype -- say, a test program that
>> creates three libprocess processes and has them exchange messages. The
>> order in which messages will be sent/received is non-deterministic [1]
>> -- can we build a simulator that (a) can explore all possible
>> schedules (b) can replay the schedule chosen by a previous simulation
>> run?
>>
>> * For a more interesting but still somewhat-tractable example, the
>> replicated log (src/log) might be a good place to start. It is fairly
>> decoupled from the rest of Mesos and involves a bunch of interesting
>> concurrency. If you set up a test program that creates N log replicas
>> (in a single OS process) and then explores the possible interleavings
>> of the messages exchanged between them, that would be a pretty cool
>> result! There's also a bunch of Paxos-specific invariants that you can
>> check for (e.g., once the value of a position is agreed-to by a quorum
>> of replicas, that value will eventually appear at that position in all
>> sufficiently connected log replicas).
>>
>> Neil
>>
>> [1] Although note that not all message schedules are possible: for
>> example, message schedules can't violate causal dependencies. i.e., if
>> process P1 sends M1 and then M2 to P2, P2 can't see <M2,M1> (it might
>> see only <>, <M1>, or <M2> if P2 is remote). Actually, that suggests
>> to me we probably want to distinguish between local and remote message
>> sends in the simulator: the former will never be dropped.
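
For reference, the "explore all schedules, then replay one" idea from the
minimal-prototype suggestion can be sketched as a toy model. This is only an
illustration under simplifying assumptions: several senders deliver to one
receiver, all sends are local (nothing is dropped), and each sender's
messages must arrive in the order sent. The names Channel, Schedule, explore,
allSchedules and replay are made up here and are not libprocess APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model (not libprocess): each sender owns an ordered list of
// messages bound for a single receiver.
using Channel = std::vector<std::string>;
// A schedule records which sender was chosen at each delivery step.
using Schedule = std::vector<std::size_t>;

// Depth-first enumeration of every delivery order that keeps each sender's
// messages in FIFO order (so <M2,M1> from the same sender never appears).
void explore(const std::vector<Channel>& channels,
             std::vector<std::size_t>& next,
             Schedule& current,
             std::vector<Schedule>& out)
{
  bool delivered = false;
  for (std::size_t s = 0; s < channels.size(); ++s) {
    if (next[s] == channels[s].size()) {
      continue;  // this sender has nothing left to deliver
    }
    delivered = true;
    ++next[s];
    current.push_back(s);
    explore(channels, next, current, out);
    current.pop_back();
    --next[s];
  }
  if (!delivered) {
    out.push_back(current);  // all messages delivered: one complete schedule
  }
}

std::vector<Schedule> allSchedules(const std::vector<Channel>& channels)
{
  std::vector<std::size_t> next(channels.size(), 0);
  Schedule current;
  std::vector<Schedule> out;
  explore(channels, next, current, out);
  return out;
}

// Replays a recorded schedule and returns the messages in delivery order.
std::vector<std::string> replay(const std::vector<Channel>& channels,
                                const Schedule& schedule)
{
  std::vector<std::size_t> next(channels.size(), 0);
  std::vector<std::string> trace;
  for (std::size_t s : schedule) {
    assert(s < channels.size() && next[s] < channels[s].size());
    trace.push_back(channels[s][next[s]]);
    ++next[s];
  }
  return trace;
}
```

With channels {M1, M2} and {N1}, allSchedules yields three schedules, and
M1 precedes M2 in every replayed trace. A real simulator would also need to
model remote sends (where messages can be dropped) and run-queue choices.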