Thanks to both of you. @David Idempotence (and a functional style) should both help mitigate the testing problem.
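As a concrete sketch of the functional style David describes below (all names are purely illustrative, not from any real framework): each callback is a pure (state, event) -> state function, so serializing callbacks reduces to a fold over an event list, and "generative" testing is just folding over many event orderings and checking an invariant.

```python
import itertools

# Sketch: callbacks as pure (state, event) -> state transitions.
# State is a dict of task id -> status.
def step(state, event):
    kind, task_id = event
    tasks = dict(state)           # never mutate the previous state
    if kind == "launched":
        tasks[task_id] = "RUNNING"
    elif kind == "finished" and task_id in tasks:
        tasks[task_id] = "FINISHED"
    return tasks

def run(events, state=None):
    # Serialized callbacks == a fold of events over the state.
    state = state or {}
    for event in events:
        state = step(state, event)
    return state

# Generative flavor: try every ordering of a small event set and
# check an invariant (a task is never FINISHED without having launched).
events = [("launched", "t1"), ("finished", "t1"), ("launched", "t2")]
for order in itertools.permutations(events):
    final = run(order)
    assert final.get("t1") in ("RUNNING", "FINISHED")
    assert final.get("t2") == "RUNNING"
```

Because `step` is pure and the fold is deterministic, none of these tests needs a running Mesos master.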
@Sharma #3 looks impressive and I hear the pain. A few questions:

* Since you already have the state machine modeling, can't the scheduler actions also be modeled as state machine transitions?
* Having a spec for the scheduler (in the form of a state machine or otherwise) looks like an important (and hard) goal. Mocking looks like a good thing. Is the mocking general enough to become a library available to all, to enable *verifiably* correct scheduler behavior?

Again, thanks for sharing your thoughts.

Thanks,
Dharmesh

On Mon, Oct 13, 2014 at 7:29 AM, David Greenberg <dsg123456...@gmail.com> wrote:

> Specifically with regard to the state of the framework under callback
> ordering: we ensure that our framework is written in a functional style, so
> that all callbacks atomically transform the previous state into a new state.
> By doing this, we serialize all callbacks. At that point, you can do
> generative testing to create events and run them through your system. This,
> at least, makes #3 possible.
>
> For #4, we are pretty careful to choose idempotent writes into the DB, and
> a DB that supports snapshot reads. This way, you can just use at-least-once
> semantics for easy-to-implement retries. If a write fails, you just crash,
> since that means your DB is completely down. Then we test by thinking
> through and discussing whether operations have this idempotency property,
> and the simple retry logic, independently. This starts to get at a way to
> manage #4 and avoid learning in production.
>
> On Sun, Oct 12, 2014 at 11:44 AM, Dharmesh Kakadia <dhkaka...@gmail.com>
> wrote:
>
>> Thanks David.
>>
>> Keeping the state of the framework like that is an interesting design. I am
>> assuming the scheduler maintains the state and then hands tasks to slaves.
>> If that's the case, we can safely test the executor (stateless: receiving an
>> event and returning the appropriate status to the scheduler). You can
>> construct scheduler tests similarly by passing in different states and
>> events and observing the next state.
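As a tiny sketch of the idempotent-write idea from David's #4 paragraph above (hypothetical names; a real framework would write to an actual DB with snapshot reads): if each write is an absolute assignment keyed by task id rather than an increment, replaying it under at-least-once delivery is harmless.

```python
# Sketch: idempotent status writes keyed by task id.
# Applying the same update twice leaves the store in the same
# state as applying it once, so at-least-once retries are safe.

def record_status(store, task_id, status):
    # Absolute assignment, not an increment, so the write is
    # idempotent and a retried delivery cannot corrupt the store.
    store[task_id] = status

store = {}
record_status(store, "task-1", "RUNNING")
record_status(store, "task-1", "RUNNING")  # retried delivery: no harm
assert store == {"task-1": "RUNNING"}
```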
>> This way you can be sure that your callbacks work fine in
>> *isolation*. I would be concerned about the state of the framework in
>> the case of callback ordering (or re-execution) in *all possible
>> scenarios*. Mocking is exactly what might uncover such bugs, but as you
>> pointed out, I also think it would not be trivial for many frameworks.
>>
>> At a high level, it would be important for framework developers to know
>> that:
>> 1. executors work fine in isolation on a fresh start, implementing
>> the feature.
>> 2. executors work fine when rescheduled/restarted/in the presence of
>> other executors.
>> 3. the scheduler works fine in isolation.
>> 4. the scheduler works fine in the wild (in the presence of
>> others/failures/checkpointing/...).
>>
>> 1 is easy to do traditionally. 2 is possible if your executors have no
>> side effects or are using Docker etc.
>> 3 and 4 are not easy to do. I think having support/a library for testing
>> schedulers is something all framework writers would benefit from. Not
>> having to think about communication between executors and the scheduler
>> is already a big plus; can we also make it easier for developers to test
>> their scheduler behaviour?
>>
>> Thoughts?
>>
>> I would love to hear thoughts from others.
>>
>> Thanks,
>> Dharmesh
>>
>> On Sun, Oct 12, 2014 at 8:03 PM, David Greenberg <dsg123456...@gmail.com>
>> wrote:
>>
>>> For our frameworks, we don't tend to do much automated testing of the
>>> Mesos interface--instead, we construct the framework state, then "send
>>> it a message", since our callbacks take the state of the framework + the
>>> event as the argument. This way, we don't need to have Mesos running,
>>> and we can trim away large amounts of code necessary to connect to Mesos
>>> but unnecessary for the actual feature under test. We've also been
>>> experimenting with simulation testing by mocking out the Mesos APIs.
>>> These techniques are mostly effective when you can pretend that the
>>> executors you're using don't communicate much, or when they're trivial
>>> to mock.
>>>
>>> On Sun, Oct 12, 2014 at 9:42 AM, Dharmesh Kakadia <dhkaka...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am working on a tiny experimental framework for Mesos. I was
>>>> wondering what the recommended way of writing test cases for a
>>>> framework is. I looked at several existing frameworks, but it's still
>>>> not clear to me. I understand that I might be able to test executor
>>>> functionality in isolation through normal test cases, but testing the
>>>> framework as a whole is what I am unclear about.
>>>>
>>>> Suggestions? Is that a non-goal? How do other framework developers go
>>>> about it?
>>>>
>>>> Also, on a related note, is there a better way to debug frameworks
>>>> than sifting through logs?
>>>>
>>>> Thanks,
>>>> Dharmesh
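To make David's "construct the framework state, then send it a message" approach concrete, here is a minimal sketch (the class and method names only mimic the general shape of a scheduler API; this is illustrative, not real Mesos code): hand-construct the state, invoke the callback directly, and assert on the calls recorded by a fake driver.

```python
# Sketch: testing a scheduler callback without a running Mesos
# master, by replacing the driver with a recording fake.

class FakeDriver:
    """Records launch_tasks calls instead of talking to a master."""
    def __init__(self):
        self.launched = []

    def launch_tasks(self, offer_id, tasks):
        self.launched.append((offer_id, tasks))

class ToyScheduler:
    def __init__(self, pending):
        self.pending = list(pending)   # tasks waiting for offers

    def resource_offers(self, driver, offers):
        # Greedily assign one pending task per offer.
        for offer in offers:
            if self.pending:
                driver.launch_tasks(offer["id"], [self.pending.pop(0)])

# "Send it a message": construct the state, deliver one event, assert.
driver = FakeDriver()
sched = ToyScheduler(pending=["task-1"])
sched.resource_offers(driver, [{"id": "offer-1"}, {"id": "offer-2"}])
assert driver.launched == [("offer-1", ["task-1"])]
```

Because the fake driver just records calls, the test exercises the scheduling decision itself while trimming away all of the connection code, which is exactly the trade-off described above.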