Folks,

We're working on some Mesos features that will allow frameworks to control how partitioned tasks are handled [1]. As part of designing how this will work, I'd love to hear from users and framework developers about how they handle partitioned tasks/agents. Specifically:

(a) Have you enabled the strict registry? ('--registry_strict' master flag)

(b) If so, do any of your frameworks _depend_ on the semantics provided by the strict registry? [2]

(c) Does your framework handle LOST tasks? For example, does your framework account for the fact that LOST tasks might transition back to RUNNING in certain circumstances?

(d) Suppose we changed the semantics of LOST in the following way:
    (1) strict registry is no longer supported, and
    (2) LOST tasks will *always* be allowed to reregister with the master and resume running (even if the master has not failed over).
    Would this change cause problems for any of your frameworks?

Answering "I don't know" to any of these questions is fine :) Feel free to respond to me privately if you'd prefer. If you have any other feedback or questions, please contact me.

Thanks!
Neil

[1] More information on the proposed changes can be found here: https://goo.gl/7dRw4Q
[2] e.g., your framework assumes that LOST tasks will never go back to RUNNING.