I'd like input from individuals who currently use frameworks but do
not enable checkpointing.
Background: "checkpointing" is a parameter that can be enabled in
FrameworkInfo. If it is enabled, the agent writes the framework PID,
executor PIDs, and status updates to disk for any tasks started by
that framework. This checkpointed information means that these tasks
can survive an agent crash: if the agent exits (whether due to
crashing or as part of an upgrade procedure), a restarted agent can
use this information to reconnect to executors started by the previous
instance of the agent. The downside is that checkpointing requires
some additional disk I/O at the agent.
Checkpointing is not currently the default, but in my experience it is
often enabled for production frameworks. As part of the work on
supporting partition-aware Mesos frameworks (see MESOS-4049), we are
(a) requiring that partition-aware frameworks also enable
checkpointing, and (b) enabling checkpointing by default.
If you have intentionally decided to disable checkpointing for your
Mesos framework, I'd be curious to hear more about your use-case and
why you haven't enabled it.