First off, thanks for all the comments! I really appreciate them and am excited to see where we get with this effort. Let me see if I can answer your questions (inlined, best effort).
On 21 November 2014 01:11, Tom Arnfeld <t...@duedil.com> wrote:

> This all sounds really great, and opens up some interesting opportunities
> for automated service discovery (well, the announcement side) for a cluster
> which is what we've been looking into for a while.
>
> Correct me if I'm wrong, but would it be possible to make use of the
> master log to achieve an event stream? I'm not entirely sure what's stored
> in the shared master transaction log but I assume some state about tasks
> etc? If there were to be a stream of events, it'd be great to support
> rewinding and replaying for some period of time to better allow for HA
> stream consumers.

See comment below. But yes - service discovery systems could definitely leverage hooks.

> Either way, hooks would be a welcomed feature for us!
>
> --
> Tom Arnfeld
> Developer // DueDil
>
> (+44) 7525940046
> 25 Christopher Street, London, EC2A 2BS
>
> On Fri, Nov 21, 2014 at 6:44 AM, Vinod Kone <vinodk...@gmail.com> wrote:
>
> > Good points Ben.
> >
> > Also, I've been recently thinking about an events endpoint (not to
> > confuse with the Event/Call API) that could stream all kinds of events
> > happening in the cluster (master events, allocator events, gc events,
> > slave events, containerizer events etc). In fact this could probably be
> > exposed by libprocess very easily. I was mainly thinking about this in
> > terms of auditing. Having such an endpoint would allow external tooling
> > to "hook" into that endpoint and consume the event stream. The tooling
> > could then perform arbitrary actions *without interfering* with mesos
> > control flow. I think such an architecture would be powerful because it
> > is generic and non-invasive. Have you considered that approach?

Ben, Vinod: A cluster event stream sounds like an awesome idea! I have previously hacked together post-mortem log analysis to determine workload profiles. With an event stream, that could be done online (!)
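For what it's worth, the rewind/replay part Tom mentions could be as simple as a bounded log with monotonically increasing offsets. Below is a rough sketch only; nothing like this exists in Mesos today, and `Event`, `ReplayableEventLog`, `replayFrom` and the capacity parameter are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Hypothetical event record: an offset plus an opaque payload.
struct Event {
  std::uint64_t offset;
  std::string payload;
};

// Hypothetical bounded log that keeps the most recent `capacity` events
// so that a consumer can reconnect and replay from a known offset.
class ReplayableEventLog {
public:
  explicit ReplayableEventLog(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Appends an event and returns the offset assigned to it.
  std::uint64_t append(const std::string& payload) {
    const std::uint64_t offset = next_++;
    events_.push_back(Event{offset, payload});
    if (events_.size() > capacity_) {
      events_.pop_front();  // Drop the oldest event once the window is full.
    }
    return offset;
  }

  // Returns all retained events whose offset is >= `from`.
  std::vector<Event> replayFrom(std::uint64_t from) const {
    std::vector<Event> out;
    for (const Event& event : events_) {
      if (event.offset >= from) {
        out.push_back(event);
      }
    }
    return out;
  }

  // Offset of the oldest event still available for replay.
  std::uint64_t oldestRetained() const {
    return events_.empty() ? next_ : events_.front().offset;
  }

private:
  std::size_t capacity_;
  std::uint64_t next_ = 0;
  std::deque<Event> events_;
};
```

A consumer that remembers the last offset it processed can call `replayFrom(last + 1)` after a failover. If `oldestRetained()` is larger than the offset it asks for, it knows it has missed events and needs to resync some other way.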
That aside, our use case involves hanging metadata off the task with labels, which we cannot do with an event stream alone. The metadata we need is produced by a third-party security infrastructure which we invoke and use when setting up the executor environment in the slave. We actually only need the pre-hook / filter mechanism to do this, but wanted to come up with a generalized solution. In my mind, hooks and event streams are not mutually exclusive. The event stream could use all the insertion points of hooks (and vice versa).

> > On Thu, Nov 20, 2014 at 10:24 PM, Benjamin Mahler
> > <benjamin.mah...@gmail.com> wrote:
> >
> >> Thanks for sending this Nik!
> >>
> >> The general idea of hooks sounds good. I think the question for hooks is
> >> about which extensibility points make sense, and I think we'll have to
> >> assess that with the introduction of each hook.
> >>
> >> (1) Is the idea behind hooks about actions, as you initially mentioned?
> >> Or is it about data transformation, which is what is shown in the API
> >> example? Or both?

Both. To Tom's point: service discovery systems with hooks could both 1) be notified when tasks are launched in a push-like fashion and 2) read from and alter the task info (for example with labels).

We wanted to aim for flexibility. Similar to web server hooks, they can purposely change the behavior of request handling. If it cannot interact with or influence the task sequence, it isn't a hook but rather a probe (similar to DTrace probes).

> >> (2) Is external tooling meant to describe hooks? Or is it meant to
> >> describe external tools that can leverage the hooks? This part is a bit
> >> fuzzy to me.

Hooks are defined by us, and implementations can be provided by module writers. This is similar to DTrace probes, where kernel developers chose interesting insertion points - some specific, others generic (where filters can be applied).
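To illustrate the difference between a hook that filters and one that only observes, here is a rough sketch of what two module-provided implementations could look like. It is not the proposed API: `TaskInfo` here is a made-up struct standing in for the real protobuf message, and `SecurityLabelHook`, `CountingProbe`, the `security_token` label and `HookManager::add` are all hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the TaskInfo protobuf; only what the sketch needs.
struct TaskInfo {
  std::string name;
  std::map<std::string, std::string> labels;
};

// The hook interface from the proposal, reduced to one pre-launch entry point.
class Hooks {
public:
  virtual ~Hooks() = default;
  virtual TaskInfo preSlaveLaunchTask(const TaskInfo& task) = 0;
};

// Filter-style hook: returns a modified copy carrying an extra label.
class SecurityLabelHook : public Hooks {
public:
  TaskInfo preSlaveLaunchTask(const TaskInfo& task) override {
    TaskInfo result = task;
    // Placeholder value; a real module would ask the third-party system.
    result.labels["security_token"] = "placeholder";
    return result;
  }
};

// Probe-style hook: observes the launch and returns the task unchanged.
class CountingProbe : public Hooks {
public:
  TaskInfo preSlaveLaunchTask(const TaskInfo& task) override {
    ++launches;
    return task;
  }

  int launches = 0;
};

// Hypothetical manager that chains the hooks in registration order.
class HookManager {
public:
  void add(std::shared_ptr<Hooks> hook) {
    assert(hook != nullptr);
    hooks_.push_back(std::move(hook));
  }

  TaskInfo preSlaveLaunchTask(TaskInfo task) const {
    for (const std::shared_ptr<Hooks>& hook : hooks_) {
      task = hook->preSlaveLaunchTask(task);
    }
    return task;
  }

private:
  std::vector<std::shared_ptr<Hooks>> hooks_;
};
```

With both registered, the slave would see the label added by the first hook, while the second behaves like a probe and leaves the task untouched. Chaining in registration order is just one possible choice here; ordering semantics are still open.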
> >> (3) Is instrumentation meant to allow us to gain visibility into things
> >> like performance? If so, hooks might not be the most maintainable
> >> approach for that. Ideally we could add instrumentation into libprocess.
> >> Are there other forms of instrumentation in mind?

Instrumentation in libprocess is one thing (being able to analyze bandwidth/latency and message throughput/distribution - which would be pretty awesome). But there should be plenty of non-libprocess code which gives insight into the task/status update life-cycle. Hooks would allow local aggregation of high-frequency events where you want to run user-defined code.

> >> Let's take the hook example you showed:
> >>
> >>   // Performs an action and/or transforms the TaskInfo.
> >>   virtual TaskInfo preMasterLaunchTask(const TaskInfo& task) = 0;
> >>   virtual TaskInfo postMasterLaunchTask(const TaskInfo& task) = 0;
> >>   virtual TaskInfo preSlaveLaunchTask(const TaskInfo& task) = 0;
> >>   virtual TaskInfo postSlaveLaunchTask(const TaskInfo& task) = 0;
> >>
> >> Comment mine. This interface suggests synchronous transformation of
> >> TaskInfo objects:
> >>
> >> (A) A transformation of TaskInfo seems a bit surprising to me, how can
> >> one do this generically? Is the idea that this would be customized per
> >> framework within the hook? How would one differentiate the frameworks?
> >> Via role? This part seems fuzzy to me.

That was an oversimplified API. The arguments could/should match the parameters passed to Master::launchTask(), for example. The hook runs in the caller's thread and context, so it can share state with the calling environment. The return value could be a tuple of all incoming parameter types, given that these usually are const.

> >> (B) I assume this also means that there is a side-effect inducing
> >> "action" that is performed, in addition to the transformation.
> >> I wouldn't be able to do any expensive or asynchronous work through
> >> these, unless we made them return Futures. At which point, we would need
> >> some additional semantics (e.g. ordering), and we'd be adding complexity
> >> to the Master.

Maybe having only entry points, so that they are effectively "before" filters, makes sense. That avoids the complexity of post actions being executed in arbitrary places and/or on scope exit (which could be one of many places and hard to reason about).

> >> (C) What differentiates pre and post in this case? Sending the message?
> >> Let's consider that these are responsible for performing "actions". Then
> >> differentiating pre and post seems a bit arbitrary, since the sending of
> >> a message is asynchronous. This means that the "action" occurs after the
> >> message has been handed to libprocess, but not before it is sent to the
> >> socket, not before it is sent over the wire, not before it is received
> >> by the slave, etc. Seems like an odd distinction, no?

See comment above.

> >> Looking forward to hearing more, thanks Nik!
> >>
> >> FYI I'm about to go on vacation, so I'm going to be slow at email. :)
> >>
> >> On Thu, Nov 20, 2014 at 10:07 AM, Dominic Hamon
> >> <dha...@twopensource.com> wrote:
> >>
> >> > Do you have specific use cases in mind? Ie, specific actions that
> >> > might take place pre and post launch?
> >> >
> >> > On Thu, Nov 20, 2014 at 9:37 AM, Niklas Nielsen
> >> > <nik...@mesosphere.io> wrote:
> >> >
> >> > > Hi everyone,
> >> > >
> >> > > As a part of our current sprint at Mesosphere, we are striving to
> >> > > work on and land an extension to the modules subsystem which we (per
> >> > > https://issues.apache.org/jira/browse/MESOS-2060) have referred to
> >> > > as ‘hooks’. We wanted to give some background to this feature and
> >> > > will be asking for input to the proposal.
> >> > >
> >> > > The term is inspired by Apache Web Server hooks
> >> > > (http://httpd.apache.org/docs/2.2/developer/hooks.html) which allow
> >> > > modules to tie into the request processing life-cycle. It is
> >> > > different from the existing modules capability, in that the usual
> >> > > request processing remains untouched (and isn’t replaced fully as a
> >> > > regular module would do).
> >> > >
> >> > > In our case, we are interested in being able to tie into the
> >> > > life-cycle of tasks to run pre and post-actions during task launch
> >> > > in the master and slave processes. In general, it adds capability
> >> > > for all sorts of external tooling and instrumentation.
> >> > >
> >> > > The main idea is to enable modules to register themselves as hook
> >> > > providers. For example through a new flag:
> >> > >
> >> > >   --hooks="module_name1, module_name2, ..."
> >> > >
> >> > > A new ‘HookManager’ will query each module and get an object back of
> >> > > type ‘Hooks’ which has virtual member functions which point to the
> >> > > desired callbacks in the module.
> >> > >
> >> > > For example,
> >> > >
> >> > >   class Hooks {
> >> > >   public:
> >> > >     virtual TaskInfo preMasterLaunchTask(TaskInfo task) = 0;
> >> > >     virtual TaskInfo postMasterLaunchTask(TaskInfo task) = 0;
> >> > >     virtual TaskInfo preSlaveLaunchTask(TaskInfo task) = 0;
> >> > >     virtual TaskInfo postSlaveLaunchTask(TaskInfo task) = 0;
> >> > >     // ...
> >> > >   };
> >> > >
> >> > > An example of the call site in Mesos could be:
> >> > >
> >> > >   Master::launchTask(..., TaskInfo task, ...)
> >> > >   {
> >> > >     task = HookManager::preMasterLaunchTask(task);
> >> > >
> >> > >     ...
> >> > >
> >> > >     task = HookManager::postMasterLaunchTask(task);
> >> > >   }
> >> > >
> >> > > We are not tied at all to how the hooks will be named (they could
> >> > > potentially live in Master, Slave, Allocator, ... subclasses),
> >> > > whether they return Try, Option or Result to indicate failure, and
> >> > > so on.
> >> > >
> >> > > Introducing the hook functionality is similar to what we’ve done in
> >> > > the past with Isolators for the MesosContainerizer, which enable
> >> > > people to provide new functionality for launching containers. In
> >> > > that same way, we want people to be able to provide new
> >> > > functionality with respect to launching tasks without changing the
> >> > > existing task flow.
> >> > >
> >> > > We’d love to get people’s feedback so we can move forward!
> >> > >
> >> > > Thanks,
> >> > > Niklas
> >> >
> >> > --
> >> > Dominic Hamon | @mrdo | Twitter
> >> > *There are no bad ideas; only good ideas that go horribly wrong.*

Let's keep the discussion going :-)