On Thu, 2012-05-24 at 13:26 +0100, Martyn Taylor wrote:
> Had a chat with Jan this morning about some of your questions. I've
> replied to qs with what we talked about plus a few thoughts of my own.
>
> On 05/24/2012 01:36 AM, David Lutterkort wrote:
> > We need to define in detail for each of these resources what needs to be
> > tracked, and what status changes constitute an 'event', but as a general
> > requirement this is fine.
> >
> > I suggest though that we start with something very basic, like tracking
> > only instance state changes, and expand from there gradually.
> I'd suggest that this is whatever changes from a->b; probably let the
> driver decide what to return, since it should know what changes from,
> say, Pending -> Running.

The state changes should all be on the level of DC instance states; we do
not want to expose driver-specific state ... I assume that's what you are
saying.

> > Do you just want a callback every time an event happens or more detail
> > about what changed (like 'instance went from pending to started') ?
> Same as above

I don't understand what you are saying here - basically, I am asking
whether the callback should just be a 'go GET the details of the
resource, they have changed', or whether there should be some indication
of what has changed and how (which might save you a roundtrip).
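If we go the second route, I am picturing the data POSTed to the hook url
as something small, roughly along these lines (purely a sketch, written
as a Python literal; none of the field names or values are agreed on or
implemented anywhere):

    # Hypothetical event payload - every field name here is made up.
    event = {
        "resource":  "instance",
        "id":        "inst1",
        "change":    {"state": {"from": "PENDING", "to": "RUNNING"}},
        "timestamp": "2012-05-24T12:00:00Z",
    }

The alternative is to send only the resource id and have conductor GET
the rest, at the cost of the extra roundtrip.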
> >> TODO: authentication when posting data to hook url? we use oauth between
> >> other components now
> > For the very first cut, I'd either not do any auth, or do something
> > incredibly cheesy like a token.
> We use http basic authentication for the Conductor API. So we can add
> authentication in the callback url.

Works for me.

> >> TODO: how to handle hook failures (conductor is not accessible and hook
> >> can't be invoked)?
> > For sure, we'd retry for a while ... after that, retry very infrequently
> > (like once a day) + provide an API to retrieve undelivered events. That
> > way, if the conductor failure is transient, everything should catch up
> > fairly quickly. If the conductor failure was longer (some multi-hour
> > maintenance event), conductor can catch up by asking for events
> > explicitly.
> I wonder if it's worth having some agreed-upon policy.
>
> For example, retry for 2 hours and then revert the state change (I'm not
> sure if this is even possible). But then it would know that if it
> receives nothing within 2 hours, the request failed.

This would essentially be some lightweight monitoring functionality;
maybe we should just model that as another event with a callback?
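To make the retry behaviour concrete, this is roughly the delivery logic
I have in mind (just a sketch; the numbers, names, and the 2-hour cutoff
are placeholders, nothing here is implemented):

    import time

    undelivered = []   # later exposed through an "undelivered events" API

    def deliver(event, post_to_hook):
        """Try to POST an event to the hook url, backing off on failure."""
        delay = 60                           # start by retrying every minute
        give_up_at = time.time() + 2 * 3600  # "retry for a while" ~ 2 hours
        while time.time() < give_up_at:
            if post_to_hook(event):          # True on a successful response
                return True
            time.sleep(delay)
            delay = min(delay * 2, 900)      # back off, cap at 15 minutes
        # Past the cutoff: keep the event around and retry very infrequently
        # (e.g. once a day) from a separate sweep.
        undelivered.append(event)
        return False

The once-a-day sweep plus the API to fetch whatever is still undelivered
should cover the longer outages.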
> >> TODO: how to handle credentials? will the stateful app keep credentials
> >> permanently for each instance being checked?
> > As much as this worries me from a security standpoint, I don't see
> > another way around this - cloud APIs generally don't allow any
> > delegation of auth.
> >
> > There's a couple more TODOs connected to credentials:
> >
> > TODO: how are credentials changes handled (user revokes API Key and
> > generates a new one) ? [not for the first cut]
> >
> > TODO: when are stored credentials purged ? We want to make sure we get
> > rid of them as quickly as possible.
> Why not make this a feature? Credentials could be stored via the API
> and managed via a single login (maybe OAuth)?
>
> Then to answer the previous question, we could add another callback on
> the credentials resource that is invoked when authentication fails.

I don't want to introduce an explicit credentials resource, because we
would then need to safeguard that with credentials of its own. Rather,
I'd prefer that the state tracker just snoop the backend credentials out
of the ordinary DC requests.

> Another
> > I think combining these should be fairly straightforward; for frequency
> > I imagine we'll build something in based on the anticipated state
> > change: for example, while an instance is pending, we might poll its
> > state pretty frequently; once it's running, we'd back off and poll much
> > less often.
> >
> > What poll frequencies does conductor use today ?
> At the moment conductor uses 60s.
>
> It makes more sense to me though to have this set at a driver level
> rather than across the board. Each provider has a different underlying
> process and potentially a different state machine, so it makes sense to
> have a different poll frequency for each. For example, in EC2 you can
> get the state of a bunch of instances in one request, so the poll
> frequency might be higher than for a provider that only allows status
> queries on a per-instance basis.
>
> A nice-to-have feature would be to then offer a high-level system
> setting, e.g. POLL_FREQUENCY=FASTEST, SLOWEST, DEFAULT, which the driver
> interprets. This could be set at boot time, or on a per-request basis.

Yes, I think that makes the most sense (and to get started, we'll only
have one frequency, POLL_FREQUENCY=optimal ;)

David
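PS: to make the 'driver interprets a high-level setting' idea a bit more
concrete, this is the sort of thing I am picturing (only a sketch; every
driver name, state, and interval below is invented for illustration):

    # Hypothetical per-driver, per-state poll intervals plus a global knob.
    POLL_FREQUENCY = "DEFAULT"           # or "FASTEST" / "SLOWEST", set at boot

    INTERVALS = {                        # seconds between polls
        ("ec2",     "PENDING"): 15,      # batch status queries are cheap here
        ("ec2",     "RUNNING"): 300,
        ("default", "PENDING"): 60,
        ("default", "RUNNING"): 600,
    }

    SCALE = {"FASTEST": 0.5, "DEFAULT": 1.0, "SLOWEST": 2.0}

    def poll_interval(driver, state):
        base = INTERVALS.get((driver, state), INTERVALS[("default", state)])
        return base * SCALE[POLL_FREQUENCY]

Each driver would ship its own table; the high-level setting just scales
it up or down.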
