On Thu, 2012-05-24 at 13:26 +0100, Martyn Taylor wrote:
> Had a chat with Jan this morning about some of your questions. I've
> replied to qs with what we talked about plus a few thoughts of my own.
>
> On 05/24/2012 01:36 AM, David Lutterkort wrote:
> > We need to define in detail for each of these resources what needs to be
> > tracked, and what status changes constitute an 'event', but as a general
> > requirement this is fine.
> >
> > I suggest though that we start with something very basic, like tracking
> > only instance state changes, and expand from there gradually.
> I'd suggest that this is whatever changes from a->b; probably let the
> driver decide what to return, since it should know what changes from,
> say, Pending -> Running.

The state changes should all be on the level of DC instance states; we do
not want to expose driver-specific state ... I assume that's what you are
saying.

> > Do you just want a callback every time an event happens or more detail
> > about what changed (like 'instance went from pending to started') ?
> Same as above

I don't understand what you are saying here - basically, I am asking
whether the callback should just be a 'go GET the details of the
resource, they have changed', or whether there should be some indication
of what has changed and how (which might save you a roundtrip).
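If we go the second route, I am picturing the data POSTed to the hook url
as something small, roughly along these lines (purely a sketch, written
as a Python literal; none of the field names or values are agreed on or
implemented anywhere):

    # Hypothetical event payload - every field name here is made up.
    event = {
        "resource":  "instance",
        "id":        "inst1",
        "change":    {"state": {"from": "PENDING", "to": "RUNNING"}},
        "timestamp": "2012-05-24T12:00:00Z",
    }

The alternative is to send only the resource id and have conductor GET
the rest, at the cost of the extra roundtrip.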
> >> TODO: authentication when posting data to hook url? we use oauth between
> >> other components now
> > For the very first cut, I'd either not do any auth, or do something
> > incredibly cheesy like a token.
> We use http basic authentication for the Conductor API. So we can add
> authentication in the callback url.

Works for me.

> >> TODO: how to handle hook failures (conductor is not accessible and hook
> >> can't be invoked)?
> > For sure, we'd retry for a while ... after that, retry very infrequently
> > (like once a day) + provide an API to retrieve undelivered events. That
> > way, if the conductor failure is transient, everything should catch up
> > fairly quickly. If the conductor failure was longer (some multi-hour
> > maintenance event), conductor can catch up by asking for events
> > explicitly.
> I wonder if it's worth having some agreed-upon policy.
>
> For example, retry for 2 hours and then revert the state change (I'm not
> sure if this is even possible). But then it would know that if it
> receives nothing within 2 hours, the request failed.

This would essentially be some lightweight monitoring functionality;
maybe we should just model that as another event with a callback?
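To make the retry behaviour concrete, this is roughly the delivery logic
I have in mind (just a sketch; the numbers, names, and the 2-hour cutoff
are placeholders, nothing here is implemented):

    import time

    undelivered = []   # later exposed through an "undelivered events" API

    def deliver(event, post_to_hook):
        """Try to POST an event to the hook url, backing off on failure."""
        delay = 60                           # start by retrying every minute
        give_up_at = time.time() + 2 * 3600  # "retry for a while" ~ 2 hours
        while time.time() < give_up_at:
            if post_to_hook(event):          # True on a successful response
                return True
            time.sleep(delay)
            delay = min(delay * 2, 900)      # back off, cap at 15 minutes
        # Past the cutoff: keep the event around and retry very infrequently
        # (e.g. once a day) from a separate sweep.
        undelivered.append(event)
        return False

The once-a-day sweep plus the API to fetch whatever is still undelivered
should cover the longer outages.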
> >> TODO: how to handle credentials? will the stateful app keep credentials
> >> permanently for each instance being checked?
> > As much as this worries me from a security standpoint, I don't see
> > another way around this - cloud APIs generally don't allow any
> > delegation of auth.
> >
> > There's a couple more TODOs connected to credentials:
> >
> > TODO: how are credentials changes handled (user revokes API Key and
> > generates a new one) ? [not for the first cut]
> >
> > TODO: when are stored credentials purged ? We want to make sure we get
> > rid of them as quickly as possible.
> Why not make this a feature? Credentials could be stored via the API
> and managed via a single login (maybe OAuth)?
>
> Then to answer the previous question, we could add another callback on
> the credentials resource that is invoked when authentication fails.

I don't want to introduce an explicit credentials resource, because we
would then need to safeguard that with credentials of its own. Rather,
I'd prefer that the state tracker just snoop the backend credentials out
of the ordinary DC requests.

> Another
> > I think combining these should be fairly straightforward; for frequency
> > I imagine we'll build something in based on the anticipated state
> > change: for example, while an instance is pending, we might poll its
> > state pretty frequently; once it's running, we'd back off and poll much
> > less often.
> >
> > What poll frequencies does conductor use today ?
> At the moment conductor uses 60s.
>
> It makes more sense to me though to have this set at a driver level
> rather than across the board. Each provider has a different underlying
> process and potentially a different state machine, so it makes sense to
> have a different poll frequency for each. For example, in EC2 you can
> get the state of a bunch of instances in one request, so the poll
> frequency might be higher than for a provider that only allows status
> queries on a per-instance basis.
>
> A nice-to-have feature would be to then offer a high-level system
> setting, e.g. POLL_FREQUENCY=FASTEST, SLOWEST, DEFAULT, which the driver
> interprets. This could be set at boot time, or on a per-request basis.

Yes, I think that makes the most sense (and to get started, we'll only
have one frequency, POLL_FREQUENCY=optimal ;)

David
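PS: to make the 'driver interprets a high-level setting' idea a bit more
concrete, this is the sort of thing I am picturing (only a sketch; every
driver name, state, and interval below is invented for illustration):

    # Hypothetical per-driver, per-state poll intervals plus a global knob.
    POLL_FREQUENCY = "DEFAULT"           # or "FASTEST" / "SLOWEST", set at boot

    INTERVALS = {                        # seconds between polls
        ("ec2",     "PENDING"): 15,      # batch status queries are cheap here
        ("ec2",     "RUNNING"): 300,
        ("default", "PENDING"): 60,
        ("default", "RUNNING"): 600,
    }

    SCALE = {"FASTEST": 0.5, "DEFAULT": 1.0, "SLOWEST": 2.0}

    def poll_interval(driver, state):
        base = INTERVALS.get((driver, state), INTERVALS[("default", state)])
        return base * SCALE[POLL_FREQUENCY]

Each driver would ship its own table; the high-level setting just scales
it up or down.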
