Andy Doan <[email protected]> writes:
> On 01/28/2013 02:27 AM, Dave Pigott wrote:
>> Looks like we need to plan for some downtime while this happens. I'll
>> make sure I take the boards offline. Can we run something outside the
>> lab to keep availability of job submission?
>
> This seems to be an idea worth at least discussing.
In fact, I think we have discussed it before :-)
> It seems like it would be really cool to create some sort of simple
> scheduler service that basically accepts all job requests and
> saves them to disk.
Yeah, feels like it should be fairly simple. It could run in EC2 or
whatever.
> Maybe it's preseeded with a job-id so that we can return unique job
> IDs back to the caller.
I think this is essential. It might help to move away from using the
database-assigned primary key as the ID we present to the user, maybe?
One way this could work, though, is for people to _always_ submit to
this simple service in the cloud, in which case it could get to assign
the IDs.
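A minimal sketch of the accept-and-save idea, just to make it concrete.
Everything here is an assumption for illustration: the JobSpool name,
the one-JSON-file-per-job layout, and the first_id seed (which would
have to be chosen above the highest ID the real scheduler has handed
out). It is not how the existing scheduler works.

```python
import json
import os
import tempfile


class JobSpool:
    """Hypothetical stand-in scheduler: accepts jobs and saves them to disk."""

    def __init__(self, directory, first_id):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)
        # Resume after the highest id already on disk, so a restart of the
        # service never hands out the same id twice.
        existing = [int(name[:-5]) for name in os.listdir(directory)
                    if name.endswith(".json")]
        self.next_id = max([first_id] + [i + 1 for i in existing])

    def submit(self, job):
        """Save one job (any JSON-serialisable dict) and return its id."""
        job_id = self.next_id
        self.next_id += 1
        final_path = os.path.join(self.directory, "%d.json" % job_id)
        # Write to a temporary file and rename it into place, so a crash
        # never leaves a half-written job where an importer would find it.
        fd, tmp_path = tempfile.mkstemp(dir=self.directory, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(job, f)
        os.replace(tmp_path, final_path)
        return job_id
```

This sketch is single-process and does no validation at all; a real
service would also need locking or a single writer to keep IDs unique.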
> We then create some type of import tool that, once the service is back
> online, can suck in this data and execute the jobs.
Right.
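For illustration, an import tool along those lines could be as small
as the sketch below. It assumes the jobs were saved as <id>.json files
in one directory, and submit_job is a hypothetical callable standing in
for whatever actually hands a job to the real scheduler; neither is an
existing interface.

```python
import json
import os


def import_spooled_jobs(spool_dir, submit_job):
    """Replay saved jobs, lowest id first, and return the ids imported.

    submit_job(job_id, job) is a hypothetical hook into the real scheduler.
    """
    done_dir = os.path.join(spool_dir, "imported")
    os.makedirs(done_dir, exist_ok=True)
    names = [n for n in os.listdir(spool_dir) if n.endswith(".json")]
    imported = []
    for name in sorted(names, key=lambda n: int(n[:-5])):
        job_id = int(name[:-5])
        path = os.path.join(spool_dir, name)
        with open(path) as f:
            job = json.load(f)
        submit_job(job_id, job)
        # Move the file only after submit_job returns, so a failure leaves
        # the job in place to be retried on the next run.
        os.replace(path, os.path.join(done_dir, name))
        imported.append(job_id)
    return imported
```

Because a file is moved aside only after a successful submit, running
the tool again is harmless, but a crash between the submit and the move
could submit that one job twice.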
> Thoughts? Am I solving the wrong problem?
Well. The thing that occurs to me is that what we are doing here is
building a system that aims to be available for writes in the face of
network partitions, and other people have already built systems that
have this property -- it is basically the whole principle behind
Amazon's famous Dynamo [1] and the systems it inspired, like Riak and
Cassandra. It seems unlikely that we'd do a better job than them.
One thing I don't completely understand how to replicate, if we have
a simple job-accepting scheduler in the cloud, is the sanity check that
the submitting user is allowed to submit results to the stream specified
in the job -- or even whether the token provided when submitting the job
is valid, come to think of it!
Cheers,
mwh
[1] Everyone in computing should take the 40 minutes or so it takes to
read this paper:
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
if only for this quote:
"For example, customers should be able to view and add items to
their shopping cart even if disks are failing, network routes
are flapping, or data centers are being destroyed by tornados."
_______________________________________________
linaro-validation mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/linaro-validation