Andrei,

I keep watching the update stream of github.com/axemblr/axemblr-provisionr and still see plenty of activity.

Do you have any rough idea of state transition latency and throughput you get when using Activiti and how this compares to using Whirr/jclouds in a single process?

The reason I ask is that although Activiti has good support for designing processes and for programmatic control of the engine, its throughput is necessarily limited by the rate of DB transactions. An obvious alternative design is to use something actor-based, which can run entirely in RAM. I admit that an actor control system would make it harder to trace what happened, compared to business process control, which is very much oriented toward human-in-the-loop.

Regards,

Paul

For example, the

On 2012-12-14 7:34, Andrei Savu wrote:
Hi guys,

It is no secret that at Axemblr we are using Apache Whirr for
provisioning and initial basic cluster configuration for Hadoop. As soon as
the machines are running we configure Hadoop by leveraging APIs from
existing tools like Cloudera Manager or Ambari.

All the orchestration needed to make this happen is not trivial if you want
the final system to be predictable, robust, restartable and easy to inspect
while running.

A few months ago we realised that we needed to re-work the machine
provisioning layer from Whirr and build a system that has the following
features:

* should be able to provision 10s or 100s of virtual machines by doing a
good job at handling API throttling and by using batch operations as much
as possible

* all the internal workflows should be persistent and as granular as
possible and each step should be idempotent

* it should be possible to restart the application server while starting
virtual machines with no impact

* it should have a modular architecture and provide enough flexibility to
be able to work with a large number of public and private clouds just by
replacing modules

* it should hide all this complexity behind a simple REST API and a simple
interactive shell

* it should be able to automatically build gold base images and use them to
spawn large clusters
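
A minimal sketch of the "persistent, granular, idempotent" workflow idea from the list above. It is written in plain Python for brevity and does not use Activiti or any provisionr code; the JSON state file, the step names and the run_workflow helper are all hypothetical, chosen only for illustration.

```python
import json
import os
import tempfile


def run_workflow(steps, state_path):
    """Run (name, action) steps in order, recording finished steps on disk.

    After a restart, steps already recorded as done are skipped, so the
    whole workflow can safely be run again from the top.
    """
    done = []
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)
    for name, action in steps:
        if name in done:
            continue  # finished before the restart, do not repeat
        action()
        done.append(name)
        # Persist after every step; write to a temp file and rename so a
        # crash never leaves a half-written state file behind.
        tmp = state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(done, f)
        os.replace(tmp, state_path)
    return done


if __name__ == "__main__":
    # Hypothetical step names; the actions here only print.
    path = os.path.join(tempfile.mkdtemp(), "workflow-state.json")
    steps = [
        ("create_machines", lambda: print("creating machines")),
        ("install_packages", lambda: print("installing packages")),
    ]
    run_workflow(steps, path)  # runs both steps
    run_workflow(steps, path)  # simulated restart: both steps are skipped
```

If the process dies after an action has run but before the state file is updated, that one step runs again on restart, which is why each individual step still needs to be idempotent.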

We spent some time looking for existing products that do all this and in
the end we decided that it's better to start from scratch and build this
system as a new project based on Activiti, Apache Karaf, jclouds and native
SDKs.

The source code is now publicly available at:

https://github.com/axemblr/axemblr-provisionr

I would really like to know what you think about the work we've done so
far. The project will improve a lot over the next couple of weeks / months
so I encourage you to stay tuned.

We want to bring this project to the Apache Software Foundation later on. I will
give a talk in February at ApacheCon NA on this.

Cheers,

-- Andrei Savu / axemblr.com

