Hi All,

Have you ever dreamt of running a full *Mesos* cluster but never had the time to play with it? Better, a Mesos cluster running Spark code on *Cassandra*? Or wait. Maybe I'd also like to have it in *ElasticSearch AND Cassandra*. Hmm, now we're talking! *Aerospike* is really fast, I'd also like to have it. At the same time, for the data lake, you're running *Hadoop* in prod, so you'd also like to dump your stuff there. While we are at it, *MongoDB* is nice as well, and our devs could use that for the node.js apps. Or maybe you don't want to choose, and you want them ALL at the same time.
You're 20 lines away from having your dream come true. Keep reading.

Spoiler: as you're all Juju users, you're also 35 min away from having it running for real, in HA, in your preferred cloud.

A month ago I attended Strata+Hadoop in London, and I discovered a pretty awesome piece of technology called Stratio (www.stratio.com).

Stratio is an open source Big Data analytics platform based on Spark. It uses a data pipeline built on Kafka and Flume, backed by one or more of Cassandra, MongoDB, ElasticSearch, HDFS or Aerospike (WIP) for resilient storage. Analytics is provided by running Spark either in Standalone mode or in a Mesos cluster, managed by ZooKeeper.

The ultimate version of Stratio is called Sparkta, and offers the ability to describe data processing with a very simple JSON language that specifies the input, the output, the processing to apply, etc. (6 words only). Sparkta is due for GA sometime this month.

The Stratio deployer is based on Chef, running from a specific node (Stratio Admin). Hence charming the whole thing was pretty easy: the charm is a wrapper around the Chef-based deployer, as if Juju were only managing the resources and declaring them to the Chef Server. Each node is built depending on the relation that's created with the admin node (ZK, Mesos...).

I also designed 4 reference architectures, based on each of the storage backends. Each reference arch has:

* 1x Stratio Admin (there is no HA yet)
* 3x ZooKeeper
* 2x Mesos Master
* 3 instances of storage, also running Mesos Slaves for data locality.
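To make the sizing concrete, here is a minimal sketch in Python that simply tallies the units listed above. It assumes one unit per machine, and the class and field names are mine, for illustration only; they are not part of the charms or the bundles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceArch:
    """Unit counts for one reference architecture (illustrative names)."""
    admin: int = 1         # 1x Stratio Admin (no HA yet)
    zookeeper: int = 3     # 3x ZooKeeper
    mesos_master: int = 2  # 2x Mesos Master
    storage: int = 3       # storage instances, also running Mesos Slaves

    def total_nodes(self) -> int:
        # Assumption: every unit lands on its own machine.
        return self.admin + self.zookeeper + self.mesos_master + self.storage


if __name__ == "__main__":
    print(ReferenceArch().total_nodes())
```

With the default 3 storage instances this comes to 9 machines; a backend that needs a bigger storage tier would pass a larger `storage` value.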
For HDFS, storage actually takes 8 nodes (3x data, 3x journal, 2x name).

The code repositories live on GitHub, but I push versions to Launchpad at the same time, in my personal namespace (samuel-cozannet):

* Bundles: https://github.com/SaMnCo/bundle-stratio
* Charms:
  * Admin: https://github.com/SaMnCo/charm-stratio-admin
  * Node: https://github.com/SaMnCo/charm-stratio-node
* Discussion tracker: https://groups.google.com/forum/?hl=fr#!topic/stratio-admin/KCth-xqZdM4

Next Steps:

* Clean up the code and make it faster (~35 min deployment for now; should use the framework to speed that up)
* Add a demo use case, with Spark code that runs out of the box
* Charm Sparkta when it's ready. There is little documentation yet as the project itself is really young. I'll be working with Stratio to make it happen, hopefully supported by them over time.
* Charm a Sparkta dashboard that shows results of analytics

Any feedback/questions are more than welcome. I hope you'll find this platform or some of its components useful. Stratio people are very nice and answer questions quickly; don't hesitate to reach out to them.

Best,
Samuel

--
Samuel Cozannet
Cloud, Big Data and IoT Strategy Team
Business Development - Cloud and ISV Ecosystem
Changing the Future of Cloud
Ubuntu <http://ubuntu.com> / Canonical UK LTD <http://canonical.com> / Juju <https://jujucharms.com>
[email protected]
mob: +33 616 702 389
skype: samnco
Twitter: @SaMnCo_23
--
Juju mailing list
[email protected]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju
