Cassandra tutorial (in docs)
Project: http://git-wip-us.apache.org/repos/asf/brooklyn-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/brooklyn-docs/commit/96584f32
Tree: http://git-wip-us.apache.org/repos/asf/brooklyn-docs/tree/96584f32
Diff: http://git-wip-us.apache.org/repos/asf/brooklyn-docs/diff/96584f32

Branch: refs/heads/0.6.0
Commit: 96584f32a8edcfc7ae59e27155942a223a7c4133
Parents: 38512bb
Author: Aled Sage <[email protected]>
Authored: Sat Oct 19 22:20:40 2013 +0100
Committer: Aled Sage <[email protected]>
Committed: Sun Oct 20 16:40:29 2013 +0100

----------------------------------------------------------------------
 docs/use/examples/before-begin.include.md       |   2 +-
 .../nosql-cassandra/cassandra.include.md        | 283 +++++++++++++++++++
 docs/use/examples/nosql-cassandra/index.md      |   7 +
 3 files changed, 291 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/96584f32/docs/use/examples/before-begin.include.md
----------------------------------------------------------------------
diff --git a/docs/use/examples/before-begin.include.md b/docs/use/examples/before-begin.include.md
index b153548..bc706cf 100644
--- a/docs/use/examples/before-begin.include.md
+++ b/docs/use/examples/before-begin.include.md
@@ -36,7 +36,7 @@ Grab a copy of the Brooklyn distribution and set up `BROOKLYN_HOME`:
 
 ### Installing the Examples
 
-Grab a copy of the brooklyn-examples source code and build with Maven:
+Grab a copy of the brooklyn-examples source code and build it with Maven:
 
 {% highlight bash %}
 % git clone https://github.com/brooklyncentral/brooklyn-examples.git

http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/96584f32/docs/use/examples/nosql-cassandra/cassandra.include.md
----------------------------------------------------------------------
diff --git a/docs/use/examples/nosql-cassandra/cassandra.include.md b/docs/use/examples/nosql-cassandra/cassandra.include.md
new file mode 100644
index 0000000..a28ac55
--- /dev/null
+++ b/docs/use/examples/nosql-cassandra/cassandra.include.md
@@ -0,0 +1,283 @@
+
+{% readj ../before-begin.include.md %}
+
+## Simple Cassandra Cluster
+
+Go to this particular example's directory:
+
+{% highlight bash %}
+% cd simple-nosql-cluster
+{% endhighlight %}
+
+The CLI needs to know where to find your compiled examples. You can set this up by exporting
+the ``BROOKLYN_CLASSPATH`` environment variable in the following way:
+
+{% highlight bash %}
+% export BROOKLYN_CLASSPATH=$(pwd)/target/classes
+{% endhighlight %}
+
+The project ``simple-nosql-cluster`` includes several deployment descriptors
+for deploying and managing Cassandra, under ``src/main/java``.
+
+The simplest of these, ``SimpleCassandraCluster``, will start a Cassandra cluster. The code is:
+
+{% highlight java %}
+public class SimpleCassandraCluster extends AbstractApplication {
+    public void init() {
+        addChild(EntitySpec.create(CassandraCluster.class)
+                .configure(CassandraCluster.INITIAL_SIZE, 1)
+                .configure(CassandraCluster.CLUSTER_NAME, "Brooklyn"));
+    }
+}
+{% endhighlight %}
+
+To run that example on localhost (on *nix or Mac, assuming `ssh localhost` requires no password or passphrase):
+
+{% highlight bash %}
+% ${BROOKLYN_HOME}/bin/brooklyn launch --app brooklyn.demo.SimpleCassandraCluster \
+    --location localhost
+{% endhighlight %}
+
+Then visit the Brooklyn console on ``localhost:8081``.
+Note that the installation may take some time, because the default deployment downloads the software from
+the official repos. You can monitor start-up activity for each entity in the ``Activity`` pane in the management console,
+and see more detail by tailing the log file (``tail -f brooklyn.log``).
+
+This example runs successfully on a local machine because ``INITIAL_SIZE`` is configured to just one node
+(a limitation of Cassandra is that every node must be on a different machine/VM).
+If you want to run with more than one node in the cluster, you'll need to use a location
+that either points to multiple existing machines or to a cloud provider where you can
+provision new machines.
+
+With appropriate setup of credentials (as described [here]({{ site.url }}/use/guide/management/index.html#startup-config))
+this example can also be deployed to your favourite cloud. Let's pretend it's Amazon US East, as follows:
+
+{% highlight bash %}
+% ${BROOKLYN_HOME}/bin/brooklyn launch --app brooklyn.demo.SimpleCassandraCluster \
+    --location aws-ec2:us-east-1
+{% endhighlight %}
+
+If you want more nodes in your cluster, you can either modify the deployment descriptor (i.e. change the ``INITIAL_SIZE`` value),
+or dynamically add more nodes by calling the ``resize`` effector through the web-console.
+To do the latter, select the cluster entity in the tree on the left, then click on the "effectors" tab, and invoke ``resize``
+with the desired number of nodes.
+
+
+### Testing your Cluster
+
+An easy way to test your cluster is to use the ``cassandra-stress`` command line tool.
+For example, run:
+
+{% highlight bash %}
+# Substitute your own VM's hostname below
+NODE_IDS=ec2-54-221-69-95.compute-1.amazonaws.com
+/tmp/brooklyn-aled/installs/CassandraNode/1.2.9/apache-cassandra-1.2.9/tools/bin/cassandra-stress \
+    --nodes ${NODE_IDS} \
+    --replication-factor 1 \
+    --progress-interval 1 \
+    --num-keys 10000 \
+    --operation INSERT
+{% endhighlight %}
+
+This command will fire 10000 inserts at the cluster, via the nodes specified in the comma-separated node list.
+If you change ``INSERT`` to ``READ``, it will read each of those 10000 values.
+
+
+## High Availability Cassandra Cluster
+
+Ready for something more interesting?
Try this:
+
+{% highlight bash %}
+% ${BROOKLYN_HOME}/bin/brooklyn launch --app brooklyn.demo.HighAvailabilityCassandraCluster \
+    --location aws-ec2:us-east-1
+{% endhighlight %}
+
+This launches the class ``HighAvailabilityCassandraCluster``,
+which launches a Cassandra cluster configured to replicate across availability zones.
+
+To give some background for that statement, in
+[AWS](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html)
+(and various other clouds), a region is a
+separate geographic area, consisting of multiple isolated locations known as availability zones.
+To ensure high availability, the Cassandra cluster and thus the data should be spread across the
+availability zones. Cassandra should be configured to ensure there is at least one replica in
+each availability zone. In
+[Cassandra terminology](http://www.datastax.com/docs/1.1/cluster_architecture/replication)
+a region is normally mapped to a "datacenter" and an availability zone to a "rack".
+
+To be properly highly available, we need some automated policies to restart failed servers
+and to replace unhealthy nodes. Brooklyn has these policies available out-of-the-box.
+To wire them up, the essential code fragment looks like this:
+
+{% highlight java %}
+public class HighAvailabilityCassandraCluster extends AbstractApplication {
+    public void init() {
+        addChild(EntitySpec.create(CassandraCluster.class)
+                .configure(CassandraCluster.CLUSTER_NAME, "Brooklyn")
+                .configure(CassandraCluster.INITIAL_SIZE, 1)
+                .configure(CassandraCluster.ENABLE_AVAILABILITY_ZONES, true)
+                .configure(CassandraCluster.NUM_AVAILABILITY_ZONES, 3)
+                .configure(CassandraCluster.ENDPOINT_SNITCH_NAME, "GossipingPropertyFileSnitch")
+                .configure(CassandraCluster.MEMBER_SPEC, EntitySpec.create(CassandraNode.class)
+                        .policy(PolicySpec.create(ServiceFailureDetector.class))
+                        .policy(PolicySpec.create(ServiceRestarter.class)
+                                .configure(ServiceRestarter.FAILURE_SENSOR_TO_MONITOR, ServiceFailureDetector.ENTITY_FAILED)))
+                .policy(PolicySpec.create(ServiceReplacer.class)
+                        .configure(ServiceReplacer.FAILURE_SENSOR_TO_MONITOR, ServiceRestarter.ENTITY_RESTART_FAILED)));
+    }
+}
+{% endhighlight %}
+
+This code is doing a lot and deserves some more detailed explanation:
+
+* The ``MEMBER_SPEC`` describes the configuration of the Cassandra nodes to be created in the cluster.
+  Assuming you're happy to use the defaults (thrift port, etc.), the only configuration to add is
+  a couple of policies.
+* The ``ServiceFailureDetector`` policy watches the node's sensors, and generates
+  an ``ENTITY_FAILED`` event if the node goes down.
+* The ``ServiceRestarter`` policy responds to this failure-event
+  by restarting the node. By default, if a node does not come back up, or if it
+  fails again within three minutes, the policy emits an ``ENTITY_RESTART_FAILED`` event.
+* Finally, the ``ServiceReplacer`` policy on the cluster responds to this event by replacing the
+  entire VM. It sets up a new VM in the same location, and then tears down the faulty node.
+
+> *Troubleshooting:*
+
+> *In AWS, some availability zones can be constrained for particular instance sizes (see
+  [this bug report](https://github.com/brooklyncentral/brooklyn/issues/973)).
+  If you get this error, the workaround is to specify explicitly the availability zones to use.
+  This requires an additional line of code such as:*
+
+{% highlight java %}
+    .configure(AVAILABILITY_ZONE_NAMES, ImmutableList.of("us-east-1b", "us-east-1c", "us-east-1e"))
+{% endhighlight %}
+
+> *However, this prevents the blueprint from being truly portable. We're looking at fixing this issue.*
+
+
+## Wide Area Cassandra Cluster
+
+For critical enterprise use-cases, you'll want to run your Cassandra cluster across multiple regions,
+or better yet across multiple cloud providers. This gives the highest level of availability for
+the service.
+
+Try running:
+
+{% highlight bash %}
+% ${BROOKLYN_HOME}/bin/brooklyn launch --app brooklyn.demo.WideAreaCassandraCluster \
+    --location "aws-ec2:us-east-1,aws-ec2:us-west-2"
+{% endhighlight %}
+
+This launches the class ``WideAreaCassandraCluster`` across two AWS regions.
+
+Cassandra provides some great support for this with the
+[EC2MultiRegionSnitch](http://www.datastax.com/docs/1.1/cluster_architecture/replication).
+The
+[snitch](http://www.datastax.com/docs/1.1/cluster_architecture/replication#snitches)
+maps IPs to racks and data centers; it defines how the nodes are grouped together within the overall
+network topology. For wide-area deployments, it must also deal with when to use the private IPs
+(within a region) and the public IPs (between regions).
+You'll need a more generic snitch if you're going to span different cloud providers.
+Brooklyn has a custom MultiCloudSnitch that we're looking to contribute back to Cassandra.
+
+The important piece of code in ``WideAreaCassandraCluster`` is:
+
+{% highlight java %}
+public class WideAreaCassandraCluster extends AbstractApplication {
+    public void init() {
+        addChild(EntitySpec.create(CassandraFabric.class)
+                .configure(CassandraCluster.CLUSTER_NAME, "Brooklyn")
+                .configure(CassandraCluster.INITIAL_SIZE, 2) // per location
+                .configure(CassandraCluster.ENDPOINT_SNITCH_NAME, "brooklyn.entity.nosql.cassandra.customsnitch.MultiCloudSnitch")
+                .configure(CassandraNode.CUSTOM_SNITCH_JAR_URL, "classpath://brooklyn/entity/nosql/cassandra/cassandra-multicloud-snitch.jar"));
+    }
+}
+{% endhighlight %}
+
+The code below shows the wide-area example with the high-availability policies from the previous section also configured:
+
+{% highlight java %}
+public class WideAreaCassandraCluster extends AbstractApplication {
+    public void init() {
+        addChild(EntitySpec.create(CassandraFabric.class)
+                .configure(CassandraCluster.CLUSTER_NAME, "Brooklyn")
+                .configure(CassandraCluster.INITIAL_SIZE, 2) // per location
+                .configure(CassandraCluster.ENDPOINT_SNITCH_NAME, "brooklyn.entity.nosql.cassandra.customsnitch.MultiCloudSnitch")
+                .configure(CassandraNode.CUSTOM_SNITCH_JAR_URL, "classpath://brooklyn/entity/nosql/cassandra/cassandra-multicloud-snitch.jar")
+                .configure(CassandraFabric.MEMBER_SPEC, EntitySpec.create(CassandraCluster.class)
+                        .configure(CassandraCluster.MEMBER_SPEC, EntitySpec.create(CassandraNode.class)
+                                .policy(PolicySpec.create(ServiceFailureDetector.class))
+                                .policy(PolicySpec.create(ServiceRestarter.class)
+                                        .configure(ServiceRestarter.FAILURE_SENSOR_TO_MONITOR, ServiceFailureDetector.ENTITY_FAILED)))
+                        .policy(PolicySpec.create(ServiceReplacer.class)
+                                .configure(ServiceReplacer.FAILURE_SENSOR_TO_MONITOR, ServiceRestarter.ENTITY_RESTART_FAILED))));
+    }
+}
+{% endhighlight %}
+
+To run Cassandra across multiple clouds, try running:
+
+{% highlight bash %}
+% ${BROOKLYN_HOME}/bin/brooklyn launch --app
brooklyn.demo.WideAreaCassandraCluster \
+    --location "aws-ec2:us-east-1,google-compute-engine,rackspace-cloudservers-uk"
+{% endhighlight %}
+
+
+### Testing your Wide-Area Cluster
+
+You can again use the ``cassandra-stress`` command line tool to test the wide-area cluster.
+
+Note that the replication strategy (such as
+[NetworkTopologyStrategy](http://www.datastax.com/docs/1.0/cluster_architecture/replication#networktopologystrategy))
+is specified when creating a
+[keyspace](http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/configuration/configStorage_r.html).
+The example below specifies a minimum of 1 replica in each datacenter.
+
+To do updates against a node in a given availability zone:
+
+{% highlight bash %}
+NODE_IDS=<your node hostname>
+/tmp/brooklyn-aled/installs/CassandraNode/1.2.9/apache-cassandra-1.2.9/tools/bin/cassandra-stress \
+    --nodes ${NODE_IDS} \
+    --replication-strategy NetworkTopologyStrategy \
+    --strategy-properties=us-east-1:1,us-west-2:1 \
+    --progress-interval 1 \
+    --num-keys 10000 \
+    --operation INSERT
+{% endhighlight %}
+
+To check that the same data is available from a different region, target the reads
+against an appropriate node:
+
+{% highlight bash %}
+NODE_IDS=<your node hostname>
+/tmp/brooklyn-aled/installs/CassandraNode/1.2.9/apache-cassandra-1.2.9/tools/bin/cassandra-stress \
+    --nodes ${NODE_IDS} \
+    --replication-strategy NetworkTopologyStrategy \
+    --strategy-properties=us-east-1:1,us-west-2:1 \
+    --progress-interval 1 \
+    --num-keys 10000 \
+    --operation READ
+{% endhighlight %}
+
+To really test this, you may want to simulate the failure of a region first.
+You can kill the VMs or ``kill -9`` the processes. But remember that if Brooklyn policies are configured
+they will by default restart the processes automatically! You can disable the Brooklyn policies through
+the Brooklyn web-console (select the entity, go to the policies tab, select the policy, and click "disable").
+
+
+## Putting it all together: CumulusRDF
+
+Let's put all this together to run an example application:
+[CumulusRDF](https://code.google.com/p/cumulusrdf)
+with a wide-area high-availability Cassandra cluster.
+
+{% highlight bash %}
+% ${BROOKLYN_HOME}/bin/brooklyn launch --app brooklyn.demo.CumulusRDFApplication \
+    --location "aws-ec2:us-east-1,aws-ec2:us-east-1"
+{% endhighlight %}
+
+
+## Contact us!
+
+If you encounter any difficulties or have any comments, please [tell us]({{ site.url }}/meta/contact.html) and we'll do our best to help.

http://git-wip-us.apache.org/repos/asf/brooklyn-docs/blob/96584f32/docs/use/examples/nosql-cassandra/index.md
----------------------------------------------------------------------
diff --git a/docs/use/examples/nosql-cassandra/index.md b/docs/use/examples/nosql-cassandra/index.md
new file mode 100644
index 0000000..3d1a64d
--- /dev/null
+++ b/docs/use/examples/nosql-cassandra/index.md
@@ -0,0 +1,7 @@
+---
+layout: page
+title: Cassandra Clusters
+toc: /toc.json
+---
+
+{% readj cassandra.include.md %}
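
The ``cassandra-stress`` invocations in the tutorial text of this commit take a comma-separated ``--nodes`` list and a ``--strategy-properties`` value of the form ``datacenter:replicas,...``. A minimal sketch of assembling both in POSIX shell is given here; the helper names ``join_by_comma`` and ``strategy_properties`` and the ``example.com`` hostnames are hypothetical illustrations and are not part of Brooklyn or Cassandra.

```shell
#!/bin/sh
# Hypothetical helpers for building cassandra-stress arguments.
# They are illustrative only and not shipped with Brooklyn or Cassandra.

# Join all arguments with commas (e.g. for the --nodes option).
# The function body runs in a subshell so the IFS change does not leak.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# Build a "dc1:N,dc2:N" string (e.g. for --strategy-properties).
# Usage: strategy_properties REPLICAS DATACENTER...
strategy_properties() (
    replicas=$1
    shift
    result=
    for dc in "$@"; do
        # Prefix a comma only when result is already non-empty.
        result="${result:+$result,}${dc}:${replicas}"
    done
    printf '%s\n' "$result"
)

# Placeholder hostnames; substitute the hostnames of your own nodes.
NODE_IDS=$(join_by_comma node1.example.com node2.example.com)
STRATEGY=$(strategy_properties 1 us-east-1 us-west-2)

# Print the options rather than running cassandra-stress itself.
echo "--nodes ${NODE_IDS} --strategy-properties=${STRATEGY}"
```

The script only prints the two options, so it can be tried without a running cluster; the resulting values can then be pasted into the ``cassandra-stress`` commands shown in the tutorial.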
