Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "GettingStarted" page has been changed by JonathanEllis:
https://wiki.apache.org/cassandra/GettingStarted?action=diff&rev1=91&rev2=92

Comment:
update for cqlsh

  == Cassandra documentation from DataStax ==
  !DataStax's latest [[http://www.datastax.com/docs/1.2/index|Cassandra documentation]] covers topics from installation to troubleshooting, including a [[http://www.datastax.com/docs/quick_start/quickstart|Quick Start Guide]]. Documentation for older releases is also available.
-
+ == Introduction ==
+ This document aims to provide a few easy-to-follow steps that take the first-time user from installation to running single-node Cassandra, plus an overview of configuring a multinode cluster. Cassandra is meant to run on a cluster of nodes, but will run equally well on a single machine. This is a handy way of getting familiar with the software while avoiding the complexities of a larger system.
+
- This document aims to provide a few easy to follow steps to take the first-time user from installation, to running single node Cassandra, and overview to configure multinode cluster.
- Cassandra is meant to run on a cluster of nodes, but will run equally well on a single machine. This is a handy way of getting familiar with the software while avoiding the complexities of a larger system.
-
  == Step 0: Prerequisites and Connecting to the Community ==
  Cassandra requires the most stable version of Java 1.6 you can deploy, preferably the Oracle/Sun JVM. Cassandra also runs on the IBM JVM, and should run on jrockit as well.
- Note for OS X users:
+ . Note for OS X users:
   Some people running OS X have trouble getting Java 6 to work. If you've kept up with Apple's updates, Java 6 should already be installed (it comes in Mac OS X 10.5 Update 1). Unfortunately, Apple does not default to using it. What you have to do is change your `JAVA_HOME` environment setting to `/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home` and add `/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin` to the beginning of your `PATH`.
-
+
  The best way to ensure you always have up-to-date information on the project, releases, stability, bugs, and features is to subscribe to the users mailing list ([[mailto:[email protected]|subscription required]]) and participate in the #cassandra channel on [[http://webchat.freenode.net/?channels=#cassandra|IRC]].
+
+ <<Anchor(picking_a_version)>> <<Anchor(download_a_kit)>>
+
-
- <<Anchor(picking_a_version)>>
- <<Anchor(download_a_kit)>>
-
  == Step 1: Download Cassandra ==
-
   * Download links for the latest stable release can always be found on the [[http://cassandra.apache.org/download|website]].
   * Users of Debian or Debian-based derivatives can install the latest stable release in package form; see DebianPackaging for details.
-  * Users of RPM-based distributions can get packages from [[http://www.datastax.com/docs/1.1/install/install_rpm|Datastax]].
+  * Users of RPM-based distributions can get packages from [[http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/install/installRHEL_t.html|Datastax]].
   * If you are interested in building Cassandra from source, please refer to the [[HowToBuild|How to Build]] page.
-
+
  For more details about misc builds, please refer to the [[VersionsAndBuilds|Cassandra versions and builds]] page.
-
+
  <<Anchor(running_a_single_node)>>
-
+
  == Step 2: Basic Configuration ==
+ The Cassandra configuration files can be found in the `conf` directory of binary and source distributions. If you have installed Cassandra from a deb or rpm package, the configuration files will be located in `/etc/cassandra`.
+
-
- The Cassandra configuration files can be found in the `conf` directory of binary and source distributions.
- If you have installed Cassandra from a deb or rpm package, the configuration files will be located in `/etc/cassandra`.
  === Step 2.1: Directories Used by Cassandra ===
+ If you've installed Cassandra with a deb or rpm package, the directories that Cassandra will use should already be created and have the correct permissions. Otherwise, you will want to check the following settings in `conf/cassandra.yaml`: `data_file_directories` (`/var/lib/cassandra/data`), `commitlog_directory` (`/var/lib/cassandra/commitlog`), and `saved_caches_directory` (`/var/lib/cassandra/saved_caches`). Make sure these directories exist and can be written to.
- If you've installed Cassandra with a deb or rpm package, the directories that Cassandra will use should already be created an have the correct permissions. Otherwise, you will want to check the following config settings.
-
- In `conf/cassandra.yaml` you will find the following configuration options: `data_file_directories` (`/var/lib/cassandra/data`), `commitlog_directory` (`/var/lib/cassandra/commitlog`), and `saved_caches_directory` (`/var/lib/cassandra/saved_caches`). Make sure these directories exist and can be written to.
  By default, Cassandra will write its logs in `/var/log/cassandra/`. Make sure this directory exists and is writable, or change this line in `conf/log4j-server.properties`:
+
  {{{
  log4j.appender.R.File=/var/log/cassandra/system.log
  }}}
+ JVM-level settings such as heap size can be set in `conf/cassandra-env.sh`.
-
- === Step 2.2: Configure Memory Usage (Optional) ===
- By default, Cassandra will allocate memory based on physical memory your system has, using somewhere between 1/4 and 1/2 of the available RAM.
-
- If you want to specify how much memory Cassandra should use explicitly, edit `conf/cassandra-env.sh`, find the following lines, uncomment them, and change their values:
- {{{
- #MAX_HEAP_SIZE="4G"
- #HEAP_NEWSIZE="800M"
- }}}
- For `MAX_HEAP_SIZE` use as little as you can get away with. It's recommended to stay within 8G because much beyond that, the CMS GC pauses interfere with normal operations.
- For `HEAP_NEWSIZE` use the number of cores * 100 but don't exceed 800M. With too much allocated, ParNew GC pauses become detrimental.
-
  == Step 3: Start Cassandra ==
  And now for the moment of truth, start up Cassandra by invoking '`bin/cassandra -f`' from the command line<<FootNote(To learn more about controlling the behavior of startup scripts, see RunningCassandra.)>>. The service should start in the foreground and log gratuitously to the console. Assuming you don't see messages with scary words like "error", or "fatal", or anything that looks like a Java stack trace, then everything should be working.
@@ -63, +46 @@
  If you start up Cassandra without the "-f" option, it will run in the background. You can stop the process by killing it, using '`pkill -f CassandraDaemon`', for example.
- == Step 4: Using cassandra-cli ==
+ . Users of recent Linux distributions and Mac OS X Snow Leopard should be able to start up Cassandra simply by untarring and invoking `bin/cassandra -f` with root privileges. Snow Leopard ships with Java 1.6.0 and does not require changing the `JAVA_HOME` environment variable or adding any directory to your `PATH`. On Linux just make sure you have a working Java JDK package installed, such as `openjdk-6-jdk` on Ubuntu Lucid Lynx.
+ == Step 4: Using cqlsh ==
+ `bin/cqlsh` is an interactive command line interface for Cassandra. You can define the schema and interact with data using it. Run the following command to connect to your local Cassandra instance:
- `bin/cassandra-cli` is an interactive command line interface for Cassandra. You can alter the schema and interact with data using the cli.
- Run the following command to connect to your local Cassandra instance:
- {{{
- bin/cassandra-cli
- }}}
-
- You should see the following prompt, if successful:
- {{{
- Connected to: "Test Cluster" on 127.0.0.1/9160
- Welcome to Cassandra CLI version 1.0.7
-
- Type 'help;' or '?' for help.
- Type 'quit;' or 'exit;' to quit.
-
- [default@unknown]
- }}}
-
- You can access to the online help with 'help;' command. Commands are terminated with a semicolon (';') in the cli.
  {{{
- [default@unknown] help;
+ $ bin/cqlsh
  }}}
+ You should see the following prompt, if successful:
-
- First, create a keyspace for your test.
  {{{
+ Connected to Test Cluster at localhost:9160.
+ [cqlsh 2.3.0 | Cassandra 1.2.2 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
+ Use HELP for help.
- [default@unknown] create keyspace DEMO
- with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
- and strategy_options = {replication_factor:1};
- f53dff10-5bd8-11e1-0000-915a024292eb
- Waiting for schema agreement...
- ... schemas agree across the cluster
- [default@unknown]
  }}}
+ For clarity, we will omit the cqlsh prompt in the following examples.
- Don't forget to add a semicolon (';') at end of the command.
+ You can access the online help with the 'help;' command. Commands are terminated with a semicolon (';') in cqlsh.
+ First, create a keyspace, which is a namespace for tables:
- Second, authenticate to the DEMO keyspace:
- {{{
- [default@unknown] use DEMO;
- Authenticated to keyspace: DEMO
- [default@DEMO]
- }}}
-
- Third, create a `Users` column family:
- {{{
- [default@DEMO] create column family Users
- ... with key_validation_class = 'UTF8Type'
- ... and comparator = 'UTF8Type'
- ... and default_validation_class = 'UTF8Type';
- [default@DEMO]
- }}}
-
- Now you can store data into `Users` column family:
  {{{
+ CREATE KEYSPACE mykeyspace
+   WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
- [default@DEMO] set Users[1234][name] = scott;
- Value inserted.
- Elapsed time: 10 msec(s).
- [default@DEMO] set Users[1234][password] = tiger;
- Value inserted.
- Elapsed time: 10 msec(s).
- [default@DEMO]
  }}}
+ Second, switch to the new keyspace:
- You have inserted a row into the `Users` column family. The row key is '1234', and we set values for two columns in the row: 'name', and 'password'.
+ {{{
+ USE mykeyspace;
+ }}}
+ Third, create a `users` table:
+ {{{
+ CREATE TABLE users (
+   user_id int PRIMARY KEY,
+   fname text,
+   lname text
+ );
+ }}}
+ Now you can store data into `users`:
+
+ {{{
+ INSERT INTO users (user_id, fname, lname)
+   VALUES (1745, 'john', 'smith');
+ INSERT INTO users (user_id, fname, lname)
+   VALUES (1744, 'john', 'doe');
+ INSERT INTO users (user_id, fname, lname)
+   VALUES (1746, 'john', 'smith');
+ }}}
  Now let's fetch the data you inserted:
+
  {{{
- [default@DEMO] get Users[1234];
- => (column=name, value=scott, timestamp=1350769161684000)
- => (column=password, value=tiger, timestamp=1350769245191000)
+ SELECT * FROM users;
+ }}}
+ You should see output reflecting your new rows:
-
- Returned 2 results.
- Elapsed time: 67 msec(s).
- [default@DEMO]
+ {{{
+  user_id | fname | lname
+ ---------+-------+-------
+     1745 |  john | smith
+     1744 |  john |   doe
+     1746 |  john | smith
  }}}
+ You can retrieve data about users whose last name is smith by creating an index, then querying the table as follows:
+ {{{
+ CREATE INDEX ON users (lname);
- You can easily specify types other than UTF-8 when creating or updating a column family. See '`help update column family;`' and '`help create column family;`' for more details.
+ SELECT * FROM users WHERE lname = 'smith';
- To be certain though, take some time to try out the examples in CassandraCli before moving on
- Also, if you run into problems, Don't Panic, calmly proceed to [[#if_something_goes_wrong|If Something Goes Wrong]].
-
- Users of recent Linux distributions and Mac OS X Snow Leopard should be able to start up Cassandra simply by untarring and invoking `bin/cassandra -f` with root privileges. Snow Leopard ships with Java 1.6.0 and does not require changing the `JAVA_HOME` environment variable or adding any directory to your `PATH`. On Linux just make sure you have a working Java JDK package installed such as the `openjdk-6-jdk` on Ubuntu Lucid Lynx.
-
+  user_id | fname | lname
+ ---------+-------+-------
+     1745 |  john | smith
+     1746 |  john | smith
+ }}}
- == Configuring Multinode Cluster ==
+ == Configuring Multinode Clusters ==
- Now you have single working Cassandra node. It is a Cassandra cluster which has only one node. By adding more nodes, you can make it a multi node cluster. Setting up a Cassandra cluster is ''almost'' as simple as repeating the above procedures for each node in your cluster. There are a few minor exceptions though.
-
+
  Cassandra nodes exchange information about one another using a mechanism called Gossip, but to get the ball rolling a newly started node needs to know of at least one other; this is called a '''Seed'''. It's customary to pick a small number of relatively stable nodes to serve as your seeds, but there is no hard-and-fast rule here. Do make sure that each seed also knows of at least one other. Remember, the goal is to avoid a chicken-and-egg scenario and provide an avenue for all nodes in the cluster to discover one another.
-
- In addition to seeds, you'll also need to configure the IP interface to listen on for Gossip and Thrift, ('''listen_address''' and '''rpc_address''' respectively). Use a 'listen_address` that will be reachable from the `listen_address` used on all other nodes, and a `rpc_address` that will be accessible to clients.
-
- One other thing you need to care at multi node cluster is '''Token'''. Each node in the cluster owns a part of token range from 0 to 2^127-1.
- If the Nth node in the cluster has token value T(N), the node owns range from T(N-1)+1 to T(N). Cassandra decide nodes where a data should be stored based on the consistent mapping of the row key and token range (refer to RandomPartitioner, ByteOrderedPartitioner).
+ In addition to seeds, you'll also need to configure the IP interface to listen on for Gossip and CQL ('''listen_address''' and '''rpc_address''' respectively). Use a `listen_address` that will be reachable from the `listen_address` used on all other nodes, and an `rpc_address` that will be accessible to clients.
- The token can be assigned to node by '''initial_token''' parameter in cassandra.yaml. The parameter is effective only at the first boot of the node. Once you boot a node, use 'nodetool move' command to change the assigned token. You need to specify appropriate initial_token for each node to balance data load across the nodes. Here is a python script to calculate balanced tokens.
- {{{
- # Number of nodes in the cluster
- num_node = 4
-
- for n in range(num_node):
-     print int(2**127 / num_node * n)
- }}}
  Once everything is configured and the nodes are running, use the `bin/nodetool ring` utility to verify a properly connected cluster. For example:
-
+
  {{{
- eevans@achilles:~$ bin/nodetool -host 192.168.0.10 -p 7199 ring
+ eevans@achilles:~$ bin/nodetool -host 192.168.0.10 -p 7199 status
- Address         DC          Rack        Status State   Load            Owns    Token
-                                                                                127605887595351923798765477786913079296
- 192.168.0.10    DC1         r1          Up     Normal  17.3 MB         25.00%  0
- 192.168.0.11    DC1         r1          Up     Normal  17.4 MB         25.00%  42535295865117307932921825928971026432
- 192.168.0.12    DC1         r1          Up     Normal  37.2 MB         25.00%  85070591730234615865843651857942052864
- 192.168.0.13    DC1         r1          Up     Normal  24.55 MB        25.00%  127605887595351923798765477786913079296
+ Datacenter: datacenter1
+ =======================
+ Status=Up/Down
+ |/ State=Normal/Leaving/Joining/Moving
+ --  Address    Load       Tokens  Owns   Host ID                               Rack
+ UN  127.0.0.3  30.99 KB   256     32.4%  92b20e08-9ddd-4f55-9173-8516e74d27f5  rack1
+ UN  127.0.0.2  31 KB      256     31.5%  b9616658-c744-48fb-b64f-83f96b007d93  rack1
+ UN  127.0.0.1  30.96 KB   256     36.1%  f7a08973-85bd-460f-8176-d6f9df8c23f4  rack1
  }}}
  Advanced cluster management is described in [[Operations]].
-
- If you don't yet have access to hardware for a Cassandra cluster you can try it out on EC2 with CloudConfig.
+ If you don't yet have access to hardware for a real Cassandra cluster, you can manage local clusters easily with [[https://github.com/pcmanus/ccm|ccm]] (Cassandra Cluster Manager).
+
- For more details about configuring multi node cluster, please refer to [[MultinodeCluster]].
+ For more details about configuring multinode clusters, please refer to MultinodeCluster.
-
+
  == Write your application ==
+ Review the resources on DataModeling. The full CQL documentation is [[http://www.datastax.com/documentation/cql/3.0/webhelp/index.html|here]].
+
+ DataStax sponsors development of the CQL drivers at https://github.com/datastax. The full list of CQL drivers is on the ClientOptions page.
+
- The recommended way to communicate with Cassandra in your application is to use a [[http://wiki.apache.org/cassandra/ClientOptions|higher-level client]]. These provide programming language specific API:s for talking to Cassandra in a variety of languages. The details will vary depending on programming language and client, but in general using a higher-level client will mean that you have to write less code and get several features for free that you would otherwise have to write yourself.
-
- That said, it is useful to know that Cassandra uses [[http://thrift.apache.org/|Thrift]] for its external client-facing API. Cassandra's main API/RPC/Thrift port is 9160. Thrift supports a [[http://svn.apache.org/viewvc/thrift/trunk/lib/|wide variety of languages]] so you can code your application to use Thrift directly if you so chose (but again we recommend a [[http://wiki.apache.org/cassandra/ClientOptions|high-level client]] where available).
-
- Important note: If you intend to use thrift directly, you need to install a version of thrift that matches the revision that your version of Cassandra uses. InstallThrift
-
- Cassandra's main API/RPC/Thrift port is 9160 by default, which is defined as rpc_port in cassandra.yaml. It is a common mistake for API clients to connect to the JMX port instead.
-
- Checking out a demo application like [[http://github.com/twissandra/twissandra|Twissandra]] (Python + Django) will also be useful.
-
  <<Anchor(if_something_goes_wrong)>>
-
+
  == If Something Goes Wrong ==
  If you followed the steps in this guide and failed to get up and running, we'd love to help. Here's what we need.
-
+
   1. If you are running anything other than a stable release, please upgrade first and see if you can still reproduce the problem.
   1. Make sure debug logging is enabled (hint: `conf/log4j.properties`) and save a copy of the output.
   1. Search the [[http://news.gmane.org/gmane.comp.db.cassandra.user|mailing list archive]] and see if anyone has reported a similar problem and what resolution, if any, they received.
   1. Ditto for the [[https://issues.apache.org/jira/browse/CASSANDRA|bug tracking system]].
   1. See if you can put together a unit test, script, or application that reproduces the problem.
-
+
  Finally, post a message with all relevant details to the list ([[mailto:[email protected]|subscription required]]), or hop onto [[http://webchat.freenode.net/?channels=#cassandra|IRC]] (network irc.freenode.net, channel #cassandra) and let us know.
-
+
  <<BR>>
  <<BR>>
  ----
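The token arithmetic described under Configuring Multinode Cluster can be sketched in Python 3. Under RandomPartitioner, tokens run from 0 to 2^127-1, and node N owns T(N-1)+1 through T(N). The script on the page uses Python 2 syntax (`print int(...)`), which is a syntax error under Python 3. The sketch below is only an illustration under those assumptions and is not code from the Cassandra project. The helpers `balanced_tokens` and `owner_index` are hypothetical names. The sketch models one `initial_token` per node, not the 256 tokens per node shown in the `nodetool status` sample.

```python
# Sketch: balanced initial_token values and token ownership for
# RandomPartitioner, assuming one token per node (hypothetical helpers).
from bisect import bisect_left

TOKEN_SPACE = 2**127  # RandomPartitioner tokens lie in 0 .. 2**127 - 1


def balanced_tokens(num_nodes):
    """Return evenly spaced tokens, one per node.

    Floor division keeps the values exact, matching what
    int(2**127 / num_node * n) computed with integers under Python 2.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return [TOKEN_SPACE // num_nodes * n for n in range(num_nodes)]


def owner_index(sorted_tokens, token):
    """Return the index of the node that owns `token`.

    Node N owns T(N-1)+1 .. T(N); a token above the largest node token
    wraps around to the node with the smallest token.
    """
    i = bisect_left(sorted_tokens, token)  # first node token >= token
    return i % len(sorted_tokens)


if __name__ == "__main__":
    # Number of nodes in the cluster
    num_node = 4
    tokens = balanced_tokens(num_node)
    for t in tokens:
        print(t)
    # A token just past node 1's token belongs to node 2.
    print(owner_index(tokens, tokens[1] + 1))
```

With four nodes this prints 0, 42535295865117307932921825928971026432, 85070591730234615865843651857942052864 and 127605887595351923798765477786913079296, followed by 2. These are the same four tokens as in the four-node `nodetool ring` sample. Floor division (`//`) matters here: true division would route the arithmetic through a float and can lose precision for node counts that do not divide 2^127 evenly.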
