Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "HadoopSupport" page has been changed by jeremyhanna.
The comment on this change is: adding an initial cluster configuration section.
http://wiki.apache.org/cassandra/HadoopSupport?action=diff&rev1=15&rev2=16

--------------------------------------------------

   * [[#MapReduce|MapReduce Support]]
   * [[#Pig|Pig Support]]
   * [[#Hive|Hive Support]]
+  * [[#ClusterConfig|Cluster Configuration]]
  
  <<Anchor(Overview)>>
  
@@ -73, +74 @@

  
  [[#Top|Top]]
  
+ <<Anchor(ClusterConfig)>>
+ 
+ == Cluster Configuration ==
+ 
+ To configure a Cassandra cluster so that Hadoop can operate over its data, 
overlay a Hadoop cluster on your Cassandra nodes.  Run the Hadoop 
`namenode`/`jobtracker` on a separate server, then install a Hadoop 
`tasktracker` on each of your Cassandra nodes.  That allows the `jobtracker` 
to assign tasks to the Cassandra nodes that contain the data for those tasks.  
At least one node in your cluster also needs to be a `datanode`, because 
Hadoop uses HDFS as its distributed cache: it stores the jar dependencies for 
your job, static data (such as stop words for a word count), and similar 
files there.  This is a very small amount of data, but the Hadoop cluster 
needs it to run properly.
+ 
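As a rough illustration of the layout described above, the following Python sketch builds a host-to-daemon map. It is not part of Hadoop or Cassandra: the function `plan_roles`, its `datanode_count` parameter, and the host names are all hypothetical, chosen only to make the role split concrete.

```python
def plan_roles(master, cassandra_nodes, datanode_count=1):
    """Return {host: [daemon, ...]} for a Hadoop overlay on Cassandra nodes.

    Hypothetical helper: it only models which daemon runs where and does
    not install or configure anything.
    """
    if not cassandra_nodes:
        raise ValueError("need at least one Cassandra node")
    if master in cassandra_nodes:
        raise ValueError("namenode/jobtracker should run on a separate server")
    if not 1 <= datanode_count <= len(cassandra_nodes):
        raise ValueError("at least one node must also be a datanode")

    # The separate master server hosts the namenode and the jobtracker.
    roles = {master: ["namenode", "jobtracker"]}
    for index, host in enumerate(cassandra_nodes):
        # Every Cassandra node also runs a tasktracker.
        daemons = ["cassandra", "tasktracker"]
        # The first `datanode_count` nodes also run a datanode for HDFS.
        if index < datanode_count:
            daemons.append("datanode")
        roles[host] = daemons
    return roles


# Assumed host names, for illustration only.
layout = plan_roles("hadoop-master", ["cass1", "cass2", "cass3"])
for host, daemons in layout.items():
    print(host, daemons)
```

Raising `datanode_count` models running a `datanode` on more than the minimum single node.
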
+ Having a `tasktracker` on every node has two benefits: (1) you get data 
locality, and (2) your analytics engine scales with your data.
+ 
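The data-locality benefit can be sketched with a toy scheduler in Python. This is a simplified model, not the actual `jobtracker` algorithm: `assign_tasks`, the split names, and the host names are hypothetical, and the only assumption taken from Hadoop is that each input split reports the hosts that hold its data.

```python
from collections import defaultdict


def assign_tasks(splits, tasktrackers):
    """Assign each split to a tasktracker, preferring hosts that hold its data.

    splits: {split_id: [hosts holding the data for that split]}
    tasktrackers: hosts running a tasktracker
    Returns {host: [split_id, ...]}.
    """
    if not tasktrackers:
        raise ValueError("need at least one tasktracker")

    load = {host: 0 for host in tasktrackers}
    assignment = defaultdict(list)
    for split_id, data_hosts in splits.items():
        # Prefer tasktrackers co-located with the data (local reads).
        local = [host for host in data_hosts if host in load]
        # Fall back to any tasktracker, which means a remote read.
        candidates = local or list(load)
        # Pick the least-loaded candidate to spread the work.
        chosen = min(candidates, key=lambda host: load[host])
        assignment[chosen].append(split_id)
        load[chosen] += 1
    return dict(assignment)


# Assumed token ranges and hosts, for illustration only.
splits = {
    "range-a": ["cass1", "cass2"],
    "range-b": ["cass2", "cass3"],
    "range-c": ["cass3", "cass1"],
}
tasks = assign_tasks(splits, ["cass1", "cass2", "cass3"])
for host, split_ids in tasks.items():
    print(host, split_ids)
```

With a `tasktracker` on every Cassandra node, the fallback branch is never needed in this model, and adding a node adds both storage and a place to run tasks.
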
+ [[#Top|Top]]
+ 
