[Cassandra Wiki] Update of "FAQ" by JonathanEllis

Apache Wiki Mon, 14 Jun 2010 06:23:29 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.


The "FAQ" page has been changed by JonathanEllis.
The comment on this change is: mutations against a single key are atomic.
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=73&rev2=74

--------------------------------------------------

  = Frequently asked questions =
+  *
-  * [[#cant_listen_on_ip_any|Why can't I make Cassandra listen on 0.0.0.0 (all 
my addresses)?]]
+  [[#cant_listen_on_ip_any|Why can't I make Cassandra listen on 0.0.0.0 (all 
my addresses)?]]
+ 
+  *
-  * [[#ports|What ports does Cassandra use?]]
+  [[#ports|What ports does Cassandra use?]]
+ 
+  *
-  * [[#slows_down_after_lotso_inserts|Why does Cassandra slow down after doing 
a lot of inserts?]]
+  [[#slows_down_after_lotso_inserts|Why does Cassandra slow down after doing a 
lot of inserts?]]
+ 
+  *
-  * [[#existing_data_when_adding_new_nodes|What happens to existing data in my 
cluster when I add new nodes?]]
+  [[#existing_data_when_adding_new_nodes|What happens to existing data in my 
cluster when I add new nodes?]]
+ 
+  *
-  * [[#modify_cf_config|Can I add/remove/rename Column Families on a working 
cluster?]]
+  [[#modify_cf_config|Can I add/remove/rename Column Families on a working 
cluster?]]
+ 
+  *
-  * [[#node_clients_connect_to|Does it matter which node a Thrift client 
connects to?]]
+  [[#node_clients_connect_to|Does it matter which node a Thrift client 
connects to?]]
+ 
+  *
-  * [[#what_kind_of_hardware_should_i_use|What kind of hardware should I run 
Cassandra on?]]
+  [[#what_kind_of_hardware_should_i_use|What kind of hardware should I run 
Cassandra on?]]
+ 
+  *
-  * [[#architecture|What are SSTables and Memtables?]]
+  [[#architecture|What are SSTables and Memtables?]]
+ 
+  *
-  * [[#working_with_timeuuid_in_java|Why is it so hard to work with 
TimeUUIDType in Java?]]
+  [[#working_with_timeuuid_in_java|Why is it so hard to work with TimeUUIDType 
in Java?]]
+ 
+  *
-  * [[#i_deleted_what_gives|I delete data from Cassandra, but disk usage stays 
the same. What gives?]]
+  [[#i_deleted_what_gives|I delete data from Cassandra, but disk usage stays 
the same. What gives?]]
+ 
+  *
-  * [[#reads_slower_writes|Why are reads slower than writes?]]
+  [[#reads_slower_writes|Why are reads slower than writes?]]
+ 
+  *
-  * [[#cloned|Why does nodeprobe ring only show one entry, even though my 
nodes logged that they see each other joining the ring?]]
+  [[#cloned|Why does nodeprobe ring only show one entry, even though my nodes 
logged that they see each other joining the ring?]]
+ 
+  *
-  * [[#range_ghosts|Why do deleted keys show up during range scans?]]
+  [[#range_ghosts|Why do deleted keys show up during range scans?]]
+ 
+  *
-  * [[#change_replication|Can I change the ReplicationFactor on a live 
cluster?]]
+  [[#change_replication|Can I change the ReplicationFactor on a live cluster?]]
+ 
+  *
-  * [[#large_file_and_blob_storage|Can I store large files or BLOBs in 
Cassandra?]]
+  [[#large_file_and_blob_storage|Can I store large files or BLOBs in 
Cassandra?]]
+ 
+  *
-  * [[#jmx_localhost_refused|Nodetool says "Connection refused to host: 
127.0.1.1", for any remote host. What gives?]]
+  [[#jmx_localhost_refused|Nodetool says "Connection refused to host: 
127.0.1.1", for any remote host. What gives?]]
+ 
+  *
-  * [[#iter_world|How can I iterate over all the rows in a ColumnFamily?]]
+  [[#iter_world|How can I iterate over all the rows in a ColumnFamily?]]
+ 
+  *
-  * [[#no_keyspaces|Why were none of the keyspaces described in 
storage-conf.xml loaded?]]
+  [[#no_keyspaces|Why were none of the keyspaces described in storage-conf.xml 
loaded?]]
+ 
+  *
-  * [[#gui|Is there a GUI admin tool for Cassandra?]]
+  [[#gui|Is there a GUI admin tool for Cassandra?]]
+ 
+  *
-  * [[#a_long_is_exactly_8_bytes|Insert operation throws 
InvalidRequestException with message "A long is exactly 8 bytes"]]
+  [[#a_long_is_exactly_8_bytes|Insert operation throws InvalidRequestException 
with message "A long is exactly 8 bytes"]]
+ 
+  *
-  * [[#clustername_mismatch|Cassandra says "ClusterName mismatch: 
oldClusterName != newClusterName" and refuses to start]]
+  [[#clustername_mismatch|Cassandra says "ClusterName mismatch: oldClusterName 
!= newClusterName" and refuses to start]]
+ 
+  *
-  * [[#batch_mutate_atomic|Are batch_mutate operations atomic?]]
+  [[#batch_mutate_atomic|Are batch_mutate operations atomic?]]
+ 
  
  <<Anchor(cant_listen_on_ip_any)>>
  
@@ -80, +124 @@

  
   1. You can maintain a list of contact nodes (all or a subset of the nodes in 
the cluster), and configure your clients to choose among them.
   1. Use round-robin DNS and create a record that points to a set of contact 
nodes (recommended).
+  1.
-  1. Use the `get_string_property("token map")` RPC to obtain an 
up-to-date list of the nodes in the cluster and cycle through them.
+  Use the `get_string_property("token map")` RPC to obtain an up-to-date 
list of the nodes in the cluster and cycle through them.
+ 
   1. Deploy a load-balancer, proxy, etc.
  
  <<Anchor(what_kind_of_hardware_should_i_use)>>
@@ -203, +249 @@

  == Can I change the ReplicationFactor on a live cluster? ==
  Yes, but it will require restarting and running repair manually to change the 
replica count of existing data.
  
+  *
-  * Alter the ReplicationFactor for the desired keyspace(s) in the storage 
configuration on each node in the cluster.
+  Alter the ReplicationFactor for the desired keyspace(s) in the storage 
configuration on each node in the cluster.
+ 
   * Restart cassandra on each node in the cluster
  
  If you're reducing the ReplicationFactor:
@@ -221, +269 @@

  
   * The main limitation on column and super column size is that all the data 
for a single key and column must fit (on disk) on a single machine (node) in the 
cluster.  Because keys alone are used to determine the nodes responsible for 
replicating their data, the amount of data associated with a single key has 
this upper bound. This is an inherent limitation of the distribution model.
  
+  *
-  * When large columns are created and retrieved, that column's data is loaded 
into RAM, which can get resource intensive quickly.  Consider loading 200 
rows with columns that store 10 MB image files each into RAM.  That small 
result set would consume about 2 GB of RAM.  Clearly, as more and more large 
columns are loaded, RAM would start to get consumed quickly.  This can be 
worked around, but will take some upfront planning and testing to get a 
workable solution for most applications.  You can find more information 
regarding this behavior here: [[MemtableThresholds|memtables]], and a possible 
solution in 0.7 here: 
[[https://issues.apache.org/jira/browse/CASSANDRA-16|CASSANDRA-16]].
+  When large columns are created and retrieved, that column's data is loaded 
into RAM, which can get resource intensive quickly.  Consider loading 200 
rows with columns that store 10 MB image files each into RAM.  That small 
result set would consume about 2 GB of RAM.  Clearly, as more and more large 
columns are loaded, RAM would start to get consumed quickly.  This can be 
worked around, but will take some upfront planning and testing to get a 
workable solution for most applications.  You can find more information 
regarding this behavior here: [[MemtableThresholds|memtables]], and a possible 
solution in 0.7 here: 
[[https://issues.apache.org/jira/browse/CASSANDRA-16|CASSANDRA-16]].
  
+ 
+  *
-  * Please refer to the notes in the Cassandra limitations section for more 
information: [[CassandraLimitations|Cassandra Limitations]]
+  Please refer to the notes in the Cassandra limitations section for more 
information: [[CassandraLimitations|Cassandra Limitations]]
+ 
  
  <<Anchor(jmx_localhost_refused)>>
  
@@ -289, +341 @@

  <<Anchor(batch_mutate_atomic)>>
  
  == Are batch_mutate operations atomic? ==
- No.  [[API#batch_mutate|batch_mutate]] is a way to group many operations into 
a single call in order to save on the cost of network round-trips.  If 
`batch_mutate` fails in the middle of its list of mutations, no rollback occurs 
and the mutations that have already been applied stay applied. The client 
should typically retry the mutation.
+ As a special case, mutations against a single key are atomic, but more 
generally no.  [[API#batch_mutate|batch_mutate]] allows grouping operations on 
many keys into a single call in order to save on the cost of network 
round-trips.  If `batch_mutate` fails in the middle of its list of mutations, 
no rollback occurs and the mutations that have already been applied stay 
applied. The client should typically retry the `batch_mutate` operation.
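
For illustration only, here is a minimal sketch of the "retry the whole 
`batch_mutate`" advice above. It does not use the real Thrift client: 
`retry_batch`, `FlakyStore`, and the `(key, column, value)` tuple layout are 
hypothetical names invented for this example, and the sketch assumes that 
re-applying the same mutation is harmless (same key, column, and value).

```python
import time


def retry_batch(send_batch, mutations, attempts=3, delay_seconds=0.1):
    """Resend the entire batch until it succeeds or attempts run out.

    send_batch is any callable that applies a list of mutations and raises
    on failure (a hypothetical stand-in for a client's batch_mutate call).
    Returns the number of attempts that were needed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            send_batch(mutations)
            return attempt + 1
        except Exception as error:
            # No rollback happened: earlier mutations may already be applied.
            # Resending the full batch simply applies them again.
            last_error = error
            time.sleep(delay_seconds * (attempt + 1))
    raise last_error


class FlakyStore:
    """Hypothetical in-memory stand-in for a cluster; fails once mid-batch."""

    def __init__(self):
        self.rows = {}
        self.calls = 0

    def batch_mutate(self, mutations):
        self.calls += 1
        for index, (key, column, value) in enumerate(mutations):
            if self.calls == 1 and index == 1:
                # The first mutation stays applied; nothing is rolled back.
                raise IOError("simulated failure mid-batch")
            self.rows.setdefault(key, {})[column] = value
```

With `store = FlakyStore()`, calling 
`retry_batch(store.batch_mutate, [("k1", "c", "v1"), ("k2", "c", "v2")])` 
fails partway through the first attempt, leaving only `k1` written, and the 
second attempt rewrites `k1` and adds `k2`.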
