Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "FAQ" page has been changed by JonathanEllis: http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=150&rev2=151 Comment: remove some obsolete questions and add Partitioners link * [[#ports|What ports does Cassandra use?]] * [[#slows_down_after_lotso_inserts|Why does Cassandra slow down after doing a lot of inserts?]] * [[#existing_data_when_adding_new_nodes|What happens to existing data in my cluster when I add new nodes?]] - * [[#modify_cf_config|Can I add/remove/rename Column Families on a working cluster?]] * [[#node_clients_connect_to|Does it matter which node a Thrift or higher-level client connects to?]] * [[#what_kind_of_hardware_should_i_use|What kind of hardware should I run Cassandra on?]] * [[#architecture|What are SSTables and Memtables?]] * [[#working_with_timeuuid_in_java|Why is it so hard to work with TimeUUIDType in Java?]] * [[#i_deleted_what_gives|I delete data from Cassandra, but disk usage stays the same. What gives?]] - * [[#reads_slower_writes|Why are reads slower than writes?]] * [[#cloned|Why does nodeprobe ring only show one entry, even though my nodes logged that they see each other joining the ring?]] * [[#range_ghosts|Why do deleted keys show up during range scans?]] * [[#change_replication|Can I change the ReplicationFactor on a live cluster?]] @@ -20, +18 @@ * [[#iter_world|How can I iterate over all the rows in a ColumnFamily?]] * [[#no_keyspaces|Why were none of the keyspaces described in storage-conf.xml loaded?]] * [[#gui|Is there a GUI admin tool for Cassandra?]] - * [[#a_long_is_exactly_8_bytes|Insert operation throws InvalidRequestException with message "A long is exactly 8 bytes"]] * [[#clustername_mismatch|Cassandra says "ClusterName mismatch: oldClusterName != newClusterName" and refuses to start]] * [[#batch_mutate_atomic|Are batch_mutate operations atomic?]] * [[#hadoop_support|Is Hadoop (i.e. Map/Reduce, Pig, Hive) supported?]] @@ -33, +30 @@ * [[#bulkloading|How do I bulk load data into Cassandra?]] * [[#range_rp|Why aren't range slices/sequential scans giving me the expected results?]] * [[#unsubscribe|How do I unsubscribe from the email list?]] - * [[#cleaning_compacted_tables|I compacted, so why did space used not decrease?]] * [[#mmap|Why does top report that Cassandra is using a lot more memory than the Java heap max?]] * [[#jna|I'm getting java.io.IOException: Cannot run program "ln" when trying to snapshot or update a keyspace]] * [[#replicaplacement|How does Cassandra decide which nodes have what data?]] * [[#cachehitrateunits|I have a row or key cache hit rate of 0.XX123456789. Is that XX% or 0.XX% ?]] - * [[#bigcommitlog|Commit Log gets very big. Cassandra does not delete "old" commit logs. Why?]] * [[#seed|What are seeds?]] * [[#seed_spof|Does single seed mean single point of failure?]] * [[#jconsole_array_arg|Why can't I call jmx method X on jconsole? (ex. getNaturalEndpoints)]] @@ -48, +43 @@ * [[#dropped_messages|Why do I see "... messages dropped.." in the logs?]] * [[#cli_keys|Why does the 0.8 cli not assume keys are strings anymore?]] * [[#memlock|Cassandra dies with "java.lang.OutOfMemoryError: Map failed"]] + * [[#opp|Why should I avoid order-preserving partitioners]] <<Anchor(cant_listen_on_ip_any)>> @@ -84, +80 @@ Unless you know precisely what you're doing and are aware of how the Cassandra internals work you should never introduce a new empty node to your cluster and have autoboostrap disabled. 
In version 0.7, under write load, this will cause writes to be sent to the new node before the schema arrives from another member of the cluster. It would also indicate to clients that the new node is responsible for servicing reads for data that it definitely doesn't have. In Cassandra 0.4 and below, it is recommended that you manually specify a value for "InitialToken" in the config file of a new node.
-
- <<Anchor(modify_cf_config)>>
-
- == Can I add/remove/rename Column Families on a working cluster? ==
- Yes, but it's important that you do it correctly. For Cassandra 0.7 and newer, use cassandra-cli.
-
- For Cassandra versions before 0.7:
-
-  1. Empty the commitlog with "nodetool drain."
-  1. Shut down Cassandra and verify that there is no remaining data in the commitlog.
-  1. Delete the sstable files (-Data.db, -Index.db, and -Filter.db) for any CFs removed, and rename the files for any CFs that were renamed.
-  1. Make the necessary changes to your storage-conf.xml.
-  1. Start Cassandra back up and your edits should take effect.
-
- ''see also: [[https://issues.apache.org/jira/browse/CASSANDRA-44|CASSANDRA-44]]''

<<Anchor(node_clients_connect_to)>>
@@ -222, +203 @@

== I delete data from Cassandra, but disk usage stays the same. What gives? ==
Data you write to Cassandra gets persisted to SSTables. Since SSTables are immutable, the data can't actually be removed when you perform a delete; instead, a marker (also called a "tombstone") is written to indicate the value's new status. Never fear, though: on the first compaction that occurs after ''GCGraceSeconds'' (hint: storage-conf.xml) has expired, the data will be expunged completely and the corresponding disk space recovered. See DistributedDeletes for more detail.
- <<Anchor(reads_slower_writes)>>
-
- == Why are reads slower than writes? ==
- Unlike all major relational databases and some NoSQL systems, Cassandra does not use b-trees and in-place updates on disk. Instead, it uses an sstable/memtable model like Bigtable's: writes to each ColumnFamily are grouped together in an in-memory structure before being flushed (sorted and written to disk). This means that writes cost no random I/O, compared to a b-tree system which not only has to seek to the data location to overwrite it, but also may have to seek to read different levels of the index if it outgrows the disk cache!
-
- The downside is that on a read, Cassandra has to (potentially) merge row fragments from multiple sstables on disk. We think this is a tradeoff worth making, first because scaling writes has always been harder than scaling reads, and second because as your data corpus grows, Cassandra's read disadvantage narrows vs. b-tree systems that have to do multiple seeks against a large index. See MemtableSSTable for more details.
-
<<Anchor(cloned)>>

== Why does nodeprobe ring only show one entry, even though my nodes logged that they see each other joining the ring? ==
@@ -310, +284 @@

  * [[https://github.com/sebgiroux/Cassandra-Cluster-Admin|Cassandra Cluster Admin]], a PHP-based web UI.
  * [[http://toadforcloud.com | Toad for Cloud Databases]], a desktop application and Eclipse plugin which support Cassandra.
- <<Anchor(a_long_is_exactly_8_bytes)>>
-
- == Insert operation throws InvalidRequestException with message "A long is exactly 8 bytes" ==
- You are probably using the !LongType comparator (column sorter) in your column family. !LongType assumes that the numbers stored in column names are exactly 64 bits (8 bytes) long and in big-endian format.
- Example code showing how to pack a PHP integer for storage in Cassandra, and how to unpack it again:
-
- {{{
- /**
-  * Takes a PHP integer and packs it into a 64-bit (8 byte) big-endian binary representation.
-  * @param $x integer
-  * @return string eight-byte binary representation of the integer in big-endian order.
-  */
- public static function pack_longtype($x) {
-     return pack('C8', ($x >> 56) & 0xff, ($x >> 48) & 0xff, ($x >> 40) & 0xff, ($x >> 32) & 0xff,
-         ($x >> 24) & 0xff, ($x >> 16) & 0xff, ($x >> 8) & 0xff, $x & 0xff);
- }
-
- /**
-  * Takes an eight-byte big-endian binary representation of an integer and unpacks it into a PHP integer.
-  * @param $x string
-  * @return integer
-  */
- public static function unpack_longtype($x) {
-     $a = unpack('C8', $x);
-     return ($a[1] << 56) + ($a[2] << 48) + ($a[3] << 40) + ($a[4] << 32) + ($a[5] << 24) + ($a[6] << 16) + ($a[7] << 8) + $a[8];
- }
- }}}

<<Anchor(clustername_mismatch)>>

== Cassandra says "ClusterName mismatch: oldClusterName != newClusterName" and refuses to start ==
@@ -371, +319 @@

<<Anchor(using_cassandra)>>

== Who is using Cassandra and for what? ==
- For information on who is using Cassandra and what they are using it for, see CassandraUsers.
+ See CassandraUsers.

<<Anchor(what_about_the_obdc)>>
@@ -411, +359 @@

== How do I unsubscribe from the email list? ==
Send an email to [email protected]
-
- <<Anchor(cleaning_compacted_tables)>>
-
- == I compacted, so why did space used not decrease? ==
- SSTables that are obsoleted by a compaction are deleted asynchronously when the JVM performs a GC. You can force a GC from jconsole if necessary, but Cassandra will force one itself if it detects that it is low on space. A compaction marker is also added to obsolete sstables so they can be deleted on startup if the server does not perform a GC before being restarted. Read more on this subject [[http://wiki.apache.org/cassandra/MemtableSSTable|here]].

<<Anchor(mmap)>>
@@ -449, +392 @@

== I have a row or key cache hit rate of 0.XX123456789 reported by JMX. Is that XX% or 0.XX% ? ==
XX%
- <<Anchor(bigcommitlog)>>
-
- == Commit Log gets very big. Cassandra does not delete "old" commit logs. Why? ==
- You probably have one or more Column Families with very low throughput. These will typically not be flushed by crossing the throughput or operations thresholds, causing old commit segments to be retained until the memtable_flush_after_min threshold has been crossed. The default value for this threshold is 60 minutes; it may be decreased via cassandra-cli with:
-
- {{{ update column family XXX with memtable_flush_after=YY; }}}
-
- where YY is a number of minutes.
-
<<Anchor(seed)>>

== What are seeds? ==
@@ -486, +420 @@

<<Anchor(jconsole_array_arg)>>

== Why can't I call jmx method X on jconsole? (ex. getNaturalEndpoints) ==
- Some of JMX operations can't be called with jconsole because the buttons are inactive for them. Jconsole doesn't support array argument, so operations which need array as arugument can't be invoked on jconsole. You need to write a JMX client to call such operations or need array capable JMX monitoring tool.
+ Some JMX operations can't be invoked from jconsole because their buttons are inactive. Jconsole doesn't support array arguments, so operations that take an array argument can't be invoked from jconsole. You need to write a JMX client to call such operations, or use an array-capable JMX monitoring tool.
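For illustration, here is a minimal sketch of such a JMX client in Java. It assumes the StorageService MBean is registered as org.apache.cassandra.db:type=StorageService, that JMX is listening on the default port 7199, and that getNaturalEndpoints takes (keyspace, column family, key) string arguments; the keyspace, column family, and key names are placeholders, and the operation's exact signature and the JMX port vary between Cassandra releases, so check StorageServiceMBean for your version before relying on this:

{{{
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class NaturalEndpoints {
    public static void main(String[] args) throws Exception {
        // Connect to Cassandra's JMX port (7199 by default on recent releases).
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName storageService =
                new ObjectName("org.apache.cassandra.db:type=StorageService");

            // invoke() lets us pass arguments that jconsole's UI cannot build.
            // The operation name and signature below are assumptions; verify them
            // against StorageServiceMBean in your Cassandra version.
            Object result = mbs.invoke(
                storageService,
                "getNaturalEndpoints",
                new Object[] { "MyKeyspace", "MyColumnFamily", "mykey" },
                new String[] { "java.lang.String", "java.lang.String", "java.lang.String" });

            // The result is expected to be a List of endpoint addresses.
            for (Object endpoint : (List<?>) result) {
                System.out.println(endpoint);
            }
        } finally {
            connector.close();
        }
    }
}
}}}

The same MBeanServerConnection.invoke() pattern works for any other operation whose argument types jconsole cannot construct.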
<<Anchor(max_key_size)>>
@@ -516, +450 @@

<<Anchor(schema_disagreement)>>

== What are schema disagreement errors and how do I fix them? ==
- Cassandra schema updates [[LiveSchemaUpdates|assume that schema changes are done one-at-a-time]]. If you make multiple changes at the same time, you can cause some nodes to end up with a different schema, than others. (Before 0.7.6, this can also be caused by cluster system clocks being substantially out of sync with each other.)
+ Prior to Cassandra 1.1 and 1.2, Cassandra schema updates [[LiveSchemaUpdates|assume that schema changes are done one-at-a-time]]. If you make multiple changes at the same time, you can cause some nodes to end up with a different schema than others. (Before 0.7.6, this can also be caused by cluster system clocks being substantially out of sync with each other.)

To fix schema disagreements, you need to force the disagreeing nodes to rebuild their schema. Here's how:
@@ -577, +511 @@

== Cassandra dies with "java.lang.OutOfMemoryError: Map failed" ==
'''IF''' Cassandra is dying specifically with the "Map failed" message, it means the OS is denying Java the ability to lock more memory. On Linux, this typically means memlock is limited. Check /proc/<pid of cassandra>/limits to verify this and raise it (e.g., via ulimit in bash). You may also need to increase vm.max_map_count. Note that the Debian and Red Hat packages handle this for you automatically.

+ <<Anchor(opp)>>
+ == Why should I avoid order-preserving partitioners? ==
+ See [[Partitioners]].
+
