Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "FAQ" page has been changed by JonathanEllis.
The comment on this change is: add "cloned," "range_ghosts" sections.
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=39&rev2=40

--------------------------------------------------

  * [[#working_with_timeuuid_in_java|Why is it so hard to work with TimeUUIDType in Java?]]
  * [[#i_deleted_what_gives|I delete data from Cassandra, but disk usage stays the same. What gives?]]
  * [[#reads_slower_writes|Why are reads slower than writes?]]
+ * [[#cloned|Why does nodeprobe ring only show one entry, even though my nodes logged that they see each other joining the ring?]]
+ * [[#range_ghosts|Why do deleted keys show up during range scans?]]

  <<Anchor(cant_listen_on_ip_any)>>
  == Why can't I make Cassandra listen on 0.0.0.0 (all my addresses)? ==
@@ -35, +37 @@

  This is a symptom of having Cassandra's memtable thresholds too high, resulting in a storm of GC operations as the JVM frantically tries to free enough heap to continue to operate. You can increase the amount of memory the JVM uses, or decrease the insert threshold before Cassandra flushes its memtables. See MemtableThresholds for details.
+
+ Setting your cache sizes too large can result in memory pressure.

  <<Anchor(existing_data_when_adding_new_nodes)>>
  == What happens to existing data in my cluster when I add new nodes? ==
@@ -74, +78 @@

  <<Anchor(architecture)>>
  == What are SSTables and Memtables? ==
+ See [[MemtableSSTable]] and [[MemtableThresholds]].
- A Memtable is Cassandra's in-memory representation of key/value pairs
- before the data gets flushed to disk as an SSTable. An SSTable
- (terminology borrowed from Google) stands for Sorted Strings Table and
- is a file of key/value string pairs, sorted by keys.
- There are important Memtable parameters described in [[MemtableThresholds|MemtableThresholds]].
-
  <<Anchor(working_with_timeuuid_in_java)>>
  == Why is it so hard to work with TimeUUIDType in Java? ==
@@ -171, +170 @@

  The downside is that on a read, Cassandra has to (potentially) merge row fragments from multiple sstables on disk. We think this is a tradeoff worth making, first because scaling writes has always been harder than scaling reads, and second because as your data corpus grows Cassandra's read disadvantage narrows vs b-tree systems that have to do multiple seeks against a large index. See MemtableSSTable for more details.

+ <<Anchor(cloned)>>
+ == Why does nodeprobe ring only show one entry, even though my nodes logged that they see each other joining the ring? ==
+ This happens when you have the same token assigned to each node. Don't do that.
+
+ Most often this bites people who deploy by installing Cassandra on a VM (especially when using the Debian package, which auto-starts Cassandra after installation, thus generating and saving a token), then cloning that VM to other nodes.
+
+ The easiest fix is to wipe the data and commitlog directories, thus making sure that each node will generate a random token on the next restart.
+
+ <<Anchor(range_ghosts)>>
+ == Why do deleted keys show up during range scans? ==
+ Because get_range_slice says, "apply this predicate to the range of rows given," meaning, if the predicate result is empty, we have to include an empty result for that row key. It is perfectly valid to perform such a query returning empty column lists for some or all keys, even if no deletions have been performed.
+
+ So to special case leaving out result entries for deletions, we would have to check the entire rest of the row to make sure there is no undeleted data anywhere else either (in which case leaving the key out would be an error).
+
+ This is what we used to do with the old get_key_range method, but the performance hit turned out to be unacceptable.
+
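
As a rough illustration of the range-scan answer above: a client that does not want to see keys with empty column lists can filter them out itself. The sketch below is Python and assumes the range query result has already been converted into (key, columns) pairs. The helper name drop_range_ghosts and the sample data are hypothetical and are not part of any Cassandra client API.

```python
def drop_range_ghosts(rows):
    """Return only the rows whose column list is non-empty.

    `rows` is assumed to be an iterable of (key, columns) pairs, where
    `columns` is the (possibly empty) list of columns that matched the
    predicate for that key.
    """
    return [(key, columns) for key, columns in rows if columns]


# Hypothetical result of a range scan: "bob" comes back with no columns,
# for example because the row was deleted.
rows = [
    ("alice", [("email", "alice@example.com")]),
    ("bob", []),
    ("carol", [("email", "carol@example.com")]),
]

live_rows = drop_range_ghosts(rows)
print([key for key, _ in live_rows])
```

Note that this filter also hides live rows that simply have no columns matching the predicate. That is consistent with the explanation above: an empty column list does not by itself show that a deletion took place.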
