[Cassandra Wiki] Update of "StorageConfiguration" by JonHermes

Apache Wiki Tue, 24 Aug 2010 15:57:55 -0700

The "StorageConfiguration" page has been changed by JonHermes.
http://wiki.apache.org/cassandra/StorageConfiguration?action=diff&rev1=30&rev2=31

--------------------------------------------------

   * auto_bootstrap
  Set to 'true' to make new [non-seed] nodes automatically migrate the right 
data to themselves.  (If no InitialToken is specified, they will pick one such 
that they will get half the range of the most-loaded node.) If a node starts up 
without bootstrapping, it will mark itself bootstrapped so that you can't 
subsequently accidentally bootstrap a node with data on it.  (You can reset this 
by wiping your data and commitlog directories.)
  
- Off by default so that new clusters don't bootstrap immediately.  You should 
turn this on when you start adding new nodes to a cluster that already has data 
on it.
+ Default is: 'false', so that new clusters don't bootstrap immediately.  You 
should turn this on when you start adding new nodes to a cluster that already 
has data on it.
  
   * cluster_name
  The name of this cluster.  This is mainly used to prevent machines in one 
logical cluster from joining another.
@@ -45, +45 @@

  32
  
   * disk_access_mode
- auto, mmap, mmap_index_only, standard
+ The options are: 'auto', 'mmap', 'mmap_index_only', and 'standard'.
+ mmapped i/o is substantially faster, but only practical on a 64bit machine 
(which notably does not include EC2 "small" instances) or relatively small 
datasets.  "auto", the safe choice, will enable mmapping on a 64bit JVM.  Other 
values are "mmap", "mmap_index_only" (which may allow you to get part of the 
benefits of mmap on a 32bit machine by mmapping only index files) and 
"standard". (The buffer size settings that follow only apply to standard, 
non-mmapped i/o.)
+ 
+ Default is: 'auto'.
  
   * dynamic_snitch and endpoint_snitch
  !EndPointSnitch: Set this to a class that implements 
{{{IEndPointSnitch}}}, which determines whether two endpoints are in the same 
data center or on the same rack. Out of the box, Cassandra provides 
{{{org.apache.cassandra.locator.RackInferringSnitch}}}
@@ -58, +61 @@

  
  Dynamic Snitch is a boolean that controls whether the above snitch is wrapped 
with a dynamic snitch, which will monitor read latencies and avoid reading from 
hosts that have slowed.
  
+ Defaults are: 'org.apache.cassandra.locator.SimpleSnitch' and 'false'.
+ 
+  * listen_address
+ Commenting out this property leaves it up to 
{{{InetAddress.getLocalHost()}}}. This will always do the Right Thing *if* the 
node is properly configured (hostname, name resolution, etc), and the Right 
Thing is to use the address associated with the hostname (it might not be).  
+ 
+ Default is: 'localhost'. This must be changed for other nodes to contact this 
node.
+ 
   * memtable_flush_after_mins, memtable_operations_in_millions, and 
memtable_throughput_in_mb
- 60 0.3 64
+ 
+ Defaults are: '60' minutes, '0.3' million operations, and '64' MB, respectively.
  
   * partitioner
  Partitioner: any {{{IPartitioner}}} may be used, including your own as long 
as it is on the classpath.  Out of the box, Cassandra provides 
{{{org.apache.cassandra.dht.RandomPartitioner}}}, 
{{{org.apache.cassandra.dht.OrderPreservingPartitioner}}}, and 
{{{org.apache.cassandra.dht.CollatingOrderPreservingPartitioner}}}. 
(CollatingOPP collates according to EN,US rules, not naive byte ordering.  Use 
this as an example if you need locale-aware collation.) Range queries require 
using an order-preserving partitioner.
@@ -74, +85 @@

  
  With {{{OrderPreservingPartitioner}}} the keys themselves determine placement 
on the ring. One potential drawback of this approach is that if rows are 
inserted with sequential keys, all the write load will go to the same node.
  
+ Default is: 'org.apache.cassandra.dht.RandomPartitioner'. Manually assigning 
tokens is highly recommended to guarantee even load distribution.
+ 
   * seeds
  Never use a node's own address as a seed if you are bootstrapping it by 
setting auto_bootstrap to true!
  
   * thrift_framed_transport_size_in_mb
- 15 by default. Setting this to 0 is how to denote using unframed transport.
+ Setting this to '0' is how to denote using unframed (Buffered) transport.
+ 
+ Default is: '15' mb.
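
The partitioner note above recommends assigning tokens manually when using RandomPartitioner. A minimal sketch of one way to compute evenly spaced tokens follows. It assumes the RandomPartitioner token space runs from 0 to 2**127 and that every node should own an equal slice; the function name `balanced_tokens` is hypothetical and not part of Cassandra.

```python
# Assumption: RandomPartitioner tokens fall in the range [0, 2**127).
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return one evenly spaced token per node (hypothetical helper)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the tokens exact; floats would lose precision.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(balanced_tokens(4)):
        print(f"node {index}: {token}")
```

Each value would then be set as the initial token of one node before that node first starts, so that no node picks a token on its own.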
  
  == per-Keyspace Settings ==
   * replica_placement_strategy and replication_factor
@@ -86, +101 @@

  
  Note that the replication factor (RF) is the ''total'' number of nodes onto 
which the data will be placed.  So, a replication factor of 1 means that only 1 
node will have the data.  It does '''not''' mean that one ''other'' node will 
have the data.
  
+ Defaults are: 'org.apache.cassandra.locator.RackUnawareStrategy' and '1'. RF 
of at least 2 is highly recommended, keeping in mind that your effective number 
of nodes is N / RF.
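
The N / RF rule of thumb above can be made concrete with a small calculation. This is only an illustrative sketch: the function names are hypothetical, and it assumes data is spread evenly across the ring.

```python
def effective_nodes(node_count, replication_factor):
    """Number of nodes' worth of distinct data capacity: N / RF."""
    if node_count < 1 or replication_factor < 1:
        raise ValueError("node_count and replication_factor must be >= 1")
    return node_count / replication_factor


def fraction_per_node(node_count, replication_factor):
    """Share of the full data set each node stores, assuming even load."""
    return 1 / effective_nodes(node_count, replication_factor)


if __name__ == "__main__":
    # Six nodes at RF=3 hold as much distinct data as two nodes at RF=1.
    print(effective_nodes(6, 3))
    print(fraction_per_node(6, 3))
```

With RF equal to 1, the effective node count equals N, which matches the note that only one node holds each piece of data.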
+ 
  == per-ColumnFamily Settings ==
    * comment and name
- You can describe a ColumnFamily in plain text by setting this property.
+ You can describe a ColumnFamily in plain text by setting these properties.
  
    * compare_with
  The {{{CompareWith}}} attribute tells Cassandra how to sort the columns for 
slicing operations.  The default is {{{BytesType}}}, which is a straightforward 
lexical comparison of the bytes in each column. Other options are 
{{{AsciiType}}}, {{{UTF8Type}}}, {{{LexicalUUIDType}}}, {{{TimeUUIDType}}}, and 
{{{LongType}}}.  You can also specify the fully-qualified class name to a class 
of your choice extending {{{org.apache.cassandra.db.marshal.AbstractType}}}.
@@ -104, +121 @@

  (To get the closest approximation to 0.3-style {{{supercolumns}}}, you would 
use {{{CompareWith=UTF8Type CompareSubcolumnsWith=LongType}}}.)
  
    * gc_grace_seconds
+ 
    * keys_cached and rows_cached
+ 
    * preload_row_cache
+ 
    * read_repair_chance
+ 
    * default_validation_class
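
To illustrate why the compare_with choice described above matters, the sketch below sorts the same column names two ways. It is a Python analogy, not Cassandra code: it assumes {{{BytesType}}} orders names as unsigned bytes (which is how Python orders `bytes`) and that {{{LongType}}} names are 8-byte big-endian signed integers. The helper names are hypothetical.

```python
import struct


def pack_long(value):
    """Encode an integer as an 8-byte big-endian signed long."""
    return struct.pack(">q", value)


def unpack_long(raw):
    """Decode an 8-byte big-endian signed long."""
    return struct.unpack(">q", raw)[0]


def sort_like_bytes_type(names):
    # Plain lexical comparison of the raw bytes.
    return sorted(names)


def sort_like_long_type(names):
    # Compare by the numeric value each name encodes.
    return sorted(names, key=unpack_long)


names = [pack_long(v) for v in (256, -1, 1)]
as_bytes = [unpack_long(n) for n in sort_like_bytes_type(names)]
as_longs = [unpack_long(n) for n in sort_like_long_type(names)]

if __name__ == "__main__":
    print("byte order:", as_bytes)
    print("long order:", as_longs)
```

Under these assumptions the two orderings disagree as soon as a negative number appears, so a slice over numeric column names would return columns in a different order depending on the comparator.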
  
  == per-Column Settings ==
    * validation_class
+ 
    * index_type
  
  
+ The ControlPort setting is deprecated in 0.6 and can be safely removed from 
configuration.
- == Partitioner ==
- == Miscellaneous ==
- Time to wait for a reply from other nodes before failing the command
- 
- {{{
- <RpcTimeoutInMillis>5000</RpcTimeoutInMillis>
- }}}
- Size to allow commitlog to grow to before creating a new segment
- 
- {{{
- <CommitLogRotationThresholdInMB>128</CommitLogRotationThresholdInMB>
- }}}
- Local hosts and ports
- 
- Address to bind to and tell other nodes to connect to.  You _must_ change 
this if you want multiple nodes to be able to communicate!
- 
- Leaving it blank leaves it up to {{{InetAddress.getLocalHost()}}}. This will 
always do the Right Thing *if* the node is properly configured (hostname, name 
resolution, etc), and the Right Thing is to use the address associated with the 
hostname (it might not be).  The ControlPort setting is deprecated in 0.6 and 
can be safely removed from configuration.
  
  {{{
  <ListenAddress>localhost</ListenAddress>
@@ -154, +160 @@

  <ThriftFramedTransport>false</ThriftFramedTransport>
  }}}
  == Memory, Disk, and Performance ==
+ Access mode.  
- Access mode.  mmapped i/o is substantially faster, but only practical on a 
64bit machine (which notably does not include EC2 "small" instances) or 
relatively small datasets.  "auto", the safe choice, will enable mmapping on a 
64bit JVM.  Other values are "mmap", "mmap_index_only" (which may allow you to 
get part of the benefits of mmap on a 32bit machine by mmapping only index 
files) and "standard". (The buffer size settings that follow only apply to 
standard, non-mmapped i/o.)
- 
  {{{
  <DiskAccessMode>auto</DiskAccessMode>
  }}}
