[Cassandra Wiki] Update of "StorageConfiguration" by tuxracer69

Apache Wiki Fri, 13 Nov 2009 07:42:03 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.


The "StorageConfiguration" page has been changed by tuxracer69.
http://wiki.apache.org/cassandra/StorageConfiguration?action=diff&rev1=1&rev2=2

--------------------------------------------------

  }}}
  
  == Keyspaces and ColumnFamilies ==
- Keyspaces and !ColumnFamilies: A !ColumnFamily is the Cassandra concept 
closest to a relational table.  !Keyspaces are separate groups of 
!ColumnFamilies.  Except in very unusual circumstances you will have one 
Keyspace per application.
+ Keyspaces and {{{ColumnFamilies}}}: A {{{ColumnFamily}}} is the Cassandra 
concept closest to a relational table.  {{{Keyspaces}}} are separate groups of 
{{{ColumnFamilies}}}.  Except in very unusual circumstances you will have one 
Keyspace per application.
  
  There is an implicit keyspace named 'system' for Cassandra internals.
  
@@ -21, +21 @@

   <Keyspace Name="Keyspace1">
  }}}
  
- The !CompareWith attribute tells Cassandra how to sort the columns for 
slicing operations.  The default is !BytesType, which is a straightforward 
lexical comparison of the bytes in each column. Other options are !AsciiType, 
!UTF8Type, !LexicalUUIDType, !TimeUUIDType, and !LongType.  You can also 
specify the fully-qualified class name to a class of your choice extending 
org.apache.cassandra.db.marshal.AbstractType.
+ The {{{CompareWith}}} attribute tells Cassandra how to sort the columns for 
slicing operations.  The default is {{{BytesType}}}, which is a straightforward 
lexical comparison of the bytes in each column. Other options are 
{{{AsciiType}}}, {{{UTF8Type}}}, {{{LexicalUUIDType}}}, {{{TimeUUIDType}}}, and 
{{{LongType}}}.  You can also specify the fully-qualified class name to a class 
of your choice extending {{{org.apache.cassandra.db.marshal.AbstractType}}}.
  
- !SuperColumns have a similar !CompareSubcolumnsWith attribute.
+  * {{{SuperColumns}}} have a similar {{{CompareSubcolumnsWith}}} attribute.
+  * {{{BytesType}}}: Simple sort by byte value.  No validation is performed. 
+  * {{{AsciiType}}}: Like {{{BytesType}}}, but validates that the input can be 
parsed as US-ASCII.
+  * {{{UTF8Type}}}: A string encoded as UTF8 
+  * {{{LongType}}}: A 64bit long 
+  * {{{LexicalUUIDType}}}: A 128bit UUID, compared lexically (by byte value) 
+  * {{{TimeUUIDType}}}: a 128bit version 1 UUID, compared by timestamp
  
- BytesType: Simple sort by byte value.  No validation is performed. 
!AsciiType: Like !BytesType, but validates that the input can be parsed as 
US-ASCII.
- 
- UTF8Type: A string encoded as UTF8 !LongType: A 64bit long !LexicalUUIDType: 
A 128bit UUID, compared lexically (by byte value) T!imeUUIDType: a 128bit 
version 1 !UUID, compared by !timestamp
- 
- (To get the closest approximation to 0.3-style !supercolumns, you would use 
!CompareWith=UTF8Type !CompareSubcolumnsWith=!LongType.)
+ (To get the closest approximation to 0.3-style {{{supercolumns}}}, you would 
use {{{CompareWith=UTF8Type CompareSubcolumnsWith=LongType}}}.)
  
- If !FlushPeriodInMinutes is configured and positive, it will be flushed to 
disk with that period whether it is dirty or not.  This is intended for 
lightly-used !columnfamilies so that they do not prevent !commitlog segments 
from being purged.
+ If {{{FlushPeriodInMinutes}}} is configured and positive, it will be flushed 
to disk with that period whether it is dirty or not.  This is intended for 
lightly-used {{{columnfamilies}}} so that they do not prevent commitlog 
segments from being purged.
  
  {{{
  <ColumnFamily CompareWith="BytesType"
-  Name="Standard1"
+        Name="Standard1"
-   FlushPeriodInMinutes="60"/>
+        FlushPeriodInMinutes="60"/>
-  <ColumnFamily CompareWith="UTF8Type" Name="Standard2"/> <ColumnFamily 
CompareWith="TimeUUIDType" Name="StandardByUUID1"/> <ColumnFamily 
ColumnType="Super"
- CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" Name="Super1"/>
- </Keyspace>
- </Keyspaces>
+ <ColumnFamily CompareWith="UTF8Type" 
+        Name="Standard2"/> 
+ <ColumnFamily CompareWith="TimeUUIDType" 
+        Name="StandardByUUID1"/> 
+ <ColumnFamily ColumnType="Super"
+        CompareWith="UTF8Type" 
+        CompareSubcolumnsWith="UTF8Type" 
+        Name="Super1"/>
  }}}
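
The difference between the comparators described above can be illustrated with a small sketch. It is not Cassandra code: it assumes that a {{{LongType}}} column name is stored as an 8-byte big-endian signed long, and the helper names ({{{long_name}}}, {{{decode}}}) are hypothetical. It shows why the same column names slice in a different order under a byte-wise comparison than under a numeric one.

```python
import struct

def long_name(value):
    """Encode an integer as an 8-byte big-endian signed long (assumed layout)."""
    return struct.pack(">q", value)

def decode(name):
    """Decode an 8-byte column name back into an integer."""
    return struct.unpack(">q", name)[0]

names = [long_name(v) for v in (10, -1, 2, 256)]

# BytesType-style ordering: compare the raw bytes of each name.
bytes_order = [decode(n) for n in sorted(names)]

# LongType-style ordering: compare the decoded 64-bit values.
long_order = [decode(n) for n in sorted(names, key=decode)]

print("byte-wise:", bytes_order)
print("numeric:  ", long_order)
```

Under the byte-wise comparison the negative value sorts last, because its leading byte is 0xff; under the numeric comparison it sorts first.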
  
  == Partitioner ==
- Partitioner: any !IPartitioner may be used, including your own as long as it 
is on the !classpath.  Out of the box, Cassandra provides 
org.apache.cassandra.dht.RandomPartitioner, 
org.apache.cassandra.dht.OrderPreservingPartitioner, and 
org.apache.cassandra.dht.CollatingOrderPreservingPartitioner. (CollatingOPP 
collates according to EN,US rules, not naive byte ordering.  Use this as an 
example if you need locale-aware collation.) Range queries require using an 
order-preserving partitioner.
+ Partitioner: any {{{IPartitioner}}} may be used, including your own as long 
as it is on the classpath.  Out of the box, Cassandra provides 
{{{org.apache.cassandra.dht.RandomPartitioner}}}, 
{{{org.apache.cassandra.dht.OrderPreservingPartitioner}}}, and 
{{{org.apache.cassandra.dht.CollatingOrderPreservingPartitioner}}}. 
(CollatingOPP collates according to EN,US rules, not naive byte ordering.  Use 
this as an example if you need locale-aware collation.) Range queries require 
using an order-preserving partitioner.
  
  Achtung!  Changing this parameter requires wiping your data directories, 
since the partitioner can modify the !sstable on-disk format.
  
@@ -56, +62 @@

  
  If you are using an order-preserving partitioner and you know your key 
distribution, you can specify the token for this node to use. (Keys are sent to 
the node with the "closest" token, so distributing your tokens equally along 
the key distribution space will spread keys evenly across your cluster.)  This 
setting is only checked the first time a node is started.
  
- This can also be useful with RandomPartitioner to force equal spacing of 
tokens around the hash space, especially for clusters with a small number of 
nodes.
+ This can also be useful with {{{RandomPartitioner}}} to force equal spacing 
of tokens around the hash space, especially for clusters with a small number of 
nodes.
  
  {{{
  <InitialToken></InitialToken>
  }}}
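
As a rough sketch of the equal-spacing idea for {{{RandomPartitioner}}}, the snippet below computes one token per node. It assumes the hash space runs from 0 to 2^127, which is not stated on this page; the function name {{{evenly_spaced_tokens}}} and the {{{ring_size}}} parameter are hypothetical, so adjust them to your partitioner.

```python
def evenly_spaced_tokens(node_count, ring_size=2**127):
    """Return one token per node, spaced equally around the hash ring.

    ring_size is an assumption about the partitioner's token range.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the tokens exact for very large ring sizes.
    return [i * ring_size // node_count for i in range(node_count)]

# Print the InitialToken element each node of a 4-node cluster would use.
for index, token in enumerate(evenly_spaced_tokens(4)):
    print(f"node {index}: <InitialToken>{token}</InitialToken>")
```

Each node gets its own value, and because the setting is only checked the first time a node is started, it has to be in place before that first start.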
  
  == EndPointSnitch ==
- !EndPointSnitch: Set this to a class that implements !IEndPointSnitch, 
which determines whether two endpoints are in the same data center or on the 
same rack. Out of the box, Cassandra provides 
org.apache.cassandra.locator.EndPointSnitch.
+ !EndPointSnitch: Set this to a class that implements 
{{{IEndPointSnitch}}}, which determines whether two endpoints are in the same 
data center or on the same rack. Out of the box, Cassandra provides 
{{{org.apache.cassandra.locator.EndPointSnitch}}}.
  
  {{{
  <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
  }}}
  
  == ReplicaPlacementStrategy ==
- Strategy: Setting this to the class that implements IReplicaPlacementStrategy 
will change the way the node picker works. Out of the box, Cassandra provides 
org.apache.cassandra.locator.RackUnawareStrategy and 
org.apache.cassandra.locator.RackAwareStrategy (place one replica in a 
different datacenter, and the others on different racks in the same one.)
+ Strategy: Setting this to the class that implements 
{{{IReplicaPlacementStrategy}}} will change the way the node picker works. Out 
of the box, Cassandra provides 
{{{org.apache.cassandra.locator.RackUnawareStrategy}}} and 
{{{org.apache.cassandra.locator.RackAwareStrategy}}} (place one replica in a 
different datacenter, and the others on different racks in the same one.)
  
  {{{
  
<ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
@@ -83, +89 @@

  }}}
  
  == Directories ==
- Directories: Specify where Cassandra should store different data on disk.  
Keep the data disks and the CommitLog disks separate for best performance
+ Directories: Specify where Cassandra should store different data on disk.  
Keep the data disks and the {{{CommitLog}}} disks separate for best 
performance. See also [[FAQ#what_kind_of_hardware_should_i_use|what kind of 
hardware should I use?]]
  
  {{{
- <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory> 
<DataFileDirectories>
+ <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory> 
+ <DataFileDirectories>
- <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
+       <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
  </DataFileDirectories> 
  <CalloutLocation>/var/lib/cassandra/callouts</CalloutLocation> 
<BootstrapFileDirectory>/var/lib/cassandra/bootstrap</BootstrapFileDirectory> 
<StagingFileDirectory>/var/lib/cassandra/staging</StagingFileDirectory>
  }}}
@@ -118, +125 @@

  
  Address to bind to and tell other nodes to connect to.  You _must_ change 
this if you want multiple nodes to be able to communicate!
  
- Leaving it blank leaves it up to InetAddress.getLocalHost(). This will always 
do the Right Thing *if* the node is properly configured (hostname, name 
resolution, etc), and the Right Thing is to use the address associated with the 
hostname (it might not be).
+ Leaving it blank leaves it up to {{{InetAddress.getLocalHost()}}}. This will 
always do the Right Thing *if* the node is properly configured (hostname, name 
resolution, etc), and the Right Thing is to use the address associated with the 
hostname (it might not be).
  
  {{{
  <ListenAddress>localhost</ListenAddress> 
@@ -128, +135 @@

  <ControlPort>7001</ControlPort>
  }}}
  
- The address to bind the Thrift RPC service to. Unlike ListenAddress above, 
you *can* specify 0.0.0.0 here if you want Thrift to listen on all interfaces.
+ The address to bind the Thrift RPC service to. Unlike {{{ListenAddress}}} 
above, you *can* specify {{{0.0.0.0}}} here if you want Thrift to listen on all 
interfaces.
  
- Leaving this blank has the same effect it does for ListenAddress, (i.e. it 
will be based on the configured hostname of the node).
+ Leaving this blank has the same effect it does for {{{ListenAddress}}}, (i.e. 
it will be based on the configured hostname of the node).
  
  {{{
  <ThriftAddress>localhost</ThriftAddress> 
@@ -164, +171 @@

  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
  }}}
  
- The maximum amount of data to store in memory per !ColumnFamily before 
flushing to disk.  Note: There is one memtable per column family, and  this 
threshold is based solely on the amount of data stored, not actual heap memory 
usage (there is some overhead in indexing the columns).
+ The maximum amount of data to store in memory per !ColumnFamily before 
flushing to disk.  Note: There is one memtable per column family, and  this 
threshold is based solely on the amount of data stored, not actual heap memory 
usage (there is some overhead in indexing the columns). See also 
[[MemtableThresholds|MemtableThresholds]].
  
  {{{
  <MemtableSizeInMB>64</MemtableSizeInMB>
- }}
+ }}}
  
- The maximum number of columns in millions to store in memory per ColumnFamily 
before flushing to disk.  This is also a per-memtable setting.  Use with 
MemtableSizeInMB to tune memory usage.
+ The maximum number of columns in millions to store in memory per ColumnFamily 
before flushing to disk.  This is also a per-memtable setting.  Use with 
{{{MemtableSizeInMB}}} to tune memory usage.
  
  {{{
  <MemtableObjectCountInMillions>0.1</MemtableObjectCountInMillions>
  }}}
+ 
  Unlike most systems, in Cassandra writes are faster than reads, so you can 
afford more of those in parallel.  A good rule of thumb is 2 concurrent reads 
per processor core.  Increase ConcurrentWrites to the number of clients writing 
at once if you enable CommitLogSync + CommitLogSyncDelay.
  
  {{{
@@ -183, +191 @@

  
  !CommitLogSync may be either "periodic" or "batch."  When in batch mode, 
Cassandra won't ack writes until the commit log has been !fsynced to disk.  It 
will wait up to !CommitLogSyncBatchWindowInMS milliseconds for other writes, 
before performing the sync.
  
- This is less necessary in Cassandra than in traditional databases since 
replication reduces the odds of losing data from a failure after writing the 
log entry but before it actually reaches the disk. So the other option is 
"periodic," where writes may be acked immediately and the CommitLog is simply 
synced every CommitLogSyncPeriodInMS milliseconds.
+ This is less necessary in Cassandra than in traditional databases since 
replication reduces the odds of losing data from a failure after writing the 
log entry but before it actually reaches the disk. So the other option is 
"periodic," where writes may be acked immediately and the CommitLog is simply 
synced every {{{CommitLogSyncPeriodInMS}}} milliseconds.
  
  {{{
  <CommitLogSync>periodic</CommitLogSync>
@@ -203, +211 @@

  {{{
  <GCGraceSeconds>864000</GCGraceSeconds>
  }}}
- Number of threads to run when flushing memtables to disk.  Set this to the 
number of disks you physically have in your machine allocated for DataDirectory 
* 2.  If you are planning to use the Binary Memtable, it is recommended to 
increase the max threads to maintain a higher quality of service while under 
load when normal memtables are flushing to disk.
+ Number of threads to run when flushing memtables to disk.  Set this to the 
number of disks you physically have in your machine allocated for 
{{{DataDirectory * 2}}}.  If you are planning to use the Binary Memtable, it is 
recommended to increase the max threads to maintain a higher quality of service 
while under load when normal memtables are flushing to disk.
  
  {{{
  <FlushMinThreads>1</FlushMinThreads> <FlushMaxThreads>1</FlushMaxThreads>
