Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "StorageConfiguration" page has been changed by JonHermes.
http://wiki.apache.org/cassandra/StorageConfiguration?action=diff&rev1=32&rev2=33

--------------------------------------------------
  Default is: 'localhost'. This must be changed for other nodes to contact this node.

   * memtable_flush_after_mins, memtable_operations_in_millions, and memtable_throughput_in_mb
+ The maximum time to leave a dirty memtable unflushed. (While any affected column families have unflushed data from a commit log segment, that segment cannot be deleted.) This needs to be large enough that it won't cause a flush storm of all your memtables flushing at once because none has hit the size or count thresholds yet. For production, a larger value such as 1440 is recommended.
+
+ The maximum number of columns, in millions, to store in memory per !ColumnFamily before flushing to disk. This is also a per-memtable setting. Use together with memtable_throughput_in_mb to tune memory usage.
+
+ The maximum amount of data to store in memory per !ColumnFamily before flushing to disk. Note: there is one memtable per column family, and this threshold is based solely on the amount of data stored, not actual heap memory usage (there is some overhead in indexing the columns). See also MemtableThresholds.

  Defaults are: '60' minutes, '0.3' millions, and '64' mb respectively.

@@ -108, +113 @@
  Note that the replication factor (RF) is the ''total'' number of nodes onto which the data will be placed. So, a replication factor of 1 means that only 1 node will have the data. It does '''not''' mean that one ''other'' node will have the data.

- Defaults are: 'org.apache.cassandra.locator.RackUnawareStrategy' and '1'. RF of at least 2 is highly recommended, keeping in mind that your effective number of nodes is N / RF.
+ Defaults are: 'org.apache.cassandra.locator.RackUnawareStrategy' and '1'. An RF of at least 2 is highly recommended, keeping in mind that your effective number of nodes is (N total nodes / RF).
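As a rough sketch only, the new-style setting names above might be used in a 0.7-style {{{cassandra.yaml}}} keyspace definition along these lines (the keyspace and column family names here are hypothetical, and the exact layout may differ between releases):

{{{
keyspaces:
    - name: Keyspace1          # hypothetical keyspace name
      replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
      replication_factor: 2    # RF of at least 2 recommended
      column_families:
        - name: Standard1      # hypothetical column family name
          memtable_flush_after_mins: 1440       # production-friendly flush interval
          memtable_operations_in_millions: 0.3  # column-count threshold
          memtable_throughput_in_mb: 64         # data-size threshold
}}}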
== per-ColumnFamily Settings ==

 * comment and name

@@ -126, +131 @@
   a. {{{TimeUUIDType}}}: a 128-bit version 1 UUID, compared by timestamp

 * gc_grace_seconds
+ Time to wait before garbage-collecting deletion markers. Set this to a value large enough that you are confident the deletion marker will have been propagated to all replicas by the time this many seconds have elapsed, even in the face of hardware failures.
+
+ Default is: '864000' seconds, or 10 days.

 * keys_cached and rows_cached

@@ -141, +149 @@
 * index_type

- The ControlPort setting is deprecated in 0.6 and can be safely removed from configuration.
- {{{
- <ListenAddress>localhost</ListenAddress>
- <!-- TCP port, for commands and data -->
- <StoragePort>7000</StoragePort>
- <!-- UDP port, for membership communications (gossip) -->
- <ControlPort>7001</ControlPort>
- }}}
- The address to bind the Thrift RPC service to. Unlike {{{ListenAddress}}} above, you *can* specify {{{0.0.0.0}}} here if you want Thrift to listen on all interfaces. Leaving this blank has the same effect as it does for {{{ListenAddress}}} (i.e. it will be based on the configured hostname of the node).
-
- {{{
- <ThriftAddress>localhost</ThriftAddress>
- <!-- Thrift RPC port (the port clients connect to). -->
- <ThriftPort>9160</ThriftPort>
- }}}
- Whether or not to use a framed transport for Thrift. If this option is set to true then you must also use a framed transport on the client side (framed and non-framed transports are not compatible).
-
- {{{
- <ThriftFramedTransport>false</ThriftFramedTransport>
- }}}
-
- == Memory, Disk, and Performance ==
- Access mode.
- {{{
- <DiskAccessMode>auto</DiskAccessMode>
- }}}
- Buffer size to use when performing contiguous column slices. Increase this to the size of the column slices you typically perform. (Name-based queries are performed with a buffer size of !ColumnIndexSizeInKB.)
- {{{
- <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
- }}}
- Buffer size to use when flushing !memtables to disk. (Only one !memtable is ever flushed at a time.) Increase (decrease) the index buffer size relative to the data buffer if you have few (many) columns per key. Bigger is only better _if_ your !memtables get large enough to use the space. (Check your data directory after your app has been running long enough.)
-
- {{{
- <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
- <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
- }}}

  Add column indexes to a row after its contents reach this size. Increase this if your column values are large, or if you have a very large number of columns. The competing concerns are: Cassandra has to deserialize this much of the row to read a single column, so you want it to be small, at least if you do many partial-row reads; but all the index data is read for each access, so you don't want to generate it wastefully either.
  {{{
  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
  }}}

- The maximum amount of data to store in memory per !ColumnFamily before flushing to disk. Note: there is one memtable per column family, and this threshold is based solely on the amount of data stored, not actual heap memory usage (there is some overhead in indexing the columns). See also MemtableThresholds.
-
- {{{
- <MemtableSizeInMB>64</MemtableSizeInMB>
- }}}
- The maximum number of columns, in millions, to store in memory per !ColumnFamily before flushing to disk. This is also a per-memtable setting. Use with {{{MemtableSizeInMB}}} to tune memory usage.
-
- {{{
- <MemtableObjectCountInMillions>0.1</MemtableObjectCountInMillions>
- }}}
- ''[New in 0.5''
-
- The maximum time to leave a dirty memtable unflushed. (While any affected column families have unflushed data from a commit log segment, that segment cannot be deleted.)
- This needs to be large enough that it won't cause a flush storm of all your memtables flushing at once because none has hit the size or count thresholds yet. For production, a larger value such as 1440 is recommended.
-
- {{{
- <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
- }}}
- '']''
- Time to wait before garbage-collecting deletion markers. Set this to a value large enough that you are confident the deletion marker will have been propagated to all replicas by the time this many seconds have elapsed, even in the face of hardware failures. The default value is ten days.
-
- {{{
- <GCGraceSeconds>864000</GCGraceSeconds>
- }}}
- Number of threads to run when flushing memtables to disk. Set this to twice the number of physical disks you have allocated to DataDirectory locations. If you are planning to use the Binary Memtable, it's recommended to increase the max threads to maintain a higher quality of service while under load when normal memtables are flushing to disk.
-
- {{{
- <FlushMinThreads>1</FlushMinThreads>
- <FlushMaxThreads>1</FlushMaxThreads>
- }}}
- The threshold size in megabytes the binary memtable must grow to before it is submitted for flushing to disk.
-
- {{{
- <BinaryMemtableSizeInMB>256</BinaryMemtableSizeInMB>
- }}}
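For example, the flush-thread rule above (physical disks times two) might work out as follows for a node with two physical disks, one data directory on each. This is a sketch against the 0.6-style {{{storage-conf.xml}}}; the directory paths are hypothetical:

{{{
<!-- Hypothetical layout: two physical disks, one data directory on each -->
<DataFileDirectories>
    <DataFileDirectory>/disk1/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/disk2/cassandra/data</DataFileDirectory>
</DataFileDirectories>
<!-- 2 disks * 2 = 4 flush threads -->
<FlushMinThreads>2</FlushMinThreads>
<FlushMaxThreads>4</FlushMaxThreads>
}}}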
