Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "StorageConfiguration" page has been changed by JonHermes.
http://wiki.apache.org/cassandra/StorageConfiguration?action=diff&rev1=32&rev2=33

--------------------------------------------------
  Default is: 'localhost'. This must be changed for other nodes to contact this node.

   * memtable_flush_after_mins, memtable_operations_in_millions, and memtable_throughput_in_mb
+ The maximum time to leave a dirty memtable unflushed. (While any affected column families have unflushed data from a commit log segment, that segment cannot be deleted.) This needs to be large enough that it won't cause a flush storm of all your memtables flushing at once because none has hit the size or count thresholds yet. For production, a larger value such as 1440 is recommended.
+
+ The maximum number of columns, in millions, to store in memory per !ColumnFamily before flushing to disk. This is also a per-memtable setting. Use together with memtable_throughput_in_mb to tune memory usage.
+
+ The maximum amount of data to store in memory per !ColumnFamily before flushing to disk. Note: there is one memtable per column family, and this threshold is based solely on the amount of data stored, not actual heap memory usage (there is some overhead in indexing the columns). See also MemtableThresholds.

  Defaults are: '60' minutes, '0.3' millions, and '64' mb respectively.

@@ -108, +113 @@
  Note that the replication factor (RF) is the ''total'' number of nodes onto which the data will be placed. So, a replication factor of 1 means that only 1 node will have the data. It does '''not''' mean that one ''other'' node will have the data.

- Defaults are: 'org.apache.cassandra.locator.RackUnawareStrategy' and '1'. RF of at least 2 is highly recommended, keeping in mind that your effective number of nodes is N / RF.
+ Defaults are: 'org.apache.cassandra.locator.RackUnawareStrategy' and '1'. An RF of at least 2 is highly recommended, keeping in mind that your effective number of nodes is (N total nodes / RF).
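As a rough sketch only, the new-style setting names above might be used in a 0.7-style {{{cassandra.yaml}}} keyspace definition along these lines (the keyspace and column family names here are hypothetical, and the exact layout may differ between releases):

{{{
keyspaces:
    - name: Keyspace1          # hypothetical keyspace name
      replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
      replication_factor: 2    # RF of at least 2 recommended
      column_families:
        - name: Standard1      # hypothetical column family name
          memtable_flush_after_mins: 1440       # production-friendly flush interval
          memtable_operations_in_millions: 0.3  # column-count threshold
          memtable_throughput_in_mb: 64         # data-size threshold
}}}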
== per-ColumnFamily Settings ==

 * comment and name

@@ -126, +131 @@
   a. {{{TimeUUIDType}}}: a 128-bit version 1 UUID, compared by timestamp

 * gc_grace_seconds
+ Time to wait before garbage-collecting deletion markers. Set this to a value large enough that you are confident the deletion marker will have been propagated to all replicas by the time this many seconds have elapsed, even in the face of hardware failures.
+
+ Default is: '864000' seconds, or 10 days.

 * keys_cached and rows_cached

@@ -141, +149 @@
 * index_type

- The ControlPort setting is deprecated in 0.6 and can be safely removed from configuration.
- {{{
- <ListenAddress>localhost</ListenAddress>
- <!-- TCP port, for commands and data -->
- <StoragePort>7000</StoragePort>
- <!-- UDP port, for membership communications (gossip) -->
- <ControlPort>7001</ControlPort>
- }}}
- The address to bind the Thrift RPC service to. Unlike {{{ListenAddress}}} above, you *can* specify {{{0.0.0.0}}} here if you want Thrift to listen on all interfaces. Leaving this blank has the same effect as it does for {{{ListenAddress}}} (i.e. it will be based on the configured hostname of the node).
-
- {{{
- <ThriftAddress>localhost</ThriftAddress>
- <!-- Thrift RPC port (the port clients connect to). -->
- <ThriftPort>9160</ThriftPort>
- }}}
- Whether or not to use a framed transport for Thrift. If this option is set to true then you must also use a framed transport on the client side (framed and non-framed transports are not compatible).
-
- {{{
- <ThriftFramedTransport>false</ThriftFramedTransport>
- }}}
-
- == Memory, Disk, and Performance ==
- Access mode.
- {{{
- <DiskAccessMode>auto</DiskAccessMode>
- }}}
- Buffer size to use when performing contiguous column slices. Increase this to the size of the column slices you typically perform. (Name-based queries are performed with a buffer size of !ColumnIndexSizeInKB.)
- {{{
- <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
- }}}
- Buffer size to use when flushing !memtables to disk. (Only one !memtable is ever flushed at a time.) Increase (decrease) the index buffer size relative to the data buffer if you have few (many) columns per key. Bigger is only better _if_ your !memtables get large enough to use the space. (Check your data directory after your app has been running long enough.)
-
- {{{
- <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
- <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
- }}}

  Add column indexes to a row after its contents reach this size. Increase this if your column values are large, or if you have a very large number of columns. The competing concerns are: Cassandra has to deserialize this much of the row to read a single column, so you want it to be small, at least if you do many partial-row reads; but all the index data is read for each access, so you don't want to generate it wastefully either.
  {{{
  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
  }}}

- The maximum amount of data to store in memory per !ColumnFamily before flushing to disk. Note: there is one memtable per column family, and this threshold is based solely on the amount of data stored, not actual heap memory usage (there is some overhead in indexing the columns). See also MemtableThresholds.
-
- {{{
- <MemtableSizeInMB>64</MemtableSizeInMB>
- }}}
- The maximum number of columns, in millions, to store in memory per !ColumnFamily before flushing to disk. This is also a per-memtable setting. Use with {{{MemtableSizeInMB}}} to tune memory usage.
-
- {{{
- <MemtableObjectCountInMillions>0.1</MemtableObjectCountInMillions>
- }}}
- ''[New in 0.5''
-
- The maximum time to leave a dirty memtable unflushed. (While any affected column families have unflushed data from a commit log segment, that segment cannot be deleted.)
- This needs to be large enough that it won't cause a flush storm of all your memtables flushing at once because none has hit the size or count thresholds yet. For production, a larger value such as 1440 is recommended.
-
- {{{
- <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
- }}}
- '']''
- Time to wait before garbage-collecting deletion markers. Set this to a value large enough that you are confident the deletion marker will have been propagated to all replicas by the time this many seconds have elapsed, even in the face of hardware failures. The default value is ten days.
-
- {{{
- <GCGraceSeconds>864000</GCGraceSeconds>
- }}}
- Number of threads to run when flushing memtables to disk. Set this to twice the number of physical disks you have allocated to DataDirectory locations. If you are planning to use the Binary Memtable, it's recommended to increase the max threads to maintain a higher quality of service while under load when normal memtables are flushing to disk.
-
- {{{
- <FlushMinThreads>1</FlushMinThreads>
- <FlushMaxThreads>1</FlushMaxThreads>
- }}}
- The threshold size in megabytes the binary memtable must grow to before it is submitted for flushing to disk.
-
- {{{
- <BinaryMemtableSizeInMB>256</BinaryMemtableSizeInMB>
- }}}
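For example, the flush-thread rule above (physical disks times two) might work out as follows for a node with two physical disks, one data directory on each. This is a sketch against the 0.6-style {{{storage-conf.xml}}}; the directory paths are hypothetical:

{{{
<!-- Hypothetical layout: two physical disks, one data directory on each -->
<DataFileDirectories>
    <DataFileDirectory>/disk1/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/disk2/cassandra/data</DataFileDirectory>
</DataFileDirectories>
<!-- 2 disks * 2 = 4 flush threads -->
<FlushMinThreads>2</FlushMinThreads>
<FlushMaxThreads>4</FlushMaxThreads>
}}}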
