Add Change Data Capture documentation

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/51b939c9
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/51b939c9
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/51b939c9

Branch: refs/heads/trunk
Commit: 51b939c91db5d1a7664d76c8f57160f2570ee1dd
Parents: 7bf837c
Author: Josh McKenzie <[email protected]>
Authored: Mon Jun 20 13:38:00 2016 -0400
Committer: Sylvain Lebresne <[email protected]>
Committed: Tue Jun 21 14:12:59 2016 +0200

----------------------------------------------------------------------
 doc/source/operations.rst | 75 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 73 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/cassandra/blob/51b939c9/doc/source/operations.rst
----------------------------------------------------------------------
diff --git a/doc/source/operations.rst b/doc/source/operations.rst
index 9094766..d7fcafb 100644
--- a/doc/source/operations.rst
+++ b/doc/source/operations.rst
@@ -338,7 +338,6 @@ There is a number of common options for all the compaction 
strategies;
 ``enabled`` (default: true)
     Whether minor compactions should run. Note that you can have 'enabled': 
true as a compaction option and then do
     'nodetool enableautocompaction' to start running compactions.
-    Default true.
 ``tombstone_threshold`` (default: 0.2)
     How much of the sstable should be tombstones for us to consider doing a 
single sstable compaction of that sstable.
 ``tombstone_compaction_interval`` (default: 86400s (1 day))
@@ -738,7 +737,7 @@ similar text columns (such as repeated JSON blobs) often 
compress very well.
 Operational Impact
 ^^^^^^^^^^^^^^^^^^
 
-- Compression metadata is stored offheap and scales with data on disk.  This 
often requires 1-3GB of offheap RAM per
+- Compression metadata is stored off-heap and scales with data on disk.  This 
often requires 1-3GB of off-heap RAM per
   terabyte of data on disk, though the exact usage varies with 
``chunk_length_in_kb`` and compression ratios.
 
 - Streaming operations involve compressing and decompressing data on 
compressed tables - in some code paths (such as
@@ -754,6 +753,78 @@ Advanced Use
 Advanced users can provide their own compression class by implementing the 
interface at
 ``org.apache.cassandra.io.compress.ICompressor``.
 
+Change Data Capture
+-------------------
+
+Overview
+^^^^^^^^
+
+Change data capture (CDC) provides a mechanism to flag specific tables for archival and to reject writes to those
+tables once a configurable size-on-disk for the combined flushed and unflushed CDC-log is reached. An operator can
+enable CDC on a table by setting the table property ``cdc=true`` (either when :ref:`creating the table
+<create-table-statement>` or :ref:`altering it <alter-table-statement>`), after which any CommitLogSegments containing
+data for a CDC-enabled table are moved to the directory specified in ``cassandra.yaml`` on segment discard. The yaml
+also specifies a threshold of total disk space allowed; once it is reached, newly allocated CommitLogSegments will not
+allow CDC data until a consumer parses and removes data from the destination archival directory.
+
+Configuration
+^^^^^^^^^^^^^
+
+Enabling or disabling CDC on a table
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+CDC is enabled or disabled through the ``cdc`` table property, for instance::
+
+    CREATE TABLE foo (a int, b text, PRIMARY KEY(a)) WITH cdc=true;
+
+    ALTER TABLE foo WITH cdc=true;
+
+    ALTER TABLE foo WITH cdc=false;
+
+cassandra.yaml parameters
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following ``cassandra.yaml`` options are available for CDC:
+
+``cdc_enabled`` (default: false)
+   Enable or disable CDC operations node-wide.
+``cdc_raw_directory`` (default: ``$CASSANDRA_HOME/data/cdc_raw``)
+   Destination for CommitLogSegments to be moved after all corresponding 
memtables are flushed.
+``cdc_free_space_in_mb`` (default: min of 4096 and 1/8th volume space)
+   Total disk space allowed for CDC data. Usage is calculated as the sum of all active CommitLogSegments that permit
+   CDC plus all flushed CDC segments in ``cdc_raw_directory``.
+``cdc_free_space_check_interval_ms`` (default: 250)
+   When at capacity, we limit the frequency with which we re-calculate the 
space taken up by ``cdc_raw_directory`` to
+   prevent burning CPU cycles unnecessarily. Default is to check 4 times per 
second.
+
+.. _reading-commitlogsegments:
+
+Reading CommitLogSegments
+^^^^^^^^^^^^^^^^^^^^^^^^^
+This implementation included a refactor of CommitLogReplayer into 
`CommitLogReader.java
+<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java>`__.
+Usage is `fairly straightforward
+<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L132-L140>`__
+with a `variety of signatures
+<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java#L71-L103>`__
+available for use. In order to handle mutations read from disk, implement 
`CommitLogReadHandler
+<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReadHandler.java>`__.
+
+Warnings
+^^^^^^^^
+
+**Do not enable CDC without some kind of consumption process in place.**
+
+The initial implementation of Change Data Capture does not include a parser (see :ref:`reading-commitlogsegments` above)
+so, if CDC is enabled on a node and then on a table, the space allowed by ``cdc_free_space_in_mb`` will fill up and
+writes to CDC-enabled tables will then be rejected unless some consumption process is in place.
+
+Further Reading
+^^^^^^^^^^^^^^^
+
+- `Design doc 
<https://docs.google.com/document/d/1ZxCWYkeZTquxsvf5hdPc0fiUnUHna8POvgt6TIzML4Y/edit>`__
+- `JIRA ticket <https://issues.apache.org/jira/browse/CASSANDRA-8844>`__
+
 Backups
 -------
 

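Note on the consumption process described in the patch above: the
documentation says a consumer must parse and remove segments from
``cdc_raw_directory`` before CDC writes are accepted again. The following is
a minimal Python sketch of that bookkeeping only, not part of the commit and
not Cassandra code. The function names, the ``handle`` callback and the
demo file names are hypothetical, and "1/8th volume space" is assumed here
to mean one eighth of the total volume size. Actual parsing would go through
CommitLogReader and a CommitLogReadHandler implementation in Java, and the
real space accounting also counts active CommitLogSegments that permit CDC,
which this sketch ignores.

```python
import tempfile
from pathlib import Path

MB = 1024 * 1024


def default_cdc_free_space_mb(volume_bytes: int) -> int:
    """Default from the docs: min of 4096 MB and 1/8th of the volume.

    Assumption: "volume space" is read here as the total size of the volume.
    """
    return min(4096, volume_bytes // 8 // MB)


def cdc_raw_usage_bytes(cdc_raw: Path) -> int:
    """Sum the sizes of the files currently sitting in cdc_raw."""
    return sum(p.stat().st_size for p in cdc_raw.iterdir() if p.is_file())


def consume_segments(cdc_raw: Path, handle) -> int:
    """Hand every file in cdc_raw to `handle`, then delete it to free space.

    `handle` is a hypothetical callback standing in for real parsing.
    Returns the number of files consumed.
    """
    consumed = 0
    # sorted() materializes the listing, so deleting while looping is safe.
    for segment in sorted(cdc_raw.iterdir()):
        if segment.is_file():
            handle(segment)
            segment.unlink()  # removing the file is what frees CDC space
            consumed += 1
    return consumed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cdc_raw = Path(tmp)
        # Fake segment files; the names are made up for this demo.
        for i in range(3):
            (cdc_raw / f"segment-{i}.log").write_bytes(b"x" * 1024)
        print("usage before:", cdc_raw_usage_bytes(cdc_raw))
        print("consumed:", consume_segments(cdc_raw, lambda p: None))
        print("usage after:", cdc_raw_usage_bytes(cdc_raw))
```

Under these assumptions a 16 GiB volume would default to 2048 MB, while any
volume of 32 GiB or more would be capped at 4096 MB.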