Added: cassandra/site/src/doc/3.10/_sources/getting_started/installing.txt URL: http://svn.apache.org/viewvc/cassandra/site/src/doc/3.10/_sources/getting_started/installing.txt?rev=1757419&view=auto ============================================================================== --- cassandra/site/src/doc/3.10/_sources/getting_started/installing.txt (added) +++ cassandra/site/src/doc/3.10/_sources/getting_started/installing.txt Tue Aug 23 19:25:17 2016 @@ -0,0 +1,101 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. + +.. highlight:: none + +Installing Cassandra +-------------------- + +Prerequisites +^^^^^^^^^^^^^ + +- The latest version of Java 8, either the `Oracle Java Standard Edition 8 + <http://www.oracle.com/technetwork/java/javase/downloads/index.html>`__ or `OpenJDK 8 <http://openjdk.java.net/>`__. To + verify that you have the correct version of java installed, type ``java -version``. + +- For using cqlsh, the latest version of `Python 2.7 <https://www.python.org/downloads/>`__. To verify that you have + the correct version of Python installed, type ``python --version``. 
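The two version checks above can be scripted. Below is a minimal sketch; the helper names are hypothetical and the banner formats are assumptions (``java -version`` output varies by vendor, and Java 8 reports its version in the legacy ``1.8.x`` scheme):

```python
# Minimal sketch of parsing the version banners mentioned above. The helper
# names are hypothetical, and the banner formats are assumptions: `java
# -version` output varies by vendor, and Java 8 uses the legacy "1.8.x" scheme.
import re

def java_major_version(banner):
    """Extract the major version from a legacy-style Java banner,
    e.g. 'java version "1.8.0_101"' -> 8."""
    match = re.search(r'"1\.(\d+)\.', banner)
    return int(match.group(1)) if match else None

def python_version(banner):
    """Extract (major, minor) from `python --version` output,
    e.g. 'Python 2.7.12' -> (2, 7)."""
    match = re.search(r'Python (\d+)\.(\d+)', banner)
    return (int(match.group(1)), int(match.group(2))) if match else None
```

For example, ``java_major_version('openjdk version "1.8.0_292"')`` returns ``8``, so a script can assert the result equals 8 before proceeding.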
+
+Installation from binary tarball files
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- Download the latest stable release from the `Apache Cassandra downloads website <http://cassandra.apache.org/download/>`__.
+
+- Untar the file somewhere, for example:
+
+::
+
+    tar -xvf apache-cassandra-3.6-bin.tar.gz
+
+The files will be extracted into ``apache-cassandra-3.6``; substitute 3.6 with the release number that you have
+downloaded.
+
+- Optionally add ``apache-cassandra-3.6/bin`` to your path.
+- Start Cassandra in the foreground by invoking ``bin/cassandra -f`` from the command line. Press "Control-C" to stop
+  Cassandra. Start Cassandra in the background by invoking ``bin/cassandra`` from the command line. Invoke ``kill pid``
+  or ``pkill -f CassandraDaemon`` to stop Cassandra, where pid is the Cassandra process id, which you can find for
+  example by invoking ``pgrep -f CassandraDaemon``.
+- Verify that Cassandra is running by invoking ``bin/nodetool status`` from the command line.
+- Configuration files are located in the ``conf`` sub-directory.
+- Since Cassandra 2.1, log and data directories are located in the ``logs`` and ``data`` sub-directories respectively.
+  Older versions defaulted to ``/var/log/cassandra`` and ``/var/lib/cassandra``; with those defaults it is necessary to
+  either start Cassandra with root privileges or change ``conf/cassandra.yaml`` to use directories owned by the
+  current user, as explained below in the section on changing the location of directories.
+
+Installation from Debian packages
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- Add the Apache repository of Cassandra to ``/etc/apt/sources.list.d/cassandra.sources.list``, for example for version
+  3.6:
+
+::
+
+    echo "deb http://www.apache.org/dist/cassandra/debian 36x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
+
+- Update the repositories:
+
+::
+
+    sudo apt-get update
+
+- If you encounter this error:
+
+::
+
+    GPG error: http://www.apache.org 36x InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 749D6EEC0353B12C
+
+Then add the public key 749D6EEC0353B12C as follows:
+
+::
+
+    gpg --keyserver pgp.mit.edu --recv-keys 749D6EEC0353B12C
+    gpg --export --armor 749D6EEC0353B12C | sudo apt-key add -
+
+and repeat ``sudo apt-get update``. The actual key may be different; you get it from the error message itself. For a
+full list of Apache contributors' public keys, you can refer to `this link <https://www.apache.org/dist/cassandra/KEYS>`__.
+
+- Install Cassandra:
+
+::
+
+    sudo apt-get install cassandra
+
+- You can start Cassandra with ``sudo service cassandra start`` and stop it with ``sudo service cassandra stop``.
+  However, normally the service will start automatically. For this reason, be sure to stop it if you need to make any
+  configuration changes.
+- Verify that Cassandra is running by invoking ``nodetool status`` from the command line.
+- The default location of configuration files is ``/etc/cassandra``.
+- The default locations of the log and data directories are ``/var/log/cassandra/`` and ``/var/lib/cassandra``.
Added: cassandra/site/src/doc/3.10/_sources/getting_started/querying.txt URL: http://svn.apache.org/viewvc/cassandra/site/src/doc/3.10/_sources/getting_started/querying.txt?rev=1757419&view=auto ============================================================================== --- cassandra/site/src/doc/3.10/_sources/getting_started/querying.txt (added) +++ cassandra/site/src/doc/3.10/_sources/getting_started/querying.txt Tue Aug 23 19:25:17 2016 @@ -0,0 +1,52 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. + +Inserting and querying +---------------------- + +The API to Cassandra is :ref:`CQL <cql>`, the Cassandra Query Language. To use CQL, you will need to connect to the +cluster, which can be done: + +- either using cqlsh, +- or through a client driver for Cassandra. + +CQLSH +^^^^^ + +cqlsh is a command line shell for interacting with Cassandra through CQL. It is shipped with every Cassandra package, +and can be found in the bin/ directory alongside the cassandra executable. It connects to the single node specified on +the command line. For example:: + + $ bin/cqlsh localhost + Connected to Test Cluster at localhost:9042. + [cqlsh 5.0.1 | Cassandra 3.8 | CQL spec 3.4.2 | Native protocol v4] + Use HELP for help. 
+    cqlsh> SELECT cluster_name, listen_address FROM system.local;
+
+     cluster_name | listen_address
+    --------------+----------------
+     Test Cluster |      127.0.0.1
+
+    (1 rows)
+    cqlsh>
+
+See the :ref:`cqlsh section <cqlsh>` for full documentation.
+
+Client drivers
+^^^^^^^^^^^^^^
+
+Many client drivers are provided by the community, and a list of known drivers is given in :ref:`the next section
+<client-drivers>`. You should refer to the documentation of each driver for more information on how to use it.

Added: cassandra/site/src/doc/3.10/_sources/index.txt
URL: http://svn.apache.org/viewvc/cassandra/site/src/doc/3.10/_sources/index.txt?rev=1757419&view=auto
==============================================================================
--- cassandra/site/src/doc/3.10/_sources/index.txt (added)
+++ cassandra/site/src/doc/3.10/_sources/index.txt Tue Aug 23 19:25:17 2016
@@ -0,0 +1,41 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+..
+.. http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Welcome to Apache Cassandra's documentation!
+============================================
+
+This is the official documentation for `Apache Cassandra <http://cassandra.apache.org>`__ |version|.
If you would like +to contribute to this documentation, you are welcome to do so by submitting your contribution like any other patch +following `these instructions <https://wiki.apache.org/cassandra/HowToContribute>`__. + +Contents: + +.. toctree:: + :maxdepth: 2 + + getting_started/index + architecture/index + data_modeling/index + cql/index + configuration/index + operating/index + tools/index + troubleshooting/index + development/index + faq/index + + bugs + contactus Added: cassandra/site/src/doc/3.10/_sources/operating/backups.txt URL: http://svn.apache.org/viewvc/cassandra/site/src/doc/3.10/_sources/operating/backups.txt?rev=1757419&view=auto ============================================================================== --- cassandra/site/src/doc/3.10/_sources/operating/backups.txt (added) +++ cassandra/site/src/doc/3.10/_sources/operating/backups.txt Tue Aug 23 19:25:17 2016 @@ -0,0 +1,22 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. + +.. highlight:: none + +Backups +======= + +.. 
todo:: TODO

Added: cassandra/site/src/doc/3.10/_sources/operating/bloom_filters.txt
URL: http://svn.apache.org/viewvc/cassandra/site/src/doc/3.10/_sources/operating/bloom_filters.txt?rev=1757419&view=auto
==============================================================================
--- cassandra/site/src/doc/3.10/_sources/operating/bloom_filters.txt (added)
+++ cassandra/site/src/doc/3.10/_sources/operating/bloom_filters.txt Tue Aug 23 19:25:17 2016
@@ -0,0 +1,65 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+..
+.. http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Bloom Filters
+-------------
+
+In the read path, Cassandra merges data on disk (in SSTables) with data in RAM (in memtables). To avoid checking every
+SSTable data file for the partition being requested, Cassandra employs a data structure known as a bloom filter.
+
+Bloom filters are a probabilistic data structure that allows Cassandra to determine one of two possible states:
+
+- The data definitely does not exist in the given file, or
+- The data probably exists in the given file.
+
+While bloom filters cannot guarantee that the data exists in a given SSTable, bloom filters can be made more accurate
+by allowing them to consume more RAM.
Operators have the opportunity to tune this behavior per table by adjusting the
+``bloom_filter_fp_chance`` to a float between 0 and 1.
+
+The default value for ``bloom_filter_fp_chance`` is 0.1 for tables using LeveledCompactionStrategy and 0.01 for all
+other cases.
+
+Bloom filters are stored in RAM, but off-heap, so operators should not consider bloom filters when selecting the
+maximum heap size. As accuracy improves (as the ``bloom_filter_fp_chance`` gets closer to 0), memory usage increases
+non-linearly - the bloom filter for ``bloom_filter_fp_chance = 0.01`` will require about three times as much memory as
+the same table with ``bloom_filter_fp_chance = 0.1``.
+
+Typical values for ``bloom_filter_fp_chance`` are between 0.01 (1%) and 0.1 (10%) false-positive chance, where
+Cassandra may scan an SSTable for a row, only to find that it does not exist on the disk. The parameter should be tuned
+by use case:
+
+- Users with more RAM and slower disks may benefit from setting the ``bloom_filter_fp_chance`` to a numerically lower
+  number (such as 0.01) to avoid excess IO operations
+- Users with less RAM, more dense nodes, or very fast disks may tolerate a higher ``bloom_filter_fp_chance`` in order
+  to save RAM at the expense of excess IO operations
+- In workloads that rarely read, or that only perform reads by scanning the entire data set (such as analytics
+  workloads), setting the ``bloom_filter_fp_chance`` to a much higher number is acceptable.
+
+Changing
+^^^^^^^^
+
+The bloom filter false positive chance is visible in the ``DESCRIBE TABLE`` output as the field
+``bloom_filter_fp_chance``. Operators can change the value with an ``ALTER TABLE`` statement:
+::
+
+    ALTER TABLE keyspace.table WITH bloom_filter_fp_chance=0.01;
+
+Operators should be aware, however, that this change is not immediate: the bloom filter is calculated when the file is
+written, and persisted on disk as the Filter component of the SSTable.
Upon issuing an ``ALTER TABLE`` statement, new
+files on disk will be written with the new ``bloom_filter_fp_chance``, but existing sstables will not be modified until
+they are compacted - if an operator needs a change to ``bloom_filter_fp_chance`` to take effect, they can trigger an
+SSTable rewrite using ``nodetool scrub`` or ``nodetool upgradesstables -a``, both of which will rebuild the sstables on
+disk, regenerating the bloom filters in the process.

Added: cassandra/site/src/doc/3.10/_sources/operating/bulk_loading.txt
URL: http://svn.apache.org/viewvc/cassandra/site/src/doc/3.10/_sources/operating/bulk_loading.txt?rev=1757419&view=auto
==============================================================================
--- cassandra/site/src/doc/3.10/_sources/operating/bulk_loading.txt (added)
+++ cassandra/site/src/doc/3.10/_sources/operating/bulk_loading.txt Tue Aug 23 19:25:17 2016
@@ -0,0 +1,24 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+..
+.. http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+.. _bulk-loading:
+
+Bulk Loading
+------------
+
+..
todo:: TODO Added: cassandra/site/src/doc/3.10/_sources/operating/cdc.txt URL: http://svn.apache.org/viewvc/cassandra/site/src/doc/3.10/_sources/operating/cdc.txt?rev=1757419&view=auto ============================================================================== --- cassandra/site/src/doc/3.10/_sources/operating/cdc.txt (added) +++ cassandra/site/src/doc/3.10/_sources/operating/cdc.txt Tue Aug 23 19:25:17 2016 @@ -0,0 +1,89 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. + +.. highlight:: none + +Change Data Capture +------------------- + +Overview +^^^^^^^^ + +Change data capture (CDC) provides a mechanism to flag specific tables for archival as well as rejecting writes to those +tables once a configurable size-on-disk for the combined flushed and unflushed CDC-log is reached. An operator can +enable CDC on a table by setting the table property ``cdc=true`` (either when :ref:`creating the table +<create-table-statement>` or :ref:`altering it <alter-table-statement>`), after which any CommitLogSegments containing +data for a CDC-enabled table are moved to the directory specified in ``cassandra.yaml`` on segment discard. 
A threshold
+of total disk space allowed is specified in the yaml; once that threshold is reached, newly allocated
+CommitLogSegments will not allow CDC data until a consumer parses and removes data from the destination archival
+directory.
+
+Configuration
+^^^^^^^^^^^^^
+
+Enabling or disabling CDC on a table
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+CDC is enabled or disabled through the ``cdc`` table property, for instance::
+
+    CREATE TABLE foo (a int, b text, PRIMARY KEY(a)) WITH cdc=true;
+
+    ALTER TABLE foo WITH cdc=true;
+
+    ALTER TABLE foo WITH cdc=false;
+
+cassandra.yaml parameters
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following ``cassandra.yaml`` parameters are available for CDC:
+
+``cdc_enabled`` (default: false)
+    Enable or disable CDC operations node-wide.
+``cdc_raw_directory`` (default: ``$CASSANDRA_HOME/data/cdc_raw``)
+    Destination for CommitLogSegments to be moved after all corresponding memtables are flushed.
+``cdc_free_space_in_mb`` (default: min of 4096 and 1/8th volume space)
+    Calculated as sum of all active CommitLogSegments that permit CDC + all flushed CDC segments in
+    ``cdc_raw_directory``.
+``cdc_free_space_check_interval_ms`` (default: 250)
+    When at capacity, we limit the frequency with which we re-calculate the space taken up by ``cdc_raw_directory`` to
+    prevent burning CPU cycles unnecessarily. Default is to check 4 times per second.
+
+.. _reading-commitlogsegments:
+
+Reading CommitLogSegments
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This implementation included a refactor of CommitLogReplayer into `CommitLogReader.java
+<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java>`__.
+Usage is `fairly straightforward +<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L132-L140>`__ +with a `variety of signatures +<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java#L71-L103>`__ +available for use. In order to handle mutations read from disk, implement `CommitLogReadHandler +<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReadHandler.java>`__. + +Warnings +^^^^^^^^ + +**Do not enable CDC without some kind of consumption process in-place.** + +The initial implementation of Change Data Capture does not include a parser (see :ref:`reading-commitlogsegments` above) +so, if CDC is enabled on a node and then on a table, the ``cdc_free_space_in_mb`` will fill up and then writes to +CDC-enabled tables will be rejected unless some consumption process is in place. + +Further Reading +^^^^^^^^^^^^^^^ + +- `Design doc <https://docs.google.com/document/d/1ZxCWYkeZTquxsvf5hdPc0fiUnUHna8POvgt6TIzML4Y/edit>`__ +- `JIRA ticket <https://issues.apache.org/jira/browse/CASSANDRA-8844>`__ Added: cassandra/site/src/doc/3.10/_sources/operating/compaction.txt URL: http://svn.apache.org/viewvc/cassandra/site/src/doc/3.10/_sources/operating/compaction.txt?rev=1757419&view=auto ============================================================================== --- cassandra/site/src/doc/3.10/_sources/operating/compaction.txt (added) +++ cassandra/site/src/doc/3.10/_sources/operating/compaction.txt Tue Aug 23 19:25:17 2016 @@ -0,0 +1,438 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. 
to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+..
+.. http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+.. _compaction:
+
+Compaction
+----------
+
+Types of compaction
+^^^^^^^^^^^^^^^^^^^
+
+The concept of compaction is used for different kinds of operations in Cassandra; the common thing about these
+operations is that they take one or more sstables and output new sstables. The types of compaction are:
+
+Minor compaction
+    triggered automatically in Cassandra.
+Major compaction
+    a user executes a compaction over all sstables on the node.
+User defined compaction
+    a user triggers a compaction on a given set of sstables.
+Scrub
+    try to fix any broken sstables. This can actually remove valid data if that data is corrupted; if that happens you
+    will need to run a full repair on the node.
+Upgradesstables
+    upgrade sstables to the latest version. Run this after upgrading to a new major version.
+Cleanup
+    remove any ranges this node does not own anymore, typically triggered on neighbouring nodes after a node has been
+    bootstrapped since that node will take ownership of some ranges from those nodes.
+Secondary index rebuild
+    rebuild the secondary indexes on the node.
+Anticompaction
+    after repair, the ranges that were actually repaired are split out of the sstables that existed when repair
+    started.
+Sub range compaction
+    It is possible to only compact a given sub range - this could be useful if you know a token that has been
+    misbehaving - either gathering many updates or many deletes.
(``nodetool compact -st x -et y``) will pick
+    all sstables containing the range between x and y and issue a compaction for those sstables. For STCS this will
+    most likely include all sstables, but with LCS it can issue the compaction for a subset of the sstables. With LCS
+    the resulting sstable will end up in L0.
+
+When is a minor compaction triggered?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+#. When an sstable is added to the node through flushing/streaming etc.
+#. When autocompaction is enabled after being disabled (``nodetool enableautocompaction``).
+#. When compaction adds new sstables.
+#. Every 5 minutes, when the periodic check for new minor compactions runs.
+
+Merging sstables
+^^^^^^^^^^^^^^^^
+
+Compaction is about merging sstables; since partitions in sstables are sorted based on the hash of the partition key,
+it is possible to efficiently merge separate sstables. The content of each partition is also sorted, so each partition
+can be merged efficiently.
+
+Tombstones and Garbage Collection (GC) Grace
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Why Tombstones
+~~~~~~~~~~~~~~
+
+When a delete request is received by Cassandra, it does not actually remove the data from the underlying store.
+Instead it writes a special piece of data known as a tombstone. The tombstone represents the delete and causes all
+values which occurred before the tombstone to not appear in queries to the database. This approach is used instead of
+removing values because of the distributed nature of Cassandra.
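The read-time effect of a tombstone can be illustrated with a toy model (a sketch of the semantics only, not Cassandra's implementation; timestamps are simplified to integers):

```python
# Toy model of tombstone semantics (a sketch, not Cassandra's code).
# A delete writes a tombstone carrying a timestamp; on read, a value is
# visible only if it was written after any tombstone covering it.
def visible(value_ts, tombstone_ts=None):
    """Return True if the value written at value_ts survives the tombstone."""
    return tombstone_ts is None or value_ts > tombstone_ts

assert visible(1)         # no delete: the value is visible
assert not visible(1, 5)  # written at t=1, deleted at t=5: hidden
assert visible(9, 5)      # re-written at t=9, after the delete: visible again
```

The last case is why the tombstone itself must be stored: without the record of the delete at t=5, the value written at t=1 could not be distinguished from live data.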
+
+Deletes without tombstones
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Imagine a three node cluster which has the value [A] replicated to every node::
+
+    [A], [A], [A]
+
+If one of the nodes fails and our delete operation only removes existing values, we can end up with a cluster that
+looks like::
+
+    [], [], [A]
+
+Then a repair operation would copy the value [A] back onto the two
+nodes which are missing the value::
+
+    [A], [A], [A]
+
+This would cause our data to be resurrected even though it had been
+deleted.
+
+Deletes with Tombstones
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Starting again with a three node cluster which has the value [A] replicated to every node::
+
+    [A], [A], [A]
+
+If instead of removing data we add a tombstone record, our single node failure situation will look like this::
+
+    [A, Tombstone[A]], [A, Tombstone[A]], [A]
+
+Now when we issue a repair, the tombstone will be copied to the replica, rather than the deleted data being
+resurrected::
+
+    [A, Tombstone[A]], [A, Tombstone[A]], [A, Tombstone[A]]
+
+Our repair operation will correctly put the state of the system to what we expect, with the record [A] marked as
+deleted on all nodes. This does mean we will end up accruing tombstones, which would permanently consume disk space.
+To avoid keeping tombstones forever, we have a parameter known as ``gc_grace_seconds`` for every table in Cassandra.
+
+The gc_grace_seconds parameter and Tombstone Removal
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The table level ``gc_grace_seconds`` parameter controls how long Cassandra will retain tombstones through compaction
+events before finally removing them. This duration should directly reflect the amount of time a user expects to allow
+before recovering a failed node.
After ``gc_grace_seconds`` has expired, the tombstone may be removed (meaning there will
+no longer be any record that a certain piece of data was deleted), but as a tombstone can live in one sstable and the
+data it covers in another, a compaction must also include both sstables for a tombstone to be removed. More precisely,
+to be able to drop an actual tombstone the following needs to be true:
+
+- The tombstone must be older than ``gc_grace_seconds``.
+- If partition X contains the tombstone, the sstable containing the partition plus all sstables containing data older
+  than the tombstone containing X must be included in the same compaction. We don't need to care if the partition is in
+  an sstable if we can guarantee that all data in that sstable is newer than the tombstone. If the tombstone is older
+  than the data, it cannot shadow that data.
+- If the option ``only_purge_repaired_tombstones`` is enabled, tombstones are only removed if the data has also been
+  repaired.
+
+If a node remains down or disconnected for longer than ``gc_grace_seconds``, its deleted data will be repaired back to
+the other nodes and re-appear in the cluster. This is basically the same as in the "Deletes without Tombstones"
+section. Note that tombstones will not be removed until a compaction event, even if ``gc_grace_seconds`` has elapsed.
+
+The default value for ``gc_grace_seconds`` is 864000, which is equivalent to 10 days. This can be set when creating or
+altering a table using ``WITH gc_grace_seconds``.
Note that if you mix data with TTL and data without TTL (or just different lengths of
+TTL) Cassandra will have a hard time dropping the tombstones created, since the partition might span many sstables and
+not all are compacted at once.
+
+Fully expired sstables
+^^^^^^^^^^^^^^^^^^^^^^
+
+If an sstable contains only tombstones and it is guaranteed that that sstable is not shadowing data in any other
+sstable, compaction can drop that sstable. If you see sstables with only tombstones (note that TTLed data is considered
+tombstones once the time to live has expired) but they are not being dropped by compaction, it is likely that other
+sstables contain older data. There is a tool called ``sstableexpiredblockers`` that will list which sstables are
+droppable and which are blocking them from being dropped. This is especially useful for time series compaction with
+``TimeWindowCompactionStrategy`` (and the deprecated ``DateTieredCompactionStrategy``).
+
+Repaired/unrepaired data
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+With incremental repairs, Cassandra must keep track of what data is repaired and what data is unrepaired. With
+anticompaction, repaired data is split out into repaired and unrepaired sstables. To avoid mixing up the data again,
+separate compaction strategy instances are run on the two sets of data, each instance only knowing about either the
+repaired or the unrepaired sstables. This means that if you only run incremental repair once and then never again, you
+might have very old data in the repaired sstables that blocks compaction from dropping tombstones in the unrepaired
+(probably newer) sstables.
+
+Data directories
+^^^^^^^^^^^^^^^^
+
+Since tombstones and data can live in different sstables, it is important to realize that losing an sstable might lead
+to data becoming live again - the most common way of losing sstables is to have a hard drive break down. To avoid
+making data live, tombstones and actual data are always kept in the same data directory.
This way, if a disk is lost, all versions of
+a partition are lost and no data can get undeleted. To achieve this, a compaction strategy instance per data directory
+is run in addition to the compaction strategy instances containing repaired/unrepaired data; this means that if you
+have 4 data directories there will be 8 compaction strategy instances running. This has a few more benefits than just
+avoiding data getting undeleted:
+
+- It is possible to run more compactions in parallel - leveled compaction will have several totally separate levelings
+  and each one can run compactions independently from the others.
+- Users can back up and restore a single data directory.
+- Note though that currently all data directories are considered equal, so if you have a tiny disk and a big disk
+  backing two data directories, the big one will be limited by the small one. One workaround for this is to create
+  more data directories backed by the big disk.
+
+Single sstable tombstone compaction
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When an sstable is written, a histogram with the tombstone expiry times is created, and this is used to try to find
+sstables with very many tombstones and run single sstable compaction on that sstable in the hope of being able to drop
+tombstones in that sstable. Before starting this, it is also checked how likely it is that any tombstones will
+actually be able to be dropped, and how much this sstable overlaps with other sstables. To avoid most of these checks,
+the compaction option ``unchecked_tombstone_compaction`` can be enabled.
+
+.. _compaction-options:
+
+Common options
+^^^^^^^^^^^^^^
+
+There are a number of common options for all the compaction strategies:
+
+``enabled`` (default: true)
+    Whether minor compactions should run. Note that you can have 'enabled': true as a compaction option and then do
+    'nodetool enableautocompaction' to start running compactions.
+``tombstone_threshold`` (default: 0.2) + How much of the sstable should be tombstones for us to consider doing a single sstable compaction of that sstable. +``tombstone_compaction_interval`` (default: 86400s (1 day)) + Since it might not be possible to drop any tombstones when doing a single sstable compaction, we need to make sure + that one sstable is not constantly getting recompacted - this option states how often we should try for a given + sstable. +``log_all`` (default: false) + Enables new, more detailed compaction logging; see :ref:`below <detailed-compaction-logging>`. +``unchecked_tombstone_compaction`` (default: false) + The single sstable compaction has quite strict checks for whether it should be started; this option disables those + checks, which some use cases might need. Note that this does not change anything for the actual + compaction: tombstones are only dropped if it is safe to do so - it might just rewrite an sstable without being able + to drop any tombstones. +``only_purge_repaired_tombstone`` (default: false) + Option to enable the extra safety of making sure that tombstones are only dropped if the data has been repaired. +``min_threshold`` (default: 4) + Lower limit of the number of sstables before a compaction is triggered. Not used for ``LeveledCompactionStrategy``. +``max_threshold`` (default: 32) + Upper limit of the number of sstables before a compaction is triggered. Not used for ``LeveledCompactionStrategy``. + +Further, see the section on each strategy for specific additional options. + +Compaction nodetool commands +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The :ref:`nodetool <nodetool>` utility provides a number of commands related to compaction: + +``enableautocompaction`` + Enable compaction. +``disableautocompaction`` + Disable compaction. +``setcompactionthroughput`` + The maximum speed at which compaction should run - defaults to 16MB/s, but note that it is likely not possible to reach this + throughput.
+``compactionstats`` + Statistics about current and pending compactions. +``compactionhistory`` + List details about the last compactions. +``setcompactionthreshold`` + Set the min/max sstable count for when to trigger compaction, defaults to 4/32. + +Switching the compaction strategy and options using JMX +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +It is possible to switch compaction strategies and their options on just a single node using JMX; this is a great way to +experiment with settings without affecting the whole cluster. The mbean is:: + + org.apache.cassandra.db:type=ColumnFamilies,keyspace=<keyspace_name>,columnfamily=<table_name> + +and the attribute to change is ``CompactionParameters`` or ``CompactionParametersJson`` if you use jconsole or jmc. The +syntax for the JSON version is the same as you would use in an :ref:`ALTER TABLE <alter-table-statement>` statement - +for example:: + + { 'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 123 } + +The setting is kept until someone executes an :ref:`ALTER TABLE <alter-table-statement>` that touches the compaction +settings or restarts the node. + +.. _detailed-compaction-logging: + +More detailed compaction logging +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Enable with the compaction option ``log_all``, and a more detailed compaction log file will be produced in your log +directory. + +.. _STCS: + +Size Tiered Compaction Strategy +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The basic idea of ``SizeTieredCompactionStrategy`` (STCS) is to merge sstables of approximately the same size. All +sstables are put in different buckets depending on their size. An sstable is added to a bucket if the size of the sstable +is within ``bucket_low`` and ``bucket_high`` of the current average size of the sstables already in the bucket. This +will create several buckets, and the most interesting of those buckets will be compacted.
The most interesting one is +decided by figuring out which bucket's sstables take the most reads. + +Major compaction +~~~~~~~~~~~~~~~~ + +When running a major compaction with STCS you will end up with two sstables per data directory (one for repaired data +and one for unrepaired data). There is also an option (``-s``) to do a major compaction that splits the output into several +sstables. The sizes of the sstables are approximately 50%, 25%, 12.5%... of the total size. + +.. _stcs-options: + +STCS options +~~~~~~~~~~~~ + +``min_sstable_size`` (default: 50MB) + Sstables smaller than this are put in the same bucket. +``bucket_low`` (default: 0.5) + How much smaller than the average size of a bucket an sstable should be before not being included in the bucket. That + is, if ``bucket_low * avg_bucket_size < sstable_size`` (and the ``bucket_high`` condition holds, see below), then + the sstable is added to the bucket. +``bucket_high`` (default: 1.5) + How much bigger than the average size of a bucket an sstable should be before not being included in the bucket. That + is, if ``sstable_size < bucket_high * avg_bucket_size`` (and the ``bucket_low`` condition holds, see above), then + the sstable is added to the bucket. + +Defragmentation +~~~~~~~~~~~~~~~ + +Defragmentation is done when many sstables are touched during a read. The result of the read is put into the memtable +so that the next read will not have to touch as many sstables. This can cause writes on a read-only cluster. + +.. _LCS: + +Leveled Compaction Strategy +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The idea of ``LeveledCompactionStrategy`` (LCS) is that all sstables are put into different levels where we guarantee +that no overlapping sstables are in the same level. By overlapping we mean that the token range (first token to last token) +of an sstable never overlaps with that of any other sstable in the same level. This means that for a SELECT we will only have to look for the partition key +in a single sstable per level.
Each level is 10x the size of the previous one and each sstable is 160MB by default. L0 +is where sstables are streamed/flushed - no overlap guarantees are given here. + +When picking compaction candidates we have to make sure that the compaction does not create overlap in the target level. +This is done by always including all overlapping sstables in the next level. For example, if we select an sstable in L3, +we need to guarantee that we pick all overlapping sstables in L4 and make sure that no currently ongoing compactions +will create overlap if we start that compaction. We can start many parallel compactions in a level if we guarantee that +we won't create overlap. For L0 -> L1 compactions we almost always need to include all L1 sstables since most L0 sstables +cover the full range. We also can't compact all L0 sstables with all L1 sstables in a single compaction since that can +use too much memory. + +When deciding which level to compact, LCS checks the higher levels first (with LCS, a "higher" level is one with a higher +number, L0 being the lowest one) and if a level is behind, a compaction will be started in that level. + +Major compaction +~~~~~~~~~~~~~~~~ + +It is possible to do a major compaction with LCS - it will currently start by filling out L1 and then once L1 is full, +it continues with L2 etc. This is suboptimal and will change to create all the sstables in a high level instead; see +CASSANDRA-11817. + +Bootstrapping +~~~~~~~~~~~~~ + +During bootstrap, sstables are streamed from other nodes. The level of the remote sstable is kept to avoid many +compactions after the bootstrap is done. During bootstrap the new node also takes writes while it is streaming the data +from a remote node - these writes are flushed to L0 like all other writes, and to avoid those sstables blocking the +remote sstables from going to the correct level, we only do STCS in L0 until the bootstrap is done.
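The candidate-selection rule above - a compaction from one level must pull in every overlapping sstable from the next level so the target level stays non-overlapping - can be sketched as follows. This is an illustrative Python model only, not Cassandra's actual implementation; representing each sstable as a ``(first_token, last_token)`` tuple is an assumption made for the example:

```python
# Illustrative model of LCS candidate selection, not Cassandra's real code.
# Each sstable is reduced to its token range: a (first_token, last_token) tuple.

def overlaps(a, b):
    """Two token ranges overlap unless one ends strictly before the other starts."""
    return not (a[1] < b[0] or b[1] < a[0])

def candidates_for(sstable, next_level):
    """Every sstable in the next level that must join the compaction so the
    target level stays non-overlapping."""
    return [s for s in next_level if overlaps(sstable, s)]

# L4 holds non-overlapping token ranges, as LCS guarantees above L0:
l4 = [(0, 99), (100, 199), (200, 299), (300, 399)]

# Compacting an L3 sstable covering tokens 150-250 must also include the two
# overlapping L4 sstables:
print(candidates_for((150, 250), l4))  # [(100, 199), (200, 299)]
```

The same check runs against ongoing compactions before a new one starts, which is why several compactions can proceed in parallel within a level as long as their candidate sets are disjoint.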
+ +STCS in L0 +~~~~~~~~~~ + +If LCS gets very many L0 sstables, reads are going to hit all (or most) of the L0 sstables, since they are likely to be +overlapping. To remedy this more quickly, LCS does STCS compactions in L0 if there are more than 32 sstables there. This +should improve read performance more quickly compared to letting LCS do its L0 -> L1 compactions. If you keep getting +too many sstables in L0, it is likely that LCS is not the best fit for your workload and STCS could work out better. + +Starved sstables +~~~~~~~~~~~~~~~~ + +If a node ends up with a leveling where there are a few very high level sstables that are not getting compacted, they +might make it impossible for lower levels to drop tombstones etc. For example, if there are sstables in L6 but there is +only enough data to actually get an L4 on the node, the leftover sstables in L6 will get starved and not compacted. This +can happen if a user changes ``sstable_size_in_mb`` from 5MB to 160MB, for example. To avoid this, LCS tries to include +those starved high level sstables in other compactions if there have been 25 compaction rounds where the highest level +has not been involved. + +.. _lcs-options: + +LCS options +~~~~~~~~~~~ + +``sstable_size_in_mb`` (default: 160MB) + The target compressed (if using compression) sstable size - the sstables can end up being larger if there are very + large partitions on the node. + +LCS also supports the ``cassandra.disable_stcs_in_l0`` startup option (``-Dcassandra.disable_stcs_in_l0=true``) to avoid +doing STCS in L0. + +.. _TWCS: + +Time Window CompactionStrategy +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``TimeWindowCompactionStrategy`` (TWCS) is designed specifically for workloads where it's beneficial to have data on +disk grouped by the timestamp of the data, a common goal when the workload is time-series in nature or when all data is +written with a TTL.
In an expiring/TTL workload, the contents of an entire SSTable likely expire at approximately the +same time, allowing them to be dropped completely, and space reclaimed much more reliably than when using +``SizeTieredCompactionStrategy`` or ``LeveledCompactionStrategy``. The basic concept is that +``TimeWindowCompactionStrategy`` will create one sstable per time window, where a window is simply calculated +as the combination of two primary options: + +``compaction_window_unit`` (default: DAYS) + A Java TimeUnit (MINUTES, HOURS, or DAYS). +``compaction_window_size`` (default: 1) + The number of units that make up a window. + +Taken together, these options let the operator specify windows of virtually any size, and ``TimeWindowCompactionStrategy`` will work to +create a single sstable for writes within that window. For efficiency during writing, the newest window will be +compacted using ``SizeTieredCompactionStrategy``. + +Ideally, operators should select a ``compaction_window_unit`` and ``compaction_window_size`` pair that produces +approximately 20-30 windows - if writing with a 90-day TTL, for example, a 3-day window would be a reasonable choice +(``'compaction_window_unit':'DAYS','compaction_window_size':3``). + +TimeWindowCompactionStrategy Operational Concerns +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The primary motivation for TWCS is to separate data on disk by timestamp and to allow fully expired SSTables to drop +more efficiently. One potential way this optimal behavior can be subverted is if data is written to SSTables out of +order, with new data and old data in the same SSTable. Out-of-order data can appear in two ways: + +- If the user mixes old data and new data in the traditional write path, the data will be comingled in the memtables + and flushed into the same SSTable, where it will remain comingled.
+- If the user's read requests for old data cause read repairs that pull old data into the current memtable, that data + will be comingled and flushed into the same SSTable. + +While TWCS tries to minimize the impact of comingled data, users should attempt to avoid this behavior. Specifically, +users should avoid queries that explicitly set the timestamp via CQL ``USING TIMESTAMP``. Additionally, users should run +frequent repairs (which stream data in such a way that it does not become comingled), and disable background read +repair by setting the table's ``read_repair_chance`` and ``dclocal_read_repair_chance`` to 0. + +Changing TimeWindowCompactionStrategy Options +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Operators wishing to enable ``TimeWindowCompactionStrategy`` on existing data should consider running a major compaction +first, placing all existing data into a single (old) window. Subsequent newer writes will then create typical SSTables +as expected. + +Operators wishing to change ``compaction_window_unit`` or ``compaction_window_size`` can do so, but may trigger +additional compactions as adjacent windows are joined together. If the window size is decreased (for example, from 24 +hours to 12 hours), then the existing SSTables will not be modified - TWCS cannot split existing SSTables into multiple +windows. Added: cassandra/site/src/doc/3.10/_sources/operating/compression.txt URL: http://svn.apache.org/viewvc/cassandra/site/src/doc/3.10/_sources/operating/compression.txt?rev=1757419&view=auto ============================================================================== --- cassandra/site/src/doc/3.10/_sources/operating/compression.txt (added) +++ cassandra/site/src/doc/3.10/_sources/operating/compression.txt Tue Aug 23 19:25:17 2016 @@ -0,0 +1,94 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +..
regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. + +.. highlight:: none + +Compression +----------- + +Cassandra offers operators the ability to configure compression on a per-table basis. Compression reduces the size of +data on disk by compressing the SSTable in user-configurable chunks (``chunk_length_in_kb``). Because Cassandra +SSTables are immutable, the CPU cost of compression is only paid when the SSTable is written - subsequent updates +to data will land in different SSTables, so Cassandra will not need to decompress, overwrite, and recompress data when +UPDATE commands are issued. On reads, Cassandra will locate the relevant compressed chunks on disk, decompress the full +chunk, and then proceed with the remainder of the read path (merging data from disks and memtables, read repair, and so +on). + +Configuring Compression +^^^^^^^^^^^^^^^^^^^^^^^ + +Compression is configured on a per-table basis as an optional argument to ``CREATE TABLE`` or ``ALTER TABLE``. By +default, three options are relevant: + +- ``class`` specifies the compression class - Cassandra provides three classes (``LZ4Compressor``, + ``SnappyCompressor``, and ``DeflateCompressor``). The default is ``LZ4Compressor``. +- ``chunk_length_in_kb`` specifies the number of kilobytes of data per compression chunk. The default is 64KB.
+- ``crc_check_chance`` determines how likely Cassandra is to verify the checksum on each compression chunk during + reads. The default is 1.0. + +Users can set compression using the following syntax: + +:: + + CREATE TABLE keyspace.table (id int PRIMARY KEY) WITH compression = {'class': 'LZ4Compressor'}; + +Or + +:: + + ALTER TABLE keyspace.table WITH compression = {'class': 'SnappyCompressor', 'chunk_length_in_kb': 128, 'crc_check_chance': 0.5}; + +Once enabled, compression can be disabled with ``ALTER TABLE`` setting ``enabled`` to ``false``: + +:: + + ALTER TABLE keyspace.table WITH compression = {'enabled':'false'}; + +Operators should be aware, however, that changing compression is not immediate. The data is compressed when the SSTable +is written, and as SSTables are immutable, the compression will not be modified until the table is compacted. Upon +issuing a change to the compression options via ``ALTER TABLE``, the existing SSTables will not be modified until they +are compacted - if an operator needs compression changes to take effect immediately, the operator can trigger an SSTable +rewrite using ``nodetool scrub`` or ``nodetool upgradesstables -a``, both of which will rebuild the SSTables on disk, +re-compressing the data in the process. + +Benefits and Uses +^^^^^^^^^^^^^^^^^ + +Compression's primary benefit is that it reduces the amount of data written to disk. Not only does the reduced size save +on storage requirements, it often increases read and write throughput, as compressing and decompressing the data is often +faster than reading or writing the larger volume of uncompressed data from disk would be. + +Compression is most useful in tables composed of many rows, where the rows are similar in nature. Tables containing +similar text columns (such as repeated JSON blobs) often compress very well. + +Operational Impact +^^^^^^^^^^^^^^^^^^ + +- Compression metadata is stored off-heap and scales with data on disk.
This often requires 1-3GB of off-heap RAM per + terabyte of data on disk, though the exact usage varies with ``chunk_length_in_kb`` and compression ratios. + +- Streaming operations involve compressing and decompressing data on compressed tables - in some code paths (such as + non-vnode bootstrap), the CPU overhead of compression can be a limiting factor. + +- The compression path checksums data to ensure correctness - while the traditional Cassandra read path does not have a + way to ensure correctness of data on disk, compressed tables allow the user to set ``crc_check_chance`` (a float from + 0.0 to 1.0) to allow Cassandra to probabilistically validate chunks on read to verify bits on disk are not corrupt. + +Advanced Use +^^^^^^^^^^^^ + +Advanced users can provide their own compression class by implementing the interface at +``org.apache.cassandra.io.compress.ICompressor``. Added: cassandra/site/src/doc/3.10/_sources/operating/hardware.txt URL: http://svn.apache.org/viewvc/cassandra/site/src/doc/3.10/_sources/operating/hardware.txt?rev=1757419&view=auto ============================================================================== --- cassandra/site/src/doc/3.10/_sources/operating/hardware.txt (added) +++ cassandra/site/src/doc/3.10/_sources/operating/hardware.txt Tue Aug 23 19:25:17 2016 @@ -0,0 +1,87 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. 
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. + +Hardware Choices +---------------- + +Like most databases, Cassandra throughput improves with more CPU cores, more RAM, and faster disks. While Cassandra can +be made to run on small servers for testing or development environments (including Raspberry Pis), a minimal production +server requires at least 2 cores, and at least 8GB of RAM. Typical production servers have 8 or more cores and at least +32GB of RAM. + +CPU +^^^ +Cassandra is highly concurrent, handling many simultaneous requests (both read and write) using multiple threads running +on as many CPU cores as possible. The Cassandra write path tends to be heavily optimized (writing to the commitlog and +then inserting the data into the memtable), so writes, in particular, tend to be CPU bound. Consequently, adding +additional CPU cores often increases throughput of both reads and writes. + +Memory +^^^^^^ +Cassandra runs within a Java VM, which will pre-allocate a fixed-size heap (Java's ``-Xmx`` parameter). In addition to +the heap, Cassandra will use significant amounts of RAM off-heap for compression metadata, bloom filters, row, key, and +counter caches, and an in-process page cache. Finally, Cassandra will take advantage of the operating system's page +cache, storing recently accessed portions of files in RAM for rapid re-use. + +For optimal performance, operators should benchmark and tune their clusters based on their individual workload.
However, +basic guidelines suggest: + +- ECC RAM should always be used, as Cassandra has few internal safeguards to protect against bit-level corruption +- The Cassandra heap should be no less than 2GB, and no more than 50% of your system RAM +- For heaps smaller than 12GB, consider ParNew/ConcurrentMarkSweep garbage collection +- For heaps larger than 12GB, consider G1GC + +Disks +^^^^^ +Cassandra persists data to disk for two very different purposes. The first is to the commitlog when a new write is made +so that it can be replayed after a crash or system shutdown. The second is to the data directory when thresholds are +exceeded and memtables are flushed to disk as SSTables. + +Commitlogs receive every write made to a Cassandra node and have the potential to block client operations, but they are +only ever read on node start-up. SSTable (data file) writes, on the other hand, occur asynchronously, but are read to +satisfy client look-ups. SSTables are also periodically merged and rewritten in a process called compaction. The data +held in the commitlog directory is data that has not been permanently saved to the SSTable data directories - it will be +periodically purged once it is flushed to the SSTable data files. + +Cassandra performs very well on both spinning hard drives and solid-state disks. In both cases, Cassandra's sorted +immutable SSTables allow for linear reads, few seeks, and few overwrites, maximizing throughput for HDDs and lifespan of +SSDs by avoiding write amplification. However, when using spinning disks, it's important that the commitlog +(``commitlog_directory``) be on one physical disk (not simply a partition, but a physical disk), and the data files +(``data_file_directories``) be set to a separate physical disk. By separating the commitlog from the data directory, +writes can benefit from sequential appends to the commitlog without having to seek around the platter as reads request +data from various SSTables on disk.
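As a concrete illustration of the commitlog/data separation described above, the relevant ``cassandra.yaml`` settings might look like the following sketch. The mount points shown are hypothetical examples, not defaults; each is assumed to be backed by its own physical disk:

```yaml
# Hypothetical mount points - each assumed to be a separate physical disk.
commitlog_directory: /mnt/disk1/cassandra/commitlog

data_file_directories:
    - /mnt/disk2/cassandra/data
```

With a layout like this, commitlog appends stay sequential on one spindle while SSTable reads seek on the other.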
+ +In most cases, Cassandra is designed to provide redundancy via multiple independent, inexpensive servers. For this +reason, using NFS or a SAN for data directories is an antipattern and should typically be avoided. Similarly, servers +with multiple disks are often better served by using RAID0 or JBOD than RAID1 or RAID5 - the replication provided by +Cassandra obviates the need for replication at the disk layer, so it's typically recommended that operators take +advantage of the additional throughput of RAID0 rather than protecting against failures with RAID1 or RAID5. + +Common Cloud Choices +^^^^^^^^^^^^^^^^^^^^ + +Many large users of Cassandra run in various clouds, including AWS, Azure, and GCE - Cassandra will happily run in any +of these environments. Users should choose hardware similar to what would be needed in a physical datacenter. In EC2, popular +options include: + +- m1.xlarge instances, which provide 1.6TB of local ephemeral spinning storage and sufficient RAM to run moderate + workloads +- i2 instances, which provide both a high RAM:CPU ratio and local ephemeral SSDs +- m4.2xlarge / c4.4xlarge instances, which provide modern CPUs, enhanced networking and work well with EBS GP2 (SSD) + storage + +Generally, disk and network performance increases with instance size and generation, so newer generations of instances +and larger instance types within each family often perform better than their smaller or older alternatives. Added: cassandra/site/src/doc/3.10/_sources/operating/hints.txt URL: http://svn.apache.org/viewvc/cassandra/site/src/doc/3.10/_sources/operating/hints.txt?rev=1757419&view=auto ============================================================================== --- cassandra/site/src/doc/3.10/_sources/operating/hints.txt (added) +++ cassandra/site/src/doc/3.10/_sources/operating/hints.txt Tue Aug 23 19:25:17 2016 @@ -0,0 +1,22 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements.
See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. + +.. highlight:: none + +Hints +----- + +.. todo:: todo Added: cassandra/site/src/doc/3.10/_sources/operating/index.txt URL: http://svn.apache.org/viewvc/cassandra/site/src/doc/3.10/_sources/operating/index.txt?rev=1757419&view=auto ============================================================================== --- cassandra/site/src/doc/3.10/_sources/operating/index.txt (added) +++ cassandra/site/src/doc/3.10/_sources/operating/index.txt Tue Aug 23 19:25:17 2016 @@ -0,0 +1,39 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the NOTICE file +.. distributed with this work for additional information +.. regarding copyright ownership. The ASF licenses this file +.. to you under the Apache License, Version 2.0 (the +.. "License"); you may not use this file except in compliance +.. with the License. You may obtain a copy of the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, +.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +.. See the License for the specific language governing permissions and +.. limitations under the License. + +.. 
highlight:: none + +Operating Cassandra +=================== + +.. toctree:: + :maxdepth: 2 + + snitch + topo_changes + repair + read_repair + hints + compaction + bloom_filters + compression + cdc + backups + bulk_loading + metrics + security + hardware +