This is an automated email from the ASF dual-hosted git repository.

smiklosovic pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
     new f27790c969 Improve and clean up documentation and fix typos
f27790c969 is described below

commit f27790c96912ac9a83f052d8e6d0bfcdfe60ca0e
Author: Nikita Eshkeev <[email protected]>
AuthorDate: Thu Jan 26 13:28:16 2023 +0100

    Improve and clean up documentation and fix typos
    
    This patch includes all the changes from the PRs that introduce small
    changes related to typos and similar issues in the documentation. The
    changes are accumulated from the following PRs:
    
    - https://github.com/apache/cassandra/pull/206
    - https://github.com/apache/cassandra/pull/359
    - https://github.com/apache/cassandra/pull/366
    - https://github.com/apache/cassandra/pull/390
    - https://github.com/apache/cassandra/pull/450
    - https://github.com/apache/cassandra/pull/567
    - https://github.com/apache/cassandra/pull/615
    - https://github.com/apache/cassandra/pull/618
    - https://github.com/apache/cassandra/pull/746
    - https://github.com/apache/cassandra/pull/984
    - https://github.com/apache/cassandra/pull/1052
    - https://github.com/apache/cassandra/pull/1088
    - https://github.com/apache/cassandra/pull/1274
    - https://github.com/apache/cassandra/pull/1378
    - https://github.com/apache/cassandra/pull/1404
    - https://github.com/apache/cassandra/pull/1504
    - https://github.com/apache/cassandra/pull/1540
    - https://github.com/apache/cassandra/pull/1544
    - https://github.com/apache/cassandra/pull/1673
    - https://github.com/apache/cassandra/pull/1697
    - https://github.com/apache/cassandra/pull/1722
    - https://github.com/apache/cassandra/pull/1815
    - https://github.com/apache/cassandra/pull/1830
    - https://github.com/apache/cassandra/pull/1863
    - https://github.com/apache/cassandra/pull/1865
    - https://github.com/apache/cassandra/pull/1879
    - https://github.com/apache/cassandra/pull/2062
    
    patch by Nikita Eshkeev, reviewed by Stefan Miklosovic, Lorina Poland, Michael Semb Wever for CASSANDRA-18185
    
    Co-authored-by: kalmant <[email protected]>
    Co-authored-by: Dmitry <[email protected]>
    Co-authored-by: Tibor Répási <[email protected]>
    Co-authored-by: Tzach Livyatan <[email protected]>
    Co-authored-by: Jérôme BAROTIN <[email protected]>
    Co-authored-by: Giorgio Giuffrè <[email protected]>
    Co-authored-by: Siddhartha Tiwari <[email protected]>
    Co-authored-by: Angelo Polo <[email protected]>
    Co-authored-by: Tjeu Kayim <[email protected]>
    Co-authored-by: 陳傑夫 <[email protected]>
    Co-authored-by: Bhouse99 <[email protected]>
    Co-authored-by: Matthew Hardwick <[email protected]>
    Co-authored-by: Paul Wouters <[email protected]>
    Co-authored-by: Romain Hardouin <[email protected]>
    Co-authored-by: Guilherme Poleto <[email protected]>
    Co-authored-by: 陳傑夫 <[email protected]>
    Co-authored-by: etc-crontab <[email protected]>
    Co-authored-by: Prashant Bhuruk <[email protected]>
    Co-authored-by: Jingchuan Zhu <[email protected]>
    Co-authored-by: Ryan Stewart <[email protected]>
    Co-authored-by: utkarsh-agrawal-jm <[email protected]>
    Co-authored-by: Ben Dalling <[email protected]>
    Co-authored-by: Terry L. Blessing <[email protected]>
    Co-authored-by: gruzilkin <[email protected]>
    Co-authored-by: Kevin <[email protected]>
    Co-authored-by: yziadeh <[email protected]>
    Co-authored-by: Lorina Poland <[email protected]>
    Co-authored-by: Stefan Miklosovic <[email protected]>
---
 TESTING.md                                         |   6 +-
 conf/cassandra.yaml                                |  28 +++---
 conf/cqlshrc.sample                                |  12 ++-
 doc/modules/cassandra/examples/BNF/alter_table.bnf |   2 +-
 .../cassandra/pages/architecture/dynamo.adoc       |  51 +++++-----
 .../cassandra/pages/architecture/guarantees.adoc   |  46 ++++-----
 .../cassandra/pages/architecture/overview.adoc     |  16 +--
 .../cassandra/pages/architecture/snitch.adoc       |   2 +-
 .../pages/architecture/storage_engine.adoc         |  57 ++++++-----
 doc/modules/cassandra/pages/cql/ddl.adoc           |   9 +-
 doc/modules/cassandra/pages/cql/definitions.adoc   |   2 +-
 doc/modules/cassandra/pages/cql/dml.adoc           |   2 +-
 doc/modules/cassandra/pages/cql/types.adoc         |  17 ++--
 .../pages/data_modeling/data_modeling_rdbms.adoc   |   2 +-
 .../pages/data_modeling/data_modeling_schema.adoc  |   2 +-
 .../pages/data_modeling/data_modeling_tools.adoc   |   2 +-
 doc/modules/cassandra/pages/faq/index.adoc         |   5 +-
 .../cassandra/pages/getting_started/drivers.adoc   |   2 +-
 .../pages/getting_started/production.adoc          |  10 +-
 .../cassandra/pages/operating/auditlogging.adoc    | 110 +++++++++++++++++----
 doc/modules/cassandra/pages/operating/backups.adoc |   2 +-
 doc/modules/cassandra/pages/tools/cqlsh.adoc       |  37 +++++--
 .../pages/tools/sstable/sstablelevelreset.adoc     |   2 +-
 .../org/apache/cassandra/db/tries/InMemoryTrie.md  |   2 +-
 24 files changed, 267 insertions(+), 159 deletions(-)

diff --git a/TESTING.md b/TESTING.md
index 0f25743f92..b9c5c7a5ba 100644
--- a/TESTING.md
+++ b/TESTING.md
@@ -364,14 +364,14 @@ dependencies into the constructor is not practical, wrapping accesses to global
 
 
 **Example, alternative**
-```javayy
+```java
 class SomeVerbHandler implements IVerbHandler<SomeMessage>
 { 
        @VisibleForTesting
       protected boolean isAlive(InetAddress addr) { return FailureDetector.instance.isAlive(msg.payload.otherNode); }
 
        @VisibleForTesting
-       protected void streamSomethind(InetAddress to) { new StreamPlan(to).requestRanges(someRanges).execute(); }
+       protected void streamSomething(InetAddress to) { new StreamPlan(to).requestRanges(someRanges).execute(); }
 
        @VisibleForTesting
       protected void compactSomething(ColumnFamilyStore cfs ) { CompactionManager.instance.submitBackground(); }
@@ -404,7 +404,7 @@ class SomeVerbTest
                protected boolean isAlive(InetAddress addr) { return alive; }
                
                @Override
-               protected void streamSomethind(InetAddress to) { streamCalled = true; }
+               protected void streamSomething(InetAddress to) { streamCalled = true; }
 
                @Override
               protected void compactSomething(ColumnFamilyStore cfs ) { compactCalled = true; }
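
The seam-override pattern shown above — hiding singleton access behind small `@VisibleForTesting` methods and overriding them in the test — can be sketched end to end as follows. This is a hypothetical, self-contained illustration (names such as `PingHandler` are invented here, not taken from the patch):

```java
import java.net.InetAddress;

// Hypothetical sketch of the seam-override pattern from the hunk above.
class PingHandler
{
    void handle(InetAddress peer)
    {
        if (isAlive(peer))
            stream(peer);
    }

    // Seams: production code would consult global singletons here
    // (e.g. FailureDetector.instance); tests override them instead.
    protected boolean isAlive(InetAddress peer) { return false; }
    protected void stream(InetAddress peer) { /* e.g. new StreamPlan(peer)... */ }
}

class PingHandlerTest
{
    public static void main(String[] args)
    {
        final boolean[] streamCalled = { false };
        PingHandler handler = new PingHandler()
        {
            @Override protected boolean isAlive(InetAddress peer) { return true; }
            @Override protected void stream(InetAddress peer) { streamCalled[0] = true; }
        };
        handler.handle(InetAddress.getLoopbackAddress());
        System.out.println(streamCalled[0] ? "ok: streamed" : "fail: no stream");
    }
}
```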
diff --git a/conf/cassandra.yaml b/conf/cassandra.yaml
index 21c70a7a1a..08abc620d6 100644
--- a/conf/cassandra.yaml
+++ b/conf/cassandra.yaml
@@ -44,8 +44,8 @@ num_tokens: 16
 allocate_tokens_for_local_replication_factor: 3
 
 # initial_token allows you to specify tokens manually.  While you can use it with
-# vnodes (num_tokens > 1, above) -- in which case you should provide a 
-# comma-separated list -- it's primarily used when adding nodes to legacy clusters 
+# vnodes (num_tokens > 1, above) -- in which case you should provide a
+# comma-separated list -- it's primarily used when adding nodes to legacy clusters
 # that do not have vnodes enabled.
 # initial_token:
 
@@ -290,7 +290,7 @@ credentials_validity: 2000ms
 partitioner: org.apache.cassandra.dht.Murmur3Partitioner
 
 # Directories where Cassandra should store data on disk. If multiple
-# directories are specified, Cassandra will spread data evenly across 
+# directories are specified, Cassandra will spread data evenly across
 # them by partitioning the token ranges.
 # If not set, the default directory is $CASSANDRA_HOME/data/data.
 # data_file_directories:
@@ -496,8 +496,8 @@ counter_cache_save_period: 7200s
 # Min unit: s
 # cache_load_timeout: 30s
 
-# commitlog_sync may be either "periodic", "group", or "batch." 
-# 
+# commitlog_sync may be either "periodic", "group", or "batch."
+#
 # When in batch mode, Cassandra won't ack writes until the commit log
 # has been flushed to disk.  Each incoming write will trigger the flush task.
 # commitlog_sync_batch_window_in_ms is a deprecated value. Previously it had
@@ -940,7 +940,7 @@ incremental_backups: false
 snapshot_before_compaction: false
 
 # Whether or not a snapshot is taken of the data before keyspace truncation
-# or dropping of column families. The STRONGLY advised default of true 
+# or dropping of column families. The STRONGLY advised default of true
 # should be used to provide data safety. If you set this flag to false, you will
 # lose data on truncation or drop.
 auto_snapshot: true
@@ -994,7 +994,7 @@ column_index_cache_size: 2KiB
 #
 # concurrent_compactors defaults to the smaller of (number of disks,
 # number of cores), with a minimum of 2 and a maximum of 8.
-# 
+#
 # If your data directories are backed by SSD, you should increase this
 # to the number of cores.
 # concurrent_compactors: 1
@@ -1022,7 +1022,7 @@ compaction_throughput: 64MiB/s
 
 # When compacting, the replacement sstable(s) can be opened before they
 # are completely written, and used in place of the prior sstables for
-# any range that has been written. This helps to smoothly transfer reads 
+# any range that has been written. This helps to smoothly transfer reads
 # between the sstables, reducing page cache churn and keeping hot rows hot
 # Set sstable_preemptive_open_interval to null for disabled which is equivalent to
 # sstable_preemptive_open_interval_in_mb being negative
@@ -1170,10 +1170,10 @@ slow_query_log_timeout: 500ms
 # Enable operation timeout information exchange between nodes to accurately
 # measure request timeouts.  If disabled, replicas will assume that requests
 # were forwarded to them instantly by the coordinator, which means that
-# under overload conditions we will waste that much extra time processing 
+# under overload conditions we will waste that much extra time processing
 # already-timed-out requests.
 #
-# Warning: It is generally assumed that users have setup NTP on their clusters, and that clocks are modestly in sync, 
+# Warning: It is generally assumed that users have setup NTP on their clusters, and that clocks are modestly in sync,
 # since this is a requirement for general correctness of last write wins.
 # internode_timeout: true
 
@@ -1586,7 +1586,7 @@ compaction_tombstone_warning_threshold: 100000
 # max_concurrent_automatic_sstable_upgrades: 1
 
 # Audit logging - Logs every incoming CQL command request, authentication to a node. See the docs
-# on audit_logging for full details about the various configuration options.
+# on audit_logging for full details about the various configuration options and production tips.
 audit_logging_options:
   enabled: false
   logger:
@@ -1602,11 +1602,13 @@ audit_logging_options:
   # block: true
   # max_queue_weight: 268435456 # 256 MiB
   # max_log_size: 17179869184 # 16 GiB
-  ## archive command is "/path/to/script.sh %path" where %path is replaced with the file being rolled:
+  #
+  ## If archive_command is empty or unset, Cassandra uses a built-in DeletingArchiver that deletes the oldest files if ``max_log_size`` is reached.
+  ## If archive_command is set, Cassandra does not use DeletingArchiver, so it is the responsibility of the script to make any required cleanup.
+  ## Example: "/path/to/script.sh %path" where %path is replaced with the file being rolled.
   # archive_command:
   # max_archive_retries: 10
 
-
 # default options for full query logging - these can be overridden from command line when executing
 # nodetool enablefullquerylog
 # full_query_logging_options:
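
The archiving semantics the new comment lines describe — delete the oldest segments once ``max_log_size`` is exceeded, unless an ``archive_command`` takes over cleanup — amount to an oldest-first reaper. A rough sketch under those assumptions (this is not Cassandra's actual DeletingArchiver; the class and paths are invented):

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Rough sketch: cap the total size of rolled log files by deleting the
// oldest files first, mirroring the DeletingArchiver behavior described
// in the comments above. Invented names; not Cassandra code.
class OldestFirstReaper
{
    static void enforceCap(File dir, long maxLogSizeBytes)
    {
        File[] files = dir.listFiles(File::isFile);
        if (files == null)
            return;
        Arrays.sort(files, Comparator.comparingLong(File::lastModified)); // oldest first
        long total = Arrays.stream(files).mapToLong(File::length).sum();
        for (File f : files)
        {
            if (total <= maxLogSizeBytes)
                break;
            total -= f.length();
            f.delete(); // oldest segment goes first
        }
    }

    public static void main(String[] args)
    {
        enforceCap(new File("/tmp/audit"), 16L * 1024 * 1024 * 1024); // 16 GiB cap
    }
}
```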
diff --git a/conf/cqlshrc.sample b/conf/cqlshrc.sample
index 4878b589bc..56011f4927 100644
--- a/conf/cqlshrc.sample
+++ b/conf/cqlshrc.sample
@@ -15,7 +15,7 @@
 ; specific language governing permissions and limitations
 ; under the License.
 ;
-; Sample ~/.cqlshrc file.
+; Sample ~/.cassandra/cqlshrc file.
 
 [authentication]
 ;; If Cassandra has auth enabled, fill out these options
@@ -23,7 +23,6 @@
 ; credentials = ~/.cassandra/credentials
 ; keyspace = ks1
 
-
 [auth_provider]
 ;; you can specify any auth provider found in your python environment
 ;; module and class will be used to dynamically load the class
@@ -33,6 +32,10 @@
 ; classname = PlainTextAuthProvider
 ; username = user1
 
+[protocol]
+;; Specify a specific protocol version, otherwise the client will default and downgrade as necessary
+; version = None
+
 [ui]
 ;; Whether or not to display query results with colors
 ; color = on
@@ -153,9 +156,8 @@ port = 9042
 ; boolstyle = True,False
 
 ;; The number of child worker processes to create for
-;; COPY tasks.  Defaults to a max of 4 for COPY FROM and 16
-;; for COPY TO.  However, at most (num_cores - 1) processes
-;; will be created.
+;; COPY tasks.  Defaults to 16 for `COPY` tasks.
+;; However, at most (num_cores - 1) processes will be created.
 ; numprocesses =
 
 ;; The maximum number of failed attempts to fetch a range of data (when using
diff --git a/doc/modules/cassandra/examples/BNF/alter_table.bnf b/doc/modules/cassandra/examples/BNF/alter_table.bnf
index 728a78a4c0..7b58320a2f 100644
--- a/doc/modules/cassandra/examples/BNF/alter_table.bnf
+++ b/doc/modules/cassandra/examples/BNF/alter_table.bnf
@@ -1,5 +1,5 @@
 alter_table_statement::= ALTER TABLE [ IF EXISTS ] table_name alter_table_instruction
 alter_table_instruction::= ADD [ IF NOT EXISTS ] column_name cql_type ( ',' column_name cql_type )*
-       | DROP [ IF EXISTS ] column_name ( column_name )*
+       | DROP [ IF EXISTS ] column_name ( ',' column_name )*
        | RENAME [ IF EXISTS ] column_name to column_name (AND column_name to column_name)*
        | WITH options
diff --git a/doc/modules/cassandra/pages/architecture/dynamo.adoc b/doc/modules/cassandra/pages/architecture/dynamo.adoc
index e90390a7cb..aa1cf5aa47 100644
--- a/doc/modules/cassandra/pages/architecture/dynamo.adoc
+++ b/doc/modules/cassandra/pages/architecture/dynamo.adoc
@@ -1,7 +1,7 @@
 = Dynamo
 
 Apache Cassandra relies on a number of techniques from Amazon's
-http://courses.cse.tamu.edu/caverlee/csce438/readings/dynamo-paper.pdf[Dynamo]
+https://www.cs.cornell.edu/courses/cs5414/2017fa/papers/dynamo.pdf[Dynamo]
 distributed storage key-value system. Each node in the Dynamo system has
 three main components:
 
@@ -22,10 +22,10 @@ protocol
 
 Cassandra was designed this way to meet large-scale (PiB+)
 business-critical storage requirements. In particular, as applications
-demanded full global replication of petabyte scale datasets along with
+demanded full global replication of petabyte-scale datasets along with
 always available low-latency reads and writes, it became imperative to
 design a new kind of database model as the relational database systems
-of the time struggled to meet the new requirements of global scale
+of the time struggled to meet the new requirements of global-scale
 applications.
 
 == Dataset Partitioning: Consistent Hashing
@@ -38,11 +38,11 @@ as racks and even datacenters. As every replica can independently accept
 mutations to every key that it owns, every key must be versioned. Unlike
 in the original Dynamo paper where deterministic versions and vector
 clocks were used to reconcile concurrent updates to a key, Cassandra
-uses a simpler last write wins model where every mutation is timestamped
+uses a simpler last-write-wins model where every mutation is timestamped
 (including deletes) and then the latest version of data is the "winning"
 value. Formally speaking, Cassandra uses a Last-Write-Wins Element-Set
 conflict-free replicated data type for each CQL row, or 
-https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type LWW-Element-Set_(Last-Write-Wins-Element-Set)[LWW-Element-Set
+https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type#LWW-Element-Set_(Last-Write-Wins-Element-Set)[LWW-Element-Set
 CRDT], to resolve conflicting mutations on replica sets.
 
 === Consistent Hashing using a Token Ring
@@ -76,14 +76,14 @@ `RF=3` can be visualized as follows:
 
 image::ring.svg[image]
 
-You can see that in a Dynamo like system, ranges of keys, also known as
+You can see that in a Dynamo-like system, ranges of keys, also known as
 *token ranges*, map to the same physical set of nodes. In this example,
 all keys that fall in the token range excluding token 1 and including
token 2 (`range(t1, t2]`) are stored on nodes 2, 3 and 4.
 
 === Multiple Tokens per Physical Node (vnodes)
 
-Simple single token consistent hashing works well if you have many
+Simple single-token consistent hashing works well if you have many
 physical nodes to spread data over, but with evenly spaced tokens and a
 small number of physical nodes, incremental scaling (adding just a few
 nodes of capacity) is difficult because there are no token selections
@@ -104,8 +104,7 @@ even a single node.
 
 Cassandra introduces some nomenclature to handle these concepts:
 
-* *Token*: A single position on the dynamo style hash
-ring.
+* *Token*: A single position on the Dynamo-style hash ring.
 * *Endpoint*: A single physical IP and port on the network.
 * *Host ID*: A unique identifier for a single "physical" node, usually
present at one `Endpoint` and containing one or more
@@ -131,7 +130,7 @@ data across the cluster.
 . When a node is decommissioned, it loses data roughly equally to other
 members of the ring, again keeping equal distribution of data across the
 cluster.
-. If a node becomes unavailable, query load (especially token aware
+. If a node becomes unavailable, query load (especially token-aware
 query load), is evenly distributed across many other nodes.
 
 Multiple tokens, however, can also have disadvantages:
@@ -152,7 +151,7 @@ Note that in Cassandra `2.x`, the only token allocation algorithm
 available was picking random tokens, which meant that to keep balance
 the default number of tokens per node had to be quite high, at `256`.
 This had the effect of coupling many physical endpoints together,
-increasing the risk of unavailability. That is why in `3.x +` the new
+increasing the risk of unavailability. That is why in `3.x +` a new
 deterministic token allocator was added which intelligently picks tokens
 such that the ring is optimally balanced while requiring a much lower
 number of tokens per physical node.
@@ -256,7 +255,7 @@ secondary indices with them.
 Transient replication is an experimental feature that is not ready
 for production use. The expected audience is experienced users of
 Cassandra capable of fully validating a deployment of their particular
-application. That means being able check that operations like reads,
+application. That means you have the experience to check that operations like reads,
 writes, decommission, remove, rebuild, repair, and replace all work with
 your queries, data, configuration, operational practices, and
 availability requirements.
@@ -269,18 +268,18 @@ transient replication, as well as LWT, logged batches, and counters.
 Cassandra uses mutation timestamp versioning to guarantee eventual
 consistency of data. Specifically all mutations that enter the system do
 so with a timestamp provided either from a client clock or, absent a
-client provided timestamp, from the coordinator node's clock. Updates
+client-provided timestamp, from the coordinator node's clock. Updates
 resolve according to the conflict resolution rule of last write wins.
 Cassandra's correctness does depend on these clocks, so make sure a
 proper time synchronization process is running such as NTP.
 
 Cassandra applies separate mutation timestamps to every column of every
 row within a CQL partition. Rows are guaranteed to be unique by primary
-key, and each column in a row resolve concurrent mutations according to
+key, and each column in a row resolves concurrent mutations according to
 last-write-wins conflict resolution. This means that updates to
 different primary keys within a partition can actually resolve without
 conflict! Furthermore the CQL collection types such as maps and sets use
-this same conflict free mechanism, meaning that concurrent updates to
+this same conflict-free mechanism, meaning that concurrent updates to
 maps and sets are guaranteed to resolve as well.
 
 ==== Replica Synchronization
@@ -293,7 +292,7 @@ many best-effort techniques to drive convergence of replicas including
 
 These techniques are only best-effort, however, and to guarantee
 eventual consistency Cassandra implements `anti-entropy
-repair <repair>` where replicas calculate hierarchical hash-trees over
+repair <repair>` where replicas calculate hierarchical hash trees over
 their datasets called https://en.wikipedia.org/wiki/Merkle_tree[Merkle
 trees] that can then be compared across replicas to identify mismatched
 data. Like the original Dynamo paper Cassandra supports full repairs
@@ -340,7 +339,7 @@ The following consistency levels are available:
   A majority of the replicas in each datacenter must respond.
 `LOCAL_ONE`::
   Only a single replica must respond. In a multi-datacenter cluster,
-  this also gaurantees that read requests are not sent to replicas in a
+  this also guarantees that read requests are not sent to replicas in a
   remote datacenter.
 `ANY`::
   A single replica may respond, or the coordinator may store a hint. If
@@ -400,7 +399,7 @@ versions. In Cassandra's gossip system, nodes exchange state information
 not only about themselves but also about other nodes they know about.
 This information is versioned with a vector clock of
 `(generation, version)` tuples, where the generation is a monotonic
-timestamp and version is a logical clock the increments roughly every
+timestamp and version is a logical clock that increments roughly every
 second. These logical clocks allow Cassandra gossip to ignore old
 versions of cluster state just by inspecting the logical clocks
 presented with gossip messages.
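
The `(generation, version)` ordering described here is a plain lexicographic comparison: generations (bumped on restart) dominate, and the roughly once-per-second version breaks ties. A minimal sketch with illustrative field names (real gossip state carries far more):

```java
// Sketch of the (generation, version) ordering gossip uses to discard
// stale state: higher generation wins; within a generation, higher version.
final class HeartBeatState implements Comparable<HeartBeatState>
{
    final long generation; // monotonic timestamp, bumped when the node restarts
    final long version;    // logical clock, increments roughly every second

    HeartBeatState(long generation, long version)
    {
        this.generation = generation;
        this.version = version;
    }

    @Override
    public int compareTo(HeartBeatState other)
    {
        int byGeneration = Long.compare(generation, other.generation);
        return byGeneration != 0 ? byGeneration : Long.compare(version, other.version);
    }

    public static void main(String[] args)
    {
        HeartBeatState known = new HeartBeatState(1700000000L, 42);
        HeartBeatState gossiped = new HeartBeatState(1700000000L, 17);
        // Same generation but lower version: the gossiped state is stale.
        System.out.println(gossiped.compareTo(known) < 0 ? "ignore" : "apply");
    }
}
```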
@@ -417,10 +416,10 @@ state with.
 one exists)
 . Gossips with a seed node if that didn't happen in step 2.
 
-When an operator first bootstraps a Cassandra cluster they designate
-certain nodes as seed nodes. Any node can be a seed node and the only
-difference between seed and non-seed nodes is seed nodes are allowed to
-bootstrap into the ring without seeing any other seed nodes.
+When an operator first bootstraps a Cassandra cluster, they designate
+certain nodes as seed nodes. Any node can be a seed node, and the only
+difference between seed and non-seed nodes is that seed nodes are allowed
+to bootstrap into the ring without seeing any other seed nodes.
 Furthermore, once a cluster is bootstrapped, seed nodes become
 hotspots for gossip due to step 4 above.
 
@@ -435,7 +434,7 @@ chosen using existing off-the-shelf service discovery mechanisms.
 Nodes do not have to agree on the seed nodes, and indeed once a cluster
 is bootstrapped, newly launched nodes can be configured to use any
 existing nodes as seeds. The only advantage to picking the same nodes
-as seeds is it increases their usefullness as gossip hotspots.
+as seeds is that it increases their usefulness as gossip hotspots.
 ====
 
 Currently, gossip also propagates token metadata and schema
@@ -488,7 +487,7 @@ and every additional node brings linear improvements in compute and
 storage. In contrast, scaling-up implies adding more capacity to the
 existing database nodes. Cassandra is also capable of scale-up, and in
 certain environments it may be preferable depending on the deployment.
-Cassandra gives operators the flexibility to chose either scale-out or
+Cassandra gives operators the flexibility to choose either scale-out or
 scale-up.
 
 One key aspect of Dynamo that Cassandra follows is to attempt to run on
@@ -507,7 +506,7 @@ API, and allows Cassandra to more easily scale horizontally since
 multi-partition transactions spanning multiple nodes are notoriously
 difficult to implement and typically very latent.
 
-Instead, Cassanda chooses to offer fast, consistent, latency at any
+Instead, Cassandra chooses to offer fast, consistent, latency at any
 scale for single partition operations, allowing retrieval of entire
 partitions or only subsets of partitions based on primary key filters.
 Furthermore, Cassandra does support single partition compare and swap
@@ -516,7 +515,7 @@ functionality via the lightweight transaction CQL API.
 === Simple Interface for Storing Records
 
 Cassandra, in a slight departure from Dynamo, chooses a storage
-interface that is more sophisticated then "simple key value" stores but
+interface that is more sophisticated than "simple key-value" stores but
 significantly less complex than SQL relational data models. Cassandra
 presents a wide-column store interface, where partitions of data contain
 multiple rows, each of which contains a flexible set of individually
diff --git a/doc/modules/cassandra/pages/architecture/guarantees.adoc b/doc/modules/cassandra/pages/architecture/guarantees.adoc
index 3313a1140c..a5f09b97bb 100644
--- a/doc/modules/cassandra/pages/architecture/guarantees.adoc
+++ b/doc/modules/cassandra/pages/architecture/guarantees.adoc
@@ -1,17 +1,17 @@
 = Guarantees
 
 Apache Cassandra is a highly scalable and reliable database. Cassandra
-is used in web based applications that serve large number of clients and
+is used in web-based applications that serve a large number of clients and
 the quantity of data processed is web-scale (Petabyte) large. Cassandra
 makes some guarantees about its scalability, availability and
 reliability. To fully understand the inherent limitations of a storage
 system in an environment in which a certain level of network partition
 failure is to be expected and taken into account when designing the
-system it is important to first briefly introduce the CAP theorem.
+system, it is important to first briefly introduce the CAP theorem.
 
 == What is CAP?
 
-According to the CAP theorem it is not possible for a distributed data
+According to the CAP theorem, it is not possible for a distributed data
 store to provide more than two of the following guarantees
 simultaneously.
 
@@ -24,7 +24,7 @@ recent write or data.
 storage system to failure of a network partition. Even if some of the
 messages are dropped or delayed the system continues to operate.
 
-CAP theorem implies that when using a network partition, with the
+The CAP theorem implies that when using a network partition, with the
 inherent risk of partition failure, one has to choose between
 consistency and availability and both cannot be guaranteed at the same
 time. CAP theorem is illustrated in Figure 1.
@@ -33,7 +33,7 @@ image::Figure_1_guarantees.jpg[image]
 
 Figure 1. CAP Theorem
 
-High availability is a priority in web based applications and to this
+High availability is a priority in web-based applications and to this
 objective Cassandra chooses Availability and Partition Tolerance from
 the CAP guarantees, compromising on data Consistency to some extent.
 
@@ -47,19 +47,19 @@ Cassandra makes the following guarantees.
 * Batched writes across multiple tables are guaranteed to succeed
 completely or not at all
 * Secondary indexes are guaranteed to be consistent with their local
-replicas data
+replicas' data
 
 == High Scalability
 
 Cassandra is a highly scalable storage system in which nodes may be
-added/removed as needed. Using gossip-based protocol a unified and
+added/removed as needed. Using gossip-based protocol, a unified and
 consistent membership list is kept at each node.
 
 == High Availability
 
 Cassandra guarantees high availability of data by implementing a
-fault-tolerant storage system. Failure detection in a node is detected
-using a gossip-based protocol.
+fault-tolerant storage system. Failure of a node is detected using
+a gossip-based protocol.
 
 == Durability
 
@@ -67,26 +67,26 @@ Cassandra guarantees data durability by using replicas. Replicas are
 multiple copies of a data stored on different nodes in a cluster. In a
 multi-datacenter environment the replicas may be stored on different
 datacenters. If one replica is lost due to unrecoverable node/datacenter
-failure the data is not completely lost as replicas are still available.
+failure, the data is not completely lost, as replicas are still available.
 
 == Eventual Consistency
 
 Meeting the requirements of performance, reliability, scalability and
-high availability in production Cassandra is an eventually consistent
-storage system. Eventually consistent implies that all updates reach all
+high availability in production, Cassandra is an eventually consistent
+storage system. Eventual consistency implies that all updates reach all
 replicas eventually. Divergent versions of the same data may exist
-temporarily but they are eventually reconciled to a consistent state.
-Eventual consistency is a tradeoff to achieve high availability and it
+temporarily, but they are eventually reconciled to a consistent state.
+Eventual consistency is a tradeoff to achieve high availability, and it
 involves some read and write latencies.
 
 == Lightweight transactions with linearizable consistency
 
-Data must be read and written in a sequential order. Paxos consensus
-protocol is used to implement lightweight transactions. Paxos protocol
+Data must be read and written in a sequential order. The Paxos consensus
+protocol is used to implement lightweight transactions. The Paxos protocol
 implements lightweight transactions that are able to handle concurrent
 operations using linearizable consistency. Linearizable consistency is
-sequential consistency with real-time constraints and it ensures
-transaction isolation with compare and set (CAS) transaction. With CAS
+sequential consistency with real-time constraints, and it ensures
+transaction isolation with compare-and-set (CAS) transactions. With CAS
 replica data is compared and data that is found to be out of date is set
 to the most consistent value. Reads with linearizable consistency allow
 reading the current state of the data, which may possibly be
@@ -97,12 +97,12 @@ uncommitted, without making a new addition or update.
 The guarantee for batched writes across multiple tables is that they
 will eventually succeed, or none will. Batch data is first written to
 batchlog system data, and when the batch data has been successfully
-stored in the cluster the batchlog data is removed. The batch is
-replicated to another node to ensure the full batch completes in the
-event the coordinator node fails.
+stored in the cluster, the batchlog data is removed. The batch is
+replicated to another node to ensure that the full batch completes in
+the event the coordinator node fails.
 
 == Secondary Indexes
 
-A secondary index is an index on a column and is used to query a table
-that is normally not queryable. Secondary indexes when built are
+A secondary index is an index on a column, and it's used to query a table
+that is normally not queryable. Secondary indexes, when built, are
 guaranteed to be consistent with their local replicas.
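
The compare-and-set semantics described in the lightweight-transactions section above can be illustrated with a single-process CAS; Cassandra performs the equivalent across replicas via Paxos, so this is only a local analogy:

```java
import java.util.concurrent.atomic.AtomicReference;

// Local analogy for compare-and-set (CAS): apply an update only if the
// current value matches the expected one. Cassandra does this across
// replicas with Paxos; AtomicReference compares references, which works
// here because string literals are interned.
class CasSketch
{
    public static void main(String[] args)
    {
        AtomicReference<String> row = new AtomicReference<>("v1");

        boolean applied = row.compareAndSet("v1", "v2"); // like UPDATE ... IF value = 'v1'
        System.out.println(applied + " -> " + row.get()); // true -> v2

        applied = row.compareAndSet("v1", "v3");          // stale expectation
        System.out.println(applied + " -> " + row.get()); // false -> v2
    }
}
```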
diff --git a/doc/modules/cassandra/pages/architecture/overview.adoc b/doc/modules/cassandra/pages/architecture/overview.adoc
index 58db6b14ba..2c86ce4c2b 100644
--- a/doc/modules/cassandra/pages/architecture/overview.adoc
+++ b/doc/modules/cassandra/pages/architecture/overview.adoc
@@ -1,7 +1,7 @@
 = Overview
 :exper: experimental
 
-Apache Cassandra is an open source, distributed, NoSQL database. It
+Apache Cassandra is an open-source, distributed, NoSQL database. It
 presents a partitioned wide column storage model with eventually
 consistent semantics.
 
@@ -10,7 +10,7 @@ https://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf[Face
 using a staged event-driven architecture
 (http://www.sosp.org/2001/papers/welsh.pdf[SEDA]) to implement a
 combination of Amazon’s
-http://courses.cse.tamu.edu/caverlee/csce438/readings/dynamo-paper.pdf[Dynamo]
+https://www.cs.cornell.edu/courses/cs5414/2017fa/papers/dynamo.pdf[Dynamo]
 distributed storage and replication techniques and Google's
 https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf[Bigtable]
 data and storage engine model. Dynamo and Bigtable were both developed
@@ -23,7 +23,7 @@ storage requirements. As applications began to require full global
 replication and always available low-latency reads and writes, it became
 imperative to design a new kind of database model as the relational
 database systems of the time struggled to meet the new requirements of
-global scale applications.
+global-scale applications.
 
 Systems like Cassandra are designed for these challenges and seek the
 following design objectives:
@@ -59,19 +59,19 @@ keys.
 CQL supports numerous advanced features over a partitioned dataset such
 as:
 
-* Single partition lightweight transactions with atomic compare and set
-semantics.
+* Single-partition lightweight transactions with atomic compare and set
+semantics
 * User-defined types, functions and aggregates
-* Collection types including sets, maps, and lists.
+* Collection types including sets, maps, and lists
 * Local secondary indices
 * (Experimental) materialized views
 
 Cassandra explicitly chooses not to implement operations that require
-cross partition coordination as they are typically slow and hard to
+cross-partition coordination as they are typically slow and hard to
 provide highly available global semantics. For example Cassandra does
 not support:
 
-* Cross partition transactions
+* Cross-partition transactions
 * Distributed joins
 * Foreign keys or referential integrity.
 
diff --git a/doc/modules/cassandra/pages/architecture/snitch.adoc b/doc/modules/cassandra/pages/architecture/snitch.adoc
index 3ae066d61e..cd59f98f9d 100644
--- a/doc/modules/cassandra/pages/architecture/snitch.adoc
+++ b/doc/modules/cassandra/pages/architecture/snitch.adoc
@@ -12,7 +12,7 @@ physical location).
 
 == Dynamic snitching
 
-The dynamic snitch monitor read latencies to avoid reading from hosts
+The dynamic snitch monitors read latencies to avoid reading from hosts
 that have slowed down. The dynamic snitch is configured with the
 following properties on `cassandra.yaml`:
 
diff --git a/doc/modules/cassandra/pages/architecture/storage_engine.adoc b/doc/modules/cassandra/pages/architecture/storage_engine.adoc
index 9a0c37a089..52158c6980 100644
--- a/doc/modules/cassandra/pages/architecture/storage_engine.adoc
+++ b/doc/modules/cassandra/pages/architecture/storage_engine.adoc
@@ -3,17 +3,17 @@
 [[commit-log]]
 == CommitLog
 
-Commitlogs are an append only log of all mutations local to a Cassandra
+Commitlogs are an append-only log of all mutations local to a Cassandra
 node. Any data written to Cassandra will first be written to a commit
 log before being written to a memtable. This provides durability in the
 case of unexpected shutdown. On startup, any mutations in the commit log
 will be applied to memtables.
 
-All mutations write optimized by storing in commitlog segments, reducing
-the number of seeks needed to write to disk. Commitlog Segments are
-limited by the `commitlog_segment_size` option, once the size is
+All mutations are write-optimized by storing in commitlog segments, reducing
+the number of seeks needed to write to disk. Commitlog segments are
+limited by the `commitlog_segment_size` option. Once the size is
 reached, a new commitlog segment is created. Commitlog segments can be
-archived, deleted, or recycled once all its data has been flushed to
+archived, deleted, or recycled once all the data has been flushed to
 SSTables. Commitlog segments are truncated when Cassandra has written
 data older than a certain point to the SSTables. Running "nodetool
 drain" before stopping Cassandra will write everything in the memtables
@@ -22,19 +22,18 @@ to SSTables and remove the need to sync with the commitlogs on startup.
 * `commitlog_segment_size`: The default size is 32MiB, which is
 almost always fine, but if you are archiving commitlog segments (see
 commitlog_archiving.properties), then you probably want a finer
-granularity of archiving; 8 or 16 MiB is reasonable. `commitlog_segment_size`
-also determines the default value of `max_mutation_size` in cassandra.yaml.
-By default, max_mutation_size is half the size of `commitlog_segment_size`.
-
-**NOTE: If `max_mutation_size` is set explicitly then
+granularity of archiving; 8 or 16 MiB is reasonable.
+`commitlog_segment_size` also determines the default value of
+`max_mutation_size` in `cassandra.yaml`. By default,
+`max_mutation_size` is half the size of `commitlog_segment_size`.
+
+[NOTE]
+.Note
+====
+If `max_mutation_size` is set explicitly then
 `commitlog_segment_size` must be set to at least twice the size of
-`max_mutation_size`**.
-
-Commitlogs are an append only log of all mutations local to a Cassandra
-node. Any data written to Cassandra will first be written to a commit
-log before being written to a memtable. This provides durability in the
-case of unexpected shutdown. On startup, any mutations in the commit log
-will be applied.
+`max_mutation_size`.
+====
 
 * `commitlog_sync`: may be either _periodic_ or _batch_.
 ** `batch`: In batch mode, Cassandra won’t ack writes until the commit
@@ -55,23 +54,27 @@ _Default Value:_ 10000ms
 
 _Default Value:_ batch
 
-** NOTE: In the event of an unexpected shutdown, Cassandra can lose up
+[NOTE]
+.Note
+====
+In the event of an unexpected shutdown, Cassandra can lose up
 to the sync period or more if the sync is delayed. If using "batch"
 mode, it is recommended to store commitlogs in a separate, dedicated
-device.*
+device.
+====
 
-* `commitlog_directory`: This option is commented out by default When
+* `commitlog_directory`: This option is commented out by default. When
 running on magnetic HDD, this should be a separate spindle than the data
 directories. If not set, the default directory is
-$CASSANDRA_HOME/data/commitlog.
+`$CASSANDRA_HOME/data/commitlog`.
 
-_Default Value:_ /var/lib/cassandra/commitlog
+_Default Value:_ `/var/lib/cassandra/commitlog`
 
 * `commitlog_compression`: Compression to apply to the commitlog. If
 omitted, the commit log will be written uncompressed. LZ4, Snappy,
 Deflate and Zstd compressors are supported.
 
-(Default Value: (complex option):
+_Default Value:_ (complex option):
 
 [source, yaml]
 ----
@@ -86,8 +89,8 @@ If space gets above this value, Cassandra will flush every dirty CF in
 the oldest segment and remove it. So a small total commitlog space will
 tend to cause more flush activity on less-active columnfamilies.
 
-The default value is the smaller of 8192, and 1/4 of the total space of
-the commitlog volume.
+The default value is the smaller of 8192 and 1/4 of the total
+space of the commitlog volume.
 
 _Default Value:_ 8192MiB
 
@@ -202,7 +205,7 @@ we should not allow streaming of super columns into this new format)
 ** index summaries can be downsampled and the sampling level is
 persisted
 ** switch uncompressed checksums to adler32
-** tracks presense of legacy (local and remote) counter shards
+** tracks presence of legacy (local and remote) counter shards
 * la (2.2.0): new file name format
 * lb (2.2.7): commit log lower bound included
 
@@ -221,5 +224,5 @@ match the "ib" SSTable version
 
 [source,bash]
 ----
-include:example$find_sstables.sh[]
+include::example$BASH/find_sstables.sh[]
 ----
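
The sizing rule from the commitlog note earlier in this file — the default ``max_mutation_size`` is half of ``commitlog_segment_size``, and an explicitly set ``max_mutation_size`` requires a segment size at least twice as large — boils down to a small validation, sketched here with invented names:

```java
// Invented-name sketch of the commitlog sizing rule: by default
// max_mutation_size = commitlog_segment_size / 2, and an explicit
// max_mutation_size requires commitlog_segment_size >= 2 * max_mutation_size.
class CommitLogSizing
{
    static long effectiveMaxMutationSize(long segmentSize, Long explicitMax)
    {
        if (explicitMax == null)
            return segmentSize / 2;
        if (segmentSize < 2 * explicitMax)
            throw new IllegalArgumentException(
                "commitlog_segment_size must be at least twice max_mutation_size");
        return explicitMax;
    }

    public static void main(String[] args)
    {
        long segment = 32L * 1024 * 1024; // 32 MiB default
        System.out.println(effectiveMaxMutationSize(segment, null));             // 16 MiB
        System.out.println(effectiveMaxMutationSize(segment, 8L * 1024 * 1024)); // fine
    }
}
```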
diff --git a/doc/modules/cassandra/pages/cql/ddl.adoc b/doc/modules/cassandra/pages/cql/ddl.adoc
index 36cce45565..09e888808e 100644
--- a/doc/modules/cassandra/pages/cql/ddl.adoc
+++ b/doc/modules/cassandra/pages/cql/ddl.adoc
@@ -4,9 +4,10 @@
 CQL stores data in _tables_, whose schema defines the layout of the
 data in the table. Tables are located in _keyspaces_. 
 A keyspace defines options that apply to all the keyspace's tables. 
-The xref:cql/ddl.adoc#replication-strategy[replication strategy] is an important keyspace option, as is the replication factor. 
+The xref:cql/ddl.adoc#replication-strategy[replication strategy]
+is an important keyspace option, as is the replication factor. 
 A good general rule is one keyspace per application.
-It is common for a cluster to define only one keyspace for an actie application.
+It is common for a cluster to define only one keyspace for an active application.
 
 This section describes the statements used to create, modify, and remove
 those keyspace and tables.
@@ -32,7 +33,7 @@ double-quotes (`"myTable"` is different from `mytable`).
 Further, a table is always part of a keyspace and a table name can be
provided fully-qualified by the keyspace it is part of. If it is not
 fully-qualified, the table is assumed to be in the _current_ keyspace
-(see xref:cql/ddl.adoc#use-statement[USE] statement.
+(see xref:cql/ddl.adoc#use-statement[USE] statement).
 
 Further, the valid names for columns are defined as:
 
@@ -502,7 +503,7 @@ A table supports the following options:
 | `bloom_filter_fp_chance` |_simple_ |0.00075 |The target probability of
 false positive of the sstable bloom filters. Said bloom filters will be
 sized to provide the provided probability, thus lowering this value
-impact the size of bloom filters in-memory and on-disk.
+impacts the size of bloom filters in-memory and on-disk.
| `default_time_to_live` |_simple_ |0 |Default expiration time (“TTL”) in seconds for a table
| `compaction` |_map_ |_see below_ | xref:operating/compaction/index.adoc#cql-compaction-options[Compaction options]
| `compression` |_map_ |_see below_ | xref:operating/compression/index.adoc#cql-compression-options[Compression options]
diff --git a/doc/modules/cassandra/pages/cql/definitions.adoc b/doc/modules/cassandra/pages/cql/definitions.adoc
index 95be20ff1d..14e60487f3 100644
--- a/doc/modules/cassandra/pages/cql/definitions.adoc
+++ b/doc/modules/cassandra/pages/cql/definitions.adoc
@@ -103,7 +103,7 @@ however than float allows the special `NaN` and `Infinity` constants.
 * CQL supports
 https://en.wikipedia.org/wiki/Universally_unique_identifier[UUID]
 constants.
-* Blobs content are provided in hexadecimal and prefixed by `0x`.
+* The content for blobs is provided in hexadecimal and prefixed by `0x`.
 * The special `NULL` constant denotes the absence of value.
 
 For how these constants are typed, see the xref:cql/types.adoc[Data types] section.
diff --git a/doc/modules/cassandra/pages/cql/dml.adoc b/doc/modules/cassandra/pages/cql/dml.adoc
index 513dc1d1e5..6c82d4020a 100644
--- a/doc/modules/cassandra/pages/cql/dml.adoc
+++ b/doc/modules/cassandra/pages/cql/dml.adoc
@@ -318,7 +318,7 @@ include::example$CQL/update_statement.cql[]
 
 The `UPDATE` statement writes one or more columns for a given row in a
 table. 
-The `WHERE`clause is used to select the row to update and must include all columns of the `PRIMARY KEY`. 
+The `WHERE` clause is used to select the row to update and must include all columns of the `PRIMARY KEY`. 
 Non-primary key columns are set using the `SET` keyword.
 In an `UPDATE` statement, all updates within the same partition key are applied atomically and in isolation.
 
diff --git a/doc/modules/cassandra/pages/cql/types.adoc b/doc/modules/cassandra/pages/cql/types.adoc
index 17c78b5e79..c17dc38674 100644
--- a/doc/modules/cassandra/pages/cql/types.adoc
+++ b/doc/modules/cassandra/pages/cql/types.adoc
@@ -222,7 +222,7 @@ collections have the following noteworthy characteristics and
 limitations:
 
 * Individual collections are not indexed internally. Which means that
-even to access a single element of a collection, the while collection
+even to access a single element of a collection, the whole collection
 has to be read (and reading one is not paged internally).
 * While insertion operations on sets and maps never incur a
 read-before-write internally, some operations on lists do. Further, some
@@ -265,7 +265,7 @@ Note that for removing multiple elements in a `map`, you remove from it
 a `set` of keys.
 
 Lastly, TTLs are allowed for both `INSERT` and `UPDATE`, but in both
-case the TTL set only apply to the newly inserted/updated elements. In
+cases the TTL set only applies to the newly inserted/updated elements. In
 other words:
 
 [source,cql]
@@ -279,7 +279,7 @@ of the map remaining unaffected.
 === Sets
 
 A `set` is a (sorted) collection of unique values. You can define and
-insert a map with:
+insert a set with:
 
 [source,cql]
 ----
@@ -317,7 +317,7 @@ xref:cql/types.adoc#sets[set] instead of list, always prefer a set.
 ====
 
 A `list` is a (sorted) collection of non-unique values where
-elements are ordered by there position in the list. You can define and
+elements are ordered by their position in the list. You can define and
 insert a list with:
 
 [source,cql]
@@ -338,13 +338,13 @@ include::example$CQL/update_list.cql[]
 .Warning
 ====
 The append and prepend operations are not idempotent by nature. So in
-particular, if one of these operation timeout, then retrying the
+particular, if one of these operations times out, then retrying the
 operation is not safe and it may (or may not) lead to
 appending/prepending the value twice.
 ====
 
* Setting the value at a particular position in a list that has a pre-existing element for that position. An error
-will be thrown if the list does not have the position.:
+will be thrown if the list does not have the position:
 +
 [source,cql]
 ----
@@ -423,12 +423,11 @@ and can only be used in that keyspace. At creation, if the type name is
 prefixed by a keyspace name, it is created in that keyspace. Otherwise,
 it is created in the current keyspace.
 * As of Cassandra , UDT have to be frozen in most cases, hence the
-`frozen<address>` in the table definition above. Please see the section
-on xref:cql/types.adoc#frozen[frozen] for more details.
+`frozen<address>` in the table definition above.
 
 === UDT literals
 
-Once a used-defined type has been created, value can be input using a
+Once a user-defined type has been created, value can be input using a
 UDT literal:
 
 [source,bnf]
diff --git a/doc/modules/cassandra/pages/data_modeling/data_modeling_rdbms.adoc b/doc/modules/cassandra/pages/data_modeling/data_modeling_rdbms.adoc
index b478df14a1..2acd6cc2bf 100644
--- a/doc/modules/cassandra/pages/data_modeling/data_modeling_rdbms.adoc
+++ b/doc/modules/cassandra/pages/data_modeling/data_modeling_rdbms.adoc
@@ -17,7 +17,7 @@ image::data_modeling_hotel_relational.png[image]
 == Design Differences Between RDBMS and Cassandra
 
 Let’s take a minute to highlight some of the key differences in doing
-ata modeling for Cassandra versus a relational database.
+data modeling for Cassandra versus a relational database.
 
 === No joins
 
diff --git a/doc/modules/cassandra/pages/data_modeling/data_modeling_schema.adoc b/doc/modules/cassandra/pages/data_modeling/data_modeling_schema.adoc
index 7b0cf5cd35..04a0434a54 100644
--- a/doc/modules/cassandra/pages/data_modeling/data_modeling_schema.adoc
+++ b/doc/modules/cassandra/pages/data_modeling/data_modeling_schema.adoc
@@ -32,7 +32,7 @@ CREATE TABLE hotel.hotels (
   name text,
   phone text,
   address frozen<address>,
-  pois set )
+  pois set<text> )
   WITH comment = ‘Q2. Find information about a hotel’;
 
 CREATE TABLE hotel.pois_by_hotel (
diff --git a/doc/modules/cassandra/pages/data_modeling/data_modeling_tools.adoc b/doc/modules/cassandra/pages/data_modeling/data_modeling_tools.adoc
index 0f3556f5b5..608caaaff2 100644
--- a/doc/modules/cassandra/pages/data_modeling/data_modeling_tools.adoc
+++ b/doc/modules/cassandra/pages/data_modeling/data_modeling_tools.adoc
@@ -35,7 +35,7 @@ management and query execution.
 Some IDEs and tools that claim to support Cassandra do not actually
 support CQL natively, but instead access Cassandra using a JDBC/ODBC
 driver and interact with Cassandra as if it were a relational database
-with SQL support. Wnen selecting tools for working with Cassandra you’ll
+with SQL support. When selecting tools for working with Cassandra you’ll
 want to make sure they support CQL and reinforce Cassandra best
 practices for data modeling as presented in this documentation.
 
diff --git a/doc/modules/cassandra/pages/faq/index.adoc b/doc/modules/cassandra/pages/faq/index.adoc
index df74db96d4..a745e138fc 100644
--- a/doc/modules/cassandra/pages/faq/index.adoc
+++ b/doc/modules/cassandra/pages/faq/index.adoc
@@ -79,7 +79,10 @@ intensive process that may result in adverse cluster performance. It's
 highly recommended to do rolling repairs, as an attempt to repair the
 entire cluster at once will most likely swamp it. Note that you will
 need to run a full repair (`-full`) to make sure that already repaired
-sstables are not skipped.
+sstables are not skipped. You should use `ConsistencyLevel.QUORUM` or
+`ALL` (depending on your existing replication factor) to make sure that
+a replica that actually has the data is consulted. Otherwise, some
+clients may be told that no data exists until the repair is done.
 
 [[can-large-blob]]
 == Can I Store (large) BLOBs in Cassandra?
diff --git a/doc/modules/cassandra/pages/getting_started/drivers.adoc b/doc/modules/cassandra/pages/getting_started/drivers.adoc
index eb15a55830..3deb613038 100644
--- a/doc/modules/cassandra/pages/getting_started/drivers.adoc
+++ b/doc/modules/cassandra/pages/getting_started/drivers.adoc
@@ -10,7 +10,7 @@ functionality supported by a specific driver.
 * https://github.com/Netflix/astyanax/wiki/Getting-Started[Astyanax]
 * https://github.com/noorq/casser[Casser]
 * https://github.com/datastax/java-driver[Datastax Java driver]
-* https://github.com/impetus-opensource/Kundera[Kundera]
+* https://github.com/Impetus/kundera[Kundera]
 * https://github.com/deanhiller/playorm[PlayORM]
 
 == Python
diff --git a/doc/modules/cassandra/pages/getting_started/production.adoc b/doc/modules/cassandra/pages/getting_started/production.adoc
index de7fb54234..ad28a35f56 100644
--- a/doc/modules/cassandra/pages/getting_started/production.adoc
+++ b/doc/modules/cassandra/pages/getting_started/production.adoc
@@ -51,7 +51,7 @@ appropriate number of replicates, to ensure even token allocation.
 Read ahead is an operating system feature that attempts to keep as much
 data as possible loaded in the page cache.
 Spinning disks can have long seek times causing high latency, so additional
-throughout on reads using page cache can improve performance.
+throughput on reads using page cache can improve performance.
 By leveraging read ahead, the OS can pull additional data into memory without
 the cost of additional seeks.
 This method works well when the available RAM is greater than the size of the
@@ -80,7 +80,7 @@ The recommended read ahead settings are:
 
 Read ahead can be adjusted on Linux systems using the `blockdev` tool.
 
-For example, set the read ahead of the disk `/dev/sda1\` to 4KB:
+For example, set the read ahead of the disk `/dev/sda1` to 4KB:
 
 [source, shell]
 ----
@@ -100,7 +100,7 @@ section.
 
 == Compression
 
-Compressed data is stored by compressing fixed size byte buffers and writing the
+Compressed data is stored by compressing fixed-size byte buffers and writing the
 data to disk.
 The buffer size is determined by the `chunk_length_in_kb` element in the compression
 map of a table's schema settings for `WITH COMPRESSION`.
@@ -158,6 +158,6 @@ of the ability to configure multiple racks and data centers.
 **Correctly configuring or changing racks after a cluster has been provisioned is an unsupported process**.
 Migrating from a single rack to multiple racks is also unsupported and can
 result in data loss.
-Using `GossipingPropertyFileSnitch` is the most flexible solution for on
-premise or mixed cloud environments.
+Using `GossipingPropertyFileSnitch` is the most flexible solution for
+on-premise or mixed cloud environments.
 `Ec2Snitch` is reliable for AWS EC2 only environments.
diff --git a/doc/modules/cassandra/pages/operating/auditlogging.adoc b/doc/modules/cassandra/pages/operating/auditlogging.adoc
index a47992192f..c83c5aaead 100644
--- a/doc/modules/cassandra/pages/operating/auditlogging.adoc
+++ b/doc/modules/cassandra/pages/operating/auditlogging.adoc
@@ -12,7 +12,7 @@ Some of the features of audit logging are:
 * Latency of database operations is not affected, so there is no performance impact.
 * Heap memory usage is bounded by a weighted queue, with configurable maximum weight sitting in front of logging thread.
 * Disk utilization is bounded by a configurable size, deleting old log segments once the limit is reached.
-* Can be enabled, disabled, or reset (to delete on-disk data) using the JMX tool, ``nodetool``.
+* Can be enabled or disabled at startup time using `cassandra.yaml` or at runtime using the JMX tool, ``nodetool``.
 * Can configure the settings in either the `cassandra.yaml` file or by using ``nodetool``.
 
 Audit logging includes all CQL requests, both successful and failed. 
@@ -88,10 +88,24 @@ Common audit log entry types are one of the following:
 | ERROR | REQUEST_FAILURE
 |===
 
+== Availability and durability
+
+NOTE: Unlike data, audit log entries are not replicated
+
+For a given query, the corresponding audit entry is only stored on the coordinator node.
+For example, an ``INSERT`` in a keyspace with replication factor of 3 will produce an audit entry on one node, the coordinator who handled the request, and not on the two other nodes.
+For this reason, and depending on compliance requirements you must meet,
+make sure that audit logs are stored on non-ephemeral storage.
+
+You can address custom needs with the <<archive_command>> option.
+
 == Configuring audit logging in cassandra.yaml
 
 The `cassandra.yaml` file can be used to configure and enable audit logging.
 Configuration and enablement may be the same or different on each node, depending on the `cassandra.yaml` file settings.
+
+Audit logging can also be configured using ``nodetool`` when enabling the feature, and will override any values set in the `cassandra.yaml` file, as discussed in <<enabling_audit_with_nodetool, Enabling Audit Logging with nodetool>>.
+
 Audit logs are generated on each enabled node, so logs on each node will have that node's queries.
 All options for audit logging can be set in the `cassandra.yaml` file under the ``audit_logging_options:``.
 
@@ -123,16 +137,25 @@ audit_logging_options:
 
 === enabled
 
-Audit logging is enabled by setting the `enabled` option to `true` in
-the `audit_logging_options` setting. 
+Controls whether audit logging is enabled or disabled (the default).
+
+To enable audit logging, set ``enabled: true``.
+
 If this option is enabled, audit logging will start when Cassandra is started.
-For example, ``enabled: true``.
+It can be disabled afterwards at runtime with <<enabling_audit_with_nodetool, nodetool>>.
+
+TIP: You can monitor whether audit logging is enabled with the ``AuditLogEnabled`` attribute of the JMX MBean ``org.apache.cassandra.db:type=StorageService``.
 
 === logger
 
 The type of audit logger is set with the `logger` option. 
-Supported values are: `BinAuditLogger` (default), `FileAuditLogger` and `NoOpAuditLogger`.
-`BinAuditLogger` logs events to a file in binary format. 
+Supported values are:
+
+- `BinAuditLogger` (default)
+- `FileAuditLogger`
+- `NoOpAuditLogger`
+
+`BinAuditLogger` logs events to a file in binary format.
 `FileAuditLogger` uses the standard logging mechanism, `slf4j` to log events to the `audit/audit.log` file. It is a synchronous, file-based audit logger. The roll_cycle will be set in the `logback.xml` file.
 `NoOpAuditLogger` is a no-op implementation of the audit logger that should be specified when audit logging is disabled.
 
@@ -144,6 +167,8 @@ logger:
   - class_name: FileAuditLogger
 ----
 
+TIP: `BinAuditLogger` makes use of the open source https://github.com/OpenHFT/Chronicle-Queue[Chronicle Queue] under the hood. If you consider using audit logging for regulatory compliance purposes, it might be wise to be somewhat familiar with this library. See <<archive_command>> and <<roll_cycle>> for an example of the implications.
+
 === audit_logs_dir
 
 To write audit logs, an existing directory must be set in ``audit_logs_dir``.
@@ -151,7 +176,7 @@ To write audit logs, an existing directory must be set in ``audit_logs_dir``.
 The directory must have appropriate permissions set to allow reading, writing, and executing.
 Logging will recursively delete the directory contents as needed.
 Do not place links in this directory to other sections of the filesystem.
-For example, ``audit_logs_dir: /cassandra/audit/logs/hourly``.
+For example, ``audit_logs_dir: /non_ephemeral_storage/audit/logs/hourly``.
 
 The audit log directory can also be configured using the system property `cassandra.logdir.audit`, which by default is set to `cassandra.logdir + /audit/`.
 
@@ -173,7 +198,7 @@ excluded_keyspaces: system, system_schema, 
system_virtual_schema
 The categories of database operations to include are specified with the 
`included_categories` option as a comma-separated list. 
 The categories of database operations to exclude are specified with 
`excluded_categories` option as a comma-separated list. 
 The supported categories for audit log are: `AUTH`, `DCL`, `DDL`, `DML`, 
`ERROR`, `OTHER`, `PREPARE`, and `QUERY`.
-By default all supported categories are included, and no category is excluded. 
+By default, all supported categories are included, and no category is excluded.
 
 [source, yaml]
 ----
@@ -186,7 +211,7 @@ excluded_categories: DDL, DML, QUERY, PREPARE
 Users to audit log are set with the `included_users` and `excluded_users` 
options. 
 The `included_users` option specifies a comma-separated list of users to 
include explicitly.
 The `excluded_users` option specifies a comma-separated list of users to 
exclude explicitly.
-By default all users are included, and no users are excluded. 
+By default, all users are included, and no users are excluded.
 
 [source, yaml]
 ----
@@ -194,20 +219,55 @@ included_users:
 excluded_users: john, mary
 ----
 
+[[roll_cycle]]
 === roll_cycle
 
 The ``roll_cycle`` defines the frequency with which the audit log segments are 
rolled.
-Supported values are ``HOURLY`` (default), ``MINUTELY``, and ``DAILY``.
+Supported values are:
+
+- ``MINUTELY``
+- ``FIVE_MINUTELY``
+- ``TEN_MINUTELY``
+- ``TWENTY_MINUTELY``
+- ``HALF_HOURLY``
+- ``HOURLY`` (default)
+- ``TWO_HOURLY``
+- ``FOUR_HOURLY``
+- ``SIX_HOURLY``
+- ``DAILY``
+
 For example: ``roll_cycle: DAILY``
 
+WARNING: Read the following paragraph when changing ``roll_cycle`` on a 
production node.
+
+With the `BinLogger` implementation, any attempt to modify the roll cycle on a 
node where audit logging was previously enabled will fail silently due to the 
https://github.com/OpenHFT/Chronicle-Queue[Chronicle Queue] roll cycle 
inference mechanism (even if you delete the ``metadata.cq4t`` file).
+
+Here is an example of such an override visible in Cassandra logs:
+----
+INFO  [main] <DATE TIME> BinLog.java:420 - Attempting to configure bin log: 
Path: /path/to/audit Roll cycle: TWO_HOURLY [...]
+WARN  [main] <DATE TIME> SingleChronicleQueueBuilder.java:477 - Overriding 
roll cycle from TWO_HOURLY to FIVE_MINUTE
+----
+
+In order to change ``roll_cycle`` on a node, you have to:
+
+1. Stop Cassandra.
+2. Move or offload all audit logs to a safe and durable location.
+3. Restart Cassandra.
+4. Check the Cassandra logs.
+5. Make sure that audit log filenames under ``audit_logs_dir`` correspond to 
the new roll cycle.
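+
+A minimal shell sketch of this procedure (assuming a systemd-managed node, 
``/var/lib/cassandra/audit`` as ``audit_logs_dir``, and 
``/backup/audit-archive`` as the safe location; adapt paths and service 
management to your environment):
+
+[source, bash]
+----
+sudo systemctl stop cassandra                    # 1. stop Cassandra
+sudo mv /var/lib/cassandra/audit/*.cq4* \
+    /backup/audit-archive/                       # 2. offload all audit logs
+sudo systemctl start cassandra                   # 3. restart Cassandra
+grep "bin log" /var/log/cassandra/system.log     # 4. check the logs
+ls /var/lib/cassandra/audit/                     # 5. verify new filenames
+----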
+
 === block
 
 The ``block`` option specifies whether audit logging should block writing or 
drop log records if the audit logging falls behind. Supported boolean values 
are ``true`` (default) or ``false``.
-For example: ``block: false`` to drop records
+
+For example: ``block: false`` to drop records (e.g. if audit logging is used 
for troubleshooting).
+
+For regulatory compliance purposes, it is a good practice to explicitly set 
``block: true`` to prevent any regression should the default value change in 
a future release.
 
 === max_queue_weight
 
 The ``max_queue_weight`` option sets the maximum weight of in-memory queue for 
records waiting to be written to the file before blocking or dropping.  The 
option must be set to a positive value. The default value is 268435456, or 256 
MiB.
+
 For example, to change the default: ``max_queue_weight: 134217728 # 128 MiB``
 
 === max_log_size
@@ -215,23 +275,37 @@ For example, to change the default: ``max_queue_weight: 
134217728 # 128 MiB``
 The ``max_log_size`` option sets the maximum size of the rolled files to 
retain on disk before deleting the oldest file.  The option must be set to a 
positive value. The default is 17179869184, or 16 GiB.
 For example, to change the default: ``max_log_size: 34359738368 # 32 GiB``
 
+WARNING: ``max_log_size`` is ignored if the ``archive_command`` option is set.
+
+[[archive_command]]
 === archive_command
 
+NOTE: If the ``archive_command`` option is empty or unset (the default), 
Cassandra uses a built-in `DeletingArchiver` that deletes the oldest files 
when ``max_log_size`` is reached.
+
 The ``archive_command`` option sets the user-defined archive script to execute 
on rolled log files.
-For example: ``archive_command: /usr/local/bin/archiveit.sh %path # %path is 
the file being rolled``
+For example: ``archive_command: "/usr/local/bin/archiveit.sh %path"``
 
-=== max_archive_retries
+``%path`` is replaced with the absolute file path of the file being rolled.
 
-The ``max_archive_retries`` option sets the max number of retries of failed 
archive commands. The default is 10.
-For example: ``max_archive_retries: 10``
+When using a user-defined script, Cassandra does **not** use the 
`DeletingArchiver`, so it is the responsibility of the script to perform any 
required cleanup.
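+
+For illustration, a minimal ``archiveit.sh`` sketch that copies the rolled 
file to durable storage and then cleans it up (the destination path is a 
placeholder; add error handling and retention to match your requirements):
+
+[source, bash]
+----
+#!/bin/bash
+# $1 receives the %path placeholder: the absolute path of the rolled file
+set -e
+DEST=/non_ephemeral_storage/audit/archive
+cp "$1" "$DEST/"
+# archive_command disables the built-in DeletingArchiver, so the script
+# itself must remove the rolled file once it has been safely copied
+rm "$1"
+----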
 
+Cassandra will call the user-defined script as soon as the log file is rolled. 
This means that Chronicle Queue's `QueueFileShrinkManager` will not be able to 
shrink the sparse log file, because shrinking is done asynchronously. In other 
words, all log files will have at least the size of the default block size 
(80 MiB), even if they contain only a few KiB of real data. Consequently, 
warnings like the following will appear in the Cassandra `system.log`:
 
-An audit log file could get rolled for other reasons as well such as a
-log file reaches the configured size threshold.
+----
+WARN  [main/queue~file~shrink~daemon] <DATE TIME> 
QueueFileShrinkManager.java:63 - Failed to shrink file as it exists no longer, 
file=/path/to/xxx.cq4
+----
 
-Audit logging can also be configured using ``nodetool` when enabling the 
feature, and will override any values set in the `cassandra.yaml` file, as 
discussed in the next section.
+TIP: Because Cassandra does not make use of Chronicle Queue's Pretoucher, you 
can configure Chronicle Queue to shrink files synchronously, i.e. as soon as 
the file is rolled, with the ``chronicle.queue.synchronousFileShrinking`` JVM 
property. For instance, you can add the following line at the end of 
``cassandra-env.sh``: 
``JVM_OPTS="$JVM_OPTS -Dchronicle.queue.synchronousFileShrinking=true"``
+
+=== max_archive_retries
+
+The ``max_archive_retries`` option sets the maximum number of retries of 
failed archive commands. The default is 10.
+
+For example: ``max_archive_retries: 10``
 
+The interval between retries is hard-coded to 5 minutes.
 
+[[enabling_audit_with_nodetool]]
 == Enabling Audit Logging with ``nodetool``
  
 Audit logging is enabled on a per-node basis using the ``nodetool 
enableauditlog`` command. The logging directory must be defined with 
``audit_logs_dir`` in the `cassandra.yaml` file, or the default value derived 
from the ``cassandra.logdir.audit`` system property is used.
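+
+For example, a hypothetical invocation overriding a few options at enable 
time (check ``nodetool help enableauditlog`` on your version for the exact 
set of flags):
+
+[source, bash]
+----
+nodetool enableauditlog --logger BinAuditLogger \
+    --excluded-keyspaces system,system_schema,system_virtual_schema \
+    --included-categories AUTH,DDL,DML
+----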
diff --git a/doc/modules/cassandra/pages/operating/backups.adoc 
b/doc/modules/cassandra/pages/operating/backups.adoc
index a083d5b58f..78f9186d8f 100644
--- a/doc/modules/cassandra/pages/operating/backups.adoc
+++ b/doc/modules/cassandra/pages/operating/backups.adoc
@@ -508,7 +508,7 @@ The two main tools/commands for restoring a table after it 
has been
 dropped are:
 
 * sstableloader
-* nodetool import
+* nodetool refresh
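+
+For example, after copying a snapshot's SSTable files back into the table's 
data directory, a hypothetical restore of a keyspace ``catalogkeyspace`` and 
table ``magazine`` might end with:
+
+[source, bash]
+----
+nodetool refresh catalogkeyspace magazine
+----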
 
 A snapshot contains essentially the same set of SSTable files as an
 incremental backup does with a few additional files. A snapshot includes
diff --git a/doc/modules/cassandra/pages/tools/cqlsh.adoc 
b/doc/modules/cassandra/pages/tools/cqlsh.adoc
index 0d40608c2c..afafa1d875 100644
--- a/doc/modules/cassandra/pages/tools/cqlsh.adoc
+++ b/doc/modules/cassandra/pages/tools/cqlsh.adoc
@@ -38,7 +38,7 @@ modules that are central to the performance of `COPY`.
 == cqlshrc
 
 The `cqlshrc` file holds configuration options for `cqlsh`. 
-By default, the file is locagted the user's home directory at 
`~/.cassandra/cqlsh`, but a
+By default, the file is located in the user's home directory at 
`~/.cassandra/cqlshrc`, but a
 custom location can be specified with the `--cqlshrc` option.
 
 Example config values and documentation can be found in the
@@ -452,7 +452,7 @@ representing a path to the source file. This can also the 
special value
 See `shared-copy-options` for options that apply to both `COPY TO` and
 `COPY FROM`.
 
-==== Options for `COPY TO`
+==== Options for `COPY FROM`
 
 `INGESTRATE`::
   The maximum number of rows to process per second. Defaults to 100000.
@@ -477,10 +477,10 @@ See `shared-copy-options` for options that apply to both 
`COPY TO` and
 `MAXBATCHSIZE`::
   The max number of rows inserted in a single batch. Defaults to 20.
 `MINBATCHSIZE`::
-  The min number of rows inserted in a single batch. Defaults to 2.
+  The min number of rows inserted in a single batch. Defaults to 10.
 `CHUNKSIZE`::
   The number of rows that are passed to child worker processes from the
-  main process at a time. Defaults to 1000.
+  main process at a time. Defaults to 5000.
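+
+For instance, a hypothetical bulk load tuning some of these options from a 
shell (keyspace, table, and file names are placeholders):
+
+[source, bash]
+----
+cqlsh -e "COPY ks.events FROM 'events.csv' WITH INGESTRATE=50000 AND NUMPROCESSES=4"
+----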
 
 ==== Shared COPY Options
 
@@ -504,8 +504,8 @@ Options that are common to both `COPY TO` and `COPY FROM`.
   `True,False`.
 `NUMPROCESSES`::
   The number of child worker processes to create for `COPY` tasks.
-  Defaults to a max of 4 for `COPY FROM` and 16 for `COPY TO`. However,
-  at most (num_cores - 1) processes will be created.
+  Defaults to 16 for `COPY` tasks. However, at most (num_cores - 1)
+  processes will be created.
 `MAXATTEMPTS`::
   The maximum number of failed attempts to fetch a range of data (when
   using `COPY TO`) or insert a chunk of data (when using `COPY FROM`)
@@ -515,3 +515,28 @@ Options that are common to both `COPY TO` and `COPY FROM`.
 `RATEFILE`::
   An optional file to output rate statistics to. By default, statistics
   are not output to a file.
+
+== Escaping Quotes
+
+Dates, IP addresses, and strings need to be enclosed in single quotation 
marks. To use a single quotation mark itself in a string literal, escape it 
with an additional single quotation mark.
+
+When fetching simple text data, `cqlsh` will return an unquoted string. 
However, when fetching text data from complex types (collections, user-defined 
types, etc.), `cqlsh` will return a quoted string containing the escaped 
characters. For example:
+
+Simple data
+[source,none]
+----
+cqlsh> CREATE TABLE test.simple_data (id int, data text, PRIMARY KEY (id));
+cqlsh> INSERT INTO test.simple_data (id, data) values(1, 'I''m fine');
+cqlsh> SELECT data from test.simple_data;
+
+ data
+----------
+ I'm fine
+----
+Complex data
+[source,none]
+----
+cqlsh> CREATE TABLE test.complex_data (id int, data map<int, text>, PRIMARY 
KEY (id));
+cqlsh> INSERT INTO test.complex_data (id, data) values(1, {1:'I''m fine'});
+cqlsh> SELECT data from test.complex_data;
+
+ data
+------------------
+ {1: 'I''m fine'}
+----
diff --git a/doc/modules/cassandra/pages/tools/sstable/sstablelevelreset.adoc 
b/doc/modules/cassandra/pages/tools/sstable/sstablelevelreset.adoc
index 65dc02e25c..69bbf4240b 100644
--- a/doc/modules/cassandra/pages/tools/sstable/sstablelevelreset.adoc
+++ b/doc/modules/cassandra/pages/tools/sstable/sstablelevelreset.adoc
@@ -6,7 +6,7 @@ for example, change the minimum sstable size, and therefore 
restart the
 compaction process using this new configuration.
 
 See
-http://cassandra.apache.org/doc/latest/operating/compaction.html#leveled-compaction-strategy
+https://cassandra.apache.org/doc/latest/operating/compaction/lcs.html#lcs
 for information on how levels are used in this compaction strategy.
 
 Cassandra must be stopped before this tool is executed, or unexpected
diff --git a/src/java/org/apache/cassandra/db/tries/InMemoryTrie.md 
b/src/java/org/apache/cassandra/db/tries/InMemoryTrie.md
index 010003a679..09c1408731 100644
--- a/src/java/org/apache/cassandra/db/tries/InMemoryTrie.md
+++ b/src/java/org/apache/cassandra/db/tries/InMemoryTrie.md
@@ -85,7 +85,7 @@ Example: -1 is a leaf cell with content `contentArray[0]`.
 
 Chain nodes are one-child nodes. Multiple chain nodes, forming a chain of 
transitions to one target, can reside in a
 single cell. Chain nodes are identified by the lowest 5 bits of a pointer 
being between `0x00` and `0x1B`. In addition
-to the the type of node, in this case the bits also define the length of the 
chain &mdash; the difference between
+to the type of node, in this case the bits also define the length of the chain 
&mdash; the difference between
 `0x1C` and the pointer offset specifies the number of characters in the chain.
 
 The simplest chain node has one transition leading to one child and is laid 
out like this:


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
