This is an automated email from the ASF dual-hosted git repository.

zghao pushed a commit to branch branch-2.2
in repository https://gitbox.apache.org/repos/asf/hbase.git

commit 743f6801d7c826994d93f1b93acac005980a8fa2
Author: Guanghao Zhang <[email protected]>
AuthorDate: Tue Mar 5 13:43:29 2019 +0800

    HBASE-21972 Copy master doc into branch-2.2 and edit to make it suit 2.2.0
---
 .../asciidoc/_chapters/appendix_hfile_format.adoc  |  32 +-
 src/main/asciidoc/_chapters/architecture.adoc      | 258 ++++----
 src/main/asciidoc/_chapters/community.adoc         |   3 +
 src/main/asciidoc/_chapters/configuration.adoc     |  89 +--
 src/main/asciidoc/_chapters/cp.adoc                |   9 +-
 src/main/asciidoc/_chapters/developer.adoc         | 162 +++--
 src/main/asciidoc/_chapters/ops_mgt.adoc           | 276 +++++++-
 src/main/asciidoc/_chapters/preface.adoc           |   2 +-
 src/main/asciidoc/_chapters/security.adoc          |   7 +-
 src/main/asciidoc/_chapters/spark.adoc             | 699 +++++++++++++++++++++
 src/main/asciidoc/_chapters/sync_replication.adoc  | 125 ++++
 src/main/asciidoc/_chapters/troubleshooting.adoc   |   4 +-
 src/main/asciidoc/_chapters/upgrading.adoc         |  37 +-
 src/main/asciidoc/book.adoc                        |   3 +-
 14 files changed, 1447 insertions(+), 259 deletions(-)

diff --git a/src/main/asciidoc/_chapters/appendix_hfile_format.adoc 
b/src/main/asciidoc/_chapters/appendix_hfile_format.adoc
index 0f37beb..98659c2 100644
--- a/src/main/asciidoc/_chapters/appendix_hfile_format.adoc
+++ b/src/main/asciidoc/_chapters/appendix_hfile_format.adoc
@@ -106,11 +106,11 @@ In the version 2 every block in the data section contains 
the following fields:
 .. BLOOM_CHUNK – Bloom filter chunks
 .. META – meta blocks (not used for Bloom filters in version 2 anymore)
 .. INTERMEDIATE_INDEX – intermediate-level index blocks in a multi-level 
blockindex
-.. ROOT_INDEX – root>level index blocks in a multi>level block index
-.. FILE_INFO – the ``file info'' block, a small key>value map of metadata
-.. BLOOM_META – a Bloom filter metadata block in the load>on>open section
-.. TRAILER – a fixed>size file trailer.
-  As opposed to the above, this is not an HFile v2 block but a fixed>size (for 
each HFile version) data structure
+.. ROOT_INDEX – root-level index blocks in a multi-level block index
+.. FILE_INFO – the ``file info'' block, a small key-value map of metadata
+.. BLOOM_META – a Bloom filter metadata block in the load-on-open section
+.. TRAILER – a fixed-size file trailer.
+  As opposed to the above, this is not an HFile v2 block but a fixed-size (for 
each HFile version) data structure
 .. INDEX_V1 – this block type is only used for legacy HFile v1 block
 . Compressed size of the block's data, not including the header (int).
 +
@@ -127,7 +127,7 @@ The above format of blocks is used in the following HFile 
sections:
 
 Scanned block section::
   The section is named so because it contains all data blocks that need to be 
read when an HFile is scanned sequentially.
-  Also contains leaf block index and Bloom chunk blocks.
+  Also contains Leaf index blocks and Bloom chunk blocks.
 Non-scanned block section::
   This section still contains unified-format v2 blocks but it does not have to 
be read when doing a sequential scan.
   This section contains "meta" blocks and intermediate-level index blocks.
@@ -140,10 +140,10 @@ There are three types of block indexes in HFile version 
2, stored in two differe
 
 . Data index -- version 2 multi-level block index, consisting of:
 .. Version 2 root index, stored in the data block index section of the file
-.. Optionally, version 2 intermediate levels, stored in the non%root format in 
  the data index section of the file. Intermediate levels can only be present 
if leaf level blocks are present
-.. Optionally, version 2 leaf levels, stored in the non%root format inline 
with   data blocks
+.. Optionally, version 2 intermediate levels, stored in the non-root format in 
  the data index section of the file. Intermediate levels can only be present 
if leaf level blocks are present
+.. Optionally, version 2 leaf levels, stored in the non-root format inline 
with   data blocks
 . Meta index -- version 2 root index format only, stored in the meta index 
section of the file
-. Bloom index -- version 2 root index format only, stored in the 
``load-on-open'' section as part of Bloom filter metadata.
+. Bloom index -- version 2 root index format only, stored in the ``load-on-open'' section as part of Bloom filter metadata.
 
 ==== Root block index format in version 2
 
@@ -156,7 +156,7 @@ A version 2 root index block is a sequence of entries of 
the following format, s
 
 . Offset (long)
 +
-This offset may point to a data block or to a deeper>level index block.
+This offset may point to a data block or to a deeper-level index block.
 
 . On-disk size (int)
 . Key (a serialized byte array stored using Bytes.writeByteArray)
@@ -172,7 +172,7 @@ For the data index and the meta index the number of entries 
is stored in the tra
 For a multi-level block index we also store the following fields in the root 
index block in the load-on-open section of the HFile, in addition to the data 
structure described above:
 
 . Middle leaf index block offset
-. Middle leaf block on-disk size (meaning the leaf index block containing the 
reference to the ``middle'' data block of the file)
+. Middle leaf block on-disk size (meaning the leaf index block containing the reference to the ``middle'' data block of the file)
 . The index of the mid-key (defined below) in the middle leaf-level block.
 
 
@@ -200,9 +200,9 @@ Every non-root index block is structured as follows.
 . Entries.
   Each entry contains:
 +
-. Offset of the block referenced by this entry in the file (long)
-. On>disk size of the referenced block (int)
-. Key.
+.. Offset of the block referenced by this entry in the file (long)
+.. On-disk size of the referenced block (int)
+.. Key.
   The length can be calculated from entryOffsets.
 
 
@@ -214,7 +214,7 @@ In contrast with version 1, in a version 2 HFile Bloom 
filter metadata is stored
 +
 . Bloom filter version = 3 (int). There used to be a DynamicByteBloomFilter 
class that had the Bloom   filter version number 2
 . The total byte size of all compound Bloom filter chunks (long)
-. Number of hash functions (int
+. Number of hash functions (int)
 . Type of hash functions (int)
 . The total key count inserted into the Bloom filter (long)
 . The maximum total number of keys in the Bloom filter (long)
@@ -246,7 +246,7 @@ This is because we need to know the comparator at the time 
of parsing the load-o
 ==== Fixed file trailer format differences between versions 1 and 2
 
 The following table shows common and different fields between fixed file 
trailers in versions 1 and 2.
-Note that the size of the trailer is different depending on the version, so it 
is ``fixed'' only within one version.
+Note that the size of the trailer is different depending on the version, so it is ``fixed'' only within one version.
 However, the version is always stored as the last four-byte integer in the 
file.
 
 .Differences between HFile Versions 1 and 2
diff --git a/src/main/asciidoc/_chapters/architecture.adoc 
b/src/main/asciidoc/_chapters/architecture.adoc
index 26ad0f9..7680948 100644
--- a/src/main/asciidoc/_chapters/architecture.adoc
+++ b/src/main/asciidoc/_chapters/architecture.adoc
@@ -594,6 +594,80 @@ See <<regions.arch.assignment>> for more information on 
region assignment.
 Periodically checks and cleans up the `hbase:meta` table.
 See <<arch.catalog.meta>> for more information on the meta table.
 
+[[master.wal]]
+=== MasterProcWAL
+
+HMaster records administrative operations and their running states, such as the handling of a crashed server,
+table creation, and other DDLs, into its own WAL file. The WALs are stored under the MasterProcWALs
+directory. The Master WALs are not like RegionServer WALs. Keeping up the Master WAL allows
+us to run a state machine that is resilient across Master failures. For example, if an HMaster that was in the
+middle of creating a table encounters an issue and fails, the next active HMaster can pick up where
+the previous one left off and carry the operation to completion. Since hbase-2.0.0, a
+new AssignmentManager (a.k.a. AMv2) was introduced, and the HMaster handles region assignment
+operations, server crash processing, balancing, etc., all via AMv2, persisting all state and
+transitions into MasterProcWALs rather than up into ZooKeeper, as we did in hbase-1.x.
+
+See <<amv2>> (and <<pv2>> for its basis) if you would like to learn more about 
the new
+AssignmentManager.
+
+[[master.wal.conf]]
+==== Configurations for MasterProcWAL
+Here is the list of configurations that affect MasterProcWAL operation.
+You should not have to change the defaults.
+
+[[hbase.procedure.store.wal.periodic.roll.msec]]
+*`hbase.procedure.store.wal.periodic.roll.msec`*::
++
+.Description
+Frequency of generating a new WAL
++
+.Default
+`1h (3600000 in msec)`
+
+[[hbase.procedure.store.wal.roll.threshold]]
+*`hbase.procedure.store.wal.roll.threshold`*::
++
+.Description
+Threshold size before the WAL rolls. Every time the WAL reaches this size, or the above period (1 hour) passes since the last log roll, the HMaster will generate a new WAL.
++
+.Default
+`32MB (33554432 in byte)`
+
+[[hbase.procedure.store.wal.warn.threshold]]
+*`hbase.procedure.store.wal.warn.threshold`*::
++
+.Description
+If the number of WALs goes beyond this threshold, the following message should 
appear in the HMaster log with WARN level when rolling.
+
+ procedure WALs count=xx above the warning threshold 64. check running 
procedures to see if something is stuck.
+
++
+.Default
+`64`
+
+[[hbase.procedure.store.wal.max.retries.before.roll]]
+*`hbase.procedure.store.wal.max.retries.before.roll`*::
++
+.Description
+Max number of retries when syncing slots (records) to the underlying storage, such as HDFS. On every attempt, the following message should appear in the HMaster log.
+
+ unable to sync slots, retry=xx
+
++
+.Default
+`3`
+
+[[hbase.procedure.store.wal.sync.failure.roll.max]]
+*`hbase.procedure.store.wal.sync.failure.roll.max`*::
++
+.Description
+After the above 3 retries, the log is rolled and the retry count is reset to 0, whereupon a new set of retries starts. This configuration controls the max number of log-rolling attempts upon sync failure. That is, the HMaster is allowed to fail to sync 9 times in total. Once this limit is exceeded, the following log should appear in the HMaster log.
+
+ Sync slots after log roll failed, abort.
++
+.Default
+`3`
+
 [[regionserver.arch]]
 == RegionServer
 
@@ -876,14 +950,14 @@ In the above, we set the BucketCache to be 4G.
 We configured the on-heap LruBlockCache have 20% (0.2) of the RegionServer's 
heap size (0.2 * 5G = 1G). In other words, you configure the L1 LruBlockCache 
as you would normally (as if there were no L2 cache present).
 
 link:https://issues.apache.org/jira/browse/HBASE-10641[HBASE-10641] introduced 
the ability to configure multiple sizes for the buckets of the BucketCache, in 
HBase 0.98 and newer.
-To configurable multiple bucket sizes, configure the new property 
`hfile.block.cache.sizes` (instead of `hfile.block.cache.size`) to a 
comma-separated list of block sizes, ordered from smallest to largest, with no 
spaces.
+To configure multiple bucket sizes, set the new property `hbase.bucketcache.bucket.sizes` to a comma-separated list of block sizes, ordered from smallest to largest, with no spaces.
 The goal is to optimize the bucket sizes based on your data access patterns.
 The following example configures buckets of size 4096 and 8192.
 
 [source,xml]
 ----
 <property>
-  <name>hfile.block.cache.sizes</name>
+  <name>hbase.bucketcache.bucket.sizes</name>
   <value>4096,8192</value>
 </property>
 ----
@@ -1175,127 +1249,40 @@ WAL log splitting and recovery can be resource 
intensive and take a long time, d
 Distributed log processing is enabled by default since HBase 0.92.
 The setting is controlled by the `hbase.master.distributed.log.splitting` 
property, which can be set to `true` or `false`, but defaults to `true`.
 
-[[log.splitting.step.by.step]]
-.Distributed Log Splitting, Step by Step
+==== WAL splitting based on procedureV2
+After HBASE-20610, we introduced a new way to coordinate WAL splitting via the procedureV2 framework. This simplifies the process of WAL splitting, with no need to connect to ZooKeeper any more.
 
-After configuring distributed log splitting, the HMaster controls the process.
-The HMaster enrolls each RegionServer in the log splitting process, and the 
actual work of splitting the logs is done by the RegionServers.
-The general process for log splitting, as described in 
<<log.splitting.step.by.step>> still applies here.
+[[background]]
+.Background
+Currently, WAL splitting is coordinated by ZooKeeper: each region server tries to grab tasks from ZooKeeper, and the burden becomes heavier as the number of region servers increases.
 
-. If distributed log processing is enabled, the HMaster creates a _split log 
manager_ instance when the cluster is started.
-  .. The split log manager manages all log files which need to be scanned and 
split.
-  .. The split log manager places all the logs into the ZooKeeper splitWAL 
node (_/hbase/splitWAL_) as tasks.
-  .. You can view the contents of the splitWAL by issuing the following 
`zkCli` command. Example output is shown.
-+
-[source,bash]
-----
-ls /hbase/splitWAL
-[hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2FWALs%2Fhost8.sample.com%2C57020%2C1340474893275-splitting%2Fhost8.sample.com%253A57020.1340474893900,
-hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2FWALs%2Fhost3.sample.com%2C57020%2C1340474893299-splitting%2Fhost3.sample.com%253A57020.1340474893931,
-hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2FWALs%2Fhost4.sample.com%2C57020%2C1340474893287-splitting%2Fhost4.sample.com%253A57020.1340474893946]
-----
-+
-The output contains some non-ASCII characters.
-When decoded, it looks much more simple:
-+
-----
-[hdfs://host2.sample.com:56020/hbase/WALs
-/host8.sample.com,57020,1340474893275-splitting
-/host8.sample.com%3A57020.1340474893900,
-hdfs://host2.sample.com:56020/hbase/WALs
-/host3.sample.com,57020,1340474893299-splitting
-/host3.sample.com%3A57020.1340474893931,
-hdfs://host2.sample.com:56020/hbase/WALs
-/host4.sample.com,57020,1340474893287-splitting
-/host4.sample.com%3A57020.1340474893946]
-----
-+
-The listing represents WAL file names to be scanned and split, which is a list 
of log splitting tasks.
+[[implementation.on.master.side]]
+.Implementation on Master side
+During ServerCrashProcedure, SplitWALManager will create one SplitWALProcedure for each WAL file that should be split. Each SplitWALProcedure will then spawn a SplitWalRemoteProcedure to send the request to a region server.
+SplitWALProcedure is a StateMachineProcedure; here is the state transition diagram.
 
-. The split log manager monitors the log-splitting tasks and workers.
-+
-The split log manager is responsible for the following ongoing tasks:
-+
-* Once the split log manager publishes all the tasks to the splitWAL znode, it 
monitors these task nodes and waits for them to be processed.
-* Checks to see if there are any dead split log workers queued up.
-  If it finds tasks claimed by unresponsive workers, it will resubmit those 
tasks.
-  If the resubmit fails due to some ZooKeeper exception, the dead worker is 
queued up again for retry.
-* Checks to see if there are any unassigned tasks.
-  If it finds any, it create an ephemeral rescan node so that each split log 
worker is notified to re-scan unassigned tasks via the `nodeChildrenChanged` 
ZooKeeper event.
-* Checks for tasks which are assigned but expired.
-  If any are found, they are moved back to `TASK_UNASSIGNED` state again so 
that they can be retried.
-  It is possible that these tasks are assigned to slow workers, or they may 
already be finished.
-  This is not a problem, because log splitting tasks have the property of 
idempotence.
-  In other words, the same log splitting task can be processed many times 
without causing any problem.
-* The split log manager watches the HBase split log znodes constantly.
-  If any split log task node data is changed, the split log manager retrieves 
the node data.
-  The node data contains the current state of the task.
-  You can use the `zkCli` `get` command to retrieve the current state of a 
task.
-  In the example output below, the first line of the output shows that the 
task is currently unassigned.
-+
-----
-get 
/hbase/splitWAL/hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2FWALs%2Fhost6.sample.com%2C57020%2C1340474893287-splitting%2Fhost6.sample.com%253A57020.1340474893945
+.WAL_splitting_coordination
+image::WAL_splitting.png[]
 
-unassigned host2.sample.com:57000
-cZxid = 0×7115
-ctime = Sat Jun 23 11:13:40 PDT 2012
-...
-----
-+
-Based on the state of the task whose data is changed, the split log manager 
does one of the following:
-+
-* Resubmit the task if it is unassigned
-* Heartbeat the task if it is assigned
-* Resubmit or fail the task if it is resigned (see 
<<distributed.log.replay.failure.reasons>>)
-* Resubmit or fail the task if it is completed with errors (see 
<<distributed.log.replay.failure.reasons>>)
-* Resubmit or fail the task if it could not complete due to errors (see 
<<distributed.log.replay.failure.reasons>>)
-* Delete the task if it is successfully completed or failed
-+
-[[distributed.log.replay.failure.reasons]]
-[NOTE]
-.Reasons a Task Will Fail
-====
-* The task has been deleted.
-* The node no longer exists.
-* The log status manager failed to move the state of the task to 
`TASK_UNASSIGNED`.
-* The number of resubmits is over the resubmit threshold.
-====
+[[implementation.on.region.server.side]]
+.Implementation on Region Server side
+The region server receives a SplitWALCallable and executes it, which is much more straightforward than before. It returns null on success and throws an exception if there is any error.
 
-. Each RegionServer's split log worker performs the log-splitting tasks.
-+
-Each RegionServer runs a daemon thread called the _split log worker_, which 
does the work to split the logs.
-The daemon thread starts when the RegionServer starts, and registers itself to 
watch HBase znodes.
-If any splitWAL znode children change, it notifies a sleeping worker thread to 
wake up and grab more tasks.
-If a worker's current task's node data is changed,
-the worker checks to see if the task has been taken by another worker.
-If so, the worker thread stops work on the current task.
-+
-The worker monitors the splitWAL znode constantly.
-When a new task appears, the split log worker retrieves the task paths and 
checks each one until it finds an unclaimed task, which it attempts to claim.
-If the claim was successful, it attempts to perform the task and updates the 
task's `state` property based on the splitting outcome.
-At this point, the split log worker scans for another unclaimed task.
-+
-.How the Split Log Worker Approaches a Task
-* It queries the task state and only takes action if the task is in 
`TASK_UNASSIGNED `state.
-* If the task is in `TASK_UNASSIGNED` state, the worker attempts to set the 
state to `TASK_OWNED` by itself.
-  If it fails to set the state, another worker will try to grab it.
-  The split log manager will also ask all workers to rescan later if the task 
remains unassigned.
-* If the worker succeeds in taking ownership of the task, it tries to get the 
task state again to make sure it really gets it asynchronously.
-  In the meantime, it starts a split task executor to do the actual work:
-** Get the HBase root folder, create a temp folder under the root, and split 
the log file to the temp folder.
-** If the split was successful, the task executor sets the task to state 
`TASK_DONE`.
-** If the worker catches an unexpected IOException, the task is set to state 
`TASK_ERR`.
-** If the worker is shutting down, set the task to state `TASK_RESIGNED`.
-** If the task is taken by another worker, just log it.
-
-
-. The split log manager monitors for uncompleted tasks.
-+
-The split log manager returns when all tasks are completed successfully.
-If all tasks are completed with some failures, the split log manager throws an 
exception so that the log splitting can be retried.
-Due to an asynchronous implementation, in very rare cases, the split log 
manager loses track of some completed tasks.
-For that reason, it periodically checks for remaining uncompleted task in its 
task map or ZooKeeper.
-If none are found, it throws an exception so that the log splitting can be 
retried right away instead of hanging there waiting for something that won't 
happen.
+[[preformance]]
+.Performance
+According to tests on a cluster with 5 region servers and 1 master, procedureV2-coordinated WAL splitting performs better than ZK-coordinated WAL splitting when restarting the whole cluster or when one region server crashes.
+
+[[enable.this.feature]]
+.Enable this feature
+To enable this feature, first ensure that your HBase distribution already contains this code. If not, upgrade the HBase cluster first, without any configuration change.
+Then set the configuration 'hbase.split.wal.zk.coordinated' to false and rolling-upgrade the master with the new configuration. WAL splitting is now handled by the new implementation.
+But the region servers are still trying to grab tasks from ZooKeeper; rolling-upgrade the region servers with the new configuration to stop that.
+
+* The steps are as follows:
+** Upgrade the whole cluster to get the new implementation.
+** Upgrade the master with the new configuration 'hbase.split.wal.zk.coordinated'=false.
+** Upgrade the region servers to stop them grabbing tasks from ZooKeeper.
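As a sketch, the configuration change referred to in the steps above is a single property in _hbase-site.xml_ (shown with the value that selects the new procedureV2-based coordination):

```xml
<!-- hbase-site.xml: disable ZooKeeper-coordinated WAL splitting so that
     the procedureV2-based implementation described above is used instead. -->
<property>
  <name>hbase.split.wal.zk.coordinated</name>
  <value>false</value>
</property>
```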
 
 [[wal.compression]]
 ==== WAL Compression ====
@@ -1711,6 +1698,9 @@ For example, to view the content of the file 
_hdfs://10.81.47.41:8020/hbase/defa
 If you leave off the option -v to see just a summary on the HFile.
 See usage for other things to do with the `hfile` tool.
 
+NOTE: In the output of this tool, you might see 'seqid=0' for certain keys in places such as 'Mid-key'/'firstKey'/'lastKey'. These are
+ 'KeyOnlyKeyValue' type instances, meaning their seqid is irrelevant and we just need the keys of these Key-Value instances.
+
 [[store.file.dir]]
 ===== StoreFile Directory Structure on HDFS
 
@@ -2469,12 +2459,6 @@ The most straightforward method is to either use the 
`TableOutputFormat` class f
 The bulk load feature uses a MapReduce job to output table data in HBase's 
internal data format, and then directly loads the generated StoreFiles into a 
running cluster.
 Using bulk load will use less CPU and network resources than simply using the 
HBase API.
 
-[[arch.bulk.load.limitations]]
-=== Bulk Load Limitations
-
-As bulk loading bypasses the write path, the WAL doesn't get written to as 
part of the process.
-Replication works by reading the WAL files so it won't see the bulk loaded 
data – and the same goes for the edits that use `Put.setDurability(SKIP_WAL)`. 
One way to handle that is to ship the raw files or the HFiles to the other 
cluster and do the other processing there.
-
 [[arch.bulk.load.arch]]
 === Bulk Load Architecture
 
@@ -2527,6 +2511,32 @@ To get started doing so, dig into `ImportTsv.java` and 
check the JavaDoc for HFi
 The import step of the bulk load can also be done programmatically.
 See the `LoadIncrementalHFiles` class for more information.
 
+[[arch.bulk.load.replication]]
+=== Bulk Loading Replication
+HBASE-13153 adds replication support for bulk loaded HFiles, available since 
HBase 1.3/2.0. This feature is enabled by setting 
`hbase.replication.bulkload.enabled` to `true` (default is `false`).
+You also need to copy the source cluster configuration files to the 
destination cluster.
+
+Additional configurations are required too:
+
+. `hbase.replication.source.fs.conf.provider`
++
+This defines the class which loads the source cluster file system client 
configuration in the destination cluster. This should be configured for all the 
RS in the destination cluster. Default is 
`org.apache.hadoop.hbase.replication.regionserver.DefaultSourceFSConfigurationProvider`.
++
+. `hbase.replication.conf.dir`
++
+This represents the base directory where the file system client configurations 
of the source cluster are copied to the destination cluster. This should be 
configured for all the RS in the destination cluster. Default is 
`$HBASE_CONF_DIR`.
++
+. `hbase.replication.cluster.id`
++
+This configuration is required in the cluster where replication for bulk loaded data is enabled. A source cluster is uniquely identified by the destination cluster using this id. This should be configured in the configuration file of all the RS in the source cluster.
++
+For example: If source cluster FS client configurations are copied to the 
destination cluster under directory `/home/user/dc1/`, then 
`hbase.replication.cluster.id` should be configured as `dc1` and 
`hbase.replication.conf.dir` as `/home/user`.
+
+NOTE: `DefaultSourceFSConfigurationProvider` supports only `xml` type files. It loads the source cluster FS client configuration only once, so if the source cluster FS client configuration files are updated, every RS in the peer cluster(s) must be restarted to reload the configuration.
+
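Putting the pieces together, here is an illustrative configuration sketch for the `dc1` example above; which cluster each property belongs on is noted in the comments (the `dc1` id and paths are just the example values from this section, not required names):

```xml
<!-- Both clusters: turn on replication of bulk loaded HFiles -->
<property>
  <name>hbase.replication.bulkload.enabled</name>
  <value>true</value>
</property>
<!-- Source cluster RS config: id by which the destination knows this cluster -->
<property>
  <name>hbase.replication.cluster.id</name>
  <value>dc1</value>
</property>
<!-- Destination cluster RS config: base dir holding the copied source FS
     client configuration, i.e. the files live under /home/user/dc1/ -->
<property>
  <name>hbase.replication.conf.dir</name>
  <value>/home/user</value>
</property>
```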
 [[arch.hdfs]]
 == HDFS
 
diff --git a/src/main/asciidoc/_chapters/community.adoc 
b/src/main/asciidoc/_chapters/community.adoc
index 3a896cf..3a429d6 100644
--- a/src/main/asciidoc/_chapters/community.adoc
+++ b/src/main/asciidoc/_chapters/community.adoc
@@ -97,6 +97,9 @@ NOTE: End-of-life releases are not included in this list.
 | 2.1
 | Duo Zhang
 
+| 2.2
+| Guanghao Zhang
+
 |===
 
 [[hbase.commit.msg.format]]
diff --git a/src/main/asciidoc/_chapters/configuration.adoc 
b/src/main/asciidoc/_chapters/configuration.adoc
index 6980a26..fd9d299 100644
--- a/src/main/asciidoc/_chapters/configuration.adoc
+++ b/src/main/asciidoc/_chapters/configuration.adoc
@@ -93,7 +93,10 @@ This section lists required services and some required 
system configuration.
 [[java]]
 .Java
 
-The following table summarizes the recommendation of the HBase community wrt 
deploying on various Java versions. An entry of "yes" is meant to indicate a 
base level of testing and willingness to help diagnose and address issues you 
might run into. Similarly, an entry of "no" or "Not Supported" generally means 
that should you run into an issue the community is likely to ask you to change 
the Java environment before proceeding to help. In some cases, specific 
guidance on limitations (e.g.  [...]
+The following table summarizes the recommendation of the HBase community wrt 
deploying on various Java versions.
+A icon:check-circle[role="green"] symbol is meant to indicate a base level of 
testing and willingness to help diagnose and address issues you might run into.
+Similarly, an entry of icon:exclamation-circle[role="yellow"] or 
icon:times-circle[role="red"] generally means that should you run into an issue 
the community is likely to ask you to change the Java environment before 
proceeding to help.
+In some cases, specific guidance on limitations (e.g. whether compiling / unit 
tests work, specific operational issues, etc) will also be noted.
 
 .Long Term Support JDKs are recommended
 [TIP]
@@ -102,32 +105,34 @@ HBase recommends downstream users rely on JDK releases 
that are marked as Long T
 ====
 
 .Java support by release line
-[cols="1,1,1,1,1", options="header"]
+[cols="6*^.^", options="header"]
 |===
 |HBase Version
 |JDK 7
 |JDK 8
-|JDK 9
-|JDK 10
-
-|2.0
-|link:http://search-hadoop.com/m/YGbbsPxZ723m3as[Not Supported]
-|yes
-|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported]
-|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported]
-
-|1.3
-|yes
-|yes
-|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported]
-|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported]
-
-
-|1.2
-|yes
-|yes
-|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported]
-|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported]
+|JDK 9 (Non-LTS)
+|JDK 10 (Non-LTS)
+|JDK 11
+
+|2.0+
+|icon:times-circle[role="red"]
+|icon:check-circle[role="green"]
+v|icon:exclamation-circle[role="yellow"]
+link:https://issues.apache.org/jira/browse/HBASE-20264[HBASE-20264]
+v|icon:exclamation-circle[role="yellow"]
+link:https://issues.apache.org/jira/browse/HBASE-20264[HBASE-20264]
+v|icon:exclamation-circle[role="yellow"]
+link:https://issues.apache.org/jira/browse/HBASE-21110[HBASE-21110]
+
+|1.2+
+|icon:check-circle[role="green"]
+|icon:check-circle[role="green"]
+v|icon:exclamation-circle[role="yellow"]
+link:https://issues.apache.org/jira/browse/HBASE-20264[HBASE-20264]
+v|icon:exclamation-circle[role="yellow"]
+link:https://issues.apache.org/jira/browse/HBASE-20264[HBASE-20264]
+v|icon:exclamation-circle[role="yellow"]
+link:https://issues.apache.org/jira/browse/HBASE-21110[HBASE-21110]
 
 |===
 
@@ -213,26 +218,28 @@ Use the following legend to interpret this table:
 
 .Hadoop version support matrix
 
-* "S" = supported
-* "X" = not supported
-* "NT" = Not tested
+* icon:check-circle[role="green"] = Tested to be fully-functional
+* icon:times-circle[role="red"] = Known to not be fully-functional
+* icon:exclamation-circle[role="yellow"] = Not tested, may/may-not function
 
-[cols="1,1,1,1,1,1", options="header"]
+[cols="1,4*^.^", options="header"]
 |===
-| | HBase-1.2.x | HBase-1.3.x | HBase-1.5.x | HBase-2.0.x | HBase-2.1.x
-|Hadoop-2.4.x | S | S | X | X | X
-|Hadoop-2.5.x | S | S | X | X | X
-|Hadoop-2.6.0 | X | X | X | X | X
-|Hadoop-2.6.1+ | S | S | X | S | X
-|Hadoop-2.7.0 | X | X | X | X | X
-|Hadoop-2.7.1+ | S | S | S | S | S
-|Hadoop-2.8.[0-1] | X | X | X | X | X
-|Hadoop-2.8.2 | NT | NT | NT | NT | NT
-|Hadoop-2.8.3+ | NT | NT | NT | S | S
-|Hadoop-2.9.0 | X | X | X | X | X
-|Hadoop-2.9.1+ | NT | NT | NT | NT | NT
-|Hadoop-3.0.x | X | X | X | X | X
-|Hadoop-3.1.0 | X | X | X | X | X
+| | HBase-1.2.x, HBase-1.3.x | HBase-1.4.x | HBase-2.0.x | HBase-2.1.x
+|Hadoop-2.4.x | icon:check-circle[role="green"] | 
icon:times-circle[role="red"] | icon:times-circle[role="red"] | 
icon:times-circle[role="red"]
+|Hadoop-2.5.x | icon:check-circle[role="green"] | 
icon:times-circle[role="red"] | icon:times-circle[role="red"] | 
icon:times-circle[role="red"]
+|Hadoop-2.6.0 | icon:times-circle[role="red"] | icon:times-circle[role="red"] 
| icon:times-circle[role="red"] | icon:times-circle[role="red"]
+|Hadoop-2.6.1+ | icon:check-circle[role="green"] | 
icon:times-circle[role="red"] | icon:check-circle[role="green"] | 
icon:times-circle[role="red"]
+|Hadoop-2.7.0 | icon:times-circle[role="red"] | icon:times-circle[role="red"] 
| icon:times-circle[role="red"] | icon:times-circle[role="red"]
+|Hadoop-2.7.1+ | icon:check-circle[role="green"] | 
icon:check-circle[role="green"] | icon:check-circle[role="green"] | 
icon:check-circle[role="green"]
+|Hadoop-2.8.[0-1] | icon:times-circle[role="red"] | 
icon:times-circle[role="red"] | icon:times-circle[role="red"] | 
icon:times-circle[role="red"]
+|Hadoop-2.8.2 | icon:exclamation-circle[role="yellow"] | 
icon:exclamation-circle[role="yellow"] | icon:exclamation-circle[role="yellow"] 
| icon:exclamation-circle[role="yellow"]
+|Hadoop-2.8.3+ | icon:exclamation-circle[role="yellow"] | 
icon:exclamation-circle[role="yellow"] | icon:check-circle[role="green"] | 
icon:check-circle[role="green"]
+|Hadoop-2.9.0 | icon:times-circle[role="red"] | icon:times-circle[role="red"] 
| icon:times-circle[role="red"] | icon:times-circle[role="red"]
+|Hadoop-2.9.1+ | icon:exclamation-circle[role="yellow"] | 
icon:exclamation-circle[role="yellow"] | icon:exclamation-circle[role="yellow"] 
| icon:exclamation-circle[role="yellow"]
+|Hadoop-3.0.[0-2] | icon:times-circle[role="red"] | 
icon:times-circle[role="red"] | icon:times-circle[role="red"] | 
icon:times-circle[role="red"]
+|Hadoop-3.0.3+ | icon:times-circle[role="red"] | icon:times-circle[role="red"] 
| icon:check-circle[role="green"] | icon:check-circle[role="green"]
+|Hadoop-3.1.0 | icon:times-circle[role="red"] | icon:times-circle[role="red"] 
| icon:times-circle[role="red"] | icon:times-circle[role="red"]
+|Hadoop-3.1.1+ | icon:times-circle[role="red"] | icon:times-circle[role="red"] 
| icon:check-circle[role="green"] | icon:check-circle[role="green"]
 |===
 
 .Hadoop Pre-2.6.1 and JDK 1.8 Kerberos
diff --git a/src/main/asciidoc/_chapters/cp.adoc 
b/src/main/asciidoc/_chapters/cp.adoc
index abe334c..5fd80b4 100644
--- a/src/main/asciidoc/_chapters/cp.adoc
+++ b/src/main/asciidoc/_chapters/cp.adoc
@@ -483,6 +483,7 @@ The following Observer coprocessor prevents the details of 
the user `admin` from
 being returned in a `Get` or `Scan` of the `users` table.
 
 . Write a class that implements the
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionCoprocessor.html[RegionCoprocessor],
 
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html[RegionObserver]
 interfaces.
 
@@ -500,10 +501,9 @@ empty result. Otherwise, process the request as normal.
 
 The following is an implementation of the above steps:
 
-
 [source,java]
 ----
-public class RegionObserverExample implements RegionObserver {
+public class RegionObserverExample implements RegionCoprocessor, 
RegionObserver {
 
     private static final byte[] ADMIN = Bytes.toBytes("admin");
     private static final byte[] COLUMN_FAMILY = Bytes.toBytes("details");
@@ -511,6 +511,11 @@ public class RegionObserverExample implements 
RegionObserver {
     private static final byte[] VALUE = Bytes.toBytes("You can't see Admin 
details");
 
     @Override
+    public Optional<RegionObserver> getRegionObserver() {
+      return Optional.of(this);
+    }
+
+    @Override
     public void preGetOp(final ObserverContext<RegionCoprocessorEnvironment> 
e, final Get get, final List<Cell> results)
     throws IOException {
 
diff --git a/src/main/asciidoc/_chapters/developer.adoc 
b/src/main/asciidoc/_chapters/developer.adoc
index 32432ff..f728236 100644
--- a/src/main/asciidoc/_chapters/developer.adoc
+++ b/src/main/asciidoc/_chapters/developer.adoc
@@ -2039,30 +2039,97 @@ For more information on how to use ReviewBoard, see 
link:http://www.reviewboard.
 
 ==== Guide for HBase Committers
 
+===== Becoming a committer
+
+Committers are responsible for reviewing and integrating code changes, testing
+and voting on release candidates, weighing in on design discussions, as well as
+other types of project contributions. The PMC votes to make a contributor a
+committer based on an assessment of their contributions to the project. It is
+expected that committers demonstrate a sustained history of high-quality
+contributions to the project and community involvement.
+
+Contributions can be made in many ways. There is no single path to becoming a
+committer, nor any expected timeline. Submitting features, improvements, and 
bug
+fixes is the most common avenue, but other methods are both recognized and
+encouraged (and may be even more important to the health of HBase as a project 
and a
+community). A non-exhaustive list of potential contributions (in no particular
+order):
+
+* <<appendix_contributing_to_documentation,Update the documentation>> for new
+  changes, best practices, recipes, and other improvements.
+* Keep the website up to date.
+* Perform testing and report the results. For instance, scale testing and
+  testing non-standard configurations is always appreciated.
+* Maintain the shared Jenkins testing environment and other testing
+  infrastructure.
+* <<hbase.rc.voting,Vote on release candidates>> after performing validation, 
even if non-binding.
+  A non-binding vote is a vote by a non-committer.
+* Provide input for discussion threads on the link:/mail-lists.html[mailing 
lists] (which usually have
+  `[DISCUSS]` in the subject line).
+* Answer questions on the user or developer mailing lists and on
+  Slack.
+* Make sure the HBase community is a welcoming one and that we adhere to our
+  link:/coc.html[Code of conduct]. Alert the PMC if you
+  have concerns.
+* Review other people's work (both code and non-code) and provide public
+  feedback.
+* Report bugs that are found, or file new feature requests.
+* Triage issues and keep JIRA organized. This includes closing stale issues,
+  labeling new issues, updating metadata, and other tasks as needed.
+* Mentor new contributors of all sorts.
+* Give talks and write blogs about HBase. Add these to the link:/[News] section
+  of the website.
+* Provide UX feedback about HBase, the web UI, the CLI, APIs, and the website.
+* Write demo applications and scripts.
+* Help attract and retain a diverse community.
+* Interact with other projects in ways that benefit HBase and those other
+  projects.
+
+Not every individual is able to do all (or even any) of the items on this list.
+If you think of other ways to contribute, go for it (and add them to the list).
+A pleasant demeanor and willingness to contribute are all you need to make a
+positive impact on the HBase project. Invitations to become a committer are the
+result of steady interaction with the community over the long term, which 
builds
+trust and recognition.
+
 ===== New committers
 
-New committers are encouraged to first read Apache's generic committer 
documentation:
+New committers are encouraged to first read Apache's generic committer
+documentation:
 
 * link:https://www.apache.org/dev/new-committers-guide.html[Apache New 
Committer Guide]
 * link:https://www.apache.org/dev/committers.html[Apache Committer FAQ]
 
 ===== Review
 
-HBase committers should, as often as possible, attempt to review patches 
submitted by others.
-Ideally every submitted patch will get reviewed by a committer _within a few 
days_.
-If a committer reviews a patch they have not authored, and believe it to be of 
sufficient quality, then they can commit the patch, otherwise the patch should 
be cancelled with a clear explanation for why it was rejected.
-
-The list of submitted patches is in the 
link:https://issues.apache.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=12312392[HBase
 Review Queue], which is ordered by time of last modification.
-Committers should scan the list from top to bottom, looking for patches that 
they feel qualified to review and possibly commit.
-
-For non-trivial changes, it is required to get another committer to review 
your own patches before commit.
-Use the btn:[Submit Patch]                        button in JIRA, just like 
other contributors, and then wait for a `+1` response from another committer 
before committing.
+HBase committers should, as often as possible, attempt to review patches
+submitted by others. Ideally every submitted patch will get reviewed by a
+committer _within a few days_. If a committer reviews a patch they have not
+authored, and believe it to be of sufficient quality, then they can commit the
+patch. Otherwise the patch should be cancelled with a clear explanation for why
+it was rejected.
+
+The list of submitted patches is in the
+link:https://issues.apache.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=12312392[HBase
 Review Queue],
+which is ordered by time of last modification. Committers should scan the list
+from top to bottom, looking for patches that they feel qualified to review and
+possibly commit. If you see a patch you think someone else is better qualified
+to review, you can mention them by username in the JIRA.
+
+For non-trivial changes, it is required that another committer review your
+patches before commit. **Self-commits of non-trivial patches are not allowed.**
+Use the btn:[Submit Patch] button in JIRA, just like other contributors, and
+then wait for a `+1` response from another committer before committing.
 
 ===== Reject
 
-Patches which do not adhere to the guidelines in 
link:https://hbase.apache.org/book.html#developer[HowToContribute] and to the 
link:https://wiki.apache.org/hadoop/CodeReviewChecklist[code review checklist] 
should be rejected.
-Committers should always be polite to contributors and try to instruct and 
encourage them to contribute better patches.
-If a committer wishes to improve an unacceptable patch, then it should first 
be rejected, and a new patch should be attached by the committer for review.
+Patches which do not adhere to the guidelines in
+link:https://hbase.apache.org/book.html#developer[HowToContribute] and to the
+link:https://wiki.apache.org/hadoop/CodeReviewChecklist[code review checklist]
+should be rejected. Committers should always be polite to contributors and try
+to instruct and encourage them to contribute better patches. If a committer
+wishes to improve an unacceptable patch, then it should first be rejected, and 
a
+new patch should be attached by the committer for further review.
 
 [[committing.patches]]
 ===== Commit
@@ -2073,29 +2140,34 @@ Committers commit patches to the Apache HBase GIT 
repository.
 [NOTE]
 ====
 Make sure your local configuration is correct, especially your identity and 
email.
-Examine the output of the +$ git config
-                                --list+ command and be sure it is correct.
-See this GitHub article, link:https://help.github.com/articles/set-up-git[Set 
Up Git] if you need pointers.
+Examine the output of the +$ git config --list+ command and be sure it is 
correct.
+See link:https://help.github.com/articles/set-up-git[Set Up Git] if you need
+pointers.
 ====
 
-When you commit a patch, please:
-
-. Include the Jira issue id in the commit message along with a short 
description of the change. Try
-  to add something more than just the Jira title so that someone looking at 
git log doesn't
-  have to go to Jira to discern what the change is about.
-  Be sure to get the issue ID right, as this causes Jira to link to the change 
in Git (use the
-  issue's "All" tab to see these).
-. Commit the patch to a new branch based off master or other intended branch.
-  It's a good idea to call this branch by the JIRA ID.
-  Then check out the relevant target branch where you want to commit, make 
sure your local branch has all remote changes, by doing a +git pull --rebase+ 
or another similar command, cherry-pick the change into each relevant branch 
(such as master), and do +git push <remote-server>
-  <remote-branch>+.
+When you commit a patch:
+
+. Include the Jira issue ID in the commit message along with a short 
description
+  of the change. Try to add something more than just the Jira title so that
+  someone looking at `git log` output doesn't have to go to Jira to discern 
what
+  the change is about. Be sure to get the issue ID right, because this causes
+  Jira to link to the change in Git (use the issue's "All" tab to see these
+  automatic links).
+. Commit the patch to a new branch based off `master` or the other intended
+  branch. It's a good idea to include the JIRA ID in the name of this branch.
+  Check out the relevant target branch where you want to commit, and make sure
+  your local branch has all remote changes, by doing a +git pull --rebase+ or
+  another similar command. Next, cherry-pick the change into each relevant
+  branch (such as master), and push the changes to the remote branch using
+  a command such as +git push <remote-server> <remote-branch>+.
 +
 WARNING: If you do not have all remote changes, the push will fail.
 If the push fails for any reason, fix the problem or ask for help.
 Do not do a +git push --force+.
 +
 Before you can commit a patch, you need to determine how the patch was created.
-The instructions and preferences around the way to create patches have 
changed, and there will be a transition period.
+The instructions and preferences around the way to create patches have changed,
+and there will be a transition period.
 +
 .Determine How a Patch Was Created
 * If the first few lines of the patch look like the headers of an email, with 
a From, Date, and
@@ -2122,16 +2194,18 @@ diff --git src/main/asciidoc/_chapters/developer.adoc 
src/main/asciidoc/_chapter
 +
 .Example of committing a Patch
 ====
-One thing you will notice with these examples is that there are a lot of +git 
pull+ commands.
-The only command that actually writes anything to the remote repository is 
+git push+, and you need to make absolutely sure you have the correct versions 
of everything and don't have any conflicts before pushing.
-The extra +git
-                                        pull+ commands are usually redundant, 
but better safe than sorry.
+One thing you will notice with these examples is that there are a lot of
++git pull+ commands. The only command that actually writes anything to the
+remote repository is +git push+, and you need to make absolutely sure you have
+the correct versions of everything and don't have any conflicts before pushing.
+The extra +git pull+ commands are usually redundant, but better safe than 
sorry.
 
-The first example shows how to apply a patch that was generated with +git 
format-patch+ and apply it to the `master` and `branch-1` branches.
+The first example shows how to apply a patch that was generated with +git
+format-patch+ and apply it to the `master` and `branch-1` branches.
 
-The directive to use +git format-patch+                                    
rather than +git diff+, and not to use `--no-prefix`, is a new one.
-See the second example for how to apply a patch created with +git
-                                        diff+, and educate the person who 
created the patch.
+The directive to use +git format-patch+ rather than +git diff+, and not to use
+`--no-prefix`, is a new one. See the second example for how to apply a patch
+created with +git diff+, and educate the person who created the patch.
 
 ----
 $ git checkout -b HBASE-XXXX
@@ -2153,13 +2227,13 @@ $ git push origin branch-1
 $ git branch -D HBASE-XXXX
 ----
 
-This example shows how to commit a patch that was created using +git diff+ 
without `--no-prefix`.
-If the patch was created with `--no-prefix`, add `-p0` to the +git apply+ 
command.
+This example shows how to commit a patch that was created using +git diff+
+without `--no-prefix`. If the patch was created with `--no-prefix`, add `-p0` 
to
+the +git apply+ command.
 
 ----
 $ git apply ~/Downloads/HBASE-XXXX-v2.patch
-$ git commit -m "HBASE-XXXX Really Good Code Fix (Joe Schmo)" 
--author=<contributor> -a  # This
-and next command is needed for patches created with 'git diff'
+$ git commit -m "HBASE-XXXX Really Good Code Fix (Joe Schmo)" 
--author=<contributor> -a  # This and next command is needed for patches 
created with 'git diff'
 $ git commit --amend --signoff
 $ git checkout master
 $ git pull --rebase
@@ -2180,7 +2254,9 @@ $ git branch -D HBASE-XXXX
 ====
 
 . Resolve the issue as fixed, thanking the contributor.
-  Always set the "Fix Version" at this point, but please only set a single fix 
version for each branch where the change was committed, the earliest release in 
that branch in which the change will appear.
+  Always set the "Fix Version" at this point, but only set a single fix version
+  for each branch where the change was committed, the earliest release in that
+  branch in which the change will appear.
 
 ====== Commit Message Format
 
@@ -2195,7 +2271,9 @@ The preferred commit message format is:
 HBASE-12345 Fix All The Things ([email protected])
 ----
 
-If the contributor used +git format-patch+ to generate the patch, their commit 
message is in their patch and you can use that, but be sure the JIRA ID is at 
the front of the commit message, even if the contributor left it out.
+If the contributor used +git format-patch+ to generate the patch, their commit
+message is in their patch and you can use that, but be sure the JIRA ID is at
+the front of the commit message, even if the contributor left it out.
 
 [[committer.amending.author]]
 ====== Add Amending-Author when a conflict cherrypick backporting
diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc 
b/src/main/asciidoc/_chapters/ops_mgt.adoc
index 4da4a67..b8a0fd5 100644
--- a/src/main/asciidoc/_chapters/ops_mgt.adoc
+++ b/src/main/asciidoc/_chapters/ops_mgt.adoc
@@ -51,7 +51,8 @@ Options:
 Commands:
 Some commands take arguments. Pass no args or -h for usage.
   shell           Run the HBase shell
-  hbck            Run the hbase 'fsck' tool
+  hbck            Run the HBase 'fsck' tool. Defaults read-only hbck1.
+                  Pass '-j /path/to/HBCK2.jar' to run hbase-2.x HBCK2.
   snapshot        Tool for managing snapshots
   wal             Write-ahead-log analyzer
   hfile           Store file analyzer
@@ -386,12 +387,33 @@ Each command except `RowCounter` and `CellCounter` accept 
a single `--help` argu
 [[hbck]]
 === HBase `hbck`
 
-To run `hbck` against your HBase cluster run `$./bin/hbase hbck`. At the end 
of the command's output it prints `OK` or `INCONSISTENCY`.
-If your cluster reports inconsistencies, pass `-details` to see more detail 
emitted.
-If inconsistencies, run `hbck` a few times because the inconsistency may be 
transient (e.g. cluster is starting up or a region is splitting).
- Passing `-fix` may correct the inconsistency (This is an experimental 
feature).
+The `hbck` tool that shipped with hbase-1.x has been made read-only in 
hbase-2.x. It is not able to repair
+hbase-2.x clusters as hbase internals have changed. Nor should its assessments 
in read-only mode be
+trusted as it does not understand hbase-2.x operation.
 
-For more information, see <<hbck.in.depth>>.
+A new tool, <<HBCK2>>, described in the next section, replaces `hbck`.
+
+[[HBCK2]]
+=== HBase `HBCK2`
+
+`HBCK2` is the successor to <<hbck>>, the hbase-1.x fix tool (a.k.a. `hbck1`). Use it in place of `hbck1`
+when making repairs against hbase-2.x installs.
+
+`HBCK2` does not ship as part of hbase. It can be found as a subproject of the 
companion
+link:https://github.com/apache/hbase-operator-tools[hbase-operator-tools] 
repository at
+link:https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2[Apache
 HBase HBCK2 Tool].
+`HBCK2` was moved out of hbase so it could evolve at a cadence apart from that 
of hbase core.
+
+See the link:https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2[HBCK2 Home Page]
+for how `HBCK2` differs from `hbck1`, and for how to build and use it.
+
+Once built, you can run `HBCK2` as follows:
+
+----
+$ hbase hbck -j /path/to/HBCK2.jar
+----
+
+This will generate `HBCK2` usage describing commands and options.
 
 [[hfile_tool2]]
 === HFile Tool
@@ -509,6 +531,124 @@ By default, CopyTable utility only copies the latest 
version of row cells unless
 See Jonathan Hsieh's 
link:https://blog.cloudera.com/blog/2012/06/online-hbase-backups-with-copytable-2/[Online
           HBase Backups with CopyTable] blog post for more on `CopyTable`.
 
+[[hashtable.synctable]]
+=== HashTable/SyncTable
+
+HashTable/SyncTable is a two-step tool for synchronizing table data, where each step is implemented as a MapReduce job.
+Like CopyTable, it can be used for partial or entire table data syncing, within the same cluster or between remote clusters.
+However, it performs the sync more efficiently than CopyTable. Instead of copying all cells
+in the specified row key/time period range, HashTable (the first step) creates hashed indexes for batches of cells on the source table and outputs those as results.
+In the second stage, SyncTable (run against the target cluster) calculates the same hash indexes for the target table cells,
+compares these hashes with the output of HashTable, and then scans (and compares) only the cells in diverging batches, updating
+just the mismatching cells. This results in less network traffic/data transfer, which matters when syncing large tables on remote clusters.
+
+==== Step 1, HashTable
+
+First, run HashTable on the source table cluster (this is the table whose 
state will be copied to its counterpart).
+
+Usage:
+
+----
+$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.HashTable --help
+Usage: HashTable [options] <tablename> <outputpath>
+
+Options:
+ batchsize     the target amount of bytes to hash in each batch
+               rows are added to the batch until this size is reached
+               (defaults to 8000 bytes)
+ numhashfiles  the number of hash files to create
+               if set to fewer than number of regions then
+               the job will create this number of reducers
+               (defaults to 1/100 of regions -- at least 1)
+ startrow      the start row
+ stoprow       the stop row
+ starttime     beginning of the time range (unixtime in millis)
+               without endtime means from starttime to forever
+ endtime       end of the time range.  Ignored if no starttime specified.
+ scanbatch     scanner batch size to support intra row scans
+ versions      number of cell versions to include
+ families      comma-separated list of families to include
+
+Args:
+ tablename     Name of the table to hash
+ outputpath    Filesystem path to put the output data
+
+Examples:
+ To hash 'TestTable' in 32kB batches for a 1 hour window into 50 files:
+ $ bin/hbase org.apache.hadoop.hbase.mapreduce.HashTable --batchsize=32000 
--numhashfiles=50 --starttime=1265875194289 --endtime=1265878794289 
--families=cf2,cf3 TestTable /hashes/testTable
+----
+
+The *batchsize* property defines how much cell data for a given region will be hashed together into a single hash value.
+Sizing it properly has a direct impact on sync efficiency, as it determines how many scans are executed by the mapper tasks
+of SyncTable (the next step in the process). The rule of thumb is: the fewer cells that are out of sync
+(the lower the probability of finding a diff), the larger the batch size can be.
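To make the batching idea concrete, here is a minimal, self-contained sketch (illustrative Python only; this is not HBase's actual hashing format or API): source cells are grouped into roughly `batchsize`-byte batches, each batch is hashed, and only batches whose digests diverge between source and target need a cell-by-cell rescan.

```python
import hashlib

def digest(cells):
    # Hash a batch of (key, value) cells into a single digest.
    return hashlib.sha1(b";".join(k + b"=" + v for k, v in cells)).hexdigest()

def batch_ranges(cells, batchsize):
    # Split the source cells into batches of roughly `batchsize` bytes,
    # as the HashTable step conceptually does, returning each batch's
    # start key, stop key, and digest.
    ranges, batch, size = [], [], 0
    for k, v in cells:
        batch.append((k, v))
        size += len(k) + len(v)
        if size >= batchsize:
            ranges.append((batch[0][0], batch[-1][0], digest(batch)))
            batch, size = [], 0
    if batch:
        ranges.append((batch[0][0], batch[-1][0], digest(batch)))
    return ranges

source = [(b"r%03d" % i, b"v") for i in range(100)]
target = [(k, b"DIFFERENT" if k == b"r057" else v) for k, v in source]

# SyncTable-style comparison: re-hash the target over the source's batch
# ranges and flag only the diverging batches for a cell-level rescan.
tmap = dict(target)
diverging = [start for start, stop, h in batch_ranges(source, 64)
             if digest([(k, tmap[k]) for k, _ in source
                        if start <= k <= stop]) != h]
```

With one mismatching cell out of 100, only a single batch digest diverges, so only that batch's cells are re-scanned, which is exactly why a mostly-in-sync table tolerates a larger batch size.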
+
+==== Step 2, SyncTable
+
+Once HashTable has completed on the source cluster, SyncTable can be run on the target cluster.
+Just like replication and other synchronization jobs, it requires that all RegionServers/DataNodes
+on the source cluster be accessible by NodeManagers on the target cluster (where the SyncTable job tasks will be running).
+
+Usage:
+
+----
+$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.SyncTable --help
+Usage: SyncTable [options] <sourcehashdir> <sourcetable> <targettable>
+
+Options:
+ sourcezkcluster  ZK cluster key of the source table
+                  (defaults to cluster in classpath's config)
+ targetzkcluster  ZK cluster key of the target table
+                  (defaults to cluster in classpath's config)
+ dryrun           if true, output counters but no writes
+                  (defaults to false)
+ doDeletes        if false, does not perform deletes
+                  (defaults to true)
+ doPuts           if false, does not perform puts
+                  (defaults to true)
+
+Args:
+ sourcehashdir    path to HashTable output dir for source table
+                  (see org.apache.hadoop.hbase.mapreduce.HashTable)
+ sourcetable      Name of the source table to sync from
+ targettable      Name of the target table to sync to
+
+Examples:
+ For a dry run SyncTable of tableA from a remote source cluster
+ to a local target cluster:
+ $ bin/hbase org.apache.hadoop.hbase.mapreduce.SyncTable --dryrun=true 
--sourcezkcluster=zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase 
hdfs://nn:9000/hashes/tableA tableA tableA
+----
+
+The *dryrun* option is useful when a read-only diff report is wanted: it produces only COUNTERS indicating the differences, but does not perform
+any actual changes. It can be used as an alternative to the VerifyReplication tool.
+
+By default, SyncTable will cause the target table to become an exact copy of the source table (at least for the specified startrow/stoprow and/or starttime/endtime range).
+
+Setting doDeletes to false modifies the default behaviour to not delete target cells that are missing on the source.
+Similarly, setting doPuts to false modifies the default behaviour to not add missing cells on the target. Setting both doDeletes
+and doPuts to false gives the same effect as setting dryrun to true.
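The interplay of these flags can be sketched with a toy model (illustrative Python; `sync_plan` is a hypothetical helper for this sketch, not part of HBase):

```python
def sync_plan(source, target, do_puts=True, do_deletes=True):
    # Compute the mutations a SyncTable-style job would apply so that
    # `target` matches `source`, honoring the doPuts/doDeletes flags.
    puts = {k: v for k, v in source.items()
            if target.get(k) != v} if do_puts else {}
    deletes = {k for k in target if k not in source} if do_deletes else set()
    return puts, deletes

source = {"r1": "a", "r2": "b"}
target = {"r2": "stale", "r3": "extra"}

default_plan = sync_plan(source, target)                    # add r1, fix r2, delete r3
two_way_safe = sync_plan(source, target, do_deletes=False)  # r3 is preserved
report_only = sync_plan(source, target, do_puts=False, do_deletes=False)
```

The `two_way_safe` variant mirrors the two-way replication advice below: cells present only on the target survive, at the cost of the target no longer being an exact copy.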
+
+.Set doDeletes to false on Two-Way Replication scenarios
+[NOTE]
+====
+In two-way replication or other scenarios where both source and target clusters can have data ingested, it's advisable to always set the doDeletes option to false,
+as any additional cell inserted on the SyncTable target cluster and not yet replicated to the source would be deleted, and potentially lost permanently.
+====
+
+.Set sourcezkcluster to the actual source cluster ZK quorum
+[NOTE]
+====
+Although not required, if sourcezkcluster is not set, SyncTable will connect to the local HBase cluster for both source and target,
+which does not give a meaningful result.
+====
+
+.Remote Clusters on different Kerberos Realms
+[NOTE]
+====
+Currently, SyncTable can't be run against remote clusters on different Kerberos realms.
+There is work in progress to resolve this; see
+link:https://jira.apache.org/jira/browse/HBASE-20586[HBASE-20586].
+====
+
 [[export]]
 === Export
 
@@ -819,15 +959,85 @@ See 
link:https://issues.apache.org/jira/browse/HBASE-4391[HBASE-4391 Add ability
 [[compaction.tool]]
 === Offline Compaction Tool
 
-See the usage for the
-link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/CompactionTool.html[CompactionTool].
-Run it like:
+*CompactionTool* provides a way of running compactions (either minor or major) as an independent
+process from the RegionServer. It reuses the same internal implementation classes executed by the RegionServer
+compaction feature. However, since it runs in a completely separate, independent Java process, it
+relieves RegionServers of the overhead involved in rewriting a set of HFiles, which can be critical
+for latency-sensitive use cases.
 
-[source, bash]
+Usage:
 ----
 $ ./bin/hbase org.apache.hadoop.hbase.regionserver.CompactionTool
+
+Usage: java org.apache.hadoop.hbase.regionserver.CompactionTool \
+  [-compactOnce] [-major] [-mapred] [-D<property=value>]* files...
+
+Options:
+ mapred         Use MapReduce to run compaction.
+ compactOnce    Execute just one compaction step. (default: while needed)
+ major          Trigger major compaction.
+
+Note: -D properties will be applied to the conf used.
+For example:
+ To stop delete of compacted file, pass -Dhbase.compactiontool.delete=false
+ To set tmp dir, pass -Dhbase.tmp.dir=ALTERNATE_DIR
+
+Examples:
+ To compact the full 'TestTable' using MapReduce:
+ $ hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred 
hdfs://hbase/data/default/TestTable
+
+ To compact column family 'x' of the table 'TestTable' region 'abc':
+ $ hbase org.apache.hadoop.hbase.regionserver.CompactionTool 
hdfs://hbase/data/default/TestTable/abc/x
 ----
 
+As shown by the usage options above, *CompactionTool* can run as a standalone client or as a MapReduce job.
+When running as a MapReduce job, each family dir is handled as an input split and is processed
+by a separate map task.
+
+The *compactOnce* parameter controls how many compaction cycles will be performed until
+the *CompactionTool* program decides to finish its work. If omitted, it will assume it should keep
+running compactions on each specified family as determined by the configured
+compaction policy. For more info on compaction policies, see <<compaction,compaction>>.
+
+If a major compaction is desired, the *major* flag can be specified. If omitted, *CompactionTool* will
+assume a minor compaction is wanted by default.
+
+It also allows for configuration overrides with the `-D` flag. In the usage section above, for example,
+the `-Dhbase.compactiontool.delete=false` option instructs the compaction engine not to delete original
+files from the temp folder.
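The `-D` handling follows the usual Hadoop generic-options pattern; a rough sketch of the idea (illustrative Python; the real tool relies on Hadoop's configuration parsing, not this code):

```python
def apply_overrides(conf, args):
    # Apply Hadoop-style -Dkey=value overrides to a configuration map,
    # returning the non-option arguments (the target dirs) untouched.
    remaining = []
    for arg in args:
        if arg.startswith("-D") and "=" in arg:
            key, value = arg[2:].split("=", 1)
            conf[key] = value
        else:
            remaining.append(arg)
    return remaining

conf = {"hbase.compactiontool.delete": "true"}  # default: clean up originals
dirs = apply_overrides(conf, ["-Dhbase.compactiontool.delete=false",
                              "hdfs://hbase/data/default/TestTable"])
```

After parsing, `conf` carries the override and `dirs` carries only the HDFS paths to compact.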
+
+Files targeted for compaction must be specified as parent HDFS dirs. Multiple dirs
+may be passed, as long as each of these dirs is either a *family*, a *region*, or a *table* dir. If a
+table or region dir is passed, the program will recursively iterate through the related sub-folders,
+effectively running compaction for each family found below the table/region level.
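The traversal can be sketched as follows (illustrative Python over a local filesystem; the real tool walks the HDFS layout, and "a family dir holds only files" is a simplifying assumption here):

```python
import tempfile
from pathlib import Path

def is_family_dir(path: Path) -> bool:
    # Simplification: a family dir contains only store files, no sub-dirs.
    return path.is_dir() and not any(c.is_dir() for c in path.iterdir())

def find_family_dirs(path: Path):
    # Accept a family, region, or table dir and yield every family dir
    # below it, mirroring how CompactionTool expands its input paths.
    if is_family_dir(path):
        yield path
    else:
        for child in sorted(path.iterdir()):
            if child.is_dir():
                yield from find_family_dirs(child)

# Build a toy table layout: TestTable/<region>/<family>/<hfile>
root = Path(tempfile.mkdtemp())
for region in ("region1", "region2"):
    for fam in ("cf1", "cf2"):
        d = root / "TestTable" / region / fam
        d.mkdir(parents=True)
        (d / "hfile0").touch()

families = list(find_family_dirs(root / "TestTable"))
one_region = list(find_family_dirs(root / "TestTable" / "region1"))
```

Passing the table dir yields all four family dirs, while passing a single region dir yields just that region's families.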
+
+Since these dirs are nested under the *hbase* HDFS directory tree, *CompactionTool* requires HBase
+superuser permissions in order to access the required HFiles.
+
+.Running in MapReduce mode
+[NOTE]
+====
+MapReduce mode offers the ability to process each family dir in parallel, as a 
separate map task.
+Generally, it would make sense to run in this mode when specifying one or more 
table dirs as targets
+for compactions. The caveat, though, is that if the number of families to be compacted becomes too large,
+the related MapReduce job may have an indirect impact on *RegionServer* performance.
+Since *NodeManagers* are normally co-located with RegionServers, such large jobs could
+compete for IO/bandwidth resources with the *RegionServers*.
+====
+
+.MajorCompaction completely disabled on RegionServers due to performance impacts
+[NOTE]
+====
+*Major compactions* can be a costly operation (see <<compaction,compaction>>), and can indeed
+impact performance on RegionServers, leading operators to completely disable them for critical
+low-latency applications. *CompactionTool* could be used as an alternative in such scenarios,
+although additional custom application logic would need to be implemented, such as deciding the
+scheduling and selection of tables/regions/families targeted for a given compaction run.
+====
+
+For additional details about CompactionTool, see also
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/CompactionTool.html[CompactionTool].
+
 === `hbase clean`
 
 The `hbase clean` command cleans HBase data from ZooKeeper, HDFS, or both.
@@ -1272,15 +1482,6 @@ Monitor the output of the _/tmp/log.txt_ file to follow 
the progress of the scri
 Use the following guidelines if you want to create your own rolling restart 
script.
 
 . Extract the new release, verify its configuration, and synchronize it to all 
nodes of your cluster using `rsync`, `scp`, or another secure synchronization 
mechanism.
-. Use the hbck utility to ensure that the cluster is consistent.
-+
-----
-
-$ ./bin/hbck
-----
-+
-Perform repairs if required.
-See <<hbck,hbck>> for details.
 
 . Restart the master first.
   You may need to modify these commands if your new HBase directory is 
different from the old one, such as for an upgrade.
@@ -1312,7 +1513,6 @@ $ for i in `cat conf/regionservers|sort`; do 
./bin/graceful_stop.sh --restart --
 ----
 
 . Restart the Master again, to clear out the dead servers list and re-enable 
the load balancer.
-. Run the `hbck` utility again, to be sure the cluster is consistent.
 
 [[adding.new.node]]
 === Adding a New Node
@@ -1612,6 +1812,28 @@ image::bc_l1.png[]
 This is not an exhaustive list of all the screens and reports available.
 Have a look in the Web UI.
 
+=== Snapshot Space Usage Monitoring
+
+Starting with HBase 0.95, Snapshot usage information on individual snapshots 
was shown in the HBase Master Web UI. This was further enhanced starting with 
HBase 1.3 to show the total Storefile size of the Snapshot Set. The following 
metrics are shown in the Master Web UI with HBase 1.3 and later.
+
+* Shared Storefile Size is the Storefile size shared between snapshots and 
active tables.
+* Mob Storefile Size is the Mob Storefile size shared between snapshots and 
active tables.
+* Archived Storefile Size is the Storefile size in Archive.
+
+The format of Archived Storefile Size is NNN(MMM). NNN is the total Storefile 
size in Archive, MMM is the total Storefile size in Archive that is specific to 
the snapshot (not shared with other snapshots and tables).
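For illustration, the figure can be assembled like this (a sketch; the `fmt` helper approximates human-readable size formatting, which the UI's exact rules do not specify here):

```python
def fmt(nbytes):
    # Human-readable size string (assumption: 1024-based units, one decimal).
    for unit in ("B", "K", "M", "G", "T"):
        if nbytes < 1024 or unit == "T":
            return f"{nbytes}B" if unit == "B" else f"{nbytes:.1f}{unit}"
        nbytes /= 1024

def archived_storefile_size(total_bytes, snapshot_specific_bytes):
    # Render the "NNN(MMM)" figure: total archived size, with the
    # snapshot-specific (unshared) portion in parentheses.
    return f"{fmt(total_bytes)}({fmt(snapshot_specific_bytes)})"
```

So 10 GB archived, of which 2 GB belongs only to this snapshot, would render along the lines of `10.0G(2.0G)`.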
+
+.Master Snapshot Overview
+image::master-snapshot.png[]
+
+.Snapshot Storefile Stats Example 1
+image::1-snapshot.png[]
+
+.Snapshot Storefile Stats Example 2
+image::2-snapshots.png[]
+
+.Empty Snapshot Storefile Stats Example
+image::empty-snapshots.png[]
+
 == Cluster Replication
 
 NOTE: This information was previously available at
@@ -1628,6 +1850,9 @@ Some use cases for cluster replication include:
 NOTE: Replication is enabled at the granularity of the column family.
 Before enabling replication for a column family, create the table and all 
column families to be replicated, on the destination cluster.
 
+NOTE: Replication is asynchronous: we ship WALs to the other cluster in the
+background, which means that if you need to recover through replication, you
+could lose some data. To address this problem we have introduced a new feature
+called synchronous replication. As its mechanism is a bit different, we describe
+it in a separate section. Please see
+<<Synchronous Replication,Synchronous Replication>>.
+
 === Replication Overview
 
 Cluster replication uses a source-push methodology.
@@ -2454,9 +2679,12 @@ Since the cluster is up, there is a risk that edits 
could be missed in the expor
 [[ops.snapshots]]
 == HBase Snapshots
 
-HBase Snapshots allow you to take a snapshot of a table without too much 
impact on Region Servers.
-Snapshot, Clone and restore operations don't involve data copying.
-Also, Exporting the snapshot to another cluster doesn't have impact on the 
Region Servers.
+HBase Snapshots allow you to take a copy of a table (both contents and metadata) with a very small performance impact. A Snapshot is an immutable
+collection of table metadata and a list of HFiles that comprised the table at the time the Snapshot was taken. A "clone"
+of a snapshot creates a new table from that snapshot, and a "restore" of a snapshot returns the contents of a table to
+what it was when the snapshot was created. The "clone" and "restore" operations do not require any data to be copied,
+as the underlying HFiles (the files which contain the data for an HBase table) are not modified with either action.
+Similarly, exporting a snapshot to another cluster has little impact on RegionServers of the local cluster.
 
 Prior to version 0.94.6, the only way to back up or clone a table was to use CopyTable/ExportTable, or to copy all the hfiles in HDFS after disabling the table.
 The disadvantages of these methods are that you can degrade region server performance (Copy/Export Table) or you need to disable the table, which means no reads or writes; this is usually unacceptable.
@@ -2719,8 +2947,6 @@ HDFS replication factor only affects your disk usage and 
is invisible to most HB
 You can view the current number of regions for a given table using the HMaster 
UI.
 In the [label]#Tables# section, the number of online regions for each table is 
listed in the [label]#Online Regions# column.
 This total only includes the in-memory state and does not include disabled or 
offline regions.
-If you do not want to use the HMaster UI, you can determine the number of 
regions by counting the number of subdirectories of the /hbase/<table>/ 
subdirectories in HDFS, or by running the `bin/hbase hbck` command.
-Each of these methods may return a slightly different number, depending on the 
status of each region.
 
 [[ops.capacity.regions.count]]
 ==== Number of regions per RS - upper bound
diff --git a/src/main/asciidoc/_chapters/preface.adoc 
b/src/main/asciidoc/_chapters/preface.adoc
index 280f2d8..deebdd3 100644
--- a/src/main/asciidoc/_chapters/preface.adoc
+++ b/src/main/asciidoc/_chapters/preface.adoc
@@ -68,7 +68,7 @@ Yours, the HBase Community.
 
 Please use link:https://issues.apache.org/jira/browse/hbase[JIRA] to report 
non-security-related bugs.
 
-To protect existing HBase installations from new vulnerabilities, please *do 
not* use JIRA to report security-related bugs. Instead, send your report to the 
mailing list [email protected], which allows anyone to send messages, but 
restricts who can read them. Someone on that list will contact you to follow up 
on your report.
+To protect existing HBase installations from new vulnerabilities, please *do 
not* use JIRA to report security-related bugs. Instead, send your report to the 
mailing list [email protected], which allows anyone to send messages, 
but restricts who can read them. Someone on that list will contact you to 
follow up on your report.
 
 [[hbase_supported_tested_definitions]]
 .Support and Testing Expectations
diff --git a/src/main/asciidoc/_chapters/security.adoc 
b/src/main/asciidoc/_chapters/security.adoc
index 1afc131..56f6566 100644
--- a/src/main/asciidoc/_chapters/security.adoc
+++ b/src/main/asciidoc/_chapters/security.adoc
@@ -30,7 +30,7 @@
 [IMPORTANT]
 .Reporting Security Bugs
 ====
-NOTE: To protect existing HBase installations from exploitation, please *do 
not* use JIRA to report security-related bugs. Instead, send your report to the 
mailing list [email protected], which allows anyone to send messages, but 
restricts who can read them. Someone on that list will contact you to follow up 
on your report.
+NOTE: To protect existing HBase installations from exploitation, please *do 
not* use JIRA to report security-related bugs. Instead, send your report to the 
mailing list [email protected], which allows anyone to send messages, 
but restricts who can read them. Someone on that list will contact you to 
follow up on your report.
 
 HBase adheres to the Apache Software Foundation's policy on reported 
vulnerabilities, available at http://apache.org/security/.
 
@@ -1739,7 +1739,7 @@ All options have been discussed separately in the 
sections above.
 <!-- HBase Superuser -->
 <property>
   <name>hbase.superuser</name>
-  <value>hbase, admin</value>
+  <value>hbase,admin</value>
 </property>
 <!-- Coprocessors for ACLs and Visibility Tags -->
 <property>
@@ -1759,8 +1759,7 @@ All options have been discussed separately in the 
sections above.
 </property>
 <property>
   <name>hbase.coprocessor.regionserver.classes</name>
-  <value>org.apache.hadoop/hbase.security.access.AccessController,
-  org.apache.hadoop.hbase.security.access.VisibilityController</value>
+  <value>org.apache.hadoop.hbase.security.access.AccessController</value>
 </property>
 <!-- Executable ACL for Coprocessor Endpoints -->
 <property>
diff --git a/src/main/asciidoc/_chapters/spark.adoc 
b/src/main/asciidoc/_chapters/spark.adoc
new file mode 100644
index 0000000..d5089f2
--- /dev/null
+++ b/src/main/asciidoc/_chapters/spark.adoc
@@ -0,0 +1,699 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+[[spark]]
+= HBase and Spark
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+link:https://spark.apache.org/[Apache Spark] is a software framework that is 
used
+to process data in memory in a distributed manner, and is replacing MapReduce 
in
+many use cases.
+
+Spark itself is out of scope of this document; please refer to the Spark site for
+more information on the Spark project and subprojects. This document will focus
+on 4 main interaction points between Spark and HBase. Those interaction points are:
+
+Basic Spark::
+  The ability to have an HBase Connection at any point in your Spark DAG.
+Spark Streaming::
+  The ability to have an HBase Connection at any point in your Spark Streaming
+  application.
+Spark Bulk Load::
+  The ability to write directly to HBase HFiles for bulk insertion into HBase.
+SparkSQL/DataFrames::
+  The ability to write SparkSQL that draws on tables that are represented in 
HBase.
+
+The following sections will walk through examples of all these interaction 
points.
+
+== Basic Spark
+
+This section discusses Spark HBase integration at the lowest and simplest 
levels.
+All the other interaction points are built upon the concepts that will be 
described
+here.
+
+At the root of all Spark and HBase integration is the HBaseContext. The 
HBaseContext
+takes in HBase configurations and pushes them to the Spark executors. This 
allows
+us to have an HBase Connection per Spark Executor in a static location.
+
+For reference, Spark Executors can be on the same nodes as the Region Servers or
+on different nodes; there is no dependence on co-location. Think of every Spark
+Executor as a multi-threaded client application. This allows any Spark Tasks
+running on the executors to access the shared Connection object.
+
+.HBaseContext Usage Example
+====
+
+This example shows how HBaseContext can be used to do a `foreachPartition` on 
a RDD
+in Scala:
+
+[source, scala]
+----
+val sc = new SparkContext("local", "test")
+val config = new HBaseConfiguration()
+
+...
+
+val hbaseContext = new HBaseContext(sc, config)
+
+rdd.hbaseForeachPartition(hbaseContext, (it, conn) => {
+ val bufferedMutator = conn.getBufferedMutator(TableName.valueOf("t1"))
+ it.foreach((putRecord) => {
+   val put = new Put(putRecord._1)
+   putRecord._2.foreach((putValue) => put.addColumn(putValue._1, putValue._2, putValue._3))
+   bufferedMutator.mutate(put)
+ })
+ bufferedMutator.flush()
+ bufferedMutator.close()
+})
+----
+
+Here is the same example implemented in Java:
+
+[source, java]
+----
+JavaSparkContext jsc = new JavaSparkContext(sparkConf);
+
+try {
+  List<byte[]> list = new ArrayList<>();
+  list.add(Bytes.toBytes("1"));
+  ...
+  list.add(Bytes.toBytes("5"));
+
+  JavaRDD<byte[]> rdd = jsc.parallelize(list);
+  Configuration conf = HBaseConfiguration.create();
+
+  JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf);
+
+  hbaseContext.foreachPartition(rdd,
+      new VoidFunction<Tuple2<Iterator<byte[]>, Connection>>() {
+   public void call(Tuple2<Iterator<byte[]>, Connection> t)
+        throws Exception {
+    Table table = t._2().getTable(TableName.valueOf(tableName));
+    BufferedMutator mutator = 
t._2().getBufferedMutator(TableName.valueOf(tableName));
+    while (t._1().hasNext()) {
+      byte[] b = t._1().next();
+      Result r = table.get(new Get(b));
+      if (r.getExists()) {
+       mutator.mutate(new Put(b));
+      }
+    }
+
+    mutator.flush();
+    mutator.close();
+    table.close();
+   }
+  });
+} finally {
+  jsc.stop();
+}
+----
+====
+
+All functionality between Spark and HBase will be supported both in Scala and in
+Java, with the exception of SparkSQL which will support any language that is
+supported by Spark. For the remainder of this documentation we will focus on
+Scala examples.
+
+The examples above illustrate how to do a foreachPartition with a connection. A
+number of other Spark base functions are supported out of the box:
+
+// tag::spark_base_functions[]
+`bulkPut`:: For massively parallel sending of puts to HBase
+`bulkDelete`:: For massively parallel sending of deletes to HBase
+`bulkGet`:: For massively parallel sending of gets to HBase to create a new RDD
+`mapPartition`:: To do a Spark Map function with a Connection object to allow 
full
+access to HBase
+`hBaseRDD`:: To simplify a distributed scan to create a RDD
+// end::spark_base_functions[]
+
+For examples of all these functionalities, see the
+link:https://github.com/apache/hbase-connectors/tree/master/spark[hbase-spark 
integration]
+in the link:https://github.com/apache/hbase-connectors[hbase-connectors] 
repository
+(the hbase-spark connectors live outside the HBase core in a related
+repository maintained by the Apache HBase project).
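
The base functions above all follow the same shape: distribute records, apply a per-record conversion, and talk to HBase. As a Spark-free illustration of the `bulkGet` pattern (row keys in, results out), here is a minimal sketch; `BulkGetSketch`, `FAKE_TABLE`, and every other name in it are invented for illustration and are not part of the hbase-spark API.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Spark-free model of the bulkGet pattern: map a collection of row keys
// through a lookup, producing a new collection of results.
// All names here are hypothetical stand-ins, not hbase-spark classes.
public class BulkGetSketch {
  // Stand-in for an HBase table: row key -> value.
  static final Map<String, String> FAKE_TABLE = Map.of("row1", "a", "row3", "c");

  // For each key perform a "get" and wrap the (possibly missing) result,
  // analogous to bulkGet's record-to-Get and result-to-output functions.
  static List<Optional<String>> bulkGet(List<String> keys) {
    return keys.stream()
        .map(k -> Optional.ofNullable(FAKE_TABLE.get(k)))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(bulkGet(List.of("row1", "row2")));
  }
}
```

In the real API the lookup runs in parallel against HBase on the executors; the shape of the inputs and outputs is the same.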
+
+== Spark Streaming
+https://spark.apache.org/streaming/[Spark Streaming] is a micro batching stream
+processing framework built on top of Spark. HBase and Spark Streaming make 
great
+companions in that HBase can help serve the following benefits alongside Spark
+Streaming.
+
+* A place to grab reference data or profile data on the fly
+* A place to store counts or aggregates in a way that supports Spark 
Streaming's
+promise of _only once processing_.
+
+The 
link:https://github.com/apache/hbase-connectors/tree/master/spark[hbase-spark 
integration]
+with Spark Streaming is similar to its normal Spark integration points, in 
that the following
+commands are possible straight off a Spark Streaming DStream.
+
+include::spark.adoc[tags=spark_base_functions]
+
+.`bulkPut` Example with DStreams
+====
+
+Below is an example of bulkPut with DStreams. It is very close in feel to the 
RDD
+bulk put.
+
+[source, scala]
+----
+val sc = new SparkContext("local", "test")
+val config = new HBaseConfiguration()
+
+val hbaseContext = new HBaseContext(sc, config)
+val ssc = new StreamingContext(sc, Milliseconds(200))
+
+val rdd1 = ...
+val rdd2 = ...
+
+val queue = mutable.Queue[RDD[(Array[Byte], Array[(Array[Byte],
+    Array[Byte], Array[Byte])])]]()
+
+queue += rdd1
+queue += rdd2
+
+val dStream = ssc.queueStream(queue)
+
+dStream.hbaseBulkPut(
+  hbaseContext,
+  TableName.valueOf(tableName),
+  (putRecord) => {
+   val put = new Put(putRecord._1)
+   putRecord._2.foreach((putValue) => put.addColumn(putValue._1, putValue._2, 
putValue._3))
+   put
+  })
+----
+
+There are three inputs to the `hbaseBulkPut` function:
+the hbaseContext, which carries the broadcast configuration information that links
+to the HBase Connections in the executors; the name of the table we are
+putting data into; and a function that will convert a record in the DStream
+into an HBase Put object.
+
+== Bulk Load
+
+There are two options for bulk loading data into HBase with Spark. There is the
+basic bulk load functionality that will work for cases where your rows have
+millions of columns and cases where your columns are not consolidated and
+partitioned before the map side of the Spark bulk load process.
+
+There is also a thin record bulk load option with Spark. This second option is
+designed for tables that have fewer than 10k columns per row. The advantage
+of this second option is higher throughput and less overall load on the Spark
+shuffle operation.
+
+Both implementations work more or less like the MapReduce bulk load process in
+that a partitioner partitions the rowkeys based on region splits and
+the row keys are sent to the reducers in order, so that HFiles can be written
+out directly from the reduce phase.
+
+In Spark terms, the bulk load will be implemented around a Spark
+`repartitionAndSortWithinPartitions` followed by a Spark `foreachPartition`.
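
The partition-and-sort contract above can be sketched without Spark: cells must reach the HFile writer ordered by row key, then column family, then qualifier. The `ShuffleOrderSketch` class below is a hypothetical stand-in for the real `KeyFamilyQualifier` comparison, not hbase-spark code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Spark-free sketch of the ordering the bulk-load shuffle must produce.
// Key is an illustrative stand-in for KeyFamilyQualifier.
public class ShuffleOrderSketch {
  static final class Key {
    final String rowKey, family, qualifier;
    Key(String rowKey, String family, String qualifier) {
      this.rowKey = rowKey; this.family = family; this.qualifier = qualifier;
    }
    @Override public String toString() { return rowKey + "/" + family + ":" + qualifier; }
  }

  // Compare by row key first, then column family, then qualifier,
  // mirroring the order in which cells must be written into an HFile.
  static final Comparator<Key> HFILE_ORDER =
      Comparator.comparing((Key k) -> k.rowKey)
                .thenComparing(k -> k.family)
                .thenComparing(k -> k.qualifier);

  static List<Key> sortForHFile(List<Key> keys) {
    List<Key> copy = new ArrayList<>(keys);
    copy.sort(HFILE_ORDER);
    return copy;
  }

  public static void main(String[] args) {
    List<Key> keys = List.of(
        new Key("row2", "cf1", "a"),
        new Key("row1", "cf2", "b"),
        new Key("row1", "cf1", "z"),
        new Key("row1", "cf1", "a"));
    sortForHFile(keys).forEach(System.out::println);
  }
}
```

In Spark, the partitioner additionally routes each row key to the partition of the region that owns it; within each partition this is the order the rows arrive in.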
+
+First let's look at an example of using the basic bulk load functionality.
+
+.Bulk Loading Example
+====
+
+The following example shows bulk loading with Spark.
+
+[source, scala]
+----
+val sc = new SparkContext("local", "test")
+val config = new HBaseConfiguration()
+
+val hbaseContext = new HBaseContext(sc, config)
+
+val stagingFolder = ...
+val rdd = sc.parallelize(Array(
+      (Bytes.toBytes("1"),
+        (Bytes.toBytes(columnFamily1), Bytes.toBytes("a"), 
Bytes.toBytes("foo1"))),
+      (Bytes.toBytes("3"),
+        (Bytes.toBytes(columnFamily1), Bytes.toBytes("b"), 
Bytes.toBytes("foo2.b"))), ...
+
+rdd.hbaseBulkLoad(TableName.valueOf(tableName),
+  t => {
+   val rowKey = t._1
+   val family:Array[Byte] = t._2(0)._1
+   val qualifier = t._2(0)._2
+   val value = t._2(0)._3
+
+   val keyFamilyQualifier= new KeyFamilyQualifier(rowKey, family, qualifier)
+
+   Seq((keyFamilyQualifier, value)).iterator
+  },
+  stagingFolder.getPath)
+
+val load = new LoadIncrementalHFiles(config)
+load.doBulkLoad(new Path(stagingFolder.getPath),
+  conn.getAdmin, table, conn.getRegionLocator(TableName.valueOf(tableName)))
+----
+====
+
+The `hbaseBulkLoad` function takes three required parameters:
+
+. The name of the table we intend to bulk load to
+
+. A function that will convert a record in the RDD to a tuple key-value pair, with
+the tuple key being a KeyFamilyQualifier object and the value being the cell value.
+The KeyFamilyQualifier object will hold the RowKey, Column Family, and Column Qualifier.
+The shuffle will partition on the RowKey but will sort by all three values.
+
+. The temporary path for the HFile to be written out to
+
+Following the Spark bulk load command, use HBase's LoadIncrementalHFiles object
+to load the newly created HFiles into HBase.
+
+.Additional Parameters for Bulk Loading with Spark
+
+You can set the following attributes with additional parameter options on 
hbaseBulkLoad.
+
+* Max file size of the HFiles
+* A flag to exclude HFiles from compactions
+* Column Family settings for compression, bloomType, blockSize, and 
dataBlockEncoding
+
+.Using Additional Parameters
+====
+
+[source, scala]
+----
+val sc = new SparkContext("local", "test")
+val config = new HBaseConfiguration()
+
+val hbaseContext = new HBaseContext(sc, config)
+
+val stagingFolder = ...
+val rdd = sc.parallelize(Array(
+      (Bytes.toBytes("1"),
+        (Bytes.toBytes(columnFamily1), Bytes.toBytes("a"), 
Bytes.toBytes("foo1"))),
+      (Bytes.toBytes("3"),
+        (Bytes.toBytes(columnFamily1), Bytes.toBytes("b"), 
Bytes.toBytes("foo2.b"))), ...
+
+val familyHBaseWriterOptions = new java.util.HashMap[Array[Byte], 
FamilyHFileWriteOptions]
+val f1Options = new FamilyHFileWriteOptions("GZ", "ROW", 128, "PREFIX")
+
+familyHBaseWriterOptions.put(Bytes.toBytes("columnFamily1"), f1Options)
+
+rdd.hbaseBulkLoad(TableName.valueOf(tableName),
+  t => {
+   val rowKey = t._1
+   val family:Array[Byte] = t._2(0)._1
+   val qualifier = t._2(0)._2
+   val value = t._2(0)._3
+
+   val keyFamilyQualifier= new KeyFamilyQualifier(rowKey, family, qualifier)
+
+   Seq((keyFamilyQualifier, value)).iterator
+  },
+  stagingFolder.getPath,
+  familyHBaseWriterOptions,
+  compactionExclude = false,
+  HConstants.DEFAULT_MAX_FILE_SIZE)
+
+val load = new LoadIncrementalHFiles(config)
+load.doBulkLoad(new Path(stagingFolder.getPath),
+  conn.getAdmin, table, conn.getRegionLocator(TableName.valueOf(tableName)))
+----
+====
+
+Now let's look at how you would call the thin record bulk load implementation.
+
+.Using thin record bulk load
+====
+
+[source, scala]
+----
+val sc = new SparkContext("local", "test")
+val config = new HBaseConfiguration()
+
+val hbaseContext = new HBaseContext(sc, config)
+
+val stagingFolder = ...
+val rdd = sc.parallelize(Array(
+      ("1",
+        (Bytes.toBytes(columnFamily1), Bytes.toBytes("a"), 
Bytes.toBytes("foo1"))),
+      ("3",
+        (Bytes.toBytes(columnFamily1), Bytes.toBytes("b"), 
Bytes.toBytes("foo2.b"))), ...
+
+rdd.hbaseBulkLoadThinRows(hbaseContext,
+      TableName.valueOf(tableName),
+      t => {
+        val rowKey = t._1
+
+        val familyQualifiersValues = new FamiliesQualifiersValues
+        t._2.foreach(f => {
+          val family:Array[Byte] = f._1
+          val qualifier = f._2
+          val value:Array[Byte] = f._3
+
+          familyQualifiersValues +=(family, qualifier, value)
+        })
+        (new ByteArrayWrapper(Bytes.toBytes(rowKey)), familyQualifiersValues)
+      },
+      stagingFolder.getPath,
+      new java.util.HashMap[Array[Byte], FamilyHFileWriteOptions],
+      compactionExclude = false,
+      20)
+
+val load = new LoadIncrementalHFiles(config)
+load.doBulkLoad(new Path(stagingFolder.getPath),
+  conn.getAdmin, table, conn.getRegionLocator(TableName.valueOf(tableName)))
+----
+====
+
+Note that the big difference in using bulk load for thin rows is that the function
+returns a tuple with the first value being the row key and the second value
+being an object of FamiliesQualifiersValues, which will contain all the
+values for this row for all column families.
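
As a rough, Spark-free model of what such a per-row object holds: all cells of one row, grouped family -> qualifier -> value. The `ThinRowSketch` and `RowCells` names below are invented for illustration; they are not the hbase-spark `FamiliesQualifiersValues` class itself.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of what a thin-row record carries: one row key plus all
// of that row's cells, grouped family -> qualifier -> value.
public class ThinRowSketch {
  static final class RowCells {
    final String rowKey;
    final Map<String, Map<String, String>> cells = new HashMap<>();
    RowCells(String rowKey) { this.rowKey = rowKey; }

    // Fold one (family, qualifier, value) triple into the grouped shape,
    // as the thin-row conversion function does for each cell of a row.
    void add(String family, String qualifier, String value) {
      cells.computeIfAbsent(family, f -> new HashMap<>()).put(qualifier, value);
    }
  }

  static RowCells group(String rowKey, List<String[]> triples) {
    RowCells row = new RowCells(rowKey);
    for (String[] t : triples) {
      row.add(t[0], t[1], t[2]);
    }
    return row;
  }

  public static void main(String[] args) {
    RowCells row = group("row1", List.of(
        new String[] {"cf1", "a", "1"},
        new String[] {"cf1", "b", "2"},
        new String[] {"cf2", "a", "3"}));
    System.out.println(row.cells);
  }
}
```

Because the whole row travels as one record, the shuffle moves one object per row instead of one object per cell, which is where the reduced shuffle load comes from.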
+
+== SparkSQL/DataFrames
+
+The 
link:https://github.com/apache/hbase-connectors/tree/master/spark[hbase-spark 
integration]
+leverages
+link:https://databricks.com/blog/2015/01/09/spark-sql-data-sources-api-unified-data-access-for-the-spark-platform.html[DataSource
 API]
+(link:https://issues.apache.org/jira/browse/SPARK-3247[SPARK-3247])
+introduced in Spark-1.2.0, which bridges the gap between the simple HBase KV store and complex
+relational SQL queries and enables users to perform complex data analytical work
+on top of HBase using Spark. An HBase DataFrame is a standard Spark DataFrame, and is able to
+interact with any other data sources such as Hive, Orc, Parquet, JSON, etc.
+The 
link:https://github.com/apache/hbase-connectors/tree/master/spark[hbase-spark 
integration]
+applies critical techniques such as partition pruning, column pruning,
+predicate pushdown and data locality.
+
+To use the
+link:https://github.com/apache/hbase-connectors/tree/master/spark[hbase-spark 
integration]
+connector, users need to define the Catalog for the schema mapping
+between HBase and Spark tables, prepare the data and populate the HBase table,
+then load the HBase DataFrame. After that, users can do integrated query and 
access records
+in HBase tables with SQL query. The following illustrates the basic procedure.
+
+=== Define catalog
+
+[source, scala]
+----
+def catalog = s"""{
+       |"table":{"namespace":"default", "name":"table1"},
+       |"rowkey":"key",
+       |"columns":{
+         |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
+         |"col1":{"cf":"cf1", "col":"col1", "type":"boolean"},
+         |"col2":{"cf":"cf2", "col":"col2", "type":"double"},
+         |"col3":{"cf":"cf3", "col":"col3", "type":"float"},
+         |"col4":{"cf":"cf4", "col":"col4", "type":"int"},
+         |"col5":{"cf":"cf5", "col":"col5", "type":"bigint"},
+         |"col6":{"cf":"cf6", "col":"col6", "type":"smallint"},
+         |"col7":{"cf":"cf7", "col":"col7", "type":"string"},
+         |"col8":{"cf":"cf8", "col":"col8", "type":"tinyint"}
+       |}
+     |}""".stripMargin
+----
+
+Catalog defines a mapping between HBase and Spark tables. There are two critical parts of this catalog.
+One is the rowkey definition and the other is the mapping between a table column in Spark and
+the column family and column qualifier in HBase. The above defines a schema for an HBase table
+named table1, with the row key key and a number of columns (col1 `-` col8). Note that the rowkey
+also has to be defined in detail as a column (col0), which has a specific cf (rowkey).
+
+=== Save the DataFrame
+
+[source, scala]
+----
+case class HBaseRecord(
+   col0: String,
+   col1: Boolean,
+   col2: Double,
+   col3: Float,
+   col4: Int,       
+   col5: Long,
+   col6: Short,
+   col7: String,
+   col8: Byte)
+
+object HBaseRecord
+{
+   def apply(i: Int, t: String): HBaseRecord = {
+      val s = s"""row${"%03d".format(i)}"""       
+      HBaseRecord(s,
+      i % 2 == 0,
+      i.toDouble,
+      i.toFloat,  
+      i,
+      i.toLong,
+      i.toShort,  
+      s"String$i: $t",      
+      i.toByte)
+  }
+}
+
+val data = (0 to 255).map { i =>  HBaseRecord(i, "extra")}
+
+sc.parallelize(data).toDF.write.options(
+ Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> 
"5"))
+ .format("org.apache.hadoop.hbase.spark")
+ .save()
+
+----
+`data` prepared by the user is a local Scala collection which has 256 
HBaseRecord objects.
+`sc.parallelize(data)` function distributes `data` to form an RDD. `toDF` 
returns a DataFrame.
+`write` function returns a DataFrameWriter used to write the DataFrame to 
external storage
+systems (e.g. HBase here). Given a DataFrame with the specified schema `catalog`, the `save` function
+will create an HBase table with 5 regions and save the DataFrame inside.
+
+=== Load the DataFrame
+
+[source, scala]
+----
+def withCatalog(cat: String): DataFrame = {
+  sqlContext
+  .read
+  .options(Map(HBaseTableCatalog.tableCatalog->cat))
+  .format("org.apache.hadoop.hbase.spark")
+  .load()
+}
+val df = withCatalog(catalog)
+----
+In the `withCatalog` function, sqlContext is a variable of SQLContext, which is the entry point
+for working with structured data (rows and columns) in Spark.
+`read` returns a DataFrameReader that can be used to read data in as a DataFrame.
+The `option` function adds input options for the underlying data source to the DataFrameReader,
+and the `format` function specifies the input data source format for the DataFrameReader.
+The `load()` function loads input in as a DataFrame. The data frame `df` returned
+by the `withCatalog` function can be used to access the HBase table, as shown in the
+Language Integrated Query and SQL Query sections below.
+
+=== Language Integrated Query
+
+[source, scala]
+----
+val s = df.filter(($"col0" <= "row050" && $"col0" > "row040") ||
+  $"col0" === "row005" ||
+  $"col0" <= "row005")
+  .select("col0", "col1", "col4")
+s.show
+----
+DataFrame can do various operations, such as join, sort, select, filter, 
orderBy and so on.
+`df.filter` above filters rows using the given SQL expression. `select` 
selects a set of columns:
+`col0`, `col1` and `col4`.
+
+=== SQL Query
+
+[source, scala]
+----
+df.registerTempTable("table1")
+sqlContext.sql("select count(col1) from table1").show
+----
+
+`registerTempTable` registers `df` DataFrame as a temporary table using the 
table name `table1`.
+The lifetime of this temporary table is tied to the SQLContext that was used 
to create `df`.
+`sqlContext.sql` function allows the user to execute SQL queries.
+
+=== Others
+
+.Query with different timestamps
+====
+In HBaseSparkConf, four parameters related to timestamp can be set: TIMESTAMP,
+MIN_TIMESTAMP, MAX_TIMESTAMP and MAX_VERSIONS. Users can query records with
+a specific timestamp, or with a time range via MIN_TIMESTAMP and MAX_TIMESTAMP.
+Meanwhile, use concrete values instead of tsSpecified and oldMs in the examples below.
+
+The example below shows how to load df DataFrame with different timestamps.
+tsSpecified is specified by the user.
+HBaseTableCatalog defines the HBase and Relation relation schema.
+writeCatalog defines catalog for the schema mapping.
+
+[source, scala]
+----
+val df = sqlContext.read
+      .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, 
HBaseSparkConf.TIMESTAMP -> tsSpecified.toString))
+      .format("org.apache.hadoop.hbase.spark")
+      .load()
+----
+
+The example below shows how to load df DataFrame with different time ranges.
+oldMs is specified by the user.
+
+[source, scala]
+----
+val df = sqlContext.read
+      .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, 
HBaseSparkConf.MIN_TIMESTAMP -> "0",
+        HBaseSparkConf.MAX_TIMESTAMP -> oldMs.toString))
+      .format("org.apache.hadoop.hbase.spark")
+      .load()
+----
+After loading df DataFrame, users can query data.
+
+[source, scala]
+----
+df.registerTempTable("table")
+sqlContext.sql("select count(col1) from table").show
+----
+====
+
+.Native Avro support
+====
+The 
link:https://github.com/apache/hbase-connectors/tree/master/spark[hbase-spark 
integration]
+connector supports different data formats like Avro, JSON, etc. The use case 
below
+shows how Spark supports Avro. Users can persist Avro records into HBase directly. Internally,
+the Avro schema is converted to a native Spark Catalyst data type 
automatically.
+Note that both key-value parts in an HBase table can be defined in Avro format.
+
+1) Define catalog for the schema mapping:
+
+[source, scala]
+----
+def catalog = s"""{
+                     |"table":{"namespace":"default", "name":"Avrotable"},
+                      |"rowkey":"key",
+                      |"columns":{
+                      |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
+                      |"col1":{"cf":"cf1", "col":"col1", "type":"binary"}
+                      |}
+                      |}""".stripMargin
+----
+
+`catalog` is a schema for an HBase table named `Avrotable`, with the row key as key and
+one column col1. The rowkey also has to be defined in detail as a column (col0),
+which has a specific cf (rowkey).
+
+2) Prepare the Data:
+
+[source, scala]
+----
+ object AvroHBaseRecord {
+   val schemaString =
+     s"""{"namespace": "example.avro",
+         |   "type": "record",      "name": "User",
+         |    "fields": [
+         |        {"name": "name", "type": "string"},
+         |        {"name": "favorite_number",  "type": ["int", "null"]},
+         |        {"name": "favorite_color", "type": ["string", "null"]},
+         |        {"name": "favorite_array", "type": {"type": "array", 
"items": "string"}},
+         |        {"name": "favorite_map", "type": {"type": "map", "values": 
"int"}}
+         |      ]    }""".stripMargin
+
+   val avroSchema: Schema = {
+     val p = new Schema.Parser
+     p.parse(schemaString)
+   }
+
+   def apply(i: Int): AvroHBaseRecord = {
+     val user = new GenericData.Record(avroSchema);
+     user.put("name", s"name${"%03d".format(i)}")
+     user.put("favorite_number", i)
+     user.put("favorite_color", s"color${"%03d".format(i)}")
+     val favoriteArray = new GenericData.Array[String](2, 
avroSchema.getField("favorite_array").schema())
+     favoriteArray.add(s"number${i}")
+     favoriteArray.add(s"number${i+1}")
+     user.put("favorite_array", favoriteArray)
+     import collection.JavaConverters._
+     val favoriteMap = Map[String, Int](("key1" -> i), ("key2" -> 
(i+1))).asJava
+     user.put("favorite_map", favoriteMap)
+     val avroByte = AvroSedes.serialize(user, avroSchema)
+     AvroHBaseRecord(s"name${"%03d".format(i)}", avroByte)
+   }
+ }
+
+ val data = (0 to 255).map { i =>
+    AvroHBaseRecord(i)
+ }
+----
+
+`schemaString` is defined first, then it is parsed to get `avroSchema`. 
`avroSchema` is used to
+generate `AvroHBaseRecord`. `data` prepared by users is a local Scala 
collection
+which has 256 `AvroHBaseRecord` objects.
+
+3) Save DataFrame:
+
+[source, scala]
+----
+ sc.parallelize(data).toDF.write.options(
+     Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable 
-> "5"))
+     .format("org.apache.spark.sql.execution.datasources.hbase")
+     .save()
+----
+
+Given a data frame with specified schema `catalog`, above will create an HBase 
table with 5
+regions and save the data frame inside.
+
+4) Load the DataFrame
+
+[source, scala]
+----
+def avroCatalog = s"""{
+            |"table":{"namespace":"default", "name":"avrotable"},
+            |"rowkey":"key",
+            |"columns":{
+              |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
+              |"col1":{"cf":"cf1", "col":"col1", "avro":"avroSchema"}
+            |}
+          |}""".stripMargin
+
+ def withCatalog(cat: String): DataFrame = {
+     sqlContext
+         .read
+         .options(Map("avroSchema" -> AvroHBaseRecord.schemaString, 
HBaseTableCatalog.tableCatalog -> avroCatalog))
+         .format("org.apache.spark.sql.execution.datasources.hbase")
+         .load()
+ }
+ val df = withCatalog(catalog)
+----
+
+In `withCatalog` function, `read` returns a DataFrameReader that can be used 
to read data in as a DataFrame.
+The `option` function adds input options for the underlying data source to the 
DataFrameReader.
+There are two options: one is to set `avroSchema` as `AvroHBaseRecord.schemaString`, and the other is to
+set `HBaseTableCatalog.tableCatalog` as `avroCatalog`. The `load()` function loads input in as a DataFrame.
+The data frame `df` returned by the `withCatalog` function can be used to access the HBase table.
+
+5) SQL Query
+
+[source, scala]
+----
+ df.registerTempTable("avrotable")
+ val c = sqlContext.sql("select count(1) from avrotable")
+----
+
+After loading the `df` DataFrame, users can query data. `registerTempTable` registers the `df` DataFrame
+as a temporary table using the table name `avrotable`. The `sqlContext.sql` function allows the
+user to execute SQL queries.
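+
+Note that `sqlContext.sql` returns a DataFrame, so the query above only executes when an action is called (a sketch, assuming the `c` defined above):
+
+[source, scala]
+----
+c.show()  // runs the count query and prints the result
+----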
+====
diff --git a/src/main/asciidoc/_chapters/sync_replication.adoc b/src/main/asciidoc/_chapters/sync_replication.adoc
new file mode 100644
index 0000000..d28b9a9
--- /dev/null
+++ b/src/main/asciidoc/_chapters/sync_replication.adoc
@@ -0,0 +1,125 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+[[syncreplication]]
+= Synchronous Replication
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+:source-language: java
+
+== Background
+
+The current <<Cluster Replication, replication>> in HBase is asynchronous. So if the master cluster crashes, the slave cluster may not have the
+newest data. If users want strong consistency then they cannot switch to the slave cluster.
+
+== Design
+
+Please see the design doc on link:https://issues.apache.org/jira/browse/HBASE-19064[HBASE-19064]
+
+== Operation and maintenance
+
+Case.1 Setup two synchronous replication clusters::
+
+* Add a synchronous peer in both source cluster and peer cluster.
+
+For source cluster:
+[source,ruby]
+----
+hbase> add_peer '1', CLUSTER_KEY => 'lg-hadoop-tst-st01.bj:10010,lg-hadoop-tst-st02.bj:10010,lg-hadoop-tst-st03.bj:10010:/hbase/test-hbase-slave', REMOTE_WAL_DIR => 'hdfs://lg-hadoop-tst-st01.bj:20100/hbase/test-hbase-slave/remoteWALs', TABLE_CFS => {"ycsb-test" => []}
+----
+
+For peer cluster:
+[source,ruby]
+----
+hbase> add_peer '1', CLUSTER_KEY => 'lg-hadoop-tst-st01.bj:10010,lg-hadoop-tst-st02.bj:10010,lg-hadoop-tst-st03.bj:10010:/hbase/test-hbase', REMOTE_WAL_DIR => 'hdfs://lg-hadoop-tst-st01.bj:20100/hbase/test-hbase/remoteWALs', TABLE_CFS => {"ycsb-test" => []}
+----
+
+NOTE: For synchronous replication, the current implementation requires that we have the same peer id for both the source
+and peer cluster. Another thing that needs attention: the peer does not support cluster-level, namespace-level, or
+cf-level replication; only table-level replication is supported now.
+
+* Transit the peer cluster to be STANDBY state
+
+[source,ruby]
+----
+hbase> transit_peer_sync_replication_state '1', 'STANDBY'
+----
+
+* Transit the source cluster to be ACTIVE state
+
+[source,ruby]
+----
+hbase> transit_peer_sync_replication_state '1', 'ACTIVE'
+----
+
+Now, synchronous replication has been set up successfully. The HBase client can only send requests to the source cluster; if
+requests are sent to the peer cluster, the peer cluster, which is now in STANDBY state, will reject the read/write requests.
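+
+To verify the setup, you can list the replication peers on each cluster (`list_peers` is a standard HBase shell command; the exact columns shown vary by version):
+
+[source,ruby]
+----
+hbase> list_peers
+----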
+
+Case.2 How to operate when the standby cluster crashes::
+
+If the standby cluster has crashed, writing the remote WAL will fail for the active cluster. So we need to transit
+the source cluster to DOWNGRADE_ACTIVE state, which means the source cluster won't write any remote WAL any more, but
+normal (asynchronous) replication can still work fine: it queues the newly written WALs, and
+replication blocks until the peer cluster comes back.
+
+[source,ruby]
+----
+hbase> transit_peer_sync_replication_state '1', 'DOWNGRADE_ACTIVE'
+----
+
+Once the peer cluster comes back, we can just transit the source cluster to ACTIVE, to ensure that replication will be
+synchronous.
+
+[source,ruby]
+----
+hbase> transit_peer_sync_replication_state '1', 'ACTIVE'
+----
+
+Case.3 How to operate when the active cluster crashes::
+
+If the active cluster has crashed (it may be unreachable now), transit the standby cluster to
+DOWNGRADE_ACTIVE state, and after that, redirect all the client requests to the DOWNGRADE_ACTIVE cluster.
+
+[source,ruby]
+----
+hbase> transit_peer_sync_replication_state '1', 'DOWNGRADE_ACTIVE'
+----
+
+If the crashed cluster comes back again, we just need to transit it to STANDBY directly. Otherwise, if you transit the
+cluster to DOWNGRADE_ACTIVE, the original ACTIVE cluster may have redundant data compared to the current ACTIVE
+cluster. Because we write the source cluster WALs and remote cluster WALs concurrently, it is possible that
+the source cluster WALs have more data than the remote cluster, which would result in data inconsistency. The procedure of
+transiting ACTIVE to STANDBY has no such problem, because we skip replaying the original WALs.
+
+[source,ruby]
+----
+hbase> transit_peer_sync_replication_state '1', 'STANDBY'
+----
+
+After that, we can promote the DOWNGRADE_ACTIVE cluster to ACTIVE, to ensure that replication will be synchronous.
+
+[source,ruby]
+----
+hbase> transit_peer_sync_replication_state '1', 'ACTIVE'
+----
diff --git a/src/main/asciidoc/_chapters/troubleshooting.adoc b/src/main/asciidoc/_chapters/troubleshooting.adoc
index 6795b8a..9fc7c35 100644
--- a/src/main/asciidoc/_chapters/troubleshooting.adoc
+++ b/src/main/asciidoc/_chapters/troubleshooting.adoc
@@ -868,9 +868,9 @@ Snapshots::
  When you create a snapshot, HBase retains everything it needs to recreate the table's
  state at that time of the snapshot. This includes deleted cells or expired versions.
  For this reason, your snapshot usage pattern should be well-planned, and you should
-  prune snapshots that you no longer need. Snapshots are stored in `/hbase/.snapshots`,
+  prune snapshots that you no longer need. Snapshots are stored in `/hbase/.hbase-snapshot`,
   and archives needed to restore snapshots are stored in
-  `/hbase/.archive/<_tablename_>/<_region_>/<_column_family_>/`.
+  `/hbase/archive/<_tablename_>/<_region_>/<_column_family_>/`.
 
  *Do not* manage snapshots or archives manually via HDFS. HBase provides APIs and
  HBase Shell commands for managing them. For more information, see <<ops.snapshots>>.
diff --git a/src/main/asciidoc/_chapters/upgrading.adoc b/src/main/asciidoc/_chapters/upgrading.adoc
index 6dc788a..2a33e42 100644
--- a/src/main/asciidoc/_chapters/upgrading.adoc
+++ b/src/main/asciidoc/_chapters/upgrading.adoc
@@ -314,6 +314,20 @@ Quitting...
 
 == Upgrade Paths
 
+[[upgrade 2.2]]
+=== Upgrade from 2.0 or 2.1 to 2.2+
+
+HBase 2.2+ uses a new Procedure form for assigning/unassigning/moving Regions. It does not process HBase 2.1 and 2.0's Unassign/Assign Procedure types. Upgrade requires that we first drain the Master Procedure Store of old style Procedures before starting the new 2.2 Master. So you need to make sure that before you kill the old version (2.0 or 2.1) Master, there is no region in transition. And once the new version (2.2+) Master is up, you can perform a rolling upgrade of the RegionServers one by one.
+
+There is a safer way if you are running a 2.1.1+ or 2.0.3+ cluster. It needs four steps to upgrade the Master.
+
+. Shut down both the active and standby Masters (your cluster will continue to serve reads and writes without interruption).
+. Set the property `hbase.procedure.upgrade-to-2-2` to `true` in hbase-site.xml for the Master, and start only one Master, still using the 2.1.1+ (or 2.0.3+) version.
+. Wait until the Master quits. Confirm that there is a 'READY TO ROLLING UPGRADE' message in the Master log as the cause of the shutdown. The Procedure Store is now empty.
+. Start new Masters with the new 2.2+ version.
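+
+For step 2, the hbase-site.xml fragment might look like the following (a sketch; the property name is taken from the step above):
+
+[source,xml]
+----
+<property>
+  <name>hbase.procedure.upgrade-to-2-2</name>
+  <value>true</value>
+</property>
+----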
+
+Then you can perform a rolling upgrade of the RegionServers one by one. See link:https://issues.apache.org/jira/browse/HBASE-21075[HBASE-21075] for more details.
+
 [[upgrade2.0]]
 === Upgrading from 1.x to 2.x
 
@@ -331,7 +345,10 @@ As noted in the section <<basic.prerequisites>>, HBase 2.0+ requires a minimum o
 .HBCK must match HBase server version
 You *must not* use an HBase 1.x version of HBCK against an HBase 2.0+ cluster. HBCK is strongly tied to the HBase server version. Using the HBCK tool from an earlier release against an HBase 2.0+ cluster will destructively alter said cluster in unrecoverable ways.
 
-As of HBase 2.0, HBCK is a read-only tool that can report the status of some non-public system internals. You should not rely on the format nor content of these internals to remain consistent across HBase releases.
+As of HBase 2.0, HBCK (A.K.A _HBCK1_ or _hbck1_) is a read-only tool that can report the status of some non-public system internals. You should not rely on the format nor content of these internals to remain consistent across HBase releases.
+
+To read about HBCK's replacement, see <<HBCK2>> in <<ops_mgt>>.
+
 
 ////
 Link to a ref guide section on HBCK in 2.0 that explains use and calls out the inability of clients and server sides to detect version of each other.
@@ -611,6 +628,19 @@ Performance is also an area that is now under active review so look forward to
 improvement in coming releases (See
 link:https://issues.apache.org/jira/browse/HBASE-20188[HBASE-20188 TESTING Performance]).
 
+[[upgrade2.0.it.kerberos]]
+.Integration Tests and Kerberos
+Integration Tests (`IntegrationTests*`) used to rely on the Kerberos credential cache
+for authentication against secured clusters. This used to lead to tests failing due
+to authentication failures when the tickets in the credential cache expired.
+As of hbase-2.0.0 (and hbase-1.3.0+), the integration test clients will make use
+of the configuration properties `hbase.client.keytab.file` and
+`hbase.client.kerberos.principal`. They are required. The clients will perform a
+login from the configured keytab file and automatically refresh the credentials
+in the background for the process lifetime (See
+link:https://issues.apache.org/jira/browse/HBASE-16231[HBASE-16231]).
+
+
 ////
 This would be a good place to link to an appendix on migrating applications
 ////
@@ -731,6 +761,11 @@ Notes:
 
 Doing a raw scan will now return results that have expired according to TTL settings.
 
+[[upgrade1.3]]
+=== Upgrading from pre-1.3 to 1.3+
+If running Integration Tests under Kerberos, see <<upgrade2.0.it.kerberos>>.
+
+
 [[upgrade1.0]]
 === Upgrading to 1.x
 
diff --git a/src/main/asciidoc/book.adoc b/src/main/asciidoc/book.adoc
index 2b01749..6e9e19d 100644
--- a/src/main/asciidoc/book.adoc
+++ b/src/main/asciidoc/book.adoc
@@ -65,9 +65,11 @@ include::_chapters/security.adoc[]
 include::_chapters/architecture.adoc[]
 include::_chapters/hbase_mob.adoc[]
 include::_chapters/inmemory_compaction.adoc[]
+include::_chapters/sync_replication.adoc[]
 include::_chapters/hbase_apis.adoc[]
 include::_chapters/external_apis.adoc[]
 include::_chapters/thrift_filter_language.adoc[]
+include::_chapters/spark.adoc[]
 include::_chapters/cp.adoc[]
 include::_chapters/performance.adoc[]
 include::_chapters/troubleshooting.adoc[]
@@ -85,7 +87,6 @@ include::_chapters/community.adoc[]
 
 include::_chapters/appendix_contributing_to_documentation.adoc[]
 include::_chapters/faq.adoc[]
-include::_chapters/hbck_in_depth.adoc[]
 include::_chapters/appendix_acl_matrix.adoc[]
 include::_chapters/compression.adoc[]
 include::_chapters/sql.adoc[]
