Repository: hbase
Updated Branches:
  refs/heads/branch-1.2 b8a7edab8 -> d1ac42b22


HBASE-21493 Update ref guide for branch-1.2 related changes

* HBASE-21460 correct Document Configurable Bucket Sizes in bucketCache
* HBASE-15557 Add guidance on HashTable/SyncTable to the RefGuide
* HBASE-20753 fix the email address for security related issues in docs
* HBASE-20977 Avoid using the word "snapshot" when defining HBase Snapshots
* HBASE-20731 fix incorrect snapshot folders path in documentation


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/d1ac42b2
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/d1ac42b2
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/d1ac42b2

Branch: refs/heads/branch-1.2
Commit: d1ac42b22cb5b6b9e7ffedf4dea6763394396304
Parents: b8a7eda
Author: Sean Busbey <bus...@apache.org>
Authored: Sat Nov 17 00:18:37 2018 -0600
Committer: Sean Busbey <bus...@apache.org>
Committed: Sat Nov 17 00:18:37 2018 -0600

----------------------------------------------------------------------
 src/main/asciidoc/_chapters/architecture.adoc   |   4 +-
 src/main/asciidoc/_chapters/ops_mgt.adoc        | 127 ++++++++++++++++++-
 src/main/asciidoc/_chapters/preface.adoc        |   2 +-
 src/main/asciidoc/_chapters/security.adoc       |   2 +-
 .../asciidoc/_chapters/troubleshooting.adoc     |   4 +-
 5 files changed, 130 insertions(+), 9 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/d1ac42b2/src/main/asciidoc/_chapters/architecture.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/architecture.adoc 
b/src/main/asciidoc/_chapters/architecture.adoc
index 6ab5f48..3b91869 100644
--- a/src/main/asciidoc/_chapters/architecture.adoc
+++ b/src/main/asciidoc/_chapters/architecture.adoc
@@ -816,14 +816,14 @@ In the above, we set the BucketCache to be 4G.
 We configured the on-heap LruBlockCache to have 20% (0.2) of the RegionServer's heap size (0.2 * 5G = 1G). In other words, you configure the L1 LruBlockCache as you would normally (as if there were no L2 cache present).
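 
 For reference, a minimal sketch of such a combined L1/L2 configuration in hbase-site.xml (these are the standard HBase 1.x property names; the values are illustrative assumptions, not tuned recommendations):
 
 [source,xml]
 ----
 <!-- Back the L2 BucketCache with off-heap memory. -->
 <property>
   <name>hbase.bucketcache.ioengine</name>
   <value>offheap</value>
 </property>
 <!-- BucketCache capacity in megabytes: 4096 MB = the 4G used above. -->
 <property>
   <name>hbase.bucketcache.size</name>
   <value>4096</value>
 </property>
 <!-- L1 LruBlockCache as a fraction of the RegionServer heap. -->
 <property>
   <name>hfile.block.cache.size</name>
   <value>0.2</value>
 </property>
 ----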
 
 link:https://issues.apache.org/jira/browse/HBASE-10641[HBASE-10641] introduced 
the ability to configure multiple sizes for the buckets of the BucketCache, in 
HBase 0.98 and newer.
-To configurable multiple bucket sizes, configure the new property 
`hfile.block.cache.sizes` (instead of `hfile.block.cache.size`) to a 
comma-separated list of block sizes, ordered from smallest to largest, with no 
spaces.
+To configure multiple bucket sizes, configure the new property `hbase.bucketcache.bucket.sizes` to a comma-separated list of block sizes, ordered from smallest to largest, with no spaces.
 The goal is to optimize the bucket sizes based on your data access patterns.
 The following example configures buckets of size 4096 and 8192.
 
 [source,xml]
 ----
 <property>
-  <name>hfile.block.cache.sizes</name>
+  <name>hbase.bucketcache.bucket.sizes</name>
   <value>4096,8192</value>
 </property>
 ----

http://git-wip-us.apache.org/repos/asf/hbase/blob/d1ac42b2/src/main/asciidoc/_chapters/ops_mgt.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc 
b/src/main/asciidoc/_chapters/ops_mgt.adoc
index 7398a5e..85a3f7d 100644
--- a/src/main/asciidoc/_chapters/ops_mgt.adoc
+++ b/src/main/asciidoc/_chapters/ops_mgt.adoc
@@ -440,6 +440,124 @@ By default, CopyTable utility only copies the latest 
version of row cells unless
 See Jonathan Hsieh's 
link:http://www.cloudera.com/blog/2012/06/online-hbase-backups-with-copytable-2/[Online
           HBase Backups with CopyTable] blog post for more on `CopyTable`.
 
+[[hashtable.synctable]]
+=== HashTable/SyncTable
+
+HashTable/SyncTable is a two-step tool for synchronizing table data, with each step implemented as a MapReduce job.
+Like CopyTable, it can be used for partial or entire table data syncing, within the same cluster or between remote clusters.
+However, it performs the sync in a more efficient way than CopyTable. Instead of copying all cells
+in the specified row key/time period range, HashTable (the first step) creates hashed indexes for batches of cells on the source table and outputs those as results.
+In the next stage, SyncTable scans the target table and calculates hash indexes for its cells,
+compares these hashes with the outputs of HashTable, and then scans (and compares) only the cells in ranges whose hashes diverge, updating
+just the mismatching cells. This results in less network traffic/data transfer, which can be significant when syncing large tables on remote clusters.
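+
+At a high level, the flow looks like the sketch below (full options for each step are described in the following sections; the ZooKeeper quorum and paths are illustrative):
+
+----
+# 1. On the source cluster: hash the source table into an HDFS output dir
+$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.HashTable TestTable /hashes/testTable
+# 2. On the target cluster: compare against those hashes and update mismatches
+$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.SyncTable --sourcezkcluster=zk1.example.com:2181:/hbase hdfs://nn:9000/hashes/testTable TestTable TestTable
+----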
+
+==== Step 1, HashTable
+
+First, run HashTable on the cluster hosting the source table (the table whose state will be copied to its counterpart).
+
+Usage:
+
+----
+$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.HashTable --help
+Usage: HashTable [options] <tablename> <outputpath>
+
+Options:
+ batchsize     the target amount of bytes to hash in each batch
+               rows are added to the batch until this size is reached
+               (defaults to 8000 bytes)
+ numhashfiles  the number of hash files to create
+               if set to fewer than number of regions then
+               the job will create this number of reducers
+               (defaults to 1/100 of regions -- at least 1)
+ startrow      the start row
+ stoprow       the stop row
+ starttime     beginning of the time range (unixtime in millis)
+               without endtime means from starttime to forever
+ endtime       end of the time range.  Ignored if no starttime specified.
+ scanbatch     scanner batch size to support intra row scans
+ versions      number of cell versions to include
+ families      comma-separated list of families to include
+
+Args:
+ tablename     Name of the table to hash
+ outputpath    Filesystem path to put the output data
+
+Examples:
+ To hash 'TestTable' in 32kB batches for a 1 hour window into 50 files:
+ $ bin/hbase org.apache.hadoop.hbase.mapreduce.HashTable --batchsize=32000 
--numhashfiles=50 --starttime=1265875194289 --endtime=1265878794289 
--families=cf2,cf3 TestTable /hashes/testTable
+----
+
+The *batchsize* property defines how much cell data for a given region will be hashed together into a single hash value.
+Sizing it properly has a direct impact on sync efficiency, as it may lead to fewer scans executed by the mapper tasks
+of SyncTable (the next step in the process). The rule of thumb is: the fewer cells that are out of sync
+(the lower the probability of finding a diff), the larger the batch size value can be.
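+
+For instance, when the tables are expected to be nearly identical, a larger batch (here 128kB, an illustrative value) keeps the hash output small, reusing the table and path from the example above:
+
+----
+$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.HashTable --batchsize=131072 TestTable /hashes/testTable
+----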
+
+==== Step 2, SyncTable
+
+Once HashTable has completed on the source cluster, SyncTable can be run on the target cluster.
+Just like replication and other synchronization jobs, it requires that all RegionServers/DataNodes
+on the source cluster be accessible by NodeManagers on the target cluster (where the SyncTable job tasks will be running).
+
+Usage:
+
+----
+$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.SyncTable --help
+Usage: SyncTable [options] <sourcehashdir> <sourcetable> <targettable>
+
+Options:
+ sourcezkcluster  ZK cluster key of the source table
+                  (defaults to cluster in classpath's config)
+ targetzkcluster  ZK cluster key of the target table
+                  (defaults to cluster in classpath's config)
+ dryrun           if true, output counters but no writes
+                  (defaults to false)
+ doDeletes        if false, does not perform deletes
+                  (defaults to true)
+ doPuts           if false, does not perform puts
+                  (defaults to true)
+
+Args:
+ sourcehashdir    path to HashTable output dir for source table
+                  (see org.apache.hadoop.hbase.mapreduce.HashTable)
+ sourcetable      Name of the source table to sync from
+ targettable      Name of the target table to sync to
+
+Examples:
+ For a dry run SyncTable of tableA from a remote source cluster
+ to a local target cluster:
+ $ bin/hbase org.apache.hadoop.hbase.mapreduce.SyncTable --dryrun=true 
--sourcezkcluster=zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase 
hdfs://nn:9000/hashes/tableA tableA tableA
+----
+
+The *dryrun* option is useful when a read-only diff report is wanted: it only produces counters indicating the differences and does not perform
+any actual changes. It can be used as an alternative to the VerifyReplication tool.
+
+By default, SyncTable will cause the target table to become an exact copy of the source table (at least within the specified startrow/stoprow and/or starttime/endtime range).
+
+Setting doDeletes to false modifies the default behaviour so that target cells missing on the source are not deleted.
+Similarly, setting doPuts to false modifies the default behaviour so that cells missing on the target are not added. Setting both doDeletes
+and doPuts to false gives the same effect as setting dryrun to true.
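+
+For example, to propagate only missing cells to the target without removing anything from it (a sketch reusing the hash dir and table names from the earlier example):
+
+----
+$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.SyncTable --doDeletes=false --sourcezkcluster=zk1.example.com:2181:/hbase hdfs://nn:9000/hashes/tableA tableA tableA
+----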
+
+.Set doDeletes to false on Two-Way Replication scenarios
+[NOTE]
+====
+In two-way replication, or other scenarios where both the source and target clusters can have data ingested, it's advisable to always set the doDeletes option to false,
+as any additional cell inserted on the SyncTable target cluster and not yet replicated to the source would be deleted, and potentially lost permanently.
+====
+
+.Set sourcezkcluster to the actual source cluster ZK quorum
+[NOTE]
+====
+Although not required, if sourcezkcluster is not set, SyncTable will connect to the local HBase cluster for both source and target,
+which does not give any meaningful result.
+====
+
+.Remote Clusters on different Kerberos Realms
+[NOTE]
+====
+Currently, SyncTable can't be run for remote clusters on different Kerberos realms.
+There's some work in progress to resolve this in link:https://jira.apache.org/jira/browse/HBASE-20586[HBASE-20586].
+====
+
 [[export]]
 === Export
 
@@ -1944,9 +2062,12 @@ Since the cluster is up, there is a risk that edits 
could be missed in the expor
 [[ops.snapshots]]
 == HBase Snapshots
 
-HBase Snapshots allow you to take a snapshot of a table without too much 
impact on Region Servers.
-Snapshot, Clone and restore operations don't involve data copying.
-Also, Exporting the snapshot to another cluster doesn't have impact on the 
Region Servers.
+HBase Snapshots allow you to take a copy of a table (both contents and metadata) with a very small performance impact. A Snapshot is an immutable
+collection of table metadata and a list of HFiles that comprised the table at the time the Snapshot was taken. A "clone"
+of a snapshot creates a new table from that snapshot, and a "restore" of a snapshot returns the contents of a table to
+what it was when the snapshot was created. The "clone" and "restore" operations do not require any data to be copied,
+as the underlying HFiles (the files which contain the data for an HBase table) are not modified by either action.
+Similarly, exporting a snapshot to another cluster has little impact on RegionServers of the local cluster.
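+
+In the HBase Shell, the basic snapshot operations look like the following sketch (table and snapshot names are illustrative; restore_snapshot requires the table to be disabled first):
+
+----
+hbase> snapshot 'myTable', 'myTableSnapshot-20181117'
+hbase> clone_snapshot 'myTableSnapshot-20181117', 'myNewTable'
+hbase> disable 'myTable'
+hbase> restore_snapshot 'myTableSnapshot-20181117'
+hbase> enable 'myTable'
+----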
 
 Prior to version 0.94.6, the only way to back up or clone a table was to use CopyTable/ExportTable, or to copy all the hfiles in HDFS after disabling the table.
 The disadvantages of these methods are that you can degrade region server performance (Copy/Export Table) or you need to disable the table, which means no reads or writes; this is usually unacceptable.

http://git-wip-us.apache.org/repos/asf/hbase/blob/d1ac42b2/src/main/asciidoc/_chapters/preface.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/preface.adoc 
b/src/main/asciidoc/_chapters/preface.adoc
index 7d244bd..52ea568 100644
--- a/src/main/asciidoc/_chapters/preface.adoc
+++ b/src/main/asciidoc/_chapters/preface.adoc
@@ -68,7 +68,7 @@ Yours, the HBase Community.
 
 Please use link:https://issues.apache.org/jira/browse/hbase[JIRA] to report 
non-security-related bugs.
 
-To protect existing HBase installations from new vulnerabilities, please *do 
not* use JIRA to report security-related bugs. Instead, send your report to the 
mailing list priv...@apache.org, which allows anyone to send messages, but 
restricts who can read them. Someone on that list will contact you to follow up 
on your report.
+To protect existing HBase installations from new vulnerabilities, please *do 
not* use JIRA to report security-related bugs. Instead, send your report to the 
mailing list priv...@hbase.apache.org, which allows anyone to send messages, 
but restricts who can read them. Someone on that list will contact you to 
follow up on your report.
 
 [[hbase_supported_tested_definitions]]
 .Support and Testing Expectations

http://git-wip-us.apache.org/repos/asf/hbase/blob/d1ac42b2/src/main/asciidoc/_chapters/security.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/security.adoc 
b/src/main/asciidoc/_chapters/security.adoc
index 0d1407a..8eec9d2 100644
--- a/src/main/asciidoc/_chapters/security.adoc
+++ b/src/main/asciidoc/_chapters/security.adoc
@@ -30,7 +30,7 @@
 [IMPORTANT]
 .Reporting Security Bugs
 ====
-NOTE: To protect existing HBase installations from exploitation, please *do 
not* use JIRA to report security-related bugs. Instead, send your report to the 
mailing list priv...@apache.org, which allows anyone to send messages, but 
restricts who can read them. Someone on that list will contact you to follow up 
on your report.
+NOTE: To protect existing HBase installations from exploitation, please *do 
not* use JIRA to report security-related bugs. Instead, send your report to the 
mailing list priv...@hbase.apache.org, which allows anyone to send messages, 
but restricts who can read them. Someone on that list will contact you to 
follow up on your report.
 
 HBase adheres to the Apache Software Foundation's policy on reported 
vulnerabilities, available at http://apache.org/security/.
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/d1ac42b2/src/main/asciidoc/_chapters/troubleshooting.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/troubleshooting.adoc 
b/src/main/asciidoc/_chapters/troubleshooting.adoc
index fc9aadb..556fc3f 100644
--- a/src/main/asciidoc/_chapters/troubleshooting.adoc
+++ b/src/main/asciidoc/_chapters/troubleshooting.adoc
@@ -851,9 +851,9 @@ Snapshots::
   When you create a snapshot, HBase retains everything it needs to recreate 
the table's
   state at that time of the snapshot. This includes deleted cells or expired 
versions.
   For this reason, your snapshot usage pattern should be well-planned, and you 
should
-  prune snapshots that you no longer need. Snapshots are stored in 
`/hbase/.snapshots`,
+  prune snapshots that you no longer need. Snapshots are stored in 
`/hbase/.hbase-snapshot`,
   and archives needed to restore snapshots are stored in
-  `/hbase/.archive/<_tablename_>/<_region_>/<_column_family_>/`.
+  `/hbase/archive/<_tablename_>/<_region_>/<_column_family_>/`.
 
   *Do not* manage snapshots or archives manually via HDFS. HBase provides APIs 
and
   HBase Shell commands for managing them. For more information, see 
<<ops.snapshots>>.
