Repository: hbase
Updated Branches:
  refs/heads/master a050e1d9f -> 92f5595e7


HBASE-15646 Add some docs about exporting and importing snapshots using S3


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/92f5595e
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/92f5595e
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/92f5595e

Branch: refs/heads/master
Commit: 92f5595e7edd805f092f5e18352a012207d64fe2
Parents: a050e1d
Author: Misty Stanley-Jones <[email protected]>
Authored: Wed Apr 13 12:14:29 2016 -0700
Committer: Misty Stanley-Jones <[email protected]>
Committed: Thu May 19 13:01:05 2016 -0700

----------------------------------------------------------------------
 src/main/asciidoc/_chapters/configuration.adoc | 31 ++++++++++
 src/main/asciidoc/_chapters/ops_mgt.adoc       | 68 +++++++++++++++++++++
 2 files changed, 99 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/92f5595e/src/main/asciidoc/_chapters/configuration.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/configuration.adoc 
b/src/main/asciidoc/_chapters/configuration.adoc
index d705db9..4702bcb 100644
--- a/src/main/asciidoc/_chapters/configuration.adoc
+++ b/src/main/asciidoc/_chapters/configuration.adoc
@@ -1111,6 +1111,37 @@ Only a subset of all configurations can currently be changed in the running serv
 Here is an incomplete list: `hbase.regionserver.thread.compaction.large`, `hbase.regionserver.thread.compaction.small`, `hbase.regionserver.thread.split`, `hbase.regionserver.thread.merge`, as well as compaction policy and configurations and adjustment to offpeak hours.
 For the full list consult the patch attached to link:https://issues.apache.org/jira/browse/HBASE-12147[HBASE-12147 Porting Online Config Change from 89-fb].
 
+[[amazon_s3_configuration]]
+== Using Amazon S3 Storage
+
+HBase is designed to be tightly coupled with HDFS, and testing of other filesystems
+has not been thorough.
+
+The following limitations have been reported:
+
+- RegionServers should be deployed in Amazon EC2 to mitigate latency and bandwidth limitations when accessing the filesystem, and RegionServers must remain available to preserve data locality.
+- S3 writes each inbound and outbound file to disk, which adds overhead to each operation.
+- The best performance is achieved when all clients and servers are in the Amazon cloud, rather than in a heterogeneous architecture.
+- You must be aware of the location of `hadoop.tmp.dir` so that the local `/tmp/` directory is not filled to capacity.
+- HBase has a different file usage pattern than MapReduce jobs and has been optimized for HDFS, rather than distant networked storage.
+- The `s3a://` protocol is strongly recommended. The `s3n://` and `s3://` protocols have serious limitations and do not use the Amazon AWS SDK. The `s3a://` protocol is supported for use with HBase if you use Hadoop 2.6.1 or higher with HBase 1.2 or higher. Hadoop 2.6.0 is not supported with HBase at all.
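Before an `s3a://` URI will resolve, the S3A filesystem and credentials are typically declared in Hadoop's `core-site.xml`. The sketch below is illustrative only: the property names are standard Hadoop S3A settings, but the values are placeholders, not recommendations.

```xml
<!-- Sketch of common core-site.xml entries for s3a:// access.
     Values are placeholders; supply your own credentials. -->
<configuration>
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_AWS_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_AWS_SECRET_KEY</value>
  </property>
</configuration>
```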
+
+Configuration details for Amazon S3 and associated Amazon services such as EMR are
+out of the scope of the HBase documentation. See the
+link:https://wiki.apache.org/hadoop/AmazonS3[Hadoop Wiki entry on Amazon S3 Storage]
+and
+link:http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hbase.html[Amazon's documentation for deploying HBase in EMR].
+
+One use case that is well-suited for Amazon S3 is storing snapshots. See <<snapshots_s3>>.
+
 ifdef::backend-docbook[]
 [index]
 == Index

http://git-wip-us.apache.org/repos/asf/hbase/blob/92f5595e/src/main/asciidoc/_chapters/ops_mgt.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc 
b/src/main/asciidoc/_chapters/ops_mgt.adoc
index 583a872..bc75951 100644
--- a/src/main/asciidoc/_chapters/ops_mgt.adoc
+++ b/src/main/asciidoc/_chapters/ops_mgt.adoc
@@ -2050,6 +2050,74 @@ The following example limits the above example to 200 MB/sec.
 $ bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:8082/hbase -mappers 16 -bandwidth 200
 ----
 
+[[snapshots_s3]]
+=== Storing Snapshots in an Amazon S3 Bucket
+
+For general information and limitations of using Amazon S3 storage with HBase, see
+<<amazon_s3_configuration>>. You can store and retrieve snapshots from Amazon
+S3 using the following procedure.
+
+NOTE: You can also store snapshots in Microsoft Azure Blob Storage. See <<snapshots_azure>>.
+
+.Prerequisites
+- You must be using HBase 1.0 or higher and Hadoop 2.6.1 or higher, which is the first
+configuration that uses the Amazon AWS SDK.
+- You must use the `s3a://` protocol to connect to Amazon S3. The older `s3n://`
+and `s3://` protocols have various limitations and do not use the Amazon AWS SDK.
+- The `s3a://` URI must be configured and available on the server where you run
+the commands to export and restore the snapshot.
+
+After you have fulfilled the prerequisites, take the snapshot as you normally would.
+Afterward, you can export it using the `org.apache.hadoop.hbase.snapshot.ExportSnapshot`
+command like the one below, substituting your own `s3a://` path in the `copy-from`
+or `copy-to` directive and substituting or modifying other options as required:
+
+----
+$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
+    -snapshot MySnapshot \
+    -copy-from hdfs://srv2:8082/hbase \
+    -copy-to s3a://<bucket>/<namespace>/hbase \
+    -chuser MyUser \
+    -chgroup MyGroup \
+    -chmod 700 \
+    -mappers 16
+----
+
+----
+$ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
+    -snapshot MySnapshot \
+    -copy-from s3a://<bucket>/<namespace>/hbase \
+    -copy-to hdfs://srv2:8082/hbase \
+    -chuser MyUser \
+    -chgroup MyGroup \
+    -chmod 700 \
+    -mappers 16
+----
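Since the `<bucket>` and `<namespace>` placeholders must be filled in before either command will run, a small shell sketch can build the destination URI first. The bucket and namespace values below are hypothetical, and the command is echoed rather than executed so the sketch can be tried on a machine without an HBase installation:

```shell
# Hypothetical stand-ins for the <bucket> and <namespace> placeholders above.
BUCKET=my-hbase-bucket
NAMESPACE=backups
DEST="s3a://${BUCKET}/${NAMESPACE}/hbase"

# Echo the export command instead of running it; on a real cluster, drop the
# echo to invoke ExportSnapshot with the substituted s3a:// destination.
echo hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot MySnapshot \
    -copy-from hdfs://srv2:8082/hbase \
    -copy-to "$DEST" \
    -mappers 16
```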
+
+You can also use the `org.apache.hadoop.hbase.snapshot.SnapshotInfo` utility
+with the `s3a://` path by including the `-remote-dir` option.
+
+----
+$ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo \
+    -remote-dir s3a://<bucket>/<namespace>/hbase \
+    -list-snapshots
+----
+
+[[snapshots_azure]]
+=== Storing Snapshots in Microsoft Azure Blob Storage
+
+You can store snapshots in Microsoft Azure Blob Storage using the same techniques
+as in <<snapshots_s3>>.
+
+.Prerequisites
+- You must be using HBase 1.2 or higher with Hadoop 2.7.1 or
+  higher. No version of HBase supports Hadoop 2.7.0.
+- Your hosts must be configured to be aware of the Azure blob storage filesystem.
+  See http://hadoop.apache.org/docs/r2.7.1/hadoop-azure/index.html.
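Per the hadoop-azure documentation linked above, the storage account key is typically supplied in `core-site.xml` before `wasb://` URIs will resolve. This is a sketch: "youraccount" is a placeholder storage account name and the value is a placeholder key.

```xml
<!-- Sketch of the hadoop-azure account-key setting; "youraccount" is a
     placeholder Azure storage account name. -->
<configuration>
  <property>
    <name>fs.azure.account.key.youraccount.blob.core.windows.net</name>
    <value>YOUR_STORAGE_ACCOUNT_KEY</value>
  </property>
</configuration>
```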
+
+After you meet the prerequisites, follow the instructions
+in <<snapshots_s3>>, replacing the protocol specifier with `wasb://` or `wasbs://`.
+
 [[ops.capacity]]
 == Capacity Planning and Region Sizing
 
