Repository: falcon
Updated Branches:
  refs/heads/master 7b78c39eb -> ed410e841
FALCON-1908 Document HDFS snapshot based mirroring extension

Author: bvellanki <[email protected]>

Reviewers: "Ying Zheng <[email protected]>"

Closes #139 from bvellanki/FALCON-1908

Project: http://git-wip-us.apache.org/repos/asf/falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/falcon/commit/ed410e84
Tree: http://git-wip-us.apache.org/repos/asf/falcon/tree/ed410e84
Diff: http://git-wip-us.apache.org/repos/asf/falcon/diff/ed410e84

Branch: refs/heads/master
Commit: ed410e841b8465af45dbef236e83db5618508816
Parents: 7b78c39
Author: bvellanki <[email protected]>
Authored: Fri May 13 09:09:41 2016 -0700
Committer: bvellanki <[email protected]>
Committed: Fri May 13 09:09:41 2016 -0700

----------------------------------------------------------------------
 docs/src/site/twiki/Extensions.twiki            |  1 +
 docs/src/site/twiki/HdfsSnapshotMirroring.twiki | 93 ++++++++++++++++++++
 2 files changed, 94 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/falcon/blob/ed410e84/docs/src/site/twiki/Extensions.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Extensions.twiki b/docs/src/site/twiki/Extensions.twiki
index 8e74321..cf88c87 100644
--- a/docs/src/site/twiki/Extensions.twiki
+++ b/docs/src/site/twiki/Extensions.twiki
@@ -43,6 +43,7 @@ Sample extensions are published in addons/extensions
 ---++ Types of extensions
    * [[HDFSMirroring][HDFS mirroring extension]]
    * [[HiveMirroring][Hive mirroring extension]]
+   * [[HdfsSnapshotMirroring][HDFS snapshot based mirroring]]
 
 ---++ Packaging and installation

http://git-wip-us.apache.org/repos/asf/falcon/blob/ed410e84/docs/src/site/twiki/HdfsSnapshotMirroring.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/HdfsSnapshotMirroring.twiki b/docs/src/site/twiki/HdfsSnapshotMirroring.twiki
new file mode 100644
index 0000000..ec4f16c
--- /dev/null
+++ b/docs/src/site/twiki/HdfsSnapshotMirroring.twiki
@@ -0,0 +1,93 @@
+---+HDFS Snapshot based Mirroring
+
+---++Overview
+HDFS snapshots are cheap to create (the cost is O(1), excluding inode lookup time). Once a snapshot exists, it is
+efficient to find modifications relative to that snapshot and copy only those modifications for disaster recovery (DR).
+This makes for cost effective HDFS mirroring.
+
+---++Prerequisites
+The following are prerequisites for using HDFS snapshot based mirroring.
+
+   * Hadoop version 2.7.0 or higher.
+   * The user submitting and scheduling the Falcon snapshot based mirroring job must have permission to create and manage snapshots on both the source and target directories.
+
+---++ Use Case
+Create and manage snapshots on source/target directories. Mirror data from source to target for disaster
+recovery using these snapshots. Perform retention on the snapshots created on source and target.
+
+
+---++ Usage
+
+---+++ Setup
+   * Submit the source cluster and target cluster entities to Falcon.
+   <verbatim>
+    $FALCON_HOME/bin/falcon entity -submit -type cluster -file source-cluster-definition.xml
+    $FALCON_HOME/bin/falcon entity -submit -type cluster -file target-cluster-definition.xml
+   </verbatim>
+   * Ensure that the source directory on the source cluster and the target directory on the target cluster exist.
+   * Ensure that these directories are snapshottable by the user submitting the extension. You can find more [[https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html][information on snapshots here]].
+
+---+++ HDFS Snapshot based mirroring extension properties
+  Extension artifacts are expected to be installed on HDFS at the path specified by "extension.store.uri" in the startup properties.
+  The hdfs-snapshot-mirroring-properties.json file, located at "<extension.store.uri>/hdfs-snapshot-mirroring/META/hdfs-snapshot-mirroring-properties.json",
+  lists all the required and optional parameters/arguments for scheduling the mirroring job.
+
+  Here is a sample set of properties:
+  <verbatim>
+    ## Job Properties
+    jobName=hdfs-snapshot-test
+    jobClusterName=backupCluster
+    jobValidityStart=2016-01-01T00:00Z
+    jobValidityEnd=2016-04-01T00:00Z
+    jobFrequency=hours(12)
+    jobTimezone=UTC
+    [email protected]
+    jobRetryPolicy=periodic
+    jobRetryDelay=minutes(30)
+    jobRetryAttempts=3
+
+    ## Job owner
+    jobAclOwner=ambari-qa
+    jobAclGroup=users
+    jobAclPermission=*
+
+    ## Source information
+    sourceCluster=primaryCluster
+    sourceSnapshotDir=/apps/falcon/snapshots/source/
+    sourceSnapshotRetentionPolicy=delete
+    sourceSnapshotRetentionAgeLimit=days(15)
+    sourceSnapshotRetentionNumber=10
+
+    ## Target information
+    targetCluster=backupCluster
+    targetSnapshotDir=/apps/falcon/snapshots/target/
+    targetSnapshotRetentionPolicy=delete
+    targetSnapshotRetentionAgeLimit=months(6)
+    targetSnapshotRetentionNumber=20
+
+    ## Distcp properties
+    distcpMaxMaps=1
+    distcpMapBandwidth=100
+    tdeEncryptionEnabled=false
+  </verbatim>
+
+
+With the above properties, the Falcon HDFS snapshot based mirroring extension does the following every 12 hours:
+   * Create a snapshot of dir /apps/falcon/snapshots/source/ on primaryCluster.
+   * DistCP data from /apps/falcon/snapshots/source/ on primaryCluster to /apps/falcon/snapshots/target/ on backupCluster.
+   * Create a snapshot of dir /apps/falcon/snapshots/target/ on backupCluster.
+   * Perform a retention job on source and target.
+      * Keep at least the N latest snapshots and delete any other snapshots older than the specified age limit.
+      * Today, only the "delete" policy is supported for snapshot retention.
+
+*Note:*
+When TDE encryption is enabled on the source/target directories, DistCP ignores the snapshots and treats the job like a regular
+replication. Although the user may not get the performance benefit of snapshot based DistCP, the extension is still useful
+for creating and maintaining snapshots.
+
+---+++ Submit and schedule HDFS snapshot mirroring extension
+Users can submit the extension using the CLI or the REST API. The CLI command looks as follows:
+    <verbatim>
+    $FALCON_HOME/bin/falcon extension -submitAndSchedule -extensionName hdfs-snapshot-mirroring -file properties-file.txt
+    </verbatim>
+   Please refer to [[falconcli/FalconCLI][Falcon CLI]] and [[restapi/ResourceList][REST API]] for more details on using the CLI and the REST API.
\ No newline at end of file
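
To illustrate the properties file described in the new page, here is a minimal sketch of a pre-submission sanity check. It is not part of Falcon or of this commit. The function names, the `ASSUMED_KEYS` list, and the `DURATION` pattern are hypothetical: the key list is taken from the sample properties only, and the authoritative list of required and optional parameters is the hdfs-snapshot-mirroring-properties.json file mentioned in the page. The only rule drawn directly from the text is that "delete" is the one retention policy currently supported.

```python
import re

# Keys copied from the sample in the twiki page. Treating them as required is
# an assumption; hdfs-snapshot-mirroring-properties.json is authoritative.
ASSUMED_KEYS = [
    "jobName", "jobClusterName", "jobFrequency",
    "sourceCluster", "sourceSnapshotDir",
    "targetCluster", "targetSnapshotDir",
]

# Shape of the duration values seen in the sample, e.g. hours(12), days(15).
# The set of units is an assumption based on those sample values.
DURATION = re.compile(r"^(minutes|hours|days|months)\((\d+)\)$")

SAMPLE = """\
## Job Properties
jobName=hdfs-snapshot-test
jobClusterName=backupCluster
jobFrequency=hours(12)

## Source information
sourceCluster=primaryCluster
sourceSnapshotDir=/apps/falcon/snapshots/source/
sourceSnapshotRetentionPolicy=delete
sourceSnapshotRetentionAgeLimit=days(15)

## Target information
targetCluster=backupCluster
targetSnapshotDir=/apps/falcon/snapshots/target/
targetSnapshotRetentionPolicy=delete
targetSnapshotRetentionAgeLimit=months(6)
"""


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_properties(props):
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing key: {k}" for k in ASSUMED_KEYS if k not in props]
    for side in ("source", "target"):
        policy = props.get(f"{side}SnapshotRetentionPolicy")
        if policy is not None and policy != "delete":
            problems.append(f"{side}: only the 'delete' retention policy is documented")
        age = props.get(f"{side}SnapshotRetentionAgeLimit")
        if age is not None and not DURATION.match(age):
            problems.append(f"{side}: unrecognised age limit {age!r}")
    freq = props.get("jobFrequency")
    if freq is not None and not DURATION.match(freq):
        problems.append(f"unrecognised jobFrequency {freq!r}")
    return problems
```

Running `check_properties(parse_properties(SAMPLE))` on the abbreviated sample returns an empty list. A check like this only catches typos and unsupported retention policies before submission; Falcon itself remains the place where the properties are actually validated.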
