bq. The following diagram summarizes the steps for each option.

I don't see the diagram.
Is this writing published somewhere?

On Wed, Aug 23, 2017 at 3:21 AM, RuthEvans <ruthevans...@gmail.com> wrote:
> Starting with Amazon EMR 5.2.0, you have the option to run Apache HBase
> on Amazon S3. Running HBase on S3 gives you several added benefits,
> including lower costs, data durability, and easier scalability.
>
> HBase provides several options that you can use to migrate and back up
> HBase tables. The steps to migrate to HBase on S3 are similar to the
> steps for HBase on the Apache Hadoop Distributed File System (HDFS).
> However, the migration can be easier if you are aware of some minor
> differences and a few "gotchas."
>
> In this post, I describe how to use some of the common HBase migration
> options to get started with HBase on S3.
>
> HBase migration options
> Selecting the right migration method and tools is an important step in
> ensuring a successful HBase table migration. However, choosing the
> right ones is not always an easy task.
>
> The following HBase options help you migrate to HBase on S3:
>
> - Snapshots
> - Export and Import
> - CopyTable
>
> The following diagram summarizes the steps for each option.
>
> Various factors determine the HBase migration method that you use. For
> example, EMR offers HBase version 1.2.3 as the earliest version that
> you can run on S3. Therefore, the HBase version that you're migrating
> from can be an important factor in helping you decide. For more
> information about HBase versions and compatibility, see the HBase
> version number and compatibility documentation in the Apache HBase
> Reference Guide.
>
> If you're migrating from an older version of HBase (for example, HBase
> 0.94), you should test your application to make sure it's compatible
> with newer HBase API versions. You don't want to spend several hours
> migrating a large table only to find out that your application and API
> have issues with a different HBase version.
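That version caveat is worth automating. Here's a small sketch of mine (not from the post) that branches on the source cluster's major version, since per the post a 0.94-era source needs the extra snapshot-repair steps while 1.x should restore cleanly; the version string is hard-coded as a placeholder:

```shell
#!/bin/sh
# Sketch (my addition): guard a migration script against old source versions.
set -eu

# On a real source cluster you might populate this with something like:
#   SRC_VERSION=$(hbase version 2>/dev/null | awk 'NR==1 {print $2}')
# Hard-coded placeholder here:
SRC_VERSION="0.94.27"

# Strip everything after the first dot to get the major version.
major=${SRC_VERSION%%.*}

if [ "$major" -lt 1 ]; then
  echo "HBase $SRC_VERSION source: plan for the 0.94 snapshot-repair steps"
else
  echo "HBase $SRC_VERSION source: snapshot/ExportSnapshot should restore cleanly"
fi
```

With the placeholder value it prints the 0.94 branch; swap in the real version string on your cluster.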
>
> The good news is that HBase provides utilities that you can use to
> migrate only part of a table. This lets you test your existing HBase
> applications without having to fully migrate entire HBase tables. For
> example, you can use the Export, Import, or CopyTable utilities to
> migrate a small part of your table to HBase on S3. After you confirm
> that your application works with newer HBase versions, you can proceed
> with migrating the entire table using HBase snapshots.
>
> Option 1: Migrate to HBase on S3 using snapshots
> You can create table backups easily by using HBase snapshots. HBase
> also provides the ExportSnapshot utility, which lets you export
> snapshots to a different location, like S3. In this section, I discuss
> how you can combine snapshots with ExportSnapshot to migrate tables to
> HBase on S3.
>
> For details about how you can use HBase snapshots to perform table
> backups, see Using HBase Snapshots in the Amazon EMR Release Guide and
> HBase Snapshots in the Apache HBase Reference Guide. These resources
> provide additional settings and configurations that you can use with
> snapshots and ExportSnapshot.
>
> The following example shows how to use snapshots to migrate HBase
> tables to HBase on S3.
>
> Note: Earlier HBase versions, like HBase 0.94, have a different
> snapshot structure than HBase 1.x, which is what you're migrating to.
> If you're migrating from HBase 0.94 using snapshots, you get a
> TableInfoMissingException error when you try to restore the table. For
> details about migrating from HBase 0.94 using snapshots, see the
> Migrating from HBase 0.94 section.
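On the partial-migration testing idea above: CopyTable takes --startrow and --stoprow options, so you can move just a slice of a table for a compatibility test. A dry-run sketch of mine (table name, row keys, and ZooKeeper address are placeholders, not values from the post):

```shell
#!/bin/sh
# Dry-run sketch (my addition): copy only a row range for compatibility
# testing. Set DRY_RUN=0 on a cluster with the `hbase` CLI to really run it.
set -eu
DRY_RUN=${DRY_RUN:-1}

# In dry-run mode, print the command instead of executing it.
run() {
  if [ "$DRY_RUN" -eq 1 ]; then echo "WOULD RUN: $*"; else "$@"; fi
}

# Copy rows in [row_00000, row_01000) from test_table on this cluster to
# the destination cluster, identified by its ZooKeeper quorum.
run hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    --startrow=row_00000 --stoprow=row_01000 \
    --peer.adr=dest-zk-host:2181:/hbase \
    test_table
```

As written it only prints the CopyTable command, which is handy for reviewing the invocation before launching the MapReduce job.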
>
> 1. From the source HBase cluster, create a snapshot of your table:
>
>    $ echo "snapshot '<table_name>', '<snapshot_name>'" | hbase shell
>
> 2. Export the snapshot to an S3 bucket:
>
>    $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot \
>        <snapshot_name> -copy-to s3://<HBase_on_S3_root_dir>/
>
>    For the -copy-to parameter in the ExportSnapshot utility, specify
>    the S3 location that you are using for the HBase root directory of
>    your EMR cluster. If your cluster is already up and running, you can
>    find its S3 hbase.rootdir value by viewing the cluster's
>    Configurations in the EMR console, or by using the AWS CLI. Here's
>    the command to find that value:
>
>    $ aws emr describe-cluster --cluster-id <cluster_id> | grep hbase.rootdir
>
> 3. Launch an EMR cluster that uses the S3 storage option with HBase
>    (skip this step if you already have one up and running). For
>    detailed steps, see Creating a Cluster with HBase Using the Console
>    in the Amazon EMR Release Guide. When launching the cluster, ensure
>    that the HBase root directory is set to the same S3 location as your
>    exported snapshots (that is, the location used in the -copy-to
>    parameter in the previous step).
>
> 4. Restore or clone the HBase table from that snapshot.
>
>    To restore the table and keep the same table name as the source
>    table, use restore_snapshot:
>
>    $ echo "restore_snapshot '<snapshot_name>'" | hbase shell
>
>    To restore the table into a different table name, use clone_snapshot:
>
>    $ echo "clone_snapshot '<snapshot_name>', '<table_name>'" | hbase shell
>
> Migrating from HBase 0.94 using snapshots
> If you're migrating from HBase version 0.94 using the snapshot method,
> you get an error if you try to restore from the snapshot. This is
> because the structure of a snapshot in HBase 0.94 is different from the
> snapshot structure in HBase 1.x.
>
> The following steps show how to fix an HBase 0.94 snapshot so that it
> can be restored to an HBase on S3 table.
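Before getting into the 0.94 case: the Option 1 flow quoted above collapses nicely into one script. This is just my sketch, defaulting to a dry run that prints the commands; my_table, my_snapshot, and the bucket name are placeholders, not values from the original post:

```shell
#!/bin/sh
# Dry-run sketch (my addition) of the snapshot migration flow (Option 1).
# Set DRY_RUN=0 on a cluster with the `hbase` CLI to execute for real.
set -eu
DRY_RUN=${DRY_RUN:-1}
TABLE="my_table"
SNAPSHOT="my_snapshot"
S3_ROOT="s3://my-hbase-bucket/hbase"   # must match the destination cluster's hbase.rootdir

# In dry-run mode, print the command instead of executing it.
run() {
  if [ "$DRY_RUN" -eq 1 ]; then echo "WOULD RUN: $*"; else "$@"; fi
}

# 1. On the source cluster: take the snapshot.
run sh -c "echo \"snapshot '$TABLE', '$SNAPSHOT'\" | hbase shell"

# 2. Export the snapshot into the destination cluster's S3 root directory.
run hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot "$SNAPSHOT" -copy-to "$S3_ROOT/"

# 3. On the destination (HBase on S3) cluster: restore it.
run sh -c "echo \"restore_snapshot '$SNAPSHOT'\" | hbase shell"
```

Note the snapshot and restore steps run on different clusters, so in practice you'd split this script rather than run it in one place.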
>
> Complete steps 1-3 in the previous example to create and export a
> snapshot. From your destination cluster, follow these steps to repair
> the snapshot:
>
> 1. Use s3-dist-cp to copy the snapshot data (archive) directory into a
>    new directory. The archive directory contains your snapshot data.
>    Depending on your table size, it might be large. Use s3-dist-cp to
>    make this step faster:
>
>    $ s3-dist-cp --src s3://<HBase_on_S3_root_dir>/.archive/<table_name> \
>        --dest s3://<HBase_on_S3_root_dir>/archive/data/default/<table_name>
>
> 2. Create and fix the snapshot descriptor file:
>
>    $ hdfs dfs -mkdir \
>        s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tabledesc
>
>    $ hdfs dfs -mv \
>        s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tableinfo.<*> \
>        s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tabledesc
>
> 3. Restore the snapshot:
>
>    $ echo "restore_snapshot '<snapshot_name>'" | hbase shell
>
> Option 2: Migrate to HBase on S3 using Export and Import
> As I discussed in the earlier sections, HBase snapshots and
> ExportSnapshot are great options for migrating tables. But sometimes
> you want to migrate only part of a table, so you need a different tool.
> In this section, I describe how to use the HBase Export and Import
> utilities.
>
> The steps to migrate a table to HBase on S3 using Export and Import are
> not much different from the steps provided in the HBase documentation.
> In those docs, you can also find detailed information, including how
> you can use them to migrate part of a table.
>
> The following steps show how you can use Export and Import to migrate a
> table to HBase on S3.
>
> 1. From your source cluster, export the HBase table:
>
>    $ hbase org.apache.hadoop.hbase.mapreduce.Export <table_name> \
>        s3://<table_s3_backup>/<location>/
>
> 2. In the destination cluster, create the target table into which to
>    import data. Ensure that the column families in the target table are
>    identical to the exported/source table's column families.
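Steps 1 and 2 above (export, then pre-create the target table) can also be collected into one dry-run sketch. Again my own placeholders throughout; the column family name in particular must match whatever the source table actually uses:

```shell
#!/bin/sh
# Dry-run sketch (my addition) of Export/Import steps 1-2.
# Set DRY_RUN=0 on clusters with the `hbase` CLI to execute for real.
set -eu
DRY_RUN=${DRY_RUN:-1}
TABLE="my_table"
CF="cf1"                                          # must match the source table
BACKUP="s3://my-backup-bucket/hbase-export/$TABLE"

# In dry-run mode, print the command instead of executing it.
run() {
  if [ "$DRY_RUN" -eq 1 ]; then echo "WOULD RUN: $*"; else "$@"; fi
}

# 1. On the source cluster: export the table to S3.
run hbase org.apache.hadoop.hbase.mapreduce.Export "$TABLE" "$BACKUP/"

# 2. On the destination cluster: create the target table with identical
#    column families before running Import.
run sh -c "echo \"create '$TABLE', '$CF'\" | hbase shell"
```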
> 3. From the destination cluster, import the table using the Import
>    utility:
>
>    $ hbase org.apache.hadoop.hbase.mapreduce.Import '<table_name>' \
>        s3://<table_s3_backup>/<location>/
>
> HBase snapshots are usually the recommended method to migrate HBase
> tables. However, the Export and Import utilities can be useful for test
> use cases in which you migrate only a small part of your table and test
> your application. They're also handy if you're migrating from an HBase
> cluster that does not have the HBase snapshots feature.
>
> Option 3: Migrate to HBase on S3 using CopyTable
> Similar to the Export and Import utilities, CopyTable is an HBase
> utility that you can use to copy part of HBase tables. However, keep in
> mind that CopyTable doesn't work if you're copying or migrating tables
> between HBase versions that are not wire compatible (for example,
> copying from HBase 0.94 to HBase 1.x).
>
> For more information and examples, see CopyTable in the HBase
> documentation.
>
> Conclusion
> In this post, I demonstrated how you can use common HBase backup
> utilities to migrate your tables easily to HBase on S3. By using HBase
> snapshots, you can migrate entire tables to HBase on S3. To test HBase
> on S3 by migrating or copying only part of your tables, you can use the
> HBase Export, Import, or CopyTable utilities.
>
> If you have questions or suggestions, please comment below.
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/Tips-for-Migrating-to-Apache-HBase-on-Amazon-S3-from-HDFS-tp4089926.html
> Sent from the HBase Developer mailing list archive at Nabble.com.