bq. The following diagram summarizes the steps for each option.

I don't see the diagram.
Is this writing published somewhere?

On Wed, Aug 23, 2017 at 3:21 AM, RuthEvans <ruthevans...@gmail.com> wrote:
> Starting with Amazon EMR 5.2.0, you have the option to run Apache HBase
> on Amazon S3. Running HBase on S3 gives you several added benefits,
> including lower costs, data durability, and easier scalability.
>
> HBase provides several options that you can use to migrate and back up
> HBase tables. The steps to migrate to HBase on S3 are similar to the
> steps for HBase on the Apache Hadoop Distributed File System (HDFS).
> However, the migration can be easier if you are aware of some minor
> differences and a few "gotchas."
>
> In this post, I describe how to use some of the common HBase migration
> options to get started with HBase on S3.
>
> HBase migration options
> Selecting the right migration method and tools is an important step in
> ensuring a successful HBase table migration. However, choosing the
> right ones is not always an easy task.
>
> The following HBase options help you migrate to HBase on S3:
>
> - Snapshots
> - Export and Import
> - CopyTable
>
> The following diagram summarizes the steps for each option.
>
> Various factors determine the HBase migration method that you use. For
> example, EMR offers HBase version 1.2.3 as the earliest version that
> you can run on S3. Therefore, the HBase version that you're migrating
> from can be an important factor in helping you decide. For more
> information about HBase versions and compatibility, see the HBase
> version number and compatibility documentation in the Apache HBase
> Reference Guide.
>
> If you're migrating from an older version of HBase (for example, HBase
> 0.94), you should test your application to make sure it's compatible
> with newer HBase API versions. You don't want to spend several hours
> migrating a large table only to find out that your application and API
> have issues with a different HBase version.
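That version caveat is worth automating. Here's a small sketch of mine (not from the post) that branches on the source cluster's major version, since per the post a 0.94-era source needs the extra snapshot-repair steps while 1.x should restore cleanly; the version string is hard-coded as a placeholder:

```shell
#!/bin/sh
# Sketch (my addition): guard a migration script against old source versions.
set -eu

# On a real source cluster you might populate this with something like:
#   SRC_VERSION=$(hbase version 2>/dev/null | awk 'NR==1 {print $2}')
# Hard-coded placeholder here:
SRC_VERSION="0.94.27"

# Strip everything after the first dot to get the major version.
major=${SRC_VERSION%%.*}

if [ "$major" -lt 1 ]; then
  echo "HBase $SRC_VERSION source: plan for the 0.94 snapshot-repair steps"
else
  echo "HBase $SRC_VERSION source: snapshot/ExportSnapshot should restore cleanly"
fi
```

With the placeholder value it prints the 0.94 branch; swap in the real version string on your cluster.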
>
> The good news is that HBase provides utilities that you can use to
> migrate only part of a table. This lets you test your existing HBase
> applications without having to fully migrate entire HBase tables. For
> example, you can use the Export, Import, or CopyTable utilities to
> migrate a small part of your table to HBase on S3. After you confirm
> that your application works with newer HBase versions, you can proceed
> with migrating the entire table using HBase snapshots.
>
> Option 1: Migrate to HBase on S3 using snapshots
> You can create table backups easily by using HBase snapshots. HBase
> also provides the ExportSnapshot utility, which lets you export
> snapshots to a different location, like S3. In this section, I discuss
> how you can combine snapshots with ExportSnapshot to migrate tables to
> HBase on S3.
>
> For details about how you can use HBase snapshots to perform table
> backups, see Using HBase Snapshots in the Amazon EMR Release Guide and
> HBase Snapshots in the Apache HBase Reference Guide. These resources
> provide additional settings and configurations that you can use with
> snapshots and ExportSnapshot.
>
> The following example shows how to use snapshots to migrate HBase
> tables to HBase on S3.
>
> Note: Earlier HBase versions, like HBase 0.94, have a different
> snapshot structure than HBase 1.x, which is what you're migrating to.
> If you're migrating from HBase 0.94 using snapshots, you get a
> TableInfoMissingException error when you try to restore the table. For
> details about migrating from HBase 0.94 using snapshots, see the
> Migrating from HBase 0.94 section.
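On the partial-migration testing idea above: CopyTable takes --startrow and --stoprow options, so you can move just a slice of a table for a compatibility test. A dry-run sketch of mine (table name, row keys, and ZooKeeper address are placeholders, not values from the post):

```shell
#!/bin/sh
# Dry-run sketch (my addition): copy only a row range for compatibility
# testing. Set DRY_RUN=0 on a cluster with the `hbase` CLI to really run it.
set -eu
DRY_RUN=${DRY_RUN:-1}

# In dry-run mode, print the command instead of executing it.
run() {
  if [ "$DRY_RUN" -eq 1 ]; then echo "WOULD RUN: $*"; else "$@"; fi
}

# Copy rows in [row_00000, row_01000) from test_table on this cluster to
# the destination cluster, identified by its ZooKeeper quorum.
run hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    --startrow=row_00000 --stoprow=row_01000 \
    --peer.adr=dest-zk-host:2181:/hbase \
    test_table
```

As written it only prints the CopyTable command, which is handy for reviewing the invocation before launching the MapReduce job.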
>
> 1. From the source HBase cluster, create a snapshot of your table:
>
>    $ echo "snapshot '<table_name>', '<snapshot_name>'" | hbase shell
>
> 2. Export the snapshot to an S3 bucket:
>
>    $ hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot \
>        <snapshot_name> -copy-to s3://<HBase_on_S3_root_dir>/
>
>    For the -copy-to parameter in the ExportSnapshot utility, specify
>    the S3 location that you are using for the HBase root directory of
>    your EMR cluster. If your cluster is already up and running, you can
>    find its S3 hbase.rootdir value by viewing the cluster's
>    Configurations in the EMR console, or by using the AWS CLI. Here's
>    the command to find that value:
>
>    $ aws emr describe-cluster --cluster-id <cluster_id> | grep hbase.rootdir
>
> 3. Launch an EMR cluster that uses the S3 storage option with HBase
>    (skip this step if you already have one up and running). For
>    detailed steps, see Creating a Cluster with HBase Using the Console
>    in the Amazon EMR Release Guide. When launching the cluster, ensure
>    that the HBase root directory is set to the same S3 location as your
>    exported snapshots (that is, the location used in the -copy-to
>    parameter in the previous step).
>
> 4. Restore or clone the HBase table from that snapshot.
>
>    To restore the table and keep the same table name as the source
>    table, use restore_snapshot:
>
>    $ echo "restore_snapshot '<snapshot_name>'" | hbase shell
>
>    To restore the table into a different table name, use clone_snapshot:
>
>    $ echo "clone_snapshot '<snapshot_name>', '<table_name>'" | hbase shell
>
> Migrating from HBase 0.94 using snapshots
> If you're migrating from HBase version 0.94 using the snapshot method,
> you get an error if you try to restore from the snapshot. This is
> because the structure of a snapshot in HBase 0.94 is different from the
> snapshot structure in HBase 1.x.
>
> The following steps show how to fix an HBase 0.94 snapshot so that it
> can be restored to an HBase on S3 table.
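Before getting into the 0.94 case: the Option 1 flow quoted above collapses nicely into one script. This is just my sketch, defaulting to a dry run that prints the commands; my_table, my_snapshot, and the bucket name are placeholders, not values from the original post:

```shell
#!/bin/sh
# Dry-run sketch (my addition) of the snapshot migration flow (Option 1).
# Set DRY_RUN=0 on a cluster with the `hbase` CLI to execute for real.
set -eu
DRY_RUN=${DRY_RUN:-1}
TABLE="my_table"
SNAPSHOT="my_snapshot"
S3_ROOT="s3://my-hbase-bucket/hbase"   # must match the destination cluster's hbase.rootdir

# In dry-run mode, print the command instead of executing it.
run() {
  if [ "$DRY_RUN" -eq 1 ]; then echo "WOULD RUN: $*"; else "$@"; fi
}

# 1. On the source cluster: take the snapshot.
run sh -c "echo \"snapshot '$TABLE', '$SNAPSHOT'\" | hbase shell"

# 2. Export the snapshot into the destination cluster's S3 root directory.
run hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot "$SNAPSHOT" -copy-to "$S3_ROOT/"

# 3. On the destination (HBase on S3) cluster: restore it.
run sh -c "echo \"restore_snapshot '$SNAPSHOT'\" | hbase shell"
```

Note the snapshot and restore steps run on different clusters, so in practice you'd split this script rather than run it in one place.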
>
> Complete steps 1-3 in the previous example to create and export a
> snapshot. From your destination cluster, follow these steps to repair
> the snapshot:
>
> 1. Use s3-dist-cp to copy the snapshot data (archive) directory into a
>    new directory. The archive directory contains your snapshot data.
>    Depending on your table size, it might be large. Use s3-dist-cp to
>    make this step faster:
>
>    $ s3-dist-cp --src s3://<HBase_on_S3_root_dir>/.archive/<table_name> \
>        --dest s3://<HBase_on_S3_root_dir>/archive/data/default/<table_name>
>
> 2. Create and fix the snapshot descriptor file:
>
>    $ hdfs dfs -mkdir \
>        s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tabledesc
>
>    $ hdfs dfs -mv \
>        s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tableinfo.<*> \
>        s3://<HBase_on_S3_root_dir>/.hbase-snapshot/<snapshot_name>/.tabledesc
>
> 3. Restore the snapshot:
>
>    $ echo "restore_snapshot '<snapshot_name>'" | hbase shell
>
> Option 2: Migrate to HBase on S3 using Export and Import
> As I discussed in the earlier sections, HBase snapshots and
> ExportSnapshot are great options for migrating tables. But sometimes
> you want to migrate only part of a table, so you need a different tool.
> In this section, I describe how to use the HBase Export and Import
> utilities.
>
> The steps to migrate a table to HBase on S3 using Export and Import are
> not much different from the steps provided in the HBase documentation.
> In those docs, you can also find detailed information, including how
> you can use them to migrate part of a table.
>
> The following steps show how you can use Export and Import to migrate a
> table to HBase on S3.
>
> 1. From your source cluster, export the HBase table:
>
>    $ hbase org.apache.hadoop.hbase.mapreduce.Export <table_name> \
>        s3://<table_s3_backup>/<location>/
>
> 2. In the destination cluster, create the target table into which to
>    import data. Ensure that the column families in the target table are
>    identical to the exported/source table's column families.
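Steps 1 and 2 above (export, then pre-create the target table) can also be collected into one dry-run sketch. Again my own placeholders throughout; the column family name in particular must match whatever the source table actually uses:

```shell
#!/bin/sh
# Dry-run sketch (my addition) of Export/Import steps 1-2.
# Set DRY_RUN=0 on clusters with the `hbase` CLI to execute for real.
set -eu
DRY_RUN=${DRY_RUN:-1}
TABLE="my_table"
CF="cf1"                                          # must match the source table
BACKUP="s3://my-backup-bucket/hbase-export/$TABLE"

# In dry-run mode, print the command instead of executing it.
run() {
  if [ "$DRY_RUN" -eq 1 ]; then echo "WOULD RUN: $*"; else "$@"; fi
}

# 1. On the source cluster: export the table to S3.
run hbase org.apache.hadoop.hbase.mapreduce.Export "$TABLE" "$BACKUP/"

# 2. On the destination cluster: create the target table with identical
#    column families before running Import.
run sh -c "echo \"create '$TABLE', '$CF'\" | hbase shell"
```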
> 3. From the destination cluster, import the table using the Import
>    utility:
>
>    $ hbase org.apache.hadoop.hbase.mapreduce.Import '<table_name>' \
>        s3://<table_s3_backup>/<location>/
>
> HBase snapshots are usually the recommended method to migrate HBase
> tables. However, the Export and Import utilities can be useful for test
> use cases in which you migrate only a small part of your table and test
> your application. They're also handy if you're migrating from an HBase
> cluster that does not have the HBase snapshots feature.
>
> Option 3: Migrate to HBase on S3 using CopyTable
> Similar to the Export and Import utilities, CopyTable is an HBase
> utility that you can use to copy part of HBase tables. However, keep in
> mind that CopyTable doesn't work if you're copying or migrating tables
> between HBase versions that are not wire compatible (for example,
> copying from HBase 0.94 to HBase 1.x).
>
> For more information and examples, see CopyTable in the HBase
> documentation.
>
> Conclusion
> In this post, I demonstrated how you can use common HBase backup
> utilities to migrate your tables easily to HBase on S3. By using HBase
> snapshots, you can migrate entire tables to HBase on S3. To test HBase
> on S3 by migrating or copying only part of your tables, you can use the
> HBase Export, Import, or CopyTable utilities.
>
> If you have questions or suggestions, please comment below.
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/Tips-for-Migrating-to-Apache-HBase-on-Amazon-S3-from-HDFS-tp4089926.html
> Sent from the HBase Developer mailing list archive at Nabble.com.