[
https://issues.apache.org/jira/browse/HBASE-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Gray resolved HBASE-897.
---------------------------------
Resolution: Won't Fix
Issue contains tools to perform backup/export/import on very old versions and also on 0.19.
No plans to commit any of this into the branches.
Other implementations for 0.18/0.19 are available in HBASE-974.
Closing issue as Won't Fix. A 0.20 backup is now being worked on in HBASE-1684.
> Backup/Export/Import Tool
> -------------------------
>
> Key: HBASE-897
> URL: https://issues.apache.org/jira/browse/HBASE-897
> Project: Hadoop HBase
> Issue Type: New Feature
> Affects Versions: 0.1.2, 0.1.3
> Environment: MacOS 10.5.4, CentOS 5.1
> Reporter: Dan Zinngrabe
> Priority: Minor
> Attachments: hbase_backup_release.tar.gz,
> hbase_backup_with_hbase_0.19.x.tar.gz
>
>
> Attached is a simple import, export, and backup utility. Mahalo.com has been
> using this in production for several months to back up our HBase clusters as
> well as to migrate data from production to development clusters, etc.
> The documentation included below is taken from the README.
> HBase Backup
> author: Dan Zinngrabe [email protected]
> ------------------
> Summary:
> Simple MapReduce job for exporting data from an HBase table. The exported
> data is in a simple, flat format that can then be imported using another
> MapReduce job. This gives you both a backup capability and a simple way to
> import and export data from tables.
> Backup File Format
> ------------------
> The output of a backup job is a flat text file or a series of flat text
> files. Each row is represented by a single line, with each item
> tab-delimited. Column names are plain text, while column values are
> base64-encoded, which avoids problems with tabs and line breaks in the
> data. Generally you should not have to worry about this at all.
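> As an illustration, a single backed-up row with two columns might look like
> the line below, where <TAB> stands for a literal tab character and the row
> key and values are hypothetical ("MQ==" and "SGVsbG8gd29ybGQ=" are the
> base64 encodings of "1" and "Hello world"):
>
> row1<TAB>text_flags:<TAB>MQ==<TAB>text_data:<TAB>SGVsbG8gd29ybGQ=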
> Setup and installation
> ------------------
> First, make sure your Hadoop installation is properly configured to load
> the HBase classes. The easiest way to do this is to edit hadoop-env.sh to
> put HBase's jar libraries on the classpath, for example:
> export HBASE_HOME=/Users/quellish/Desktop/hadoop/hbase-0.1.2
> export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.1.2.jar:$HBASE_HOME/conf:$HBASE_HOME/hbase-0.1.2-test.jar
> Second, make sure the hbase-backup.jar file is on the classpath for Hadoop
> as well. While you can put it into a system-wide classpath directory such
> as ${JAVA_HOME}/lib, it's much easier to just put it into ${HADOOP_HOME}/lib.
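> For example (assuming hbase-backup.jar is in your current directory):
>
> cp hbase-backup.jar ${HADOOP_HOME}/lib/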
> With that done, you are ready to go. Start up hadoop and HBase normally and
> you will be able to run a backup and restore.
> Backing up
> ------------------
> Backups are run using the Exporter class. From ${HADOOP_HOME} :
> bin/hadoop com.mahalo.hadoop.hbase.Exporter -output backup -table text
> -columns text_flags: text_data:
> This will output the backup into the new directory "backup" in the Hadoop
> File System, and will back up the columns "text_flags" and "text_data"
> along with each row's identifier. The colons are required in the column
> names. The job will produce multiple files in the output directory (simply
> 'cat' them together to form a single file). Note that if the output
> directory already exists, the job will stop; this may be changed in a
> future version. The output directory can also be any file system path or
> URL that Hadoop understands, such as an S3 URL.
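> For example, to write the backup directly to S3 instead (the bucket name
> here is hypothetical), point the job at an S3 URL:
>
> bin/hadoop com.mahalo.hadoop.hbase.Exporter -output s3://my-backup-bucket/backup
> -table text -columns text_flags: text_data: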
> Restoring from a backup
> ------------------
> From ${HADOOP_HOME} :
> bin/hadoop com.mahalo.hadoop.hbase.Importer backup/backup.tsv text
> This will load a single file, backup/backup.tsv (that you 'cat'd together
> from parts), into the table "text". Note that the table must already exist;
> it can have data in it, but existing values may be overwritten by the
> restore process. You can create the table easily using HBase's shell. The
> backup file can be loaded from any URL that Hadoop understands, such as a
> file URL or an S3 URL. A path not formatted as a URL (as shown above) is
> resolved relative to your user directory in the Hadoop filesystem.
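> For example, to restore from a file on the local filesystem instead (the
> path here is hypothetical):
>
> bin/hadoop com.mahalo.hadoop.hbase.Importer file:///tmp/backup.tsv text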
> Combining a file from pieces using cat
> ------------------
> As mentioned above, a MapReduce job will typically produce several output
> files that must be assembled into a single file. On a Unix system this is
> fairly easy to do using cat and find. First, copy your data from the Hadoop
> filesystem to the local filesystem:
> bin/hadoop dfs -copyToLocal backup ~/mybackups
> Then:
> cd ~/
> find mybackups/. -name "part-00*" | xargs cat >> backup.tsv
> This will take all the files in the "mybackups" directory matching the
> pattern "part-00*" and combine them into a single file, "backup.tsv".
> Troubleshooting
> ------------------
> During a restore/import, it is normal for region servers to split or become
> temporarily unavailable; the application will recover from this. You may
> see errors in the logs, but they are expected.