[ https://issues.apache.org/jira/browse/HBASE-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray resolved HBASE-897.
---------------------------------

    Resolution: Won't Fix

Issue contains tools to perform this on very old versions and also on 0.19.
There are no plans to commit any of this into branches.

Other implementations for 0.18/0.19 are available in HBASE-974.

Closing issue as Won't Fix.  0.20 backup is now being worked on in HBASE-1684.

> Backup/Export/Import Tool
> -------------------------
>
>                 Key: HBASE-897
>                 URL: https://issues.apache.org/jira/browse/HBASE-897
>             Project: Hadoop HBase
>          Issue Type: New Feature
>    Affects Versions: 0.1.2, 0.1.3
>         Environment: MacOS 10.5.4, CentOS 5.1
>            Reporter: Dan Zinngrabe
>            Priority: Minor
>         Attachments: hbase_backup_release.tar.gz, 
> hbase_backup_with_hbase_0.19.x.tar.gz
>
>
> Attached is a simple import, export, and backup utility. Mahalo.com has been 
> using this in production for several months to back up our HBase clusters as 
> well as to migrate data from production to development clusters, etc.
> Documentation included below is from the readme.
> HBase Backup
> author: Dan Zinngrabe [email protected]
> ------------------
> Summary:
> Simple MapReduce job for exporting data from an HBase table. The exported 
> data is in a simple, flat format that can then be imported using another 
> MapReduce job. This gives you both a backup capability, and a simple way to 
> import and export data from tables.
> Backup File Format
> ------------------
> The output of a backup job is a flat text file, or series of flat text files. 
> Each row is represented by a single line, with each item tab delimited. 
> Column names are plain text, while column values are base 64 encoded. This 
> helps us deal with tabs and line breaks in the data. Generally you should not 
> have to worry about this at all.
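> If you ever need to inspect a value by hand, it can be decoded with any 
> base 64 tool. For example (the encoded string here is just an illustration, 
> not real table data):
> echo 'SGVsbG8gSEJhc2U=' | openssl base64 -d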
> Setup and installation
> ------------------
> First, make sure your Hadoop installation is properly configured to load the 
> HBase classes. This can easily be done by editing the hadoop-env.sh file to 
> include HBase's jar libraries. You can add the following to hadoop-env.sh to 
> have it load HBase classes:
> export HBASE_HOME=/Users/quellish/Desktop/hadoop/hbase-0.1.2
> export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.1.2.jar:$HBASE_HOME/conf:$HBASE_HOME/hbase-0.1.2-test.jar
> Second, make sure the hbase-backup.jar file is on the classpath for Hadoop as 
> well. While you can put this into a system-wide class path directory such as 
> ${JAVA_HOME}/lib, it's much easier to just put it into
> ${HADOOP_HOME}/lib
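> For example, assuming hbase-backup.jar sits in your current directory:
> cp hbase-backup.jar ${HADOOP_HOME}/lib/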
> With that done, you are ready to go. Start up hadoop and HBase normally and 
> you will be able to run a backup and restore.
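> For example, using the standard start scripts (assuming HADOOP_HOME and 
> HBASE_HOME point at your Hadoop and HBase installations):
> ${HADOOP_HOME}/bin/start-all.sh
> ${HBASE_HOME}/bin/start-hbase.sh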
> Backing up
> ------------------
> Backups are run using the Exporter class. From  ${HADOOP_HOME} :
> bin/hadoop com.mahalo.hadoop.hbase.Exporter -output backup -table text -columns text_flags: text_data:
> This will write the backup into the new directory "backup" in the Hadoop 
> File System, and will back up the columns "text_flags" and "text_data" along 
> with each row's identifier. Colons are required in the column names, and the 
> job will produce multiple files in the output directory (simply 'cat' them 
> together to form a single file). Note that if the output directory already 
> exists, the job will stop. This may change in a future version. The output 
> directory can also be any file system path or URL that Hadoop understands, 
> such as an S3 URL.
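> For example, to write the backup directly to S3 (the bucket name below is 
> just a placeholder):
> bin/hadoop com.mahalo.hadoop.hbase.Exporter -output s3://mybucket/backup -table text -columns text_flags: text_data: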
> Restoring from a backup
> ------------------
> From  ${HADOOP_HOME} :
> bin/hadoop com.mahalo.hadoop.hbase.Importer backup/backup.tsv text
> This will load a single file (that you 'cat'd together from parts), 
> backup/backup.tsv, into the table "text". Note that the table must already 
> exist, and it may already contain data - existing values can be overwritten 
> by the restore process. You can create the table easily using HBase's shell. 
> The backup file can be loaded from any URL that Hadoop understands, such as 
> a file URL or S3 URL. A path not formatted as a URL (as shown above) is 
> resolved relative to your user directory in the Hadoop filesystem.
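> For example, restoring from a file URL or an S3 URL (the paths below are 
> just placeholders):
> bin/hadoop com.mahalo.hadoop.hbase.Importer file:///tmp/backup.tsv text
> bin/hadoop com.mahalo.hadoop.hbase.Importer s3://mybucket/backup.tsv text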
> Combining a file from pieces using cat
> ------------------
> As mentioned above, typically a MapReduce job will produce several files of 
> output that must be assembled together to make a single file. On a unix 
> system, this is fairly easy to do, using cat and the find command: First, 
> export your data from the hadoop filesystem to the local filesystem:
> bin/hadoop dfs -copyToLocal backup ~/mybackups
> Then:
> cd ~/
> find mybackups/. -name "part-00*" | xargs cat >> backup.tsv
> This will take all the files in the local "mybackups" directory matching the 
> pattern "part-00*" and combine them into a single file, "backup.tsv".
> Troubleshooting
> ------------------
> During a restore/import it is normal for region servers to split regions or 
> become temporarily unavailable; the job will recover from this. You may see 
> errors in the logs, but they are expected.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
