The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/HowToMigrate

------------------------------------------------------------------------------
  
   * Shut down the old instance of HBase.
   * If necessary, upgrade the underlying version of Hadoop to the version 
required by the new instance of HBase.  Refer to the 
[http://wiki.apache.org/hadoop/Hadoop%20Upgrade Hadoop Upgrade] page.
-  * Backup your hbase.rootdir.
+  * Optionally back up your hbase.rootdir.
   * Download and configure the new instance of HBase.  Make sure you configure 
the ''hbase.rootdir'' of the new instance to be the same as that from the old 
instance.
   * From the new instance of HBase, perform the HBase migration.  Run 
{{{{$HBASE_HOME}/bin/hbase migrate}}} for usage.  See the version-specific 
notes below for more specific information on this process.
   * Start the new instance of HBase.
@@ -15, +15 @@

  
  Below are general migration notes followed by specifics on how to migrate 
between particular versions, newest to oldest.
  
- == Other Migration-Related Concerns ==
+ == General Migration Notes ==
  
  Migration is only supported between the file system version of the previous 
release and the file system version of the current release. If the existing 
HBase installation has an older file system version, it will be necessary to 
install an HBase release which can perform the upgrade, run the migration tool, 
and then install the desired release and run its migration script. (Note that 
if the existing installation is several versions old, it may be necessary to 
repeat this process.)
- 
- === Redo Logs ===
- 
- (Below does not apply if migrating from hbase 0.19 to hbase 0.20)
- 
- It is possible that, when running the HBase migration command, the migration 
will fail because of "unrecovered redo logs."  Redo logs are generated every 
time HBase is started, and under normal circumstances they are removed when 
HBase is stopped cleanly.  However, if you have ever stopped HBase in some 
atypical way (for example, using {{{kill -9}}}), these redo logs will persist 
in Hadoop DFS.  To see if you have any unrecovered redo logs, stop any 
currently-running instances of HBase and enter: {{{{$HADOOP_HOME}/bin/hadoop 
dfs -ls /hbase}}}.  All existing redo logs will be in this directory.  Redo log 
directories can be removed using dfs {{{-rm}}} option.  WARNING: redo logs are 
the only way to recover any data entered before HBase was improperly stopped.  
Removing redo logs with file size greater than zero may result in irreversible 
data loss.
  
  == Version-Specific Migration Notes ==
  
@@ -33, +27 @@

  
  You can only migrate to 0.20.x from 0.19.x.  If you have an earlier hbase, 
you will need to install 0.19, migrate your old instance, and then install 
0.20.x.
  
- Migration does not work for transactional hbase installs or for indexed hbase 
installs.  Talk to us if you need this.
+ Migration has not been tested on transactional or indexed hbase installs.  
It may just work.  It may not.
  
  This migration rewrites all data.  It will take a while.
  
@@ -41, +35 @@

  
  ==== Preparing for Migration ====
  
- You must do a few things first before you can begin migration of either 
hadoop or hbase.
+ You MUST do a few things first before you can begin migration of either 
hadoop or hbase.
- 
- ===== Can you back up your data? =====
- Migration has been tested but if you have sufficient space in hdfs to make a 
copy of your hbase rootdir, do so.  Just in case.  Use hdfs distcp.
  
  ===== Major Compacting all Tables =====
- Before you begin, you MUST run a major compaction on all tables including 
.META. table.  Migration will not work without your completing major 
compaction.  To major compact from the shell, hbase must be running.  For 
example, the below cluster has only one table named 'a'.  See how we run a 
major_compaction on each:
+ Before you begin, you MUST run a major compaction on all tables, including 
the .META. table.  A major compaction compacts all store files in a family 
together, dropping deleted and expired cells.  Major compaction is necessary 
because the way deletes work changed in 0.20 hbase.  Migration will not work 
unless you complete the major compaction.  Use the shell to start major 
compactions.  For example, the cluster below has only one table, named 'a'.  
See how we run major_compact on each table:
  
  {{{st...@connelly:~/checkouts/hbase/branches/0.19$ ./bin/hbase shell
  HBase Shell; enter 'help<RETURN>' for list of supported commands.
@@ -62, +53 @@

  hbase(main):004:0> major_compact '-ROOT-'
  0 row(s) in 0.0173 seconds}}}
  
- In the above, the compaction took no time.  The case will likely be different 
for you if you have big tables.  The way to confirm that the major compaction 
completed is to do a listing of the hbase rootdir in hdfs.  For each region on 
the filesystem, each of its stores should have one mapfile only if major 
compaction succeeded.  For example, below we list whats under the 'a' table 
directory under the hbase rootdir:
+ In the above, the compaction took no time.  The case will likely be different 
for you if you have big tables.
+ 
+ To confirm that the major compaction completed, list the hbase rootdir in 
hdfs.  If major compaction succeeded, each store of each region on the 
filesystem should have only one mapfile.  For example, below we list what's 
under the 'a' table directory under the hbase rootdir:
  
  {{{/tmp/hbase-stack/hbase/a
  /tmp/hbase-stack/hbase/a/1833721875
@@ -77, +70 @@

  /tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/.index.crc
  /tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/index}}}
  
- There is one column family in this table named 'a' (unfortunately).  The 
table has one region whose encoded name is 1833721875.  Under this region 
directory, there are the info -- for metadata -- and mapfile directories.  
There is only one mapfile in our case above, named 8167759949199600085 
(MapFiles are made of data and index files).
+ There is one column family in this table named 'a' (unfortunately, since it 
muddles the example, the table name is also 'a').  The table has one region 
whose encoded name is 1833721875.  Under this region directory, there are 
family directories -- in this case there is one for the 'a' family -- and under 
each family directory, there are the {{{info}}} directory -- for store file 
metadata -- and the {{{mapfiles}}} directory.  There is only one mapfile in our 
case above, named 8167759949199600085 (MapFiles are made of data and index 
files).
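
The one-mapfile-per-store check described above can be scripted.  The sketch 
below is only an illustration and is not part of HBase: 
`stores_needing_compaction` is a hypothetical helper that reads a recursive 
listing on stdin (one entry per line, in the layout shown above) and prints 
every store directory that holds more than one mapfile.  It assumes the path 
is the last whitespace-separated field on each line, as in the output of 
`hadoop dfs -lsr`.

```shell
# Hypothetical helper: reads path listings on stdin and prints each store
# directory (".../<region>/<family>") that contains more than one mapfile.
stores_needing_compaction() {
  awk '
    {
      path = $NF                                   # path is the last field
      # Look for ".../mapfiles/<id>" and split it into store dir and id.
      if (match(path, /\/mapfiles\/[0-9]+/)) {
        store = substr(path, 1, RSTART - 1)
        id = substr(path, RSTART + 10, RLENGTH - 10)   # skip "/mapfiles/"
        if (!((store, id) in seen)) {
          seen[store, id] = 1                      # count each mapfile once
          count[store]++
        }
      }
    }
    END {
      for (s in count)
        if (count[s] > 1)
          print s
    }
  ' | sort
}
```

Assuming the hbase rootdir is `/hbase`, usage might look like 
`./bin/hadoop dfs -lsr /hbase | stores_needing_compaction`.  Empty output 
means no store in the listing has more than one mapfile.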
  
- You cannot migrate unless all has been major compacted first.  Major 
compaction is necessary because the way deletes work changed in 0.20 hbase.
+ You cannot migrate unless every table has been major compacted first.
  
- -ROOT- and .META. flush frequently so could mess up your nice and tidy 
single-file per store major_compacted hbase layout.  They won't flush if there 
have not been edits.  So, make sure your cluster is not taking writes and 
hasn't been doing so for a good while before starting up the major compaction 
process.  Getting your cluster to shutdown with one file only in -ROOT- and 
.META. may be a bit tough so to help, facility has been added to the HEAD of 
the 0.19 branch that will allow you major compact catalog regions in a shutdown 
hbase.  This facility only works on the -ROOT- and .META. catalog tables, not 
on user space tables.  For usage, type:
+ -ROOT- and .META. flush frequently, so they can mess up your nice and tidy 
single-file-per-store major-compacted hbase layout.  They won't flush if there 
have not been edits, so make sure your cluster is not taking writes, and hasn't 
been for a good while, before starting the major compaction process.  Getting 
your cluster to shut down with only one file in -ROOT- and .META. may be a bit 
tough, so to help, a facility has been added to the HEAD of the 0.19 branch 
that allows you to major compact catalog regions in a shut-down hbase.  This 
facility only works on the -ROOT- and .META. catalog tables, not on user space 
tables.  For usage, type:
  
  {{{./bin/hbase org.apache.hadoop.hbase.regionserver.HRegion}}}
  
- For example:
+ For example, to major compact the -ROOT-:
  
  {{{$ ./bin/hbase org.apache.hadoop.hbase.regionserver.HRegion 
hdfs://aa0-000-12:9002/hbasetrunk2/-ROOT- major_compact}}}
  
@@ -93, +86 @@

  
  I had to copy the hadoop-site.xml to a location where it would be picked up 
by the above script -- e.g. from my hadoop 0.19 install to my 
{{{$HBASE_HOME/conf}}} -- so the script could find the right HDFS; otherwise it 
ran against the local filesystem.
  
+ ===== Can you back up your data? =====
+ Migration has been tested but if you have sufficient space in hdfs to make a 
copy of your hbase rootdir, do so.  Just in case.  Use hdfs distcp.
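
As a sketch of the backup suggested above, the helper below only prints a 
distcp invocation so it can be reviewed before anything is run.  The helper 
name `print_backup_command` and both URLs in the usage line are hypothetical 
placeholders; substitute your own hbase.rootdir and a destination with enough 
free space.

```shell
# Hypothetical helper: prints (does not run) a distcp command that copies
# the hbase rootdir to a backup location.
print_backup_command() {
  src=$1   # current hbase.rootdir
  dst=$2   # backup destination in hdfs
  printf 'hadoop distcp %s %s\n' "$src" "$dst"
}

# Placeholder URLs, for illustration only.
print_backup_command hdfs://namenode:9000/hbase hdfs://namenode:9000/hbase-backup
```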
+ 
  ==== Migrating ====
  
  Migrate hadoop. Refer to the [http://wiki.apache.org/hadoop/Hadoop%20Upgrade 
Hadoop Upgrade] page.
  
- Migrate HBase.
+ Migrate HBase.  The bulk of the time involved in migration is the rewriting 
of the hbase storefiles from their 0.19 format into the new 0.20 format.  Each 
rewrite takes about 6-10 seconds.  In the filesystem, count roughly how many 
regions you have (or get the count off the UI), then multiply the number of 
regions by 10 seconds.  If the migration will take longer than you are prepared 
to wait, there is a mapreduce job that does the file conversions only:
+ 
+ {{{$ ./bin/hadoop jar hbase.jar hsf2sf}}}
+ 
+ This job takes an empty input and output directory.  It will first run 
through your filesystem to find all mapfiles to convert, write a file to the 
input directory, and then start up the mapreduce job to do the conversions.
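
The rough time estimate described earlier (number of regions multiplied by 
the 6-10 seconds each rewrite takes) can be worked out with a few lines of 
shell.  This is a back-of-the-envelope sketch: `estimate_rewrite_seconds` is 
a hypothetical helper, and it assumes roughly one storefile rewrite per 
region, which is the same simplification the text makes.

```shell
# Hypothetical helper: prints "<low> <high>" estimated seconds for the
# storefile rewrite, using the 6-10 seconds per rewrite quoted in the text.
estimate_rewrite_seconds() {
  echo "$(( $1 * 6 )) $(( $1 * 10 ))"
}

# 500 regions is an illustrative count, not a measured one.
estimate_rewrite_seconds 500
```

For 500 regions this prints `3000 5000`, i.e. somewhere between 50 and about 
83 minutes for the rewrite step alone.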
+ 
+ Now, run the hbase migration script.  If you have run the mapreduce job, the 
script will notice that all storefiles have been rewritten and will skip the 
rewrite step.  Otherwise, the migration script does the rewrite first.
+ 
+ {{{$ ./bin/hbase migrate upgrade}}}
+ 
  
  
  ==== Post-Migration ====
