[Hadoop Wiki] Update of "Hbase/HowToMigrate" by stack

Apache Wiki Thu, 23 Jul 2009 13:48:02 -0700


The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/HowToMigrate

------------------------------------------------------------------------------
  
  The way to confirm that the major compaction completed is to do a listing of 
the hbase rootdir in hdfs.  If major compaction succeeded, each store of each 
region on the filesystem should hold exactly one mapfile.  For example, below 
we list what's under the 'a' table directory under the hbase rootdir:
  
+ {{{/tmp/hbase-stack/hbase/a
+ /tmp/hbase-stack/hbase/a/1833721875
+ /tmp/hbase-stack/hbase/a/1833721875/a
+ /tmp/hbase-stack/hbase/a/1833721875/a/info
+ /tmp/hbase-stack/hbase/a/1833721875/a/info/8167759949199600085
+ /tmp/hbase-stack/hbase/a/1833721875/a/info/.8167759949199600085.crc
+ /tmp/hbase-stack/hbase/a/1833721875/a/mapfiles
+ /tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085
+ /tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/data
+ /tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/.data.crc
+ /tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/.index.crc
+ /tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/index}}}
+ 
+ There is one column family in this table named 'a' (unfortunately, since it 
muddles the example, the table name is also 'a').  The table has one region 
whose encoded name is 1833721875.  Under this region directory, there are 
family directories -- in this case there is one for the 'a' family -- and under 
each family directory, there is the {{{info}}} -- for store file metadata -- 
and the {{{mapfiles}}} directories.  There is only one mapfile in our case 
above, named 8167759949199600085 (MapFiles are made of data and index files).
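
The check described above can be sketched as a small shell function. This is only a sketch: {{{count_mapfiles}}} is a hypothetical helper, not part of hbase, and it assumes it is fed one path per line in the same layout as the listing above.

```shell
# Hypothetical helper: reads one path per line on stdin and prints
# "<store directory> <mapfile count>" for each store it sees.
# A path is counted as a mapfile when its parent directory is "mapfiles".
count_mapfiles() {
  awk -F/ 'NF >= 2 && $(NF-1) == "mapfiles" {
    store = $0
    # Strip the trailing "/mapfiles/<id>" to get the store (family) directory.
    sub("/mapfiles/[^/]*$", "", store)
    count[store]++
  }
  END { for (s in count) print s, count[s] }'
}
```

Fed the listing above, it prints {{{/tmp/hbase-stack/hbase/a/1833721875/a 1}}}.  Any store reporting a count greater than 1 has not been fully major compacted.  Against a live HDFS, the paths could probably come from a recursive listing such as {{{hadoop fs -lsr}}} of the rootdir, assuming the path is the last column of its output.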
+ 
+ You cannot migrate until every store has been major compacted.  
+ 
+ -ROOT- and .META. flush frequently, so they can mess up your nice and tidy 
single-file-per-store major compacted hbase layout.  They won't flush if there 
have been no edits, so make sure your cluster is not taking writes, and has not 
been for a good while, before starting the major compaction process.  Getting 
your cluster to shut down with only one file in -ROOT- and .META. may be a bit 
tough, so to help, a facility has been added to the HEAD of the 0.19 branch 
that allows you to major compact catalog regions in a shut-down hbase.  This 
facility only works on the -ROOT- and .META. catalog tables, not on user space 
tables.  For usage, type:
+ 
+ {{{./bin/hbase org.apache.hadoop.hbase.regionserver.HRegion}}}
+ 
+ For example, to major compact the -ROOT-:
+ 
+ {{{$ ./bin/hbase org.apache.hadoop.hbase.regionserver.HRegion 
hdfs://aa0-000-12:9002/hbasetrunk2/-ROOT- major_compact}}}
+ 
+ Don't forget the 'major_compact' on the end; without it the command just 
lists out the content of the region.
+ 
+ I had to copy hadoop-site.xml to a location where the above script would 
pick it up -- e.g. from my hadoop 0.19 install to my {{{$HBASE_HOME/conf}}} -- 
so that it could find the right HDFS; otherwise it ran against the local 
filesystem.
+ 
+ ===== Can you back up your data? =====
+ Migration has been tested, but if you have sufficient space in hdfs to make a 
copy of your hbase rootdir, do so, just in case.  Use hadoop distcp.
+ 
+ ==== Migrating ====
+ 
+ Migrate hadoop. Refer to the [http://wiki.apache.org/hadoop/Hadoop%20Upgrade 
Hadoop Upgrade] page.
+ 
+ Migrate HBase.  The bulk of the time involved in migration is the rewriting 
of the hbase storefiles from their 0.19 format into the new 0.20 format.  Each 
rewrite takes about 6-10 seconds.  In the filesystem, count roughly how many 
regions you have (or get it off the UI), then multiply the number of regions 
by 10 seconds.  If the migration will take longer than you are prepared to 
wait, there is a mapreduce job to do the file conversions only:
+ 
+ {{{$ ./bin/hadoop jar hbase-0.20.x.jar hsf2sf}}}
+ 
+ This job takes an empty input and output directory.  It will first run 
through your filesystem to find all mapfiles to convert, write a file to the 
input directory and then start up the mapreduce job to do the conversions.
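
To help decide between the in-place rewrite and the mapreduce job, the regions-times-10-seconds estimate above can be sketched as below.  {{{estimate_rewrite_seconds}}} is a hypothetical helper; the optional second argument (stores per region) is an assumption added for illustration, on the reasoning that a table with several column families has one store per family.

```shell
# Hypothetical helper: rough upper bound, in seconds, for the storefile rewrite.
# $1 = number of regions
# $2 = stores (column families) per region; defaults to 1 (assumption)
estimate_rewrite_seconds() {
  regions=$1
  stores_per_region=${2:-1}
  # 10 seconds is the upper end of the 6-10 second range quoted above.
  echo $(( regions * stores_per_region * 10 ))
}

estimate_rewrite_seconds 500    # prints 5000, i.e. roughly 83 minutes
```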
+ 
+ Now, run the hbase migration script.  If you have run the mapreduce job, it 
will notice that all storefiles have been rewritten and will skip the rewrite 
step.  Otherwise, the migration script does the rewrite first.
+ 
+ {{{$ ./bin/hbase migrate upgrade}}}
+ 
+ 
+ ==== Post-Migration ====
+ Make sure you replace everything under {{{$HBASE_HOME/conf}}} with files 
from the new release.  For example, be sure to replace your old 
hbase-default.xml with the version from the new hbase release.
+ 
+ Read the new 'Getting Started' carefully before starting up your cluster.  
Basic configuration properties have changed.  For example, 
{{{hbase.master}}} and {{{hbase.master.hostname}}} are no longer used; they are 
replaced by {{{hbase.cluster.distributed}}}.  See the 'Getting Started' for 
detail on how to set the new properties.  While your cluster will likely come 
up on the old configuration settings, you should move to the new configuration.
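
One way to spot leftover old settings is to search your site configuration for the retired property names.  The function below is a minimal sketch; {{{find_old_master_props}}} is a hypothetical helper, and it assumes each property name sits inside a {{{<name>}}} element on a single line.

```shell
# Hypothetical helper: print (with line numbers) any <name> entries in the
# given config file that still use the retired hbase.master or
# hbase.master.hostname properties. Exit status is 0 only when a match is found.
find_old_master_props() {
  grep -n -E '<name>[[:space:]]*hbase\.master(\.hostname)?[[:space:]]*</name>' "$1"
}
```

For example, {{{find_old_master_props "$HBASE_HOME/conf/hbase-site.xml"}}} prints nothing once the file has been moved to the new properties.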
+ 
+ == From 0.1.x to 0.2.x or 0.18.x ==
+ 
+ The following are step-by-step instructions for migrating from HBase 0.1 to 
0.2 or 0.18.  Migration from 0.1 to 0.2 requires an upgrade from Hadoop 0.16 to 
0.17, and migration from 0.1 to 0.18 requires an upgrade from Hadoop 0.16 to 
0.18. The [http://wiki.apache.org/hadoop/Hadoop%20Upgrade Hadoop Upgrade 
Instructions] are slightly out-of-date (as of this writing, September 2008).  
As such, the below instructions also clarify the necessary steps for upgrading 
Hadoop.
+ 
+ Assume Hadoop 0.16 and HBase 0.1 are already running with data you wish to 
migrate to HBase 0.2.
+  * Stop HBase 0.1.
+  * From the [http://wiki.apache.org/hadoop/Hadoop%20Upgrade Hadoop Upgrade 
Instructions], perform steps 1-4 and 9-10 (and optionally 5-8, 11-12) on your 
instance of Hadoop 0.16.
+  * Run {{{{$HADOOP_HOME_0.17}/bin/start-dfs.sh -upgrade}}}
+  * Perform Hadoop upgrade steps 16-19 on your instance of Hadoop 0.17.
+  * Run {{{{$HADOOP_HOME_0.17}/bin/hadoop dfsadmin -finalizeUpgrade}}}
+  * Download and configure HBase 0.2.  Make sure ''hbase.rootdir'' is 
configured to be the same as it was in HBase 0.1.
+  * Run {{{{$HBASE_HOME_0.2}/bin/hbase migrate upgrade}}}
+  * Start HBase 0.2.
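
The order of the commands in the steps above can be sketched as a dry-run script.  This is an illustration only: {{{HADOOP_HOME_017}}}, {{{HBASE_HOME_01}}} and {{{HBASE_HOME_02}}} are hypothetical variable names for the install directories, using {{{stop-hbase.sh}}} and {{{start-hbase.sh}}} to stop and start HBase is an assumption, and the manual steps from the Hadoop Upgrade page are left as comments.

```shell
# With DRY_RUN=1 (the default) each command is printed instead of executed.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

upgrade_01_to_02() {
  run "$HBASE_HOME_01/bin/stop-hbase.sh"
  # Manual: Hadoop upgrade steps 1-4 and 9-10 on the Hadoop 0.16 instance.
  run "$HADOOP_HOME_017/bin/start-dfs.sh" -upgrade
  # Manual: Hadoop upgrade steps 16-19 on the Hadoop 0.17 instance.
  run "$HADOOP_HOME_017/bin/hadoop" dfsadmin -finalizeUpgrade
  # Manual: download and configure HBase 0.2 with the same hbase.rootdir.
  run "$HBASE_HOME_02/bin/hbase" migrate upgrade
  run "$HBASE_HOME_02/bin/start-hbase.sh"
}
```

Running {{{upgrade_01_to_02}}} with the default {{{DRY_RUN=1}}} only prints the five commands in order, which makes it easy to review them before doing anything for real.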
+ 
+ As you will notice, the [http://wiki.apache.org/hadoop/Hadoop%20Upgrade 
Hadoop Upgrade Instructions] (specifically steps 2-4, 16-18) ask you to 
generate several logs to compare and ensure that the upgrade ran correctly.  I 
did notice some inconsistency in my logs between ''dfs-v-old-report-1.log'' and 
''dfs-v-new-report-1.log''; specifically the ''Total effective bytes'' and 
''Effective replication multiplier'' fields did not match (in the new log, the 
values reported were zero and infinity, respectively).  Additionally, 
''dfs-v-new-report-1.log'' claimed that the upgrade was not finalized.  Running 
{{{{$HADOOP_HOME}/bin/hadoop dfsadmin -finalizeUpgrade}}} resolves the second 
issue, finalizing the upgrade as expected.  I could not find a way to resolve 
the inconsistencies with the ''Total effective bytes'' and ''Effective 
replication multiplier'' fields.  Nonetheless, I found no problems with the 
migration and the data appeared to be completely intact.
+ 
+ The API in 0.2 is not backward-compatible with hbase 0.1 versions.  See 
[http://wiki.apache.org/hadoop/Hbase/Plan-0.2/APIChanges API Changes] for 
discussion of the main differences.
+
