Rick:
Thanks for the great feedback.
Other comments interspersed:
Rick Hangartner wrote:
...
2) In both approaches, when we tried to do the data migration from
hbase-0.1.3 to hbase-0.2.0 we first got migration failures due to
"unrecovered region server logs". Following the 'Redo Logs' comments
in the "http://wiki.apache.org/hadoop/Hbase/HowToMigrate" doc,
starting afresh with a new copy of our virtualized system each time,
we tried these methods of getting rid of those logs and the fatal
error:
a) deleting just the log files in the "/hbase" directory
Did this not work? How'd you do it?
b) deleting the entire contents of the "/hbase" directory (which
means we lost our data, but we are just investigating the upgrade
path, after all)
c) deleting the "/hbase" directory entirely and creating a new
"/hbase" directory.
I should also note that we would need to repeat approach a) to be 100%
certain of our results for that case. (We've already repeated
approaches b) and c) and have just run out of time for these upgrade
tests because we need to get to other things).
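For approach a), the intent is to remove only the per-region-server log directories while leaving the rest of the /hbase layout (in particular the hbase.version file) intact. Here is a minimal sketch of that, simulated on the local filesystem rather than HDFS; the log directory names and the default /hbase rootdir are assumptions, and on a real cluster the equivalent would be "bin/hadoop fs -rmr" against the log directories:

```shell
# Simulate the 0.1.x layout locally (assumption: region server logs
# live in directories named log_<host>_<startcode>_<port> under the
# hbase rootdir; on a real cluster use "bin/hadoop fs" commands
# instead of plain filesystem commands).
HBASE_ROOT="$(mktemp -d)/hbase"
mkdir -p "$HBASE_ROOT/log_10.0.0.45_1216750996000_60020"
mkdir -p "$HBASE_ROOT/log_10.0.0.46_1216750997000_60020"
echo "0.1" > "$HBASE_ROOT/hbase.version"

# Delete only the log directories, keeping hbase.version so the
# migrate tool can still tell what version it is upgrading from.
rm -r "$HBASE_ROOT"/log_*

ls "$HBASE_ROOT"
```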
In all cases, the migration then failed as follows:
[EMAIL PROTECTED]:~/hbase$ bin/hbase migrate upgrade
08/07/22 18:03:16 INFO util.Migrate: Verifying that file system is
available...
08/07/22 18:03:16 INFO util.Migrate: Verifying that HBase is not
running...
08/07/22 18:03:17 INFO ipc.Client: Retrying connect to server:
savory1/10.0.0.45:60000. Already tried 1 time(s).
...
08/07/22 18:03:26 INFO ipc.Client: Retrying connect to server:
savory1/10.0.0.45:60000. Already tried 10 time(s).
08/07/22 18:03:27 INFO util.Migrate: Starting upgrade
08/07/22 18:03:27 FATAL util.Migrate: Upgrade failed
java.io.IOException: Install 0.1.x of hbase and run its migration first
at org.apache.hadoop.hbase.util.Migrate.run(Migrate.java:181)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.hbase.util.Migrate.main(Migrate.java:446)
This would seem to indicate that the hbase.version file is missing from
the /hbase directory.
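If that diagnosis is right, a quick sanity check before running the upgrade is to look for the version file. A sketch, again simulated locally; on a real cluster the check would be "bin/hadoop fs -cat /hbase/hbase.version", assuming the default rootdir:

```shell
# Local simulation (assumption: default hbase.rootdir of /hbase; on a
# real cluster substitute "bin/hadoop fs -test -e" and "-cat" for the
# plain filesystem commands below).
ROOT="$(mktemp -d)"
echo "0.1" > "$ROOT/hbase.version"

if [ -f "$ROOT/hbase.version" ]; then
  echo "found version: $(cat "$ROOT/hbase.version")"
else
  echo "hbase.version missing - migrate cannot tell what to upgrade from"
fi
```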
As you can see, we were then in a bit of a Catch-22. To re-install
HBase-0.1.x also required re-installing Hadoop-0.16.4 (we tried
reinstalling HBase-0.1.x without doing that!) so there was no way to
proceed. Attempting to start up HBase-0.2.0 just resulted in an error
message that we needed to do the migrate.
Hmm. I suppose this error message is kinda useless. If you've already
made the commitment to 0.17.x, you can't really go back.
Luo Ning made a patch so you can run 0.1.3 hbase on 0.17.1 hadoop:
https://issues.apache.org/jira/browse/HBASE-749. Maybe this is what we
should be recommending folks do in the migration doc. and in the
message the migration emits?
3) Since this was just a test, we then blew away the disk used by
Hadoop and re-built the namenode per a standard Hadoop new install.
Hadoop-0.17.1 and HBase-0.2.0 then started up just fine. We only ran
a few tests with the new HBase command-line shell, in the same ways we
used the old HQL shell for sanity checks, and everything seems copacetic.
A few other comments:
- The new shell takes a bit of getting used to, but seems quite
functional (we're not the biggest Ruby fans, but hey, someone took
this on and upgraded the shell so we just say: Thanks!)
Smile.
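For anyone on the list who hasn't tried it yet, a hypothetical session with the new JRuby-based shell might look something like the following; the table and column family names are made up, and the exact syntax may differ in the release candidate:

```shell
$ bin/hbase shell
hbase> create 'test_table', {NAME => 'colfam'}
hbase> put 'test_table', 'row1', 'colfam:greeting', 'hello'
hbase> get 'test_table', 'row1'
hbase> scan 'test_table'
hbase> exit
```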
- We really like how timestamps have become first-class objects in the
HBase-0.2.0 API. However, we were in the middle of developing some
code under HBase-0.1.3 with workarounds for timestamps not being
first-class objects, and we will have to decide whether we should
back up and re-develop for HBase-0.2.0 (we know we should), or plunge
ahead with what we were doing under HBase-0.1.3 only to discard it in
the near future because of the other advantages of HBase-0.2.0. Is
there anything we should consider in making this decision, perhaps
the timing of any bug fixes and an official release of HBase-0.2.0
(HBase-0.2.1?)?
We're sick of looking at 0.1.x hbase. Can you factor that into your
decision regarding hbase 0.2 or 0.1?
Joking aside, a stable offering is our #1 priority ahead of all else,
whether new features, performance, etc. In some ways, I'd guess 0.2.0
will probably be less stable than 0.1.3, being new, but in others it will
be more so (e.g. it has region balancing, so no more will you start a
cluster and see one node carrying 2 regions and its partner 200).
It's hard to say. The best thing to do is as you're doing: testing the
0.2.0 release candidate. With enough folks banging on it, it's possible
that it will not only have more features but also be as stable if not
more so than our current 0.1.3.
Thanks again for the feedback.
St.Ack