Rick:

Thanks for the great feedback.

Other comments interspersed:

Rick Hangartner wrote:
...
2) In both approaches, when we tried to do the data migration from hbase-0.1.3 to hbase-0.2.0, we first got migration failures due to "unrecovered region server logs". Following the 'Redo Logs' comments in the "http://wiki.apache.org/hadoop/Hbase/HowToMigrate" doc, and starting afresh with a new copy of our virtualized system each time, we tried these methods of getting rid of those logs and the fatal error:

   a) deleting just the log files in the "/hbase" directory

Did this not work?  How'd you do it?


   b) deleting the entire contents of the "/hbase" directory (which means we lost our data, but we are just investigating the upgrade path, after all)

   c) deleting the "/hbase" directory entirely and creating a new "/hbase" directory.

I should also note that we would need to repeat approach a) to be 100% certain of our results for that case. (We've already repeated approaches b) and c) and have just run out of time for these upgrade tests because we need to get to other things).
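For what it's worth, approach a) can be sketched as below. This uses a local directory as a stand-in for the HDFS /hbase root (against a real cluster these would be `hadoop fs` deletes rather than local ones), and the log directory naming is an assumption:

```shell
# Stand-in for the HDFS /hbase root; on a cluster these would be
# "hadoop fs" operations rather than local filesystem ones.
HBASE_ROOT=$(mktemp -d)/hbase
mkdir -p "$HBASE_ROOT/log_10.0.0.45_1234" "$HBASE_ROOT/mytable"  # assumed layout
touch "$HBASE_ROOT/hbase.version"

# Approach a): remove only the region server log directories, leaving the
# table data and the hbase.version file in place so that
# "bin/hbase migrate upgrade" can still recognize the filesystem layout.
rm -r "$HBASE_ROOT"/log_*

ls "$HBASE_ROOT"
```

The point of doing it this narrowly is that only the logs go away; everything migrate needs to identify the install survives.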

In all cases, the migrate then failed as:

[EMAIL PROTECTED]:~/hbase$ bin/hbase migrate upgrade
08/07/22 18:03:16 INFO util.Migrate: Verifying that file system is
available...
08/07/22 18:03:16 INFO util.Migrate: Verifying that HBase is not running...
08/07/22 18:03:17 INFO ipc.Client: Retrying connect to server:
savory1/10.0.0.45:60000. Already tried 1 time(s).
 ...
08/07/22 18:03:26 INFO ipc.Client: Retrying connect to server:
savory1/10.0.0.45:60000. Already tried 10 time(s).
08/07/22 18:03:27 INFO util.Migrate: Starting upgrade
08/07/22 18:03:27 FATAL util.Migrate: Upgrade failed
java.io.IOException: Install 0.1.x of hbase and run its migration first
       at org.apache.hadoop.hbase.util.Migrate.run(Migrate.java:181)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
       at org.apache.hadoop.hbase.util.Migrate.main(Migrate.java:446)

This would seem to indicate that the hbase.version file is missing from under the /hbase directory.
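If that is the diagnosis, a quick check for the version file before running migrate would confirm it. A minimal sketch, again using a local stand-in directory for the HDFS /hbase root (on a real cluster this would be something like an `hadoop fs -ls /hbase` to see whether the file is there):

```shell
# Hypothetical pre-flight check: the migrate tool's "Install 0.1.x ..." error
# fires when it cannot find the version file, so look for it first.
# Local stand-in for the HDFS /hbase root.
HBASE_ROOT=$(mktemp -d)/hbase
mkdir -p "$HBASE_ROOT"                      # note: no hbase.version written

if [ -e "$HBASE_ROOT/hbase.version" ]; then
  echo "version file present; migrate should see a recognizable layout"
else
  echo "hbase.version missing; migrate will treat this as un-migrated 0.1.x data"
fi
```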


As you can see, we were then in a bit of a Catch-22. Re-installing HBase-0.1.x would also have required re-installing Hadoop-0.16.4 (we tried reinstalling HBase-0.1.x without doing that!), so there was no way to proceed. Attempting to start up HBase-0.2.0 just resulted in an error message that we needed to do the migrate.

Hmm. I suppose this error message is kinda useless. If you've already made the commitment to 0.17.x, you can't really go back.

Luo Ning made a patch so you can run 0.1.3 hbase on 0.17.1 hadoop: https://issues.apache.org/jira/browse/HBASE-749. Maybe this is what we should be recommending folks do in the migration doc and in the message the migrate tool emits?


3) Since this was just a test, we then blew away the disk used by Hadoop and re-built the namenode per a standard Hadoop new install. Hadoop-0.17.1 and Hbase-0.2.0 then started up just fine. We only ran a few tests with the new Hbase command-line shell, in some of the ways we used the old HQL shell for sanity checks, and everything seems copacetic.

A few other comments:

- The new shell takes a bit of getting used to, but seems quite functional (we're not the biggest Ruby fans, but hey, someone took this on and upgraded the shell so we just say: Thanks!)

Smile.

- We really like how timestamps have become first-class objects in the HBase-0.2.0 API. However, we were in the middle of developing some code under HBase-0.1.3 with workarounds for timestamps not being first-class objects, and we will have to decide whether we should back up and re-develop for HBase-0.2.0 (we know we should), or plunge ahead with what we were doing under HBase-0.1.3 just to discard it in the near future because of the other advantages of HBase-0.2.0. Is there anything we should consider in making this decision, perhaps about timing of any bug fixes and an official release of HBase-0.2.0? (HBase-0.2.1?).
We're sick of looking at 0.1.x hbase. Can you factor that into your decision regarding hbase 0.2 or 0.1?

Joking aside, a stable offering is our #1 priority, ahead of all else whether new features, performance, etc. In some ways, I'd guess 0.2.0 will probably be less stable than 0.1.3, being new, but in others it will be more so (e.g. it has region balancing, so no more will you start a cluster and see one node carrying 2 regions and its partner 200). It's hard to say. Best thing to do is what you're doing: testing the 0.2.0 release candidate. With enough folks banging on it, it's possible that it will not only have more features but also be as stable, if not more so, than our current 0.1.3.

Thanks again for the feedback.

St.Ack
