It would be handy to be able to dump data from PostgreSQL straight into HBase, and then keep the data in HBase up to date.
I've made a simple Python tool called hbreplic (I'm very willing to come up with an easier-to-type name). It has two main modes: bootstrap, where it copies columns from PostgreSQL tables to HBase, and play, where it processes incoming insert, update and delete events on the PostgreSQL tables and updates HBase with them.

The HBase table/family/column layout is whatever you want it to be. The HBase row keys are currently taken from a specified PostgreSQL column (presumably the primary key, but this is not enforced), with an optional prefix. It copes with schema changes, in that it doesn't care what the table looks like as long as the table has the columns that you specify in an ini file.

It makes use of PgQ, which is part of Skytools (a set of PostgreSQL database tools released by Skype). PgQ is a queue manager for events. hbreplic depends on Python, Skytools, and Thrift.

It's pretty rudimentary at the moment, but easy to use. We'd like to open source it and make it better. Would people be interested in this? Is there some kind of HBase contrib we could potentially add this to? On Monday we'll probably make the source available somewhere with instructions.

~Tim Sell.
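
P.S. To make the row key and column mapping a bit more concrete, here is a rough sketch of the idea. This is not the actual hbreplic code: the ini layout, the option names (hbase_table, key_column, key_prefix, columns) and the function names are all made up for illustration, and a plain Python dict stands in for the HBase table, so there is no PgQ or Thrift involved.

```python
import configparser

# Hypothetical ini layout, invented for this sketch; the real
# hbreplic config format may look quite different.
EXAMPLE_INI = """
[users]
hbase_table = users
key_column = id
key_prefix = user_
columns = name:info:name, email:info:email
"""


def load_mapping(ini_text, section):
    """Parse one table section into a mapping description."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    sec = parser[section]
    columns = {}
    # Each item is "<postgresql column>:<hbase family>:<hbase qualifier>".
    for item in sec["columns"].split(","):
        pg_col, family, qualifier = item.strip().split(":")
        columns[pg_col] = f"{family}:{qualifier}"
    return {
        "hbase_table": sec["hbase_table"],
        "key_column": sec["key_column"],
        "key_prefix": sec.get("key_prefix", fallback=""),
        "columns": columns,
    }


def row_key(mapping, pg_row):
    """Build the row key: optional prefix + value of the key column."""
    return f"{mapping['key_prefix']}{pg_row[mapping['key_column']]}"


def apply_event(store, mapping, op, pg_row):
    """Apply an insert/update/delete event to `store`.

    `store` is a plain dict standing in for an HBase table:
    {row_key: {"family:qualifier": value}}.
    """
    key = row_key(mapping, pg_row)
    if op == "delete":
        store.pop(key, None)
        return
    cells = store.setdefault(key, {})
    # Only configured columns are copied; anything else in the
    # PostgreSQL row is ignored, so extra columns do no harm.
    for pg_col, hbase_col in mapping["columns"].items():
        if pg_col in pg_row:
            cells[hbase_col] = str(pg_row[pg_col])
```

In this sketch, bootstrap would amount to calling apply_event with "insert" for every existing row, and play would call it once per queued event.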
