Tim Sell wrote:
It would be handy to be able to easily dump data from postgresql
straight to hbase, and then keep the data in hbase up to date.
I've made a simple python tool called hbreplic (I'm very willing to
come up with an easier to type name).
How do you pronounce that?
It has two main modes. The first is bootstrap, where it copies columns
from postgresql tables to hbase.
The second is play, where it processes incoming insert, update and delete
events on the postgresql tables and updates hbase with them.
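
To make the two modes concrete, here is a minimal sketch. It is not hbreplic's actual code: a plain dict stands in for an hbase table, and the function names and event shape (operation, row key, changed columns) are assumptions made for illustration.

```python
# Hypothetical sketch: `table` is a dict of row key -> {column: value},
# standing in for an hbase table. None of these names come from hbreplic.

def bootstrap(table, rows, key_column):
    """Copy every source row into `table`, keyed by the value of `key_column`."""
    for row in rows:
        table[str(row[key_column])] = dict(row)
    return table


def apply_event(table, op, row_key, columns=None):
    """Apply one insert, update or delete event to `table`."""
    if op == "delete":
        # Deleting a row that is already gone is treated as a no-op.
        table.pop(row_key, None)
    elif op in ("insert", "update"):
        # Both become a put of the supplied columns; other cells are left alone.
        table.setdefault(row_key, {}).update(columns or {})
    else:
        raise ValueError("unknown operation: %r" % (op,))
    return table
```

Bootstrap would run once over the existing rows, and play would then call something like apply_event for each queued event in order.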
The hbase table/family/column layout is whatever you want it to be.
The hbase row keys at the moment are taken from a specified postgresql
column (presumably the primary key, but not enforced), with an
optional prefix.
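
As a rough illustration of the row key rule, assuming the key is simply the prefix followed by the column value as text (the function name and the exact concatenation are assumptions, not taken from hbreplic):

```python
def make_row_key(row, key_column, prefix=""):
    """Build an hbase row key from one column of a postgresql row.

    Hypothetical helper: assumes the key is the optional prefix
    followed by the column value converted to a string.
    """
    return prefix + str(row[key_column])
```

For example, a row with id 42 and the prefix "user:" would get the row key "user:42". Since uniqueness of the column is not enforced, two rows with the same value would map to the same hbase row.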
It handles schema changes, in that it doesn't care what the table
looks like as long as the table has the columns that you specify in an
ini file.
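
A sketch of what such an ini file and its handling might look like. The section and option names below are invented for illustration and hbreplic's real format may differ. The point is that only the listed columns need to exist, so extra columns on the table are ignored.

```python
import configparser

# Hypothetical ini layout; hbreplic's real option names may differ.
EXAMPLE_INI = """\
[users]
pg_table = users
hbase_table = users
key_column = id
key_prefix = user:
columns =
    name = info:name
    email = contact:email
"""


def load_mapping(ini_text):
    """Parse the ini text into {section: mapping} dictionaries."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    mappings = {}
    for section in parser.sections():
        opts = parser[section]
        columns = {}
        # Each continuation line maps a postgresql column to family:qualifier.
        for line in opts["columns"].splitlines():
            if line.strip():
                pg_col, hbase_col = (part.strip() for part in line.split("=", 1))
                columns[pg_col] = hbase_col
        mappings[section] = {
            "pg_table": opts["pg_table"],
            "hbase_table": opts["hbase_table"],
            "key_column": opts["key_column"],
            "key_prefix": opts.get("key_prefix", fallback=""),
            "columns": columns,
        }
    return mappings


def missing_columns(mapping, table_columns):
    """Return configured columns absent from the table; extra columns are fine."""
    needed = set(mapping["columns"]) | {mapping["key_column"]}
    return sorted(needed - set(table_columns))
```

With this layout, adding a column to the postgresql table changes nothing, while dropping a configured column shows up in missing_columns.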
It makes use of PgQ which is part of skytools (a bunch of postgresql
database tools released by skype).
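PgQ is a queue for events that lives inside postgresql: triggers on the
tables write row changes to the queue and consumers read them in batches.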
It depends on python, skytools, and thrift.
It's pretty rudimentary at the moment, but easy to use.
We'd like to open source it and make it better.
Would people be interested in this?
Is there some kind of hbase contrib we could potentially add this to?
On Monday we'll probably make the source available somewhere with instructions.
It sounds excellent, Tim. A nice contrib. If you want to add it, add it
to a JIRA and I'll add it under hbase/contrib. Add a bit of doc. so
browsers can figure out what it is -- especially since the current name
gives no clue what it does (smile).
St.Ack