If your data is not too complex (multi fields, etc.), you could try using the MySQL bin logs. Just use mysqlbinlog http://dev.mysql.com/doc/refman/5.0/en/mysqlbinlog.html to generate a text version of the bin logs, process that text with a map, and then reduce it into the table.

This would not provide live data, but you could run a simple shell script to process the bin logs and then delete or move them. If you needed to sync up, you could tell MySQL to start a new bin log. The shell script could be run as a cron job; it would pick up the latest bin log and start the job.
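
A minimal sketch of the decode-then-move step, written in Python instead of shell so it is easy to test. It assumes mysqlbinlog is on the PATH. The function name, the two directory arguments, and the injectable `run` parameter are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def decode_and_archive(binlog, text_dir, archive_dir, run=subprocess.run):
    """Decode one bin log to text with mysqlbinlog, then move the bin log away.

    `run` is injectable only so the sketch can be exercised without MySQL.
    """
    binlog = Path(binlog)
    text_dir, archive_dir = Path(text_dir), Path(archive_dir)
    text_dir.mkdir(parents=True, exist_ok=True)
    archive_dir.mkdir(parents=True, exist_ok=True)

    # mysqlbinlog writes the decoded statements to stdout.
    result = run(
        ["mysqlbinlog", str(binlog)],
        capture_output=True,
        text=True,
        check=True,
    )

    text_path = text_dir / (binlog.name + ".sql")
    text_path.write_text(result.stdout)

    # Move the bin log only after the text version has been written.
    shutil.move(str(binlog), str(archive_dir / binlog.name))
    return text_path
```

To make MySQL start a new bin log before a sync, `mysqladmin flush-logs` is one option. Moving files out of the live log directory can leave the server's bin log index out of date, so that part would need care on a real server.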

I would use the Linux command
find /binlog/location/*.bin -mmin +5
to find the logs that are ready to process.
That will give you all the bin logs that have not been modified in the last 5 minutes.
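
If the cron job is written in Python rather than shell, roughly the same selection can be sketched as below. The function name and its parameters are hypothetical, and the age check is a plain comparison of modification times, so it only approximates the rounding that find applies with -mmin.

```python
import time
from pathlib import Path


def stale_binlogs(directory, pattern="*.bin", min_age_minutes=5, now=None):
    """Return files matching `pattern` last modified more than `min_age_minutes` ago."""
    now = time.time() if now is None else now
    cutoff = now - min_age_minutes * 60
    return sorted(
        p for p in Path(directory).glob(pattern)
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

Calling `stale_binlogs("/binlog/location")` would then return the paths that are safe to hand to the processing step.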

If your insert/update queries are not too complex to process, it would be simple.
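
To give an idea of what "not too complex" means, here is a minimal sketch of the map and reduce steps over the text that mysqlbinlog produces. It assumes single-row INSERT statements with an explicit column list and values that contain no commas or parentheses; anything else is skipped. The function names and the choice of key column are hypothetical, and all values are kept as strings.

```python
import re

# Matches only the simplest form: INSERT INTO t (a, b) VALUES (1, 'x')
INSERT_RE = re.compile(
    r"INSERT\s+INTO\s+`?(\w+)`?\s*\(([^)]*)\)\s*VALUES\s*\(([^)]*)\)",
    re.IGNORECASE,
)


def map_insert(statement):
    """Map one simple INSERT statement to (table, {column: value}), or None."""
    m = INSERT_RE.search(statement)
    if m is None:
        return None
    table, cols, vals = m.groups()
    columns = [c.strip().strip("`") for c in cols.split(",")]
    # Naive split: breaks on values that contain commas or parentheses.
    values = [v.strip().strip("'") for v in vals.split(",")]
    if len(columns) != len(values):
        return None
    return table, dict(zip(columns, values))


def reduce_rows(mapped, key_column):
    """Reduce mapped rows into {(table, key): row}; later statements win."""
    out = {}
    for item in mapped:
        if item is None or key_column not in item[1]:
            continue
        table, row = item
        out.setdefault((table, row[key_column]), {}).update(row)
    return out
```

Each entry in the reduced result corresponds to one row to write to the target table, keyed by the source table name and the chosen key column.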

Billy



"Brian Forney" <[email protected]> wrote in message news:[email protected]...
Ryan,

Thanks. Yep, I've read the Bigtable paper (now and in 2006) and understand that HBase and Bigtable are essentially large maps and do not use the relational model.

Still interested in hearing if others have successfully done this. (I'm mostly looking for ways to speed up the implementation of a one-way replication: from a relational DB to HBase.)

Thanks,
Brian

On Apr 17, 2009, at 5:45 PM, Ryan Rawson wrote:

HBase is not a relational database, so many things that are in a SQL
database don't exist.

eg:
- sequences
- secondary declarative keys
- joins
- advanced query features such as order by, group by
- operators of any kind

Given conventions (eg: naming of index tables), it might be possible to
semi-automatically convert data, but it might not efficiently take advantage
of HBase's unique schema-less design.

I suggest you have a look at Google's Bigtable paper, as it has the same
underlying model that HBase does.

Good luck!


On Fri, Apr 17, 2009 at 3:30 PM, Brian Forney <[email protected]> wrote:

Hi all,

I'd like to replicate a large dataset from a relational database into HBase for better throughput of MapReduce jobs. Has anyone had success replicating
from a relational database (in my case SQL Server) to HBase?

Thanks,
Brian




