You might take a look at Tim Sells' postgres to hbase uploader scripts here
for ideas:
http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/examples/uploaders/

St.Ack

2009/4/18 Billy Pearson <[email protected]>

> If your data is not too complex (multiple fields, etc.), you could try
> using the MySQL bin logs. Use mysqlbinlog
> (http://dev.mysql.com/doc/refman/5.0/en/mysqlbinlog.html) to process the
> bin logs and generate a text version of them, then process that text with
> a map and reduce it into the table. This would not provide live data, but
> you could run a simple shell script to process the bin logs and then
> delete or move them. If you needed to sync up, you could call mysql to
> start a new bin log. The shell script could be run as a cron job, and it
> would pick up the latest bin log and start the job.
>
> I would use the Linux command
>   find /binlog/location/*.bin -mmin +5
> to find the logs that are ready to process. That will give you all the
> bin logs that have not been modified in the last 5 minutes.
>
> If your insert/update queries are not too complex to process, it would
> be simple.
>
> Billy
>
> "Brian Forney" <[email protected]> wrote in message
> news:[email protected]...
>
>> Ryan,
>>
>> Thanks. Yep, I've read the Bigtable paper (now and in 2006) and understand
>> that HBase and Bigtable are essentially large maps and do not use the
>> relational model.
>>
>> Still interested in hearing if others have successfully done this. (I'm
>> mostly looking for ways to speed up the implementation of a one-way
>> replication: from a relational DB to HBase.)
>>
>> Thanks,
>> Brian
>>
>> On Apr 17, 2009, at 5:45 PM, Ryan Rawson wrote:
>>
>>> HBase is not a relational database, so many things that are in a SQL
>>> database don't exist.
>>>
>>> eg:
>>> - sequences
>>> - secondary declarative keys
>>> - joins
>>> - advanced query features such as order by, group by
>>> - operators of any kind
>>>
>>> Given conventions (eg: naming of index tables), it might be possible to
>>> semi-automatically convert data, but it might not efficiently take
>>> advantage of HBase's unique schema-less design.
>>>
>>> I suggest you have a look at Google's Bigtable paper, as it has the same
>>> underlying model that HBase does.
>>>
>>> Good luck!
>>>
>>> On Fri, Apr 17, 2009 at 3:30 PM, Brian Forney <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'd like to replicate a large dataset from a relational database into
>>>> HBase for better throughput of MapReduce jobs. Has anyone had success
>>>> replicating from a relational database (in my case SQL Server) to HBase?
>>>>
>>>> Thanks,
>>>> Brian
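
For what it's worth, the bin log approach Billy describes above could be
sketched roughly as follows in Python. This is only a minimal sketch under
stated assumptions: the function names (ready_binlogs, dump_binlog_to_text,
archive_binlog), the default "*.bin" pattern, the ".sql" output suffix, and
the archive directory are all made up for illustration. The only external
tool used is mysqlbinlog, called with a bin log file as its argument and its
text output captured from stdout. The map/reduce step that loads the text
into HBase is not shown.

```python
import shutil
import subprocess
import time
from pathlib import Path


def ready_binlogs(directory, min_age_minutes=5, pattern="*.bin", now=None):
    """Return bin logs not modified in the last `min_age_minutes` minutes.

    Roughly mirrors `find /binlog/location/*.bin -mmin +5`. find rounds file
    ages to whole minutes, so behaviour right at the boundary is not identical.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_minutes * 60
    return sorted(
        path
        for path in Path(directory).glob(pattern)
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def dump_binlog_to_text(binlog, out_dir):
    """Run mysqlbinlog on one bin log and write its text output into out_dir."""
    # Hypothetical naming scheme: <binlog name>.sql in the output directory.
    out_path = Path(out_dir) / (Path(binlog).name + ".sql")
    with open(out_path, "wb") as out:
        subprocess.run(["mysqlbinlog", str(binlog)], stdout=out, check=True)
    return out_path


def archive_binlog(binlog, archive_dir):
    """Move a processed bin log aside so the next cron run does not see it."""
    Path(archive_dir).mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(binlog), str(archive_dir)))
```

A cron-driven script would then loop over ready_binlogs("/binlog/location"),
call dump_binlog_to_text on each file, hand the resulting text to the
MapReduce job, and finish with archive_binlog once the job has succeeded.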
