Re: Replicating data into HBase

stack Sat, 18 Apr 2009 09:52:43 -0700

You might take a look at Tim Sells' postgres to hbase uploader scripts here
for ideas:
http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/examples/uploaders/
St.Ack


2009/4/18 Billy Pearson <[email protected]>

> If your data is not too complex (multiple fields, etc.), you could try
> using the MySQL bin logs. Use mysqlbinlog
> (http://dev.mysql.com/doc/refman/5.0/en/mysqlbinlog.html) to generate a
> text version of the logs, then process that text with a map and then
> reduce it into the table. This would not provide live data, but you
> could run a simple shell script to process the bin logs and then delete
> or move them. If you needed to sync up, you could call mysql to start a
> new bin log. The shell script could be run as a cron job; it would pick
> up the latest bin log and start the job.
>
> I would use the Linux command
> find /binlog/location/*.bin -mmin +5
> to find the logs that are ready to process.
> That will give you all the bin logs that have not been modified in the
> last 5 minutes.
>
> If your insert/update queries are not too complex to process, it would
> be simple.
>
> Billy
>
>
>
> "Brian Forney" <[email protected]> wrote in message
> news:[email protected]...
>
>  Ryan,
>>
>> Thanks. Yep, I've read the Bigtable paper (now and in 2006) and understand
>> that HBase and Bigtable are essentially large maps and do not use the
>> relational model.
>>
>> Still interested in hearing if others have successfully done this. (I'm
>> mostly looking for ways to speed up the implementation of a one-way
>> replication: from a relational DB to HBase.)
>>
>> Thanks,
>> Brian
>>
>> On Apr 17, 2009, at 5:45 PM, Ryan Rawson wrote:
>>
>>  HBase is not a relational database, so many things that are in a SQL
>>> database don't exist.
>>>
>>> eg:
>>> - sequences
>>> - secondary declarative keys
>>> - joins
>>> - advanced query features such as order by, group by
>>> - operators of any kind
>>>
>>> Given conventions (e.g., naming of index tables), it might be possible to
>>> convert data semi-automatically, but the result might not take efficient
>>> advantage of HBase's unique schema-less design.
>>>
>>> I suggest you have a look at Google's Bigtable paper, as it describes the
>>> same underlying model that HBase uses.
>>>
>>> Good luck!
>>>
>>>
>>> On Fri, Apr 17, 2009 at 3:30 PM, Brian Forney <[email protected]>
>>> wrote:
>>>
>>>  Hi all,
>>>>
>>>> I'd like to replicate a large dataset from a relational database into
>>>> HBase for better throughput of MapReduce jobs. Has anyone had success
>>>> replicating from a relational database (in my case SQL Server) to HBase?
>>>>
>>>> Thanks,
>>>> Brian
>>>>
>>>>
>>
>>
>
>
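
The file-selection step Billy describes with find and -mmin +5 could also be done from a script. Below is a minimal Python sketch, not a tested production tool. The function name ready_binlogs, its parameters, and the "*.bin" pattern default are assumptions made for illustration; only the 5-minute threshold and the .bin suffix come from the message above.

```python
import os
import time
from pathlib import Path


def ready_binlogs(directory, min_age_minutes=5, pattern="*.bin", now=None):
    """Return bin log files last modified before the age cutoff.

    Hypothetical helper: mirrors the intent of
    `find /binlog/location/*.bin -mmin +5`, not its exact rounding rules.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_minutes * 60
    ready = []
    for path in sorted(Path(directory).glob(pattern)):
        # A file modified after the cutoff may still be the log the server
        # is writing to, so it is left for a later run.
        if path.is_file() and path.stat().st_mtime < cutoff:
            ready.append(path)
    return ready
```

A cron-driven script could call ready_binlogs("/binlog/location") and process each returned file before deleting or moving it, as suggested above.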

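Ryan's point about converting data by convention can be illustrated with a small sketch of how one relational row might be flattened into an HBase-style row key plus "family:qualifier" cells. Everything here is an assumption for illustration: the function name row_to_cells, the single column family "d", the "|" separator for composite keys, and storing values as UTF-8 strings. It uses plain Python dicts and does not talk to HBase.

```python
def row_to_cells(row, key_columns, family="d"):
    """Map one relational row (a dict) to (row_key, cells).

    Hypothetical convention: primary-key columns form the row key and
    every other non-NULL column becomes one cell named family:column.
    """
    missing = [c for c in key_columns if row.get(c) is None]
    if missing:
        raise ValueError(f"key columns without a value: {missing}")
    # Composite primary keys are joined into a single row key.
    row_key = "|".join(str(row[c]) for c in key_columns).encode("utf-8")
    cells = {}
    for column, value in row.items():
        if column in key_columns or value is None:
            # Key columns live in the row key; NULLs are simply not stored,
            # since absent cells cost nothing in a sparse map.
            continue
        cells[f"{family}:{column}".encode("utf-8")] = str(value).encode("utf-8")
    return row_key, cells
```

For example, row_to_cells({"id": 7, "name": "Ann", "email": None}, ["id"]) yields the row key b"7" and a single cell b"d:name". A mapping like this is mechanical, which is why it may not take good advantage of the schema-less design mentioned above.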