Re: Replicating data into HBase

Tim Sell Sun, 19 Apr 2009 14:36:08 -0700

That script depends on pgq, which is a Postgres-specific event queue.
It's handy for tracking table changes. If there is something similar
for SQL Server, it might be helpful.


2009/4/18 stack <[email protected]>:
> You might take a look at Tim Sell's postgres to hbase uploader scripts here
> for ideas:
> http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/examples/uploaders/
> St.Ack
>
> 2009/4/18 Billy Pearson <[email protected]>
>
>> If your data is not too complex (multiple fields, etc.), you could try
>> using the MySQL bin logs. Use mysqlbinlog
>> (http://dev.mysql.com/doc/refman/5.0/en/mysqlbinlog.html) to generate a
>> text version of the logs, process that with a map, and then reduce into
>> the table. This would not provide live data, but you could run a simple
>> shell script to process the bin logs and then delete or move them. If you
>> needed to sync up, you could call mysql to start a new bin log. The shell
>> script could be run as a cron job; it would pick up the latest bin log
>> and start the job.
>>
>> I would use the Linux command
>> find /binlog/location/*.bin -mmin +5
>> to find the logs that are ready to process.
>> That will give you all the bin logs that have not been modified in the
>> last 5 minutes.
>>
>> If your insert/update queries are not too complex to process, it would be
>> simple.
>>
>> Billy
>>
>>
>>
>> "Brian Forney" <[email protected]> wrote in message
>> news:[email protected]...
>>
>>  Ryan,
>>>
>>> Thanks. Yep, I've read the Bigtable paper (now and in 2006) and understand
>>> that HBase and Bigtable are essentially large maps and do  not use the
>>> relational model.
>>>
>>> Still interested in hearing if others have successfully done this.  (I'm
>>> mostly looking for ways to speed up the implementation of a one-way
>>> replication: from a relational DB to HBase.)
>>>
>>> Thanks,
>>> Brian
>>>
>>> On Apr 17, 2009, at 5:45 PM, Ryan Rawson wrote:
>>>
>>>  HBase is not a relational database, so many things that are in a SQL
>>>> database don't exist.
>>>>
>>>> eg:
>>>> - sequences
>>>> - secondary declarative keys
>>>> - joins
>>>> - advanced query features such as order by, group by
>>>> - operators of any kind
>>>>
>>>> Given conventions (e.g. naming of index tables), it might be possible to
>>>> convert data semi-automatically, but the result might not take
>>>> efficient advantage of HBase's unique schema-less design.
>>>>
>>>> I suggest you have a look at Google's Bigtable paper, as it has the
>>>> same underlying model that HBase does.
>>>>
>>>> Good luck!
>>>>
>>>>
>>>> On Fri, Apr 17, 2009 at 3:30 PM, Brian Forney <[email protected]>
>>>> wrote:
>>>>
>>>>  Hi all,
>>>>>
>>>>> I'd like to replicate a large dataset from a relational database  into
>>>>> HBase
>>>>> for better throughput of MapReduce jobs. Has anyone had success
>>>>> replicating
>>>>> from a relational database (in my case SQL Server) to HBase?
>>>>>
>>>>> Thanks,
>>>>> Brian
>>>>>
>>>>>
>>>
>>>
>>
>>
>
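
The bin log selection step Billy describes above (picking up files that
have not been modified in the last 5 minutes) could be sketched in Python
roughly as follows. This is only an illustration under stated assumptions:
the function name ready_binlogs, the ".bin" suffix default, and the
directory layout are hypothetical, and the age check only approximates what
find -mmin +5 does.

```python
import os
import time
from pathlib import Path


def ready_binlogs(directory, min_age_minutes=5, suffix=".bin", now=None):
    """Return files in `directory` ending in `suffix` that were last
    modified more than `min_age_minutes` ago, oldest first.

    Hypothetical helper: the name, the suffix, and the threshold are
    illustrative assumptions, not part of MySQL or HBase.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_minutes * 60
    candidates = []
    for path in Path(directory).iterdir():
        if path.is_file() and path.name.endswith(suffix):
            mtime = path.stat().st_mtime
            # A file still being written has a recent mtime and is skipped.
            if mtime < cutoff:
                candidates.append((mtime, path))
    # Sort by modification time so logs are processed in order.
    return [path for _, path in sorted(candidates)]
```

A cron-driven script could then pass each returned path to mysqlbinlog to
produce the text version of the log, and move or delete the file once the
load job has finished.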
