Re: How to apply RDBMS table updates and deletes into Hadoop

Edward Capriolo Wed, 09 Jun 2010 18:25:40 -0700

On Wed, Jun 9, 2010 at 8:29 PM, atreju <[email protected]> wrote:

> Insert/Update/Delete is nothing but a "put" command that adds another file to
> the same directory. The only problem is during "flush", which would replace the
> files. I assume it would use logic similar to Hive's "insert overwrite"
> (create the file in a temporary space and replace the Hive file(s) when the MR
> output is ready). Only for that "replace" (move command?) the flush has to
> talk to the NameNode to wait for currently running MR jobs to finish and put
> others on hold until the file is replaced. That is of course the high-level
> idea. I am not sure if it is practical.
>
>
> On Wed, Jun 9, 2010 at 4:56 PM, Ted Yu <[email protected]> wrote:
>
>> When Hive is running the map-reduce job, how do we handle concurrent
>> update/deletion/insertion?
>>
>>
Atreju,


Your work is great. Personally, I would not get too tied up in the
transactional side of Hive. Once you start dealing with locking and
concurrency, the problem becomes tricky.

We Hivers have a long-standing tradition of 'punting' on complicated stuff we
do not want to deal with. :) Thus we only have 'Insert Overwrite', no 'insert
update' :)

Again, I think you wrote a really cool application. It would make a great
use case, blog post, or standalone application. Call it HiveMysqlRsync or
something :). However, you mention several requirements that are specific to
your application: a timestamp and a primary key. If you can abstract all your
application-specific logic, it could make its way into Hive. But it might be
a standalone program, because Hive-to-RDBMS replication might be a little
out of scope.
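
To make the timestamp and primary key requirement concrete, here is a minimal
sketch of the kind of merge I have in mind. It is not your code and not
anything in Hive. The Row layout, the 'deleted' flag, and the merge_snapshot
name are all assumptions made up for illustration. It folds a batch of RDBMS
changes into a base snapshot and produces a full replacement data set, which
is the only shape 'Insert Overwrite' can accept:

```python
from itertools import chain
from typing import Dict, Iterable, List, NamedTuple


class Row(NamedTuple):
    """Hypothetical record layout; the real table schema is not known here."""
    pk: int                 # primary key (assumed to be a single column)
    ts: int                 # last-modified timestamp (assumed comparable)
    payload: str            # stand-in for the remaining columns
    deleted: bool = False   # assumed tombstone flag for RDBMS deletes


def merge_snapshot(base: Iterable[Row], delta: Iterable[Row]) -> List[Row]:
    """Return the full replacement snapshot: newest row per key, deletes dropped."""
    latest: Dict[int, Row] = {}
    # Base rows are visited first, so a delta row wins a timestamp tie.
    for row in chain(base, delta):
        current = latest.get(row.pk)
        if current is None or row.ts >= current.ts:
            latest[row.pk] = row
    # Tombstones are removed only after the newest version of each key is known.
    return sorted((r for r in latest.values() if not r.deleted),
                  key=lambda r: r.pk)


if __name__ == "__main__":
    base = [Row(1, 100, "a"), Row(2, 100, "b"), Row(3, 100, "c")]
    delta = [Row(2, 200, "b2"), Row(3, 150, "", True), Row(4, 120, "d")]
    for row in merge_snapshot(base, delta):
        print(row.pk, row.payload)
```

Note that nothing is updated in place: the whole snapshot is rewritten, so no
locking is needed beyond swapping the old output for the new one. Anything
that depends on a particular timestamp column or key layout is exactly the
application-specific part that would need to be abstracted.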

Edward
