What I would do is, as Jonathan mentions, run two map jobs that select the data you'd like and write the selected records using a Hadoop output format to temporary files in DFS, then run a third join step that combines those temporary files.
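Roughly, that third (join) step could look like the sketch below. This is only a sketch written against the newer org.apache.hadoop.mapreduce API; the "A|"/"B|" side tags and the tab-separated "joinKey <TAB> TAG|payload" line layout assumed to be written by the two extraction jobs are my own conventions for illustration, not anything prescribed by Hadoop or HBase.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReduceSideJoin {

  // Re-key each line by the join key so the shuffle groups both sides together.
  public static class JoinMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);  // joinKey <TAB> TAG|payload
      if (parts.length == 2) {
        ctx.write(new Text(parts[0]), new Text(parts[1]));
      }
    }
  }

  // Buffer the records from each side for a key, then emit their cross product (inner join).
  public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      List<String> left = new ArrayList<String>();
      List<String> right = new ArrayList<String>();
      for (Text v : values) {
        String s = v.toString();
        if (s.startsWith("A|")) {
          left.add(s.substring(2));
        } else if (s.startsWith("B|")) {
          right.add(s.substring(2));
        }
      }
      for (String l : left) {
        for (String r : right) {
          ctx.write(key, new Text(l + "\t" + r));
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "hbase-extract-join");
    job.setJarByClass(ReduceSideJoin.class);
    job.setMapperClass(JoinMapper.class);
    job.setReducerClass(JoinReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));  // temp output of extract job A
    FileInputFormat.addInputPath(job, new Path(args[1]));  // temp output of extract job B
    FileOutputFormat.setOutputPath(job, new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The driver takes the two temporary extract directories as args[0] and args[1] and writes the joined records to args[2]; this is the "use the Map->Reduce sort step to perform the join" idea Jonathan mentions below.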
As an additional step: if you have common (sub)queries that require this kind of processing, you may want to run them proactively to materialize views into HBase tables set up for that purpose. This is especially true if your application can tolerate "freshness" gaps in the data no longer than the interval at which such jobs are run (and complete within). A rough sketch of such a materialization job follows the quoted messages below.

Hope this helps,

- Andy

> From: Jonathan Gray <[EMAIL PROTECTED]>
> Subject: RE: implementing join on two Hbase tables
> To: [email protected]
> Date: Friday, December 5, 2008, 10:34 AM
>
> [...]
>
> My personal favorite is Cascading (http://www.cascading.org) by Chris Wensel.
>
> [...]
>
> Hive and Pig are other projects that help with this, but they also do not have HBase hooks yet (that I'm aware of).
>
> [...]
>
> You might also consider something like Pigi (http://www.pigi-project.org),
>
> [...]
>
> Otherwise, you'll need to write your own jobs. You'd need probably three different MR jobs. Two that Map from each of the HBase tables you're interested in. Then another job that would read from combined output of those two jobs and perform the join. You might use the Map->Reduce sort step to perform the join if possible, depends on the details of what you want to do.
>
> [...]
>
> > From: abhinit [mailto:[EMAIL PROTECTED]
> > Sent: Friday, December 05, 2008 2:32 AM
> > To: [email protected]
> > Subject: implementing join on two Hbase tables
> >
> > I am trying to implement hash-join and nested join on two Hbase tables.
> > However, I am stuck.
> >
> > [...]
> >
> > Thanks a lot
> > -Abhinit
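P.S. In case it is useful, here is a minimal sketch of the materialization idea from my second paragraph: it reads the joined output produced by the join step above and loads it into a pre-created HBase table through TableOutputFormat. The table name "joined_view", the column family "d", the "joined_N" qualifiers, and the tab-separated input layout are assumptions of mine (and it targets a newer HBase client API); adjust to whatever schema you actually set up.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MaterializeView {

  // Key each joined output line by its join key (first tab-separated field).
  public static class LineMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      if (parts.length == 2) {
        ctx.write(new Text(parts[0]), new Text(parts[1]));
      }
    }
  }

  // Write each joined record into the (pre-created) "joined_view" table,
  // one cell per record under the "d" family.
  public static class ViewReducer extends TableReducer<Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      int i = 0;
      for (Text v : values) {
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("joined_" + i++),
            Bytes.toBytes(v.toString()));
        ctx.write(key, put);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "materialize-joined-view");
    job.setJarByClass(MaterializeView.class);
    job.setMapperClass(LineMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));  // output dir of the join step
    TableMapReduceUtil.initTableReducerJob("joined_view", ViewReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Scheduled at whatever interval matches your freshness requirement, this gives your application a plain HBase table to read instead of running the join on demand.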
