What I would do is, as Jonathan mentions, run two map jobs that select the data you'd like and write the selected records using a Hadoop output format to temporary files in DFS, then run a third join step that combines those temporary files.
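Roughly, that third (join) step could look like the sketch below. This is only a sketch written against the newer org.apache.hadoop.mapreduce API; the "A|"/"B|" side tags and the tab-separated "joinKey <TAB> TAG|payload" line layout assumed to be written by the two extraction jobs are my own conventions for illustration, not anything prescribed by Hadoop or HBase.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReduceSideJoin {

  // Re-key each line by the join key so the shuffle groups both sides together.
  public static class JoinMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);  // joinKey <TAB> TAG|payload
      if (parts.length == 2) {
        ctx.write(new Text(parts[0]), new Text(parts[1]));
      }
    }
  }

  // Buffer the records from each side for a key, then emit their cross product (inner join).
  public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      List<String> left = new ArrayList<String>();
      List<String> right = new ArrayList<String>();
      for (Text v : values) {
        String s = v.toString();
        if (s.startsWith("A|")) {
          left.add(s.substring(2));
        } else if (s.startsWith("B|")) {
          right.add(s.substring(2));
        }
      }
      for (String l : left) {
        for (String r : right) {
          ctx.write(key, new Text(l + "\t" + r));
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "hbase-extract-join");
    job.setJarByClass(ReduceSideJoin.class);
    job.setMapperClass(JoinMapper.class);
    job.setReducerClass(JoinReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));  // temp output of extract job A
    FileInputFormat.addInputPath(job, new Path(args[1]));  // temp output of extract job B
    FileOutputFormat.setOutputPath(job, new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The driver takes the two temporary extract directories as args[0] and args[1] and writes the joined records to args[2]; this is the "use the Map->Reduce sort step to perform the join" idea Jonathan mentions below.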
As an additional step: if you have common (sub)queries that require this kind of processing, you may want to run them proactively to materialize views into HBase tables set up for that purpose. This is especially true if your application can tolerate "freshness" gaps in the data no longer than the interval at which such jobs are run (and complete within). A rough sketch of such a materialization job follows the quoted messages below.

Hope this helps,

- Andy

> From: Jonathan Gray <[EMAIL PROTECTED]>
> Subject: RE: implementing join on two Hbase tables
> To: [email protected]
> Date: Friday, December 5, 2008, 10:34 AM
>
> [...]
>
> My personal favorite is Cascading (http://www.cascading.org) by Chris Wensel.
>
> [...]
>
> Hive and Pig are other projects that help with this, but they also do not have HBase hooks yet (that I'm aware of).
>
> [...]
>
> You might also consider something like Pigi (http://www.pigi-project.org),
>
> [...]
>
> Otherwise, you'll need to write your own jobs. You'd need probably three different MR jobs. Two that Map from each of the HBase tables you're interested in. Then another job that would read from combined output of those two jobs and perform the join. You might use the Map->Reduce sort step to perform the join if possible, depends on the details of what you want to do.
>
> [...]
>
> > From: abhinit [mailto:[EMAIL PROTECTED]
> > Sent: Friday, December 05, 2008 2:32 AM
> > To: [email protected]
> > Subject: implementing join on two Hbase tables
> >
> > I am trying to implement hash-join and nested join on two Hbase tables.
> > However, I am stuck.
> >
> > [...]
> >
> > Thanks a lot
> > -Abhinit
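P.S. In case it is useful, here is a minimal sketch of the materialization idea from my second paragraph: it reads the joined output produced by the join step above and loads it into a pre-created HBase table through TableOutputFormat. The table name "joined_view", the column family "d", the "joined_N" qualifiers, and the tab-separated input layout are assumptions of mine (and it targets a newer HBase client API); adjust to whatever schema you actually set up.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MaterializeView {

  // Key each joined output line by its join key (first tab-separated field).
  public static class LineMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      if (parts.length == 2) {
        ctx.write(new Text(parts[0]), new Text(parts[1]));
      }
    }
  }

  // Write each joined record into the (pre-created) "joined_view" table,
  // one cell per record under the "d" family.
  public static class ViewReducer extends TableReducer<Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      int i = 0;
      for (Text v : values) {
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("joined_" + i++),
            Bytes.toBytes(v.toString()));
        ctx.write(key, put);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "materialize-joined-view");
    job.setJarByClass(MaterializeView.class);
    job.setMapperClass(LineMapper.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));  // output dir of the join step
    TableMapReduceUtil.initTableReducerJob("joined_view", ViewReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Scheduled at whatever interval matches your freshness requirement, this gives your application a plain HBase table to read instead of running the join on demand.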
