I'm not aware of anything that is completely equipped for the task; however, this could be done more simply with one of the Hadoop MapReduce tools.
My personal favorite is Cascading (http://www.cascading.org) by Chris Wensel. It can help with things like reading two different tables in two different Maps and bringing them together. Unfortunately, there is not yet an HBase Tap. If you're interested in developing one, I have been told that it should not be difficult. Check out #cascading on freenode and you should be able to get some help. If you go down that route, please let me know, because I'm interested in an HBase Tap as well but have not had the time to work on it.

Hive and Pig are other projects that help with this, but they also do not have HBase hooks yet (that I'm aware of).

You might also consider something like Pigi (http://www.pigi-project.org), which is an ORM. It supports indexing and searching, but I'm unsure whether any mechanisms for joins are available or planned.

Otherwise, you'll need to write your own jobs. You'd probably need three different MR jobs: two that each Map over one of the HBase tables you're interested in, and a third that reads the combined output of those two jobs and performs the join. You might be able to use the Map->Reduce sort step to perform the join, depending on the details of what you want to do.

If you go down this path, you can certainly get plenty of help from this list or the IRC channel #hbase, as this would be very useful to the community.

JG

> -----Original Message-----
> From: abhinit [mailto:[EMAIL PROTECTED]
> Sent: Friday, December 05, 2008 2:32 AM
> To: [email protected]
> Subject: implementing join on two Hbase tables
>
> I am trying to implement hash-join and nested join on two Hbase tables.
> However, I am stuck.
>
> I came across the package *org.apache.hadoop.mapred.join* which joins
> two sorted datasets before map. However, I want to implement joins using
> map/reduce methods so that I have more control on how to join the data.
>
> I found the package *org.apache.hadoop.contrib.utils.join* after a bit of
> searching which has something I am looking for (not too sure as I have
> not read the code completely).
> It would be great if someone who has used this package can give me a
> pointer on my problem.
>
> Is there a way I can take two tables as input in TableMap's map method?
> (my guess is no)
> If not, does the current hadoop/hbase implementation provide features for
> implementing user-defined joins?
>
> Thanks a lot
> -Abhinit
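
For illustration, here is a rough sketch of the "two Maps, then join on the sort step" idea from the reply, simulated in plain Python rather than written against Hadoop or HBase. Everything in it is an assumption made for the example: the function names (map_table, shuffle, reduce_join, join), the "L"/"R" tags, and the sample tables are hypothetical, and the in-memory shuffle only stands in for the grouping a real MapReduce framework would do between Map and Reduce.

```python
from collections import defaultdict
from itertools import product


def map_table(tag, rows, key_column):
    """Map step for one table: emit (join_key, (tag, row)) pairs.

    The tag records which table a row came from, so the reducer can
    tell the two sides apart after they have been mixed together.
    """
    for row in rows:
        yield row[key_column], (tag, row)


def shuffle(pairs):
    """Stand-in for the framework's sort/shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_join(tagged_rows):
    """Reduce step for one key: pair every left row with every right row."""
    left = [row for tag, row in tagged_rows if tag == "L"]
    right = [row for tag, row in tagged_rows if tag == "R"]
    for left_row, right_row in product(left, right):
        yield {**left_row, **right_row}


def join(left_rows, right_rows, key_column):
    """Run both map steps, group by key, then join within each group."""
    pairs = list(map_table("L", left_rows, key_column))
    pairs += list(map_table("R", right_rows, key_column))
    joined_rows = []
    for _key, tagged_rows in shuffle(pairs):
        joined_rows.extend(reduce_join(tagged_rows))
    return joined_rows


# Hypothetical sample data standing in for rows scanned from two tables.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [
    {"id": 1, "item": "disk"},
    {"id": 1, "item": "ram"},
    {"id": 3, "item": "cpu"},
]
joined = join(users, orders, "id")
```

This is an inner join: keys that appear in only one table (2 and 3 in the sample data) produce no output. Because the reducer sees all rows for a key at once, it is also the place to swap in other join logic if more control is needed.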
