Thanks, I will have a look at what you mentioned. I have another question. In Pig Latin, data analysis tasks are expressed as queries. Pig Latin has join and cogroup operators, which carry out their work using Map/Reduce on Hadoop. Can anyone share how the Pig Latin implementation does this?
Thanks

On Fri, Dec 5, 2008 at 1:34 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote:

> I'm not aware of anything that is completely equipped for the task,
> however this could be done more simply with one of the Hadoop MapReduce
> tools.
>
> My personal favorite is Cascading (http://www.cascading.org) by Chris
> Wensel. This can help you with doing something like reading in two
> different tables from two different Maps and bringing them together.
> Unfortunately, there is not yet an HBase Tap. If you're interested in
> developing one, I have been told that it should not be difficult. Check
> out #cascading on freenode and you should be able to get some help. If
> you go down that route, please let me know because I'm interested in an
> HBase Tap as well but have not had the time to work on it.
>
> Hive and Pig are other projects that help with this, but they also do
> not have HBase hooks yet (that I'm aware of).
>
> You might also consider something like Pigi
> (http://www.pigi-project.org), which is an ORM. It supports indexing and
> searching, unsure if there are any mechanisms for joins available or
> planned.
>
> Otherwise, you'll need to write your own jobs. You'd need probably three
> different MR jobs. Two that Map from each of the HBase tables you're
> interested in. Then another job that would read from combined output of
> those two jobs and perform the join. You might use the Map->Reduce sort
> step to perform the join if possible, depends on the details of what you
> want to do. If you go down this path, you can certainly get plenty of
> help from this list or the IRC channel #hbase as this would be very
> useful to the community.
>
> JG
>
> > -----Original Message-----
> > From: abhinit [mailto:[EMAIL PROTECTED]
> > Sent: Friday, December 05, 2008 2:32 AM
> > To: [email protected]
> > Subject: implementing join on two Hbase tables
> >
> > I am trying to implement hash-join and nested join on two Hbase
> > tables. However, I am stuck.
> >
> > I came across the package *org.apache.hadoop.mapred.join* which joins
> > two sorted datasets before map. However, I want to implement joins
> > using map/reduce methods so that I have more control on how to join
> > the data.
> >
> > I found the package *org.apache.hadoop.contrib.utils.join* after a bit
> > of searching which has something I am looking for (not too sure as I
> > have not read the code completely).
> > It would be great if someone who has used this package can give me a
> > pointer on my problem,
> >
> > Is there a way I can take two tables as input in TableMap's map
> > method? (my guess is no)
> > If not, does the current hadoop/hbase implementation provide features
> > for implementing user-defined joins
> >
> > Thanks a lot
> > -Abhinit

--
Abhinit
[EMAIL PROTECTED]
215-796-5136
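
For readers following the thread, here is a minimal sketch of the reduce-side join JG describes: each table is mapped to (join key, tagged row) pairs, the sort/shuffle step groups the pairs by key, and the reducer pairs up rows from the two tables. It is plain in-memory Python, not Hadoop or HBase API code, and it is not taken from Pig's implementation. All function names (map_phase, shuffle, reduce_join, reduce_side_join) and the sample field names in the usage note are hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_phase(table_name, rows, key_field):
    """Emit (join_key, (table_name, row)) so the reducer knows each row's origin."""
    for row in rows:
        yield row[key_field], (table_name, row)


def shuffle(pairs):
    """Group mapped pairs by key, standing in for the Map->Reduce sort step."""
    groups = defaultdict(list)
    for key, tagged_row in pairs:
        groups[key].append(tagged_row)
    return groups


def reduce_join(tagged_rows, left_name, right_name):
    """For one key, split rows by source table and emit every left/right pairing."""
    left = [row for tag, row in tagged_rows if tag == left_name]
    right = [row for tag, row in tagged_rows if tag == right_name]
    for left_row, right_row in product(left, right):
        # Merge the two rows; on a field-name clash the right-hand value wins.
        yield {**left_row, **right_row}


def reduce_side_join(left_name, left_rows, right_name, right_rows, key_field):
    """Inner join of two lists of dict rows on key_field."""
    pairs = list(map_phase(left_name, left_rows, key_field))
    pairs += list(map_phase(right_name, right_rows, key_field))
    groups = shuffle(pairs)
    joined = []
    for key in sorted(groups):  # reducers see keys in sorted order
        joined.extend(reduce_join(groups[key], left_name, right_name))
    return joined
```

Calling reduce_side_join("users", users, "orders", orders, "id") on two lists of dicts returns one merged dict per matching pair of rows. Keys that appear in only one table produce nothing, so this is an inner join. The per-key grouping built by shuffle, with rows from both tables collected under one key, is conceptually close to what a cogroup yields, and the join is then a cross product within each group.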
