Re: implementing join on two Hbase tables

abhinit Fri, 05 Dec 2008 16:17:40 -0800

Thanks, I will have a look at what you mentioned.
I have another question. In Pig Latin, data analysis tasks are expressed as
queries. Pig Latin has join and cogroup operators, which perform the task
using Map/Reduce on Hadoop. Can anyone share how the Pig Latin
implementation does it?


Thanks

On Fri, Dec 5, 2008 at 1:34 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote:

> I'm not aware of anything that is completely equipped for the task, however
> this could be done more simply with one of the Hadoop MapReduce tools.
>
> My personal favorite is Cascading (http://www.cascading.org) by Chris
> Wensel.  This can help you with doing something like reading in two
> different tables from two different Maps and bringing them together.
> Unfortunately, there is not yet an HBase Tap.  If you're interested in
> developing one, I have been told that it should not be difficult.  Check
> out #cascading on freenode and you should be able to get some help.  If
> you go down that route, please let me know because I'm interested in an
> HBase Tap as well but have not had the time to work on it.
>
> Hive and Pig are other projects that help with this, but they also do not
> have HBase hooks yet (that I'm aware of).
>
> You might also consider something like Pigi (http://www.pigi-project.org),
> which is an ORM.  It supports indexing and searching, unsure if there are
> any mechanisms for joins available or planned.
>
> Otherwise, you'll need to write your own jobs.  You'd probably need three
> different MR jobs: two that Map from each of the HBase tables you're
> interested in, then another job that would read from the combined output
> of those two jobs and perform the join.  You might use the Map->Reduce
> sort step to perform the join if possible; it depends on the details of
> what you want to do.  If you go down this path, you can certainly get
> plenty of help from this list or the IRC channel #hbase, as this would be
> very useful to the community.
>
> JG
>
>
> > -----Original Message-----
> > From: abhinit [mailto:[EMAIL PROTECTED]
> > Sent: Friday, December 05, 2008 2:32 AM
> > To: [email protected]
> > Subject: implementing join on two Hbase tables
> >
> > I am trying to implement hash-join and nested join on two Hbase tables.
> > However, I am stuck.
> >
> > I came across the package *org.apache.hadoop.mapred.join* which joins
> > two sorted datasets before map. However, I want to implement joins
> > using map/reduce methods so that I have more control on how to join
> > the data.
> >
> > I found the package *org.apache.hadoop.contrib.utils.join* after a bit
> > of searching, which has something I am looking for (not too sure as I
> > have not read the code completely).
> > It would be great if someone who has used this package can give me a
> > pointer on my problem.
> >
> > Is there a way I can take two tables as input in TableMap's map method?
> > (my guess is no)
> > If not, does the current hadoop/hbase implementation provide features
> > for implementing user-defined joins?
> >
> > Thanks a lot
> > -Abhinit
>
>
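
For reference, the "use the Map->Reduce sort step to perform the join" idea
quoted above can be sketched in plain Python. This is only an in-memory
illustration under stated assumptions: it does not use Hadoop or HBase, the
function names (map_table, shuffle, reduce_join, reduce_side_join) and the
sample rows are hypothetical, and a real job would read rows from the two
tables instead of from lists.

```python
from collections import defaultdict
from itertools import product


def map_table(table_name, rows, key_column):
    """Map step: emit (join_key, (table_name, row)) for each row of one table."""
    for row in rows:
        yield row[key_column], (table_name, row)


def shuffle(pairs):
    """Stand-in for the sort/shuffle phase: group tagged rows by join key."""
    groups = defaultdict(list)
    for key, tagged_row in pairs:
        groups[key].append(tagged_row)
    return groups


def reduce_join(tagged_rows, left_name, right_name):
    """Reduce step: pair every left row with every right row for one key."""
    left = [row for tag, row in tagged_rows if tag == left_name]
    right = [row for tag, row in tagged_rows if tag == right_name]
    for left_row, right_row in product(left, right):
        yield {**left_row, **right_row}


def reduce_side_join(left_rows, right_rows, key_column):
    """Inner join of two row lists on key_column, in map/shuffle/reduce style."""
    pairs = list(map_table("left", left_rows, key_column))
    pairs += list(map_table("right", right_rows, key_column))
    joined = []
    for _key, tagged_rows in sorted(shuffle(pairs).items()):
        joined.extend(reduce_join(tagged_rows, "left", "right"))
    return joined


# Hypothetical sample data standing in for rows scanned from two tables.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
orders = [
    {"id": 1, "item": "disk"},
    {"id": 1, "item": "cable"},
    {"id": 3, "item": "fan"},
]
result = reduce_side_join(users, orders, "id")
```

Tagging each row with its source table in the map step is what lets a single
reduce call tell the two inputs apart; keys that appear in only one table
produce no output, so this behaves as an inner join.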


-- 
Abhinit
[EMAIL PROTECTED]
215-796-5136
