The answer is, it depends.

What are the details of what you are trying to join? Is it a simple 1-to-1 join, 1-to-many, or something else? At a minimum, the join would require two round-trips. However, HBase 0.20 can do simple queries in the 1-10ms range (closer to 1ms when the blocks are already cached).

The comparison to an RDBMS cannot be made directly because a single-node RDBMS with a smallish table will be quite fast at simple index-based joins. I would guess that the unloaded, single-machine performance of this join would be much faster in an RDBMS.

But if your table has millions or billions of rows, it's a different situation. HBase performance will stay nearly constant as your table increases, as long as you have the nodes to support your dataset and the load.

What are your targets for time (sub 100ms? 10ms?), and what are the details of what you're joining?

As far as code is concerned, there is not much to a simple join, so I'm not sure how helpful it would be. If you give some detail perhaps I can provide some pseudo-code for you.
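
To show how little there is to it, here is a minimal sketch in Python of a 1-to-many lookup join done client-side in two round-trips. Plain dicts stand in for the two tables; the table contents, the `order_ids` column, and the `get_row`/`get_rows` helpers are all made up for illustration and are not HBase API. Whether the second step is one call per key or a single batched call depends on the client you use.

```python
# In-memory stand-ins for two tables, keyed by row key (hypothetical data).
users = {
    "u1": {"name": "alice", "order_ids": ["o1", "o2"]},
    "u2": {"name": "bob", "order_ids": []},
}
orders = {
    "o1": {"item": "book"},
    "o2": {"item": "pen"},
}


def get_row(table, row_key):
    """Stand-in for a single-row lookup by key; returns None if missing."""
    return table.get(row_key)


def get_rows(table, row_keys):
    """Stand-in for looking up several rows by key; skips missing rows."""
    return [table[k] for k in row_keys if k in table]


def join_user_orders(user_id):
    # Round-trip 1: fetch the parent row, which stores the child row keys.
    user = get_row(users, user_id)
    if user is None:
        return []
    # Round-trip 2: fetch the child rows by the keys found in the parent.
    children = get_rows(orders, user["order_ids"])
    # Combine parent and child columns into joined records.
    return [{"name": user["name"], "item": c["item"]} for c in children]
```

Calling `join_user_orders("u1")` returns one joined record per order of that user. The sketch assumes the parent row already holds the keys of the rows it joins to, which is a schema decision, not something you get for free.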

JG

bharath vissapragada wrote:
JG, thanks for your reply.

I am trying to implement a realtime join of two tables on HBase. I tried
denormalizing the tables to avoid the joins, but then updating the data
becomes really difficult. I understand that the features I am trying to
implement are those of an RDBMS and that HBase is used for a different
purpose. Even so, I would like to store the data in HBase and implement
joins so that I can test the performance; if it is effective (at least on
a large number of nodes), it may be of some help to me. I know some people
have already tried this. If any of you have, can you tell me how the
results were? I mean, are they good compared to an RDBMS join on a single
machine?

Thanks

On Wed, Jul 15, 2009 at 8:35 PM, Jonathan Gray <[email protected]> wrote:

Bharath,

You need to outline what your actual requirements are if you want more
help.  Open-ended questions that just ask for code are usually not answered.

What exactly are you trying to join?  Does this join need to happen in
"realtime" or is this part of a batch process?

Could you denormalize your data to prevent needing the join at runtime?

If you provide details about exactly what your data/schema is like (or a
similar example if this is confidential), then many of us are more than
happy to help you figure out what approach might work best.

When working with HBase, figuring out how you want to pull your data out is
key to how you want to put the data in.

JG


bharath vissapragada wrote:

Amandeep, can you tell me what kinds of joins you have implemented, and
which works best (based on observation)? Can you show us the source code
(if possible)?

Thanks in advance

On Wed, Jul 15, 2009 at 10:46 AM, Amandeep Khurana <[email protected]>
wrote:

I've been doing joins by writing my own MR jobs. That works best.
I haven't tried Cascading yet.
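
For readers unfamiliar with the approach, a join written as an MR job is commonly structured as a reduce-side join: mappers tag each row with its source table and emit it under the join key, and the reducer pairs up the rows that share a key. The following is a generic sketch in plain Python that simulates the phases in memory; it is not the code referred to above and uses no Hadoop or HBase API. The function names, the `"L"`/`"R"` tags, and the sample column names are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(table_name, rows, join_column):
    """Emit (join_key, (table_name, row)) for every row of one input table."""
    for row in rows:
        yield row[join_column], (table_name, row)


def shuffle(pairs):
    """Group mapper output by key, as the framework does before reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(tagged_rows, left_name, right_name):
    """Pair every left row with every right row for one key (inner join)."""
    left = [row for tag, row in tagged_rows if tag == left_name]
    right = [row for tag, row in tagged_rows if tag == right_name]
    for l_row in left:
        for r_row in right:
            yield {**l_row, **r_row}


def reduce_side_join(left_rows, right_rows, join_column):
    # Map both inputs, tagging each row with the table it came from.
    pairs = list(map_phase("L", left_rows, join_column))
    pairs += list(map_phase("R", right_rows, join_column))
    # Reduce each key group into joined records.
    joined = []
    for tagged_rows in shuffle(pairs).values():
        joined.extend(reduce_phase(tagged_rows, "L", "R"))
    return joined
```

With a left table of `{"id", "name"}` rows and a right table of `{"id", "item"}` rows, `reduce_side_join(left, right, "id")` yields one merged record per matching pair and drops keys that appear on only one side.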

-ak

On 7/14/09, bharath vissapragada <[email protected]>
wrote:

That's fine. I know that HBase has a completely different usage compared to
SQL. But for my application there is some kind of dependency involved
among the tables, so I need to implement a join. I wanted to know whether
there is some kind of implementation already.

Thanks

On Wed, Jul 15, 2009 at 10:30 AM, Ryan Rawson <[email protected]> wrote:

HBase != SQL.
You might want MapReduce or Cascading.

On Tue, Jul 14, 2009 at 9:56 PM, bharath vissapragada <[email protected]> wrote:

Hi all,

I want to join two tables in HBase (similar to a relational database join).
Can anyone tell me whether this is already implemented in the source?

Thanks in advance


--


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


