Re: Hbase

Ryan Rawson Fri, 22 May 2009 02:56:04 -0700

Hey Burcu,

The list archive has a number of questions in the "how do I model data" vein,
and there have been many answers as well. Jon Gray has been known to do a brain
dump from time to time.


One of the biggest differences between something like HBase and a SQL
database is the 'no-join factor'.  Since data for different tables is
stored on different regionservers, there is no way for HBase to do the join
for you.  There are 2 general ways of doing the join:

- Map-Reduce job, read both tables, join them in the MR and filter out the
unwanted rows.  Obviously not an online application.
- Read one side of the join, then issue N queries to retrieve the other side
of the join.  You may or may not be able to take advantage of scan ranges
for the other side of the join.
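
A rough sketch of the second option, in Python, with in-memory dicts standing
in for the two tables.  The table contents and the names `get` and
`client_side_join` are made up for illustration; they are not HBase client
calls.  Against a real cluster each `get` would be one round trip.

```python
# Hypothetical in-memory stand-ins for two tables: row key -> {column: value}.
users = {
    "user1": {"name": "alice"},
    "user2": {"name": "bob"},
}
orders = {
    "order1": {"user": "user1", "item": "book"},
    "order2": {"user": "user2", "item": "lamp"},
    "order3": {"user": "user1", "item": "pen"},
}


def get(table, row_key):
    """Stand-in for a single-row read; against a real cluster this is one RPC."""
    return table.get(row_key)


def client_side_join(left, right, key_column):
    """Read the left side in key order, then look up the right side row by row."""
    joined = []
    lookups = 0
    for row_key in sorted(left):
        row = left[row_key]
        other = get(right, row[key_column])  # one lookup per left-hand row
        lookups += 1
        if other is not None:
            joined.append((row_key, {**row, **other}))
    return joined, lookups


rows, lookups = client_side_join(orders, users, "user")
print(lookups)  # number of right-hand lookups issued
for row_key, columns in rows:
    print(row_key, columns)
```

The lookup count grows with the number of rows on the left side, which is
where the cost comes from once every lookup is a network call.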

Thus you don't want to design a normal relational schema; you'd end up
making too many RPCs.  There are several solutions:
- Store more data per row.  With near-unlimited columns, you don't need a
separate table just to hold a variable number of columns.
- Store serialized data structures in a cell value.  This can help if you
have a set of data that is commonly read at once (like the fields of an
address box).  Thrift and protobuf are 2 well known libraries that do this
for you.
- Duplicate and denormalize data.
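
To make the first two ideas concrete, here is a small Python sketch of one
wide row.  It assumes JSON from the standard library as a stand-in for Thrift
or protobuf, and the column names (`info:name`, `info:address`, `order:NNNN`),
the sample values and the helper names are all hypothetical.

```python
import json


def pack_address(street, city, postcode):
    """Serialize fields that are read together into a single cell value."""
    fields = {"street": street, "city": city, "postcode": postcode}
    return json.dumps(fields, sort_keys=True).encode("utf-8")


def unpack_address(cell):
    """Decode a cell value written by pack_address."""
    return json.loads(cell.decode("utf-8"))


# One row per user.  The orders live in extra columns of the same row instead
# of a second table, so reading the user needs no join.
row = {
    "info:name": b"alice",
    "info:address": pack_address("1 Main St", "Springfield", "12345"),
    "order:0001": b"book",
    "order:0002": b"pen",
}

address = unpack_address(row["info:address"])
order_columns = sorted(c for c in row if c.startswith("order:"))
print(address["city"])
print(order_columns)
```

The trade-off is that a serialized cell can only be read or rewritten as a
whole, so it suits fields that really are used together.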

Since the main way to get performance is row locality, be sure to design
your row keys carefully.  Remember they are sorted in lexicographic order, so
left-pad with '0' those numbers you want in numerical order. Columns are also
stored in lexicographic order within a row, so if your plans include a very
large number of columns, you may want to exploit that as well.  While HBase
0.19.3 has issues with extremely wide columns, these problems are being
actively addressed in 0.20 and beyond.
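
A quick Python illustration of the padding point.  It assumes non-negative
integer ids and a fixed key width of 4 chosen up front; `padded_key` is a
made-up helper name.

```python
def padded_key(n, width=4):
    """Left-pad a non-negative integer with '0' so string order matches numeric order."""
    return str(n).zfill(width)


ids = [2, 10, 1, 100]

naive = sorted(str(i) for i in ids)
padded = sorted(padded_key(i) for i in ids)

print(naive)   # ['1', '10', '100', '2']
print(padded)  # ['0001', '0002', '0010', '0100']
```

The width has to cover the largest id you ever expect, since widening it
later means rewriting the existing keys.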

-ryan

2009/5/20 Jim Kellerman (POWERSET) <[email protected]>

>  Questions such as these are better asked on the HBase user mailing list.
> See http://hadoop.apache.org/hbase/mailing_lists.html for how to
> subscribe. There is also an IRC channel on irc.freenode.net; the channel's
> name is #hbase. By asking questions on the list or in IRC, you are not
> dependent on a single person for an answer (for example, I was out of town
> unexpectedly for three weeks with limited email access).
>
>
>
> However, to answer your question, HQL was removed quite some time ago and
> replaced with a JRuby shell.
>
> I had thought we had removed all references to HQL from the wiki some time
> ago. If you could point us to the page you saw it on and how you found the
> page, we can clean it up.
>
>
>
> ---
> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>
>
>
> *From:* Burcu Hamamcioglu [mailto:[email protected]]
> *Sent:* Wednesday, May 20, 2009 7:48 AM
> *To:* [email protected]; Jim Kellerman (POWERSET)
> *Subject:* Hbase
>
>
>
> Hi Jim,
>
> I have 2 questions about HBase. In one of your documents I've seen that you
> can open the HQL window (only for selecting data from tables). How can
> I open this web UI? On which port by default?
>
>
>
> My second question is about getting to know HBase. I am a newbie to HBase and
> really confused about its data modelling. Is there any resource or book
> that you can suggest on HBase data modelling? How can I achieve HBase?
>
>
>
> Sorry to bother you with this email.
>
>
>
> Thanks in advance
>
>
>
> *Burcu Hamamcıoğlu* <http://www.tikle.com/>
>
>
>
> +90.212.285 1214 / 306
>
> +90.212.285 1217 (fax)
>
> [email protected]
>
>
>
