Re: for one specific row: are the values of all columns of one family stored in one physical/grid node?

Billy Pearson Tue, 09 Jun 2009 18:36:35 -0700

You should read over the HBase architecture page on the wiki:
http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture


The data is sorted by row key, then by column:label, then by timestamp.

In that order, so if you have row key1, all the labels for column1 will be stored together in the same file. We do flush more than one file to disk as data is added, so the values are not always stored together until a major compaction merges all the store files together.

But what we mean by "stored together" is that all of column1 will be stored in one file and column2 will be stored in a separate set of files, so if you only want data from column1, you only need to read the data from one set of files, not all the columns for that row key.
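To make the ordering and the per-family files concrete, here is a toy in-memory model in Python. It is only a sketch under stated assumptions, not HBase code: `ToyStore`, `put`, `flush`, `compact`, and `get_family` are hypothetical names, cells are plain `(row, label, timestamp, value)` tuples, and newer timestamps sort first (the way HBase orders versions of a cell). Each flush adds one more sorted file per family, a major compaction merges a family's files with `heapq.merge`, and a read of one family never looks at another family's files.

```python
import heapq
from collections import defaultdict


class ToyStore:
    """Hypothetical toy: each column family has its own set of sorted 'store files'."""

    def __init__(self):
        self.memstore = defaultdict(list)  # family -> cells written but not yet flushed
        self.files = defaultdict(list)     # family -> list of sorted "store files"

    @staticmethod
    def sort_key(cell):
        row, label, ts, _value = cell
        # Row key first, then column label, then timestamp with the newest first.
        return (row, label, -ts)

    def put(self, row, family, label, ts, value):
        self.memstore[family].append((row, label, ts, value))

    def flush(self):
        # Every flush writes one new sorted file per family; older files stay as they are.
        for family, cells in self.memstore.items():
            if cells:
                self.files[family].append(sorted(cells, key=self.sort_key))
        self.memstore.clear()

    def compact(self, family):
        # Major compaction: merge all of this family's sorted files into a single one.
        merged = list(heapq.merge(*self.files[family], key=self.sort_key))
        self.files[family] = [merged] if merged else []

    def get_family(self, row, family):
        # Only this family's files are read; other families are never touched.
        cells = (c for f in self.files[family] for c in f if c[0] == row)
        return sorted(cells, key=self.sort_key)


store = ToyStore()
store.put("key1", "column1", "a", 1, "x")
store.put("key1", "column2", "b", 1, "y")
store.flush()
store.put("key1", "column1", "a", 2, "x2")
store.flush()
print(len(store.files["column1"]))  # 2 (two flushes, not yet compacted)
store.compact("column1")
print(len(store.files["column1"]))  # 1
print(store.get_family("key1", "column1"))  # [('key1', 'a', 2, 'x2'), ('key1', 'a', 1, 'x')]
```

Note that `column2` still has its own single file after the compaction of `column1`; the two families are flushed, merged, and read independently.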

Also, the columns for key1 will not be spread across all the nodes, just one node in the cluster. The table is split by key value, so keys 1-100 would be one region and keys 101-200 would be another region, all in the same table. When a region gets too large, it splits and becomes two regions, and so on.

So when we look up a key, we only have to look at one server.
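The region idea can be sketched the same way. Again this is a hypothetical Python toy, not HBase's implementation: `Table`, `Region`, `find_region`, `split_if_needed`, the server names, and the `MAX_ROWS` threshold are all made up for illustration. Real HBase locates regions through its catalog tables and splits a region when its store files grow past a configured size, not after a fixed number of rows. What the toy does show is that each region owns a contiguous range of row keys (zero-padded here so string order matches numeric order), so finding the region for a key is one binary search over the sorted start keys, and all the columns of a row stay in the one region that owns its key.

```python
import bisect

MAX_ROWS = 4  # hypothetical split threshold; real HBase splits on store file size


class Region:
    def __init__(self, start_key, server):
        self.start_key = start_key  # first row key this region covers ("" = start of table)
        self.server = server        # hypothetical name of the server hosting this region
        self.rows = {}              # row key -> {column:label -> value}


class Table:
    def __init__(self):
        self.regions = [Region("", "server-1")]  # a new table starts as a single region

    def find_region(self, row_key):
        # Regions are sorted by start key; the owner is the last region whose
        # start key is <= row_key, so one binary search is enough.
        starts = [r.start_key for r in self.regions]
        return self.regions[bisect.bisect_right(starts, row_key) - 1]

    def put(self, row_key, column, value):
        region = self.find_region(row_key)
        region.rows.setdefault(row_key, {})[column] = value
        self.split_if_needed(region)

    def split_if_needed(self, region):
        if len(region.rows) <= MAX_ROWS:
            return
        keys = sorted(region.rows)
        mid = keys[len(keys) // 2]
        # The upper half of the key range moves to a new region (possibly on
        # another server); whole rows move, so a row never straddles two regions.
        daughter = Region(mid, "server-%d" % (len(self.regions) + 1))
        daughter.rows = {k: region.rows.pop(k) for k in keys if k >= mid}
        self.regions.insert(self.regions.index(region) + 1, daughter)


table = Table()
for i in range(1, 11):
    table.put("key%03d" % i, "myFamily:label", i)
print([r.start_key for r in table.regions])  # ['', 'key003', 'key005', 'key007']
print(table.find_region("key007").server)    # server-4
```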

Billy

"Ric Wang" <[email protected]> wrote inmessage news:[email protected]...

Hi,

Very new to Hadoop and HBase. And sorry about the rudimentary question:

I store my artifacts as rows in an HBase table, and the attributes of each
artifact as labels within a single column family (e.g. myFamily). I may
have tens of thousands of labels, and millions and millions of rows. Now, as
the data size grows, some documentation says that the values of one family will
be "stored together". I wonder what that really means.

For example, for a given row key (my.key.123), will HBase guarantee that ALL
its attributes (i.e. the values of ALL the labels in "myFamily") of that row
key are stored on one physical/grid node? In other words, if I want to find
the ONE matching row key "my.key.123" based on its attributes
(column values), at the implementation level, will HBase be:

1. traversing all the distributed nodes and interrogating the column values,
aggregating the results coming from all the nodes, and finally finding
the matching row key

or

2. doing atomic operations in parallel on each node locally, with only one
node finally returning the matching row key (if there is a match)?

My guess is that the answer depends on whether all the attributes (in myFamily)
of a given row are stored on one and only one node.

Hope I didn't make my question too confusing. I'm very new to column-based
databases; please help and bear with me.

Thanks!
Ric
