Hi, I'm very new to Hadoop and HBase, so apologies for the rudimentary question.
I store my artifacts as rows in an HBase table, and the attributes of each artifact as labels within a single column family (e.g. myFamily). I may have tens of thousands of labels and many millions of rows.

Some documentation says that, as the data grows, the values of one family will be "stored together". I wonder what that really means. For example, for a given row key (my.key.123), does HBase guarantee that ALL of its attributes (i.e. the values of ALL the labels in "myFamily") are stored on one physical/grid node?

In other words, if I want to find the ONE row (say, row key "my.key.123") that matches a set of attributes (column values), will HBase, at the implementation level, be:

1. traversing all the distributed nodes and interrogating the column values, aggregating the results coming from all the nodes, and only then working out the matching row key; or

2. doing atomic operations in parallel on each node locally, so that in the end only one node returns the matching row key (if there is a match).

My guess is that the answer depends on whether all attributes (in myFamily) of a given row are stored on one and only one node.

I hope I didn't make my question too confusing. I'm very new to column-based databases, so please help and bear with me.

Thanks!
Ric
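
P.S. In case it helps, here is a toy sketch of how I picture my data and the lookup I am asking about. It uses plain Python dicts only and is not the HBase client API. The label names ("color", "size"), the values, and the helper find_matching_keys are made up for illustration; only "myFamily" and "my.key.123" come from my real setup.

```python
# Toy model only: nested dicts standing in for an HBase table.
# Layout assumed here: table[row_key][family][label] = value
table = {
    "my.key.123": {"myFamily": {"color": "red", "size": "large"}},
    "my.key.456": {"myFamily": {"color": "blue", "size": "large"}},
    "my.key.789": {"myFamily": {"color": "red", "size": "small"}},
}


def find_matching_keys(table, family, wanted):
    """Return sorted row keys whose values in `family` match every (label, value) in `wanted`."""
    matches = []
    for row_key, families in table.items():
        # All attributes of one row sit under a single family in this model.
        attrs = families.get(family, {})
        if all(attrs.get(label) == value for label, value in wanted.items()):
            matches.append(row_key)
    return sorted(matches)


# The kind of lookup I mean: which row has these attribute values?
print(find_matching_keys(table, "myFamily", {"color": "red", "size": "large"}))
```

In this single-process sketch every row's attributes are obviously in one place. My question is whether the same holds per row once the table is spread over many nodes, which would decide between options 1 and 2.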
