The scanner only scans the relevant rows.

On Tue, Jun 9, 2009 at 2:10 PM, Ric Wang <[email protected]> wrote:
> Hi,
>
> My HBase table has millions of rows; and on a given column (ex.
> familyA:labelB), only a couple of thousand rows really have values
> (sparse).
> Now my task is to find out the set of row keys whose column value of
> "familyA:labelB" satisfies some kind of condition.
>
> For that task, I am getting a scanner on the column "familyA:labelB";
> looping over the values of that column (I guess I'd be better off using
> some kind of filter instead, but regardless...); if the value matches my
> condition, I get the corresponding row key and add it into the result set.
>
> My questions are:
>
> 1. When the scanner loops over the column, is it scanning the whole table
> of millions of rows, or mostly just the ones that really have values for
> that particular column? My guess is that it's NOT scanning the whole table
> per my very limited understanding of how column-based databases work; seems
> that'd be awfully inefficient. Can someone please let me know?
>
> 2. If in the unfortunate case, that whole table scan does have to happen,
> any suggestions on how I could change my table design (adding index..?) to
> avoid the performance hit?
>
> Thanks very much for your help!
> Ric
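
To illustrate why a scan on a sparse column does not have to touch every row, here is a toy in-memory model in plain Java. It is only a sketch and not HBase code: the class SparseColumnScan, its put and scan methods, the rowsVisited counter and the extra family "familyC" are all made up for this example. The one assumption it relies on is that cells are stored per column family and that a row with no value in a family has no entry in that family's store, so the scanner never sees it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Toy model of a sparse, column-family-oriented table (hypothetical, not HBase).
public class SparseColumnScan {
    // One sorted store per column family: row key -> (qualifier -> value).
    // A row appears in a family's store only if it has a cell in that family.
    private final Map<String, TreeMap<String, TreeMap<String, String>>> families = new TreeMap<>();

    // Number of stored rows the scans have actually visited so far.
    long rowsVisited = 0;

    void put(String row, String family, String qualifier, String value) {
        families.computeIfAbsent(family, f -> new TreeMap<>())
                .computeIfAbsent(row, r -> new TreeMap<>())
                .put(qualifier, value);
    }

    // Returns the row keys whose family:qualifier value satisfies the condition.
    List<String> scan(String family, String qualifier, Predicate<String> condition) {
        List<String> matches = new ArrayList<>();
        TreeMap<String, TreeMap<String, String>> store = families.get(family);
        if (store == null) {
            return matches; // nothing was ever written to this family
        }
        for (Map.Entry<String, TreeMap<String, String>> e : store.entrySet()) {
            rowsVisited++; // only rows present in this family's store are visited
            String value = e.getValue().get(qualifier);
            if (value != null && condition.test(value)) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SparseColumnScan table = new SparseColumnScan();
        for (int i = 0; i < 100000; i++) {
            String row = String.format("row%06d", i);
            table.put(row, "familyC", "other", "x"); // every row has this cell
            if (i % 1000 == 0) {
                table.put(row, "familyA", "labelB", String.valueOf(i)); // sparse column
            }
        }
        List<String> hits = table.scan("familyA", "labelB", v -> Integer.parseInt(v) >= 50000);
        System.out.println(hits.size() + " matches, " + table.rowsVisited + " rows visited");
    }
}
```

In this model the scan visits the 100 rows that have a familyA cell, not the 100,000 rows in the table. Note that it still walks every row stored in the family, including rows that only have other qualifiers in familyA, so keeping a sparse column in its own family is what keeps the scan small here.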
