How does the scanner know how to get ONLY the "relevant" rows, without a whole table scan?
Thanks!
Ric

On Tue, Jun 9, 2009 at 4:31 PM, Naveen Koorakula <[email protected]> wrote:

> The scanner only scans the relevant rows.
>
> On Tue, Jun 9, 2009 at 2:10 PM, Ric Wang <[email protected]> wrote:
>
> > Hi,
> >
> > My HBase table has millions of rows, and on a given column (e.g.
> > familyA:labelB), only a couple of thousand rows really have values
> > (sparse).
> > Now my task is to find the set of row keys whose "familyA:labelB"
> > value satisfies some kind of condition.
> >
> > For that task, I am getting a scanner on the column "familyA:labelB" and
> > looping over the values of that column (I guess I'd be better off using
> > some kind of filter instead, but regardless...); if a value matches my
> > condition, I get the corresponding row key and add it to the result set.
> >
> > My questions are:
> >
> > 1. When the scanner loops over the column, is it scanning the whole table
> > of millions of rows, or mostly just the ones that really have values for
> > that particular column? My guess is that it's NOT scanning the whole
> > table, per my very limited understanding of how column-based databases
> > work; it seems that'd be awfully inefficient. Can someone please let me
> > know?
> >
> > 2. If, in the unfortunate case, a whole table scan does have to happen,
> > any suggestions on how I could change my table design (adding an
> > index...?) to avoid the performance hit?
> >
> > Thanks very much for your help!
> > Ric
>

--
Ric Wang
[email protected]
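To make the scan-and-match loop described in the thread concrete, here is a toy in-memory Java model. It is NOT the HBase client API or HBase's storage code; `ToyTable`, `put` and `scanColumn` are hypothetical names. It assumes, as HBase's design does, that each column family's cells are kept separately and that absent cells are simply not stored. Under that assumption, a scan restricted to familyA only walks rows that have at least one familyA cell, no matter how many rows the table has overall. Note that in this model, if familyA also held densely populated columns, the scan would visit those rows too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Toy in-memory model of sparse, per-family storage. This is NOT the HBase
// client API; ToyTable, put and scanColumn are hypothetical names.
public class SparseColumnScan {

    public static final class ToyTable {
        // family -> (row key -> (qualifier -> value)). Each family is kept
        // separately, and a row appears in a family's map only if a cell was
        // actually written there, so empty cells cost nothing.
        private final Map<String, NavigableMap<String, NavigableMap<String, String>>> families =
                new TreeMap<>();
        private int rowsVisitedByLastScan = 0;

        public void put(String row, String family, String qualifier, String value) {
            families.computeIfAbsent(family, f -> new TreeMap<>())
                    .computeIfAbsent(row, r -> new TreeMap<>())
                    .put(qualifier, value);
        }

        // Walks only the rows stored under `family` (in sorted row-key order)
        // and returns the keys of rows whose family:qualifier value passes
        // `condition`.
        public List<String> scanColumn(String family, String qualifier, Predicate<String> condition) {
            List<String> matches = new ArrayList<>();
            rowsVisitedByLastScan = 0;
            NavigableMap<String, NavigableMap<String, String>> store = families.get(family);
            if (store == null) {
                return matches; // unknown family: nothing to scan
            }
            for (Map.Entry<String, NavigableMap<String, String>> row : store.entrySet()) {
                rowsVisitedByLastScan++;
                String value = row.getValue().get(qualifier);
                if (value != null && condition.test(value)) {
                    matches.add(row.getKey());
                }
            }
            return matches;
        }

        public int getRowsVisitedByLastScan() {
            return rowsVisitedByLastScan;
        }
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        for (int i = 0; i < 10_000; i++) {
            String row = String.format("row%05d", i);
            // familyB is dense: every row has a cell.
            table.put(row, "familyB", "other", "x");
            // familyA:labelB is sparse: only every 100th row has a value.
            if (i % 100 == 0) {
                table.put(row, "familyA", "labelB", Integer.toString(i));
            }
        }
        List<String> hits = table.scanColumn("familyA", "labelB", v -> Integer.parseInt(v) >= 5_000);
        System.out.println("rows visited: " + table.getRowsVisitedByLastScan()); // prints "rows visited: 100"
        System.out.println("matching rows: " + hits.size());                     // prints "matching rows: 50"
    }
}
```

The visit counter is the point of the sketch: the scan touches the 100 rows stored under familyA, not all 10,000 rows in the table.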
