Billy,

Thank you, it's clearer to me now. But WITHIN the single family that holds the column label I need to scan (I only have one family for the entire table), will it still have to scan EVERY row in that family, whether or not each row has a value for that column label?
-Ric

On Wed, Jun 10, 2009 at 1:03 AM, Billy Pearson <[email protected]> wrote:

> It will not scan every row if there is more than one column family, only
> the rows that have data for that column.
>
> You do have parallelism when scanning large tables: the MR job should be
> splitting the job into one mapper per region if coded and set up correctly.
> New patches in dev set for 0.20 will allow more mappers per region,
> speeding this up in some cases.
>
> Row-based databases can have indexes, but they do not scale well; indexes
> require more memory.
> HBase is designed to be distributed, parallel, and fault tolerant, and it
> scales easily from 1 to hundreds to thousands of servers.
>
> Billy
>
> "Ric Wang" <[email protected]> wrote in message
> news:[email protected]...
>
>> Hi,
>>
>> Thanks. But if it is still scanning EVERY row in the entire table, how
>> does HBase achieve better scan performance, compared to a row-based
>> database?
>>
>> Thanks,
>> Ric
>>
>> On Tue, Jun 9, 2009 at 9:35 PM, Ryan Rawson <[email protected]> wrote:
>>
>>> Without the use of indexes, there is no easy way to get the info without
>>> touching every row.
>>>
>>> So yes you'll be scanning every row. But hbase has good bulk scan perf.
>>>
>>> On Jun 9, 2009 7:24 PM, "Ric Wang" <[email protected]> wrote:
>>>
>>>> How does the scanner know how to get ONLY the "relevant" rows, without a
>>>> whole table scan?
>>>>
>>>> Thanks!
>>>> Ric
>>>>
>>>> On Tue, Jun 9, 2009 at 4:31 PM, Naveen Koorakula <[email protected]>
>>>> wrote:
>>>>> The scanner only s...
>>>>
>>>> --
>>>> Ric Wang [email protected]
>>
>> --
>> Ric Wang
>> [email protected]

--
Ric Wang
[email protected]
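
To make the single-family question concrete, here is a toy model in Python. It is not HBase code: the `Cell` tuple and the `build_family` and `scan_column` helpers are hypothetical names made up for illustration. It only assumes that a family stores its cells sparsely (a row with no value for a column has no cell at all), sorted by row key and then column label, and that the scan does no index-based skipping. Under those assumptions, a scan for one column label returns only the rows that have it, but it still walks past every stored cell in the family.

```python
from collections import namedtuple

# Hypothetical stand-in for one stored cell of a column family.
Cell = namedtuple("Cell", ["row", "qualifier", "value"])


def build_family(rows):
    """Flatten {row: {qualifier: value}} into cells sorted by (row, qualifier).

    Rows that lack a qualifier simply contribute no cell for it,
    which models sparse storage.
    """
    cells = [Cell(r, q, v) for r, cols in rows.items() for q, v in cols.items()]
    return sorted(cells, key=lambda c: (c.row, c.qualifier))


def scan_column(family_cells, qualifier):
    """Scan a whole family for one qualifier, with no index to skip ahead.

    Returns the matching (row, value) pairs and the number of cells touched.
    """
    touched = 0
    matches = []
    for cell in family_cells:
        touched += 1  # every stored cell in the family is visited
        if cell.qualifier == qualifier:
            matches.append((cell.row, cell.value))
    return matches, touched


# row2 has no "email" cell, so nothing is stored for it in that column.
rows = {
    "row1": {"name": "n1", "email": "v1"},
    "row2": {"name": "n2"},
    "row3": {"name": "n3", "email": "v3"},
}
family = build_family(rows)
matches, touched = scan_column(family, "email")
print(matches)  # only rows that actually have an "email" cell
print(touched)  # number of cells visited to find them
```

In this model the empty cell for row2 costs nothing, since it is never stored, but the "name" cells of all three rows are still read because they live in the same family as "email".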
