It will not scan every row if there is more than one column family; it only scans the rows that have data in the column family you ask for.
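To illustrate the column family point, here is a toy model, not HBase code. It assumes each column family is kept in its own map sorted by row key, loosely mirroring how HBase stores each family separately. A scan limited to one family then only visits rows that have data in that family. `ToyTable` and `scanFamily` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical, not the HBase API): each column family is stored
// separately as a map sorted by row key, loosely like per-family store files.
public class ToyTable {
    private final Map<String, TreeMap<String, String>> families = new HashMap<>();

    public void put(String row, String family, String value) {
        families.computeIfAbsent(family, f -> new TreeMap<>()).put(row, value);
    }

    // Walks only the store for the requested family, so rows with no data
    // in that family are never visited. Returns the visited row keys in order.
    public List<String> scanFamily(String family) {
        List<String> visitedRows = new ArrayList<>();
        TreeMap<String, String> store = families.get(family);
        if (store == null) {
            return visitedRows;
        }
        for (Map.Entry<String, String> e : store.entrySet()) {
            visitedRows.add(e.getKey());
        }
        return visitedRows;
    }

    public static void main(String[] args) {
        ToyTable t = new ToyTable();
        t.put("row1", "info", "a");
        t.put("row2", "info", "b");
        t.put("row3", "info", "c");
        t.put("row2", "tags", "x"); // only row2 has data in "tags"
        System.out.println(t.scanFamily("tags")); // [row2]
        System.out.println(t.scanFamily("info")); // [row1, row2, row3]
    }
}
```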

You do get parallelism when scanning large tables: if the MapReduce job is set up correctly, it should split the work into one mapper per region. New patches in the dev set for 0.20 will allow more than one mapper per region, which should speed this up in some cases.
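Here is a rough sketch of the one-mapper-per-region idea in plain Java, without the HBase or MapReduce APIs. It assumes the table is just a sorted map of row keys. The map is cut into contiguous key ranges ("regions") at hypothetical split keys, and each range is scanned on its own thread. `RegionParallelScan`, `splitIntoRegions` and `parallelSum` are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: sorted row keys are divided into contiguous key
// ranges ("regions"), and each range is scanned by its own worker, the way
// a MapReduce job would run one mapper per region.
public class RegionParallelScan {

    // Splits the table at the given (sorted) split keys into contiguous views.
    static List<SortedMap<String, Integer>> splitIntoRegions(
            TreeMap<String, Integer> table, List<String> splitKeys) {
        List<SortedMap<String, Integer>> regions = new ArrayList<>();
        String start = null;
        for (String split : splitKeys) {
            regions.add(start == null ? table.headMap(split) : table.subMap(start, split));
            start = split;
        }
        regions.add(start == null ? table : table.tailMap(start));
        return regions;
    }

    // Scans every region on its own thread and adds up the per-region sums.
    static long parallelSum(List<SortedMap<String, Integer>> regions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (SortedMap<String, Integer> region : regions) {
                results.add(pool.submit(() -> {
                    long sum = 0;
                    for (int v : region.values()) {
                        sum += v;
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        TreeMap<String, Integer> table = new TreeMap<>();
        for (int i = 0; i < 100; i++) {
            table.put(String.format("row%03d", i), i);
        }
        List<SortedMap<String, Integer>> regions =
                splitIntoRegions(table, Arrays.asList("row025", "row050", "row075"));
        System.out.println(regions.size() + " regions"); // 4 regions
        System.out.println(parallelSum(regions));          // 4950
    }
}
```

Because the regions are non-overlapping key ranges, the per-region results can simply be added together. Running more workers per region would amount to cutting those ranges into smaller pieces.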

Row-based databases can have indexes, but they do not scale well, because indexes require more memory. HBase is designed to be distributed, parallel and fault tolerant, and it scales easily from 1 to hundreds to thousands of servers.

Billy



"Ric Wang" <[email protected]> wrote in message news:[email protected]...
Hi,

Thanks. But if it is still scanning EVERY row in the entire table, how does
HBase achieve better scan performance, compared to a row-based database?

Thanks,
Ric



On Tue, Jun 9, 2009 at 9:35 PM, Ryan Rawson <[email protected]> wrote:

Without the use of indexes, there is no easy way to get the info without
touching every row.

So yes, you'll be scanning every row. But HBase has good bulk scan performance.

On Jun 9, 2009 7:24 PM, "Ric Wang" <[email protected]> wrote:

How does the scanner know how to get ONLY the "relevant" rows, without a
whole table scan?

Thanks!
Ric

On Tue, Jun 9, 2009 at 4:31 PM, Naveen Koorakula <[email protected]>
wrote:
> The scanner only s...
--

Ric Wang [email protected]






