It will not scan every row if there is more than one column family, only the
rows that have data for that column.
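To illustrate the point, here is a toy in-memory sketch in Python. It is not
the HBase API: the ToyTable class, its methods, and the family names "info" and
"stats" are all made up for this example. It only assumes that each column
family keeps its own row-sorted store, so a scan of one family never visits
rows that have no data in it.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical toy model: one row-sorted store per column family."""

    def __init__(self):
        # family -> {row_key: {qualifier: value}}
        self.stores = defaultdict(dict)

    def put(self, row, family, qualifier, value):
        # A write only touches the store of the family it belongs to.
        self.stores[family].setdefault(row, {})[qualifier] = value

    def scan(self, family):
        """Yield (row, cells) in key order, only for rows present in `family`."""
        store = self.stores.get(family, {})
        for row in sorted(store):
            yield row, store[row]


table = ToyTable()
table.put("row1", "info", "name", "a")
table.put("row2", "info", "name", "b")
table.put("row2", "stats", "hits", 7)
table.put("row3", "info", "name", "c")

# Only row2 has data in "stats", so the scan never sees row1 or row3.
stats_rows = [row for row, _ in table.scan("stats")]
```

Scanning "info" here would still visit all three rows, which is the full-scan
case discussed further down the thread.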
You do have parallelism when scanning large tables: the MR job should split
the work into one mapper per region if it is set up correctly. New patches in
development for 0.20 will allow more mappers per region, speeding this up in
some cases.
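A rough sketch of the one-mapper-per-region idea, again as plain Python rather
than real HBase or MapReduce code. The function names and the region start keys
("", "g", "p") are assumptions for illustration. Each region covers a
contiguous key range, and each range becomes one unit of work for one mapper.

```python
from bisect import bisect_right


def region_splits(start_keys):
    """Return one (start, end) key range per region; "" as end means open-ended."""
    keys = sorted(start_keys)
    return [
        (keys[i], keys[i + 1] if i + 1 < len(keys) else "")
        for i in range(len(keys))
    ]


def split_for_row(start_keys, row):
    """Return the index of the split whose range contains `row`."""
    keys = sorted(start_keys)
    # Start keys are inclusive, so a row equal to a start key belongs to that region.
    return bisect_right(keys, row) - 1


# Hypothetical table with three regions starting at "", "g" and "p".
region_starts = ["", "g", "p"]
splits = region_splits(region_starts)

# Group some example rows by the mapper (split index) that would scan them.
per_mapper = {}
for row in ["apple", "grape", "kiwi", "zebra"]:
    per_mapper.setdefault(split_for_row(region_starts, row), []).append(row)
```

With three regions the scan is divided into three independent pieces, so adding
regions (and servers to host them) adds mappers that can run at the same time.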
Row-based databases can have indexes, but they do not scale well; indexes
require more memory.
HBase is designed to be distributed, parallel, and fault tolerant, and it
scales easily from 1 to hundreds to thousands of servers.
Billy
"Ric Wang" <[email protected]> wrote in
message news:[email protected]...
Hi,
Thanks. But if it is still scanning EVERY row in the entire table, how does
HBase achieve better scan performance, compared to a row-based database?
Thanks,
Ric
On Tue, Jun 9, 2009 at 9:35 PM, Ryan Rawson
<[email protected]> wrote:
Without the use of indexes, there is no easy way to get the info without
touching every row.
So yes you'll be scanning every row. But hbase has good bulk scan perf.
On Jun 9, 2009 7:24 PM, "Ric Wang"
<[email protected]> wrote:
How does the scanner know how to get ONLY the "relevant" rows, without a
whole table scan?
Thanks!
Ric
On Tue, Jun 9, 2009 at 4:31 PM, Naveen Koorakula
<[email protected]>
wrote:
> The scanner only s...
--
Ric Wang [email protected]