Re: scanner on a given column: whole table scan or just the rows that have values

Billy Pearson Tue, 09 Jun 2009 23:03:54 -0700

It will not scan every row if there is more than one column family; it scans only the rows that have data for that column.
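
A simplified model of what Billy describes. It assumes, as HBase does, that each column family is kept in its own store, so a scan restricted to one family never reads rows that only have data in other families. `ToyTable`, `put`, `scan_family`, and the `rows_read` counter are hypothetical names for illustration, not the HBase client API:

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical toy table: each column family has its own store,
    loosely mirroring HBase's separate per-family storage."""

    def __init__(self):
        # family -> {row key -> {qualifier -> value}}
        self.stores = defaultdict(dict)
        self.rows_read = 0  # rows visited by scans, for illustration

    def put(self, row, family, qualifier, value):
        self.stores[family].setdefault(row, {})[qualifier] = value

    def scan_family(self, family):
        """Yield (row, cells) in row-key order, reading only this family's
        store; rows with no data in `family` are never visited."""
        store = self.stores.get(family, {})
        for row in sorted(store):
            self.rows_read += 1
            yield row, store[row]


table = ToyTable()
for i in range(1000):
    table.put(f"row{i:04d}", "info", "name", str(i))
for i in (5, 500, 999):
    table.put(f"row{i:04d}", "tags", "label", "x")

tagged = [row for row, _ in table.scan_family("tags")]
print(tagged, table.rows_read)  # ['row0005', 'row0500', 'row0999'] 3
```

The table holds 1000 rows, but the scan of `tags` visits only the 3 rows that have data in that family. Within a single family, though, every row that has any data in the family is still visited.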

You do get parallelism when scanning large tables: the MR job should split the work into one mapper per region, if it is coded and set up correctly. New patches in the dev set for 0.20 will allow more mappers per region, speeding this up in some cases.
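
The one-mapper-per-region split can be sketched as follows, assuming a table's regions are contiguous, sorted row-key ranges bounded by split keys. The split keys and the `region_ranges`, `scan_range`, and `parallel_scan` helpers are hypothetical. A thread pool stands in for MapReduce tasks, and this is not Hadoop's or HBase's actual input-format code:

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def region_ranges(split_keys):
    """Turn sorted split keys into (start, end) ranges; None = open end."""
    bounds = [None, *split_keys, None]
    return list(zip(bounds[:-1], bounds[1:]))


def scan_range(sorted_rows, start, end):
    """One 'mapper': return rows with start <= key < end."""
    lo = 0 if start is None else bisect.bisect_left(sorted_rows, start)
    hi = len(sorted_rows) if end is None else bisect.bisect_left(sorted_rows, end)
    return sorted_rows[lo:hi]


def parallel_scan(rows, split_keys, workers=4):
    """Scan every region concurrently, one task per region."""
    sorted_rows = sorted(rows)
    ranges = region_ranges(split_keys)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: scan_range(sorted_rows, *r), ranges))


rows = [f"row{i:03d}" for i in range(100)]
parts = parallel_scan(rows, ["row025", "row050", "row075"])
print([len(p) for p in parts])  # [25, 25, 25, 25]
```

Each task reads only its own key range, so more regions (spread over more servers) means more scan work done in parallel without any single task reading the whole table.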

Row-based databases can have indexes, but they do not scale well, since indexes require more memory. HBase is designed to be distributed, parallel, and fault tolerant, and it scales easily from one server to hundreds or thousands of servers.


Billy

"Ric Wang" <[email protected]> wrote in message news:[email protected]...

Hi,
Thanks. But if it is still scanning EVERY row in the entire table, how does
HBase achieve better scan performance, compared to a row-based database?

Thanks,
Ric
On Tue, Jun 9, 2009 at 9:35 PM, Ryan Rawson <[email protected]> wrote:
Without the use of indexes, there is no easy way to get the info without
touching every row.

So yes, you'll be scanning every row. But HBase has good bulk scan performance.
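
Ryan's point, and Billy's remark that indexes cost memory, can be shown with a toy comparison. A filtered full scan has to touch every row, while a secondary index answers the same query directly but is an extra structure that grows with the data. The dict-of-dicts table and the `scan_for_value` and `build_index` helpers are hypothetical illustrations, not HBase features:

```python
def scan_for_value(table, column, value):
    """Full scan with a filter: every row is touched, matching or not."""
    matches, touched = [], 0
    for row_key in sorted(table):
        touched += 1
        if table[row_key].get(column) == value:
            matches.append(row_key)
    return matches, touched


def build_index(table, column):
    """Hypothetical secondary index: value -> sorted row keys.
    This sketch builds it once; a real index must also be updated on writes."""
    index = {}
    for row_key, cells in table.items():
        if column in cells:
            index.setdefault(cells[column], []).append(row_key)
    for keys in index.values():
        keys.sort()
    return index


table = {f"r{i:03d}": {"color": "red" if i % 10 == 0 else "blue"} for i in range(100)}
matches, touched = scan_for_value(table, "color", "red")
index = build_index(table, "color")
print(len(matches), touched, index["red"] == matches)  # 10 100 True
```

The scan touches all 100 rows to find 10 matches. The index lookup returns the same 10 rows directly, at the cost of storing an entry for every indexed row.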
On Jun 9, 2009 7:24 PM, "Ric Wang" <[email protected]> wrote:
How does the scanner know how to get ONLY the "relevant" rows, without a
whole table scan?

Thanks!
Ric
On Tue, Jun 9, 2009 at 4:31 PM, Naveen Koorakula <[email protected]> wrote:
> The scanner only s...
--

Ric Wang [email protected]