ZhaoWei:

FYI, there is a class in o.a.h.h.mapred called RowCounter that runs a MapReduce job to count rows. Run it on a multinode cluster and your count should finish in less than 80 minutes.
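
For reference only, not an exact recipe: in that era's layout you would launch it from the command line with something along the lines of the below. The argument order and whether an output directory is required vary by release, so check the usage string RowCounter prints when run with no arguments; the bracketed names are placeholders.

  $ bin/hbase org.apache.hadoop.hbase.mapred.RowCounter <outputdir> <tablename> <column1> [<column2>...]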

St.Ack




ZhaoWei wrote:
Thanks J-D, that sounds annoying. Should the row count be a piece of metadata?
How does an RDBMS handle it when one types "select count(xxx) from xxx"?

Zhao,

Yes, the only way is to use a scanner, but it will take a _long_ time. HBASE-32
<https://issues.apache.org/jira/browse/HBASE-32> is about adding a row count
estimator. For those who want to know why it's so slow: a scanner that goes
over each row of a table has to do a read request on disk for each one of them
(except for the data sitting in the memcache waiting to be flushed). If you
have 6,500,000 rows, like I saw last week on the IRC channel, it may take well
over 80 minutes (it depends on the CPU/IO/network load, hardware, etc.).
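
If you need an exact count today and don't want a MapReduce job, the scanner loop is simple to write. The sketch below is against the 0.1.x-style client API (HTable.obtainScanner / HScannerInterface); class and package names moved around between releases, and the table name "mytable" and column family "info:" are made up, so adjust it for whatever version you run:

import java.io.IOException;
import java.util.SortedMap;
import java.util.TreeMap;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HScannerInterface;
import org.apache.hadoop.hbase.HStoreKey;
import org.apache.hadoop.hbase.HTable;
import org.apache.hadoop.io.Text;

public class ScanRowCount {
  public static void main(String[] args) throws IOException {
    // "mytable" and the "info:" family are placeholders -- substitute your own.
    HTable table = new HTable(new HBaseConfiguration(), new Text("mytable"));
    Text[] columns = { new Text("info:") };
    // Empty start row means "scan from the beginning of the table".
    HScannerInterface scanner = table.obtainScanner(columns, new Text(""));
    long count = 0;
    try {
      HStoreKey key = new HStoreKey();
      SortedMap<Text, byte[]> row = new TreeMap<Text, byte[]>();
      // One next() call per row -- every row is a read, which is why this is slow.
      while (scanner.next(key, row)) {
        count++;
        row.clear();
      }
    } finally {
      scanner.close();
    }
    System.out.println("rows: " + count);
  }
}

A MapReduce-based counter does the same scan, but splits it by region so the reads happen in parallel across the cluster instead of one row at a time from a single client.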

J-D

On Mon, Jul 21, 2008 at 5:21 AM, ZhaoWei <[EMAIL PROTECTED]> wrote:

Hi J-D,
 How do I get the row count of a table? Is a scanner the only way?


Thanks!

Daniel,

Sorry, this feature is still missing in HBase. For the moment, the best you
can do is to use the HDFS web UI. If you would like to see this in a future
release, feel free to file a Jira: https://issues.apache.org/jira/browse/HBASE
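
If you prefer the command line to the web UI, the usual HDFS tools work too. For example (the path under hbase.rootdir depends on your version's on-disk layout, so the placeholder below is just the general idea):

  $ bin/hadoop fs -dus <hbase.rootdir>/<path-to-your-table>

That prints the total number of bytes stored under the directory, i.e. roughly the size of all the HStores for the table.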

J-D

On Sat, Jul 19, 2008 at 5:58 PM, Daniel <[EMAIL PROTECTED]> wrote:

Hi all,
   It's a bit strange, but I can't find any class or method to get the
'size' of a created table - maybe the total size of all the HStores?
Or is there any command in HQL that can do this?
   Thanks.

Daniel

