The RowCounter job counts rows. Its answer is approximately how many distinct row keys were in the table within a given time range.
Even if the implementation uses a first-KeyValue filter, nothing about what I just said is false. A KeyValue counter would tell you how many cells and versions there were in total, don't you think?

On Jul 29, 2010 7:58 PM, "Angus He" <[email protected]> wrote:
> Column names are just optional for RowCounter job.
>
> To be more accurate, RowCounter is a KeyValueCounter.
> If no columns are specified, only the first KeyValues of each row are
> included, then get the RowCounter.
>
>
> On Fri, Jul 30, 2010 at 9:28 AM, Ted Yu <[email protected]> wrote:
>> If someone can share the commandline for running RowCounter, that would be
>> great.
>>
>> Also, hbase shell count doesn't require column name. Why does RowCounter
>> require it ?
>>
>> Thanks
>>
>> On Thu, Jul 29, 2010 at 4:55 PM, Ryan Rawson <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> That table appears to be empty. Eg:
>>>
>>> 10/07/29 22:38:43 INFO mapred.JobClient: Map input records=0
>>>
>>>
>>> So back to the count issue... Counting in databases is a classic
>>> problem. Unless your DB system is keeping stats on how many
>>> inserts/deletes and thus how big it thinks the table is, you have to
>>> count all the rows by reading them. HBase is no different, and a
>>> little harder, because we have a variable length data format, so we
>>> can't just estimate row sizes from file sizes. Keeping distributed
>>> stats is not impossible, but certainly not on any priority list to be
>>> implemented - of course JIRAs/patches welcome etc.
>>>
>>> -ryan
>>>
>>>
>>> On Thu, Jul 29, 2010 at 3:48 PM, Ted Yu <[email protected]> wrote:
>>> > We use HBase 0.20.5
>>> >
>>> > Here is the snippet from RowCounter output:
>>> >
>>> > 10/07/29 22:38:42 DEBUG client.HTable$ClientScanner: Finished with scanning
>>> > at REGION => {NAME =>
>>> > '2__HB_NOINC_ORCL_SQLLDR_0728-THREEGPPSPEECHCALLS-1280408509541-0,DFF46493EB352D0E31CBFA4652E6EC06,1280412540858',
>>> > STARTKEY => 'DFF46493EB352D0E31CBFA4652E6EC06', ENDKEY => '', ENCODED =>
>>> > 1375318608, TABLE => {{NAME =>
>>> > '2__HB_NOINC_ORCL_SQLLDR_0728-THREEGPPSPEECHCALLS-1280408509541-0', FAMILIES
>>> > => [{NAME => 'd', COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000',
>>> > BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME =>
>>> > 'i', COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000', BLOCKSIZE =>
>>> > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME => 'v',
>>> > COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000', BLOCKSIZE =>
>>> > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
>>> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0
>>> > is done. And is in the process of commiting
>>> > 10/07/29 22:38:42 INFO mapred.LocalJobRunner:
>>> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task attempt_local_0001_m_000000_0
>>> > is allowed to commit now
>>> > 10/07/29 22:38:42 INFO mapred.FileOutputCommitter: Saved output of task
>>> > 'attempt_local_0001_m_000000_0' to
>>> > file:/usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc
>>> > 10/07/29 22:38:42 INFO mapred.LocalJobRunner:
>>> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task
>>> > 'attempt_local_0001_m_000000_0' done.
>>> > 10/07/29 22:38:43 INFO mapred.JobClient: map 100% reduce 0%
>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Job complete: job_local_0001
>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Counters: 6
>>> > 10/07/29 22:38:43 INFO mapred.JobClient: FileSystemCounters
>>> > 10/07/29 22:38:43 INFO mapred.JobClient: FILE_BYTES_READ=1592883
>>> > 10/07/29 22:38:43 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1624956
>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Map-Reduce Framework
>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Map input records=0
>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Spilled Records=0
>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Map input bytes=0
>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Map output records=0
>>> >
>>> > [sjc1-hadoop8.sjc1:hadoop 3705]ls -l
>>> > /usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc/part-00000
>>> > -rwxrwxrwx 1 hadoop users 0 Jul 29 22:38
>>> > /usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc/part-00000
>>> >
>>> > But there are many records in the table I was querying.
>>> >
>>> > Can someone comment ?
>>> >
>>> > On Thu, Jul 29, 2010 at 2:26 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>> >
>>> >> In 0.89 you can specify CACHE for the count command. Set it higher (it
>>> >> defaults to 10 rows per call).
>>> >>
>>> >> Also you can use the RowCounter MR job.
>>> >>
>>> >> J-D
>>> >>
>>> >> On Thu, Jul 29, 2010 at 2:22 PM, Ted Yu <[email protected]> wrote:
>>> >> > Hi,
>>> >> > The count method in HBase shell is quite slow.
>>> >> > Is there a way to obtain count faster ?
>>> >> >
>>> >> > Thanks
>>> >> >
>>> >>
>>> >
>>>
>>
>
>
>
> --
> Regards
> Angus
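
A footnote to the rows-versus-KeyValues point at the top of this thread: below is a toy Python sketch, not HBase code. The `KeyValue` tuple, the sample data, and the function names are all made up for illustration. It assumes KeyValues arrive sorted by row key, as a scan would return them, and shows that keeping only the first KeyValue of each row still gives the number of distinct rows, while counting every KeyValue gives cells and versions in total.

```python
from collections import namedtuple

# Hypothetical stand-in for an HBase KeyValue (illustration only).
KeyValue = namedtuple("KeyValue", ["row", "column", "timestamp", "value"])

# Toy table contents, sorted by row key (assumed scan order).
table = [
    KeyValue("row1", "d:a", 2, "x"),
    KeyValue("row1", "d:a", 1, "w"),  # older version of the same cell
    KeyValue("row1", "i:b", 1, "y"),
    KeyValue("row2", "d:a", 1, "z"),
    KeyValue("row3", "v:c", 1, "q"),
    KeyValue("row3", "v:c", 0, "p"),  # older version of the same cell
]


def first_key_values(kvs):
    """Yield only the first KeyValue of each row; input must be sorted by row."""
    previous_row = None
    for kv in kvs:
        if kv.row != previous_row:
            previous_row = kv.row
            yield kv


def count_rows(kvs):
    """Count distinct rows by counting one KeyValue per row."""
    return sum(1 for _ in first_key_values(kvs))


def count_key_values(kvs):
    """Count every KeyValue, i.e. all cells and all of their versions."""
    return sum(1 for _ in kvs)


print(count_rows(table))        # 3
print(count_key_values(table))  # 6
```

Dropping everything but the first KeyValue of a row only reduces how much data is read per row; one KeyValue per row is still one count per row, so the result stays a row count.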
