Thanks for the reply. I used the rowcounter tool on 5 tables; we have been using striped tables since before HBASE-2473 was implemented.
I logged https://issues.apache.org/jira/browse/HBASE-2891

On Thu, Jul 29, 2010 at 7:57 PM, Angus He <[email protected]> wrote:

> Column names are just optional for RowCounter job.
>
> To be more accurate, RowCounter is a KeyValueCounter.
> If no columns are specified, only the first KeyValues of each row are
> included, then get the RowCounter.
>
>
> On Fri, Jul 30, 2010 at 9:28 AM, Ted Yu <[email protected]> wrote:
>
> > If someone can share the commandline for running RowCounter, that would
> > be great.
> >
> > Also, hbase shell count doesn't require column name. Why does RowCounter
> > require it ?
> >
> > Thanks
> >
> > On Thu, Jul 29, 2010 at 4:55 PM, Ryan Rawson <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> That table appears to be empty. Eg:
> >>
> >> 10/07/29 22:38:43 INFO mapred.JobClient: Map input records=0
> >>
> >>
> >> So back to the count issue... Counting in databases is a classic
> >> problem. Unless your DB system is keeping stats on how many
> >> inserts/deletes and thus how big it thinks the table is, you have to
> >> count all the rows by reading them. HBase is no different, and a
> >> little harder, because we have a variable length data format, so we
> >> can't just estimate row sizes from file sizes. Keeping distributed
> >> stats is not impossible, but certainly not on any priority list to be
> >> implemented - of course JIRAs/patches welcome etc.
> >>
> >> -ryan
> >>
> >>
> >> On Thu, Jul 29, 2010 at 3:48 PM, Ted Yu <[email protected]> wrote:
> >> > We use HBase 0.20.5
> >> >
> >> > Here is the snippet from RowCounter output:
> >> >
> >> > 10/07/29 22:38:42 DEBUG client.HTable$ClientScanner: Finished with scanning
> >> > at REGION => {NAME =>
> >> > '2__HB_NOINC_ORCL_SQLLDR_0728-THREEGPPSPEECHCALLS-1280408509541-0,DFF46493EB352D0E31CBFA4652E6EC06,1280412540858',
> >> > STARTKEY => 'DFF46493EB352D0E31CBFA4652E6EC06', ENDKEY => '', ENCODED =>
> >> > 1375318608, TABLE => {{NAME =>
> >> > '2__HB_NOINC_ORCL_SQLLDR_0728-THREEGPPSPEECHCALLS-1280408509541-0', FAMILIES
> >> > => [{NAME => 'd', COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000',
> >> > BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME =>
> >> > 'i', COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000', BLOCKSIZE =>
> >> > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME => 'v',
> >> > COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000', BLOCKSIZE =>
> >> > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
> >> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0
> >> > is done. And is in the process of commiting
> >> > 10/07/29 22:38:42 INFO mapred.LocalJobRunner:
> >> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task attempt_local_0001_m_000000_0
> >> > is allowed to commit now
> >> > 10/07/29 22:38:42 INFO mapred.FileOutputCommitter: Saved output of task
> >> > 'attempt_local_0001_m_000000_0' to
> >> > file:/usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc
> >> > 10/07/29 22:38:42 INFO mapred.LocalJobRunner:
> >> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task
> >> > 'attempt_local_0001_m_000000_0' done.
> >> > 10/07/29 22:38:43 INFO mapred.JobClient: map 100% reduce 0%
> >> > 10/07/29 22:38:43 INFO mapred.JobClient: Job complete: job_local_0001
> >> > 10/07/29 22:38:43 INFO mapred.JobClient: Counters: 6
> >> > 10/07/29 22:38:43 INFO mapred.JobClient: FileSystemCounters
> >> > 10/07/29 22:38:43 INFO mapred.JobClient: FILE_BYTES_READ=1592883
> >> > 10/07/29 22:38:43 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1624956
> >> > 10/07/29 22:38:43 INFO mapred.JobClient: Map-Reduce Framework
> >> > 10/07/29 22:38:43 INFO mapred.JobClient: Map input records=0
> >> > 10/07/29 22:38:43 INFO mapred.JobClient: Spilled Records=0
> >> > 10/07/29 22:38:43 INFO mapred.JobClient: Map input bytes=0
> >> > 10/07/29 22:38:43 INFO mapred.JobClient: Map output records=0
> >> >
> >> > [sjc1-hadoop8.sjc1:hadoop 3705]ls -l
> >> > /usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc/part-00000
> >> > -rwxrwxrwx 1 hadoop users 0 Jul 29 22:38
> >> > /usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc/part-00000
> >> >
> >> > But there are many records in the table I was querying.
> >> >
> >> > Can someone comment ?
> >> >
> >> > On Thu, Jul 29, 2010 at 2:26 PM, Jean-Daniel Cryans <[email protected]> wrote:
> >> >
> >> >> In 0.89 you can specify CACHE for the count command. Set it higher (it
> >> >> defaults to 10 rows per call).
> >> >>
> >> >> Also you can use the RowCounter MR job.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Thu, Jul 29, 2010 at 2:22 PM, Ted Yu <[email protected]> wrote:
> >> >> > Hi,
> >> >> > The count method in HBase shell is quite slow.
> >> >> > Is there a way to obtain count faster ?
> >> >> >
> >> >> > Thanks
> >> >> >
> >> >>
> >> >
> >>
> >
>
> --
> Regards
> Angus
>
