Hi,

That table appears to be empty. Eg:
10/07/29 22:38:43 INFO mapred.JobClient: Map input records=0

So back to the count issue...

Counting in databases is a classic problem. Unless your DB system is
keeping stats on how many inserts/deletes and thus how big it thinks the
table is, you have to count all the rows by reading them. HBase is no
different, and a little harder, because we have a variable length data
format, so we can't just estimate row sizes from file sizes.

Keeping distributed stats is not impossible, but certainly not on any
priority list to be implemented - of course JIRAs/patches welcome etc.

-ryan

On Thu, Jul 29, 2010 at 3:48 PM, Ted Yu <[email protected]> wrote:
> We use HBase 0.20.5
>
> Here is the snippet from RowCounter output:
>
> 10/07/29 22:38:42 DEBUG client.HTable$ClientScanner: Finished with scanning
> at REGION => {NAME =>
> '2__HB_NOINC_ORCL_SQLLDR_0728-THREEGPPSPEECHCALLS-1280408509541-0,DFF46493EB352D0E31CBFA4652E6EC06,1280412540858',
> STARTKEY => 'DFF46493EB352D0E31CBFA4652E6EC06', ENDKEY => '', ENCODED =>
> 1375318608, TABLE => {{NAME =>
> '2__HB_NOINC_ORCL_SQLLDR_0728-THREEGPPSPEECHCALLS-1280408509541-0', FAMILIES
> => [{NAME => 'd', COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000',
> BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME =>
> 'i', COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000', BLOCKSIZE =>
> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME => 'v',
> COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000', BLOCKSIZE =>
> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
> 10/07/29 22:38:42 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0
> is done. And is in the process of commiting
> 10/07/29 22:38:42 INFO mapred.LocalJobRunner:
> 10/07/29 22:38:42 INFO mapred.TaskRunner: Task attempt_local_0001_m_000000_0
> is allowed to commit now
> 10/07/29 22:38:42 INFO mapred.FileOutputCommitter: Saved output of task
> 'attempt_local_0001_m_000000_0' to
> file:/usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc
> 10/07/29 22:38:42 INFO mapred.LocalJobRunner:
> 10/07/29 22:38:42 INFO mapred.TaskRunner: Task
> 'attempt_local_0001_m_000000_0' done.
> 10/07/29 22:38:43 INFO mapred.JobClient: map 100% reduce 0%
> 10/07/29 22:38:43 INFO mapred.JobClient: Job complete: job_local_0001
> 10/07/29 22:38:43 INFO mapred.JobClient: Counters: 6
> 10/07/29 22:38:43 INFO mapred.JobClient: FileSystemCounters
> 10/07/29 22:38:43 INFO mapred.JobClient: FILE_BYTES_READ=1592883
> 10/07/29 22:38:43 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1624956
> 10/07/29 22:38:43 INFO mapred.JobClient: Map-Reduce Framework
> 10/07/29 22:38:43 INFO mapred.JobClient: Map input records=0
> 10/07/29 22:38:43 INFO mapred.JobClient: Spilled Records=0
> 10/07/29 22:38:43 INFO mapred.JobClient: Map input bytes=0
> 10/07/29 22:38:43 INFO mapred.JobClient: Map output records=0
>
> [sjc1-hadoop8.sjc1:hadoop 3705]ls -l
> /usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc/part-00000
> -rwxrwxrwx 1 hadoop users 0 Jul 29 22:38
> /usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc/part-00000
>
> But there are many records in the table I was querying.
>
> Can someone comment ?
>
> On Thu, Jul 29, 2010 at 2:26 PM, Jean-Daniel Cryans
> <[email protected]>wrote:
>
>> In 0.89 you can specify CACHE for the count command. Set it higher (it
>> defaults to 10 rows per call).
>>
>> Also you can use the RowCounter MR job.
>>
>> J-D
>>
>> On Thu, Jul 29, 2010 at 2:22 PM, Ted Yu <[email protected]> wrote:
>> > Hi,
>> > The count method in HBase shell is quite slow.
>> > Is there a way to obtain count faster ?
>> >
>> > Thanks
