Thanks, Ryan.

Yes, it only counts rows.  :)
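
To make the distinction discussed in the quoted thread concrete, here is a toy sketch in plain Python. It is not the HBase API: the `table` layout and the helper names `count_rows` and `count_key_values` are hypothetical, chosen only to show how looking at the first cell of each row gives a row count, while a KeyValue counter would add up every cell and version.

```python
# Toy model only: a table maps row key -> list of (column, version, value) cells.
# These names are illustrative assumptions and do not come from HBase.
table = {
    "row1": [("d:a", 2, "x"), ("d:a", 1, "w"), ("i:b", 1, "y")],
    "row2": [("d:a", 1, "z")],
    "row3": [("v:c", 1, "q"), ("v:d", 1, "r")],
}


def count_rows(table):
    """Count distinct row keys by inspecting only the first cell of each row."""
    return sum(1 for cells in table.values() if cells[:1])


def count_key_values(table):
    """Count every cell and version stored in the table."""
    return sum(len(cells) for cells in table.values())


print(count_rows(table))        # 3
print(count_key_values(table))  # 6
```

Reading one cell per row is enough to know the row exists, which is why a first-KeyValue filter still produces a row count and not a cell count.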




On Fri, Jul 30, 2010 at 11:13 AM, Ryan Rawson <[email protected]> wrote:
> The RowCounter job counts rows. Its answer is approximately how many distinct
> row keys were in the table within a given time range.
>
> Even if the implementation uses a first-KeyValue filter, nothing I just
> said is false.
>
> A KeyValue counter would tell you how many cells and versions there were
> in total, don't you think?
>
> On Jul 29, 2010 7:58 PM, "Angus He" <[email protected]> wrote:
>> Column names are optional for the RowCounter job.
>>
>> To be more accurate, RowCounter is a KeyValueCounter.
>> If no columns are specified, only the first KeyValue of each row is
>> included, which yields the row count.
>>
>>
>> On Fri, Jul 30, 2010 at 9:28 AM, Ted Yu <[email protected]> wrote:
>>> If someone can share the command line for running RowCounter, that would be
>>> great.
>>>
>>> Also, the hbase shell count doesn't require a column name. Why does
>>> RowCounter require it?
>>>
>>> Thanks
>>>
>>> On Thu, Jul 29, 2010 at 4:55 PM, Ryan Rawson <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> That table appears to be empty.  E.g.:
>>>>
>>>> 10/07/29 22:38:43 INFO mapred.JobClient:     Map input records=0
>>>>
>>>>
>>>> So back to the count issue... Counting in databases is a classic
>>>> problem. Unless your DB system is keeping stats on how many
>>>> inserts/deletes and thus how big it thinks the table is, you have to
>>>> count all the rows by reading them.  HBase is no different, and a
>>>> little harder, because we have a variable length data format, so we
>>>> can't just estimate row sizes from file sizes.  Keeping distributed
>>>> stats is not impossible, but certainly not on any priority list to be
>>>> implemented - of course JIRAs/patches welcome etc.
>>>>
>>>> -ryan
>>>>
>>>>
>>>> On Thu, Jul 29, 2010 at 3:48 PM, Ted Yu <[email protected]> wrote:
>>>> > We use HBase 0.20.5
>>>> >
>>>> > Here is the snippet from RowCounter output:
>>>> >
>>>> > 10/07/29 22:38:42 DEBUG client.HTable$ClientScanner: Finished with scanning
>>>> > at REGION => {NAME =>
>>>> > '2__HB_NOINC_ORCL_SQLLDR_0728-THREEGPPSPEECHCALLS-1280408509541-0,DFF46493EB352D0E31CBFA4652E6EC06,1280412540858',
>>>> > STARTKEY => 'DFF46493EB352D0E31CBFA4652E6EC06', ENDKEY => '', ENCODED =>
>>>> > 1375318608, TABLE => {{NAME =>
>>>> > '2__HB_NOINC_ORCL_SQLLDR_0728-THREEGPPSPEECHCALLS-1280408509541-0', FAMILIES
>>>> > => [{NAME => 'd', COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000',
>>>> > BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME =>
>>>> > 'i', COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000', BLOCKSIZE =>
>>>> > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME => 'v',
>>>> > COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000', BLOCKSIZE =>
>>>> > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
>>>> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0
>>>> > is done. And is in the process of commiting
>>>> > 10/07/29 22:38:42 INFO mapred.LocalJobRunner:
>>>> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task attempt_local_0001_m_000000_0
>>>> > is allowed to commit now
>>>> > 10/07/29 22:38:42 INFO mapred.FileOutputCommitter: Saved output of task
>>>> > 'attempt_local_0001_m_000000_0' to
>>>> > file:/usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc
>>>> > 10/07/29 22:38:42 INFO mapred.LocalJobRunner:
>>>> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task
>>>> > 'attempt_local_0001_m_000000_0' done.
>>>> > 10/07/29 22:38:43 INFO mapred.JobClient:  map 100% reduce 0%
>>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Job complete: job_local_0001
>>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Counters: 6
>>>> > 10/07/29 22:38:43 INFO mapred.JobClient:   FileSystemCounters
>>>> > 10/07/29 22:38:43 INFO mapred.JobClient:     FILE_BYTES_READ=1592883
>>>> > 10/07/29 22:38:43 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1624956
>>>> > 10/07/29 22:38:43 INFO mapred.JobClient:   Map-Reduce Framework
>>>> > 10/07/29 22:38:43 INFO mapred.JobClient:     Map input records=0
>>>> > 10/07/29 22:38:43 INFO mapred.JobClient:     Spilled Records=0
>>>> > 10/07/29 22:38:43 INFO mapred.JobClient:     Map input bytes=0
>>>> > 10/07/29 22:38:43 INFO mapred.JobClient:     Map output records=0
>>>> >
>>>> > [sjc1-hadoop8.sjc1:hadoop 3705]ls -l
>>>> > /usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc/part-00000
>>>> > -rwxrwxrwx 1 hadoop users 0 Jul 29 22:38
>>>> > /usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc/part-00000
>>>> >
>>>> > But there are many records in the table I was querying.
>>>> >
>>>> > Can someone comment?
>>>> >
>>>> > On Thu, Jul 29, 2010 at 2:26 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>> >
>>>> >> In 0.89 you can specify CACHE for the count command. Set it higher (it
>>>> >> defaults to 10 rows per call).
>>>> >>
>>>> >> Also you can use the RowCounter MR job.
>>>> >>
>>>> >> J-D
>>>> >>
>>>> >> On Thu, Jul 29, 2010 at 2:22 PM, Ted Yu <[email protected]> wrote:
>>>> >> > Hi,
>>>> >> > The count method in the HBase shell is quite slow.
>>>> >> > Is there a way to obtain the count faster?
>>>> >> >
>>>> >> > Thanks
>>>> >> >
>>>> >>
>>>> >
>>>>
>>>
>>
>>
>>
>> --
>> Regards
>> Angus
>



-- 
Regards
Angus
