I think OR is more reasonable.

On Thu, Jul 29, 2010 at 8:54 PM, Angus He <[email protected]> wrote:
> By the way
>
> If users input multiple columns, it seems that the current
> implementation of RowCounter employs the OR logical operation.
>
> Is the AND more reasonable?
>
>
>
> On Fri, Jul 30, 2010 at 11:13 AM, Ryan Rawson <[email protected]> wrote:
> > RowCounter job counts rows. Its answer will be how many distinct row keys
> > were in the table approximately at a given time range.
> >
> > Even if the implementation uses first kv filter nothing about what I just
> > said is false.
> >
> > A KeyValue counter would tell you how many cells and versions there were
> > total don't you think?
> >
> > On Jul 29, 2010 7:58 PM, "Angus He" <[email protected]> wrote:
> >> Column names are just optional for RowCounter job.
> >>
> >> To be more accurate, RowCounter is a KeyValueCounter.
> >> If no columns are specified, only the first KeyValues of each row are
> >> included, then get the RowCounter.
> >>
> >>
> >> On Fri, Jul 30, 2010 at 9:28 AM, Ted Yu <[email protected]> wrote:
> >>> If someone can share the commandline for running RowCounter, that would be
> >>> great.
> >>>
> >>> Also, hbase shell count doesn't require column name. Why does RowCounter
> >>> require it ?
> >>>
> >>> Thanks
> >>>
> >>> On Thu, Jul 29, 2010 at 4:55 PM, Ryan Rawson <[email protected]> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> That table appears to be empty. Eg:
> >>>>
> >>>> 10/07/29 22:38:43 INFO mapred.JobClient: Map input records=0
> >>>>
> >>>>
> >>>> So back to the count issue... Counting in databases is a classic
> >>>> problem. Unless your DB system is keeping stats on how many
> >>>> inserts/deletes and thus how big it thinks the table is, you have to
> >>>> count all the rows by reading them. HBase is no different, and a
> >>>> little harder, because we have a variable length data format, so we
> >>>> can't just estimate row sizes from file sizes. Keeping distributed
> >>>> stats is not impossible, but certainly not on any priority list to be
> >>>> implemented - of course JIRAs/patches welcome etc.
> >>>>
> >>>> -ryan
> >>>>
> >>>>
> >>>> On Thu, Jul 29, 2010 at 3:48 PM, Ted Yu <[email protected]> wrote:
> >>>> > We use HBase 0.20.5
> >>>> >
> >>>> > Here is the snippet from RowCounter output:
> >>>> >
> >>>> > 10/07/29 22:38:42 DEBUG client.HTable$ClientScanner: Finished with scanning
> >>>> > at REGION => {NAME =>
> >>>> > '2__HB_NOINC_ORCL_SQLLDR_0728-THREEGPPSPEECHCALLS-1280408509541-0,DFF46493EB352D0E31CBFA4652E6EC06,1280412540858',
> >>>> > STARTKEY => 'DFF46493EB352D0E31CBFA4652E6EC06', ENDKEY => '', ENCODED =>
> >>>> > 1375318608, TABLE => {{NAME =>
> >>>> > '2__HB_NOINC_ORCL_SQLLDR_0728-THREEGPPSPEECHCALLS-1280408509541-0', FAMILIES
> >>>> > => [{NAME => 'd', COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000',
> >>>> > BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME =>
> >>>> > 'i', COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000', BLOCKSIZE =>
> >>>> > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME => 'v',
> >>>> > COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000', BLOCKSIZE =>
> >>>> > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
> >>>> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0
> >>>> > is done. And is in the process of commiting
> >>>> > 10/07/29 22:38:42 INFO mapred.LocalJobRunner:
> >>>> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task attempt_local_0001_m_000000_0
> >>>> > is allowed to commit now
> >>>> > 10/07/29 22:38:42 INFO mapred.FileOutputCommitter: Saved output of task
> >>>> > 'attempt_local_0001_m_000000_0' to
> >>>> > file:/usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc
> >>>> > 10/07/29 22:38:42 INFO mapred.LocalJobRunner:
> >>>> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task
> >>>> > 'attempt_local_0001_m_000000_0' done.
> >>>> > 10/07/29 22:38:43 INFO mapred.JobClient: map 100% reduce 0%
> >>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Job complete: job_local_0001
> >>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Counters: 6
> >>>> > 10/07/29 22:38:43 INFO mapred.JobClient: FileSystemCounters
> >>>> > 10/07/29 22:38:43 INFO mapred.JobClient: FILE_BYTES_READ=1592883
> >>>> > 10/07/29 22:38:43 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1624956
> >>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Map-Reduce Framework
> >>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Map input records=0
> >>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Spilled Records=0
> >>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Map input bytes=0
> >>>> > 10/07/29 22:38:43 INFO mapred.JobClient: Map output records=0
> >>>> >
> >>>> > [sjc1-hadoop8.sjc1:hadoop 3705]ls -l
> >>>> > /usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc/part-00000
> >>>> > -rwxrwxrwx 1 hadoop users 0 Jul 29 22:38
> >>>> > /usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc/part-00000
> >>>> >
> >>>> > But there are many records in the table I was querying.
> >>>> >
> >>>> > Can someone comment ?
> >>>> >
> >>>> > On Thu, Jul 29, 2010 at 2:26 PM, Jean-Daniel Cryans <[email protected]> wrote:
> >>>> >
> >>>> >> In 0.89 you can specify CACHE for the count command. Set it higher (it
> >>>> >> defaults to 10 rows per call).
> >>>> >>
> >>>> >> Also you can use the RowCounter MR job.
> >>>> >>
> >>>> >> J-D
> >>>> >>
> >>>> >> On Thu, Jul 29, 2010 at 2:22 PM, Ted Yu <[email protected]> wrote:
> >>>> >> > Hi,
> >>>> >> > The count method in HBase shell is quite slow.
> >>>> >> > Is there a way to obtain count faster ?
> >>>> >> >
> >>>> >> > Thanks
> >>>> >> >
> >>>> >>
> >>>> >
> >>>>
> >>>
> >>
> >> --
> >> Regards
> >> Angus
> >
> >
>
> --
> Regards
> Angus
>
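
To make the OR versus AND question in this thread concrete, here is a toy Python model. It is not HBase code and does not reflect how RowCounter is actually implemented; the table is a plain dict, and the names `table`, `count_rows` and `mode` are made up for illustration. It only shows how the two readings of "multiple columns" would give different counts on the same data.

```python
# Toy model only: each row key maps to the set of "family:qualifier"
# columns that hold a value in that row. All names here are hypothetical.
table = {
    "r1": {"d:a", "d:b"},
    "r2": {"d:a"},
    "r3": {"i:x"},
}


def count_rows(rows, columns=(), mode="or"):
    """Count rows in the toy table.

    With no columns, every row is counted once.
    mode="or":  a row counts if it has at least one requested column.
    mode="and": a row counts only if it has all requested columns.
    """
    wanted = set(columns)
    if not wanted:
        return len(rows)
    if mode == "or":
        return sum(1 for cols in rows.values() if wanted & cols)
    if mode == "and":
        return sum(1 for cols in rows.values() if wanted <= cols)
    raise ValueError("mode must be 'or' or 'and'")
```

On this sample data, asking for `d:a` and `d:b` counts r1 and r2 under OR but only r1 under AND, while passing no columns counts all three rows. Under OR, a row that is missing some of the requested columns is still counted as a row, which is why I lean towards it for a job whose purpose is counting rows.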
