[jira] Updated: (HBASE-559) MR example job to count table rows

stack (JIRA) Fri, 04 Apr 2008 11:11:58 -0700

     [ 
https://issues.apache.org/jira/browse/HBASE-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


stack updated HBASE-559:
------------------------

    Attachment: 559-0.1-v2.patch

Attached is a tested patch.  It also makes hbase.jar a hadoop job jar.  
There's a Driver under mapred; add MR jobs to its list so that you 
can do:

{code}
./bin/hadoop jar hbase.jar
{code}

... and you'll get a list of the hbase MR jobs.
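
For readers who want the shape of the Driver idea without opening the patch, here is a minimal, dependency-free sketch.  It is not the code from the patch: the class name DriverSketch, the addJob/usage/run methods and the job description string are all made up for illustration, and the real Driver may well be built on a Hadoop helper class instead.  It only shows the pattern: register jobs by name, print the list when no valid name is given, otherwise hand the remaining arguments to the chosen job.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of a "driver" entry point for MR jobs.
 * Names and structure are assumptions, not the code from the patch.
 */
public class DriverSketch {
    /** One registered job: a description plus the code that runs it. */
    static final class Job {
        final String description;
        final Function<String[], Integer> main;

        Job(String description, Function<String[], Integer> main) {
            this.description = description;
            this.main = main;
        }
    }

    // Insertion-ordered so the usage listing matches registration order.
    private final Map<String, Job> jobs = new LinkedHashMap<>();

    /** Adds a job to the driver's list under the given name. */
    void addJob(String name, String description, Function<String[], Integer> main) {
        jobs.put(name, new Job(description, main));
    }

    /** Builds the listing shown when no valid job name is given. */
    String usage() {
        StringBuilder sb = new StringBuilder("Available jobs:\n");
        for (Map.Entry<String, Job> e : jobs.entrySet()) {
            sb.append("  ").append(e.getKey())
              .append(": ").append(e.getValue().description).append('\n');
        }
        return sb.toString();
    }

    /** Dispatches on args[0]; returns -1 after printing usage if it is missing or unknown. */
    int run(String[] args) {
        if (args.length == 0 || !jobs.containsKey(args[0])) {
            System.out.print(usage());
            return -1;
        }
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        return jobs.get(args[0]).main.apply(rest);
    }

    public static void main(String[] args) {
        DriverSketch driver = new DriverSketch();
        // Placeholder job body; a real entry would launch the MR job.
        driver.addJob("rowcounter", "Count non-empty rows in a table", jobArgs -> {
            System.out.println("rowcounter called with " + jobArgs.length + " argument(s)");
            return 0;
        });
        System.exit(driver.run(args));
    }
}
```

Run with no arguments, the sketch prints its job list; run with "rowcounter" followed by further arguments, it passes those arguments through to the job.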

Here is what a run of our dumb rowcounter looks like:

{code}
durruti$ ./bin/hbase org.apache.hadoop.hbase.mapred.Driver rowcounter /tmp/output x x:
08/04/04 10:59:58 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
08/04/04 10:59:58 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
08/04/04 10:59:58 INFO mapred.JobClient: Running job: job_local_1
08/04/04 10:59:58 INFO mapred.MapTask: numReduceTasks: 1
08/04/04 10:59:58 INFO hbase.HTable: Creating scanner over x starting at key 
08/04/04 10:59:58 INFO mapred.LocalJobRunner: 
08/04/04 10:59:58 INFO mapred.TaskRunner: Task 'job_local_1_map_0000' done.
08/04/04 10:59:58 INFO mapred.LocalJobRunner: reduce > reduce
08/04/04 10:59:58 INFO mapred.TaskRunner: Task 'reduce_qv8ybc' done.
08/04/04 10:59:58 INFO mapred.TaskRunner: Saved output of task 'reduce_qv8ybc' to file:/tmp/output
08/04/04 10:59:59 INFO mapred.JobClient: Job complete: job_local_1
08/04/04 10:59:59 INFO mapred.JobClient: Counters: 10
08/04/04 10:59:59 INFO mapred.JobClient:   RowCounter
08/04/04 10:59:59 INFO mapred.JobClient:     Rows=1
08/04/04 10:59:59 INFO mapred.JobClient:   Map-Reduce Framework
08/04/04 10:59:59 INFO mapred.JobClient:     Map input records=1
08/04/04 10:59:59 INFO mapred.JobClient:     Map output records=1
08/04/04 10:59:59 INFO mapred.JobClient:     Map input bytes=0
08/04/04 10:59:59 INFO mapred.JobClient:     Map output bytes=7
08/04/04 10:59:59 INFO mapred.JobClient:     Combine input records=0
08/04/04 10:59:59 INFO mapred.JobClient:     Combine output records=0
08/04/04 10:59:59 INFO mapred.JobClient:     Reduce input groups=1
08/04/04 10:59:59 INFO mapred.JobClient:     Reduce input records=1
08/04/04 10:59:59 INFO mapred.JobClient:     Reduce output records=1
{code}

Here is the commit comment:

{code}
M  build.xml
    (Jar target): Add copying of any properties files under src/java
    Also added Main-Class to manifest.
M  src/java/org/apache/hadoop/hbase/mapred/RowCounter_Counters.properties
    Added resource so MR can print out counters for RowCounter MR job
M  src/java/org/apache/hadoop/hbase/mapred/RowCounter.java
    Example, simple MR job that counts non-empty rows.
M  src/java/org/apache/hadoop/hbase/mapred/Driver.java
    Driver class. General entry point for hbase MR jobs.
{code}
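
To make "counts non-empty rows" concrete, here is a small in-memory sketch of the counting logic.  It is not RowCounter.java: the class RowCountSketch, its method names and the sample data are hypothetical, the table is a plain Map rather than an HBase table, and I am assuming that a column argument such as x: selects every column whose name starts with that prefix.  A real MR job would do the per-row check in the map step and bump a job counter instead of a local variable.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory sketch of counting non-empty rows.
 * The table is modelled as rowKey -> (columnName -> value).
 */
public class RowCountSketch {
    /** True if the row has at least one non-empty value in a requested column. */
    static boolean isNonEmpty(Map<String, byte[]> row, List<String> columns) {
        for (Map.Entry<String, byte[]> cell : row.entrySet()) {
            byte[] value = cell.getValue();
            if (value == null || value.length == 0) {
                continue; // empty cell, does not make the row count
            }
            for (String column : columns) {
                // Assumption: "x:" matches any column named "x:<something>".
                if (cell.getKey().startsWith(column)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Counts rows that pass isNonEmpty; stands in for the job's Rows counter. */
    static long countRows(Map<String, Map<String, byte[]>> table, List<String> columns) {
        long rows = 0;
        for (Map<String, byte[]> row : table.values()) {
            if (isNonEmpty(row, columns)) {
                rows++;
            }
        }
        return rows;
    }

    /** Made-up sample data: one row with a value, one row with an empty cell. */
    static Map<String, Map<String, byte[]>> sampleTable() {
        Map<String, Map<String, byte[]>> table = new LinkedHashMap<>();
        Map<String, byte[]> full = new LinkedHashMap<>();
        full.put("x:a", "value".getBytes(StandardCharsets.UTF_8));
        table.put("row1", full);
        Map<String, byte[]> empty = new LinkedHashMap<>();
        empty.put("x:a", new byte[0]);
        table.put("row2", empty);
        return table;
    }

    public static void main(String[] args) {
        System.out.println("Rows=" + countRows(sampleTable(), Arrays.asList("x:")));
    }
}
```

Emitting the row key alongside the count would be a small change to countRows, which is the "list of what made it in" idea from the issue description.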

Will apply to the branch after 0.1.1 goes out, and to TRUNK when the MR API 
settles.

> MR example job to count table rows
> ----------------------------------
>
>                 Key: HBASE-559
>                 URL: https://issues.apache.org/jira/browse/HBASE-559
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>         Attachments: 559-0.1-v2.patch, 559.patch
>
>
> Lars' import is a little messy; he's not sure how many records were 
> imported.  Running a select takes a couple of hours.  He happens to have an 
> idle MR cluster standing by.  An example MR job that just did a count of 
> records would be generally useful.  It could even output row keys so you'd 
> have a list of what made it in.  Later, if this tool becomes popular and 
> spawns derivatives and similar tools, we can bundle a jar of MR jobs to run 
> against your tables that can answer common queries and that are amenable to 
> subclassing/modification.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
