[ https://issues.apache.org/jira/browse/HBASE-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-559:
------------------------
Attachment: 559-0.1-v2.patch
Attached is a tested patch. It also makes hbase.jar a hadoop job jar.
There's a Driver class under mapred. Add MR jobs to its list so that you
can do:
{code}
./bin/hadoop jar hbase.jar
{code}
... and you'll get a list of the hbase MR jobs.
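As a rough illustration of the "add MR jobs to its list" idea, here is a minimal sketch of a name-to-job dispatcher. It uses plain Java only and is not the Driver from the patch; the class name `JobDriverSketch`, the methods `addJob`, `usage` and `run`, and the usage text are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a job driver; not the class shipped in the patch. */
class JobDriverSketch {
    /** A registered job: a description plus the code to run with the remaining args. */
    static final class Job {
        final String description;
        final Function<String[], Integer> body;

        Job(String description, Function<String[], Integer> body) {
            this.description = description;
            this.body = body;
        }
    }

    // LinkedHashMap keeps jobs in registration order for the listing.
    private final Map<String, Job> jobs = new LinkedHashMap<>();

    /** Adds a job to the list under the given name. */
    void addJob(String name, String description, Function<String[], Integer> body) {
        jobs.put(name, new Job(description, body));
    }

    /** Builds the listing shown when no valid job name is given. */
    String usage() {
        StringBuilder sb = new StringBuilder("Valid program names are:\n");
        for (Map.Entry<String, Job> e : jobs.entrySet()) {
            sb.append("  ").append(e.getKey()).append(": ")
              .append(e.getValue().description).append('\n');
        }
        return sb.toString();
    }

    /** Runs the named job with the remaining args; prints the listing and returns -1 otherwise. */
    int run(String[] args) {
        if (args.length == 0 || !jobs.containsKey(args[0])) {
            System.out.print(usage());
            return -1;
        }
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        return jobs.get(args[0]).body.apply(rest);
    }

    public static void main(String[] args) {
        JobDriverSketch driver = new JobDriverSketch();
        // Placeholder job body; a real entry would configure and submit an MR job.
        driver.addJob("rowcounter", "Count rows in a table", rest -> 0);
        driver.run(args);
    }
}
```

Run with no arguments, the sketch prints the registered job names; run with `rowcounter` plus arguments, it hands the remaining arguments to that job.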
Here is how our dumb rowcounter looks:
{code}
durruti$ ./bin/hbase org.apache.hadoop.hbase.mapred.Driver rowcounter /tmp/output x x:
08/04/04 10:59:58 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
08/04/04 10:59:58 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
08/04/04 10:59:58 INFO mapred.JobClient: Running job: job_local_1
08/04/04 10:59:58 INFO mapred.MapTask: numReduceTasks: 1
08/04/04 10:59:58 INFO hbase.HTable: Creating scanner over x starting at key
08/04/04 10:59:58 INFO mapred.LocalJobRunner:
08/04/04 10:59:58 INFO mapred.TaskRunner: Task 'job_local_1_map_0000' done.
08/04/04 10:59:58 INFO mapred.LocalJobRunner: reduce > reduce
08/04/04 10:59:58 INFO mapred.TaskRunner: Task 'reduce_qv8ybc' done.
08/04/04 10:59:58 INFO mapred.TaskRunner: Saved output of task 'reduce_qv8ybc' to file:/tmp/output
08/04/04 10:59:59 INFO mapred.JobClient: Job complete: job_local_1
08/04/04 10:59:59 INFO mapred.JobClient: Counters: 10
08/04/04 10:59:59 INFO mapred.JobClient:   RowCounter
08/04/04 10:59:59 INFO mapred.JobClient:     Rows=1
08/04/04 10:59:59 INFO mapred.JobClient:   Map-Reduce Framework
08/04/04 10:59:59 INFO mapred.JobClient:     Map input records=1
08/04/04 10:59:59 INFO mapred.JobClient:     Map output records=1
08/04/04 10:59:59 INFO mapred.JobClient:     Map input bytes=0
08/04/04 10:59:59 INFO mapred.JobClient:     Map output bytes=7
08/04/04 10:59:59 INFO mapred.JobClient:     Combine input records=0
08/04/04 10:59:59 INFO mapred.JobClient:     Combine output records=0
08/04/04 10:59:59 INFO mapred.JobClient:     Reduce input groups=1
08/04/04 10:59:59 INFO mapred.JobClient:     Reduce input records=1
08/04/04 10:59:59 INFO mapred.JobClient:     Reduce output records=1
{code}
Here is the commit comment:
{code}
M build.xml
    (Jar target): Add copying of any properties files under src/java
    Also added Main-Class to manifest.
M src/java/org/apache/hadoop/hbase/mapred/RowCounter_Counters.properties
    Added resource so MR can print out counters for RowCounter MR job
M src/java/org/apache/hadoop/hbase/mapred/RowCounter.java
    Example, simple MR job that counts non-empty rows.
M src/java/org/apache/hadoop/hbase/mapred/Driver.java
    Driver class. General entry point for hbase MR jobs.
{code}
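To make "counts non-empty rows" concrete, here is a small sketch of the counting rule in plain Java, without any MR or HBase classes. It is not the RowCounter from the patch. It assumes a row is "non-empty" when at least one of its cells holds a value longer than zero bytes; the class `RowCountSketch` and the in-memory table layout are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the counting rule; not the RowCounter in the patch. */
class RowCountSketch {
    /** Assumption: a row is non-empty if any cell has a value of length > 0. */
    static boolean isNonEmpty(Map<String, byte[]> cells) {
        for (byte[] value : cells.values()) {
            if (value != null && value.length > 0) {
                return true;
            }
        }
        return false;
    }

    /**
     * Counts non-empty rows. In an MR job this tally would be a counter
     * bumped once per qualifying row seen by a map task.
     */
    static long countRows(Map<String, Map<String, byte[]>> table) {
        long rows = 0;
        for (Map<String, byte[]> cells : table.values()) {
            if (isNonEmpty(cells)) {
                rows++;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Row key -> (column name -> cell value); a toy in-memory table.
        Map<String, Map<String, byte[]>> table = new LinkedHashMap<>();
        table.put("row1", Collections.singletonMap("x:", new byte[] {'v'}));
        table.put("row2", Collections.singletonMap("x:", new byte[0]));
        System.out.println("Rows=" + countRows(table));
    }
}
```

Only `row1` has a cell with content, so `row2` is skipped by the rule above.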
Will apply to branch after 0.1.1 goes out. Will apply to TRUNK when MR API
settles.
> MR example job to count table rows
> ----------------------------------
>
> Key: HBASE-559
> URL: https://issues.apache.org/jira/browse/HBASE-559
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
> Attachments: 559-0.1-v2.patch, 559.patch
>
>
> Lars' import is a little messy; he's not sure how many records were
> imported. Running a select takes a couple of hours. He happens to have an
> idle MR cluster standing by. An example MR job that just did a count of
> records would be generally useful. It could even output row keys so you'd
> have a list of what made it in. Later, if this tool becomes popular and
> spawns derivatives and similar tools, we can bundle a jar of MR jobs to run
> against your tables that can answer common queries and that are amenable to
> subclassing/modification.