Thank you, that really helped; I appreciate it. I have one final question
about the following code you posted:

if (partitioner != null) {
    job.setPartitionerClass(HRegionPartitioner.class);
    HTable outputTable = new HTable(new HBaseConfiguration(job), table);
    int regions = outputTable.getRegionsInfo().size();
    if (job.getNumReduceTasks() > regions) {
        job.setNumReduceTasks(outputTable.getRegionsInfo().size());
    }
}

Why do we need to cap the number of reduce tasks at the number of regions?
Would it make a performance difference?
I am asking because I didn't do this in my implementation. I configure
the table name and output formats inside the run method, which looks like
this:

public int run(String[] args) throws Exception {
    ....
    conf.setOutputKeyClass(ImmutableBytesWritable.class);
    conf.setOutputValueClass(BatchUpdate.class);
    conf.set("output.table.name", args[1]);
    ...
}

Notice that, unlike the initTableReduceJob method, my run method has no
partitioner argument to work with. Is there a way to overcome this?
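
To make the first question concrete, here is how I currently picture the
reduce-count cap, as a standalone Java sketch with no HBase calls. The class
name RegionCapSketch, its methods, the sample start keys, and the assumption
that the partitioner sends each row to the reduce task whose index matches
the row's region are all mine, not taken from the HBase source, so please
correct me if the picture is wrong:

```java
import java.util.Arrays;

public class RegionCapSketch {

    // Cap the requested reduce count at the number of regions,
    // mirroring the if-block in initTableReduceJob.
    static int capReduceTasks(int requested, int regions) {
        return Math.min(requested, regions);
    }

    // Index of the region holding rowKey, given sorted region start keys.
    // Assumes the first region starts at the empty key "".
    static int regionIndex(String[] sortedStartKeys, String rowKey) {
        int pos = Arrays.binarySearch(sortedStartKeys, rowKey);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the owning
        // region is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Assumed partitioning rule (hypothetical): one reduce task per region,
    // wrapping around when there are fewer reduce tasks than regions.
    static int partitionFor(String[] sortedStartKeys, String rowKey,
                            int numReduceTasks) {
        return regionIndex(sortedStartKeys, rowKey) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] startKeys = {"", "g", "p"};   // three made-up regions
        int reduces = capReduceTasks(10, startKeys.length);
        System.out.println("reduce tasks: " + reduces);
        for (String row : new String[] {"apple", "grape", "zebra"}) {
            System.out.println(row + " -> "
                + partitionFor(startKeys, row, reduces));
        }
    }
}
```

If that picture is right, no row can ever map to a partition index at or
above the region count, so any reduce tasks beyond that number would sit
idle, which would explain the cap.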

Thanks
Jim

On Mon, Dec 22, 2008 at 3:43 PM, stack wrote:

> Jim Twensky wrote:
>
>> Hello Jonathan,
>>
>> Thanks for the fast response. Yes, my question is on other methods to put
>> the same data layout into HBase from my map reduce jobs. I've seen the
>> TableOutputFormat but I couldn't find any example usages of it.
>>
>>
>
> Try src/example/mapred.  See SampleUploader.  It does what you want: a
> mapper preps the data and a reducer inserts it into an HBase table.  For how
> the reduce is configured, see initTableReduceJob in TableMapReduceUtil.
> It looks like this:
>
>   job.setOutputFormat(TableOutputFormat.class);
>   job.setReducerClass(reducer);
>   job.set(TableOutputFormat.OUTPUT_TABLE, table);
>   job.setOutputKeyClass(ImmutableBytesWritable.class);
>   job.setOutputValueClass(BatchUpdate.class);
>   if (partitioner != null) {
>     job.setPartitionerClass(HRegionPartitioner.class);
>     HTable outputTable = new HTable(new HBaseConfiguration(job), table);
>     int regions = outputTable.getRegionsInfo().size();
>     if (job.getNumReduceTasks() > regions){
>       job.setNumReduceTasks(outputTable.getRegionsInfo().size());
>     }
>   }
>
> ... sets the output format, reducer, output keys, table name, and tries to
> set the reduce count.
>
> If you have any issues with the above, let us know.
> St.Ack
>
