Thank you, that really helped and I appreciate it. I have one final question
about the following code you posted:
    if (partitioner != null) {
      job.setPartitionerClass(HRegionPartitioner.class);
      HTable outputTable = new HTable(new HBaseConfiguration(job), table);
      int regions = outputTable.getRegionsInfo().size();
      if (job.getNumReduceTasks() > regions){
        job.setNumReduceTasks(outputTable.getRegionsInfo().size());
      }
    }
Why does the number of reduce tasks need to be limited to the number of
regions? Would it make a performance difference otherwise?
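My guess at the reason, as a toy sketch only: if a region-aware partitioner
sends each row to the partition whose index matches the row's region, then any
reduce tasks beyond the region count would never receive data. The code below
is not HBase code. The class RegionPartitionSketch, its methods, and the
start keys and row keys are all made up for illustration, and I am assuming
(without having checked the source) that HRegionPartitioner behaves roughly
like this.

```java
import java.util.Arrays;

public class RegionPartitionSketch {
    // Hypothetical stand-in for a region-aware partitioner: the owning region
    // is the one with the greatest start key that does not exceed the row key.
    static int regionIndex(String[] sortedStartKeys, String rowKey) {
        int pos = Arrays.binarySearch(sortedStartKeys, rowKey);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1;
        // the owning region sits just before the insertion point.
        return pos >= 0 ? pos : Math.max(0, -pos - 2);
    }

    // Counts how many rows each reduce task would receive.
    static int[] rowsPerReducer(String[] startKeys, String[] rowKeys,
                                int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String row : rowKeys) {
            counts[regionIndex(startKeys, row) % numReduceTasks]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] startKeys = {"", "g", "p"};  // three made-up regions
        String[] rows = {"apple", "kiwi", "grape", "plum", "zebra"};
        // Five reduce tasks but only three regions: prints [1, 2, 2, 0, 0]
        System.out.println(Arrays.toString(rowsPerReducer(startKeys, rows, 5)));
    }
}
```

If that assumption holds, the last two reduce tasks in this example would
start up and finish without writing anything, which would explain the cap.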
I am asking because I didn't use it in my implementation. I configure the
table name and output format inside the run method, which looks like this:
    public int run(String[] args) throws Exception {
      ....
      conf.setOutputKeyClass(ImmutableBytesWritable.class);
      conf.setOutputValueClass(BatchUpdate.class);
      conf.set("output.table.name",args[1]);
      ...
    }
Note that, unlike the initTableReduceJob method, my code does not have access
to a partitioner. Is there a way to work around this?
Thanks
Jim
On Mon, Dec 22, 2008 at 3:43 PM, stack <[email protected]> wrote:
> Jim Twensky wrote:
>
>> Hello Jonathan,
>>
>> Thanks for the fast response. Yes, my question is on other methods to put
>> the same data layout into HBase from my map reduce jobs. I've seen the
>> TableOutputFormat but I couldn't find any example usages of it.
>>
>>
>
> Try src/example/mapred. See SampleUploader. It does like you want with a
> mapper prepping data and then a reducer to insert into hbase table. For how
> the reduce is configured, see the initTableReduceJob in TableMapReduceUtil.
> It looks like this:
>
> job.setOutputFormat(TableOutputFormat.class);
> job.setReducerClass(reducer);
> job.set(TableOutputFormat.OUTPUT_TABLE, table);
> job.setOutputKeyClass(ImmutableBytesWritable.class);
> job.setOutputValueClass(BatchUpdate.class);
> if (partitioner != null) {
> job.setPartitionerClass(HRegionPartitioner.class);
> HTable outputTable = new HTable(new HBaseConfiguration(job), table);
> int regions = outputTable.getRegionsInfo().size();
> if (job.getNumReduceTasks() > regions){
> job.setNumReduceTasks(outputTable.getRegionsInfo().size());
> }
> }
>
> ... sets the output format, reducer, output keys, table name, and tries to
> set reduce count.
>
> If any issue with the above, let us know.
> St.Ack
>