Hi again,
>>> Pardon me but which 'run' method? Why do you not have access? It's a
>>> public class? (Sorry if I'm missing something obvious -- still on first
>>> cup of coffee).
Here is what my class looks like:
public class PhraseGenerator extends Configured implements Tool {
  ...
  public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), PhraseGenerator.class);
    conf.setJobName("PhraseGenerator");
    ....
  }
}
I looked for a method to get the partitioner via JobConf but could not
find one. At this point the question is about Hadoop rather than HBase.
Sorry if I'm asking something obvious; I usually check the API
documentation and the tutorials before asking questions, but I got stuck.
Thanks,
Jim
On Tue, Dec 23, 2008 at 10:05 AM, stack <[email protected]> wrote:
> Jim Twensky wrote:
>
>> ...
>> Why do we need to set the number of the reduce tasks according to the
>> number
>> of regions? Would it make a performance difference?
>>
>>
>
> Regions are the 'natural' division in hbase. My guess is that the
> partitioner was an attempt at calculating an N for reducers that was other
> than 1 or just some hard-coding.
>
> Another consideration is that at the reduce stage, keys are sorted, so
> inserts into hbase will be ordered. In this case, cutting the key space so
> it's divided at region boundaries could help distribute the upload and help
> performance. I'd imagine this would work best in a mature table, one that
> is already carrying a load, and where the upload is some smallish percentage
> of the total. Otherwise, region splitting would throw this partitioner
> calculation out of kilter.
>
>> I am asking this because I didn't use it in my implementation. I configure
>> the table name and output formats inside the run method which looks like
>> this:
>>
>> public int run(String[] args) throws Exception {
>> ....
>> conf.setOutputKeyClass(ImmutableBytesWritable.class);
>> conf.setOutputValueClass(BatchUpdate.class);
>> conf.set("output.table.name",args[1]);
>> ...
>>
>> }
>>
>> Notice that I don't have access to the partitioner unlike the
>> initTableReduceJob method. Is there a way to overcome this?
>>
>>
>>
> Pardon me but which 'run' method? Why do you not have access? It's a public
> class? (Sorry if I'm missing something obvious -- still on first cup of coffee).
>
> St.Ack
>
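
To make the idea in the quoted reply concrete (cutting the key space at region boundaries so that each reducer writes to one region), here is a minimal plain-Java sketch. It is not the HBase partitioner and does not use any Hadoop or HBase classes. The class name, the method names, and the region start keys are all hypothetical and chosen only for illustration. It assumes the region start keys are sorted and that the first region starts at the empty key.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: maps a row key to the index of the
// region whose key range contains it, given sorted region start keys.
class RegionBoundaryPartitionSketch {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Returns the index of the last region whose start key is <= rowKey.
    // With one reducer per region, that index is the reducer number.
    static int partitionFor(byte[] rowKey, byte[][] regionStartKeys) {
        int partition = 0;
        for (int i = 0; i < regionStartKeys.length; i++) {
            if (compare(rowKey, regionStartKeys[i]) >= 0) {
                partition = i;
            } else {
                break; // start keys are sorted, so no later region can match
            }
        }
        return partition;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed (made-up) region start keys for a table with three regions.
        byte[][] startKeys = { bytes(""), bytes("g"), bytes("p") };
        for (String key : new String[] { "apple", "melon", "zebra" }) {
            System.out.println(key + " -> reducer "
                + partitionFor(bytes(key), startKeys));
        }
    }
}
```

In this sketch the number of reducers equals the number of regions, which is why the two are tied together in the discussion above. As the reply notes, the mapping goes stale if regions split while the job runs.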