In which of the three functions would I have to set the ranges? In the configure() function? Would configure() be called once for every mapper? Thank you!
On Wed, Aug 25, 2010 at 12:50 PM, David Rosenstrauch <dar...@darose.net> wrote:

> On 08/25/2010 12:40 PM, Mithila Nagendra wrote:
>
>> In order to avoid this I was thinking of passing the range boundaries to
>> the partitioner. How would I do that? Is there an alternative? Any
>> suggestion would prove useful.
>
> We use a custom partitioner, for which we pass in configuration data that
> gets used in the partitioning calculations.
>
> We do it by making the Partitioner implement Configurable, and then grab
> the needed config data from the configuration object that we're given. (We
> set the needed config data on the config object when we submit the job).
> i.e., like so:
>
> import org.apache.hadoop.mapreduce.Partitioner;
> import org.apache.hadoop.conf.Configurable;
> import org.apache.hadoop.conf.Configuration;
>
> public class OurPartitioner extends Partitioner<BytesWritable, Writable>
>         implements Configurable {
>     ...
>
>     public int getPartition(BytesWritable key, Writable value,
>             int numPartitions) {
>         ...
>     }
>
>     public Configuration getConf() {
>         return conf;
>     }
>
>     public void setConf(Configuration conf) {
>         this.conf = conf;
>         configure();
>     }
>
>     @SuppressWarnings("unchecked")
>     private void configure() throws IOException {
>         String <parmValue> = conf.get(<parmKey>);
>         if (<parmValue> == null) {
>             throw new RuntimeException(.....);
>         }
>     }
>
>     private Configuration conf;
> }
>
> HTH,
>
> DR
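For reference, here is a minimal, self-contained sketch of the pattern DR describes. The property name "range.partitioner.boundaries", the class name RangePartitioner, and the use of Text keys are placeholders of my own, not something from this thread. The framework calls setConf() when it instantiates the partitioner for a task, so the boundary parsing happens once per task there, not once per record.

// Driver side (illustrative only): put the boundaries into the job
// configuration before submitting, and register the partitioner.
//
//   Configuration conf = new Configuration();
//   conf.set("range.partitioner.boundaries", "f,m,t");   // hypothetical key/value
//   Job job = new Job(conf, "range-partition");
//   job.setPartitionerClass(RangePartitioner.class);

import java.util.Arrays;

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Partitioner;

public class RangePartitioner extends Partitioner<Text, Writable>
        implements Configurable {

    private Configuration conf;
    private String[] boundaries;   // sorted upper bounds of the key ranges

    @Override
    public int getPartition(Text key, Writable value, int numPartitions) {
        // Send the key to the first range whose upper bound it does not exceed.
        String k = key.toString();
        for (int i = 0; i < boundaries.length; i++) {
            if (k.compareTo(boundaries[i]) <= 0) {
                return i % numPartitions;
            }
        }
        // Keys beyond the last boundary go to the final range.
        return boundaries.length % numPartitions;
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public void setConf(Configuration conf) {
        // Called by the framework (via ReflectionUtils) when the partitioner
        // is created for a task; read and parse the boundaries here.
        this.conf = conf;
        String parm = conf.get("range.partitioner.boundaries");
        if (parm == null) {
            throw new RuntimeException("range.partitioner.boundaries not set");
        }
        boundaries = parm.split(",");
        Arrays.sort(boundaries);
    }
}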