In which of the three functions would I have to set the ranges? In the configure() function? Would configure() be called once for every mapper? Thank you!
On Wed, Aug 25, 2010 at 12:50 PM, David Rosenstrauch <dar...@darose.net> wrote:

> On 08/25/2010 12:40 PM, Mithila Nagendra wrote:
>
>> In order to avoid this I was thinking of passing the range boundaries to
>> the partitioner. How would I do that? Is there an alternative? Any
>> suggestion would prove useful.
>
> We use a custom partitioner, for which we pass in configuration data that
> gets used in the partitioning calculations.
>
> We do it by making the Partitioner implement Configurable, and then grab
> the needed config data from the configuration object that we're given. (We
> set the needed config data on the config object when we submit the job).
> i.e., like so:
>
> import org.apache.hadoop.mapreduce.Partitioner;
> import org.apache.hadoop.conf.Configurable;
> import org.apache.hadoop.conf.Configuration;
>
> public class OurPartitioner extends Partitioner<BytesWritable, Writable>
>         implements Configurable {
>     ...
>
>     public int getPartition(BytesWritable key, Writable value,
>             int numPartitions) {
>         ...
>     }
>
>     public Configuration getConf() {
>         return conf;
>     }
>
>     public void setConf(Configuration conf) {
>         this.conf = conf;
>         configure();
>     }
>
>     @SuppressWarnings("unchecked")
>     private void configure() throws IOException {
>         String <parmValue> = conf.get(<parmKey>);
>         if (<parmValue> == null) {
>             throw new RuntimeException(.....);
>         }
>     }
>
>     private Configuration conf;
> }
>
> HTH,
>
> DR
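For reference, here is a minimal, self-contained sketch of the pattern DR describes. The property name "range.partitioner.boundaries", the class name RangePartitioner, and the use of Text keys are placeholders of my own, not something from this thread. The framework calls setConf() when it instantiates the partitioner for a task, so the boundary parsing happens once per task there, not once per record.

// Driver side (illustrative only): put the boundaries into the job
// configuration before submitting, and register the partitioner.
//
//   Configuration conf = new Configuration();
//   conf.set("range.partitioner.boundaries", "f,m,t");   // hypothetical key/value
//   Job job = new Job(conf, "range-partition");
//   job.setPartitionerClass(RangePartitioner.class);

import java.util.Arrays;

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Partitioner;

public class RangePartitioner extends Partitioner<Text, Writable>
        implements Configurable {

    private Configuration conf;
    private String[] boundaries;   // sorted upper bounds of the key ranges

    @Override
    public int getPartition(Text key, Writable value, int numPartitions) {
        // Send the key to the first range whose upper bound it does not exceed.
        String k = key.toString();
        for (int i = 0; i < boundaries.length; i++) {
            if (k.compareTo(boundaries[i]) <= 0) {
                return i % numPartitions;
            }
        }
        // Keys beyond the last boundary go to the final range.
        return boundaries.length % numPartitions;
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public void setConf(Configuration conf) {
        // Called by the framework (via ReflectionUtils) when the partitioner
        // is created for a task; read and parse the boundaries here.
        this.conf = conf;
        String parm = conf.get("range.partitioner.boundaries");
        if (parm == null) {
            throw new RuntimeException("range.partitioner.boundaries not set");
        }
        boundaries = parm.split(",");
        Arrays.sort(boundaries);
    }
}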