If you define a Hadoop object as implementing Configurable, its setConf()
method will be called once, right after the object gets instantiated. So
each Partitioner that gets instantiated will have its setConf() method
called immediately afterwards.
I'm taking advantage of that fact by calling my own (private) configure()
method when the Partitioner receives its configuration. In that
configure() method, you would grab the ranges from the configuration
object.
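Roughly something like this, e.g. (the class name, the
"partition.range.boundaries" key, and the long-valued keys are all just
for illustration - adapt to your own data):

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class RangePartitioner extends Partitioner<LongWritable, Text>
    implements Configurable {

  private Configuration conf;
  private long[] boundaries;

  // The framework calls this once, right after instantiating the partitioner.
  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
    configure();
  }

  @Override
  public Configuration getConf() {
    return conf;
  }

  // Grab the range boundaries out of the configuration object.
  // (Assumes the submitting code actually set this key - see below.)
  private void configure() {
    String[] parts = conf.getStrings("partition.range.boundaries");
    boundaries = new long[parts.length];
    for (int i = 0; i < parts.length; i++) {
      boundaries[i] = Long.parseLong(parts[i].trim());
    }
  }

  // Assumes the job runs with numPartitions == boundaries.length + 1 reducers.
  @Override
  public int getPartition(LongWritable key, Text value, int numPartitions) {
    for (int i = 0; i < boundaries.length; i++) {
      if (key.get() < boundaries[i]) {
        return i;
      }
    }
    return boundaries.length;  // key is >= the last boundary
  }
}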
The flip side of this is that your ranges won't just magically appear in
the configuration object. You'll have to set them on the configuration
object used in the Job that you're submitting.
A copy of the job's config object then gets passed to each task in your
job, and each task can use that copy to configure itself.
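E.g., at job submission time (again, the key name and boundary values are
just examples, using the new mapreduce API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// Stored as a comma-separated list; the partitioner's configure()
// splits it back apart via conf.getStrings().
conf.setStrings("partition.range.boundaries", "1000", "2000", "3000");

Job job = new Job(conf, "range partitioned job");
job.setPartitionerClass(RangePartitioner.class);
job.setNumReduceTasks(4);  // boundaries.length + 1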
HTH,
DR
On 08/25/2010 04:23 PM, Mithila Nagendra wrote:
In which of the three functions would I have to set the ranges? In the
configure() function? Would configure() be called once for every mapper?
Thank you!
On Wed, Aug 25, 2010 at 12:50 PM, David Rosenstrauch <dar...@darose.net> wrote:
On 08/25/2010 12:40 PM, Mithila Nagendra wrote:
In order to avoid this I was thinking of
passing the range boundaries to the partitioner. How would I do that? Is
there an alternative? Any suggestion would prove useful.
We use a custom partitioner, for which we pass in configuration data that
gets used in the partitioning calculations.
We do it by making the Partitioner implement Configurable, and then
grabbing the needed config data from the configuration object we're given.
(We set the needed config data on the config object when we submit the
job.)