If you define a Hadoop object as implementing Configurable, its setConf()
method will be called once, right after the object gets instantiated. So
each Partitioner that gets instantiated will have its setConf() method
called immediately afterwards.
I'm taking advantage of that fact by calling my own (private) configure()
method when the Partitioner receives its configuration. In that
configure() method, you would grab the ranges from the configuration
object.
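Roughly something like this, e.g. (the class name, the
"partition.range.boundaries" key, and the long-valued keys are all just
for illustration - adapt to your own data):

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class RangePartitioner extends Partitioner<LongWritable, Text>
    implements Configurable {

  private Configuration conf;
  private long[] boundaries;

  // The framework calls this once, right after instantiating the partitioner.
  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
    configure();
  }

  @Override
  public Configuration getConf() {
    return conf;
  }

  // Grab the range boundaries out of the configuration object.
  // (Assumes the submitting code actually set this key - see below.)
  private void configure() {
    String[] parts = conf.getStrings("partition.range.boundaries");
    boundaries = new long[parts.length];
    for (int i = 0; i < parts.length; i++) {
      boundaries[i] = Long.parseLong(parts[i].trim());
    }
  }

  // Assumes the job runs with numPartitions == boundaries.length + 1 reducers.
  @Override
  public int getPartition(LongWritable key, Text value, int numPartitions) {
    for (int i = 0; i < boundaries.length; i++) {
      if (key.get() < boundaries[i]) {
        return i;
      }
    }
    return boundaries.length;  // key is >= the last boundary
  }
}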
The flip side of this is that your ranges won't just magically appear in
the configuration object. You'll have to set them on the configuration
object used in the Job that you're submitting.
A copy of the job's config object then gets passed to each task in your
job, and each task can use that copy to configure itself.
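E.g., at job submission time (again, the key name and boundary values are
just examples, using the new mapreduce API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// Stored as a comma-separated list; the partitioner's configure()
// splits it back apart via conf.getStrings().
conf.setStrings("partition.range.boundaries", "1000", "2000", "3000");

Job job = new Job(conf, "range partitioned job");
job.setPartitionerClass(RangePartitioner.class);
job.setNumReduceTasks(4);  // boundaries.length + 1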
HTH,
DR
On 08/25/2010 04:23 PM, Mithila Nagendra wrote:
In which of the three functions would I have to set the ranges? In the
configure() function? Would configure() be called once for every mapper?
Thank you!
On Wed, Aug 25, 2010 at 12:50 PM, David Rosenstrauch <dar...@darose.net> wrote:
On 08/25/2010 12:40 PM, Mithila Nagendra wrote:
In order to avoid this I was thinking of
passing the range boundaries to the partitioner. How would I do that? Is
there an alternative? Any suggestion would prove useful.
We use a custom partitioner, for which we pass in configuration data that
gets used in the partitioning calculations.
We do it by making the Partitioner implement Configurable, and then
grabbing the needed config data from the configuration object we're given.
(We set the needed config data on the config object when we submit the
job.)