Thank you so much for the response. I have one last question.

What if I don't want a particular <K, V> pair to be put into any partition?
For example, if K=5, I want the partitioner to skip this key. I tried
returning -1 when I don't want a key to go to any partition, but that
causes an "illegal partition" error. How would I do this?

Thanks!
Mithila

On Wed, Aug 25, 2010 at 1:38 PM, David Rosenstrauch <dar...@darose.net> wrote:

> If you define a Hadoop object as implementing Configurable, then its
> setConf() method will be called once, right after it gets instantiated.  So
> each partitioner that gets instantiated will have its setConf() method
> called right afterwards.
>
> I'm taking advantage of that fact by calling my own (private) "configure()"
> method when the Partitioner gets its configuration.  So in that configure
> method, you would grab the ranges from the configuration object.
>
> The flip side of this is that your ranges won't just magically appear in
> the configuration object.  You'll have to set them on the configuration
> object used in the Job that you're submitting.
>
> A copy of the job's config object will then get passed to each task in your
> job, which you can then use to configure that task.
>
> HTH,
>
> DR
>
>
> On 08/25/2010 04:23 PM, Mithila Nagendra wrote:
>
>> In which of the three functions would I have to set the ranges? In the
>> configure function? Would the configure be called once for every mapper?
>> Thank you!
>>
>> On Wed, Aug 25, 2010 at 12:50 PM, David Rosenstrauch <dar...@darose.net> wrote:
>>
>>> On 08/25/2010 12:40 PM, Mithila Nagendra wrote:
>>>
>>>> In order to avoid this I was thinking of
>>>> passing the range boundaries to the partitioner. How would I do that? Is
>>>> there an alternative? Any suggestion would prove useful.
>>>>
>>>
>>> We use a custom partitioner, for which we pass in configuration data that
>>> gets used in the partitioning calculations.
>>>
>>> We do it by making the Partitioner implement Configurable, and then grabbing
>>> the needed config data from the configuration object that we're given. (We
>>> set the needed config data on the config object when we submit the job.)
>>>
>>
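As a rough illustration of the approach described in the quoted replies (a Partitioner that implements Configurable and reads its range boundaries from the job configuration), here is a minimal sketch of the lookup such a partitioner might perform. It is written in plain Java so that it runs without Hadoop on the classpath. The class name RangePartitionSketch, the property name example.partition.boundaries, and the comma-separated boundary format are illustrative assumptions, not taken from the thread.

```java
import java.util.Arrays;

/**
 * Plain-Java sketch of a range lookup for a custom partitioner.
 *
 * Assumed wiring in a real job (described here, not shown):
 *   - at submit time: job.getConfiguration().set(RANGES_KEY, "10,20,30");
 *   - in the Partitioner's setConf(conf): call a private configure() that
 *     runs parseBoundaries(conf.get(RANGES_KEY)) and stores the result;
 *   - in getPartition(key, value, numPartitions): call partitionFor(...).
 */
final class RangePartitionSketch {

    // Hypothetical property name, chosen only for this example.
    static final String RANGES_KEY = "example.partition.boundaries";

    /** Parses a list such as "10,20,30" into a sorted array of boundaries. */
    static int[] parseBoundaries(String csv) {
        if (csv == null || csv.trim().isEmpty()) {
            return new int[0];
        }
        String[] parts = csv.split(",");
        int[] bounds = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            bounds[i] = Integer.parseInt(parts[i].trim());
        }
        Arrays.sort(bounds);
        return bounds;
    }

    /**
     * Maps a key to a partition: keys below bounds[0] go to partition 0,
     * keys in [bounds[0], bounds[1]) go to partition 1, and so on.
     * Assumes the boundaries are distinct. The result is clamped so that it
     * always lies in [0, numPartitions - 1]; Hadoop rejects any other value
     * (the "illegal partition" error mentioned in this thread).
     */
    static int partitionFor(int key, int[] bounds, int numPartitions) {
        int idx = Arrays.binarySearch(bounds, key);
        // When the key is not found, binarySearch returns -(insertionPoint) - 1.
        int partition = (idx >= 0) ? idx + 1 : -(idx + 1);
        return Math.min(partition, numPartitions - 1);
    }
}
```

With boundaries "10,20,30" and four reducers, keys below 10 map to partition 0, keys from 10 to 19 map to partition 1, and so on. The lookup always returns a valid partition number because a partitioner has no way to drop a record; a key that should be skipped would have to be filtered out earlier, for instance by not emitting it from the mapper.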
