Thank you so much for the response. I have one last question: what if I don't want a particular <K, V> pair to be put into any partition? For example, if K=5, I want the partitioner to skip that key. I tried returning -1 for keys that shouldn't go to any partition, but that causes an "illegal partition" error. How would I do this?
Thanks!
Mithila

On Wed, Aug 25, 2010 at 1:38 PM, David Rosenstrauch <dar...@darose.net> wrote:

> If you define a Hadoop object as implementing Configurable, then its
> setConf() method will be called once, right after it gets instantiated. So
> each partitioner that gets instantiated will have its setConf() method
> called right afterwards.
>
> I'm taking advantage of that fact by calling my own (private) "configure()"
> method when the Partitioner gets its configuration. So in that configure
> method, you would grab the ranges from out of the configuration object.
>
> The flip side of this is that your ranges won't just magically appear in
> the configuration object. You'll have to set them on the configuration
> object used in the Job that you're submitting.
>
> A copy of the job's config object will then get passed to each task in your
> job, which you can then use to configure that task.
>
> HTH,
>
> DR
>
>
> On 08/25/2010 04:23 PM, Mithila Nagendra wrote:
>
>> In which of the three functions would I have to set the ranges? In the
>> configure function? Would the configure be called once for every mapper?
>> Thank you!
>>
>> On Wed, Aug 25, 2010 at 12:50 PM, David Rosenstrauch <dar...@darose.net> wrote:
>>
>> On 08/25/2010 12:40 PM, Mithila Nagendra wrote:
>>>
>>>> In order to avoid this I was thinking of
>>>> passing the range boundaries to the partitioner. How would I do that? Is
>>>> there an alternative? Any suggestion would prove useful.
>>>>
>>> We use a custom partitioner, for which we pass in configuration data that
>>> gets used in the partitioning calculations.
>>>
>>> We do it by making the Partitioner implement Configurable, and then grab
>>> the needed config data from the configuration object that we're given.
>>> (We set the needed config data on the config object when we submit the job).
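
For anyone reading this thread later, here is a rough sketch of the range lookup that the quoted advice leads to. It is plain Java with no Hadoop dependency, so it is not the code David describes, only an illustration of the calculation. The class name, the method names, and the comma-separated boundary format are all made up for this example. In a real Partitioner that implements Configurable, something like parseBoundaries() would presumably be called from setConf() on a string read from the job configuration, and partitionFor() would be called from getPartition().

```java
import java.util.Arrays;

// Hypothetical helper: not part of Hadoop, names chosen for illustration only.
public class RangePartitionSketch {

    // Parses ascending upper bounds from a comma-separated string, e.g. "10, 20, 30".
    // The string format is an assumption; the thread does not say how the ranges are stored.
    public static int[] parseBoundaries(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    // Returns the index of the first range whose upper bound is greater than the key.
    // Keys at or above the last bound fall into the range after it. The result is
    // clamped so that it always stays within [0, numPartitions - 1].
    public static int partitionFor(int key, int[] upperBounds, int numPartitions) {
        int index = 0;
        while (index < upperBounds.length && key >= upperBounds[index]) {
            index++;
        }
        return Math.min(index, numPartitions - 1);
    }

    public static void main(String[] args) {
        int[] bounds = parseBoundaries("10, 20, 30");
        for (int key : new int[] {5, 10, 25, 99}) {
            System.out.println(key + " -> partition " + partitionFor(key, bounds, 4));
        }
    }
}
```

The clamp at the end keeps the returned value a valid partition index even when the job runs with fewer partitions than there are ranges.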