Hi Mete,

A custom Partitioner class can control the flow of keys to the desired reducer. It gives you more control over which key goes to which reducer.
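Here is a minimal Java sketch of that idea. It assumes a hypothetical map output key format of `<hostname>#<shard>` (e.g. `node17#42`). The partitioner routes on the shard suffix only, so the hostname prefix never changes which reducer a record lands in. The default HashPartitioner, by contrast, hashes the whole key, so two keys with the same shard but different hosts would usually go to different reducers. To keep the sketch compilable without Hadoop on the classpath, it declares a stand-in `Partitioner` class with the same `getPartition(key, value, numPartitions)` shape as `org.apache.hadoop.mapreduce.Partitioner`. `ShardPartitioner` and `ShardKeys` are hypothetical names, and `String` stands in for `Text`.

```java
import java.util.Random;

// Stand-in mirroring the shape of Hadoop's
// org.apache.hadoop.mapreduce.Partitioner<KEY, VALUE>, so this sketch
// compiles on its own. In a real job, extend the Hadoop class instead.
abstract class Partitioner<KEY, VALUE> {
    public abstract int getPartition(KEY key, VALUE value, int numPartitions);
}

// Hypothetical key format: "<hostname>#<shard>", e.g. "node17#42".
// Routes each record by its shard suffix rather than by hashing the whole
// key, so the hostname prefix does not affect which reducer receives it.
class ShardPartitioner extends Partitioner<String, String> {
    @Override
    public int getPartition(String key, String value, int numPartitions) {
        // Hostnames cannot contain '#', so the last '#' is the separator.
        int sep = key.lastIndexOf('#');
        int shard = Integer.parseInt(key.substring(sep + 1));
        // floorMod keeps the result in [0, numPartitions) even if shard < 0.
        return Math.floorMod(shard, numPartitions);
    }
}

// Hypothetical helper for the mapper side: builds a key with a hostname
// prefix and a random shard drawn from [0, numberOfShards).
class ShardKeys {
    static String makeKey(String hostname, int numberOfShards, Random rnd) {
        return hostname + "#" + rnd.nextInt(numberOfShards);
    }
}
```

In a real job you would extend the Hadoop class with `Text` keys and register it with `job.setPartitionerClass(ShardPartitioner.class)` alongside `job.setNumReduceTasks(numberOfShards)`. Note that a partitioner only picks the partition index; it does not decide which node that reduce task runs on.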
Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: mete <efk...@gmail.com>
Date: Fri, 27 Apr 2012 09:19:21
To: <common-user@hadoop.apache.org>
Reply-To: common-user@hadoop.apache.org
Subject: reducers and data locality

Hello folks,

I have a lot of input splits (10k-50k, 128 MB blocks) which contain text files. I need to process those line by line, then copy the results into "shards" of roughly equal size. So I generate a random key (from the range [0:numberOfShards]) which is used to route the map output to different reducers, and the shard sizes come out more or less equal.

I know that this is not really efficient, and I was wondering if I could somehow control how keys are routed. For example, could I generate the random keys with hostname prefixes and control which keys are sent to each reducer?

What do you think?

Kind regards
Mete