You should define your own partitioner.

2011/3/23 Luca Pireddu <[email protected]>:
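In the old API that means implementing org.apache.hadoop.mapred.Partitioner and returning a reducer index from getPartition(). A minimal plain-Java sketch of the range-bucketing arithmetic such a partitioner would use (the key range 1..15 and reducer count 3 are assumptions taken from the example in the quoted message, not anything the list thread prescribes):

```java
// Sketch of the bucket arithmetic that could go inside a custom
// Partitioner's getPartition() method. The key range (1..15) and
// partition count (3) are hypothetical, matching the example output.
public class RangePartitionSketch {

    static int partitionFor(int key, int maxKey, int numPartitions) {
        // Ceiling division: size of each contiguous key range per reducer.
        int bucketSize = (maxKey + numPartitions - 1) / numPartitions;
        // Clamp so stray keys above maxKey still land in the last partition.
        return Math.min((key - 1) / bucketSize, numPartitions - 1);
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 15; k++) {
            System.out.println("key " + k + " -> part0" + partitionFor(k, 15, 3));
        }
    }
}
```

Note this only gives balanced partitions when the keys are roughly uniform over the assumed range; skewed data would need sampling, which is what the TeraSort example below does.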
> On March 22, 2011 16:54:34 Shi Yu wrote:
> > I guess you need to define a Partitioner to send hashed keys to
> > different reducers (sorry, I am still using the old API, so there is
> > probably something new in the trunk release). Basically you try to
> > segment the keys into different zones: 0-10, 11-20, ...
> >
> > Maybe check the hashCode() function and see how to categorize these
> > zones?
> >
> > Shi
> >
> > On 3/22/2011 9:24 AM, JunYoung Kim wrote:
> > > hi,
> > >
> > > I run almost 60 reduce tasks for a single job, so the outputs of the
> > > job are part00 through part59.
> > >
> > > Is there a way to write rows sequentially by sorted keys?
> > >
> > > Currently my outputs are like this:
> > >
> > > part00)
> > > 1
> > > 10
> > > 12
> > > 14
> > >
> > > part01)
> > > 2
> > > 4
> > > 6
> > > 11
> > > 13
> > >
> > > part02)
> > > 3
> > > 5
> > > 7
> > > 8
> > > 9
> > >
> > > but my aim is to get the following results:
> > >
> > > part00)
> > > 1
> > > 2
> > > 3
> > > 4
> > > 5
> > >
> > > part01)
> > > 6
> > > 7
> > > 8
> > > 9
> > > 10
> > >
> > > part02)
> > > 11
> > > 12
> > > 13
> > > 14
> > > 15
> > >
> > > Is Hadoop able to support this kind of thing?
> > >
> > > thanks
>
> You can look at TeraSort in the examples to see how to do this. There's
> even a short write-up by Owen O'Malley about it here:
> http://sortbenchmark.org/YahooHadoop.pdf
>
> --
> Luca Pireddu
> CRS4 - Distributed Computing Group
> Loc. Pixina Manna Edificio 1
> Pula 09010 (CA), Italy
> Tel: +39 0709250452
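The TeraSort approach mentioned above samples the input to pick sorted split points, then assigns each key to a partition by searching against them (see TotalOrderPartitioner in the Hadoop source). A self-contained sketch of that lookup, with hypothetical split points rather than ones sampled from real input:

```java
import java.util.Arrays;

// Sketch of the split-point lookup behind TeraSort's total ordering:
// binary-search the key against a sorted list of split points. The split
// points here are hypothetical; a real job samples them from the input
// and distributes them to the tasks via a partition file.
public class SplitPointSketch {

    static int partitionFor(int key, int[] splitPoints) {
        int idx = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns (-(insertion point) - 1) when the key is
        // absent; either way we end up with the index of the first split
        // point strictly greater than the key.
        return idx >= 0 ? idx + 1 : -(idx + 1);
    }

    public static void main(String[] args) {
        // keys < 6 -> part00, 6..10 -> part01, keys >= 11 -> part02
        int[] splits = {6, 11};
        for (int k = 1; k <= 15; k++) {
            System.out.println("key " + k + " -> part0" + partitionFor(k, splits));
        }
    }
}
```

Because the split points come from sampling, this keeps the partitions balanced even when keys are skewed, unlike a fixed-width range scheme.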
