Hi all,

 I am working on a sort function, and it works perfectly fine with a
single map task.

 With 2 map tasks, the sorted output contains the entire data set twice.
With 4 map tasks, it contains the data four times, and so on.

 I modified the TeraSort example for this.
 Major modifications: HashPartitioner instead of the TotalOrderPartitioner
                      No Sampler
                      IdentityMapper
                      IdentityReducer

 I have been running the job on a single node.
 I tried printing the lengths of the FileSplits when they are generated, and
they all make sense. But the final output still contains the data n times,
where n is the number of map tasks. How can I debug this?
 Could someone please tell me what is wrong with my FileSplits or map tasks?
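
 For reference, here is a minimal sketch of the kind of check that could be
run on the generated splits. It assumes each split is nothing more than a
(start, length) pair; the SplitCheck and Split names are hypothetical
stand-ins and not part of Hadoop or of my actual code. It tests whether the
splits cover the file exactly once. If every split started at offset 0 and
spanned the whole file, each map task would read the full input, which would
match the symptom, but that is only a guess.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SplitCheck {
    // Hypothetical stand-in for a file split: a byte offset and a length.
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    // Returns true when the splits tile [0, fileLength) with no gaps and no overlaps.
    static boolean coversExactlyOnce(List<Split> splits, long fileLength) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong(s -> s.start));
        long expectedStart = 0;
        for (Split s : sorted) {
            if (s.start != expectedStart) {
                return false; // gap or overlap before this split
            }
            expectedStart += s.length;
        }
        return expectedStart == fileLength;
    }

    public static void main(String[] args) {
        long fileLength = 1000;
        // Two splits that each cover half of the file.
        List<Split> disjoint = Arrays.asList(new Split(0, 500), new Split(500, 500));
        // Two splits that both cover the whole file (each map would read everything).
        List<Split> duplicated = Arrays.asList(new Split(0, 1000), new Split(0, 1000));
        System.out.println(coversExactlyOnce(disjoint, fileLength));   // true
        System.out.println(coversExactlyOnce(duplicated, fileLength)); // false
    }
}
```

 Printing only the split lengths would not reveal the second case, since
each length on its own looks reasonable; the start offsets have to be
compared as well.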


  Regards,

  Matthew
