Hi all,
I am working on a sort job, and it works fine with a single map task.
With 2 map tasks, the sorted output contains the entire data set twice.
With 4 map tasks, it contains the sorted data 4 times, and so on.
I modified TeraSort for this. Major modifications:
- HashPartitioner instead of the TotalOrderPartitioner
- No sampler
- IdentityMapper
- IdentityReducer
I have been running the job on a single node.
I tried printing the length of each FileSplit when the splits are generated,
and the values all make sense. But the final output still contains the data
n times for n map tasks. How can I debug this? Could someone please tell me
what is wrong with my FileSplits or map tasks?
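
To make the split check concrete, here is a minimal sketch in plain Java
(no Hadoop dependency) of what could be verified beyond printing lengths:
that the split lengths add up to the file size and that no two splits cover
the same bytes. The names Split, SplitCheck, totalLength and anyOverlap are
hypothetical and exist only for this illustration; in the real job the start
and length values would come from each FileSplit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a FileSplit: only the byte range matters here.
final class Split {
    final long start;
    final long length;

    Split(long start, long length) {
        this.start = start;
        this.length = length;
    }
}

final class SplitCheck {
    // Total number of bytes claimed by all splits together.
    static long totalLength(List<Split> splits) {
        long sum = 0;
        for (Split s : splits) {
            sum += s.length;
        }
        return sum;
    }

    // True if any two splits of the same file cover a common byte.
    static boolean anyOverlap(List<Split> splits) {
        List<Split> sorted = new ArrayList<>(splits);
        sorted.sort((a, b) -> Long.compare(a.start, b.start));
        for (int i = 1; i < sorted.size(); i++) {
            Split prev = sorted.get(i - 1);
            if (prev.start + prev.length > sorted.get(i).start) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long fileSize = 100; // assumed input size, for illustration only

        // Two splits that partition the file cleanly.
        List<Split> good = Arrays.asList(new Split(0, 50), new Split(50, 50));
        // Two splits that each cover the whole file.
        List<Split> bad = Arrays.asList(new Split(0, 100), new Split(0, 100));

        System.out.println("good: total=" + totalLength(good)
                + " fileSize=" + fileSize + " overlap=" + anyOverlap(good));
        System.out.println("bad:  total=" + totalLength(bad)
                + " fileSize=" + fileSize + " overlap=" + anyOverlap(bad));
    }
}
```

If the splits pass both checks, the splits themselves are probably not the
source of the duplication, and the next thing to look at would be whether
each map task actually reads only the byte range of its own split.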
Regards,
Matthew