On Sep 25, 2011, at 2:01 PM, He Chen wrote:

> Hi Arun and Harsh J
>
> Thank you for your replies.
>
> Yes, there will be two finally. But during the map running, there are more
> than two.
>
> The scenario I mentioned before will not occur with the Hadoop default
> partitioner. If there is a partitioner lead to above problem. Is there any
> security policy prevent this?
>

Irrespective of the partitioner used, a single file stores all the keys/values written during each 'spill', after the records in the sort buffer have been sorted. You could have multiple spills, but each spill holds lots of keys/values; we never do a file per record. You'd very quickly run out of inodes.

In the very early days we had a file per reducer and that caused huge issues, never mind a file per record.

Arun
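
To make the point about spills concrete, here is a rough sketch of the idea in Python. It is a toy model, not Hadoop's actual code: the names `partition`, `run_map_side`, and `buffer_limit`, and the `spillN.out` file naming, are all made up for illustration, and the buffer is limited by record count rather than by bytes. It only shows that the number of files tracks the number of spills, whatever the partitioner does.

```python
import os
import tempfile


def partition(key, num_reducers):
    # Hypothetical stand-in for a partitioner. Any function of the key
    # would do; the file count below does not depend on it.
    return sum(key.encode("utf-8")) % num_reducers


def run_map_side(records, num_reducers, buffer_limit, out_dir):
    """Buffer map output and write ONE file per spill (toy model)."""
    spill_files = []
    buffer = []

    def spill():
        # Sort by (partition, key) so each reducer's records are
        # contiguous inside the single spill file.
        buffer.sort(key=lambda r: (r[0], r[1]))
        path = os.path.join(out_dir, "spill%d.out" % len(spill_files))
        with open(path, "w") as f:
            for part, key, value in buffer:
                f.write("%d\t%s\t%s\n" % (part, key, value))
        spill_files.append(path)
        buffer.clear()

    for key, value in records:
        buffer.append((partition(key, num_reducers), key, value))
        if len(buffer) >= buffer_limit:
            spill()
    if buffer:
        # Final spill for whatever is left when the map finishes.
        spill()
    return spill_files


if __name__ == "__main__":
    records = [("key%d" % i, i) for i in range(1000)]
    with tempfile.TemporaryDirectory() as d:
        files = run_map_side(records, num_reducers=2, buffer_limit=300, out_dir=d)
        print(len(files))  # 4 spill files for 1000 records, not 1000 files
```

With 1000 records and a 300-record buffer this produces four files: three full spills and one final partial spill. Changing `num_reducers` or swapping in a different `partition` function changes the order of records inside each file, but not how many files are created.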