Yes, that works if the node only has a single split. If it has multiple splits, it's still a problem: cleanup runs at the end of each map task (one per split), so when it runs, not all of the node's data has been processed yet.
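Here is a minimal plain-Java sketch of why this breaks down. It has no Hadoop dependency and only models the lifecycle. It assumes each split is handled by its own map task that calls setup once, map once per record, and cleanup once at the end. For simplicity it runs the splits one after another. SplitLifecycleSketch and cleanupSnapshots are hypothetical names, not part of Hadoop. Each entry in the result is how many of the node's records had been written when a cleanup call ran.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java model (hypothetical, no Hadoop dependency) of map tasks on one node:
 * each split gets its own task, which maps every record and then calls cleanup once.
 * Splits are processed sequentially here as a simplification.
 */
public class SplitLifecycleSketch {

    /** Returns, for each cleanup call, how many records the node had processed so far. */
    public static List<Integer> cleanupSnapshots(List<List<String>> splitsOnOneNode) {
        List<Integer> processedAtCleanup = new ArrayList<>();
        int processedOnNode = 0; // stands in for data already written to the node's datastore
        for (List<String> split : splitsOnOneNode) {
            // One map task per split: map() is called once for every record in the split.
            for (String record : split) {
                processedOnNode++; // stands in for map(key, value, context) storing the record
            }
            // cleanup() runs once, at the end of THIS split's task only.
            processedAtCleanup.add(processedOnNode);
        }
        return processedAtCleanup;
    }

    public static void main(String[] args) {
        List<List<String>> twoSplits = List.of(List.of("a", "b"), List.of("c", "d", "e"));
        System.out.println(cleanupSnapshots(twoSplits)); // prints [2, 5]
    }
}
```

With a single split, the only cleanup call sees every record. With two splits, the first cleanup call sees just 2 of the node's 5 records, so post-processing placed there would run on incomplete data.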
On Wed, Aug 25, 2010 at 11:08 PM, David Rosenstrauch <[email protected]> wrote:
> On 08/25/2010 10:36 AM, Anfernee Xu wrote:
>
>> Thanks all for your help.
>>
>> The challenge is this: suppose I have 4 datanodes in the cluster, but for
>> a given input I have 2 splits, so only 2 of the 4 nodes will run the M/R
>> job, say nodeA and nodeB. After the job completes, the data from the input
>> has been stored in the datastore on nodeA and nodeB, while nodeC and nodeD
>> are untouched. Now I need to run post-processing on nodeA and nodeB to get
>> my data ready. Originally I thought I could run another M/R job, also with
>> 2 splits, but I cannot tell which nodes will be selected to run these
>> splits; I need the same nodes to be selected.
>>
>> Anfernee
>>
>
> Well then you could put your post-processing in Mapper.cleanup.
>
> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/Mapper.html#cleanup%28org.apache.hadoop.mapreduce.Mapper.Context%29
>
> DR

--
--Anfernee
