I'm new to Hadoop and want to use it for my data processing. My understanding is that each input split is processed by one mapper task. In my application, the mapper populates a backend data store with the data from its split. After all splits have been consumed, I want to run a piece of code that post-processes the data stored in the backend data store. Is there a clean way to do this?
Can I have the post-processing run only on the nodes that were involved in the map phase? The number of splits may be smaller than the number of nodes in the cluster, so some nodes may not take part in the job, and I do not want them involved in the post-processing either. Thanks for your help.

--Anfernee
