I'm new to Hadoop and want to use it for my data processing. My
understanding is that each split is processed by one mapper task. In my
application, the mapper populates a backend data store with the data from
its split. After all splits have been consumed, I want to run a piece of
code that post-processes the data in the backend data store. Is there a
clean way to do this?

Can I have the post-processing run only on the nodes that were involved in
the map phase? The number of splits may be less than the number of nodes in
the cluster, so some nodes may not take part in the job, and I do not want
them involved in the post-processing either.
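
To make the intent concrete, here is a rough plain-Java sketch of the
behaviour I am after. It does not use any Hadoop API: the thread pool only
stands in for the map phase, and every name in it (PostProcessSketch,
mapSplit, postProcess, the node and split labels) is made up for
illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PostProcessSketch {
    // Stand-in for the backend data store: node name -> splits loaded from that node.
    static final Map<String, List<String>> store = new ConcurrentHashMap<>();

    // Hypothetical mapper work: load one split into the store on behalf of a node.
    static void mapSplit(String node, String split) {
        store.computeIfAbsent(node, k -> new CopyOnWriteArrayList<>()).add(split);
    }

    // Hypothetical post-processing step; here it only counts what the node loaded.
    static int postProcess(String node) {
        return store.get(node).size();
    }

    // Simulates the map phase, waits for every split to finish, then
    // post-processes only on the nodes that actually handled a split.
    static Set<String> run(List<String> splits, List<String> nodes) throws Exception {
        store.clear();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodes.size()));
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < splits.size(); i++) {
            String node = nodes.get(i % nodes.size()); // assumed assignment: round-robin
            String split = splits.get(i);
            pending.add(pool.submit(() -> mapSplit(node, split)));
        }
        for (Future<?> f : pending) {
            f.get(); // barrier: all splits must be consumed before post-processing
        }
        pool.shutdown();

        Set<String> processed = new TreeSet<>();
        for (String node : store.keySet()) {
            postProcess(node);
            processed.add(node);
        }
        return processed;
    }

    public static void main(String[] args) throws Exception {
        // Two splits on a three-node cluster: the third node stays idle.
        Set<String> nodes = run(List.of("split-0", "split-1"),
                                List.of("node1", "node2", "node3"));
        System.out.println("post-processed on: " + nodes);
    }
}
```

With two splits and three nodes, only the two nodes that loaded data are
post-processed; the idle node is never touched. What I am looking for is the
Hadoop equivalent of the barrier and of the "only participating nodes" step.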

Thanks for your help.

-- 
--Anfernee
