Thanks for the fast reply, but I don't see how a custom record reader will help. Consider k-means again: the mappers need to stand by until all the reducers have finished calculating the new cluster centers. Only then, once the reducers are done, can the waiting mappers come back to life and perform their next round of work.
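To make the dependency concrete, here is a minimal plain-Java sketch of the k-means rounds I have in mind. It uses no Hadoop APIs at all; the class name `KMeansRounds`, the methods `nearest` and `round`, and the sample points are made up for illustration. The point it tries to show is that the assignment ("map") step of round N+1 cannot start before the averaging ("reduce") step of round N has produced the new centers, while the input points never change.

```java
import java.util.Arrays;

// Illustrative sketch only: plain Java, not the Hadoop API.
public class KMeansRounds {

    // "Map" step: index of the center closest to point p (squared Euclidean distance).
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double d = 0;
            for (int i = 0; i < p.length; i++) {
                double diff = p[i] - centers[c][i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    // One full round: assign every point to a center (map),
    // then average the points of each cluster (reduce).
    static double[][] round(double[][] points, double[][] centers) {
        int k = centers.length;
        int dim = centers[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = nearest(p, centers);
            counts[c]++;
            for (int i = 0; i < dim; i++) {
                sums[c][i] += p[i];
            }
        }
        double[][] next = new double[k][];
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                // Empty cluster: keep the old center (an arbitrary choice for this sketch).
                next[c] = centers[c].clone();
                continue;
            }
            next[c] = new double[dim];
            for (int i = 0; i < dim; i++) {
                next[c][i] = sums[c][i] / counts[c];
            }
        }
        return next;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1, 2}, {8, 8}, {9, 8}}; // fixed across rounds
        double[][] centers = {{0, 0}, {10, 10}};              // changes every round
        for (int iter = 0; iter < 3; iter++) {
            // The next call needs the centers returned by the previous one.
            centers = round(points, centers);
            System.out.println(Arrays.deepToString(centers));
        }
    }
}
```

In a real job the loop in `main` would be a chain of MapReduce jobs, and each new job would re-read the same `points` from HDFS. That re-read is the communication cost I would like to avoid.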

On Sun, Aug 5, 2012 at 7:49 PM, Harsh J <ha...@cloudera.com> wrote:

> Sure you can, as we provide pluggable code points via the API. Just write
> a custom record reader that doubles the work (first round reads actual
> input, second round reads your known output and reiterates). In the mapper,
> separate the first and second logic via a flag.
>
> On Sun, Aug 5, 2012 at 4:17 PM, Yaron Gonen <yaron.go...@gmail.com> wrote:
>
>> Hi,
>> Is there a way to keep a map-task alive after it has finished its work,
>> to later perform another task on its same input?
>> For example, consider the k-means clustering algorithm (k-means
>> description <http://en.wikipedia.org/wiki/K-means_clustering> and hadoop
>> implementation <http://codingwiththomas.blogspot.co.il/2011/05/k-means-clustering-with-mapreduce.html>).
>> The only thing changing between iterations is the clusters centers. All the
>> input points remain the same. Keeping the mapper alive, and performing the
>> next round of map-tasks on the same node will save a lot of communication
>> cost.
>>
>> Thanks,
>> Yaron
>
> --
> Harsh J