Thanks. As I see it, it cannot be done in the MapReduce 1 framework without changing the TaskTracker and JobTracker. The problem is that I'm not familiar with YARN at all... it might be possible there. Thanks again!
On Mon, Aug 6, 2012 at 1:21 AM, Harsh J <ha...@cloudera.com> wrote:

> Ah, my bad - I skipped over the k-means part of your original post.
>
> There currently isn't a way to do this with the existing MR framework and
> APIs. A Reducer is initiated upon map completion, and the Task JVM is torn
> down after the maps end. Perhaps you can use YARN to write something like
> what you desire?
>
> On Mon, Aug 6, 2012 at 12:11 AM, Yaron Gonen <yaron.go...@gmail.com> wrote:
>
>> Thanks for the fast reply, but I don't see how a custom record reader
>> will help.
>> Consider k-means again: the mappers need to stand by until all the
>> reducers have finished calculating the new cluster centers. Only then,
>> after the reducers finish their work, can the waiting mappers come back
>> to life and perform their work.
>>
>> On Sun, Aug 5, 2012 at 7:49 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>>> Sure you can, as we provide pluggable code points via the API. Just
>>> write a custom record reader that doubles the work (the first round reads
>>> the actual input, the second round reads your known output and reiterates).
>>> In the mapper, separate the first- and second-round logic via a flag.
>>>
>>> On Sun, Aug 5, 2012 at 4:17 PM, Yaron Gonen <yaron.go...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> Is there a way to keep a map task alive after it has finished its work,
>>>> to later perform another task on the same input?
>>>> For example, consider the k-means clustering algorithm (k-means
>>>> description <http://en.wikipedia.org/wiki/K-means_clustering> and Hadoop
>>>> implementation <http://codingwiththomas.blogspot.co.il/2011/05/k-means-clustering-with-mapreduce.html>).
>>>> The only thing that changes between iterations is the cluster centers;
>>>> all the input points remain the same. Keeping the mapper alive and
>>>> performing the next round of map tasks on the same node would save a lot
>>>> of communication cost.
>>>>
>>>> Thanks,
>>>> Yaron
>>>
>>> --
>>> Harsh J
>
> --
> Harsh J
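For reference, the iterative structure the thread is about can be sketched in plain Java (this is a hypothetical stand-alone sketch, not the Hadoop API): each round re-reads the entire point set, while only the k centers change between rounds. In MR1 each round is a fresh job, so the mappers re-read (and potentially re-fetch) all the points every time, which is exactly the communication cost being discussed.

```java
// Hypothetical 1-D k-means sketch; class and method names are illustrative.
public class KMeansSketch {

    // One "map + reduce" round: assign each point to its nearest center
    // (the work the mappers redo every round), then emit each cluster's
    // mean as the new center (the reducers' work).
    static double[] iterate(double[] points, double[] centers) {
        double[] sum = new double[centers.length];
        int[] count = new int[centers.length];
        for (double p : points) {
            int nearest = 0;
            for (int c = 1; c < centers.length; c++) {
                if (Math.abs(p - centers[c]) < Math.abs(p - centers[nearest])) {
                    nearest = c;
                }
            }
            sum[nearest] += p;
            count[nearest]++;
        }
        double[] next = new double[centers.length];
        for (int c = 0; c < centers.length; c++) {
            // Keep the old center if a cluster received no points.
            next[c] = count[c] == 0 ? centers[c] : sum[c] / count[c];
        }
        return next;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 10, 11};   // fixed across all rounds
        double[] centers = {0, 5};          // only this changes per round
        for (int round = 0; round < 10; round++) {
            // In MR1 this loop body is a whole new job launch.
            centers = iterate(points, centers);
        }
        System.out.println(centers[0] + " " + centers[1]);
    }
}
```

Note how `points` is loop-invariant and only `centers` changes; keeping the mapper (and its cached input) alive across rounds is what the thread concludes MR1 cannot do without framework changes.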