Re: Keeping Map-Tasks alive

Harsh J Sun, 05 Aug 2012 15:22:33 -0700

Ah, my bad - I skipped over the K-Means part of your original post.

There currently isn't a way to do this with the existing MR framework and
APIs. Reducers are initiated upon map completion, and the task JVM is torn
down once the maps end. Perhaps you could use YARN to write something
closer to what you want?


On Mon, Aug 6, 2012 at 12:11 AM, Yaron Gonen <yaron.go...@gmail.com> wrote:

> Thanks for the fast reply, but I don't see how a custom record reader will
> help.
> Consider k-means again: the mappers need to stand by until all the
> reducers finish calculating the new cluster centers. Only then, after the
> reducers finish their work, do the standby mappers come back to life and
> perform their work.
>
>
> On Sun, Aug 5, 2012 at 7:49 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> Sure you can, as we provide pluggable code points via the API. Just write
>> a custom record reader that doubles the work (first round reads actual
>> input, second round reads your known output and reiterates). In the mapper,
>> separate the first-round and second-round logic via a flag.
>>
>>
>> On Sun, Aug 5, 2012 at 4:17 PM, Yaron Gonen <yaron.go...@gmail.com> wrote:
>>
>>> Hi,
>>> Is there a way to keep a map-task alive after it has finished its work,
>>> so that it can later perform another task on the same input?
>>> For example, consider the k-means clustering algorithm (k-means
>>> description <http://en.wikipedia.org/wiki/K-means_clustering> and hadoop
>>> implementation <http://codingwiththomas.blogspot.co.il/2011/05/k-means-clustering-with-mapreduce.html>).
>>> The only thing that changes between iterations is the cluster centers.
>>> All the input points remain the same. Keeping the mapper alive and
>>> performing the next round of map-tasks on the same node would save a lot
>>> of communication cost.
>>>
>>> Thanks,
>>> Yaron
>>>
>>
>>
>>
>> --
>> Harsh J
>>
>
>


-- 
Harsh J
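
For readers following the thread, here is a minimal sketch of the
"two rounds plus a flag" idea from the custom record reader suggestion
above. It is plain Python, not the Hadoop API: the names two_pass_records,
map_record and run_map_task, and the two open_* callables, are hypothetical
and exist only for illustration. It assumes the second-round data is already
known when the task starts, which is exactly what does not hold for k-means,
where the second round depends on reducer output.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

FIRST_PASS = 1   # round 1: records come from the actual input
SECOND_PASS = 2  # round 2: records come from the already-known output


def two_pass_records(
    open_input: Callable[[], Iterable[str]],
    open_known_output: Callable[[], Iterable[str]],
) -> Iterator[Tuple[int, str]]:
    """Yield records from both sources, each tagged with its pass number.

    Both callables are hypothetical stand-ins for "open this data source";
    a real record reader would read a split and a side file instead.
    """
    for record in open_input():
        yield FIRST_PASS, record
    for record in open_known_output():
        yield SECOND_PASS, record


def map_record(pass_id: int, record: str) -> Tuple[str, str]:
    """Keep first-round and second-round logic apart via the flag."""
    if pass_id == FIRST_PASS:
        return ("first", record.upper())   # placeholder first-round logic
    return ("second", record[::-1])        # placeholder second-round logic


def run_map_task(
    open_input: Callable[[], Iterable[str]],
    open_known_output: Callable[[], Iterable[str]],
) -> List[Tuple[str, str]]:
    """Drive one simulated map task over both rounds."""
    return [
        map_record(pass_id, record)
        for pass_id, record in two_pass_records(open_input, open_known_output)
    ]


if __name__ == "__main__":
    for pair in run_map_task(lambda: ["ab", "cd"], lambda: ["xy"]):
        print(pair)
```

The flag travels with each record, so the mapper itself stays stateless
between rounds; the placeholder logic in map_record would be replaced by
whatever each round actually needs to compute.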
