my problem is more general (than graph problems) and doesn't need
logic built around synchronization or failure. for example, when a
mapper finishes successfully, it just writes/persists to a storage
location (could be disk, a database, memory, etc.). when the next
input is processed (on the same mapper or a different one), i just
need to do a lookup against that storage location (which is
accessible by all task nodes). if a mapper fails, that doesn't hurt
my processing, although i would prefer no failures (and it's good
that hadoop can spawn another task to mitigate).
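a minimal sketch of the write-then-lookup pattern i mean (outside
hadoop, using a temp directory as a stand-in for the shared storage
location; the persist/lookup helpers and key names are just
illustrative, not a real api):

```python
import json
import tempfile
from pathlib import Path

# stand-in for a storage location reachable by all task nodes
# (in practice this could be hdfs, a database, memcached, etc.)
STORE = Path(tempfile.mkdtemp())

def persist(key, value):
    # a mapper that finishes successfully just writes its result;
    # re-writing the same key after a task retry is harmless
    (STORE / key).write_text(json.dumps(value))

def lookup(key):
    # any later map call, on the same or a different mapper,
    # reads the value back from the shared location
    path = STORE / key
    return json.loads(path.read_text()) if path.exists() else None

# a mapper finishes one input and persists what it computed
persist("user-42", {"count": 3})

# the next input (possibly on another mapper) does a lookup
print(lookup("user-42"))   # {'count': 3}
print(lookup("user-99"))   # None -- never written, so process normally
```

no coordination between tasks is needed here: a missing key just
means nothing was persisted yet, and a retried task overwriting the
same key is a no-op.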
On Wed, Sep 26, 2012 at 11:43 AM, Bertrand Dechoux <decho...@gmail.com> wrote:
> The difficulty with data transfer between tasks is handling synchronisation
> and failure.
> You may want to look at graph processing done on top of Hadoop (like
> Giraph).
> That's one way to do it but whether it is relevant or not to you will
> depend on your context.
>
> Regards
>
> Bertrand
>
> On Wed, Sep 26, 2012 at 5:36 PM, Jane Wayne <jane.wayne2...@gmail.com> wrote:
>
>> hi,
>>
>> i know that some algorithms cannot be parallelized and adapted to the
>> mapreduce paradigm. however, i have noticed that in most cases where i
>> struggle to express an algorithm in mapreduce, the problem comes down
>> to the inability of mappers or reducers to cross-communicate.
>>
>> one naive approach i've seen mentioned here and elsewhere is to use a
>> database to store data for use by all the mappers. however, i have
>> seen many arguments (that i largely agree with) against this approach.
>>
>> in general, my question is this: has anyone tried to implement an
>> algorithm using mapreduce where mappers required cross-communication?
>> how did you work around this limitation of mapreduce?
>>
>> thanks,
>>
>> jane.
>>
>
>
>
> --
> Bertrand Dechoux
