my problem is more general (than graph problems) and doesn't need logic built around synchronization or failure. for example, when a mapper finishes successfully, it just writes/persists its output to a storage location (could be disk, a database, memory, etc...). when the next input is processed (on the same mapper or a different one), i just need to do a lookup against that storage location (which is accessible by all task nodes). if a mapper fails, this doesn't hurt my processing, although i'd prefer no failures (and it's good that hadoop can spawn another task to mitigate).
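just to make the pattern concrete, here's a minimal sketch of what i mean. a ConcurrentHashMap stands in for the shared storage location; in a real job it would be a database, HDFS, or a cache reachable from every task node, and all of the names here (SharedStoreMapper, process, etc.) are hypothetical, not actual hadoop API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the lookup-then-persist pattern described above.
// Each successful map() call persists its result to a store that is
// visible to all tasks; later calls (on the same or a different
// mapper) look the result up instead of recomputing it.
public class SharedStoreMapper {

    // stand-in for the externally shared, task-visible store
    static final Map<String, Integer> store = new ConcurrentHashMap<>();

    // idempotent: if a failed task is re-run by the framework,
    // it simply overwrites the same key with the same value
    static int process(String key) {
        Integer cached = store.get(key);   // lookup first
        if (cached != null) {
            return cached;                 // another task already did the work
        }
        int result = key.length();         // placeholder computation
        store.put(key, result);            // persist for all other tasks
        return result;
    }

    public static void main(String[] args) {
        System.out.println(process("hello")); // computed, then persisted
        System.out.println(process("hello")); // looked up from the store
    }
}
```

because the write is idempotent, a task failure followed by a re-spawned attempt does not corrupt anything — which is exactly why no synchronization logic is needed for this case.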
On Wed, Sep 26, 2012 at 11:43 AM, Bertrand Dechoux <decho...@gmail.com> wrote:
> The difficulty with data transfer between tasks is handling synchronisation
> and failure.
> You may want to look at graph processing done on top of Hadoop (like
> Giraph).
> That's one way to do it but whether it is relevant or not to you will
> depend on your context.
>
> Regards
>
> Bertrand
>
> On Wed, Sep 26, 2012 at 5:36 PM, Jane Wayne <jane.wayne2...@gmail.com> wrote:
>
>> hi,
>>
>> i know that some algorithms cannot be parallelized and adapted to the
>> mapreduce paradigm. however, i have noticed that in most cases where i
>> find myself struggling to express an algorithm in mapreduce, the
>> problem is mainly due to no ability to cross-communicate between
>> mappers or reducers.
>>
>> one naive approach i've seen mentioned here and elsewhere, is to use a
>> database to store data for use by all the mappers. however, i have
>> seen many arguments (that i agree with largely) against this approach.
>>
>> in general, my question is this: has anyone tried to implement an
>> algorithm using mapreduce where mappers required cross-communications?
>> how did you solve this limitation of mapreduce?
>>
>> thanks,
>>
>> jane.
>>
>
>
>
> --
> Bertrand Dechoux