Re: strategies to share information between mapreduce tasks

Harsh J Wed, 26 Sep 2012 09:38:47 -0700

Apache Giraph is a framework for graph processing that currently runs over
MapReduce ("MR"), though it is getting its own coordination via YARN soon:
http://giraph.apache.org


You may also check out Apache Hama, a generic BSP (Bulk Synchronous
Parallel) system. If I am not wrong, Giraph uses BSP too, but it
doesn't use Hama; it works over MR instead. Hama:
http://hama.apache.org
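For illustration, here is roughly how the BSP model supplies the cross-task communication that plain MapReduce lacks. Each vertex runs the same logic in rounds called supersteps. Messages a vertex sends in one superstep are delivered only after a global barrier, at the start of the next superstep, and the job ends when no messages are in flight. What follows is a minimal single-process Python simulation of that idea using the well-known "propagate the maximum value" example. It does not use the Giraph or Hama APIs. The function name `bsp_max_value` and its dict-based graph format are hypothetical.

```python
from collections import defaultdict


def bsp_max_value(values, edges, max_supersteps=100):
    """Simulate Pregel/BSP-style supersteps that spread the maximum value.

    Hypothetical input format (not a Giraph/Hama API):
      values: dict mapping vertex -> initial value
      edges:  dict mapping vertex -> list of out-neighbours (all must be keys of values)
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)

    # Superstep 0: every vertex is active and sends its value to its neighbours.
    outbox = defaultdict(list)
    for v, val in values.items():
        for n in edges.get(v, []):
            outbox[n].append(val)
    superstep = 1

    # Run until no messages are in flight (every vertex has "voted to halt").
    while outbox and superstep < max_supersteps:
        # Barrier: messages sent in the previous superstep become this one's inbox;
        # messages sent now are only visible in the next superstep.
        inbox, outbox = outbox, defaultdict(list)
        for v, msgs in inbox.items():
            best = max(msgs)
            if best > values[v]:
                # Only vertices whose value changed send new messages.
                values[v] = best
                for n in edges.get(v, []):
                    outbox[n].append(best)
        superstep += 1

    return values, superstep


if __name__ == "__main__":
    vals = {"a": 3, "b": 6, "c": 2, "d": 1}
    ring = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}
    print(bsp_max_value(vals, ring))
```

The design point is that a "mapper" (here, a vertex) never reads another task's state directly. It only exchanges messages at barrier boundaries, which is the synchronization that Giraph and Hama handle for you.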

On Wed, Sep 26, 2012 at 9:51 PM, Jane Wayne <jane.wayne2...@gmail.com> wrote:
> I'll look for myself, but could you please let me know what Giraph is?
> Is it another layer on Hadoop like Hive/Pig, or an API like Mahout?
>
> On Wed, Sep 26, 2012 at 12:09 PM, Jonathan Bishop <jbishop....@gmail.com> wrote:
>> Yes, Giraph seems like the best way to go: the computation is mainly
>> per-vertex evaluation with message passing between vertices, and
>> synchronization is handled for you.
>>
>> On Wed, Sep 26, 2012 at 8:36 AM, Jane Wayne <jane.wayne2...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I know that some algorithms cannot be parallelized and adapted to the
>>> MapReduce paradigm. However, I have noticed that in most cases where I
>>> find myself struggling to express an algorithm in MapReduce, the
>>> problem is mainly that mappers or reducers cannot communicate with
>>> each other.
>>>
>>> One naive approach I've seen mentioned here and elsewhere is to use a
>>> database to store data for use by all the mappers. However, I have
>>> seen many arguments (which I largely agree with) against this approach.
>>>
>>> In general, my question is this: has anyone tried to implement an
>>> algorithm using MapReduce where mappers required cross-communication?
>>> How did you solve this limitation of MapReduce?
>>>
>>> Thanks,
>>>
>>> Jane
>>>



-- 
Harsh J
