Hi,
just my 2 cents.
You are right that MapReduce does not fit to all problems. Expecially, when
it comes to processing graphs. On the other hand I think even in Google they
stick with MapReduce for complex graph processing (and they do it in Yahoo!
as well). I had a chance to talk to one Google engineer and I asked him
exactly this question ("Do you use MapReduce even if it is known that
specific problems can be solved with different architecture in a more
efficient way?"). And the answer was "yes". I don't know if Google engineers
are using different architectures - and I would be surprised if not - but
they probably don't use it at the same scale as they do with MapReduce.
There are several good reasons for this: MapReduce is easy to learn (which
means that even fresh interns can use it very quickly - also in not
efficient way), they probably have very nice visual tools for managing
MapReduce clusters and finally they have a lot of HW resources.BTW: Andrzej, do you consider contributing your graph processing utilis into Hadoop or Mahout? Regards, Lukas On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <[email protected]> wrote: > Andrzej, > without deeper understanding of exactly what you are doing, I have a gut > feeling that a different distributed system might be a better fit for this > specific task. I assume, you are dealing with very large graphs if you are > using Hadoop, and you want grid processing. But the linear nature of > Map/Reduce may make it hard to fit. As the MapReduce paper said, not every > task can be easily expressed this way. > > The other technology I mean is JavaSpaces, of which I usually use the > GigaSpaces implementation. This allows more complex algorithms. You will > store your complete graph as an appropriate structure in a JavaSpace, and > will also restructure it for parallel processing, as outlined in some > JavaSpaces books. Then you can have as many workers as you want, working on > individual nodes. > > Mark > > On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <[email protected]> wrote: > > > Hi, > > > > I'm looking for an advice. I need to process a directed graph encoded as > a > > list of <from, to> pairs. The goal is to compute a list of longest paths > in > > the graph. There is no guarantee that the graph is acyclic, so there > should > > be some mechanism to detect cycles. > > > > Currently I'm using a simple approach consisting of the following: I > encode > > the graph as <vertex, <neighbor, direction, distance>>, and extending the > > paths by one degree at a time. This means that in order to find the > longest > > path of degree N it takes N + 1 map-reduce jobs. > > > > Are you perhaps aware of a smarter way to do it? I would appreciate any > > pointers. > > > > -- > > Best regards, > > Andrzej Bialecki <>< > > ___. ___ ___ ___ _ _ __________________________________ > > [__ || __|__/|__||\/| Information Retrieval, Semantic Web > > ___|||__|| \| || | Embedded Unix, System Integration > > http://www.sigram.com Contact: info at sigram dot com > > > > > -- http://blog.lukas-vlcek.com/
