Yes, Map/Reduce model itself is simple. But transforming a non-trivial algorithm into Map/Reduce is not simple.
I wonder if there is any effort from the academia to look into build a catalog of how each of our familiar algorithms can be transformed into Map/Reduce, such as ... 1) Sorting 2) Searching (index tree, hash, geo/spacial search) 3) Network/Graph processing (min spanning tree, shortest path, network diameter) 4) Computational Geometry (Ray tracing, Convex Hull) 5) Optimization problem (Maximum flow, Linear programming, Hill climbing) 6) Machine learning (logistic regression, nearest neighbor, cluster, Bayesian classification, decision tree, neural network) It is also important to recognize there are other models in our tool box (besides map/reduce) for parallelizing an algorithm, such as ... a) Blackboard architecture / Tuple space (like JavaSpace, Gigaspace) b) Dependency graph / Data flow programming (e.g. Dryad) c) MPI d) Multi-thread / Shared memory model e) ... anything else ... Rgds, Ricky -----Original Message----- From: Lukáš Vlček [mailto:[email protected]] Sent: Sunday, February 01, 2009 11:37 AM To: [email protected] Subject: Re: Finding longest path in a graph Hi, just my 2 cents. You are right that MapReduce does not fit to all problems. Expecially, when it comes to processing graphs. On the other hand I think even in Google they stick with MapReduce for complex graph processing (and they do it in Yahoo! as well). I had a chance to talk to one Google engineer and I asked him exactly this question ("Do you use MapReduce even if it is known that specific problems can be solved with different architecture in a more efficient way?"). And the answer was "yes". I don't know if Google engineers are using different architectures - and I would be surprised if not - but they probably don't use it at the same scale as they do with MapReduce. There are several good reasons for this: MapReduce is easy to learn (which means that even fresh interns can use it very quickly - also in not efficient way), they probably have very nice visual tools for managing MapReduce clusters and finally they have a lot of HW resources. BTW: Andrzej, do you consider contributing your graph processing utilis into Hadoop or Mahout? Regards, Lukas On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <[email protected]> wrote: > Andrzej, > without deeper understanding of exactly what you are doing, I have a gut > feeling that a different distributed system might be a better fit for this > specific task. I assume, you are dealing with very large graphs if you are > using Hadoop, and you want grid processing. But the linear nature of > Map/Reduce may make it hard to fit. As the MapReduce paper said, not every > task can be easily expressed this way. > > The other technology I mean is JavaSpaces, of which I usually use the > GigaSpaces implementation. This allows more complex algorithms. You will > store your complete graph as an appropriate structure in a JavaSpace, and > will also restructure it for parallel processing, as outlined in some > JavaSpaces books. Then you can have as many workers as you want, working on > individual nodes. > > Mark > > On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <[email protected]> wrote: > > > Hi, > > > > I'm looking for an advice. I need to process a directed graph encoded as > a > > list of <from, to> pairs. The goal is to compute a list of longest paths > in > > the graph. There is no guarantee that the graph is acyclic, so there > should > > be some mechanism to detect cycles. > > > > Currently I'm using a simple approach consisting of the following: I > encode > > the graph as <vertex, <neighbor, direction, distance>>, and extending the > > paths by one degree at a time. This means that in order to find the > longest > > path of degree N it takes N + 1 map-reduce jobs. > > > > Are you perhaps aware of a smarter way to do it? I would appreciate any > > pointers. > > > > -- > > Best regards, > > Andrzej Bialecki <>< > > ___. ___ ___ ___ _ _ __________________________________ > > [__ || __|__/|__||\/| Information Retrieval, Semantic Web > > ___|||__|| \| || | Embedded Unix, System Integration > > http://www.sigram.com Contact: info at sigram dot com > > > > > -- http://blog.lukas-vlcek.com/
