Did you look at the paper which is a base for Mahout implementation? http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf
I think it dicusses some theoretical aspects of such transformation. Lukas 2009/2/2 Ricky Ho <[email protected]> > I heard about Mahout address the machine learning portion that but haven't > look at any detail yet. > > I am looking more from the theory side (how to transform the algorithm into > the Map/Reduce form) and not necessary an implementation. > > Rgds, > Ricky > -----Original Message----- > From: Lukáš Vlček [mailto:[email protected]] > Sent: Sunday, February 01, 2009 10:29 PM > To: [email protected] > Subject: Re: Finding longest path in a graph > > Ricky, > Are you aware of Mahout project? > http://lucene.apache.org/mahout/ > > I think this project tries to addess some of the algorithms you have > mentioned. > > Regards, > Lukas > > 2009/2/2 Ricky Ho <[email protected]> > > > Yes, Map/Reduce model itself is simple. But transforming a non-trivial > > algorithm into Map/Reduce is not simple. > > > > I wonder if there is any effort from the academia to look into build a > > catalog of how each of our familiar algorithms can be transformed into > > Map/Reduce, such as ... > > 1) Sorting > > 2) Searching (index tree, hash, geo/spacial search) > > 3) Network/Graph processing (min spanning tree, shortest path, network > > diameter) > > 4) Computational Geometry (Ray tracing, Convex Hull) > > 5) Optimization problem (Maximum flow, Linear programming, Hill climbing) > > 6) Machine learning (logistic regression, nearest neighbor, cluster, > > Bayesian classification, decision tree, neural network) > > > > It is also important to recognize there are other models in our tool box > > (besides map/reduce) for parallelizing an algorithm, such as ... > > a) Blackboard architecture / Tuple space (like JavaSpace, Gigaspace) > > b) Dependency graph / Data flow programming (e.g. Dryad) > > c) MPI > > d) Multi-thread / Shared memory model > > e) ... anything else ... > > > > Rgds, > > Ricky > > > > -----Original Message----- > > From: Lukáš Vlček [mailto:[email protected]] > > Sent: Sunday, February 01, 2009 11:37 AM > > To: [email protected] > > Subject: Re: Finding longest path in a graph > > > > Hi, > > just my 2 cents. > > You are right that MapReduce does not fit to all problems. Expecially, > when > > it comes to processing graphs. On the other hand I think even in Google > > they > > stick with MapReduce for complex graph processing (and they do it in > Yahoo! > > as well). I had a chance to talk to one Google engineer and I asked him > > exactly this question ("Do you use MapReduce even if it is known that > > specific problems can be solved with different architecture in a more > > efficient way?"). And the answer was "yes". I don't know if Google > > engineers > > are using different architectures - and I would be surprised if not - but > > they probably don't use it at the same scale as they do with MapReduce. > > There are several good reasons for this: MapReduce is easy to learn > (which > > means that even fresh interns can use it very quickly - also in not > > efficient way), they probably have very nice visual tools for managing > > MapReduce clusters and finally they have a lot of HW resources. > > > > BTW: Andrzej, do you consider contributing your graph processing utilis > > into > > Hadoop or Mahout? > > > > Regards, > > Lukas > > > > On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <[email protected]> > > wrote: > > > > > Andrzej, > > > without deeper understanding of exactly what you are doing, I have a > gut > > > feeling that a different distributed system might be a better fit for > > this > > > specific task. I assume, you are dealing with very large graphs if you > > are > > > using Hadoop, and you want grid processing. But the linear nature of > > > Map/Reduce may make it hard to fit. As the MapReduce paper said, not > > every > > > task can be easily expressed this way. > > > > > > The other technology I mean is JavaSpaces, of which I usually use the > > > GigaSpaces implementation. This allows more complex algorithms. You > will > > > store your complete graph as an appropriate structure in a JavaSpace, > and > > > will also restructure it for parallel processing, as outlined in some > > > JavaSpaces books. Then you can have as many workers as you want, > working > > on > > > individual nodes. > > > > > > Mark > > > > > > On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <[email protected]> > > wrote: > > > > > > > Hi, > > > > > > > > I'm looking for an advice. I need to process a directed graph encoded > > as > > > a > > > > list of <from, to> pairs. The goal is to compute a list of longest > > paths > > > in > > > > the graph. There is no guarantee that the graph is acyclic, so there > > > should > > > > be some mechanism to detect cycles. > > > > > > > > Currently I'm using a simple approach consisting of the following: I > > > encode > > > > the graph as <vertex, <neighbor, direction, distance>>, and extending > > the > > > > paths by one degree at a time. This means that in order to find the > > > longest > > > > path of degree N it takes N + 1 map-reduce jobs. > > > > > > > > Are you perhaps aware of a smarter way to do it? I would appreciate > any > > > > pointers. > > > > > > > > -- > > > > Best regards, > > > > Andrzej Bialecki <>< > > > > ___. ___ ___ ___ _ _ __________________________________ > > > > [__ || __|__/|__||\/| Information Retrieval, Semantic Web > > > > ___|||__|| \| || | Embedded Unix, System Integration > > > > http://www.sigram.com Contact: info at sigram dot com > > > > > > > > > > > > > > > > > > > -- > > http://blog.lukas-vlcek.com/ > > > > > > -- > http://blog.lukas-vlcek.com/ > -- http://blog.lukas-vlcek.com/
