I heard about Mahout address the machine learning portion that but haven't look at any detail yet.
I am looking more from the theory side (how to transform the algorithm into the Map/Reduce form) and not necessary an implementation. Rgds, Ricky -----Original Message----- From: Lukáš Vlček [mailto:[email protected]] Sent: Sunday, February 01, 2009 10:29 PM To: [email protected] Subject: Re: Finding longest path in a graph Ricky, Are you aware of Mahout project? http://lucene.apache.org/mahout/ I think this project tries to addess some of the algorithms you have mentioned. Regards, Lukas 2009/2/2 Ricky Ho <[email protected]> > Yes, Map/Reduce model itself is simple. But transforming a non-trivial > algorithm into Map/Reduce is not simple. > > I wonder if there is any effort from the academia to look into build a > catalog of how each of our familiar algorithms can be transformed into > Map/Reduce, such as ... > 1) Sorting > 2) Searching (index tree, hash, geo/spacial search) > 3) Network/Graph processing (min spanning tree, shortest path, network > diameter) > 4) Computational Geometry (Ray tracing, Convex Hull) > 5) Optimization problem (Maximum flow, Linear programming, Hill climbing) > 6) Machine learning (logistic regression, nearest neighbor, cluster, > Bayesian classification, decision tree, neural network) > > It is also important to recognize there are other models in our tool box > (besides map/reduce) for parallelizing an algorithm, such as ... > a) Blackboard architecture / Tuple space (like JavaSpace, Gigaspace) > b) Dependency graph / Data flow programming (e.g. Dryad) > c) MPI > d) Multi-thread / Shared memory model > e) ... anything else ... > > Rgds, > Ricky > > -----Original Message----- > From: Lukáš Vlček [mailto:[email protected]] > Sent: Sunday, February 01, 2009 11:37 AM > To: [email protected] > Subject: Re: Finding longest path in a graph > > Hi, > just my 2 cents. > You are right that MapReduce does not fit to all problems. Expecially, when > it comes to processing graphs. On the other hand I think even in Google > they > stick with MapReduce for complex graph processing (and they do it in Yahoo! > as well). I had a chance to talk to one Google engineer and I asked him > exactly this question ("Do you use MapReduce even if it is known that > specific problems can be solved with different architecture in a more > efficient way?"). And the answer was "yes". I don't know if Google > engineers > are using different architectures - and I would be surprised if not - but > they probably don't use it at the same scale as they do with MapReduce. > There are several good reasons for this: MapReduce is easy to learn (which > means that even fresh interns can use it very quickly - also in not > efficient way), they probably have very nice visual tools for managing > MapReduce clusters and finally they have a lot of HW resources. > > BTW: Andrzej, do you consider contributing your graph processing utilis > into > Hadoop or Mahout? > > Regards, > Lukas > > On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <[email protected]> > wrote: > > > Andrzej, > > without deeper understanding of exactly what you are doing, I have a gut > > feeling that a different distributed system might be a better fit for > this > > specific task. I assume, you are dealing with very large graphs if you > are > > using Hadoop, and you want grid processing. But the linear nature of > > Map/Reduce may make it hard to fit. As the MapReduce paper said, not > every > > task can be easily expressed this way. > > > > The other technology I mean is JavaSpaces, of which I usually use the > > GigaSpaces implementation. This allows more complex algorithms. You will > > store your complete graph as an appropriate structure in a JavaSpace, and > > will also restructure it for parallel processing, as outlined in some > > JavaSpaces books. Then you can have as many workers as you want, working > on > > individual nodes. > > > > Mark > > > > On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <[email protected]> > wrote: > > > > > Hi, > > > > > > I'm looking for an advice. I need to process a directed graph encoded > as > > a > > > list of <from, to> pairs. The goal is to compute a list of longest > paths > > in > > > the graph. There is no guarantee that the graph is acyclic, so there > > should > > > be some mechanism to detect cycles. > > > > > > Currently I'm using a simple approach consisting of the following: I > > encode > > > the graph as <vertex, <neighbor, direction, distance>>, and extending > the > > > paths by one degree at a time. This means that in order to find the > > longest > > > path of degree N it takes N + 1 map-reduce jobs. > > > > > > Are you perhaps aware of a smarter way to do it? I would appreciate any > > > pointers. > > > > > > -- > > > Best regards, > > > Andrzej Bialecki <>< > > > ___. ___ ___ ___ _ _ __________________________________ > > > [__ || __|__/|__||\/| Information Retrieval, Semantic Web > > > ___|||__|| \| || | Embedded Unix, System Integration > > > http://www.sigram.com Contact: info at sigram dot com > > > > > > > > > > > > -- > http://blog.lukas-vlcek.com/ > -- http://blog.lukas-vlcek.com/
