Did you look at the paper which is a base for Mahout implementation?
http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf

I think it dicusses some theoretical aspects of such transformation.

Lukas

2009/2/2 Ricky Ho <[email protected]>

> I heard about Mahout address the machine learning portion that but haven't
> look at any detail yet.
>
> I am looking more from the theory side (how to transform the algorithm into
> the Map/Reduce form) and not necessary an implementation.
>
> Rgds,
> Ricky
> -----Original Message-----
> From: Lukáš Vlček [mailto:[email protected]]
> Sent: Sunday, February 01, 2009 10:29 PM
> To: [email protected]
> Subject: Re: Finding longest path in a graph
>
> Ricky,
> Are you aware of Mahout project?
> http://lucene.apache.org/mahout/
>
> I think this project tries to addess some of the algorithms you have
> mentioned.
>
> Regards,
> Lukas
>
> 2009/2/2 Ricky Ho <[email protected]>
>
> > Yes, Map/Reduce model itself is simple.  But transforming a non-trivial
> > algorithm into Map/Reduce is not simple.
> >
> > I wonder if there is any effort from the academia to look into build a
> > catalog of how each of our familiar algorithms can be transformed into
> > Map/Reduce, such as ...
> > 1) Sorting
> > 2) Searching (index tree, hash, geo/spacial search)
> > 3) Network/Graph processing (min spanning tree, shortest path, network
> > diameter)
> > 4) Computational Geometry (Ray tracing, Convex Hull)
> > 5) Optimization problem (Maximum flow, Linear programming, Hill climbing)
> > 6) Machine learning (logistic regression, nearest neighbor, cluster,
> > Bayesian classification, decision tree, neural network)
> >
> > It is also important to recognize there are other models in our tool box
> > (besides map/reduce) for parallelizing an algorithm, such as ...
> > a) Blackboard architecture / Tuple space (like JavaSpace, Gigaspace)
> > b) Dependency graph / Data flow programming (e.g. Dryad)
> > c) MPI
> > d) Multi-thread / Shared memory model
> > e) ... anything else ...
> >
> > Rgds,
> > Ricky
> >
> > -----Original Message-----
> > From: Lukáš Vlček [mailto:[email protected]]
> > Sent: Sunday, February 01, 2009 11:37 AM
> > To: [email protected]
> > Subject: Re: Finding longest path in a graph
> >
> > Hi,
> > just my 2 cents.
> > You are right that MapReduce does not fit to all problems. Expecially,
> when
> > it comes to processing graphs. On the other hand I think even in Google
> > they
> > stick with MapReduce for complex graph processing (and they do it in
> Yahoo!
> > as well). I had a chance to talk to one Google engineer and I asked him
> > exactly this question ("Do you use MapReduce even if it is known that
> > specific problems can be solved with different architecture in a more
> > efficient way?"). And the answer was "yes". I don't know if Google
> > engineers
> > are using different architectures - and I would be surprised if not - but
> > they probably don't use it at the same scale as they do with MapReduce.
> > There are several good reasons for this: MapReduce is easy to learn
> (which
> > means that even fresh interns can use it very quickly - also in not
> > efficient way), they probably have very nice visual tools for managing
> > MapReduce clusters and finally they have a lot of HW resources.
> >
> > BTW: Andrzej, do you consider contributing your graph processing utilis
> > into
> > Hadoop or Mahout?
> >
> > Regards,
> > Lukas
> >
> > On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <[email protected]>
> > wrote:
> >
> > > Andrzej,
> > > without deeper understanding of exactly what you are doing, I have a
> gut
> > > feeling that a different distributed system might be a better fit for
> > this
> > > specific task. I assume, you are dealing with very large graphs if you
> > are
> > > using Hadoop, and you want grid processing. But the linear nature of
> > > Map/Reduce may make it hard to fit. As the MapReduce paper said, not
> > every
> > > task can be easily expressed this way.
> > >
> > > The other technology I mean is JavaSpaces, of which I usually use the
> > > GigaSpaces implementation. This allows more complex algorithms. You
> will
> > > store your complete graph as an appropriate structure in a JavaSpace,
> and
> > > will also restructure it for parallel processing, as outlined in some
> > > JavaSpaces books. Then you can have as many workers as you want,
> working
> > on
> > > individual nodes.
> > >
> > > Mark
> > >
> > > On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <[email protected]>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm looking for an advice. I need to process a directed graph encoded
> > as
> > > a
> > > > list of <from, to> pairs. The goal is to compute a list of longest
> > paths
> > > in
> > > > the graph. There is no guarantee that the graph is acyclic, so there
> > > should
> > > > be some mechanism to detect cycles.
> > > >
> > > > Currently I'm using a simple approach consisting of the following: I
> > > encode
> > > > the graph as <vertex, <neighbor, direction, distance>>, and extending
> > the
> > > > paths by one degree at a time. This means that in order to find the
> > > longest
> > > > path of degree N it takes N + 1 map-reduce jobs.
> > > >
> > > > Are you perhaps aware of a smarter way to do it? I would appreciate
> any
> > > > pointers.
> > > >
> > > > --
> > > > Best regards,
> > > > Andrzej Bialecki     <><
> > > >  ___. ___ ___ ___ _ _   __________________________________
> > > > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > > > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > > > http://www.sigram.com  Contact: info at sigram dot com
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > http://blog.lukas-vlcek.com/
> >
>
>
>
> --
> http://blog.lukas-vlcek.com/
>



-- 
http://blog.lukas-vlcek.com/

Reply via email to