Yes, Map/Reduce model itself is simple.  But transforming a non-trivial 
algorithm into Map/Reduce is not simple.

I wonder if there is any effort from the academia to look into build a catalog 
of how each of our familiar algorithms can be transformed into Map/Reduce, such 
as ...
1) Sorting
2) Searching (index tree, hash, geo/spacial search)
3) Network/Graph processing (min spanning tree, shortest path, network diameter)
4) Computational Geometry (Ray tracing, Convex Hull)
5) Optimization problem (Maximum flow, Linear programming, Hill climbing)
6) Machine learning (logistic regression, nearest neighbor, cluster, Bayesian 
classification, decision tree, neural network)

It is also important to recognize there are other models in our tool box 
(besides map/reduce) for parallelizing an algorithm, such as ...
a) Blackboard architecture / Tuple space (like JavaSpace, Gigaspace)
b) Dependency graph / Data flow programming (e.g. Dryad)
c) MPI
d) Multi-thread / Shared memory model
e) ... anything else ...

Rgds,
Ricky

-----Original Message-----
From: Lukáš Vlček [mailto:[email protected]] 
Sent: Sunday, February 01, 2009 11:37 AM
To: [email protected]
Subject: Re: Finding longest path in a graph

Hi,
just my 2 cents.
You are right that MapReduce does not fit to all problems. Expecially, when
it comes to processing graphs. On the other hand I think even in Google they
stick with MapReduce for complex graph processing (and they do it in Yahoo!
as well). I had a chance to talk to one Google engineer and I asked him
exactly this question ("Do you use MapReduce even if it is known that
specific problems can be solved with different architecture in a more
efficient way?"). And the answer was "yes". I don't know if Google engineers
are using different architectures - and I would be surprised if not - but
they probably don't use it at the same scale as they do with MapReduce.
There are several good reasons for this: MapReduce is easy to learn (which
means that even fresh interns can use it very quickly - also in not
efficient way), they probably have very nice visual tools for managing
MapReduce clusters and finally they have a lot of HW resources.

BTW: Andrzej, do you consider contributing your graph processing utilis into
Hadoop or Mahout?

Regards,
Lukas

On Thu, Jan 29, 2009 at 6:26 PM, Mark Kerzner <[email protected]> wrote:

> Andrzej,
> without deeper understanding of exactly what you are doing, I have a gut
> feeling that a different distributed system might be a better fit for this
> specific task. I assume, you are dealing with very large graphs if you are
> using Hadoop, and you want grid processing. But the linear nature of
> Map/Reduce may make it hard to fit. As the MapReduce paper said, not every
> task can be easily expressed this way.
>
> The other technology I mean is JavaSpaces, of which I usually use the
> GigaSpaces implementation. This allows more complex algorithms. You will
> store your complete graph as an appropriate structure in a JavaSpace, and
> will also restructure it for parallel processing, as outlined in some
> JavaSpaces books. Then you can have as many workers as you want, working on
> individual nodes.
>
> Mark
>
> On Thu, Jan 29, 2009 at 11:20 AM, Andrzej Bialecki <[email protected]> wrote:
>
> > Hi,
> >
> > I'm looking for an advice. I need to process a directed graph encoded as
> a
> > list of <from, to> pairs. The goal is to compute a list of longest paths
> in
> > the graph. There is no guarantee that the graph is acyclic, so there
> should
> > be some mechanism to detect cycles.
> >
> > Currently I'm using a simple approach consisting of the following: I
> encode
> > the graph as <vertex, <neighbor, direction, distance>>, and extending the
> > paths by one degree at a time. This means that in order to find the
> longest
> > path of degree N it takes N + 1 map-reduce jobs.
> >
> > Are you perhaps aware of a smarter way to do it? I would appreciate any
> > pointers.
> >
> > --
> > Best regards,
> > Andrzej Bialecki     <><
> >  ___. ___ ___ ___ _ _   __________________________________
> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
> >
> >
>



-- 
http://blog.lukas-vlcek.com/

Reply via email to