Good point, then at the very least it should be called Map-Sort-Reduce,
where all of the phases are optional.  :)

- Doug

On 1/24/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

Doug Judd wrote:
> Part of the problem is that calling the paradigm "Map-Reduce" is
somewhat
> misleading.  It is really just a distributed sort.  The sort is where
> all of
> the complexity comes from.  Invoking map() over the input is O(n),
> invoking
> reduce() over the intermediate results is O(n) as well.  The sort is
> O(nlogn).  A more appropriate name for this algorithm would be
> "Distributed
> Sort with a Pre-map Phase and a Post-reduce Phase"  Calling it
Map-Reduce
> and leaving out the word "sort" (the most important part) is a source of
> confusion.
>
> If you think of it in these terms, I think it's easier to see where
> and how
> it applies.

:) Sure, that's one point of view on this - however, in quite a few
applications sort is definitely less important than the ability to split
the processing load in map() and reduce() over many machines. Sometimes
I don't care about the sorting at all (in all cases where
IdentityReducer is used).

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Reply via email to