Doug Judd wrote:
Part of the problem is that calling the paradigm "Map-Reduce" is somewhat
misleading. It is really just a distributed sort, and the sort is where all
of the complexity comes from. Invoking map() over the input is O(n), and
invoking reduce() over the intermediate results is O(n) as well; the sort is
O(n log n). A more appropriate name for this algorithm would be "Distributed
Sort with a Pre-map Phase and a Post-reduce Phase". Calling it Map-Reduce
and leaving out the word "sort" (the most important part) is a source of
confusion.

If you think of it in these terms, I think it's easier to see where and how
it applies.
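
To make that cost breakdown concrete, here is a rough single-node sketch
(plain Java, illustrative only, not Hadoop code): the map pass over the
records is O(n), the sort of the intermediate pairs is the only O(n log n)
step, and the reduce pass over the sorted pairs is O(n).

import java.util.*;

public class MapSortReduce {
    public static void main(String[] args) {
        List<String> input = Arrays.asList("b c", "a b", "c b");

        // "Map": one pass over the input records, O(n),
        // emitting (word, 1) intermediate pairs.
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : input) {
            for (String word : line.split(" ")) {
                intermediate.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }

        // The sort of the intermediate pairs: O(n log n),
        // the only super-linear step.
        intermediate.sort(Map.Entry.comparingByKey());

        // "Reduce": one pass over the sorted pairs, O(n),
        // summing the values for each run of equal keys.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : intermediate) {
            counts.merge(e.getKey(), e.getValue(), Integer::sum);
        }

        System.out.println(counts);   // prints {a=1, b=3, c=2}
    }
}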

:) Sure, that's one point of view on this. However, in quite a few
applications the sort is definitely less important than the ability to split
the processing load in map() and reduce() over many machines. Sometimes I
don't care about the sorting at all (in all cases where IdentityReducer is
used). A sketch of such a job follows below.
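
For example, a job wired up roughly like this (old org.apache.hadoop.mapred
API; the class name and the input/output paths are just placeholders) still
runs the shuffle/sort, but IdentityReducer simply emits each sorted
(key, value) pair unchanged:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class IdentityReduceJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(IdentityReduceJob.class);
        conf.setJobName("identity-reduce");

        // In a real job the mapper would do the per-record work we want
        // spread over many machines; here it is a pass-through too.
        conf.setMapperClass(IdentityMapper.class);
        // IdentityReducer: reduce() just re-emits each (key, value) pair,
        // so the output ordering produced by the sort is incidental.
        conf.setReducerClass(IdentityReducer.class);

        // Default TextInputFormat gives (LongWritable offset, Text line).
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}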

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

