Doug Judd wrote:
Part of the problem is that calling the paradigm "Map-Reduce" is somewhat
misleading. It is really just a distributed sort, and the sort is where all
of the complexity comes from. Invoking map() over the input is O(n), and
invoking reduce() over the intermediate results is O(n) as well. The sort is
O(n log n). A more appropriate name for this algorithm would be "Distributed
Sort with a Pre-map Phase and a Post-reduce Phase". Calling it Map-Reduce
and leaving out the word "sort" (the most important part) is a source of
confusion.
If you think of it in these terms, I think it's easier to see where and how
it applies.
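Doug's cost breakdown can be made concrete with a minimal single-process sketch of the pipeline. All names here (word_count_map, sum_reduce, map_reduce) are illustrative, not part of Hadoop or any real framework:

```python
# Single-process sketch of the map -> sort -> reduce pipeline.
# map() and reduce() are linear passes; the sort in the middle is
# the O(n log n) step Doug is pointing at.
from itertools import groupby
from operator import itemgetter

def word_count_map(line):
    # map(): O(n) pass over the input, emits (key, value) pairs
    for word in line.split():
        yield (word, 1)

def sum_reduce(key, values):
    # reduce(): O(n) pass over the grouped intermediate values
    return (key, sum(values))

def map_reduce(lines, map_fn, reduce_fn):
    intermediate = [pair for line in lines for pair in map_fn(line)]
    # The "distributed sort": ordering by key is what lets reduce()
    # see each key's values as one contiguous group.
    intermediate.sort(key=itemgetter(0))
    return [reduce_fn(k, [v for _, v in group])
            for k, group in groupby(intermediate, key=itemgetter(0))]

result = map_reduce(["a b a", "b c"], word_count_map, sum_reduce)
print(result)  # [('a', 2), ('b', 2), ('c', 1)]
```

In a real framework the sort is performed in a distributed shuffle across machines, but the asymptotic shape of the three phases is the same.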
:) Sure, that's one point of view on this - however, in quite a few
applications the sort is definitely less important than the ability to split
the processing load in map() and reduce() over many machines. Sometimes I
don't care about the sorting at all (in all cases where IdentityReducer is
used, for instance).
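The IdentityReducer case can be sketched the same way. This is an illustrative stand-in for Hadoop's IdentityReducer, not its actual code; uppercase_map is a made-up per-record transformation:

```python
# When the reducer is the identity, the job degenerates to a
# parallelized map: the sort reorders the pairs but adds no
# information to the output.
def uppercase_map(record):
    # a pure per-record transformation; no aggregation needed
    yield (record, record.upper())

def identity_reduce(key, values):
    # mirrors Hadoop's IdentityReducer: emit each value unchanged
    for v in values:
        yield (key, v)

intermediate = [pair for rec in ["foo", "bar"] for pair in uppercase_map(rec)]
output = [pair
          for k, v in sorted(intermediate)
          for pair in identity_reduce(k, [v])]
print(output)  # [('bar', 'BAR'), ('foo', 'FOO')]
```

Dropping the sorted() call changes only the ordering of the pairs, not their contents, which is the point: for such jobs the value is the distribution of the map work, not the sort.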
--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com