Good point, then at the very least it should be called Map-Sort-Reduce, where all of the phases are optional. :)
- Doug On 1/24/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Doug Judd wrote: > Part of the problem is that calling the paradigm "Map-Reduce" is somewhat > misleading. It is really just a distributed sort. The sort is where > all of > the complexity comes from. Invoking map() over the input is O(n), > invoking > reduce() over the intermediate results is O(n) as well. The sort is > O(nlogn). A more appropriate name for this algorithm would be > "Distributed > Sort with a Pre-map Phase and a Post-reduce Phase" Calling it Map-Reduce > and leaving out the word "sort" (the most important part) is a source of > confusion. > > If you think of it in these terms, I think it's easier to see where > and how > it applies. :) Sure, that's one point of view on this - however, in quite a few applications sort is definitely less important than the ability to split the processing load in map() and reduce() over many machines. Sometimes I don't care about the sorting at all (in all cases where IdentityReducer is used). -- Best regards, Andrzej Bialecki <>< ___. ___ ___ ___ _ _ __________________________________ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com
