Doug Judd wrote:
Part of the problem is that calling the paradigm "Map-Reduce" is somewhat
misleading. It is really just a distributed sort, and the sort is where all
of the complexity comes from. Invoking map() over the input is O(n), and
invoking reduce() over the intermediate results is O(n) as well. The sort is
O(n log n). A more appropriate name for this algorithm would be "Distributed
Sort with a Pre-map Phase and a Post-reduce Phase". Calling it Map-Reduce
and leaving out the word "sort" (the most important part) is a source of
confusion.
If you think of it in these terms, I think it's easier to see where and how
it applies.
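Doug's cost breakdown can be made concrete with a minimal single-process sketch of the pipeline. All names here (word_count_map, sum_reduce, map_reduce) are illustrative, not part of Hadoop or any real framework:

```python
# Single-process sketch of the map -> sort -> reduce pipeline.
# map() and reduce() are linear passes; the sort in the middle is
# the O(n log n) step Doug is pointing at.
from itertools import groupby
from operator import itemgetter

def word_count_map(line):
    # map(): O(n) pass over the input, emits (key, value) pairs
    for word in line.split():
        yield (word, 1)

def sum_reduce(key, values):
    # reduce(): O(n) pass over the grouped intermediate values
    return (key, sum(values))

def map_reduce(lines, map_fn, reduce_fn):
    intermediate = [pair for line in lines for pair in map_fn(line)]
    # The "distributed sort": ordering by key is what lets reduce()
    # see each key's values as one contiguous group.
    intermediate.sort(key=itemgetter(0))
    return [reduce_fn(k, [v for _, v in group])
            for k, group in groupby(intermediate, key=itemgetter(0))]

result = map_reduce(["a b a", "b c"], word_count_map, sum_reduce)
print(result)  # [('a', 2), ('b', 2), ('c', 1)]
```

In a real framework the sort is performed in a distributed shuffle across machines, but the asymptotic shape of the three phases is the same.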
:) Sure, that's one point of view on this - however, in quite a few
applications the sort is definitely less important than the ability to split
the processing load in map() and reduce() over many machines. Sometimes I
don't care about the sorting at all (in all cases where IdentityReducer is
used, for instance).
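The IdentityReducer case can be sketched the same way. This is an illustrative stand-in for Hadoop's IdentityReducer, not its actual code; uppercase_map is a made-up per-record transformation:

```python
# When the reducer is the identity, the job degenerates to a
# parallelized map: the sort reorders the pairs but adds no
# information to the output.
def uppercase_map(record):
    # a pure per-record transformation; no aggregation needed
    yield (record, record.upper())

def identity_reduce(key, values):
    # mirrors Hadoop's IdentityReducer: emit each value unchanged
    for v in values:
        yield (key, v)

intermediate = [pair for rec in ["foo", "bar"] for pair in uppercase_map(rec)]
output = [pair
          for k, v in sorted(intermediate)
          for pair in identity_reduce(k, [v])]
print(output)  # [('bar', 'BAR'), ('foo', 'FOO')]
```

Dropping the sorted() call changes only the ordering of the pairs, not their contents, which is the point: for such jobs the value is the distribution of the map work, not the sort.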
--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com