Normally I am loath to link stuff from Slashdot, but the article on
MapReduce
<http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html>
caught my attention because I had just read an article on exactly this
subject by Jeffrey Dean and Sanjay Ghemawat in "Communications of the
ACM", January 2008, pages 107-113.
There are several fallacies in the linked article that are cleverly
hidden. One of them is its reliance on indexes to support queries. An
index can certainly help with repeated queries on the same dataset, but
the authors neglect to mention that building the index costs time and
space, and that for a one-off query on an ad-hoc dataset an index gives
you no advantage over a single pass through the data, since building
the index requires that same full pass anyway.
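Here is a toy Python sketch of that point (my own illustration, with a
made-up record list, not anything from either article). Both the direct
scan and the index build touch every record once, so for a single query
the index buys you nothing and costs extra space:

def scan_matches(records, pattern):
    # Single pass over the data: O(n), no extra space.
    return [r for r in records if pattern in r]

def build_index(records):
    # Also a full O(n) pass, plus O(n) extra space for the postings.
    index = {}
    for pos, record in enumerate(records):
        for token in record.split():
            index.setdefault(token, []).append(pos)
    return index

records = ["error disk7 timeout", "ok disk3", "error disk3 retry"]
print(scan_matches(records, "error"))    # one pass and we are done
idx = build_index(records)               # the same pass, paid up front
print([records[i] for i in idx["error"]])  # only now is lookup cheap

Only if you run many queries against the same dataset does the up-front
pass amortize out, which is exactly the repeated-query case above.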
Perhaps the biggest problem with the linked article is its total
silence on what you do in the event of failure. The ACM article
addresses this specifically and shows how MapReduce recovers from
failed nodes, and even from nodes that are merely slow.
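The trick for slow nodes is what the ACM article calls backup tasks:
near the end of a job, the master launches a duplicate of any
still-running task and takes whichever copy finishes first, so one
straggler cannot stall the whole job. A toy single-machine sketch of
the idea in Python (my own, not code from the article):

import concurrent.futures as cf
import random, time

def work(task_id):
    time.sleep(random.uniform(0.1, 0.3))   # a healthy node
    return task_id, "done"

def slow_work(task_id):
    time.sleep(5.0)                        # a straggler node
    return task_id, "done"

with cf.ThreadPoolExecutor(max_workers=4) as pool:
    primary = pool.submit(slow_work, 7)    # original attempt is slow
    backup = pool.submit(work, 7)          # speculative duplicate
    done, _ = cf.wait([primary, backup],
                      return_when=cf.FIRST_COMPLETED)
    print(done.pop().result())             # first finisher wins
    primary.cancel()  # best effort; in this toy the straggler thread
                      # still runs to completion in the background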
As an example of what MapReduce at Google (where the ACM article's
authors work) can do, the article describes running a grep over 10^10
(ten billion) 100-byte records, looking for a relatively rare
three-character pattern. The whole job is farmed out to 1800 nodes and
completes in 180 seconds from start to finish. They have some nice
little graphs to go along with it.
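The program itself is almost trivially small, which is much of the
point. A single-machine Python sketch of that grep (my own toy, not
Google's code; "xyz" stands in for the rare pattern): the map phase
emits every record containing the pattern, and the reduce phase is just
the identity.

def map_grep(record, pattern="xyz"):
    # Emit the record if it matches; emit nothing otherwise.
    if pattern in record:
        yield record

def reduce_identity(records):
    # Distributed grep needs no real reduction step.
    yield from records

# In the real system, map_grep runs in parallel on chunks of the 10^10
# records spread across the 1800 nodes; here we just chain the phases.
data = ["aaaa", "bxyzb", "cccc", "xyz-tail"]
mapped = (m for rec in data for m in map_grep(rec))
for out in reduce_identity(mapped):
    print(out)

All the hard parts, the partitioning, scheduling, and the failure
recovery discussed above, live in the framework rather than in the
twenty lines the programmer writes.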
Gus