Re: MapReduce as database replacement

Andrew Lentvorski Fri, 18 Jan 2008 15:53:57 -0800

Gus Wirth wrote:

Normally I am loath to link stuff from Slashdot, but the article on MapReduce <http://www.databasecolumn.com/2008/01/mapreduce-a-major-step-back.html> caught my attention because I had just read an article in "Communications of the ACM" January 2008, pages 107-113 by Jeffrey Dean and Sanjay Ghemawat on just this subject.

I agree. It is the desperate ramblings of RDBMS guys attempting to hold onto slipping mindshare.

The problem is that there are good points to be made against MapReduce, and they miss them all.

MapReduce has few guarantees. It gets *most* of the records *most* of the time *normally* with an acceptable performance. MapReduce works very well over data which has an unknown structure or when you are hunting through a known structure in a new way. Thus, MapReduce is good for data mining. MapReduce scales *extremely* well.
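
For anyone who has not seen the model, here is a minimal single-process sketch of the map/shuffle/reduce pattern in Python. It is only an illustration, not Google's implementation: the function names, the regular expression, and the sample log lines are all made up, and nothing here is distributed. It shows the "hunting through loosely structured data in a new way" case: lines that do not match are simply skipped rather than rejected by a schema.

```python
from collections import defaultdict
import re


def map_reduce(records, mapper, reducer):
    """Single-process illustration of map -> shuffle -> reduce."""
    groups = defaultdict(list)
    # Map phase: each record may emit zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: collapse each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical ad hoc query: count HTTP status codes in free-form log lines.
STATUS = re.compile(r"\bstatus=(\d{3})\b")


def status_mapper(line):
    match = STATUS.search(line)
    if match:  # lines without a status field emit nothing
        yield match.group(1), 1


def count_reducer(key, values):
    return sum(values)


logs = [
    "GET /index.html status=200",
    "garbage line with no structure",
    "POST /login status=403 user=bob",
    "GET /about status=200",
]

counts = map_reduce(logs, status_mapper, count_reducer)
print(counts)
```

Asking a different question of the same data only means swapping in a different mapper and reducer; no schema or index has to exist first.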

The moment you need a guarantee, MapReduce falls over. There is no guarantee MapReduce will find a particular record. There is no guarantee MapReduce will not *lose* a record. There is no guarantee MapReduce will return in a reasonable time. And MapReduce eats bandwidth and storage for breakfast, lunch, and dinner.

I'm betting this is one of the reasons Gmail sucked for so long. They probably threw the mail store into the MapReduce cluster. Well, that's nice, but they probably needed to replicate it *way* too much for end-user guarantees. I'm betting that Gmail is now off the MapReduce cluster for functionality and just copies mail messages into the MapReduce cluster for searching and mining.

What I would really like to see is MapReduce folded into peer-to-peer like BitTorrent. The problem there is inter-node bandwidth.


-a


--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
