This looks right, but there is definitely a lot more to mapreduce implementations than algebraic signatures!
It might also be considered that there are lots of people using MapReduce technology on things other than "bare metal" Hadoop/MapReduce, etc.. Lots of data analysts, ML people, etc.., use Apache Pig, Hive, HBase, etc... One thing I've really been interested in seeing is a (presumably comonadic) interface to Apache Pig: many uses of Hadoop simply deal with dataflow-like programming. Having a (co?)monad where the values represent "types" of terms in the Pig language, that produces and runs Pig programs seems like a fun idea: I'd definitely try it out if anyone came up with it. But of course, don't forget about Cloud Haskell and the other related efforts! Kris On Thu, Apr 18, 2013 at 2:49 PM, John D. Ramsdell <ramsde...@gmail.com> wrote: > I'm learning about the Map Reduce computation frequently used with big data. > For the fun of it, I decided to write a very high-level spec of Map Reduce. > Here is what I came up with. Enjoy. > > John > >> module MapReduce where >> import Data.List (nub) > > A high-level specification of Map Reduce as a Haskell program. The > program uses lists to represent multisets. As multisets have no > implied ordering, the ordering implied by lists in this specification > should be ignored. > > The database is a multiset of key-value pairs. > >> type Key = String >> type Value = String >> type Datum = (Key, Value) >> type Data = [Datum] > > A mapper maps a datum to a finite multiset of key-value pairs. > >> type Mapper = Datum -> Data > > A reducer takes a key and a multiset of values and produces a finite > multiset of values. > >> type Reducer = (Key, [Value]) -> [Value] > > A step is a mapper followed by a reducer > >> type Step = (Mapper, Reducer) > > A program is a finite sequence of steps > >> type Program = [Step] > > The semantics of a program is provided by the run function. > >> run :: Program -> Data -> Data >> run [] d = d >> run (s : p) d = >> run p (step s d) > > The three parts of a step are mapping, shuffling, and reducing. > >> step :: Step -> Data -> Data >> step (m, r) d = >> let mapped = transform m d >> shuffled = shuffle mapped in >> reduce r shuffled > > The first part of a step is to transform the data by applying the > mapper to each datum and collecting the results. > >> transform :: Mapper -> Data -> Data >> transform m d = >> [p | u <- d, p <- m u] > > Next, values with common keys are collected. Keys are unique after > shuffling. > >> shuffle :: Data -> [(Key, [Value])] >> shuffle d = >> [(k, vs) | k <- nub (map fst d), -- nub eliminates duplicate keys >> let vs = [v | (k', v) <- d, k' == k]] > > A reducer is applied to the data associated with one key, and always > produces data with that key. > >> reduce :: Reducer -> [(Key, [Value])] -> Data >> reduce r rs = >> [(k, v) | (k, vs) <- rs, v <- r (k, vs)] > > > _______________________________________________ > Haskell-Cafe mailing list > Haskell-Cafe@haskell.org > http://www.haskell.org/mailman/listinfo/haskell-cafe > _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe