I'm learning about the Map Reduce computation frequently used with big data. For the fun of it, I decided to write a very high-level spec of Map Reduce. Here is what I came up with. Enjoy.
John > module MapReduce where > import Data.List (nub) A high-level specification of Map Reduce as a Haskell program. The program uses lists to represent multisets. As multisets have no implied ordering, the ordering implied by lists in this specification should be ignored. The database is a multiset of key-value pairs. > type Key = String > type Value = String > type Datum = (Key, Value) > type Data = [Datum] A mapper maps a datum to a finite multiset of key-value pairs. > type Mapper = Datum -> Data A reducer takes a key and a multiset of values and produces a finite multiset of values. > type Reducer = (Key, [Value]) -> [Value] A step is a mapper followed by a reducer > type Step = (Mapper, Reducer) A program is a finite sequence of steps > type Program = [Step] The semantics of a program is provided by the run function. > run :: Program -> Data -> Data > run [] d = d > run (s : p) d = > run p (step s d) The three parts of a step are mapping, shuffling, and reducing. > step :: Step -> Data -> Data > step (m, r) d = > let mapped = transform m d > shuffled = shuffle mapped in > reduce r shuffled The first part of a step is to transform the data by applying the mapper to each datum and collecting the results. > transform :: Mapper -> Data -> Data > transform m d = > [p | u <- d, p <- m u] Next, values with common keys are collected. Keys are unique after shuffling. > shuffle :: Data -> [(Key, [Value])] > shuffle d = > [(k, vs) | k <- nub (map fst d), -- nub eliminates duplicate keys > let vs = [v | (k', v) <- d, k' == k]] A reducer is applied to the data associated with one key, and always produces data with that key. > reduce :: Reducer -> [(Key, [Value])] -> Data > reduce r rs = > [(k, v) | (k, vs) <- rs, v <- r (k, vs)]
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe