I could be cheeky and point you to the book... http://manning.com/owen
But I can also give you an overview, which is roughly what you would get from surveying the code.

RecommenderJob runs everything. It kicks off five MapReduce jobs, in order:

1. ItemIDIndexMapper / ItemIDIndexReducer
   Since item IDs are longs and vector indices are ints, we have to hash the longs to ints, but also remember the reverse mapping for later. That's all this step does: it writes down the mapping.

2. ToItemPrefsMapper / ToUserVectorReducer
   This converts the file of preferences into proper Vectors. There is one vector per user; the (hashed) item IDs are the dimensions and the preference values are the dimension values.

3. UserVectorToCooccurrenceMapper / UserVectorToCooccurrenceReducer
   This is a somewhat complex step that does one thing: it counts co-occurrence. For each pair of items A and B, it counts the number of users whose preferences contain both.

4. CooccurrenceColumnWrapperMapper + UserVectorSplitterMapper / PartialMultiplyReducer
   This has two mappers. Between them they output, for each item, that item's co-occurrences (one column of the co-occurrence matrix) and all user preferences for that item, in a clever way. The reducer multiplies each of those preference values by the co-occurrence column and outputs the resulting vectors, keyed by user. Each one is a part of the final recommendation vector for one user.

5. (IdentityMapper) / AggregateAndRecommendReducer
   This adds up the partial vectors to make the final recommendation vector for each user. The highest values are the recommended items. The item index is mapped back to the item ID, and the recommendations are output.

That's it at a very high level. We can discuss more as you look at the code.

Sean
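To make the data flow of the five steps concrete, here is a toy in-memory sketch in Python. It is not Mahout code and uses no Hadoop: the names `id_to_index` and `recommend` are hypothetical, the hash is only an illustrative way to fold a long into a non-negative int, and two details are assumptions of the sketch rather than statements about RecommenderJob (it skips the A == A diagonal when counting co-occurrence, and it drops items the user already has before picking the top values).

```python
from collections import defaultdict


def id_to_index(item_id):
    # Illustrative hash (an assumption, not necessarily Mahout's):
    # fold a 64-bit item ID into a non-negative 31-bit index.
    return (item_id ^ (item_id >> 32)) & 0x7FFFFFFF


def recommend(prefs, how_many=2):
    """prefs: iterable of (user_id, item_id, preference_value) triples."""
    prefs = list(prefs)

    # Step 1: remember the reverse mapping, index -> item ID.
    # (Colliding IDs would overwrite each other; the sketch ignores that.)
    index_to_id = {id_to_index(item): item for _, item, _ in prefs}

    # Step 2: one sparse vector per user, {item index: preference value}.
    user_vectors = defaultdict(dict)
    for user, item, value in prefs:
        user_vectors[user][id_to_index(item)] = value

    # Step 3: co-occurrence counts. cooccurrence[a][b] is the number of
    # users whose vector contains both a and b (diagonal skipped here).
    cooccurrence = defaultdict(lambda: defaultdict(int))
    for vector in user_vectors.values():
        for a in vector:
            for b in vector:
                if a != b:
                    cooccurrence[a][b] += 1

    # Step 4: for each (user, item) preference, scale that item's
    # co-occurrence column by the preference value: a partial vector.
    partials = defaultdict(list)
    for user, vector in user_vectors.items():
        for item_index, value in vector.items():
            column = cooccurrence[item_index]
            partials[user].append(
                {i: value * count for i, count in column.items()}
            )

    # Step 5: sum the partial vectors per user, keep the highest values,
    # and map indices back to item IDs.
    results = {}
    for user, vectors in partials.items():
        totals = defaultdict(float)
        for partial in vectors:
            for i, score in partial.items():
                totals[i] += score
        known = user_vectors[user]  # assumption: do not re-recommend these
        ranked = sorted(
            ((i, s) for i, s in totals.items() if i not in known),
            key=lambda pair: (-pair[1], pair[0]),
        )
        results[user] = [(index_to_id[i], s) for i, s in ranked[:how_many]]
    return results
```

Calling `recommend` with a list of (user, item, value) triples returns a dict from user ID to a list of (item ID, score) pairs, highest score first. In the real job each of these loops is a separate MapReduce pass and the vectors live in files between passes; the sketch only shows what is computed, not how it is distributed.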