Here is a skeleton project I stuck up on GitHub (feel free to offer other suggestions/alternatives). There is a wiki, a place to commit code, a place to fork around, etc.
Over the next couple of days I'll try to put up some samples for people to poke around with. Feel free to attack the wiki, contribute code, etc. If anyone can derive some cool pseudo code for writing map/reduce-type algorithms, that'd be great.

Cheers,
James

On 2010-07-21, at 10:51 AM, James Seigel wrote:

> Jeff,
>
> I agree that Cascading looks cool and might/should have a place in
> everyone's tool box; however, at some corps it takes a while to get those
> kinds of changes in place, and therefore they might have to hand-craft some
> Java code before moving (if they ever can) to a different technology.
>
> I will get something up and going and post a link back for whoever is
> interested.
>
> To answer Himanshu's question, I am thinking something like this (with some
> code):
>
> Hadoop M/R patterns, and ones that match Pig structures:
>
> 1. COUNT: [Mapper] Emit a single key with the value 1. [Combiner] Same as
> reducer. [Reducer] count = count + next.value. [Emit] Single result.
> 2. FREQ COUNT: [Mapper] Emit (item, 1). [Combiner] Same as reducer.
> [Reducer] count = count + next.value. [Emit] List of (key, count) pairs.
> 3. UNIQUE: [Mapper] Emit (item, 1). [Combiner] None. [Reducer + Emit] Emit
> the list of keys with no value.
>
> I think adding a description of why each technique works would be helpful
> for people learning as well. I see questions from people who don't understand
> what happens to the data between the mappers and the reducers, or what data
> they will see when it gets to the reducer, etc.
>
> Cheers,
> James.
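As a starting point for the pseudo code asked for above, here is a minimal single-process sketch of the three quoted patterns. This is plain Python standing in for pseudo code, not Hadoop: the `shuffle` helper is an illustrative stand-in for what the framework does between the mappers and the reducers (grouping all values by key), which is exactly the step the quoted email says people find confusing. All names here are made up for illustration.

```python
from collections import defaultdict

def shuffle(mapped):
    """Group (key, value) pairs by key -- a stand-in for the framework's
    shuffle/sort phase between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

# 1. COUNT: every record maps to the same key with value 1; the reducer
#    (and combiner) just sums, yielding a single result.
def count(records):
    mapped = [("count", 1) for _ in records]
    return {k: sum(vs) for k, vs in shuffle(mapped).items()}["count"]

# 2. FREQ COUNT: each item is its own key; the reducer sums per key,
#    yielding a list of (key, count).
def freq_count(records):
    mapped = [(item, 1) for item in records]
    return {k: sum(vs) for k, vs in shuffle(mapped).items()}

# 3. UNIQUE: the shuffle itself collapses duplicate keys; the reducer
#    simply emits the keys and ignores the values.
def unique(records):
    mapped = [(item, 1) for item in records]
    return sorted(shuffle(mapped).keys())

data = ["a", "b", "a", "c", "b", "a"]
# count(data)      -> 6
# freq_count(data) -> {"a": 3, "b": 2, "c": 1}
# unique(data)     -> ["a", "b", "c"]
```

The sketch also shows why the combiner for COUNT and FREQ COUNT can be "same as reducer": summing partial sums gives the same total, so the sum can be applied once per mapper and again at the reducer.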