Oh yeah, it would help if I put the URL: http://github.com/seigel/MRPatterns
James

On 2010-07-21, at 2:55 PM, James Seigel wrote:

> Here is a skeleton project I stuffed up on github (feel free to offer other
> suggestions/alternatives). There is a wiki, a place to commit code, a place
> to fork around, etc.
>
> Over the next couple of days I’ll try to put up some samples for
> people to poke around with. Feel free to attack the wiki, contribute code,
> etc.
>
> If anyone can derive some cool pseudo code to write map reduce type
> algorithms, that’d be great.
>
> Cheers
> James.
>
>
> On 2010-07-21, at 10:51 AM, James Seigel wrote:
>
>> Jeff, I agree that Cascading looks cool and might/should have a place in
>> everyone’s tool box; however, at some corps it takes a while to get those
>> kinds of changes in place, and therefore they might have to hand craft some
>> Java code before moving (if they ever can) to a different technology.
>>
>> I will get something up and going and post a link back for whoever is
>> interested.
>>
>> To answer Himanshu’s question, I am thinking something like this (with some
>> code):
>>
>> Hadoop M/R Patterns, and ones that match Pig Structures
>>
>> 1. COUNT: [Mapper] Spit out one key and the value of 1. [Combiner] Same as
>>    reducer. [Reducer] count = count + next.value. [Emit] Single result.
>> 2. FREQ COUNT: [Mapper] Item, 1. [Combiner] Same as reducer. [Reducer]
>>    count = count + next.value. [Emit] List of key, count.
>> 3. UNIQUE: [Mapper] Item, One. [Combiner] None. [Reducer + Emit] Spit out
>>    list of keys and no value.
>>
>> I think adding a description of why the technique works would be helpful for
>> people learning as well. I see some questions from people not understanding
>> what happens to the data between mappers and reducers, or what data they
>> will see when it gets to the reducer, etc.
>>
>> Cheers
>> James.
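For anyone who wants to poke at the three patterns quoted above without a cluster, here is a rough sketch in plain Java. It is not Hadoop code and it is not what is in the github project: the class name MRPatterns and the helpers mapAndShuffle, sum, count, freqCount and unique are made up for illustration, and the whole job runs in memory in one process. The idea is only to show what happens between the mapper and the reducer: every (key, 1) pair the mapper emits is grouped by key, and each reduce call sees one key with all of its values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical in-memory illustration; names are assumptions, not Hadoop APIs.
class MRPatterns {

    // Map + shuffle: the mapper emits (keyOf(item), 1) for every input item,
    // and the values that share a key are grouped together before reducing.
    // A TreeMap keeps the keys sorted, similar to how reducers see sorted keys.
    static Map<String, List<Integer>> mapAndShuffle(List<String> items,
                                                    Function<String, String> keyOf) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String item : items) {
            grouped.computeIfAbsent(keyOf.apply(item), k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Reducer shared by COUNT and FREQ COUNT: count = count + next.value.
    // Addition is associative, which is why the combiner can be the same function.
    static int sum(List<Integer> values) {
        int count = 0;
        for (int value : values) {
            count += value;
        }
        return count;
    }

    // 1. COUNT: every item maps to one constant key, so a single reduce call
    //    sees all the 1s and emits a single result.
    static int count(List<String> items) {
        Map<String, List<Integer>> grouped = mapAndShuffle(items, item -> "count");
        return grouped.isEmpty() ? 0 : sum(grouped.get("count"));
    }

    // 2. FREQ COUNT: the item itself is the key, so there is one reduce call
    //    per distinct item and the output is a list of (key, count).
    static Map<String, Integer> freqCount(List<String> items) {
        Map<String, Integer> result = new TreeMap<>();
        mapAndShuffle(items, item -> item)
                .forEach((key, values) -> result.put(key, sum(values)));
        return result;
    }

    // 3. UNIQUE: same mapper as FREQ COUNT, but the reducer ignores the values
    //    and emits only the keys.
    static List<String> unique(List<String> items) {
        return new ArrayList<>(mapAndShuffle(items, item -> item).keySet());
    }

    public static void main(String[] args) {
        List<String> items = List.of("b", "a", "b", "c", "a", "b");
        System.out.println(count(items));     // 6
        System.out.println(freqCount(items)); // {a=2, b=3, c=1}
        System.out.println(unique(items));    // [a, b, c]
    }
}
```

In a real job the grouping is done by the framework across machines rather than by a TreeMap, but the data each reducer sees has the same shape: one key and the list of values emitted for it.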