Oh yeah, it would help if I put the url: 

http://github.com/seigel/MRPatterns

James

On 2010-07-21, at 2:55 PM, James Seigel wrote:

> Here is a skeleton project I stuffed up on github (feel free to offer other 
> suggestions/alternatives).  There is a wiki, a place to commit code, a place 
> to fork around, etc..
> 
> Over the next couple of days I’ll try to put up some samples for 
> people to poke around with.  Feel free to attack the wiki, contribute code, 
> etc...
> 
> If anyone can derive some cool pseudo code for writing map reduce type 
> algorithms, that’d be great.
> 
> Cheers
> James.
> 
> 
> On 2010-07-21, at 10:51 AM, James Seigel wrote:
> 
>> Jeff, I agree that Cascading looks cool and might/should have a place in 
>> everyone’s tool box; however, at some corps it takes a while to get those 
>> kinds of changes in place, and therefore they might have to hand-craft some 
>> Java code before moving (if they ever can) to a different technology.
>> 
>> I will get something up and going and post a link back for whoever is 
>> interested.
>> 
>> To answer Himanshu’s question, I am thinking something like this (with some 
>> code):
>> 
>> Hadoop M/R Patterns, and ones that match Pig Structures
>> 
>> 1. COUNT: [Mapper] Emit one fixed key with the value 1. [Combiner] Same as 
>> reducer. [Reducer] count = count + next.value.  [Emit] A single result.
>> 2. FREQ COUNT: [Mapper] Emit (item, 1).  [Combiner] Same as reducer. [Reducer] 
>> count = count + next.value.  [Emit] A list of (key, count) pairs.
>> 3. UNIQUE: [Mapper] Emit (item, 1).  [Combiner] None.  [Reducer + Emit] Spit out 
>> the list of keys with no value.
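A minimal plain-Python sketch of these three patterns (this simulates Hadoop's shuffle with an in-memory dict; the function names are illustrative, not Hadoop API):

```python
from collections import defaultdict

def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# 1. COUNT: every record maps to the same key; the reducer sums the 1s.
def count(records):
    mapped = [("count", 1) for _ in records]
    return {k: sum(vs) for k, vs in shuffle(mapped).items()}["count"]

# 2. FREQ COUNT: each item is its own key; the reducer sums per key.
def freq_count(records):
    mapped = [(item, 1) for item in records]
    return {k: sum(vs) for k, vs in shuffle(mapped).items()}

# 3. UNIQUE: the reducer ignores the values and emits just the keys.
def unique(records):
    mapped = [(item, 1) for item in records]
    return sorted(shuffle(mapped).keys())

data = ["a", "b", "a", "c", "b", "a"]
print(count(data))       # 6
print(freq_count(data))  # {'a': 3, 'b': 2, 'c': 1}
print(unique(data))      # ['a', 'b', 'c']
```

Note that in patterns 1 and 2 the reducer's sum is associative, which is exactly why the combiner can be "same as reducer": partial sums computed map-side combine into the same total.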
>> 
>> I think adding a description of why each technique works would be helpful for 
>> people learning as well.  I see questions from people who don’t understand 
>> what happens to the data between mappers and reducers, or what data they 
>> will see when it gets to the reducer...etc...
>> 
>> Cheers
>> James.
>> 
> 
