Jeff, I agree that Cascading looks cool and should have a place in 
everyone’s toolbox. However, at some corporations it takes a while to get those 
kinds of changes in place, so they may have to hand-craft some Java code 
before moving (if they ever can) to a different technology.

I will get something up and going and post a link back for whoever is 
interested.

To answer Himanshu’s question, I am thinking something like this (with some 
code):

Hadoop M/R Patterns, and ones that match Pig Structures

1. COUNT: [Mapper] Emit a single key with the value 1. [Combiner] Same as 
reducer. [Reducer] count = count + next.value.  [Emit] A single result.
2. FREQ COUNT: [Mapper] Emit (item, 1).  [Combiner] Same as reducer. [Reducer] 
count = count + next.value.  [Emit] A list of (key, count) pairs.
3. UNIQUE: [Mapper] Emit (item, 1).  [Combiner] None.  [Reducer + Emit] Emit 
the list of keys with no values.
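To make the three patterns above concrete, here is a minimal sketch in plain Java (no Hadoop dependencies; the class and method names are illustrative, not real Hadoop API) showing the map-then-reduce logic each one boils down to:

```java
import java.util.*;
import java.util.stream.*;

public class MRPatterns {

    // 1. COUNT: every mapper emits the value 1 under a single key;
    //    the reducer (and combiner) just sums the values.
    static long count(List<String> records) {
        return records.stream().map(r -> 1L).reduce(0L, Long::sum);
    }

    // 2. FREQ COUNT: the mapper emits (item, 1); the shuffle groups by
    //    item; the reducer sums per key. TreeMap keeps keys sorted, as
    //    a reducer would see them.
    static Map<String, Long> freqCount(List<String> records) {
        return records.stream()
            .collect(Collectors.groupingBy(r -> r, TreeMap::new, Collectors.counting()));
    }

    // 3. UNIQUE: the mapper emits (item, 1); the reducer ignores the
    //    values and emits each key exactly once.
    static Set<String> unique(List<String> records) {
        return new TreeSet<>(records);
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "a", "c", "b", "a");
        System.out.println(count(data));      // 6
        System.out.println(freqCount(data));  // {a=3, b=2, c=1}
        System.out.println(unique(data));     // [a, b, c]
    }
}
```

In real Hadoop code each of these becomes a Mapper/Reducer pair, but the per-key logic is exactly what the one-liners above compute.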

I think adding a description of why each technique works would also help 
people who are learning.  I see questions from people who don't understand 
what happens to the data between the mappers and the reducers, or what data 
they will see when it reaches the reducer, etc.
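On that last point, the part people usually miss is the shuffle/sort between the phases: the framework sorts map output by key and groups all values for a key together, so a reducer is handed one (key, list-of-values) at a time. A hypothetical plain-Java sketch of that grouping (names are illustrative, not Hadoop API):

```java
import java.util.*;

public class ShuffleDemo {

    // The shuffle/sort step: group map-output values by key, with keys in
    // sorted order -- this is the (key, list-of-values) view each reducer gets.
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput)
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        return grouped;
    }

    public static void main(String[] args) {
        // Map output: (key, value) pairs from several mappers, in arbitrary order.
        List<Map.Entry<String, Integer>> mapOutput = Arrays.asList(
                Map.entry("b", 1), Map.entry("a", 1), Map.entry("a", 1), Map.entry("c", 1));
        System.out.println(shuffle(mapOutput)); // {a=[1, 1], b=[1], c=[1]}
    }
}
```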

Cheers
James.
