Jeff, I agree that Cascading looks cool and might (or even should) have a place in everyone's toolbox. However, at some corporations it takes a while to get those kinds of changes in place, so people may have to hand-craft some Java code before moving (if they ever can) to a different technology.
I will get something up and running and post a link back for whoever is interested. To answer Himanshu's question, I am thinking of something like this (with some code): Hadoop M/R patterns, including ones that match Pig structures.

1. COUNT: [Mapper] emit a single key with the value 1. [Combiner] same as the reducer. [Reducer] count = count + next.value. [Emit] a single result.
2. FREQ COUNT: [Mapper] emit (item, 1). [Combiner] same as the reducer. [Reducer] count = count + next.value. [Emit] a list of (key, count) pairs.
3. UNIQUE: [Mapper] emit (item, 1). [Combiner] none. [Reducer + Emit] emit the list of keys with no values.

I think adding a description of why each technique works would be helpful for people learning as well. I see questions from people who don't understand what happens to the data between the mappers and the reducers, or what data they will see once it reaches the reducer, etc.

Cheers,
James
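To make the shuffle step concrete, here is a minimal plain-Java sketch of pattern 2 (FREQ COUNT). It has no Hadoop dependencies and the class and method names are my own invention; it only simulates what the framework does between the mapper and the reducer, which is exactly the part people seem to find confusing.

```java
import java.util.*;

// Hypothetical plain-Java simulation of the FREQ COUNT M/R pattern.
// Not real Hadoop code: it just shows the map -> shuffle -> reduce flow.
public class FreqCountSketch {

    // [Mapper] for each input item, emit the pair (item, 1)
    static List<Map.Entry<String, Integer>> map(List<String> items) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String item : items) {
            out.add(new AbstractMap.SimpleEntry<>(item, 1));
        }
        return out;
    }

    // Shuffle/sort: the framework groups all emitted values by key
    // before any reducer runs -- this is "what happens to the data
    // between mappers and reducers".
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>())
                   .add(p.getValue());
        }
        return grouped;
    }

    // [Reducer] for each key: count = count + next.value; emit (key, count)
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int count = 0;
            for (int v : e.getValue()) {
                count += v;
            }
            result.put(e.getKey(), count);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "a", "c", "b", "a");
        Map<String, Integer> counts = reduce(shuffle(map(data)));
        System.out.println(counts); // {a=3, b=2, c=1}
    }
}
```

The same skeleton covers the other two patterns: COUNT is the degenerate case where the mapper emits one fixed key, and UNIQUE is the case where the reducer ignores the values and emits only the keys.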