'hamake' on github looks like a handy tool as well; I haven't used it. It does the old Unix 'make' timestamp-dependency trick on the input and output file sets to decide which jobs to run in sequence, and possibly in parallel.
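The make-style rule described above can be sketched in a few lines of Python. This is a hypothetical local illustration, not hamake's code. The names `job_is_stale` and `plan` are made up, the sketch uses local file modification times, and it assumes the job list is already in dependency order. A job runs if one of its outputs is missing, if an output is older than its newest input, or if an upstream job in the same plan will regenerate one of its inputs.

```python
import os


def job_is_stale(inputs, outputs):
    """Make-style rule (hypothetical helper): rerun if an output is missing
    or older than the newest input."""
    if not outputs or not all(os.path.exists(p) for p in outputs):
        return True
    # default=-inf: a job with no inputs is up to date once its outputs exist.
    newest_input = max((os.path.getmtime(p) for p in inputs), default=float("-inf"))
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return oldest_output < newest_input


def plan(jobs):
    """jobs: (name, inputs, outputs) tuples, assumed already in dependency order.
    Returns the names of the jobs that need to run, in that order."""
    to_run = []
    regenerated = set()  # outputs of jobs already scheduled in this plan
    for name, inputs, outputs in jobs:
        # Check upstream first so we never stat an input that doesn't exist yet.
        if any(p in regenerated for p in inputs) or job_is_stale(inputs, outputs):
            to_run.append(name)
            regenerated.update(outputs)
    return to_run
```

This only produces a sequential order. Jobs that share no inputs or outputs with each other could be dispatched in parallel, which is the "possibly in parallel" part; that scheduling is left out here.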
Lance

On Wed, Sep 1, 2010 at 12:27 PM, James Seigel <ja...@tynt.com> wrote:
> Sounds good! Please give some examples :)
>
> I just got back from some holidays and will start posting some more stuff
> shortly.
>
> Cheers
> James.
>
>
> On 2010-07-21, at 7:22 PM, Jeff Zhang wrote:
>
>> Cool, James. I am very interested in contributing to this.
>> I think group by, join and order by can be added to the examples.
>>
>>
>> On Thu, Jul 22, 2010 at 4:59 AM, James Seigel <ja...@tynt.com> wrote:
>>
>>> Oh yeah, it would help if I put the url:
>>>
>>> http://github.com/seigel/MRPatterns
>>>
>>> James
>>>
>>> On 2010-07-21, at 2:55 PM, James Seigel wrote:
>>>
>>>> Here is a skeleton project I stuffed up on github (feel free to offer
>>>> other suggestions/alternatives). There is a wiki, a place to commit code, a
>>>> place to fork around, etc.
>>>>
>>>> Over the next couple of days I’ll try and put up some samples for
>>>> people to poke around with. Feel free to attack the wiki, contribute code,
>>>> etc.
>>>>
>>>> If anyone can come up with some cool pseudocode for map reduce type
>>>> algorithms, that’d be great.
>>>>
>>>> Cheers
>>>> James.
>>>>
>>>>
>>>> On 2010-07-21, at 10:51 AM, James Seigel wrote:
>>>>
>>>>> Jeff, I agree that Cascading looks cool and might/should have a place in
>>>>> everyone’s tool box. However, at some corps it takes a while to get those
>>>>> kinds of changes in place, and therefore they might have to hand-craft some
>>>>> Java code before moving (if they ever can) to a different technology.
>>>>>
>>>>> I will get something up and going and post a link back for whoever is
>>>>> interested.
>>>>>
>>>>> To answer Himanshu’s question, I am thinking something like this (with
>>>>> some code):
>>>>>
>>>>> Hadoop M/R Patterns, and ones that match Pig Structures
>>>>>
>>>>> 1. COUNT: [Mapper] Spit out one key and the value of 1. [Combiner] Same
>>>>>    as reducer. [Reducer] count = count + next.value. [Emit] Single result.
>>>>> 2. FREQ COUNT: [Mapper] Item, 1. [Combiner] Same as reducer. [Reducer]
>>>>>    count = count + next.value. [Emit] list of Key, count.
>>>>> 3. UNIQUE: [Mapper] Item, One. [Combiner] None. [Reducer + Emit] spit
>>>>>    out list of keys and no value.
>>>>>
>>>>> I think adding a description of why the technique works would be helpful
>>>>> for people learning as well. I see some questions from people not
>>>>> understanding what happens to the data between mappers and reducers, or what
>>>>> data they will see when it gets to the reducer, etc.
>>>>>
>>>>> Cheers
>>>>> James.
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
>
--
Lance Norskog
goks...@gmail.com
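The three patterns James lists (COUNT, FREQ COUNT, UNIQUE) can be tried out without a cluster. The following is a minimal in-memory sketch in Python. It assumes a toy `run_job` driver (hypothetical, not the Hadoop API) that splits the input across pretend mappers, applies an optional per-mapper combiner, and then groups by key before reducing. That grouping step is what addresses the "what data will the reducer see" question: each reduce call gets one key and every value emitted for that key.

```python
from itertools import groupby
from operator import itemgetter


def group(pairs):
    """Sort (key, value) pairs by key and yield (key, [values])."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, kvs in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in kvs]


def run_job(records, mapper, reducer, combiner=None, num_splits=2):
    """Toy, single-process imitation of one MapReduce job (hypothetical helper,
    not the Hadoop API)."""
    # Pretend the input is divided among num_splits mappers.
    splits = [records[i::num_splits] for i in range(num_splits)]
    shuffled = []
    for split in splits:
        mapped = [kv for record in split for kv in mapper(record)]
        if combiner is not None:
            # A combiner only sees the output of its own mapper.
            mapped = [kv for key, values in group(mapped)
                      for kv in combiner(key, values)]
        shuffled.extend(mapped)
    # Shuffle/sort: each reduce call gets one key and all values for that key.
    return [kv for key, values in group(shuffled)
            for kv in reducer(key, values)]


def sum_values(key, values):
    # count = count + next.value, over every value for this key
    yield key, sum(values)


def count_mapper(record):
    # 1. COUNT: one fixed key, value 1, so a single result comes out.
    return [("count", 1)]


def freq_mapper(record):
    # 2. FREQ COUNT and 3. UNIQUE: the item itself is the key.
    return [(record, 1)]


def unique_reducer(key, values):
    # 3. UNIQUE: emit each distinct key once, with no value.
    yield key, None


words = ["pig", "hive", "pig", "hadoop", "pig", "hive"]
print(run_job(words, count_mapper, sum_values, combiner=sum_values))
# -> [('count', 6)]
print(run_job(words, freq_mapper, sum_values, combiner=sum_values))
# -> [('hadoop', 1), ('hive', 2), ('pig', 3)]
print(run_job(words, freq_mapper, unique_reducer))
# -> [('hadoop', None), ('hive', None), ('pig', None)]
```

Reusing the reducer as the combiner ("Same as reducer") works for COUNT and FREQ COUNT because addition is associative and commutative: summing partial counts per mapper and then summing those partials gives the same total. A reducer that computes an average would not be safe to reuse as a combiner this way.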