The new-API ChainMapper/ChainReducer came in with the 0.21 release and are present in 0.22 and 0.23, but not in the 0.20.x/1.x releases.
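
For reference, wiring them up on 0.21+ looks roughly like the sketch below (MyMapperA, MyMapperB and MyReducer are placeholder class names, not anything from this thread; input/output setup is omitted):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
  import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;

  public class ChainDriver {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = new Job(conf, "chained job");
      job.setJarByClass(ChainDriver.class);

      // First mapper reads the job input; every record it emits is handed
      // straight to the next mapper within the same map task (no
      // intermediate files between chained mappers).
      ChainMapper.addMapper(job, MyMapperA.class,
          LongWritable.class, Text.class, Text.class, Text.class,
          new Configuration(false));
      ChainMapper.addMapper(job, MyMapperB.class,
          Text.class, Text.class, Text.class, Text.class,
          new Configuration(false));

      // The reducer closes the [MAP+ / REDUCE MAP*] chain; further mappers
      // could be appended after it with ChainReducer.addMapper().
      ChainReducer.setReducer(job, MyReducer.class,
          Text.class, Text.class, Text.class, Text.class,
          new Configuration(false));

      // Input/output paths and formats would be configured here as usual.
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

Each chained mapper consumes the previous one's output record by record inside the same task, so nothing extra touches disk between them.
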
For the 0.20.x/1.x line, you can grab a patch from
https://issues.apache.org/jira/browse/MAPREDUCE-372, though. Or, if you need a
future Apache stable release cut to carry it, perhaps reopen
https://issues.apache.org/jira/browse/MAPREDUCE-3673 with a backport patch, as
https://issues.apache.org/jira/browse/MAPREDUCE-3607 didn't cover this one (it
was not demanded/provided). I'll be happy to review and commit it for you.

On Thu, Feb 9, 2012 at 7:49 PM, Wellington Chevreuil
<wellington.chevre...@gmail.com> wrote:
> Hi Harsh,
>
> I had noticed that this ChainMapper belongs to the old-version package
> (org.apache.hadoop.mapred instead of org.apache.hadoop.mapreduce).
> Although it takes generic Class types as its method arguments, is this
> class able to work with Mappers from the new-version package
> (org.apache.hadoop.mapreduce)?
>
> Thanks,
> Wellington.
>
> 2012/2/9 Harsh J <ha...@cloudera.com>:
>> Vamshi,
>>
>> What problem exactly are you trying to solve by attempting this? If you
>> are only interested in records being streamed from one mapper into
>> another, why can't it be chained together? Remember that map-only jobs
>> do not sort their data output -- so I still see no benefit in consuming
>> record by record from a whole new task when it could be done from the
>> very same one.
>>
>> Btw, ChainMapper is an API abstraction to run several mapper
>> implementations in sequence (a chain) for each input record,
>> transforming it all along (helpful if you have several utility mappers
>> and want to build composites). It does not touch disk.
>>
>> On Thu, Feb 9, 2012 at 12:15 PM, Vamshi Krishna <vamshi2...@gmail.com> wrote:
>>> Thank you Harsh for your reply. What ChainMapper does here is: only once
>>> the first mapper finishes does the second map start, using the file
>>> written by the first mapper. It is just like a chain. But what I want is
>>> pipelining, i.e. the second map has to start after the first map starts
>>> but before it finishes, and keep on reading from the same file that is
>>> being written by the first map. It is almost like a producer-consumer
>>> scenario, where the first map writes into the file and the second map
>>> keeps reading the same file, so that a pipelining effect is seen between
>>> the two maps.
>>> Hope you got what I am trying to tell.
>>>
>>> Please help.
>>>
>>>
>>> On Wed, Feb 8, 2012 at 12:47 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>
>>>> Vamshi,
>>>>
>>>> Is it not possible to express your M-M-R phase chain as a simple,
>>>> single M-R?
>>>>
>>>> Perhaps look at the ChainMapper class @
>>>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/ChainMapper.html
>>>>
>>>> On Wed, Feb 8, 2012 at 12:28 PM, Vamshi Krishna <vamshi2...@gmail.com> wrote:
>>>> > Hi all,
>>>> > I have an important question about MapReduce.
>>>> > I have 2 Hadoop MapReduce jobs. Job1 has only a mapper but no reducer.
>>>> > Job1 has started, and in its map() it is writing to a "file1" using
>>>> > context(Arg1, Arg2). If I wanted to start Job2 (immediately after
>>>> > Job1), which should take "file1" (output still being written by the
>>>> > above job's map phase) as input and do processing in its own
>>>> > map/reduce phases, and Job2 should keep on taking the newly written
>>>> > data in "file1" until Job1 finishes, what should I do?
>>>> >
>>>> > How can I do that? Please, can anybody help?
>>>> >
>>>> > --
>>>> > Regards
>>>> >
>>>> > Vamshi Krishna
>>>> >
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>> Customer Ops. Engineer
>>>> Cloudera | http://tiny.cloudera.com/about
>>>
>>>
>>> --
>>> Regards
>>>
>>> Vamshi Krishna
>>>
>>
>>
>> --
>> Harsh J
>> Customer Ops. Engineer
>> Cloudera | http://tiny.cloudera.com/about

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about