The new-API ChainMapper/ChainReducer came in with the 0.21 release and are present in 0.22 and 0.23, but not in the 0.20.x/1.x releases.
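
For reference, wiring them up on 0.21+ looks roughly like the sketch below (MyMapperA, MyMapperB and MyReducer are placeholder class names, not anything from this thread; input/output setup is omitted):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
  import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;

  public class ChainDriver {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = new Job(conf, "chained job");
      job.setJarByClass(ChainDriver.class);

      // First mapper reads the job input; every record it emits is handed
      // straight to the next mapper within the same map task (no
      // intermediate files between chained mappers).
      ChainMapper.addMapper(job, MyMapperA.class,
          LongWritable.class, Text.class, Text.class, Text.class,
          new Configuration(false));
      ChainMapper.addMapper(job, MyMapperB.class,
          Text.class, Text.class, Text.class, Text.class,
          new Configuration(false));

      // The reducer closes the [MAP+ / REDUCE MAP*] chain; further mappers
      // could be appended after it with ChainReducer.addMapper().
      ChainReducer.setReducer(job, MyReducer.class,
          Text.class, Text.class, Text.class, Text.class,
          new Configuration(false));

      // Input/output paths and formats would be configured here as usual.
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

Each chained mapper consumes the previous one's output record by record inside the same task, so nothing extra touches disk between them.
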
For the 0.20.x/1.x line, you can grab a patch from
https://issues.apache.org/jira/browse/MAPREDUCE-372, though. Or, if you need a
future Apache stable release cut to carry it, perhaps reopen
https://issues.apache.org/jira/browse/MAPREDUCE-3673 with a backport patch, as
https://issues.apache.org/jira/browse/MAPREDUCE-3607 didn't cover this one (it
was not demanded/provided). I'll be happy to review and commit it for you.

On Thu, Feb 9, 2012 at 7:49 PM, Wellington Chevreuil
<wellington.chevre...@gmail.com> wrote:
> Hi Harsh,
>
> I had noticed that this ChainMapper belongs to the old-version package
> (org.apache.hadoop.mapred instead of org.apache.hadoop.mapreduce).
> Although it takes generic Class types as its method arguments, is this
> class able to work with Mappers from the new-version package
> (org.apache.hadoop.mapreduce)?
>
> Thanks,
> Wellington.
>
> 2012/2/9 Harsh J <ha...@cloudera.com>:
>> Vamshi,
>>
>> What problem exactly are you trying to solve by attempting this? If you
>> are only interested in records being streamed from one mapper into
>> another, why can't it be chained together? Remember that map-only jobs
>> do not sort their data output -- so I still see no benefit in consuming
>> record by record from a whole new task when it could be done from the
>> very same one.
>>
>> Btw, ChainMapper is an API abstraction to run several mapper
>> implementations in sequence (a chain) for each input record,
>> transforming it all along (helpful if you have several utility mappers
>> and want to build composites). It does not touch disk.
>>
>> On Thu, Feb 9, 2012 at 12:15 PM, Vamshi Krishna <vamshi2...@gmail.com> wrote:
>>> Thank you Harsh for your reply. What ChainMapper does here is: only once
>>> the first mapper finishes does the second map start, using the file
>>> written by the first mapper. It is just like a chain. But what I want is
>>> pipelining, i.e. the second map has to start after the first map starts
>>> but before it finishes, and keep on reading from the same file that is
>>> being written by the first map. It is almost like a producer-consumer
>>> scenario, where the first map writes into the file and the second map
>>> keeps reading the same file, so that a pipelining effect is seen between
>>> the two maps.
>>> Hope you got what I am trying to tell.
>>>
>>> Please help.
>>>
>>>
>>> On Wed, Feb 8, 2012 at 12:47 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>
>>>> Vamshi,
>>>>
>>>> Is it not possible to express your M-M-R phase chain as a simple,
>>>> single M-R?
>>>>
>>>> Perhaps look at the ChainMapper class @
>>>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/ChainMapper.html
>>>>
>>>> On Wed, Feb 8, 2012 at 12:28 PM, Vamshi Krishna <vamshi2...@gmail.com> wrote:
>>>> > Hi all,
>>>> > I have an important question about MapReduce.
>>>> > I have 2 Hadoop MapReduce jobs. Job1 has only a mapper but no reducer.
>>>> > Job1 has started, and in its map() it is writing to a "file1" using
>>>> > context(Arg1, Arg2). If I wanted to start Job2 (immediately after
>>>> > Job1), which should take "file1" (output still being written by the
>>>> > above job's map phase) as input and do processing in its own
>>>> > map/reduce phases, and Job2 should keep on taking the newly written
>>>> > data in "file1" until Job1 finishes, what should I do?
>>>> >
>>>> > How can I do that? Please, can anybody help?
>>>> >
>>>> > --
>>>> > Regards
>>>> >
>>>> > Vamshi Krishna
>>>> >
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>> Customer Ops. Engineer
>>>> Cloudera | http://tiny.cloudera.com/about
>>>
>>>
>>> --
>>> Regards
>>>
>>> Vamshi Krishna
>>>
>>
>>
>> --
>> Harsh J
>> Customer Ops. Engineer
>> Cloudera | http://tiny.cloudera.com/about

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about