Hi,

I understand that, in the normal case, a given input file is split across 'n' mapper instances.

The scenario I have is:

1. Two files, which are not identical in number of columns (but have similar data in a few columns), need to be processed, and after the computation a single output file has to be generated.

Note: CV = computed value

File1, belonging to one dataset, has data for:
Date, counter1, counter2, CV1, CV2

File2, belonging to another dataset, has data for:
Date, counter1, counter2, CV3, CV4, CV5

The computation to be carried out on these two files is:
CV6 = (CV1 * CV5) / 100

The final emitted output file should have data in the sequence:
Date, counter1, counter2, CV6

The idea is to have two mappers (two different mapper classes, not instances of one mapper), one running on each file, and a single reducer that emits the final result file.

Thanks,
Sahana

On Wed, Sep 7, 2011 at 2:40 PM, Harsh J <ha...@cloudera.com> wrote:
> Sahana,
>
> Yes. But, isn't that how it is normally? What makes you question this
> capability?
>
> On Wed, Sep 7, 2011 at 2:37 PM, Sahana Bhat <sana.b...@gmail.com> wrote:
> > Hi,
> > Is it possible to have multiple mappers where each mapper is
> > operating on a different input file and whose result (which is a
> > key value pair from different mappers) is processed by a single reducer?
> > Regards,
> > Sahana
>
> --
> Harsh J
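
To make the data flow in the scenario above concrete, here is a minimal sketch in plain Python (not Hadoop code). It rests on several assumptions that are not stated in the message: the files are comma-separated, rows from the two files are matched on (Date, counter1, counter2), and each key occurs at most once per file. The function names and the "F1"/"F2" tags are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def map_file1(line):
    # File1 columns: Date, counter1, counter2, CV1, CV2
    date, c1, c2, cv1, _cv2 = line.split(",")
    # Tag the value so the reducer knows which file it came from.
    return (date, c1, c2), ("F1", float(cv1))

def map_file2(line):
    # File2 columns: Date, counter1, counter2, CV3, CV4, CV5
    date, c1, c2, _cv3, _cv4, cv5 = line.split(",")
    return (date, c1, c2), ("F2", float(cv5))

def reduce_join(key, tagged_values):
    # Assumption: at most one value per tag for a given key.
    values = dict(tagged_values)
    if "F1" not in values or "F2" not in values:
        return None  # key present in only one file: nothing to compute
    cv6 = values["F1"] * values["F2"] / 100
    return (*key, cv6)  # Date, counter1, counter2, CV6

def run(file1_lines, file2_lines):
    # Stand-in for the shuffle phase: group tagged values by key.
    groups = defaultdict(list)
    for line in file1_lines:
        key, value = map_file1(line)
        groups[key].append(value)
    for line in file2_lines:
        key, value = map_file2(line)
        groups[key].append(value)
    # Single reducer: visit every key once and emit the joined rows.
    rows = []
    for key in sorted(groups):
        row = reduce_join(key, groups[key])
        if row is not None:
            rows.append(row)
    return rows
```

In an actual Hadoop job, the two map functions would correspond to two separate Mapper classes, each bound to its own input path (for example through `MultipleInputs.addInputPath`), and the number of reduce tasks would be set to one so that a single output file is produced.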