You only need two InputFormats: one for the SequenceFile (SequenceFileInputFormat, or one of its subclasses such as SequenceFileAsBinaryInputFormat or SequenceFileAsTextInputFormat, or your own extension) and one for the text file (TextInputFormat, perhaps). If both InputFormats hand your Mapper the same key and value types, you need only one Mapper implementation doing what you want it to do; otherwise, register a separate Mapper per source, as long as both emit the same output key and value types. Look at MultipleInputs.addInputPath() in the API, which adds each input path to your job together with its InputFormat (and, optionally, its Mapper). [API link: http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/lib/MultipleInputs.html]
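A minimal sketch of such a job, assuming the old org.apache.hadoop.mapred API documented at the 0.20 link above. It takes the one-Mapper route: SequenceFileAsTextInputFormat and KeyValueTextInputFormat both hand the Mapper Text keys and Text values, so the same Mapper class is registered for both paths. The class names (TwoSourceJob, CommonMapper, CountReducer), the argument order, the tab-separated "key<TAB>value" layout of the text file, and the byte conversion and counting logic are hypothetical placeholders for your own code.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileAsTextInputFormat;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class TwoSourceJob {

  // Hypothetical Mapper shared by both inputs: both InputFormats below
  // deliver Text keys and Text values, so one implementation suffices.
  public static class CommonMapper extends MapReduceBase
      implements Mapper<Text, Text, Text, BytesWritable> {
    public void map(Text key, Text value,
                    OutputCollector<Text, BytesWritable> output, Reporter reporter)
        throws IOException {
      // Placeholder transformation: emit the value's UTF-8 bytes under the same key.
      byte[] bytes = value.toString().getBytes("UTF-8");
      output.collect(key, new BytesWritable(bytes));
    }
  }

  // Hypothetical Reducer: receives, per key, the values from both sources.
  public static class CountReducer extends MapReduceBase
      implements Reducer<Text, BytesWritable, Text, Text> {
    public void reduce(Text key, Iterator<BytesWritable> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      int count = 0;
      while (values.hasNext()) {
        values.next(); // replace with real processing of each BytesWritable
        count++;
      }
      output.collect(key, new Text(Integer.toString(count)));
    }
  }

  // Usage (assumed): TwoSourceJob <seqfile-dir> <text-dir> <output-dir>
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(TwoSourceJob.class);
    conf.setJobName("two-source-example");

    // Each path gets its own InputFormat; both use the same Mapper class.
    MultipleInputs.addInputPath(conf, new Path(args[0]),
        SequenceFileAsTextInputFormat.class, CommonMapper.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]),
        KeyValueTextInputFormat.class, CommonMapper.class);

    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(BytesWritable.class);
    conf.setReducerClass(CountReducer.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));

    JobClient.runJob(conf);
  }
}
```

Note that SequenceFileAsTextInputFormat converts keys and values with toString(), which may not be what you want for binary values. In that case, register the SequenceFile path with SequenceFileInputFormat and a second Mapper class that emits the same Text/BytesWritable pair, via the same four-argument addInputPath() call.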
The Mapper can simply do its operation, collect the result to its default OutputCollector, and be done with it. The Reducer will then receive each key with the values from both sources grouped together. It is as simple as that.

On Thu, Dec 30, 2010 at 10:04 PM, Yin Lou <yin.lou...@gmail.com> wrote:
> Hi,
>
> I have two data sources of different format, one sequence file and the other
> text. They share the same key, so I'd like to have the following,
>
> map1: <k, v1> -> <k, v2>
> map2: <k, v1'> -> <k, v2'>
> Both v2 and v2' are of the same type, say, BytesWritable.
>
> I wonder if anyone could give me an example of MultipleInputs so that I can
> process these two data sources in the reducer.
>
> Thanks,
> Yin

--
Harsh J
www.harshj.com