I have Hadoop version 0.18, which doesn't support MultipleInputs, so I used the job configuration property "map.input.file" to distinguish between the two inputs. The rest of the solution worked great for me and solved the problem.
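A minimal sketch of that per-file dispatch, written as plain Java so it runs without Hadoop. It is an illustration under assumptions, not the exact job code: the class name `InputAwareMapper`, the paths `/data/file1` and `/data/file2`, and the rule "the path ends in `file2`" are all hypothetical. In a real 0.18 job, the path would be read once with `job.get("map.input.file")` inside `MapReduceBase.configure(JobConf)`, and pairs would go to an `OutputCollector` instead of a returned list. The marker "-1" follows the example in the quoted reply.

```java
import java.util.ArrayList;
import java.util.List;

class InputAwareMapper {
    // Marker meaning "this key appears in file2"; "-1" follows the quoted reply.
    static final String MARKER = "-1";

    // Maps one text line to (key, value) pairs, choosing behavior from the
    // input file path. In Hadoop 0.18 this path would come from the
    // "map.input.file" property; here it is passed in directly.
    static List<String[]> map(String inputFile, String line) {
        List<String[]> out = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return out; // skip blank lines
        }
        if (inputFile.endsWith("file2")) {
            // Hypothetical naming rule: file2 holds keys only, so emit (key, MARKER).
            out.add(new String[] {trimmed, MARKER});
        } else {
            // file1 holds "key,value": pass the pair through unchanged (identity map).
            String[] parts = trimmed.split(",", 2);
            if (parts.length == 2) {
                out.add(new String[] {parts[0].trim(), parts[1].trim()});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical paths, for illustration only.
        for (String[] kv : map("/data/file1", "2,1")) {
            System.out.println(kv[0] + ", " + kv[1]); // prints "2, 1"
        }
        for (String[] kv : map("/data/file2", "4")) {
            System.out.println(kv[0] + ", " + kv[1]); // prints "4, -1"
        }
    }
}
```

With one mapper branching on the file name, both inputs can feed the same job even without MultipleInputs; the reducer stays as described in the reply.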
Thanks very much.

-----Original Message-----
From: Enis Soztutar [mailto:enis....@gmail.com]
Sent: Wednesday, March 18, 2009 3:07 PM
To: core-user@hadoop.apache.org
Subject: Re: merging files

Use MultipleInputs with two different mappers for the two inputs. map1 should be an IdentityMapper; mapper2 should output key/value pairs whose value is a pseudo marker value (the same for all keys), which marks that the value is null/empty. In the reducer, output the key/value pairs only for keys whose values do not include the marker value.

In your example, suppose we use -1 as the marker value. Then the output of mapper2 will be:

4, -1
2, -1

and the reducer will get:

2, {1,3,5,-1}
3, {1,2}
4, {7,9,-1}
6, {3}

The reducer will then output:

3, 1
3, 2
6, 3

Nir Zohar wrote:
> Hi,
>
> I would like your help with the question below.
>
> I have 2 files: file1 (key, value) and file2 (key only), and I need to
> exclude from file1 all records whose keys appear in file2.
>
> 1. The output format is key-value, not only keys.
>
> 2. The key is not a primary key (keys repeat in file1), so a simple join
> at the end is not possible.
>
> Can you assist?
>
> Thanks,
> Nir.
>
> Example:
>
> file1:
> 2,1
> 2,3
> 2,5
> 3,1
> 3,2
> 4,7
> 4,9
> 6,3
>
> file2:
> 4
> 2
>
> Output:
> 3,1
> 3,2
> 6,3
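The marker approach from the reply above can be checked without a cluster. Below is a plain-Java simulation (not Hadoop API code) of the two mappers, the shuffle, and the reducer, run on the example data from the original question. The class name `MarkerAntiJoin` is hypothetical, and the sketch assumes "-1" never occurs as a real value in file1; if it could, tagging each value with its source would be safer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MarkerAntiJoin {
    // Marker for "key present in file2"; assumes no real file1 value equals it.
    static final String MARKER = "-1";

    // Simulates map -> shuffle -> reduce for the marker approach.
    static List<String> run(List<String> file1, List<String> file2) {
        // Shuffle: group all mapped values by key. TreeMap sorts keys as
        // strings, which matches numeric order only for these one-digit keys.
        Map<String, List<String>> grouped = new TreeMap<>();

        // Mapper 1 (identity): emit every (key, value) record from file1.
        for (String line : file1) {
            String[] kv = line.split(",", 2);
            grouped.computeIfAbsent(kv[0].trim(), k -> new ArrayList<>()).add(kv[1].trim());
        }
        // Mapper 2: emit (key, MARKER) for every key listed in file2.
        for (String key : file2) {
            grouped.computeIfAbsent(key.trim(), k -> new ArrayList<>()).add(MARKER);
        }

        // Reducer: skip a key entirely if its values contain the marker,
        // otherwise output every (key, value) pair. Hadoop does not guarantee
        // value order within a key; here it simply follows input order.
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            if (e.getValue().contains(MARKER)) {
                continue;
            }
            for (String v : e.getValue()) {
                output.add(e.getKey() + "," + v);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("2,1", "2,3", "2,5", "3,1", "3,2", "4,7", "4,9", "6,3");
        List<String> file2 = Arrays.asList("4", "2");
        // Prints 3,1 then 3,2 then 6,3 (one per line)
        for (String record : run(file1, file2)) {
            System.out.println(record);
        }
    }
}
```

Because the grouping by key happens in the shuffle, duplicate keys in file1 are not a problem: each key's full value list reaches one reducer call, which is why this works even though the key is not a primary key.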