Use MultipleInputs with a different mapper for each input. Mapper 1 (for file1) should be IdentityMapper; mapper 2 (for file2) should output key/value pairs where the value is a pseudo marker value (the same for all keys), which marks that the value is null/empty. In the reducer, output only the key/value pairs of keys whose values do not include the marker value.

In your example, suppose we use -1 as the marker value. Then the output of mapper 2 will be:
4, -1
2, -1

and the reducer will get:

2, {1,3,5,-1}
3, {1,2}
4, {7,9,-1}
6, {3}

Then the reducer will output:

3, 1
3, 2
6, 3
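
To make the data flow concrete, here is a minimal plain-Java sketch that simulates the two mappers, the shuffle, and the reducer on the example data. It does not use any Hadoop classes; the class name MarkerFilter and its method names are made up for illustration, and it assumes that -1 never occurs as a real value in file1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the marker approach (hypothetical names, no Hadoop).
class MarkerFilter {
    // Assumption: the marker never occurs as a real value in file1.
    static final String MARKER = "-1";

    // Mapper 1 (identity): pass a "key,value" line of file1 through unchanged.
    static String[] mapFile1(String line) {
        String[] kv = line.split(",", 2);
        return new String[] {kv[0].trim(), kv[1].trim()};
    }

    // Mapper 2: pair each key of file2 with the marker value.
    static String[] mapFile2(String line) {
        return new String[] {line.trim(), MARKER};
    }

    // Shuffle: group values by key; TreeMap keeps the keys sorted.
    static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : pairs) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    // Reducer: emit the pairs of a key only if its values contain no marker.
    static List<String> reduce(String key, List<String> values) {
        List<String> out = new ArrayList<>();
        if (values.contains(MARKER)) {
            return out; // key appears in file2, so drop all of its records
        }
        for (String v : values) {
            out.add(key + "," + v);
        }
        return out;
    }

    // Run both mappers, then shuffle and reduce.
    static List<String> run(List<String> file1, List<String> file2) {
        List<String[]> pairs = new ArrayList<>();
        for (String line : file1) pairs.add(mapFile1(line));
        for (String line : file2) pairs.add(mapFile2(line));
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle(pairs).entrySet()) {
            output.addAll(reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList(
            "2,1", "2,3", "2,5", "3,1", "3,2", "4,7", "4,9", "6,3");
        List<String> file2 = Arrays.asList("4", "2");
        for (String line : run(file1, file2)) {
            System.out.println(line);
        }
    }
}
```

Note that the marker can arrive at any position among the values of a key, so the reducer has to see all values for a key before it emits anything. In a real Hadoop reducer the values are typically handed over as a single-pass iterator, so you would probably need to buffer them (as the list does here) before deciding whether to write them out.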



Nir Zohar wrote:
Hi,

I would like your help with the below question.

I have 2 files: file1 (key, value) and file2 (key only), and I need to exclude
from file1 all records whose keys appear in file2.

1. The output format is key-value, not only keys.

2. The key is not a primary key; hence it's not possible to do a join at the
end.

Can you assist?

Thanks,

Nir.

Example:

file1:
2,1
2,3
2,5
3,1
3,2
4,7
4,9
6,3

file2:
4
2

Output:
3,1
3,2
6,3


