pig-user  

Outer joins

Iván de Prado
Wed, 11 Jun 2008 07:17:08 -0700

Lets suppose 1.txt is:

1       1       1
2       1       1
3       1       1
4       1       1
5       1       1

And 2.txt is:

4       2       2
5       2       2
6       2       2
7       2       2

The script: 

A = LOAD 'ivan/1.txt' USING PigStorage();
B = LOAD 'ivan/2.txt' USING PigStorage();
C = COGROUP A by $0, B by $0;
D = FOREACH C GENERATE flatten(A), flatten(B);

dump C:

(1, {(1, 1, 1)}, {})
(2, {(2, 1, 1)}, {})
(3, {(3, 1, 1)}, {})
(4, {(4, 1, 1)}, {(4, 2, 2)})
(5, {(5, 1, 1)}, {(5, 2, 2)})
(6, {}, {(6, 2, 2)})
(7, {}, {(7, 2, 2)})
(8, {}, {(8, 2, 2)})

dump D;

(4, 1, 1, 4, 2, 2)
(5, 1, 1, 5, 2, 2)

But this is not the result that I expected. I would like to obtain this
result:

(1, 1, 1, 1, '', '', '')
(2, 2, 1, 1, '', '', '')
(2, 3, 1, 1, '', '', '')
(4, 1, 1, 4, 2, 2)
(5, 1, 1, 5, 2, 2)
('','','',6, 2, 2)
('','','',7, 2, 2)
('','','',8, 2, 2)

This is the expected result if you do an outer join in SQL. How can I
modify the script to get this result?  (apart of doing 3 FILTERs over C
and then a UNION)

Thanks and regards,
Iván de Prado
www.ivanprado.es