Iván de Prado
Wed, 11 Jun 2008 07:17:08 -0700
Lets suppose 1.txt is:
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
And 2.txt is:
4 2 2
5 2 2
6 2 2
7 2 2
The script:
A = LOAD 'ivan/1.txt' USING PigStorage();
B = LOAD 'ivan/2.txt' USING PigStorage();
C = COGROUP A by $0, B by $0;
D = FOREACH C GENERATE flatten(A), flatten(B);
dump C:
(1, {(1, 1, 1)}, {})
(2, {(2, 1, 1)}, {})
(3, {(3, 1, 1)}, {})
(4, {(4, 1, 1)}, {(4, 2, 2)})
(5, {(5, 1, 1)}, {(5, 2, 2)})
(6, {}, {(6, 2, 2)})
(7, {}, {(7, 2, 2)})
(8, {}, {(8, 2, 2)})
dump D;
(4, 1, 1, 4, 2, 2)
(5, 1, 1, 5, 2, 2)
But this is not the result that I expected. I would like to obtain this
result:
(1, 1, 1, 1, '', '', '')
(2, 2, 1, 1, '', '', '')
(2, 3, 1, 1, '', '', '')
(4, 1, 1, 4, 2, 2)
(5, 1, 1, 5, 2, 2)
('','','',6, 2, 2)
('','','',7, 2, 2)
('','','',8, 2, 2)
This is the expected result if you do an outer join in SQL. How can I
modify the script to get this result? (apart of doing 3 FILTERs over C
and then a UNION)
Thanks and regards,
Iván de Prado
www.ivanprado.es