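Yes, you can do it. In Pig you would write something like:

    -- PigStorage defaults to tab-delimited fields, so name the comma explicitly
    A = LOAD 'a.txt' USING PigStorage(',') AS (id, name, age, ...);
    B = LOAD 'b.txt' USING PigStorage(',') AS (id, address, ...);
    -- inner join: ids that appear in only one file are dropped
    C = JOIN A BY id, B BY id;
    -- C carries both A::id and B::id; add a FOREACH ... GENERATE to drop one
    STORE C INTO 'c.txt' USING PigStorage(',');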
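
To make the join semantics concrete, here is a minimal sketch in plain Python (not Pig or Hadoop API code) of the reduce-side join idea: tag each record with the file it came from, group by id, and emit a combined row for every id found on both sides. The function names `map_phase` and `reduce_side_join` are hypothetical, and the sample rows are the ones from the question with the elided "..." fields left out.

```python
from collections import defaultdict


def map_phase(lines, tag):
    """Emit (id, (tag, remaining_fields)) for each comma-separated record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, *rest = line.split(",")
        yield key, (tag, rest)


def reduce_side_join(a_lines, b_lines):
    """Inner-join two record sets on their first field (the id)."""
    # "Shuffle": group the tagged values by key, as the framework would
    groups = defaultdict(lambda: {"A": [], "B": []})
    for key, (tag, rest) in map_phase(a_lines, "A"):
        groups[key][tag].append(rest)
    for key, (tag, rest) in map_phase(b_lines, "B"):
        groups[key][tag].append(rest)

    # "Reduce": for each key, pair every A row with every B row.
    # Keys missing from either side produce no output (inner join).
    joined = []
    for key in sorted(groups):
        for a_rest in groups[key]["A"]:
            for b_rest in groups[key]["B"]:
                joined.append(",".join([key] + a_rest + b_rest))
    return joined


# Sample data from the question (assumption: "..." fields omitted)
a_txt = ["id1,name1,age1", "id2,name2,age2", "id3,name3,age3", "id4,name4,age4"]
b_txt = ["id1,address1", "id2,address2", "id3,address3"]

for row in reduce_side_join(a_txt, b_txt):
    print(row)
```

Here id4 has no match in b.txt, so it does not appear in the output. In a real job the grouping step is done by the framework's shuffle across machines rather than by an in-memory dict, which is what lets this scale past a single node.
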
Hive can do it similarly too. Or you could write your own join directly in map/reduce, or use the data_join jar.

--Bobby Evans

On 5/29/12 4:08 AM, "lzg" <lzg_...@163.com> wrote:

Hi,

I wonder whether Hadoop can effectively solve the following problem:

==========================================
input files: a.txt, b.txt
result: c.txt

a.txt:
id1,name1,age1,...
id2,name2,age2,...
id3,name3,age3,...
id4,name4,age4,...

b.txt:
id1,address1,...
id2,address2,...
id3,address3,...

c.txt:
id1,name1,age1,address1,...
id2,name2,age2,address2,...
========================================

I know that a database can do this well, but I want to handle it with Hadoop if possible. Can Hadoop meet the requirement?

Any suggestions would help. Thank you very much!

Best Regards,
Gump