I am evaluating Hadoop for a problem that requires a Cartesian product of the
input from one file with 600K records (FileA) against another set of files
(FileB1, FileB2, FileB3) with 2 million lines in total.

Each line from FileA gets compared with every line from FileB1, FileB2, etc.
FileB1, FileB2, etc. live in a different input directory.

So....

Two input directories 

1. input1 directory with a single file of 600K records - FileA
2. input2 directory segmented into different files with 2 million records in
total - FileB1, FileB2, etc.

How can I have a map that reads a line from FileA in directory input1 and
compares it with each line from input2?
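One common approach to this kind of small-file-versus-large-file Cartesian product (not mentioned in the post itself, so treat it as an assumption) is a replicated, map-side join: ship the smaller FileA to every mapper (e.g. via Hadoop's DistributedCache), cache it in memory, and stream the FileB splits through the mappers as the normal job input. A minimal Python sketch of the per-mapper logic, with the function name and inputs invented for illustration:

```python
# Hypothetical sketch of a replicated (map-side) Cartesian product.
# Assumes FileA (600K records) fits in each mapper's memory; FileB
# lines arrive one at a time, as they would in map() calls.
def cartesian_pairs(file_a_lines, file_b_lines):
    """Yield every (a, b) pair - what each mapper would emit."""
    a_cached = list(file_a_lines)      # FileA, cached once per mapper
    for b in file_b_lines:             # one map() call per FileB line
        for a in a_cached:
            yield (a, b)               # compare / emit the pair here

# Example: 2 FileA lines x 3 FileB lines -> 6 pairs
pairs = list(cartesian_pairs(["a1", "a2"], ["b1", "b2", "b3"]))
```

With this layout no reduce phase is strictly needed for the comparison itself; whether 600K records fit in mapper memory (and whether the full 600K x 2M pair space is affordable at all) is the real constraint to check first.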

What is the best way forward? I have seen plenty of examples that map each
record from a single input file and reduce into an output, but not this case.

thanks
-- 
View this message in context: 
http://www.nabble.com/multiple-file-input-tp24095358p24095358.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.