I am evaluating Hadoop for a problem that takes a Cartesian product of the input from one file with 600K records (FileA) against another set of files (FileB1, FileB2, FileB3) totaling 2 million lines.
Each line from FileA gets compared with every line from FileB1, FileB2, etc. FileB1, FileB2, etc. are in a different input directory. So there are two input directories:

1. input1 directory with a single file of 600K records - FileA
2. input2 directory segmented into different files with 2 million records - FileB1, FileB2, etc.

How can I have a map that reads a line from FileA in directory input1 and compares that line with each line from input2? What is the best way forward? I have seen plenty of examples that map each record from a single input file and reduce into an output.

thanks

--
View this message in context: http://www.nabble.com/multiple-file-input-tp24095358p24095358.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
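One common pattern for this kind of asymmetric cross product (though not the only one) is a map-side join: ship the smaller FileA to every mapper (e.g. via Hadoop's DistributedCache), load it into memory once in the mapper's setup, and then have each map() call compare its single FileB input line against all cached FileA lines. The sketch below is plain Java without the Hadoop API, purely to illustrate the per-record logic that would sit inside such a mapper; the `cross` helper and the tab-joined output format are my own illustrative choices, not part of any Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CrossProductSketch {

    // Models the work one map() call would do: the input record is a
    // single FileB line, and it is paired with every FileA line that
    // the mapper loaded into memory beforehand (e.g. from the
    // DistributedCache in its setup method).
    static List<String> cross(List<String> fileALines, String fileBLine) {
        List<String> output = new ArrayList<>();
        for (String aLine : fileALines) {
            // Replace this pairing with the real comparison logic;
            // emit only the pairs that actually match, if applicable.
            output.add(aLine + "\t" + fileBLine);
        }
        return output;
    }

    public static void main(String[] args) {
        // Tiny stand-in for FileA's 600K records.
        List<String> fileALines = Arrays.asList("a1", "a2");
        // One FileB line, as a single mapper input record.
        for (String pair : cross(fileALines, "b1")) {
            System.out.println(pair);
        }
    }
}
```

This keeps the full FileA in each mapper's memory (600K records is usually feasible), and the 2 million FileB lines stream through as ordinary map inputs, so the job needs no reduce phase at all for the pure cross product.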