I have two files: FileA (with 600K records) and FileB (with 2 million records).
FileA has a key that is the same for all of its records:

123 724101722493
123 781676672721

FileB has the same key as FileA:

123 5026328101569
123 5026328001562

Using the Hadoop join package, I can create an output file of tuples, the cross product of FileA and FileB:

123 [724101722493,5026328101569]
123 [724101722493,5026328001562]
123 [781676672721,5026328101569]
123 [781676672721,5026328001562]

How does CompositeInputFormat scale when joining 600K records with 2 million records? Does it run on a single node with a single map/reduce task?

Also, instead of writing the result to a file, how can I have the result split across different nodes, where I can compare the tuples (e.g. comparing 724101722493 with 5026328101569) using some heuristics?
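For reference, here is a rough sketch of the setup I have in mind, using the old org.apache.hadoop.mapred API; the input paths, the Text value types, and the compareRecords heuristic are placeholders rather than my actual code:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.join.CompositeInputFormat;
import org.apache.hadoop.mapred.join.TupleWritable;

public class JoinCompare {

  // Each map() call receives the join key and a TupleWritable holding one
  // record from each source, so the comparison can happen here instead of
  // writing the raw cross product to a file.
  public static class CompareMapper extends MapReduceBase
      implements Mapper<Text, TupleWritable, Text, Text> {
    public void map(Text key, TupleWritable tuple,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      Text a = (Text) tuple.get(0); // record from FileA
      Text b = (Text) tuple.get(1); // record from FileB
      if (compareRecords(a, b)) {   // placeholder heuristic
        out.collect(key, new Text(a + "," + b));
      }
    }

    // Stand-in for the real comparison heuristic.
    private boolean compareRecords(Text a, Text b) {
      return a.getLength() == b.getLength();
    }
  }

  public static void configure(JobConf job) {
    job.setInputFormat(CompositeInputFormat.class);
    // Inner join over two inputs that are sorted and identically partitioned.
    job.set("mapred.join.expr", CompositeInputFormat.compose(
        "inner", KeyValueTextInputFormat.class,
        new Path("/data/fileA"), new Path("/data/fileB")));
    job.setMapperClass(CompareMapper.class);
    job.setNumReduceTasks(0); // map-only: compare tuples in the mappers
  }
}

My understanding is that CompositeInputFormat requires both inputs to be sorted by the join key and partitioned the same way, which is part of why I am asking how it scales.

thanks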