I have two files: FileA (with 600K records) and FileB (with 2 million records).
FileA has a key that is the same for all of its records:

123 724101722493
123 781676672721

FileB has the same key as FileA:

123 5026328101569
123 5026328001562

Using the Hadoop join package, I can create an output file of tuples, the cross product of FileA and FileB:

123 [724101722493,5026328101569]
123 [724101722493,5026328001562]
123 [781676672721,5026328101569]
123 [781676672721,5026328001562]

How does CompositeInputFormat scale when joining 600K records with 2 million records? Does it run on a single node with a single map/reduce task?

Also, instead of writing the result to a file, how can I have the result split across different nodes, where I can compare the tuples (e.g. comparing 724101722493 with 5026328101569) using some heuristics?
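For reference, here is a rough sketch of the setup I have in mind, using the old org.apache.hadoop.mapred API; the input paths, the Text value types, and the compareRecords heuristic are placeholders rather than my actual code:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.join.CompositeInputFormat;
import org.apache.hadoop.mapred.join.TupleWritable;

public class JoinCompare {

  // Each map() call receives the join key and a TupleWritable holding one
  // record from each source, so the comparison can happen here instead of
  // writing the raw cross product to a file.
  public static class CompareMapper extends MapReduceBase
      implements Mapper<Text, TupleWritable, Text, Text> {
    public void map(Text key, TupleWritable tuple,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      Text a = (Text) tuple.get(0); // record from FileA
      Text b = (Text) tuple.get(1); // record from FileB
      if (compareRecords(a, b)) {   // placeholder heuristic
        out.collect(key, new Text(a + "," + b));
      }
    }

    // Stand-in for the real comparison heuristic.
    private boolean compareRecords(Text a, Text b) {
      return a.getLength() == b.getLength();
    }
  }

  public static void configure(JobConf job) {
    job.setInputFormat(CompositeInputFormat.class);
    // Inner join over two inputs that are sorted and identically partitioned.
    job.set("mapred.join.expr", CompositeInputFormat.compose(
        "inner", KeyValueTextInputFormat.class,
        new Path("/data/fileA"), new Path("/data/fileB")));
    job.setMapperClass(CompareMapper.class);
    job.setNumReduceTasks(0); // map-only: compare tuples in the mappers
  }
}

My understanding is that CompositeInputFormat requires both inputs to be sorted by the join key and partitioned the same way, which is part of why I am asking how it scales.

thanks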