Any comments?
________________________________
From: Shantian Purkad <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Tuesday, June 7, 2011 3:53 PM
Subject: Linear scalability question

Hi,

I have a question about the linear scalability of Hadoop.

We have a situation where we have to do reduce-side joins on two big tables (10+ TB). This causes a lot of data to be transferred over the network, and the network is becoming a bottleneck.

In a few years these tables will hold 100+ TB of data, and the reduce-side joins will demand even more data transfer over the network. Since network bandwidth is limited and cannot be increased by adding more nodes, Hadoop will no longer be linearly scalable in this case.

Is my understanding correct? Am I missing anything here? How do people address these kinds of bottlenecks?

Thanks and Regards,
Shantian
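
To put rough numbers on the concern above, here is a back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions, not a measurement: join keys are hashed uniformly across reducers, there is no compression or skew, both tables are shuffled in full, and every node can send at its full NIC rate at the same time (that is, the switch fabric is not oversubscribed). The function name `shuffle_estimate` and the cluster sizes and link speeds are hypothetical.

```python
def shuffle_estimate(total_tb, nodes, nic_gbps):
    """Estimate shuffle traffic for a reduce-side join.

    Returns (aggregate TB crossing the network, hours per node at NIC line rate).
    Assumes uniform hash partitioning, no compression, no skew.
    """
    # With uniform hashing, a record lands on a remote reducer
    # with probability (nodes - 1) / nodes.
    remote_fraction = (nodes - 1) / nodes
    network_tb = total_tb * remote_fraction

    # Each node sends (and receives) an equal share of that traffic.
    per_node_tb = network_tb / nodes

    # 1 TB = 8e12 bits; nic_gbps is in units of 1e9 bits per second.
    seconds = per_node_tb * 8e12 / (nic_gbps * 1e9)
    return network_tb, seconds / 3600


# Hypothetical scenarios: two 10 TB tables today, two 100 TB tables later.
for total_tb, nodes in [(20, 100), (200, 100), (200, 1000)]:
    network_tb, hours = shuffle_estimate(total_tb, nodes, nic_gbps=1)
    print(f"{total_tb} TB on {nodes} nodes: "
          f"{network_tb:.1f} TB over the network, {hours:.2f} h per node")
```

Under these assumptions the aggregate traffic grows with the data no matter how many nodes there are, while the per-node transfer time stays flat only if the cluster grows with the data and the fabric between nodes can actually carry all links at full rate. The second condition is the one the question is about.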
