Any comments?


________________________________
From: Shantian Purkad <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Tuesday, June 7, 2011 3:53 PM
Subject: Linear scalability question

Hi,

I have a question on the linear scalability of Hadoop.

We have to do reduce-side joins on two big tables (10+ TB). This causes a lot
of data to be transferred over the network, and the network is becoming a
bottleneck.

In a few years these tables will hold 100+ TB of data, and the reduce-side
joins will demand a lot of data transfer over the network. Since network
bandwidth is limited and cannot be increased by adding more nodes, Hadoop will
no longer be linearly scalable in this case.
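
To make the concern concrete, here is a rough back-of-the-envelope sketch. It
assumes the worst case described above: the whole shuffle has to cross one
shared link whose bandwidth does not grow with the node count, and the join
moves roughly the full size of both tables. The 10 Gbit/s figure and the
function name are hypothetical, chosen only for illustration, not measurements
from a real cluster.

```python
# Back-of-the-envelope estimate; every number here is an illustrative assumption.

def shuffle_hours(data_tb, core_gbit_per_s):
    """Hours needed to move data_tb terabytes over a shared link of the given speed."""
    bits = data_tb * 1e12 * 8                 # terabytes -> bits
    seconds = bits / (core_gbit_per_s * 1e9)  # bits / (bits per second)
    return seconds / 3600

CORE_GBIT_PER_S = 10  # assumed fixed core bandwidth, independent of node count

today = shuffle_hours(20, CORE_GBIT_PER_S)    # two tables of about 10 TB each
future = shuffle_hours(200, CORE_GBIT_PER_S)  # two tables of about 100 TB each

print(f"today: {today:.1f} h, future: {future:.1f} h")
```

Under these assumptions the transfer time grows in proportion to the data
volume no matter how many nodes are added, which is the scalability worry.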

Is my understanding correct? Am I missing anything here? How do people address
these kinds of bottlenecks?

Thanks and Regards,
Shantian
