Thanks for the reply, Steve,

I totally agree that benchmarking is a good idea. The problem is that I only
have a small cluster and no such switch to play with.
I am curious about this, which is why I posted the question.
Could some experienced people share their knowledge with us?

Cheers

On Mon, Jun 6, 2011 at 7:28 PM, Steve Loughran <[email protected]> wrote:

> On 06/06/11 08:22, elton sky wrote:
>
>> hello everyone,
>>
>> As I don't have experience with large-scale clusters, I cannot figure out
>> why the inter-rack communication in a mapreduce job is "significantly"
>> slower than intra-rack.
>> I saw that Cisco Catalyst 4900 series switches can reach up to 320Gbps
>> forwarding capacity. Connected to 48 nodes with 1Gbps ethernet each, there
>> should not be much contention at the switch, should there?
>>
>
> I don't know enough about these switches; I do hear stories about buffering
> and the like, and I also hear that a lot of switches don't always expect all
> the ports to light up simultaneously.
>
> Outside hadoop, try setting up some simple bandwidth tests to measure
> inter-rack bandwidth: have every node on one rack try and talk to one on
> another at full rate.
>
> Set up every node talking to every other node at least once, to make sure
> there aren't odd problems between two nodes, which can happen if one of the
> NICs is playing up.
>
> Once you are happy that the basic bandwidth between servers is OK, then
> it's time to start worrying about adding hadoop to the mix.
>
> -steve
>
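
For anyone who wants to try the kind of bandwidth test Steve describes, here
is a minimal sketch in Python. It is only an illustration under stated
assumptions: the function names (run_sink, send_for, measure_loopback), the
64 KiB chunk size and the short test duration are made up for this example,
and a plain TCP stream is only a rough stand-in for real Hadoop traffic.
The loopback helper just checks that the script works on one machine.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary choice for this sketch)


def run_sink(server_sock, result):
    """Accept one connection and count bytes received until the peer closes."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # empty read means the sender closed the connection
                break
            total += len(data)
    result["bytes"] = total


def send_for(host, port, seconds):
    """Send zero-filled chunks to host:port for roughly `seconds`.

    Returns (bytes_sent, elapsed_seconds).
    """
    payload = bytes(CHUNK)
    sent = 0
    start = time.monotonic()
    with socket.create_connection((host, port)) as sock:
        while time.monotonic() - start < seconds:
            sock.sendall(payload)
            sent += len(payload)
    elapsed = time.monotonic() - start
    return sent, elapsed


def measure_loopback(seconds=0.5):
    """Self-test on a single machine: sink and sender both on 127.0.0.1."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    sink = threading.Thread(target=run_sink, args=(server, result))
    sink.start()
    sent, elapsed = send_for("127.0.0.1", port, seconds)
    sink.join()
    server.close()

    mbps = sent * 8 / elapsed / 1e6  # megabits per second
    return sent, result["bytes"], mbps


if __name__ == "__main__":
    sent, received, mbps = measure_loopback()
    print(f"sent={sent} received={received} rate={mbps:.1f} Mbit/s")
```

On a real cluster you would run the sink part on a node in one rack and call
send_for from a node in the other rack, with all pairs going at the same
time, then compare the rates against the same test inside a single rack. A
dedicated tool such as iperf is probably the better choice for serious
measurements.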
