Michael Will wrote:
We are not using jumbo packets (they are not on by default, are they?)
do an
ifconfig | grep -i mtu
and see if anything is above 1500
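If you want to run that check across a lot of nodes, here is a minimal Python sketch of the same idea. It assumes you have captured the ifconfig output as text; the function name, the sample text, and the choice to skip the loopback interface (which normally reports a large MTU) are mine, not part of any tool.

```python
import re

# Matches both "MTU:1500" (older net-tools) and "mtu 1500" (newer output).
MTU_RE = re.compile(r"mtu[:\s]+(\d+)", re.IGNORECASE)


def interfaces_above_mtu(ifconfig_text, limit=1500, ignore=("lo",)):
    """Return {interface: mtu} for interfaces whose MTU exceeds limit."""
    found = {}
    current = None
    for line in ifconfig_text.splitlines():
        # An interface block starts on a line with no leading whitespace.
        if line and not line[0].isspace():
            current = line.split()[0].rstrip(":")
        match = MTU_RE.search(line)
        if match and current is not None and current not in ignore:
            mtu = int(match.group(1))
            if mtu > limit:
                found[current] = mtu
    return found


# Hypothetical sample output, for illustration only.
SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
lo        Link encap:Local Loopback
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
"""

print(interfaces_above_mtu(SAMPLE))
```

An empty result means nothing other than loopback is above the standard 1500-byte MTU.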
and I have tried both udp and tcp mounts, same symptom. The nics tested
were syskonnect as well as tg3, and they behaved identically.
We had Intel nics that worked great (on the server), and Broadcom (tg3)
on the client. When I moved the client to an Intel nic the problem went
away (kind of expensive for a cluster if you have to buy the cards and
insert them).
When we moved to the bcm5700 and forced tcp mounts and "normal" MTU the
problems went away. I seem to remember trying a syskonnect card using
the sk98lin module on the clients, and it worked fine, but I pulled down
the new version of the driver for some reason.
You might also be filtering RPC and portmap, or the "smart" switch could
be doing some of that as well.
What does rpcinfo report? This is from a CentOS 4.3 client querying a
SuSE 10.1 server.
[EMAIL PROTECTED] ~]# rpcinfo -p dualcore
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  32768  status
    100021    1   udp  32768  nlockmgr
    100021    3   udp  32768  nlockmgr
    100021    4   udp  32768  nlockmgr
    100024    1   tcp  53220  status
    100021    1   tcp  53220  nlockmgr
    100021    3   tcp  53220  nlockmgr
    100021    4   tcp  53220  nlockmgr
[EMAIL PROTECTED] ~]#
I am not exporting any drives right now, so if I do a showmount ...
[EMAIL PROTECTED] ~]# showmount -e dualcore
mount clntudp_create: RPC: Program not registered
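The "Program not registered" error above lines up with the rpcinfo listing: neither mountd (RPC program 100005) nor nfs (100003) appears in it. As a rough sketch, assuming you save the `rpcinfo -p` output as text, something like the following Python could flag the missing services. The function names are made up for illustration.

```python
# Standard RPC program numbers for the services an NFS client needs.
REQUIRED = {100003: "nfs", 100005: "mountd"}


def parse_rpcinfo(text):
    """Parse `rpcinfo -p` output into (program, version, proto, port, name) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, shell prompts, and blank lines.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        program, version, proto, port = fields[:4]
        name = fields[4] if len(fields) > 4 else ""
        entries.append((int(program), int(version), proto, int(port), name))
    return entries


def missing_nfs_services(text):
    """Return the names of required NFS services that are not registered."""
    registered = {entry[0] for entry in parse_rpcinfo(text)}
    return sorted(name for prog, name in REQUIRED.items() if prog not in registered)


# The listing from this message (server not exporting anything).
RPCINFO_OUTPUT = """\
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  32768  status
    100021    1   udp  32768  nlockmgr
    100021    3   udp  32768  nlockmgr
    100021    4   udp  32768  nlockmgr
    100024    1   tcp  53220  status
    100021    1   tcp  53220  nlockmgr
    100021    3   tcp  53220  nlockmgr
    100021    4   tcp  53220  nlockmgr
"""

print(missing_nfs_services(RPCINFO_OUTPUT))  # ['mountd', 'nfs']
```

If the server really is exporting and mountd or nfs still shows up as missing from the client side, that would point toward filtering somewhere in between.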
-----Original Message-----
From: Joe Landman [mailto:[EMAIL PROTECTED]
Sent: Friday, September 15, 2006 9:28 AM
To: Michael Will
Cc: Brent Franks; Chris Samuel; [email protected]
Subject: Re: NFS Performance (was Re: [Beowulf] GPFS on Linux (x86))
Michael Will wrote:
I am puzzled by an sles9sp3 (2.6.9 kernel) nfs server that serves rhel3
(2.4.21 kernel) compute nodes. For some reason the mounts fail a lot of
the time (with default as well as modified parameters). The symptom is
"mount: rpc timeout". The server logs all authentication requests as
successful. The switch is an oversubscribed HP 4108gl.
Yes. This is what we ran into last year. A SuSE box serving a Rocks
cluster (Rocks 4.0). Basic idea: use tcp mounts and turn off jumbo
packets. Also, we had major issues with the tg3 driver, and moved the
RHEL units to a BCM5700 driver. After this, most of the problems went
away.
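To check which clients are actually using tcp, one option is to look at the mount options the kernel reports. This is only a sketch in Python, assuming text in the /proc/mounts layout (device, mount point, type, options); the sample lines and the function name are hypothetical. It flags NFS mounts that do not explicitly list `tcp` or `proto=tcp`, so a mount relying on a default transport would be flagged too.

```python
def nfs_mounts_without_tcp(mounts_text):
    """Return mount points of NFS mounts that do not explicitly list tcp."""
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = fields[3].split(",")
        if "tcp" not in options and "proto=tcp" not in options:
            flagged.append(fields[1])
    return flagged


# Hypothetical sample in /proc/mounts format, for illustration only.
SAMPLE_MOUNTS = """\
server:/home /home nfs rw,udp,rsize=8192,wsize=8192,hard,intr 0 0
server:/opt /opt nfs rw,tcp,rsize=32768,wsize=32768,hard,intr 0 0
/dev/sda1 / ext3 rw 0 0
"""

print(nfs_mounts_without_tcp(SAMPLE_MOUNTS))  # ['/home']
```

On a live node you would pass in the contents of /proc/mounts instead of the sample string.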
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615