Michael Will wrote:
We are not using jumbo packets (they are not on by default, are they?)

Run

        ifconfig | grep -i mtu

and see whether any interface reports an MTU above 1500.
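
A slightly more targeted version of that check, as a rough sketch: the helper below (the name jumbo_ifaces is made up for illustration) reads ifconfig output on stdin and prints only the interfaces whose MTU is above 1500. It assumes one of the two common net-tools layouts ("MTU:1500" or "mtu 1500"); other ifconfig variants may need a different pattern.

```shell
# Hypothetical helper: list interfaces with an MTU above 1500.
# Reads ifconfig-style output on stdin, prints "<interface> <mtu>".
jumbo_ifaces() {
    awk '
        # A line starting in column one begins a new interface block.
        /^[^ \t]/ { iface = $1; sub(/:$/, "", iface) }
        {
            line = tolower($0)
            # Accept both "MTU:9000" (older layout) and "mtu 9000" (newer layout).
            if (match(line, /mtu[: ][0-9]+/)) {
                mtu = substr(line, RSTART + 4, RLENGTH - 4) + 0
                if (mtu > 1500) print iface, mtu
            }
        }
    '
}
```

Typical use would be "ifconfig | jumbo_ifaces"; no output means nothing is above the standard 1500.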

and I have tried both udp and tcp mounts, same symptom. The nics tested were syskonnect as well as tg3 and it behaved identically.

We had Intel NICs on the server, which worked great, and Broadcom (tg3) NICs on the clients. When I moved to an Intel NIC the problem went away, but that is an expensive fix for a cluster if you have to buy the cards and install them. When we instead moved to the bcm5700 driver, forced TCP mounts, and used a "normal" MTU, the problems went away. I seem to remember trying a SysKonnect card with the sk98lin module on the clients, and it worked fine, though I pulled down the new version of the driver for some reason.
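
To confirm that the clients really ended up on TCP, something like the following could be used. It is only a sketch: the function name udp_nfs_mounts is hypothetical, and it assumes the mount table lists the transport as either "udp" or "proto=udp" in the options field, which varies between kernel versions.

```shell
# Hypothetical helper: report NFS mounts that are still using UDP.
# Reads /proc/mounts-style lines on stdin, prints "<device> <mountpoint>".
udp_nfs_mounts() {
    awk '
        # Field 3 is the filesystem type, field 4 the comma-separated options.
        $3 ~ /^nfs/ {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                # Exact comparison, so "mountproto=udp" is not counted.
                if (opts[i] == "udp" || opts[i] == "proto=udp") {
                    print $1, $2
                    break
                }
            }
        }
    '
}
```

On a client this would be run as "udp_nfs_mounts < /proc/mounts"; an empty result suggests every NFS mount is on TCP, under the assumption above.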

You might also be filtering RPC and portmap, or the "smart" switch could be doing some of that as well.

What does rpcinfo report? This is from a CentOS 4.3 client looking at a SuSE 10.1 server.

[EMAIL PROTECTED] ~]# rpcinfo -p dualcore
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  32768  status
    100021    1   udp  32768  nlockmgr
    100021    3   udp  32768  nlockmgr
    100021    4   udp  32768  nlockmgr
    100024    1   tcp  53220  status
    100021    1   tcp  53220  nlockmgr
    100021    3   tcp  53220  nlockmgr
    100021    4   tcp  53220  nlockmgr
[EMAIL PROTECTED] ~]#

I am not exporting any drives right now, so if I do a showmount ...
        
[EMAIL PROTECTED] ~]# showmount -e dualcore
mount clntudp_create: RPC: Program not registered
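
In the rpcinfo listing above neither nfs nor mountd is registered, which matches the "Program not registered" reply from showmount. A small sketch for checking this on other machines follows; missing_nfs_services is a made-up name, and the check relies only on the standard RPC program numbers 100003 (nfs) and 100005 (mountd).

```shell
# Hypothetical helper: report which NFS-related RPC programs are absent.
# Reads "rpcinfo -p" output on stdin, prints one line per missing program.
missing_nfs_services() {
    awk '
        # Column 1 is the RPC program number; the header line is harmless.
        { seen[$1] = 1 }
        END {
            if (!(100003 in seen)) print "nfs (100003)"
            if (!(100005 in seen)) print "mountd (100005)"
        }
    '
}
```

For example, "rpcinfo -p dualcore | missing_nfs_services" prints nothing when both programs are registered with the portmapper.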





-----Original Message-----
From: Joe Landman [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 15, 2006 9:28 AM
To: Michael Will
Cc: Brent Franks; Chris Samuel; [email protected]
Subject: Re: NFS Performance (was Re: [Beowulf] GPFS on Linux (x86))

Michael Will wrote:
I am puzzled by an sles9sp3 (2.6.9 kernel) nfs server that serves rhel3 (2.4.21 kernel) compute nodes. For some reason the mounts fail a lot of the time (with default as well as modified parameters). The symptom is "mount: rpc timeout". The server logs all authentication requests as successful. The switch is an oversubscribed hp 4108gl.

Yes.  This is what we ran into last year.  A SuSE box serving a Rocks
cluster (Rocks 4.0).  Basic idea: use tcp mounts and turn off jumbo
packets.  Also, we had major issues with the tg3 driver, and moved the
RHEL units to a BCM5700 driver.  After this, most of the problems went
away.




--

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615

