On Tue, 2007-05-22 at 08:34 -0700, Scott Weitzenkamp (sweitzen) wrote:
> What server model and CPU model do you have?

cat /proc/cpuinfo
processor       : 7
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8218
stepping        : 2
cpu MHz         : 2600.202
cache size      : 1024 KB
physical id     : 3
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm cr8_legacy
bogomips        : 5200.54
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

> This could be https://bugs.openfabrics.org//show_bug.cgi?id=229. Try
> setting RENICE_IB_MAD=yes in /etc/infiniband/openibd.conf, then reboot
> or run /etc/init.d/openibd restart, and see if that helps.

AHA, this is interesting. I'll do it tomorrow!

> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>
> ______________________________________________________________
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of SEGERS Koen
> Sent: Tuesday, May 22, 2007 6:44 AM
> To: Ami Perlmutter; Shirley Ma
> Cc: [EMAIL PROTECTED]; [email protected]
> Subject: RE: [ofa-general] GPFS node loses IB-connection
>
> I did the iperf tests on servers with OFED-1.2-RC3.
>
> It also gives the same result. Actually, it is even worse: when the
> interface dies, it gets in PORT_INIT state, but it doesn't go to
> PORT_ACTIVE again. At least not within 10 minutes.
>
> I'll give you the test script I ran:
>
> ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5001 &
> ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5002 &
> ssh 10.224.158.114 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 5003 &
> ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6001 &
> ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6002 &
> ssh 10.224.158.115 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 6003 &
> ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7001 &
> ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7002 &
> ssh 10.224.158.116 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 7003 &
> ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8001 &
> ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8002 &
> ssh 10.224.158.117 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -s -p 8003 &
>
> sleep 5
>
> for i in 14 15 16 17
> do
>     ssh 10.224.158.111 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))001 -t 120 -d -P 5 &
>     ssh 10.224.158.112 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))002 -t 120 -d -P 5 &
>     ssh 10.224.158.113 LD_PRELOAD=libsdp.so SIMPLE_LIBSDP=OK iperf -c 192.168.2.$i -p $((i-9))003 -t 120 -d -P 5 &
> done
>
> Any ideas?
>
> Regards,
>
> Koen
>
> ______________________________________________________________
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of SEGERS Koen
> Sent: Tuesday, May 22, 2007 10:55
> To: Ami Perlmutter; Shirley Ma
> Cc: [EMAIL PROTECTED]; [email protected]
> Subject: RE: [ofa-general] GPFS node loses IB-connection
>
> GPFS keeps its connection constantly open.
>
> We did some more tests with iperf:
>
> If we don't run bidirectional tests, all connections keep running
> smoothly. If we add bidirectional tests, it becomes unstable,
> especially when this is done on multiple nodes. Is this normal?
>
> The failed iperf tests give the same error in the switch log:
>
> May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM OUT_OF_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:05:ad:00:00:08:a8:71
> May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM DELETE_MC_GROUP trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:08:a8:71
> May 22 08:14:59 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by discovering removed ports
> May 22 08:15:00 topspin-120sc ib_sm.x[621]: %IB-6-INFO: Program switch port state to down, node=00:05:ad:00:00:0b:a2:cc, port= 6, due to non-responding CA
> May 22 08:15:00 topspin-120sc port_mgr.x[497]: %PORT-6-INFO: port down - port=1/6, type=ib4xTXP
> May 22 08:15:00 topspin-120sc diag_mgr.x[508]: %DIAG-6-INFO: in portTblFindEntry() - IfIndex=70(1/6)
> May 22 08:15:00 topspin-120sc diag_mgr.x[508]: %DIAG-6-INFO: cannot find entry - IfIndex=70(1/6)
> May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by discovering new ports
> May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by multicast membership change
> May 22 08:15:04 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Generate SM IN_SERVICE trap for GID=fe:80:00:00:00:00:00:00:00:05:ad:00:00:08:a8:71
> May 22 08:15:05 topspin-120sc port_mgr.x[497]: %PORT-6-INFO: port up - port=1/6, type=ib4xTXP
> May 22 08:15:07 topspin-120sc ib_sm.x[632]: %IB-6-INFO: Generate SM CREATE_MC_GROUP trap for GID=ff:12:60:1b:ff:ff:00:00:00:00:00:01:ff:08:a8:71
> May 22 08:15:08 topspin-120sc ib_sm.x[618]: %IB-6-INFO: Configuration caused by multicast membership change
>
> RC3 is just installed.
> Results will follow soon.
>
> Regards,
>
> Koen
>
> ______________________________________________________________
> From: Ami Perlmutter [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 22, 2007 10:33
> To: Shirley Ma
> Cc: SEGERS Koen; [EMAIL PROTECTED]; [email protected]
> Subject: Re: [ofa-general] GPFS node loses IB-connection
>
> Does the application constantly open and close connections?
>
> *** Disclaimer ***
>
> Vlaamse Radio- en Televisieomroep
> Auguste Reyerslaan 52, 1043 Brussel
>
> nv van publiek recht
> BTW BE 0244.142.664
> RPR Brussel
> http://www.vrt.be/disclaimer

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
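
As an aid for reading switch logs like the one quoted in this thread, here is a minimal sketch that pairs each "port down" entry with the next "port up" entry for the same port and prints how long the port was gone. It assumes unwrapped syslog lines in the same format as the port_mgr.x entries above. port_flaps is a hypothetical helper name, not part of OFED or the switch software, and it compares only the HH:MM:SS field, so it assumes both events fall on the same day.

```shell
#!/bin/sh
# port_flaps: hypothetical helper, not part of OFED or the switch software.
# Reads syslog-style switch log lines on stdin and reports, per port,
# the interval between a "port down" and the following "port up".
# Assumption: lines are unwrapped and both events happen on the same day.
port_flaps() {
    awk '
    # Convert HH:MM:SS to seconds since midnight.
    function secs(t,    p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }

    # Match only the port_mgr style entries, e.g. "port down - port=1/6,".
    / port (down|up) - port=/ {
        state = $8                      # "down" or "up"
        port = $10
        sub(/^port=/, "", port)         # strip the "port=" prefix
        sub(/,$/, "", port)             # strip the trailing comma
        if (state == "down") {
            went_down[port] = $3        # remember when the port dropped
        } else if (port in went_down) {
            printf "port %s down %s up %s (%d s)\n", port, went_down[port], $3, secs($3) - secs(went_down[port])
            delete went_down[port]
        }
    }
    END {
        # Ports that dropped and were never seen coming back.
        for (port in went_down)
            printf "port %s down %s, no port up seen\n", port, went_down[port]
    }'
}
```

Fed the log quoted above (for example `port_flaps < switch.log`, where switch.log is a placeholder file name), this should report port 1/6 as down from 08:15:00 to 08:15:05, i.e. 5 seconds. A port that never returns, as described for the OFED-1.2-RC3 run, would be listed with "no port up seen".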
